Exploratory Data Analysis
This page explores the cleaned data through plots and other visuals to give a better sense of its structure.
2 Proportion of rides per rider status
Figure 5 above shows a positive sign of growth in members from 2016 to 2020. That CaBi had its highest number of members in 2020 suggests that the number of casual riders, mainly tourists and people living in D.C. for a few months at most, declined significantly during the imposed lockdown. In the aftermath of the lockdown, members dropped by almost 25% from 2020 to 2021. The pandemic therefore very likely reduced CaBi's revenues, and CaBi is now in a period of recovery.
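The growth and the roughly 25% drop described above come down to per-year ride counts by rider status and a percentage change. The sketch below is a minimal, stdlib-only illustration, not the code behind Figure 5. It assumes each ride is a dict with hypothetical keys `year` and `member_casual`, and it counts rides rather than distinct riders, on the assumption that the trip-level data carries no rider identifier.

```python
from collections import Counter, defaultdict


def shares_by_year(rides):
    """Return {year: {status: share of that year's rides}}.

    Assumes each ride is a dict with hypothetical keys "year" (int) and
    "member_casual" ("member" or "casual"). Counts rides, not distinct riders.
    """
    counts = defaultdict(Counter)
    for ride in rides:
        counts[ride["year"]][ride["member_casual"]] += 1
    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {status: n / total for status, n in counts[year].items()}
    return shares


def pct_change(old, new):
    """Percentage change from old to new; negative values mean a drop."""
    return (new - old) / old * 100


# Usage: a fall from 400 to 300 member rides is a 25% drop.
print(pct_change(400, 300))  # -25.0
```

Computing shares within each year, rather than raw counts, keeps years with very different total volumes comparable on the same axis.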
3 Number of start rides per hour of the day by Rider Status (2016-2022)
Figure 6 shows almost no activity on CaBi bikes among members or casual riders at 3AM and 4AM across the years 2016 to 2022. At 8AM, however, rides started by members spike while those by casual riders do not, implying that members mainly use CaBi to commute to work. Another peak appears at 5PM, this time for both members and casual riders. This suggests that members are commuting home from their workplaces while casual riders are setting off on leisurely trips, reinforcing our observation from Figure 5 that casual riders are mainly tourists.
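The hourly pattern in Figure 6 can be tabulated by bucketing each ride's start time into its hour of the day, separately for each rider status. This is a hypothetical sketch: it assumes a `started_at` timestamp string formatted as `YYYY-MM-DD HH:MM:SS` and a `member_casual` field, and the actual schema and plotting code may differ.

```python
from collections import Counter, defaultdict
from datetime import datetime


def starts_per_hour(rides):
    """Count rides started in each hour (0-23), separately per rider status.

    Hypothetical schema: "started_at" as a "YYYY-MM-DD HH:MM:SS" string and
    "member_casual" as the rider status.
    """
    counts = defaultdict(Counter)
    for ride in rides:
        hour = datetime.strptime(ride["started_at"], "%Y-%m-%d %H:%M:%S").hour
        counts[ride["member_casual"]][hour] += 1
    return counts


def peak_hour(hour_counts):
    """Hour with the most started rides; ties go to the earliest hour."""
    # max() keeps the first maximal item, and the hours are sorted ascending.
    return max(sorted(hour_counts), key=hour_counts.__getitem__)
```

On the full data, one would expect `peak_hour` to return 8 for members if the morning commute peak dominates, and 17 for casual riders.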
4 Ridge plot of log ride duration
Figure 7 shows that in 2020 not only did the number of rides fall, but so did the duration of each ride: the 2020 ridge peaks slightly to the left of the other years' ridges.
Figure 8 accounts for outliers by log-transforming duration in minutes. The main takeaway from this plot is the heavier tails for the hours from 10AM to 11PM, suggesting that riders take longer trips during this half of the day than during the other half.
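To see why a log-transformation tames long rides, the sketch below compares the moment-based skewness of a toy, right-skewed list of durations (invented for illustration, not CaBi data) with the skewness of their logarithms. It assumes every duration is positive; `math.log1p` would be an alternative if zero-minute rides remained in the data.

```python
import math
import statistics


def skewness(values):
    """Population (moment-based) skewness: mean cubed deviation / sd**3."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - mu) ** 3 for v in values]) / sd ** 3


# Toy durations in minutes with one very long ride, not actual CaBi data.
durations = [5, 8, 10, 12, 12, 15, 20, 30, 60, 180]
log_durations = [math.log(d) for d in durations]

# The log scale compresses the long right tail, so skewness shrinks.
print(round(skewness(durations), 2), round(skewness(log_durations), 2))
```

The logarithm turns multiplicative gaps (a 180-minute ride is 15 times a 12-minute one) into additive ones, which is why extreme durations stop dominating the shape of each ridge.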
5 Ridge plot of number of start rides per hour of the day
Analogous to Figure 8, Figure 9 above visualizes the distribution of the number of rides started over the day, pooled across all years and rider types in our data. There is almost no activity at 5AM, but from 6AM, as riders start their day, activity increases gradually until it peaks at 8AM. Activity stays low from 9AM to 4PM during work hours, then rises even higher at 5PM and 6PM as riders leave their workplaces. After 6PM activity gradually decreases, and the cycle begins again at 6AM the next day.
6 Bar plot for proportions
Figure 10 above shows that CaBi members account for approximately 73% of all rides in the data and casual riders for the remaining 27%.
Figure 11 above covers another categorical variable, bike type. Classic bikes have been available since CaBi's inception and make up approximately 90% of the data. Electric bikes were introduced in 2020. Because we could not find conclusive information about what the docked bike type means, we dropped it entirely from our statistical analyses.
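Both bar plots reduce to category shares. A minimal sketch, assuming the rider status and bike type columns are available as plain lists of strings; the `exclude` parameter mirrors the decision to drop docked bikes, and `docked_bike` is an assumed label for that category. Note that shares computed after exclusion are relative to the remaining rides only.

```python
from collections import Counter


def proportions(values, exclude=()):
    """Share of each category, after removing any categories in `exclude`.

    Returns a dict ordered from most to least frequent category.
    """
    counts = Counter(v for v in values if v not in exclude)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.most_common()}


# Toy bike types; "docked_bike" is dropped before computing shares.
bike_types = ["classic_bike"] * 9 + ["electric_bike"] + ["docked_bike"] * 2
print(proportions(bike_types, exclude={"docked_bike"}))
```

This is the same computation as pandas' `value_counts(normalize=True)`, written with the standard library so it runs without extra dependencies.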
7 Boxplots for trip duration
Figure 12 shows that the numeric variables, duration in minutes and distance in miles, remain right-skewed even after removing heavy outliers: in each box the median sits closer to the left edge (lower duration or distance), and the whisker on the left end is shorter than the one on the right. The median duration is approximately 12 minutes and the median distance covered is approximately one mile among members and casual riders. Log-transforming these features is therefore better for t-tests, since the transformed values come closer to satisfying the normality assumption.
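The boxplot reading above (median nearer the left edge of the box) corresponds to a positive quartile, or Bowley, skewness coefficient. The sketch below computes it with `statistics.quantiles` and adds Welch's t statistic, the kind of two-sample test the log-transformed features would feed into. It is a stdlib-only illustration on toy numbers, not the report's analysis; `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a p-value.

```python
import math
import statistics


def bowley_skewness(values):
    """Quartile skewness (Q3 + Q1 - 2*Q2) / (Q3 - Q1).

    Positive when the median sits closer to Q1, i.e. toward the left of the
    box, which is the right-skew signature read off a boxplot.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


# Toy durations in minutes, not actual CaBi data.
durations = [5, 8, 10, 12, 12, 15, 20, 30, 60, 180]
print(bowley_skewness(durations))
print(bowley_skewness([math.log(d) for d in durations]))
```

To compare groups on the log scale, pass transformed samples, e.g. `welch_t([math.log(x) for x in member_minutes], [math.log(x) for x in casual_minutes])`, where both input lists are hypothetical.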
8 Geospatial
The Folium package in Python helped us generate the geospatial visualizations, since our data contains latitude and longitude features for the start and end locations of each trip. We created a function that takes a station address as a string and outputs a heatmap of the stations where rides from that station ended. Figure 13 above shows that most trips ended around the Rosslyn Metro Station and Dupont Circle, suggesting that the Georgetown community uses CaBi as a substitute for the Georgetown University Transportation Shuttle. Moreover, 8,000 trips were started from 37th & O St NW across 2016-2022: 5,500 by members and 2,500 by casual riders.
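A station-level heatmap like Figure 13 can be sketched with Folium's `HeatMap` plugin. This is a hypothetical reconstruction, not the function we used: it assumes ride records are dicts with `start_station_name`, `end_lat` and `end_lng` keys, and it centers the map on the mean end coordinate. Folium is imported inside the plotting function so the filtering step can run without it installed.

```python
def end_points_from_station(rides, station):
    """[lat, lng] end coordinates of rides that started at `station`.

    Hypothetical schema: "start_station_name", "end_lat", "end_lng".
    Rides with missing end coordinates are skipped.
    """
    return [
        [ride["end_lat"], ride["end_lng"]]
        for ride in rides
        if ride["start_station_name"] == station
        and ride["end_lat"] is not None
        and ride["end_lng"] is not None
    ]


def end_heatmap(rides, station, zoom_start=13):
    """Folium map with a heatmap of where rides from `station` ended."""
    points = end_points_from_station(rides, station)
    if not points:
        raise ValueError(f"no rides with end coordinates start at {station!r}")

    # Imported lazily so end_points_from_station works without folium.
    import folium
    from folium.plugins import HeatMap

    center = [
        sum(lat for lat, _ in points) / len(points),
        sum(lng for _, lng in points) / len(points),
    ]
    fmap = folium.Map(location=center, zoom_start=zoom_start)
    HeatMap(points).add_to(fmap)
    return fmap
```

For example, `end_heatmap(rides, "37th & O St NW").save("end_heatmap.html")` would write an interactive HTML map, assuming `rides` follows the schema above.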