Conclusions
Our study focused on exploratory data analysis and the implementation of several statistical tests, both of which have furthered our understanding of Capital Bikeshare, the components that affect its overall usage, and the behavioral differences between casual riders and members.
1 Hypothesis Testing
The first hypothesis test indicated that casual users spend, on average, more time riding the bikes than members do. Given this result, CaBi could develop strategies aimed at converting casual riders into members. This would benefit not only the company but also the users, since frequent casual riders would likely spend more over the long term than they would as members.
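As an illustration of how such a comparison of mean ride durations can be tested by resampling, the following is a minimal sketch of a one-sided bootstrap test. It is not the study's own code: the function name, the number of resamples, and the ride durations are all assumptions made up for this example.

```python
import random
from statistics import mean


def bootstrap_mean_diff_pvalue(a, b, n_resamples=2000, seed=0):
    """One-sided bootstrap test of H0: mean(a) == mean(b) against H1: mean(a) > mean(b)."""
    rng = random.Random(seed)
    mean_a, mean_b = mean(a), mean(b)
    observed = mean_a - mean_b
    pooled_mean = mean(list(a) + list(b))
    # Shift each group so both share the pooled mean, which enforces H0
    # while keeping each group's own spread.
    a0 = [x - mean_a + pooled_mean for x in a]
    b0 = [x - mean_b + pooled_mean for x in b]
    hits = 0
    for _ in range(n_resamples):
        a_star = rng.choices(a0, k=len(a0))  # resample with replacement
        b_star = rng.choices(b0, k=len(b0))
        if mean(a_star) - mean(b_star) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Made-up ride durations in minutes, for illustration only.
casual = [32, 41, 27, 55, 38, 46, 29, 60, 35, 44]
member = [12, 15, 9, 18, 11, 14, 10, 16, 13, 17]

observed, p_value = bootstrap_mean_diff_pvalue(casual, member)
print(f"observed difference = {observed:.1f} min, p-value = {p_value:.4f}")
```

A small p-value here would support the claim that casual riders ride longer on average. On real data the two input lists would be the duration columns of the two rider groups.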
The second hypothesis test indicated that members spend more time and travel greater distances on electric bikes than on classic bikes. This finding not only helps identify riders’ bike preferences, but also suggests that the company would benefit from increasing its stock of electric bikes and, eventually, their prevalence across the D.C. metro area. A cost-benefit analysis would be needed to confirm this, which is a limitation of the study because no revenue or pricing data was provided.
The last hypothesis test showed that the type of bike used depends on membership status. This makes it important for the company to take steps to make bike-sharing more convenient for members.
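A dependence between two categorical variables such as bike type and membership status is commonly assessed with a chi-square test of independence. The sketch below assumes that kind of test and uses hypothetical counts. The closed-form p-value helper is valid only for one degree of freedom (a 2x2 table), and no continuity correction is applied.

```python
import math


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail probability of a chi-square variable with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical ride counts. Rows: member, casual. Columns: classic, electric.
counts = [[600, 400],
          [450, 550]]

stat, dof = chi_square_independence(counts)
p = p_value_one_dof(stat)
print(f"chi-square = {stat:.2f}, dof = {dof}, p-value = {p:.2e}")
```

Rejecting independence says only that the mix of bike types differs between rider groups. It does not say which group prefers which bike, so the observed and expected counts would still need to be inspected.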
2 Linear Regression
Finally, the Linear Regression showed that some relationships in the data are too complex for the model we chose. Because one of the key variables is membership type, which is categorical, classification would likely suit the project better than regression. We aim to explore this in future work.
3 Limitations
One limitation of our study is that we could not carry out a cost-benefit analysis, because the dataset did not include a revenue or pricing feature for each ride. Some columns and values were dropped in order to “clean the data”, as was done for the Bootstrap test. Specifically, the docked bike type was removed because docked rides could be misidentified as another bike type, which would have distorted the known bike type values used in the hypothesis tests. Although only 10% of the data was used, the sample was drawn so that each column kept the same proportions as the initial 20 million row dataset covering 2016 through 2022. Moreover, the study could have used ANOVA models, as comparing means across several variable groups could be more informative and could help identify other relationships that are meaningful in a Multiple Linear Regression Model.
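One way to draw a 10% sample that keeps the original proportions of a column is stratified sampling, sketched below. The actual sampling procedure used in the study may differ, so this is only an assumption. The column name `member_casual` and the toy records are hypothetical, and the sketch stratifies on a single column only.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every group of `key`, preserving group proportions."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)  # rows to keep from this group
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample


# Toy dataset: 70% member rides and 30% casual rides (made-up values).
rides = (
    [{"ride_id": i, "member_casual": "member"} for i in range(700)]
    + [{"ride_id": 700 + i, "member_casual": "casual"} for i in range(300)]
)

sample = stratified_sample(rides, key="member_casual", fraction=0.10)
n_member = sum(1 for r in sample if r["member_casual"] == "member")
print(len(sample), n_member)
```

Preserving the proportions of several columns at once would require stratifying on their combination, which can leave very small groups in a subsample.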