CabiGeoStats

Hypothesis Testing

Four distinct hypothesis tests were completed throughout the project in the order presented below:

❖ 2 Two Sample t-tests

❖ 1 Bootstrap test

❖ 1 Chi squared test

Each test, its respective hypotheses, and its results are explained below:

1 Two sample t-test (Membership status v Bike ride duration)

The first hypothesis test examined what statistical inference can be drawn from comparing membership status with bike ride duration. A two-sample t-test was chosen because it directly compares the mean of a variable between two groups, in this case the mean ride duration of casual riders and of members. This comparison was chosen specifically to identify which riders are using the bikes the most and, as a direct result, identify any decisions that could be taken to address this relationship. The null and alternative hypotheses are listed below:

H0: The mean trip duration of casual riders is the same as that of members across 2016-2022.
H1: The mean trip duration of casual riders is more than that of members across 2016-2022.

With these hypotheses set, the test results indicate which statement the data support. As Table 2 shows, the p-value is less than 0.05, so we can reject the null hypothesis at the 5% significance level (95% confidence) and conclude that the mean trip duration of casual riders is greater than the mean trip duration of members, as visualized in Figure 14.
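As a rough illustration of the one-sided comparison described above, the sketch below computes a Welch-style t statistic and an approximate upper-tail p-value using only the Python standard library. The function name `welch_t_greater` and the sample durations are hypothetical and are not the project's data or code. The p-value uses a normal approximation to the t distribution, which is only reasonable for large samples such as the full trip dataset.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_greater(sample_a, sample_b):
    """One-sided test of H1: mean(sample_a) > mean(sample_b).

    Returns (t_statistic, approximate_p_value). The p-value relies on a
    normal approximation, so treat it as indicative for small samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference in means, without assuming equal variances.
    se = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    # Upper-tail area: probability of a statistic at least this large under H0.
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return t_stat, p_value


# Hypothetical trip durations in minutes (illustrative values only).
casual = [32.0, 41.5, 28.0, 55.0, 38.5, 47.0, 30.5, 44.0]
member = [12.0, 15.5, 11.0, 18.0, 14.5, 16.0, 13.0, 17.5]

t_stat, p_value = welch_t_greater(casual, member)
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4g}")
```

If SciPy is available, `scipy.stats.ttest_ind(casual, member, equal_var=False, alternative="greater")` performs the same kind of test with the exact t distribution.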

2 Two sample t-test (Bike type v Bike ride duration)

The second comparison again used a two-sample t-test, this time relating bike type to bike ride duration, as we sought to investigate which bikes users prefer when commuting across the local area. Although our exploratory analysis (Figure 11) showed that the company overwhelmingly has classic bikes in stock (89%), it was still worthwhile to make this comparison to see what distinction can be drawn given how many more classic bikes there are than electric bikes. The null and alternative hypotheses are listed below:

H0: The mean trip duration on classic bikes is the same as that on electric bikes.
H1: The mean trip duration on classic bikes is not the same as that on electric bikes.

As Table 3 shows, the p-value is less than 0.05, so we can reject the null hypothesis at the 5% significance level (95% confidence). Based on the direction of the observed difference, we conclude that the mean trip duration on electric bikes is greater than the mean trip duration on classic bikes, as visualized in Figure 15.
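Because the alternative hypothesis here is two-sided, the p-value counts both tails, and the direction of the effect is read from the sign of the test statistic. The sketch below shows that logic with the standard library. The function `welch_t_two_sided` and the durations are hypothetical, and the normal approximation again assumes large samples.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_two_sided(sample_a, sample_b):
    """Two-sided test of H1: mean(sample_a) != mean(sample_b)."""
    se = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    # Both tails count as evidence against H0, so the tail area is doubled.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical trip durations in minutes (illustrative values only).
classic = [14.0, 18.5, 12.0, 21.0, 16.5, 19.0, 13.5, 17.0]
electric = [20.0, 26.5, 18.0, 30.0, 23.5, 27.0, 19.5, 25.0]

t_stat, p_value = welch_t_two_sided(classic, electric)
# A negative statistic means the first sample (classic) has the smaller mean.
direction = "electric > classic" if t_stat < 0 else "classic > electric"
print(f"t = {t_stat:.2f}, two-sided p = {p_value:.4g}, {direction}")
```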

3 Bootstrap test

For the third hypothesis test, we conducted a bootstrap comparing the distance traveled with the type of bike used. The motivation for this test came from Figure 12, which shows that the median trip distance was 0.93 miles with outliers removed. That figure covers all bikes together, so a distinction needed to be made between bike types. The bootstrap method made this possible by repeatedly resampling the observed trips with replacement to build a distribution that imitates the data. The null and alternative hypotheses are listed below:

H0: The mean distance covered per trip on classic bikes is the same as that on electric bikes.
H1: The mean distance covered per trip on classic bikes is not the same as that on electric bikes.

After sampling, as seen in Table 4, 0 is not within the confidence interval for the difference in means, so we can again reject the null hypothesis and conclude that the mean distance traveled on electric bikes is greater than the mean distance traveled on classic bikes. Furthermore, in Figure 16, the mean distance traveled in the sampled distribution was around 1.116 miles for classic bikes, whereas it was much larger for electric bikes at 1.540 miles. Both sampled distributions appeared approximately normal (Gaussian), with the electric bike distribution matching that shape more closely.
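The bootstrap decision rule described above (reject the null hypothesis when 0 falls outside the confidence interval) can be sketched as follows. This is a minimal percentile-bootstrap example under stated assumptions: the function name `bootstrap_diff_ci`, the number of resamples, and the distances are all hypothetical and do not reproduce the project's code or data.

```python
import random
from statistics import mean


def bootstrap_diff_ci(sample_a, sample_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(sample_a) - mean(sample_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group WITH replacement, keeping the original group sizes.
        resample_a = rng.choices(sample_a, k=len(sample_a))
        resample_b = rng.choices(sample_b, k=len(sample_b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    # Approximate lower and upper percentiles of the resampled differences.
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical trip distances in miles (illustrative values only).
electric = [1.9, 1.2, 1.7, 1.4, 2.1, 1.3, 1.6, 1.5, 1.8, 1.1]
classic = [1.0, 1.3, 0.9, 1.2, 1.1, 0.8, 1.4, 1.0, 1.2, 1.1]

lower, upper = bootstrap_diff_ci(electric, classic)
reject_null = not (lower <= 0.0 <= upper)
print(f"95% CI for the difference in means: ({lower:.3f}, {upper:.3f})")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Resampling each bike type separately keeps the two group sizes fixed, which matters when one type is far more common than the other.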

4 Chi Squared test

For the last hypothesis test, a chi-square test of independence was performed to determine the relationship between bike rider status and bike type. This relationship has many implications, as riders may be more inclined to use specific bikes based on their personal needs. The chi-square test can show whether two categorical variables are dependent, which is why it was chosen. It compares the observed (experimental) frequencies with the expected (theoretical) frequencies under independence: the squared differences, each divided by its expected frequency, are summed to form the test statistic, which is then compared against the chi-square distribution. The null and alternative hypotheses are listed below:

H0: Bike rider status and bike type used are independent.
H1: Bike rider status and bike type used are dependent.

The chi-square statistic was 105,876, reflecting a large difference between the observed frequencies and the frequencies that would be expected if rider status and bike type were independent. The p-value was less than 0.05, so we can reject the null hypothesis at the 5% significance level (95% confidence) and conclude that bike type and rider status are dependent.
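To make the observed-versus-expected calculation concrete, the sketch below computes the chi-square statistic for a contingency table of rider status by bike type. The counts, the 2x2 layout, and the function names are hypothetical. The p-value helper is valid only for one degree of freedom (a 2x2 table), and no continuity correction is applied.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts: rows = (casual, member), columns = (classic, electric).
observed = [
    [300, 200],
    [600, 100],
]

stat, dof = chi_square_independence(observed)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value_one_dof(stat):.4g}")
```

For tables with more than two rows or columns, a library routine such as `scipy.stats.chi2_contingency` would be needed for the p-value. Note that it applies a continuity correction to 2x2 tables by default, so its statistic can differ slightly from this sketch.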
