CabiGeoStats

Data Gathering

This page describes how the raw data was gathered for later use in developing the ML model.



The data was collected from the Capital Bikeshare (CaBi) Amazon S3 bucket. The datasets are organized by year, from 2016 through September 2022, and each year's CSV files are split either quarterly or monthly. For our analysis, we appended geospatial features to the datasets that lacked them (2016 through March 2020) by sampling on the Station ID feature, and then concatenated the yearly data frames into a single combined CSV file, which we used for descriptive and inferential statistics. Each of these steps is described in detail in the Data Cleaning tab.
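The append-and-concatenate step can be sketched roughly as follows. This is a minimal illustration in pandas, not the project's actual code. The column names (`start_station_id`, `start_lat`, `start_lng`), the helper functions, and the tiny in-memory frames with made-up station IDs and coordinates are all assumptions. It also takes one reading of "sampling on Station ID": keep the first coordinate pair observed for each station and join it onto the older trips.

```python
import pandas as pd

# Hypothetical column names; the real CaBi files may label these differently.
STATION_COL = "start_station_id"
GEO_COLS = ["start_lat", "start_lng"]


def build_station_lookup(with_geo: pd.DataFrame) -> pd.DataFrame:
    """Return one row per station ID, using the first coordinates seen for it."""
    return (
        with_geo.dropna(subset=GEO_COLS)
        .drop_duplicates(subset=STATION_COL)[[STATION_COL, *GEO_COLS]]
    )


def append_geo(without_geo: pd.DataFrame, lookup: pd.DataFrame) -> pd.DataFrame:
    """Left-join coordinates onto trips that lack them, keyed on station ID."""
    return without_geo.merge(lookup, on=STATION_COL, how="left")


# Tiny stand-ins for the yearly files; all values are made up for illustration.
trips_2019 = pd.DataFrame({"start_station_id": [31000, 31001, 31999]})
trips_2021 = pd.DataFrame(
    {
        "start_station_id": [31000, 31000, 31001],
        "start_lat": [38.8589, 38.8590, 38.8573],
        "start_lng": [-77.0533, -77.0534, -77.0511],
    }
)

lookup = build_station_lookup(trips_2021)

# Older trips gain coordinate columns, then all years are stacked together.
combined = pd.concat(
    [append_geo(trips_2019, lookup), trips_2021], ignore_index=True
)
```

A left join keeps every older trip even when its station never appears in the newer files. Those rows end up with missing coordinates and can be handled during cleaning. Writing the result out with `combined.to_csv(...)` would produce the single combined file described above.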

1 Dataset screenshot


2 Details of useful columns

Source Code
---
title: Data Gathering
---

<b>This page describes how the raw data was gathered for later use in developing the ML model.</b>

```{=html}
<table cellspacing="10" cellpadding="10">
    <tbody>
        <tr>
            <th>
                <img src="../../images/data_gathering/Data-Collection.png"
                    style="width:450px;height:300px;" align="right">
            </th>


            <th>
                <img src="../../images/data_gathering/data-gathering.jpeg"
                    style="width:435px;height:300px;" align="center">
            </th>

        </tr>
    </tbody>
</table>
<br><br>
```


The data was collected from the Capital Bikeshare (CaBi) Amazon S3 bucket. The datasets are organized by year, from 2016 through September 2022, and each year's CSV files are split either quarterly or monthly. For our analysis, we appended geospatial features to the datasets that lacked them (2016 through March 2020) by sampling on the Station ID feature, and then concatenated the yearly data frames into a single combined CSV file, which we used for descriptive and inferential statistics. Each of these steps is described in detail in the Data Cleaning tab.

## Dataset screenshot

<img src="../../images/data_gathering/combined_data.png" style="width:1000px;" align="center"><br>

## Details of useful columns

<img src="../../images/data_gathering/Table1.png" style="width:1000px;" align="center">