The final module of the IBM Applied Data Science Specialization asks you to apply all the knowledge and skills gained in the course to a capstone project of your own. For my project, I analyzed neighborhoods in Toronto and used K-Means clustering to group them by the impact of COVID-19.
The project covers all phases of the data science life cycle needed to solve a problem. I used the following tools and libraries:
- Scatterplot
- Pandas, NumPy
- Seaborn (pairplot) and Folium (heatmap in map view)
- Interactive Leaflet Map using Folium
- Google Geocoding APIs
- Foursquare APIs
- K-Means Clustering Algorithm
After the analysis, I obtained the following clusters, listed from least to most affected:
- Cluster 1 (Least Infected)
- Cluster 3 (Considerably Infected)
- Cluster 0 (Highly Infected)
- Cluster 2 (Extremely Infected)
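To show how clusters like these can be produced and named, here is a minimal plain-Python sketch of K-Means (Lloyd's algorithm) followed by a ranking step. It is an illustration, not the notebook's code: the neighborhood names `A` to `H`, the case rates, the single feature, and the evenly spaced starting centroids are all assumptions made for the example. The point is that the cluster numbers themselves carry no meaning, so the severity labels are assigned afterwards by sorting the clusters on their average case rate.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm. Returns (labels, centroids).

    Assumption for this sketch: initial centroids are taken at evenly
    spaced positions of the sorted points, so the result is deterministic.
    Library implementations typically use randomized initialization.
    """
    ordered = sorted(points)
    step = max(k - 1, 1)
    centroids = [list(ordered[i * (len(ordered) - 1) // step]) for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: cases per 100,000 residents for made-up neighborhoods.
case_rates = {
    "A": (12.0,), "B": (15.0,), "C": (48.0,), "D": (52.0,),
    "E": (95.0,), "F": (101.0,), "G": (160.0,), "H": (170.0,),
}
names = list(case_rates)
points = [case_rates[name] for name in names]

labels, centroids = kmeans(points, k=4)

# Cluster ids are arbitrary, so rank clusters by their mean case rate
# and attach the severity names in that order.
severity = ["Least Infected", "Considerably Infected",
            "Highly Infected", "Extremely Infected"]
ranked_ids = sorted(range(len(centroids)), key=lambda c: centroids[c][0])
label_of_cluster = {c: severity[rank] for rank, c in enumerate(ranked_ids)}
severity_of = {name: label_of_cluster[c] for name, c in zip(names, labels)}

for name in names:
    print(name, severity_of[name])
```

With real data you would normally scale the features first and let a library implementation handle initialization, since a single unlucky starting point can leave K-Means in a poor local optimum.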
The full report, notebook, and PowerPoint presentation are attached below, and here is the link to my GitHub.
Full Report
Full Report by Akarsh on Scribd
Jupyter Notebook
PowerPoint Presentation
Battle of Neighborhoods by Akarsh on Scribd