This repository has been archived by the owner on May 20, 2024. It is now read-only.

Data_Visualization Final Challenge

Ride Share Analysis

Manrique Vargas [email protected]

Yavuz Sunor [email protected]

  1. Summary
  2. Q1
  3. Q2
  4. Q5
  5. Q6
  6. Q8

Summary

We used pandas and geopandas for data processing and matplotlib for plotting, then visualized the maps in Carto. We considered this the most cost-effective approach.

Q1

For the first question, we worked with the requests dataset and did the necessary manipulations in Python in a Jupyter Notebook. We converted the timestamps to datetimes so that requests could be grouped by hour. We then calculated the ratio of successful trips in each hour and plotted the trend over a one-day period. As you can see, the serving rate peaks during the morning rush hours and stays roughly uniform at other times.
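A minimal pandas sketch of the hourly aggregation described above. The README does not list the requests schema, so the column names `request_time` (assumed here to be Unix seconds) and `served` (a boolean) are hypothetical, as is the function name `hourly_success_ratio`.

```python
import pandas as pd


def hourly_success_ratio(requests: pd.DataFrame) -> pd.Series:
    """Fraction of requests served in each hour of the day.

    Hypothetical schema: 'request_time' (Unix seconds) and 'served' (bool).
    """
    df = requests.copy()
    # Convert raw timestamps to datetimes so the hour can be extracted.
    df["request_time"] = pd.to_datetime(df["request_time"], unit="s")
    df["hour"] = df["request_time"].dt.hour
    # The mean of a boolean column is the share of True values in each group.
    return df.groupby("hour")["served"].mean()


if __name__ == "__main__":
    sample = pd.DataFrame({
        "request_time": [0, 600, 3600, 3700, 7200],  # 00:00, 00:10, 01:00, 01:01:40, 02:00
        "served": [True, False, True, True, False],
    })
    print(hourly_success_ratio(sample))
```

The resulting Series can be passed to matplotlib (for example via `Series.plot()`) to draw the one-day trend line.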

Q2

For the second question, we also used the requests dataset, but this time we merged it with manhattan.geojson so it could be shown on a map. In the Jupyter Notebook, we created a served/not_served column to categorize trips. We then uploaded the merged dataframe to CartoDB as a CSV file and wrote a simple SQL query that filters the not_served trips and aggregates them by location. As you can see, most not_served trips are concentrated in eastern Manhattan, mostly around Midtown and the Central Park area. In terms of time, unserved requests cluster in the early hours near the WTC area and in the late hours near Midtown/Central Park.
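The served/not_served labelling and the Carto SQL aggregation can be approximated in pandas as below. This is a sketch under assumptions: the rule that a request with no `vehicle_id` counts as not served, the `lat`/`lon` column names, and the function name `unserved_counts` are all hypothetical, and grouping on exact coordinates stands in for whatever geolocation key the original query used.

```python
import numpy as np
import pandas as pd


def unserved_counts(requests: pd.DataFrame) -> pd.DataFrame:
    """Count not_served requests per (lat, lon) location, busiest first."""
    df = requests.copy()
    # Hypothetical rule: a request with no assigned vehicle was not served.
    df["status"] = np.where(df["vehicle_id"].isna(), "not_served", "served")
    not_served = df[df["status"] == "not_served"]
    # Roughly equivalent to:
    #   SELECT lat, lon, COUNT(*) AS n_not_served FROM requests
    #   WHERE status = 'not_served' GROUP BY lat, lon
    counts = not_served.groupby(["lat", "lon"]).size().reset_index(name="n_not_served")
    return counts.sort_values("n_not_served", ascending=False, ignore_index=True)


if __name__ == "__main__":
    sample = pd.DataFrame({
        "vehicle_id": [1, None, None, 2, None],
        "lat": [40.75, 40.75, 40.75, 40.71, 40.71],
        "lon": [-73.98, -73.98, -73.98, -74.01, -74.01],
    })
    print(unserved_counts(sample))
```

The result could be written out with `DataFrame.to_csv` for upload to Carto. On real data, rounding coordinates or spatially joining against the manhattan.geojson polygons (for example with `geopandas.sjoin`) would give more meaningful groups than exact coordinate matches.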

Q5

For the fifth question, we used the vehicle_paths dataset. In the Jupyter Notebook, we filtered the data to records with 4 or fewer passengers and created a geometry column from the latitude and longitude values. We then uploaded the dataframe to CartoDB as in question 2, applied a very similar SQL query, and visualized the vehicles by passenger count and time of day. Because the filtered data only extend up to 2 pm, we can only see a temporal pattern between 5 am and 2 pm. As you can see, the city shows a uniform distribution both by passenger count and by time of day.
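A sketch of the passenger filter and geometry column, assuming hypothetical `passengers`, `latitude`, and `longitude` columns and a hypothetical function name `passengers_with_geometry`. To stay dependency-light it writes the geometry as a WKT `POINT(lon lat)` string; with geopandas, `geopandas.points_from_xy(df.longitude, df.latitude)` would build Shapely points instead. Whether Carto ingests this WKT column directly is an assumption.

```python
import pandas as pd


def passengers_with_geometry(paths: pd.DataFrame, max_passengers: int = 4) -> pd.DataFrame:
    """Keep records with at most `max_passengers` and add a WKT point column."""
    # .copy() so the new column is added to an independent frame.
    df = paths[paths["passengers"] <= max_passengers].copy()
    # WKT uses (x y) order, i.e. longitude first, then latitude.
    df["geometry_wkt"] = [
        f"POINT({lon} {lat})" for lon, lat in zip(df["longitude"], df["latitude"])
    ]
    return df


if __name__ == "__main__":
    sample = pd.DataFrame({
        "passengers": [1, 4, 5],
        "latitude": [40.75, 40.70, 40.80],
        "longitude": [-73.98, -74.00, -73.95],
    })
    print(passengers_with_geometry(sample))
```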

Q6

For the sixth question, we calculated the speed of every vehicle, using pandas to analyze each vehicle individually. To speed up the computation, we kept only every fifth record. Speed is the distance traveled divided by the elapsed time. The plot shows the speeds that lie 4.5 or more standard deviations below the mean speed. These include not only vehicles with a speed of zero (not moving at all) but also some vehicles moving very slowly. This threshold accounts for measurement error in the system used to record the locations: it captures many cases where a vehicle is probably not moving but its location signal fluctuates by a few meters, which makes the computed speed slightly larger than zero. We also provide a GIF to better visualize the hourly temporal change and the spatial change on the maps.
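The per-vehicle speed calculation and the 4.5-standard-deviation cut can be sketched as follows. The README does not say how distance was measured, so the haversine great-circle formula is an assumption, as are the column names `vehicle_id`, `timestamp` (seconds), `latitude`, and `longitude` and the function names. Subsampling uses `iloc[::5]` to keep every fifth record per vehicle.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; works element-wise on arrays/Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def vehicle_speeds(paths: pd.DataFrame, step: int = 5) -> pd.DataFrame:
    """Speed in m/s between consecutive subsampled records of each vehicle."""
    frames = []
    ordered = paths.sort_values(["vehicle_id", "timestamp"])
    for _, g in ordered.groupby("vehicle_id"):
        g = g.iloc[::step].copy()   # keep every `step`-th record to save time
        prev = g.shift(1)           # previous kept record of the same vehicle
        dist = haversine_m(prev["latitude"], prev["longitude"],
                           g["latitude"], g["longitude"])
        dt = g["timestamp"] - prev["timestamp"]
        # Non-positive time gaps become NaN instead of producing inf speeds.
        g["speed_mps"] = dist / dt.where(dt > 0)
        frames.append(g.dropna(subset=["speed_mps"]))
    return pd.concat(frames, ignore_index=True)


def slow_outliers(speeds: pd.DataFrame, k: float = 4.5) -> pd.DataFrame:
    """Rows whose speed is at least k standard deviations below the mean."""
    s = speeds["speed_mps"]
    return speeds[s <= s.mean() - k * s.std()]
```

Here the threshold is computed over all vehicles pooled together; computing it per vehicle would be an equally plausible reading of the text.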

Q8

The most important information is the vehicle paths together with the pickups and drop-offs, and this map shows them all. Some points lie far from the vehicle's path trajectory; these may have been failed requests. We also provide a GIF to better visualize the hourly temporal change and the spatial change on the maps.
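One way to quantify which points are "not close to the path" is to measure each pickup or drop-off point's distance to the nearest recorded vertex of the vehicle's path. This is an illustrative technique, not the authors' stated method; the 100 m cutoff and the function names are hypothetical, and using the nearest vertex rather than the nearest path segment slightly overestimates the gap when vertices are sparse.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; broadcasts over NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def distance_to_path_m(points, path):
    """For each (lat, lon) point, meters to the nearest recorded path vertex."""
    pts = np.asarray(points, dtype=float)[:, None, :]  # shape (n, 1, 2)
    pth = np.asarray(path, dtype=float)[None, :, :]    # shape (1, m, 2)
    # Pairwise distances via broadcasting, shape (n, m).
    d = haversine_m(pts[..., 0], pts[..., 1], pth[..., 0], pth[..., 1])
    return d.min(axis=1)


def off_path_mask(points, path, max_dist_m=100.0):
    """True for points farther than max_dist_m from every path vertex."""
    return distance_to_path_m(points, path) > max_dist_m


if __name__ == "__main__":
    path = [(40.70, -74.0), (40.71, -74.0)]
    stops = [(40.70, -74.0), (40.80, -74.0)]
    print(distance_to_path_m(stops, path), off_path_mask(stops, path))
```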

  • Jupyter Notebook here