Identifying Low-Performing Bus Routes and High Poverty Density in New York City

This project used MTA Bus Time data to analyze route speeds across the entire New York City bus system and paired those findings with census data on poverty density. The method could be used to identify areas where bus performance is poor for the residents who may depend on buses the most.

On any given day, over five thousand buses are in service in NYC. For three months in the fall of 2014, the location of every bus and its distance along its route were recorded, producing a dataset of over five million data points. I used Python to determine the average speed of each route from every unique trip completed by each vehicle in the fleet over a 24-hour period on Tuesday, 08/05/2014. I then joined these average speeds to the MTA Bus Route Shapefile available through the Newman Library.
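A minimal sketch of the per-route average speed calculation, assuming a pandas DataFrame of pings with the columns inferred_route_id, inferred_trip_id, and distance_along_trip, plus a datetime column named time_received (that column name is an assumption). Each trip's speed is the distance it covered divided by its elapsed time, and a route's average is the mean over its trips. This is one plausible reading of the method described above, not the project's exact code.

```python
import pandas as pd


def route_average_speeds(pings: pd.DataFrame) -> pd.Series:
    """Mean trip speed per route, in distance units per second.

    Assumes columns inferred_route_id, inferred_trip_id, distance_along_trip,
    and time_received (a datetime column; this name is an assumption).
    """
    # One row per trip: smallest/largest distance and timestamp recorded for it
    trips = pings.groupby(["inferred_route_id", "inferred_trip_id"]).agg(
        start_dist=("distance_along_trip", "min"),
        end_dist=("distance_along_trip", "max"),
        start_time=("time_received", "min"),
        end_time=("time_received", "max"),
    )
    elapsed = (trips["end_time"] - trips["start_time"]).dt.total_seconds()
    # Skip trips with a single timestamp to avoid dividing by zero
    valid = elapsed > 0
    trip_speed = (trips["end_dist"] - trips["start_dist"])[valid] / elapsed[valid]
    # Average the per-trip speeds within each route
    return trip_speed.groupby(level="inferred_route_id").mean()


# Hypothetical toy data: two routes, three trips
pings = pd.DataFrame({
    "inferred_route_id": ["route_a"] * 4 + ["route_b"] * 2,
    "inferred_trip_id": ["t1", "t1", "t2", "t2", "t3", "t3"],
    "distance_along_trip": [0.0, 1000.0, 0.0, 600.0, 0.0, 300.0],
    "time_received": pd.to_datetime([
        "2014-08-05 08:00:00", "2014-08-05 08:02:00",
        "2014-08-05 09:00:00", "2014-08-05 09:01:00",
        "2014-08-05 10:00:00", "2014-08-05 10:01:00",
    ]),
})
speeds = route_average_speeds(pings)
```

Averaging per-trip speeds weights every trip equally; dividing each route's total distance by its total time would instead give longer trips more weight.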

These route speed values were then joined to census tracts with poverty density values, and I created a composite score ranging from high-speed buses in areas of low poverty density to low-speed buses in areas of high poverty density. The M4, M3, M101, Q66, and BX1 consistently served areas of higher poverty density while maintaining the lowest average speeds.
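The exact formula for the composite score is not given here. A minimal sketch of one possible approach, assuming a joined table with hypothetical columns avg_speed and poverty_density: min-max normalize both, invert speed so that slower buses score higher, and average the two with equal weights (the equal weighting is also an assumption).

```python
import pandas as pd


def min_max(s: pd.Series) -> pd.Series:
    """Rescale a series to [0, 1]; a constant series maps to all zeros."""
    span = s.max() - s.min()
    if span == 0:
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / span


def composite_score(df: pd.DataFrame) -> pd.Series:
    """Hypothetical equal-weight score: 0 = fast bus, low poverty density;
    1 = slow bus, high poverty density."""
    slowness = 1 - min_max(df["avg_speed"])   # slowest route -> 1
    poverty = min_max(df["poverty_density"])  # highest density -> 1
    return (slowness + poverty) / 2


# Hypothetical rows, e.g. one per route after the join to census tracts
joined = pd.DataFrame(
    {"avg_speed": [10.0, 5.0, 7.5], "poverty_density": [0.0, 100.0, 50.0]},
    index=["route_x", "route_y", "route_z"],
)
scores = composite_score(joined).sort_values(ascending=False)
```

Sorting in descending order puts slow routes in high-poverty-density areas first; different weights or a rank-based normalization would be equally reasonable choices.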

Sample Python Code

import pandas as pd

# Drop repeated pings at the same distance on the same trip: keep the last one
# near the start of a trip (distance_along_trip < 100) and the first one elsewhere
key = ['distance_along_trip', 'inferred_trip_id']
near_start = data0[data0.distance_along_trip < 100].drop_duplicates(subset=key, keep='last')
elsewhere = data0[data0.distance_along_trip >= 100].drop_duplicates(subset=key, keep='first')
data0 = pd.concat([near_start, elsewhere]).sort_index()

# Furthest distance reached along each trip
tripDistance = data1.groupby('inferred_trip_id').distance_along_trip.max()

td = pd.DataFrame(tripDistance)

td["tid"] = td.index

import re

# Strip the agency prefix from a route ID, e.g. "MTA NYCT_B44+" -> "B44+"
a = "MTA NYCT_B44+"

r = r'^.*_([^_]*)$'

a = re.sub(r, r'\1', a)

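# Apply the same prefix stripping to the whole column (regex=True is required in pandas >= 2.0)
data10['inferred_route_id'] = data10['inferred_route_id'].str.replace(r'^.*_([^_]*)$', r'\1', regex=True)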