Archived MTA Bus Time Data, Aug 2014 - Oct 2014
As part of the MTA's App Quest, three months of MTA Bus Time data were released here.
Archived B63 MTA Bus Time Data
As requested by a number of developers on the MTA's Developer Resources list, we have posted an extract of data archived by the MTA Bus Time system. This first extract contains historical records for the B63 route in Brooklyn from April 3, 2011 through May 3, 2011. Each record in this data set contains, for a single bus, the time of observation, bus location, bus route, next stop, distance from that stop, and other variables described below.
Additional data from Staten Island are not yet available.
The data can be found here. This page describes caveats with respect to this data, and the format of the data itself.
The B63-only GTFS feed, which powers MTA Bus Time and provides a reference baseline for this archived data, is available here.
Because of the prototype nature of the MTA Bus Time pilot system, it takes some effort to generate these extracts. Future extracts will be posted if demand and applications for this first extract merit the effort.
Caveats
There are a number of caveats to this data set. MTA Bus Time on the B63 is a pilot project in which the on-bus equipment and the backend server are exceedingly simple.
- There is no formal integration with the schedule, so the trip IDs in this data cannot be used to infer whether a bus was early or late. The trip IDs only indicate a particular stopping pattern.
- The data itself, while voluminous, is not perfectly clean.
- This data does not indicate the particular time that a given bus served (or passed) a particular stop. It simply relates, for every observation the server received from a bus, the ID of the next stop on that bus' trip and the distance to that stop. To infer when a bus served a given stop, one can look at consecutive observations where the ID of the next stop changed.
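The stop-change approach described in the last caveat can be sketched in Python as follows. This is a minimal, hypothetical helper (`infer_stop_passages` is not part of any MTA tool). It assumes the observations have already been reduced to `(vehicle_id, timestamp, stop_id)` tuples and that timestamps parse with `datetime.fromisoformat` (e.g. "2011-04-03 08:00:00"), which the documentation does not confirm. For each bus it reports the time window in which a stop was served or passed, not an exact time.

```python
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, Iterable, List, Tuple

# (vehicle_id, timestamp, stop_id) taken from the CSV columns of the same names.
Observation = Tuple[str, str, str]
# (vehicle_id, stop_id, last time stop_id was the next stop, first time it was not)
Passage = Tuple[str, str, datetime, datetime]


def infer_stop_passages(observations: Iterable[Observation]) -> List[Passage]:
    """Bracket each stop a bus served or passed between two consecutive observations."""
    by_vehicle: DefaultDict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for vehicle_id, timestamp, stop_id in observations:
        # Assumes ISO-8601-style timestamps such as "2011-04-03 08:00:00".
        by_vehicle[vehicle_id].append((datetime.fromisoformat(timestamp), stop_id))

    passages: List[Passage] = []
    for vehicle_id, obs in by_vehicle.items():
        obs.sort()  # put each bus's observations in time order
        for (t_before, stop_before), (t_after, stop_after) in zip(obs, obs[1:]):
            if stop_after != stop_before:
                # The bus reached stop_before somewhere in (t_before, t_after].
                passages.append((vehicle_id, stop_before, t_before, t_after))
    return passages
```

Because the next stop also changes when a bus finishes a trip or goes into a layover, this sketch may report ambiguous passages around terminals. One option is to keep only IN_PROGRESS observations (see the phase column below) before running it; the sketch itself does no such filtering.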
We hope to fix all but the last of these issues in the forthcoming MTA Bus Time implementation on Staten Island (and beyond). Please report any issues you find via the email list.
Documentation
The data comes as a large zipped CSV file with the following columns. As far as possible, the columns are named to match the corresponding GTFS fields.
- vehicle_id - the 4-digit ID of the bus
- timestamp - the date and time of the observation
- latitude - the latitude of the bus
- longitude - the longitude of the bus
- phase - the phase of the bus in its duty cycle; current extract includes only observations when the bus is inferred to be IN_PROGRESS (i.e. driving on the route) or LAYOVER_DURING (i.e. waiting at a terminal for a trip to begin)
- trip_id - a GTFS trip_id representing the stopping pattern inferred for the given bus at the given time
- direction_id - the GTFS direction_id for the direction the bus is traveling
- trip_headsign - the GTFS destination sign value for the inferred representative trip
- shape_dist_traveled - the distance the bus has traveled (in meters) along the precise geographic route of the inferred representative trip
- stop_id - the GTFS stop_id of the next stop the bus will serve
- stop_sequence - the GTFS stop_sequence of the next stop the bus will serve
- dist_from_stop - the distance of the bus (in meters) from that next stop
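As a starting point for working with the file, the sketch below streams rows straight out of the zip archive using Python's standard `csv` and `zipfile` modules and converts the numeric columns listed above. It assumes the archive contains a comma-delimited, UTF-8 CSV file whose first row is a header using these column names; the function name `read_observations` and its `phase` filter are illustrative, not part of the published data. Because the data is not perfectly clean, empty or malformed numeric values become `None` instead of raising an error.

```python
import csv
import io
import zipfile
from typing import Any, BinaryIO, Callable, Dict, Iterator, Optional, Union

# Numeric columns from the documented format and the type each is converted to.
# vehicle_id, trip_id and stop_id stay strings so the IDs are not altered.
NUMERIC_COLUMNS: Dict[str, Callable[[str], Any]] = {
    "latitude": float,
    "longitude": float,
    "direction_id": int,
    "shape_dist_traveled": float,  # meters along the inferred trip's shape
    "stop_sequence": int,
    "dist_from_stop": float,  # meters to the next stop
}


def _to_number(value: Optional[str], cast: Callable[[str], Any]) -> Any:
    """Convert a raw CSV field, returning None if it is missing, empty or malformed."""
    if value is None or not value.strip():
        return None
    try:
        return cast(value.strip())
    except ValueError:
        return None


def read_observations(
    source: Union[str, BinaryIO], phase: Optional[str] = None
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per observation from the first .csv file inside a zip archive.

    If phase is given (e.g. "IN_PROGRESS" or "LAYOVER_DURING"), other rows are skipped.
    Assumes a comma-delimited, UTF-8 file with a header row (unverified).
    """
    with zipfile.ZipFile(source) as zf:
        members = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not members:
            raise ValueError("no .csv file found in the archive")
        with zf.open(members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                if phase is not None and row.get("phase") != phase:
                    continue
                for column, cast in NUMERIC_COLUMNS.items():
                    if column in row:
                        row[column] = _to_number(row[column], cast)
                yield row
```

For example, `for row in read_observations("b63.zip", phase="IN_PROGRESS"): ...` (with a hypothetical local filename) iterates only over observations of buses driving on the route. Streaming the rows avoids unzipping the whole file to disk or loading it into memory at once.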
For a description of the CSV format and hints on working with CSV files, please see Wikipedia or this helpful guide.

