In the extra time I've found myself with at home during quarantine, I recently discovered Google's transit APIs on their developer portal. These APIs provide hooks to track public transit in large cities across the US in real time. Many major cities' transit systems have taken advantage of this and publish real-time tracking information for their trains, buses, etc. in Google's General Transit Feed Specification (GTFS) format.
GTFS is an open data format for public transportation schedules and associated geographic information. GTFS has an extension called GTFS Realtime, which is a feed that allows public transportation agencies to provide realtime updates about their fleet to application developers. New York City's Metropolitan Transportation Authority (MTA) has created a GTFS Realtime feed for each of its subway lines. These feeds drive the live countdown displays in the subway stations and are also used by GPS applications such as Google Maps to help compute total trip time when a user requests public transportation directions between two locations. The MTA updates these feeds with fresh data approximately every 30 seconds.
As a pre-pandemic New Yorker, the most frustrating part of my morning commute was walking into the station right as the doors were closing on my train then having to wait 20+ minutes until the next one (the struggles of riding the C train in Brooklyn). As a post-pandemic New Yorker, I'd very much like to limit the amount of time I have to spend in the train station as much as humanly possible once my office opens back up and I am no longer working from home.
Since my Ultra96 has established itself as a permanent fixture on my home network with my various other SDR web applications, I decided to add another web application I could pull up on my phone that queries the realtime feed for the A/C/E line and parses out the specific alerts and tracking information for the C line so I can time when I leave my apartment to match when the northbound C train will arrive in my local station.
With a project idea in mind, there are three main steps to developing a web app on the Ultra96 FPGA development board:
- Create the custom project (usually a Python script).
- Create the web app Python back end that integrates the custom project into the Ultra96's webserver.
- Create the web app HTML front end as a user interface web page for the project.
To access these realtime feeds for the subway lines, the MTA requires an API key generated from their developer portal. This is an effort to keep the amount of traffic any of the feeds sees at any point in time to a reasonable level. To comply with the MTA's usage guidelines, it's important to note that this project runs on the Ultra96's webserver and not on the MTA's server, it is in no way licensed by the MTA, it's not guaranteed to be accurate or timely, and most importantly, this project is just for fun. If you plan on recreating this project or expanding on it using the MTA's live feeds, be sure you understand the MTA's Usage Rules & Guidelines.
For a little background before describing exactly how I set up my web app on the Ultra96 to query these realtime feeds: GTFS Realtime is based on Protocol Buffers (proto2 syntax). Protocol Buffers were originally developed by Google as an optimized method for storing and interchanging structured information. An interface description language (IDL) describes the structure of a given data set, a compiler generates source code from that description, and the generated code is then used to write or parse the stream of bytes that represents the structured data. Overall, protocol buffers are a more straightforward solution for serializing and retrieving structured data than serializing the data to XML or Python pickling.
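To give a feel for what "a stream of bytes" means here, below is a minimal sketch of the base-128 varint encoding that the Protocol Buffers wire format uses for integer fields. The function names are my own, chosen for illustration; in this project the generated gtfs_realtime_pb2 classes do all of this behind the scenes, so nothing here needs to be written by hand.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # lowest 7 bits of what is left
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# A field is written as a key followed by its value.
# Key for field number 1 with wire type 0 (varint) is (1 << 3) | 0.
message = encode_varint((1 << 3) | 0) + encode_varint(150)
print(message.hex())  # 089601
```

Three bytes carry both the field identity and the value, which is a large part of why a binary feed like this stays small compared to an equivalent XML document.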
GTFS Realtime feeds contain three main Feed Entities (aka data types): trip updates, service alerts, and vehicle positions. These feed entities can be combined in any desired manner to create custom feeds. The MTA has created their own feeds in this way to add their own custom extensions to GTFS Realtime. Feeds are served via HTTP, updated in 30 second intervals as I mentioned before, and since the feed's output file is a regular binary file, any type of webserver can host and serve the file. This means that the Flask-based webserver of the Ultra96 can send a valid HTTP GET request to return the feed data for processing on the Ultra96.
The New York City Transit subway feeds are built from 12 message types covering the various data sets associated with the movement and status of the subway trains: FeedMessage, FeedHeader, NyctFeedHeader, TripReplacementPeriod, TripUpdate, TripDescriptor, NyctTripDescriptor, StopTimeUpdate, NyctStopTimeUpdate, StopTimeEvent, VehiclePosition, and Alert. For this project I chose to work with FeedMessage, since it is the top-level message holding the full dataset, including the data referenced by the NYCT (NYC Transit) extensions (NyctFeedHeader, NyctTripDescriptor, and NyctStopTimeUpdate). For a description of each of the 12 message types, see the attached document from the MTA's developer portal titled 'GTFS-realtime Reference for the New York City Subway'.
The custom project on the Ultra96 will be a Python script querying the MTA realtime feed for the A/C/E subway lines and returning the arrival times for the Manhattan-bound (northbound) C trains at a station specified by the user at runtime. The corresponding custom web page will take this array of arrival times and list them for the user so the user can see not only when the next train is currently set to arrive, but all of the subsequent arrival times of the trains currently running on the C line that have not yet stopped at the specified station.
When the realtime feed for the A/C/E line is queried, it returns data for all three lines, and it's up to the Python script on the Ultra96 to parse through it and pass only the desired data on to the web page. For now, I've hardcoded it to look only for trip update entities for the Manhattan-bound C trains, since I live far enough into Brooklyn that I rarely take a Brooklyn-bound (southbound) C train from my local station.
Overall, there are two things needed from the MTA's developer portal prior to starting development on the Ultra96: the link for the realtime feed for the A/C/E lines, and an API access key to be able to query the feed. After creating an MTA developer account here and logging in, I generated my own API access key under the 'Access Key' menu option at the top of the screen, where I was asked to fill out some basic contact information in exchange for the key (in the event the terms of use are violated, the MTA will deactivate the key). Then, under the Feeds menu option and the Subway Realtime Feeds submenu, I copied the link for the A/C/E lines.
To get started on the Ultra96 itself, the GTFS Realtime bindings module for Python needs to be installed using the Python package manager, pip:
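The filtering idea can be sketched without touching the live feed. In the sketch below, Trip and Stop are plain stand-in classes I made up to mimic the shape of the feed's trip update entities, and the trip IDs and timestamps are invented sample values, not real MTA data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stop:
    """Stand-in for one stop time update of a trip."""
    stop_id: str
    arrival_time: int  # POSIX timestamp in seconds


@dataclass
class Trip:
    """Stand-in for one trip update entity in the feed."""
    route_id: str
    trip_id: str
    stops: List[Stop] = field(default_factory=list)


def northbound_c_arrivals(trips, station_id):
    """Return arrival timestamps of northbound C trains at station_id."""
    arrivals = []
    for trip in trips:
        # Skip the A and E trains, and any C train without the "N" tag
        if trip.route_id != "C" or "N" not in trip.trip_id:
            continue
        for stop in trip.stops:
            if stop.stop_id == station_id:
                arrivals.append(stop.arrival_time)
    return sorted(arrivals)


trips = [
    Trip("C", "trip-1-N", [Stop("A45N", 1000), Stop("A44N", 1120)]),
    Trip("A", "trip-2-N", [Stop("A44N", 1060)]),  # wrong line
    Trip("C", "trip-3-S", [Stop("A44S", 1090)]),  # wrong direction
    Trip("C", "trip-4-N", [Stop("A44N", 1500)]),
]
print(northbound_c_arrivals(trips, "A44N"))  # [1120, 1500]
```

Only the two northbound C trips that still have A44N ahead of them survive the filter, which is the behavior the hardcoded route and direction checks are meant to produce.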
pip3 install --upgrade gtfs-realtime-bindings
While the Python modules for the HTTP client and sending HTTP/1.1 requests are already installed on the Ultra96 for the webserver, I found they needed to be upgraded/updated before I could get it to work with GTFS Realtime:
pip3 install --upgrade urllib3
pip3 install --upgrade requests
On the Custom Content web page of the Ultra96, I created a new custom project, which opened the text editor on the Ultra96 to write the Python script. The core function, c_train(), called by the main function, sends a simple GET request to the MTA realtime feed using my unique API key. Once the feed is returned, the script goes through each entity looking for a route_id of "C". Each of these entities is further filtered by checking the trip_id for the "N" tag, which indicates a northbound train. Each remaining entity contains a list of StopTimeUpdate sub-entities, one for every station the train has yet to stop at, each with the current estimated arrival time of that train at that station. Each station has a stop_id tag; I found the entire list on this site and wrote the sub-function stationId_to_stationName() to translate each stop_id into the corresponding station name.
Personally, I find that instead of listing the specific times that each train is currently set to arrive, it's a bit clearer to see a list of countdown times. So instead of displaying that the next train will arrive at 2:30pm, for example, the page displays that the next train will arrive in 25 minutes. This is how the countdown clock displays currently work in the stations themselves, so maybe that's why I'm biased towards this output format. The script calculates the countdown from the arrival timestamp: it uses the datetime Python module to get the current time, then subtracts the current time from the estimated arrival time, and the resulting time delta is the countdown. An array of these countdown arrival times is returned to the main function.
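As a small worked example of that subtraction, here is a sketch with a fixed "current time" so the result is reproducible. The helper name minutes_until() and the sample date are my own, picked only for illustration.

```python
import datetime


def minutes_until(arrival_epoch, now=None):
    """Minutes from now (UTC) until a POSIX arrival timestamp."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    arrival = datetime.datetime.fromtimestamp(arrival_epoch, datetime.timezone.utc)
    return (arrival - now).total_seconds() / 60


# Pretend it is 18:05 UTC and the feed says the train arrives at 18:30 UTC
utc = datetime.timezone.utc
now = datetime.datetime(2020, 6, 1, 18, 5, tzinfo=utc)
arrival_epoch = int(datetime.datetime(2020, 6, 1, 18, 30, tzinfo=utc).timestamp())
print(minutes_until(arrival_epoch, now))  # 25.0
```

Keeping both values timezone-aware in UTC avoids any dependence on the local timezone setting of the board.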
Subway tracking project Python code:
import sys
import datetime
import urllib.request

from google.transit import gtfs_realtime_pb2

# Realtime feed for the A/C/E lines
FEED_URL = 'https://api-endpoint.mta.info/Dataservice/mtagtfsfeeds/nyct%2Fgtfs-ace'

# Station used when no station ID is passed in (Clinton-Washington Avs)
DEFAULT_STATION_ID = "A44N"

# Northbound stop_id -> station name for the C line
C_LINE_STATIONS = {
    "A09N": "168 St",
    "A10N": "163 St - Amsterdam Av",
    "A11N": "155 St",
    "A12N": "145 St",
    "A14N": "135 St",
    "A15N": "125 St",
    "A16N": "116 St",
    "A17N": "Cathedral Pkwy - 110 St",
    "A18N": "103 St",
    "A19N": "96 St",
    "A20N": "86 St",
    "A21N": "81 St - Museum of Natural History",
    "A22N": "72 St",
    "A24N": "59 St - Columbus Circle",
    "A25N": "50 St",
    "A27N": "42 St - Port Authority Bus Terminal",
    "A28N": "34 St - Penn Station",
    "A30N": "23 St",
    "A31N": "14 St",
    "A32N": "W 4 St - Wash Sq",
    "A33N": "Spring St",
    "A34N": "Canal St",
    "A36N": "Chambers St",
    "A38N": "Fulton St",
    "A40N": "High St",
    "A41N": "Jay St - MetroTech",
    "A42N": "Hoyt - Schermerhorn Sts",
    "A43N": "Lafayette Av",
    "A44N": "Clinton-Washington Avs",
    "A45N": "Franklin Av",
    "A46N": "Nostrand Av",
    "A47N": "Kingston - Throop Avs",
    "A48N": "Utica Av",
    "A49N": "Ralph Av",
    "A50N": "Rockaway Av",
    "A51N": "Broadway Jct",
    "A52N": "Liberty Av",
    "A53N": "Van Siclen Av",
    "A54N": "Shepherd Av",
    "A55N": "Euclid Av",
}


def stationId_to_stationName(stationId):
    # Translate a northbound C line stop_id into its station name
    return C_LINE_STATIONS.get(stationId, "Invalid C line station ID")


def c_train(StationIdRequested):
    # This just sets a default station in case the passed argument is empty
    if not StationIdRequested:
        StationIdRequested = DEFAULT_STATION_ID

    headers = {
        "x-api-key": 'YOUR GENERATED API KEY FROM THE MTA HERE'
    }

    # Fetch the binary feed and parse it into a FeedMessage
    feed = gtfs_realtime_pb2.FeedMessage()
    request = urllib.request.Request(FEED_URL, headers=headers)
    with urllib.request.urlopen(request) as response:
        feed.ParseFromString(response.read())

    # Current time as a POSIX timestamp, comparable to the feed's arrival times
    now = datetime.datetime.now(datetime.timezone.utc).timestamp()

    arrival_times = []
    for entity in feed.entity:
        trip = entity.trip_update.trip
        # Keep only northbound (Manhattan-bound) C trains
        if trip.route_id != "C" or "N" not in trip.trip_id:
            continue
        for stop in entity.trip_update.stop_time_update:
            if stop.stop_id != StationIdRequested:
                continue
            # Train has yet to stop at the requested station
            if not stop.HasField("arrival"):
                continue  # no arrival estimate published for this stop
            minutes = (stop.arrival.time - now) / 60
            # print statement for debugging purposes
            # print('Manhattan bound C train:', trip.trip_id, 'arriving at', stationId_to_stationName(stop.stop_id), 'in', minutes, 'minutes')
            arrival_times.append(minutes)
    return arrival_times


def main(arg):
    arrivalTimes = c_train(arg)
    print(arrivalTimes)


if __name__ == "__main__":
    # Fall back to an empty station ID (the default station) if no argument is given
    main(sys.argv[1] if len(sys.argv) > 1 else "")
Returning to the Custom Content main web page, I selected the Edit Webapp option and then created a new backend and frontend for the custom web page that runs the project script. It is important to note that the file names of the frontend and backend files must be the same so the template on the Ultra96 can generate the new webserver Python script properly.
The Python backend is responsible for handling requests from the frontend and passing the requested station ID as an argument to the main function of the project script. The project script pulls and parses the C line countdown arrival times and returns them to the backend, which then passes the data to the frontend to display to the user.
Python Backend code:
@app.route("/c_train_tracking.html", methods=["GET", "POST"])
def c_train_tracking():
    if request.method == "POST":
        station = request.form.get("stations", None)
        if station is not None:
            # Pass the station ID as its own argument (no shell) so form input cannot inject commands
            proc = subprocess.Popen(['python3', '/usr/share/ultra96-startup-pages/webapp/templates/CustomContent/custom/c_train_tracking.py', station], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
            output, err = proc.communicate()
            # Station names for the stations offered in the drop down menu
            station_names = {
                "A41N": "Jay St - MetroTech",
                "A42N": "Hoyt - Schermerhorn Sts",
                "A43N": "Lafayette Av",
                "A44N": "Clinton - Washington Avs",
            }
            # Fall back to the raw station ID if it is not in the list above
            name = station_names.get(station, station)
            return render_template("CustomContent/custom_front_end/c_train_tracking.html", output=output, station=station, name=name)
    return render_template("CustomContent/custom_front_end/c_train_tracking.html")
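Since the project script hands its result back as printed text, the hand-off between the two processes can be sketched on its own. This is a minimal sketch under my own assumptions: run_and_parse() is a hypothetical helper, and the child process here is a one-line stand-in for the project script, not the real c_train_tracking.py.

```python
import ast
import subprocess
import sys


def run_and_parse(args):
    """Run a command without a shell and parse the Python list it prints."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise if the child exits with an error
    )
    # literal_eval only accepts Python literals, so it cannot execute code
    return ast.literal_eval(proc.stdout.decode().strip())


# Stand-in for the project script: a child process that prints a list
fake_script = "print([3.4, 12.25, 27.0])"
times = run_and_parse([sys.executable, "-c", fake_script])
print([round(t) for t in times])  # [3, 12, 27]
```

Passing the command as a list keeps each argument separate, and parsing the output into real numbers on the backend would also make it easy to round the countdown values before they reach the page.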
The HTML front end serves as the user interface: a drop-down menu lets the user select a station, and the page then lists the countdown to arrival, in minutes, for every upcoming train at that station (and yes, I do plan to code the rest of the C train stations here in the future as I did in the project Python script).
HTML Frontend code:
{% extends "Default/default.html" %}
{% block content %}
<div class="page-header">
    <h1 class="display-4"><b>{% block title %}C Train Tracker{% endblock %}</b></h1>
</div>
<!-- Start adding your code below here -->
<h1>Station Wait Times for Manhattan-bound C Trains</h1>
<meta http-equiv="explore" content="B" />
<p>Select station from drop down:</p>
<form id="form1" action="/c_train_tracking.html" method="POST" enctype="multipart/form-data">
    <select id="stations_dropdown" name="stations">
        <option disabled="disabled" selected="selected" value="A44N">Select Station</option>
        <option value="A41N">Jay St - MetroTech</option> <!-- A41N -->
        <option value="A42N">Hoyt - Schermerhorn Sts</option> <!-- A42N -->
        <option value="A43N">Lafayette Av</option> <!-- A43N -->
        <option value="A44N">Clinton - Washington Avs</option> <!-- A44N -->
    </select>
    <input type="submit" value="Submit">
</form>
<br><br>
<h2>Arrival Times for {{ name }}</h2>
<p id="times"></p>
<script>
    var myObj, i, x = "";
    myObj = {
        "arrivalTimes":{{ output }}
    };
    for (i in myObj.arrivalTimes) {
        x += myObj.arrivalTimes[i] + " minutes<br>";
    }
    document.getElementById("times").innerHTML = x;
</script>
<!-- Stop adding your code here -->
{% endblock %}
After completing the frontend and backend and returning to the Reload Webapp page, I checked the option to include my new page and then clicked the button to reload the webapp.
Once the Ultra96 had rebooted and connected itself back to my Wi-Fi, this was the result of my first query for one of the stations in the drop down menu:
SUCCESS.
It's really awesome when a project turns out to be just as practical as it was fun. I am excited to send a link to guests while they are connected to my Wi-Fi (whenever life is normal enough that friends can visit again) so that, when they are ready to head home, they can time their walk from my apartment to the local station. That will be especially handy later in the evening, when the trains change to their more sporadic overnight schedule.
This will also give me some extra structure to my morning routine post-quarantine as I'll have a way to gauge exactly how slow or fast I need to be to catch a train at my station.
The only glitch I found was that since the MTA updates the feeds around every thirty seconds, it can take up to about that same amount of time for the underlying project Python script to run to query the feed. So I noticed that every so often, it would take up to 30 seconds for the arrival times to refresh after hitting the submit button for a selected station.
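One possible way to soften this, sketched here under assumptions and not tested on the Ultra96, would be to remember the last result for up to 30 seconds so that repeat queries inside one feed interval return immediately. TimedCache is a hypothetical helper of my own, and it would have to live in the long-running webserver process rather than in the project script, since the script starts fresh on every request.

```python
import time


class TimedCache:
    """Remember one fetched value for ttl_seconds before fetching again."""

    def __init__(self, fetch, ttl_seconds=30, clock=time.monotonic):
        self._fetch = fetch          # function that does the slow work
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can fake time
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Demo with a fake fetcher and a fake clock instead of the real feed
calls = []
fake_time = [0.0]


def fake_fetch():
    calls.append(fake_time[0])
    return [4.0, 16.5]


cache = TimedCache(fake_fetch, ttl_seconds=30, clock=lambda: fake_time[0])
cache.get()          # first call fetches
fake_time[0] = 10.0
cache.get()          # within 30 seconds: served from the cache
fake_time[0] = 31.0
cache.get()          # stale: fetches again
print(len(calls))    # 2
```

This also fits the spirit of the MTA's request to keep feed traffic reasonable, since the feed itself only changes about every 30 seconds.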
There is plenty of room to expand on this project. Eventually I'll make the hardcoded parameters configurable as well such as the direction (northbound vs southbound) and which train is queried (especially since the express A train takes over the local C train stops late at night). Until next time....