Background
Have you ever waited too long for a bus, only to find it has no free seats when it finally arrives?!
Our solution, Get Me There, combines IoT, data analytics, machine learning, and a mobile app backed by cloud computing to give bus passengers accurate estimates of arrival times and the number of seats currently available on each bus.
The Problem, The Need
For many people all over the world, the bus is one of the most important means of city transportation. Everybody needs to reach their destination on time with a busy schedule waiting for them. A passenger needs to know how long they will have to wait for a bus, whether the bus they are waiting for will have a free seat, and ultimately whether to wait or look for an alternative way to arrive on time.
On the other hand, bus service companies need to collect all service activity and customer requests to build a robust, rich analytics platform that supports faster and better-informed decisions.
The Solution: Get Me There - BITS Solution
The Solution consists of the following components:
- The IoT device, the data-collection endpoint: a bus equipped with a Raspberry Pi 3 board and sensors to count the passengers getting in and out of the bus.
- Microsoft Azure IoT Hub: communicates with the IoT device (Raspberry Pi 3) and receives its telemetry.
- Azure Stream Analytics: packs the received data into tables.
- Azure Storage: the tables are stored here for use by the following modules.
- Azure API App (web service): provides the mobile app with information such as estimated arrival time, estimated travel time (using the Google Maps APIs), and the number of free seats.
- Mobile app: the passenger's interface to the solution.
- Hadoop cluster / Hive as the data analytics platform.
- Microsoft Power BI for desktop reports.
In this video we demonstrate a complete transaction. The first trigger fires when a bus opens its door at stop "1111"; then six passengers get on the bus and one passenger gets off. All of this is simulated with push buttons on the breadboard. In a real deployment, two beam sensors installed at the bus doors would detect whether a passenger passing through them is getting in or out. Finally, the door closes, which triggers a procedure that sends the collected data to the IoT hub, and the activity is recorded in cloud storage.
When a request for a bus from one stop to another is submitted on the mobile app, the API app finds a bus that passes by both stops, checks its availability, and calculates the number of available seats from the data collected at all previous stops in the same journey. It then uses the Google APIs to calculate the forecast duration for the bus to arrive and the estimated duration to reach the requested destination, and sends the results back to the mobile app.
We used five tools in the demo:
- API App portal (Web Service) to show number of requests received
- IoT hub portal, to show number of messages received
- Visual Studio to show where the message sent to the IoT hub
- Android Studio emulator, to run the mobile app
- MS Azure Storage Explorer, to monitor data records
Now let's explore each of our solution components in detail: why we used it, and how to replicate it.
The IoT Device
The target version of the IoT device is a Raspberry Pi 3 board installed in each bus. The Raspberry Pi uses beam sensors to count passengers getting in and out at each bus stop, a GPS sensor to get the bus's exact location (not used in this prototype due to time constraints, though the solution is ready to integrate GPS sensors), and a door open/close sensor.
For the simulation we used the following circuit to generate data about passengers getting in and out of the bus and about the door's open/close status. Every press on "passenger in" simulates a passenger boarding, every press on "passenger out" simulates a passenger leaving, and every press on "bus door open/close" simulates the bus reaching or leaving a stop.
Each Raspberry Pi 3 runs Windows 10 IoT Core. The steps to install the OS on the Raspberry Pi are:
- On a PC/laptop, run the "Windows IoT Core Dashboard" software (downloadable here).
- Insert the Raspberry Pi 3 SD card into the PC's card reader, then choose the "Set up a new device" tab.
- Fill in the form as in the screen image below.
- After installing Windows 10 IoT Core on the SD card, insert the SD card into the Raspberry Pi's card reader.
- Power on the Raspberry Pi and connect it to the network with an Ethernet cable.
- On your PC/laptop, check the Windows IoT Core Dashboard; the Raspberry Pi should appear on the "My Devices" tab as follows.
In this part we configure the different Azure components used throughout the project. The first step is to create an Azure account; Microsoft offers a free one-month subscription. Thanks to Microsoft :)
Microsoft verifies your email and credit card. Please note that you can't create another free account using the same email and credit card.
Configuration of all components is straightforward, and you can find detailed, clear steps on MSDN.
Notes:
- While creating Azure resources, make sure to choose the same "Location" for all of them.
Resource Group
Follow the screens below to create a new resource group.
Microsoft IOT Hub
Follow the screens below to create a new IoT hub. For more information, check the link below:
https://github.com/Azure/azure-iot-sdks/blob/master/doc/setup_iothub.md
Microsoft Storage account
Follow the screens below to create a new storage account.
Stream Analytics job
Follow the screens below to create a new Stream Analytics job. This job reads telemetry from the IoT hub and packs the data into the "BusJourneyInfo" table.
Bus Streaming Analytics query
SELECT
*
INTO
BusInfo
FROM
BusInfoStream
Don't forget to start the job, to ensure that received telemetry is saved into the table.
There is no need to create the table yourself: the analytics job reads the telemetry from the IoT hub and, if the output table doesn't exist, creates it for you. The fields will be the same as those sent to the IoT hub. The table is created in the storage account and can be accessed using "Microsoft Azure Storage Explorer".
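To make the pipeline concrete, here is a sketch of what one telemetry record might look like when serialized. The field names and values are illustrative assumptions; they must match whatever the device code actually serializes with JsonConvert.

```python
import json

# Hypothetical telemetry record for one door-close event.
# All field names here are assumptions for illustration.
transaction = {
    "LineID": 10,        # bus route ID
    "BusID": 610,        # bus serving the route
    "JourneyID": 401,    # journey within the day
    "StationID": 1111,   # stop where the door opened
    "PassIn": 6,         # passengers who boarded at this stop
    "PassOut": 1,        # passengers who left at this stop
}

# This JSON string is what Stream Analytics would unpack into table fields.
message = json.dumps(transaction)
print(message)
```

Each such message becomes one row in the output table, with one column per JSON field.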
Create Devices
Follow the steps in the link below to define the Raspberry Pi devices. Each bus should have one Raspberry Pi, and each Raspberry Pi should be defined using "Device Explorer".
Note: you can get the "IoT hub connection string" from the IoT hub's shared access policies, as in the screen below.
- Open VS 2015. Select the File menu -> New Project -> Visual C# -> Windows IoT Core -> Background Application (IoT).
- Right-click the project name in Solution Explorer -> "Manage NuGet Packages".
Download both the "Microsoft.Azure.Devices" and "Microsoft.Azure.Devices.Client" packages.
- Right-click References in Solution Explorer -> Add Reference. Then select Universal Windows -> Extensions -> "Windows IoT Extensions for the UWP".
- The code can be found at the link below:
https://github.com/mrahman4/BITSCode
- Compile the code and deploy it to the Raspberry Pi. From the Debug menu -> BITSDevice Properties -> Debug tab, set the target device to "Remote machine", write the name of the Raspberry Pi in the Remote machine box, click Find, and then choose Select in the remote connection dialog.
Note: at this point the Raspberry Pi should be connected to the internet over WiFi and, at the same time, connected to the breadboard and sensors.
In this part of the project, we simulate the behavior of a bus during a day. We assume that this bus serves a route consisting of 10 bus stops.
static int MAX_NOF_STOPS_1 = 9; // (Number of stops - 1) in this route
static int[] mStations_Array = { 1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999, 1234};
int m_iCurrentStationID = -1; //Current station index in the route
The bus moves forward from the first stop to the last one, then returns backward to the first stop again, completing one journey.
static int DIRECTION_FORWARD = 1 ;
static int DIRECTION_BACKWORD = -1;
int m_iDirection = DIRECTION_FORWARD;
The bus makes 5 journeys during the day. Each journey can have a different driver.
static int MAX_NOF_JOURNIES = 5;
static int[,] m_JourniesArray = new int[5, 2] {{ 401, 1 }, { 402, 1}, { 403, 2}, { 404, 2}, { 405, 2}} ;
int m_iJourneyIndex = -1;
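The traversal logic described above can be sketched in a few lines (shown here in Python rather than the project's C#, purely for illustration): the bus visits the stops forward, then backward without repeating the last stop, ending the journey back at the first stop.

```python
# Stop IDs from the simulation above.
STATIONS = [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999, 1234]

def journey_stops(stations):
    """Yield the stop IDs visited in one full journey: forward to the
    last stop, then backward to the first stop again."""
    yield from stations                 # forward leg
    yield from reversed(stations[:-1])  # backward leg (last stop not repeated)

stops = list(journey_stops(STATIONS))
print(len(stops))  # 10 forward + 9 backward = 19 stop visits per journey
```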
Some information is fixed for a single bus but varies from bus to bus, such as the Line ID, which represents the bus route ID, and the Bus ID, since many buses can serve one route.
static int LINE_ID = 10;
static int BUS_ID = 610;
In the prototype, when the bus arrives at a bus stop, the door push button is pressed to indicate that the door has opened.
To indicate that one passenger boards the bus, the Passenger Up push button is clicked.
To indicate that one passenger leaves the bus, the Passenger Down push button is clicked.
When the bus is ready to move away from the stop, the door push button is clicked again to indicate that the door has closed.
private GpioPin passupPin; // One passenger come inside the bus
private GpioPin passdownPin; // One passenger left the bus
private GpioPin doorclosedPin; // bus door closed or opened
Three LEDs are used to indicate the status of each push button.
Two variables count the number of passengers getting on and off the bus. These two counters are reset at each bus stop (on door open).
int m_iNumInSensor = 0 ; //Number of passengers boarding the bus
int m_iNumOutSensor = 0 ; //Number of passengers leaving the bus
When the door closes, all the needed data is prepared, packed, and sent to the Azure IoT hub.
var message = new Microsoft.Azure.Devices.Client.Message(Encoding.ASCII.GetBytes(JsonConvert.SerializeObject(transaction)));
try { await deviceClient.SendEventAsync(message); }
catch (Exception e) { string str = e.Message; /* send failures are swallowed here; consider logging or retrying */ }
From "Microsoft Azure Storage Explorer", you can check that a new record has been inserted into the table.
You can also verify that the device has sent something using "Device Explorer".
And you can check that the Azure IoT hub is receiving the data stream.
Notes:
- Ensure that the "Bus Streaming Analytics" job is running, as it is responsible for reading data from the IoT hub and saving it into the table in storage.
- In Arduino you have two functions, setup and loop; in C# both roles are handled inside Run. The InitGPIO() function plays the role of Arduino's setup: inside it you define events such as the door button being pushed. Then a thread runs an infinite loop to keep the application alive, waiting for events to fire.
public void Run(IBackgroundTaskInstance taskInstance)
{
deferral = taskInstance.GetDeferral();
InitGPIO();
Task.Run(() =>
{
while (true)
{
//That's right...do nothing; this just keeps the background task alive.
}
});
}
Inside the InitGPIO() function, the pins are defined and the event handlers registered:
doorclosedPin = gpio.OpenPin(DOORCLOSED_PIN);
if (doorclosedPin.IsDriveModeSupported(GpioPinDriveMode.InputPullUp))
doorclosedPin.SetDriveMode(GpioPinDriveMode.InputPullUp);
else
doorclosedPin.SetDriveMode(GpioPinDriveMode.Input);
doorclosedPin.DebounceTimeout = TimeSpan.FromMilliseconds(50);
doorclosedPin.ValueChanged += DoorPin_ValueChanged;
- The device object used to send telemetry to the IoT hub is initialized when the bus arrives at each bus stop:
m_deviceClient = DeviceClient.CreateFromConnectionString
(connectionString, "BusDevice2", Microsoft.Azure.Devices.Client.TransportType.Http1);
connectionString is the IoT hub connection string and "BusDevice2" is the device ID.
We needed to create two entities: one to keep bus line data (Line, Stop, Stop Order), and one to keep stop data (Stop, Lat, Long).
The entities can be created on an Azure SQL Database, or on the same storage account used by the IoT hub, which is a NoSQL store. I recommend the SQL Database for those familiar with SQL commands.
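As a sketch of the two entities, here is a minimal relational version using SQLite as a stand-in for Azure SQL Database; the table and column names and the sample stop are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Azure SQL Database
conn.executescript("""
-- Line data: which stops a line serves, and in what order (hypothetical names)
CREATE TABLE LineStop (
    LineID    INTEGER,
    StopID    INTEGER,
    StopOrder INTEGER,
    PRIMARY KEY (LineID, StopOrder)
);
-- Stop data: each stop's coordinates
CREATE TABLE Stop (
    StopID INTEGER PRIMARY KEY,
    Lat    REAL,
    Long   REAL
);
""")
conn.execute("INSERT INTO Stop VALUES (1111, 53.349, -6.260)")  # sample stop
conn.execute("INSERT INTO LineStop VALUES (10, 1111, 1)")       # line 10, first stop

# Lines serving a given stop (the API's first lookup):
lines = [r[0] for r in conn.execute(
    "SELECT LineID FROM LineStop WHERE StopID = ?", (1111,))]
print(lines)
```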
The purpose of the API is to provide the data needed to fulfill the passenger's request in the mobile app. Running this logic server-side, rather than deploying it all on the mobile app, takes advantage of greater computing power and speed and delivers reasonable performance and a better customer experience.
To fulfill a request for a bus from an "origin stop" to a "destination stop", the API does the following:
- Search the lines/stops database for bus lines that pass by the requested origin stop; more than one line may serve the same stop.
- Filter the lines found down to those that also serve the destination stop.
- For each of these "potential" lines, check against the real data coming from the bus sensors through the IoT hub, to find which line is currently in service and whether a bus has not yet passed the requested origin stop.
- If a bus is found, go through the whole journey to tally the seats taken and freed, calculate the final number of occupied seats, and from that conclude the number of available seats. (We assume a capacity of 60 seats, which represents not only actual seating but also available standing spots; this could be modified to separate the two kinds of capacity and feed those details back to the mobile app.)
- To estimate how long the bus takes to travel from its current location to the origin stop, we had some options:
1- We first need the bus's current location (easily obtained with a GPS sensor on the Raspberry Pi sending timely records of latitude and longitude). For our prototype, however, we use the "last stop the bus departed from" as a reference. The small challenge with this option is that the bus could be near or far from that stop, so we handled it this way:
If the bus left stop A at 6:00 pm heading towards stop B, we know it should take about T = 10 minutes to get there, and the request for that bus was made at about 6:04 pm, i.e. t = 4 minutes after the latest departure, then we use T - t = 10 - 4 = 6 minutes as the estimated time to reach stop B instead of 10 minutes, which is closer to reality.
Of course, regularly transmitted bus locations, say every minute or every 30 seconds, would give our calculations the best accuracy.
2- The second trick concerns calculating the duration from one stop to another using the Google API: it returns results based on a shortest-route algorithm, which will mostly not match our bus route, producing highly inaccurate duration estimates.
To resolve this, we divided the calculation along the route's designated paths from one stop to the next: we calculate the duration between each stop and the next, and accumulate the results up to the desired stop, assuming that most of the time the path between two adjacent stops is the shortest route or close to it. In addition, we can add as many points as needed to represent all key points on the bus route, whether stops or not, and thus maintain the highest accuracy.
- The next step aggregates the travel times between the last stop departed from and the origin stop, then subtracts the time between the last departure and the request time, as explained above, giving the estimated bus arrival time.
- The same is done for the duration from origin to destination, to aggregate the estimated trip travel time.
- In addition to arrival/travel times, the API calculates dwell times (bus waiting times at stops) for the targeted stops, adding these to the relevant arrival or travel time.
- Finally, all the information collected for each potential line is sent to the mobile app.
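The two core calculations above can be sketched compactly. The function names, sample numbers, and hard-coded segment durations are illustrative assumptions; in the real service the stop-to-stop durations come from the Google Maps API.

```python
CAPACITY = 60  # assumed total capacity (seated plus standing)

def available_seats(stop_events):
    """stop_events: (pass_in, pass_out) pairs for every stop the bus has
    already served in this journey; returns the remaining capacity."""
    onboard = sum(pass_in - pass_out for pass_in, pass_out in stop_events)
    return CAPACITY - onboard

def arrival_estimate(segment_minutes, minutes_since_departure):
    """Accumulate stop-to-stop durations from the last departed stop up to
    the origin stop, then subtract the time already travelled (T - t)."""
    return sum(segment_minutes) - minutes_since_departure

# 6 on / 1 off at the first stop, 4 on / 2 off at the second:
print(available_seats([(6, 1), (4, 2)]))  # 60 - 7 = 53 seats left
# Three segments to the origin stop, request made 4 minutes after departure:
print(arrival_estimate([10, 8, 5], 4))    # 23 - 4 = 19 minutes
```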
The following flowchart explains the API's core activities and their dependencies.
- Azure account: you can open an Azure account for free.
- You need a storage account or a SQL Database in your Azure workspace; you can reuse the storage account used by the IoT Hub in the previous sections rather than creating new storage.
- In Visual Studio, click Help -> About Microsoft Visual Studio and ensure that you have "Azure App Service Tools v2.9.1" or higher installed.
- Make sure you have .NETFramework 4.5.2 installed.
- Install the Swagger framework by running the following commands in the Package Manager console:
PM> Install-Package Swashbuckle -Pre
PM> Install-Package Swashbuckle.Swagger -Pre
To show your PM console, select it from the menu as explained.
1- Create a new project and select ASP.NET Web Application.
2- The wizard takes you through creating the App Service on your Azure account.
Now you are ready to code.
Swagger
Swagger is a simple yet powerful representation of your RESTful API. It allows you to discover and understand the capabilities of a service without access to the source code, which is very helpful, especially for testing.
3- First you need to enable Swagger in the file "SwaggerConfig.cs", which is created for you according to the project type. Just uncomment the code highlighted below.
})
.EnableSwaggerUi(c =>
{
4- When debugging your app, the browser will open something like the page below with error code 403. Don't panic; all you need to do is append "/swagger" to the end of the opened URL.
5- Now you are ready to test your app through Swagger.
Start by showing the model example, i.e. the parameters passed to the API, and explore the available methods (we only needed "GET" in our API). Testing is then done by passing values to the parameters, pressing "Try it out", and waiting for the output results, which are in JSON format by default.
Since you created your project with your Azure account, the remaining steps to publish your app are easy.
After publishing successfully on Azure, the web service homepage opens in your browser; here you can repeat the same Swagger testing steps by adding "/swagger" to the URL.
You are now done.
All that users of your service need to know is the URL of your API and the parameter names in their respective order.
Monitoring your API App
The Azure portal provides a live chart of the requests and errors resulting from your API App's usage.
Microsoft Azure Storage Explorer is a very helpful, lightweight tool for exploring, importing, and exporting data in your storage account. We also used it for testing and in our demonstration.
We needed to estimate dwell times (bus waiting times at stops). After reviewing several existing models, we selected a KNN model, inspired by the research published by Jianxia Xin and Shuyan Chen, "Bus Dwell Time Prediction Based on KNN". We chose it for its clarity and simplicity: the model clusters stops into groups based on periods of the day/week (for example, weekday peak hours) for a given bus line.
In our case we used about 71 records for bus line no. 10, with each stop's dwell time calculated as the time between the bus door opening and closing. (Using GPS would be more accurate, as we could use the times of entering and exiting the stop lane.)
After training the model (we chose 70% of the data for training) and scoring it, we published the model as an API used by the web service, which adds dwell times to the estimated-time calculations.
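To illustrate the idea (not the actual Azure ML experiment), here is a toy KNN dwell-time predictor in pure Python; the feature choice and sample records are invented for the example.

```python
def knn_predict(train, query, k=3):
    """train: list of (feature_vector, dwell_seconds) records.
    Returns the mean dwell time of the k nearest neighbours
    by Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    return sum(dwell for _, dwell in nearest) / len(nearest)

# Invented features: (passengers in, passengers out, hour of day)
samples = [((6, 1, 8), 35), ((2, 0, 14), 15), ((5, 3, 8), 30), ((1, 1, 22), 10)]
print(knn_predict(samples, (5, 2, 8), k=2))  # mean of the two nearest 8 am records
```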
Step by Step
1- Create the machine components. We used k = 10 as it gave better results; in the following figure, we used a sample file uploaded to the machine's dataset.
2- Running the machine produces a "predictive" model that can be published.
3- Publish the predictive model as an API service.
4- From the link labeled "Request/Response", you can browse the generated API documentation.
5- The following figure shows how to configure the machine to load the data from the BITS storage account table "busjourneyinfo".
- Create a new project in Android Studio; click Next and choose the default values.
- The code can be found at the link below:
https://github.com/mrahman4/BITSCode
- Compile the application and run it on the Android emulator.
The passenger has this application on their phone; they select the start stop and the end stop, then submit the request. The Android application communicates with the Azure web service and gets information on all buses that can take them from the start stop to the end stop. The web service returns the bus number, when the bus will arrive at the start stop, the duration needed to reach the end stop, and how many seats are available on the bus.
The application needs permission to access the internet, so you need to add the lines below in AndroidManifest.xml:
<permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.INTERNET" />
To support JSON and HTTP codes, add the lines below to build.gradle (Module: app).
Add the line below before buildTypes:
useLibrary 'org.apache.http.legacy'
Add the line below inside dependencies in the same file:
compile 'com.google.code.gson:gson:2.4'
Note:
- To connect to the Azure web service, the RestClient class from the link below has been used. Many thanks to its author for the help :)
http://lukencode.com/2010/04/27/calling-web-services-in-android-using-httpclient/
- The getArrivalTime() function communicates with the web service, so it should be called from a new thread, not from the main GUI thread. This new thread can't update the GUI, so you need to use runOnUiThread() to perform GUI updates. Note that by default Android Studio lets you debug only the main GUI thread; to debug both threads you need to change Android Studio's settings.
new Thread(new Runnable() {
public void run()
{
m_strAvailableBusesInfo = "";
getArrivalTime();
runOnUiThread(new Runnable()
{
@Override
public void run() {
arrivalTimeTxt.setText(m_strAvailableBusesInfo);
arrivalTimeTxt.refreshDrawableState();
}
});
}
}).start();
- The web service returns a JSON file. To parse it, all non-ASCII characters should be handled by decoding the response as UTF-8:
JSONObject jsonRootObject = new JSONObject(new String(strJson.getBytes("UTF-8")));
//Get the instance of JSONArray that contains JSONObjects
JSONArray jsonArray = jsonRootObject.optJSONArray("Buses");
int iNofJsonArray = jsonArray.length();
//Iterate the jsonArray and print the info of JSONObjects
for(int i = 0 ; i < iNofJsonArray ; i++ )
{
    JSONObject jsonObject = jsonArray.optJSONObject(i); // each entry describes one bus
    String strBusID = jsonObject.optString("BusID");
    String strArrivalTime = jsonObject.optString("ArrivalTime");
    String strTravelTime = jsonObject.optString("TravelTime");
    String strAvailableSeats = jsonObject.optString("AvailableSeats");
}
Demo
In this video we demonstrate the same end-to-end transaction described in the Solution section above: the door opens at stop "1111", six passengers get on and one gets off (simulated with the breadboard push buttons), and closing the door sends the collected data to the IoT hub and into cloud storage. A request on the mobile app is then answered by the API app with arrival time, travel time, and available seats. The demo uses the same five tools listed earlier: the API App portal, the IoT hub portal, Visual Studio, the Android Studio emulator, and Microsoft Azure Storage Explorer.
The following diagram describes how data flows between the different tools in the data analytics cycle.
We used an actual dataset of Dublin City buses, downloaded from the following link:
https://data.gov.ie/dataset/dublin-bus-gps-sample-data-from-dublin-city-council-insight-project (Dublin Bus GPS Data)
The dataset contains 30 days of actual Dublin bus GPS data across Dublin City, downloaded from Dublin City Council's traffic control, in CSV format.
Data transformation goes through the following three steps, which reduce the total number of records for the 7 days from 9.5M to 700K:
a) We select only 7 days from the dataset (2013-01-01 to 2013-01-07), as all other weeks show similar patterns.
b) For the purposes of the project, we identified the following fields in the dataset as relevant; all other fields are filtered out:
[0] prodtime: Not needed as-is (Unix time format); we will generate another column with a readable day time
[1] Line ID: Keep. Line ID for each travel
Direction: Not needed
Journey Pattern: Not needed
[2] Date: Keep
[3] Journey ID: Keep. Unique ID for each bus journey
Operator: Not needed
Congestion: Not needed
[4] Long: Keep. GPS longitude
[5] Lat: Keep. GPS latitude
Delay: Not needed
Block ID: Not needed
[6] Bus ID: Keep. Bus ID
[7] Station ID: Keep. Bus station ID
[8] At Stop: Keep. 1 = bus stopped, 0 = bus moving
· We are also interested only in the rows that have "At Stop = 1".
· We used an Impala job to do the following:
a) Filter out all unneeded columns, as described above
b) Select only the rows that have "At Stop = 1"
c) Order the output by the "Journey ID" field
- Impala Code:
I. Create 7 tables, one for each day, and load the data into them:
create external table businfo_2013_01_01 (prodTime bigint, lineID int, direction int, journeyPatternId string, journeydate string, journeyID int, operator string, congestion int, gpsLong float, gpLat float, delay int, blockID int, busID int, stationID int, atStop int ) row format delimited fields terminated by ',';
load data local inpath '/home/cloudera/DublinBuses010113-310113/DataSet/siri.20130101.csv' overwrite into table businfo_2013_01_01;
o We applied the same code to create the remaining 6 tables: businfo_2013_01_02, businfo_2013_01_03, businfo_2013_01_04, businfo_2013_01_05, businfo_2013_01_06, businfo_2013_01_07
II. Filter out all unneeded columns from the 7 tables, select rows with "At Stop = 1", and order by the "Journey ID" field:
with
cte_station
as ( select *, row_number() over (partition by journeydate, journeyid, stationid
order by prodtime asc ) as rn_station
from businfo_2013_01_01 )
select prodtime, lineid, journeydate, journeyID, gpslong, gpslat, busid, stationid, atstop from cte_station
where rn_station = 1 and atstop = 1
order by journeyid;
o We applied the same code to filter the remaining 6 tables: businfo_2013_01_02, businfo_2013_01_03, businfo_2013_01_04, businfo_2013_01_05, businfo_2013_01_06, businfo_2013_01_07
III. Save the 7 tables into CSV files.
a) Using Python code we added four columns to each output file:
a. Pass_IN: random number of passengers getting on the bus at each station
b. Pass_Out: random number of passengers getting off the bus at each station
c. Total_Pass: total number of passengers on board the bus
d. timestamp: the prodtime converted from Unix time to date-time
· Assumption: the bus's maximum number of passengers on board is 60.
· The Python code we used to perform the above tasks is uploaded to the site as busDataSetEdit.py.
· We ran the same code on the remaining 6 output files, so in the end we have 7 files reflecting all the transformations mentioned: businfo_pass_2013_01_01.csv, businfo_pass_2013_01_02.csv, businfo_pass_2013_01_03.csv, businfo_pass_2013_01_04.csv, businfo_pass_2013_01_05.csv, businfo_pass_2013_01_06.csv, businfo_pass_2013_01_07.csv
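The column-adding step can be sketched as follows. The actual script is busDataSetEdit.py in the repo; the helper below is a hedged reconstruction that assumes prodtime is a Unix timestamp in microseconds.

```python
import random
from datetime import datetime, timezone

CAPACITY = 60  # assumed maximum passengers on board

def add_passenger_columns(rows, seed=42):
    """Append Pass_IN, Pass_Out, Total_Pass and a readable timestamp to
    each row dict; rows need a 'prodtime' field (Unix microseconds)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    total, out = 0, []
    for row in rows:
        pass_out = rng.randint(0, total)  # passengers leave first
        pass_in = rng.randint(0, min(8, CAPACITY - (total - pass_out)))
        total += pass_in - pass_out
        out.append(dict(row,
                        Pass_IN=pass_in, Pass_Out=pass_out, Total_Pass=total,
                        timestamp=datetime.fromtimestamp(
                            int(row["prodtime"]) / 1_000_000,
                            tz=timezone.utc).isoformat()))
    return out

rows = add_passenger_columns([{"prodtime": "1357041600000000"}])
print(rows[0]["timestamp"])  # 2013-01-01T12:00:00+00:00
```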
3- Data Extraction
a. As our dataset is large, we decided to upload only one file, representing one day's data, to the Azure Table storage service.
i. We uploaded the file using the Azure Storage Explorer application, where we specified the "storage account name" and "storage account key" (taken from the Azure table properties).
ii. To read the data back from Azure Table storage, we used the following Python code:
#! /usr/bin/env python
# Author: Mohamed Moussa
from azure.storage.table import TableService, Entity
table_service = TableService(account_name='bitsstorage1', account_key='<removed for security>')
tasks = table_service.query_entities('BusJournyData')
for task in tasks:
print(task.PartitionKey)
print(task.RowKey)
print(task.prodTime)
print(task.lineID)
print(task.journeyDate)
print(task.journeyID)
print(task.gpsLong)
print(task.gpsLat)
print(task.busID)
print(task.stationID)
print(task.atStop)
print(task.passIn)
print(task.passOut)
print(task.passOnBoard)
print(task.Dtime)
o We then redirected the output to a CSV file. However, to apply the data analytics on a larger dataset for more accurate results, we decided to continue using the local files, as described in the steps below.
b. Data extraction is done through the following Spark job; the main objective is to extract the longest journeys based on the number of stations:
- Code is uploaded with name busjournyStation.py
o The output of the above job is 7 directories; each directory contains a file with 2 columns, where the first column is the count of stations and the second is the journey ID.
c. Another Spark job is used to generate the list of buses and the associated journey ID for each day, as follows:
- Code is uploaded with name buscount.py
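The two jobs can be summarized in plain Python (the project's actual code is PySpark, in busjournyStation.py and buscount.py; the functions below are a simplified, assumed reconstruction):

```python
from collections import defaultdict

def station_counts(records):
    """records: (journey_id, station_id) pairs.
    Returns {journey_id: number of distinct stations}, i.e. the data
    used to find the longest journeys."""
    stations = defaultdict(set)
    for journey_id, station_id in records:
        stations[journey_id].add(station_id)
    return {j: len(s) for j, s in stations.items()}

def buses_per_journey(records):
    """records: (journey_id, bus_id) pairs.
    Returns {journey_id: sorted list of bus IDs serving it}."""
    buses = defaultdict(set)
    for journey_id, bus_id in records:
        buses[journey_id].add(bus_id)
    return {j: sorted(b) for j, b in buses.items()}

print(station_counts([(1, 100), (1, 101), (2, 100)]))  # {1: 2, 2: 1}
```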
- Loading the Data
a. The output of the last step was 7 files containing the count of stations per journey. The objective of this step is to load the 7 files into 7 Hive tables; the queries are as follows:
create external table JournyStation_2013_01_01 (StationCount int, JourneyId int) row format delimited fields terminated by ',';
load data local inpath '/home/cloudera/DublinBuses010113-310113/DataSet/sparkJob/businfo_pass_2013_01_01.csv/part-00000' overwrite into table JournyStation_2013_01_01;
· We have done the same Hive queries for all remaining 6 files
create external table JournyBuses_2013_01_01 (BusID int, JourneyId int) row format delimited fields terminated by ',';
load data local inpath '/home/cloudera/DublinBuses010113-310113/DataSet/sparkJob/Buses/businfo_pass_2013_01_01.csv/part-00000' overwrite into table JournyBuses_2013_01_01;
· We have done the same Hive queries for all remaining 6 files
4- Reporting
a. The scope is to report on the following aspects:
i. Total number of Journeys per day
ii. Longest journeys (with the maximum number of stations)
iii. Total number of buses per day
b. Reporting is done using Microsoft Power BI plus the ODBC driver for Hive tables.
c. Steps to set up Power BI and ODBC to connect to the Hive tables:
i. Download the Cloudera ODBC driver from http://www.cloudera.com/documentation/other/connectors/hive-odbc/2-5-12.html
ii. From the Windows ODBC manager, select the Cloudera ODBC driver, type in the IP address and the username 'cloudera', then test the connectivity.
iii. From Power BI, select the ODBC option / Cloudera Hive ODBC.
d. After importing all the data files into Power BI (through the ODBC connection), the following changes need to be made:
i. For all 7 files related to journeys, the header columns need to be changed to "Stations Count" and "Journey ID".
ii. For all 7 files related to buses, the header columns need to be changed to "Bus ID" and "Journey ID".
e. The screenshot below shows the report inside Microsoft Power BI, which presents the following:
i. Total number of Journeys per day
ii. Longest journeys (with the maximum number of stations)
iii. Total number of buses per day
· Further enhancements to the report will include: total number of passengers per day, and the top journeys by passenger count.
Opportunities and Future Scaling
- Using GPS for accurate bus location and improved route performance monitoring
- Improve the Machine Learning model to predict dwell times with less error percentage
- Developing the mobile app for Windows Phone and iOS platforms