Alibek Jakupov

Bus passenger flow prediction using supervised and unsupervised learning

Updated: Nov 19, 2021


The goal is to predict bus crowding from data provided by a transit smart card automated fare collection system.

Input data

As no information was provided on the input data features, we decided to rely on approaches from earlier research and to evaluate their potential for application to large areas such as Île-de-France.

In most studies, the attributes provided by smart cards are the card ID, bus route ID, bus vehicle code, boarding time, and the latitude and longitude of the boarding station.


Neither alighting time nor alighting station coordinates are provided.

Possible Approaches

State-of-the-art solutions are based on three groups of features: temporal features, including the day of the week, the hour of the day, and holidays; scenario features, including direction (inbound or outbound) and type (tickets or cards); and passenger flow features, including the previous average passenger flow and the real-time passenger flow [1].
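To make these feature groups concrete, here is a minimal sketch of how they could be derived from boarding records. It is not the implementation from [1]: the `Boarding` record, the `HOLIDAYS` set, and all function names are hypothetical, and the "previous average flow" is simply taken as the mean hourly count over earlier days that had at least one boarding.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical holiday calendar; a real one would come from an official source.
HOLIDAYS = {date(2021, 11, 11)}


@dataclass
class Boarding:
    """Hypothetical boarding record; field names are assumptions."""
    card_id: str
    route_id: str
    boarded_at: datetime
    inbound: bool      # scenario feature: direction
    smart_card: bool   # scenario feature: card (True) or ticket (False)


def temporal_scenario_features(b: Boarding) -> dict:
    """Temporal and scenario features for one boarding record."""
    return {
        "day_of_week": b.boarded_at.weekday(),  # Monday = 0
        "hour": b.boarded_at.hour,
        "is_holiday": b.boarded_at.date() in HOLIDAYS,
        "inbound": b.inbound,
        "smart_card": b.smart_card,
    }


def hourly_flow(boardings) -> dict:
    """Count boardings per (route, date, hour): the raw passenger flow."""
    counts = {}
    for b in boardings:
        key = (b.route_id, b.boarded_at.date(), b.boarded_at.hour)
        counts[key] = counts.get(key, 0) + 1
    return counts


def previous_average_flow(counts: dict, route_id: str, hour: int, before: date) -> float:
    """Mean flow for this route and hour over days strictly before `before`.

    Days with no boarding at that hour are absent from `counts`, so they
    are not included in the mean.
    """
    past = [n for (r, d, h), n in counts.items()
            if r == route_id and h == hour and d < before]
    return sum(past) / len(past) if past else 0.0
```

The real-time passenger flow would be the same hourly count computed for the current hour as records arrive.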

Furthermore, some researchers tried to improve the technique by capturing both spatio-temporal correlation and specific scenario patterns of bus traffic flow [2].

Some works take into consideration land use data such as population size, the demography of the different socio-professional categories, and the number of hospitals, malls, companies, etc. [3]. The authors justify this choice with the conclusions of [4, 5], which state that socio-economic indicators have a similar impact on urban mobility in all countries.

Finally, other approaches build on this research but, unlike [3], include the temporal structure of the passenger flow in the learning model using smart card data [6]. They also add key features such as the theoretical travel time for different transport modes. As smart cards do not register alighting locations, they used a trip chaining model [7] to estimate the destinations of 65% of passengers.
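The trip chaining idea can be illustrated with a short sketch. It assumes the rule commonly associated with trip chaining: a passenger alights at the downstream stop of the current route that is closest to their next boarding stop, and the last trip of the day is chained back to the first boarding. The data layout, the function names, and the `max_walk_m` threshold are hypothetical and are not taken from [7].

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def estimate_alightings(day_boardings, route_stops, max_walk_m=1000.0):
    """Estimate one alighting stop index per boarding for one card and one day.

    day_boardings: list of (route_id, stop_index) pairs in time order.
    route_stops:   route_id -> list of (lat, lon) stops in travel order.
    Returns a list with a stop index, or None when no estimate is possible.
    """
    results = []
    n = len(day_boardings)
    for r, (route_id, j) in enumerate(day_boardings):
        if n < 2:
            # A single boarding gives nothing to chain with.
            results.append(None)
            continue
        # Next boarding of the day; the last trip wraps around to the first.
        next_route, next_j = day_boardings[(r + 1) % n]
        target = route_stops[next_route][next_j]
        stops = route_stops[route_id]
        downstream = range(j + 1, len(stops))
        best = min(downstream,
                   key=lambda s: haversine_m(stops[s], target),
                   default=None)
        # Reject estimates that would imply an unrealistic walk.
        if best is not None and haversine_m(stops[best], target) > max_walk_m:
            best = None
        results.append(best)
    return results
```

Boardings for which the function returns `None` correspond to the share of trips whose destination cannot be estimated this way.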


After analyzing the papers mentioned above, the following approach seems to fit this context best.

Base model

The base model is inspired by [3] and uses an artificial neural network (ANN).

In the bus network, the connections between traffic zones can be represented as an adjacency matrix.


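The final bus accessibility H* is a weighted sum of two matrices: H* = a·H + b·H′, where a and b are coefficients, H describes direct connections, and H′ describes connections that require a transfer. Because direct travel is more convenient than travel with a transfer, coefficient a is greater than b.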
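As a worked example of the weighted sum above, the sketch below builds H* for a small network. It rests on several assumptions that are not stated in the text: H is the 0/1 adjacency matrix of direct connections, H′ marks zone pairs that are reachable with exactly one transfer but not directly, and the values a = 1.0 and b = 0.5 are purely illustrative.

```python
def one_transfer_matrix(H):
    """Mark zone pairs reachable with one transfer but not directly.

    Assumption: a pair (i, j) needs one transfer when some intermediate
    zone k is directly connected to both i and j.
    """
    n = len(H)
    Ht = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and not H[i][j]:
                if any(H[i][k] and H[k][j] for k in range(n)):
                    Ht[i][j] = 1
    return Ht


def accessibility(H, a=1.0, b=0.5):
    """Compute H* = a*H + b*H' element by element, requiring a > b."""
    if a <= b:
        raise ValueError("direct travel must weigh more than transfers (a > b)")
    n = len(H)
    Ht = one_transfer_matrix(H)
    return [[a * H[i][j] + b * Ht[i][j] for j in range(n)] for i in range(n)]


# Three zones in a chain: 0 - 1 - 2 (zones 0 and 2 need a transfer at 1).
H = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
H_star = accessibility(H)
```

In this example the pair (0, 2) gets the lower weight b because it is only reachable through zone 1.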

Additional features

Besides these features, the following input data will be added:

  1. Theoretical travel time (using "Navitia" API)

  2. Temporal structure of the passenger flow

  3. Contextual data (IRIS, Grouped Areas for Statistical Information, the most detailed geographical statistics available in France)

Supply-side analysis

This method is inspired by [7], which makes it possible to derive destinations from systems that record only boarding times and provide no information about arrival times at the destination. The idea proposed in that paper is to obtain arrival times for each stop with a superposition method. The time of arrival at each stop for each run of the day is calculated from all the boarding observations of the month. The time of arrival at the terminal stop is estimated from the calculated average commercial speed of the route and the "maximum time of arrival" of the run (the departure time of the bus's next run).
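The terminal-stop estimate can be sketched as follows. How [7] combines the two quantities is not detailed here, so this sketch assumes that the arrival time is extrapolated from the last observed stop at the average commercial speed and then capped by the departure time of the bus's next run. All function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def commercial_speed_kmh(distance_km: float, start: datetime, end: datetime) -> float:
    """Average commercial speed over a route section, stops included."""
    hours = (end - start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("end must be later than start")
    return distance_km / hours


def terminal_arrival(last_stop_time: datetime,
                     remaining_km: float,
                     speed_kmh: float,
                     next_run_departure: datetime) -> datetime:
    """Estimate the arrival time at the terminal stop of a run.

    The estimate is extrapolated at the commercial speed and never exceeds
    the "maximum time of arrival", i.e. the departure of the next run.
    """
    estimate = last_stop_time + timedelta(hours=remaining_km / speed_kmh)
    return min(estimate, next_run_departure)


# Illustrative values: 6 km covered in 30 minutes, 3 km left to the terminal.
speed = commercial_speed_kmh(6.0, datetime(2021, 11, 8, 8, 0), datetime(2021, 11, 8, 8, 30))
arrival = terminal_arrival(datetime(2021, 11, 8, 8, 30), 3.0, speed,
                           next_run_departure=datetime(2021, 11, 8, 9, 0))
```

The cap matters when boardings are sparse near the end of a run: without it, a slow extrapolation could place the bus at the terminal after it has already started its next run.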

The following indices are used by the model:

i: transit user (unique smart card identification)

j: sequence number of the transit stop within a route

r: sequence number of the route in the user's day

k: day


[1] Lijuan Liu and Rung-Ching Chen. A novel passenger flow prediction model using deep learning methods. Transportation Research Part C: Emerging Technologies, 84:74-91, 2017.

[2] Panbiao Liu, Yong Zhang, Dehui Kong, and Baocai Yin. Improved spatio-temporal residual networks for bus traffic flow prediction. Applied Sciences, 9(4), 2019.

[3] Y. Shaoqiang, S. Caiyun, Y. Yang, Z. Shuyuan, and Y. Wenlong. Prediction of bus passenger trip flow based on artificial neural network. Advances in Mechanical Engineering, 8(10):1687814016675999, 2016.

[4] Rémi Louf, Camille Roth, and Marc Barthelemy. Scaling in transportation networks. PLoS One, 9(7):e102007, 2014.

[5] Rémi Louf and Marc Barthelemy. How congestion shapes cities: from mobility patterns to scaling. Scientific Reports, 4:5561, 2014.

[6] Walid Kheriji, Sami Kraiem, Guilhem Sanmarty, Ghazaleh Khodabandelou, and Fouad Hadj Selem. Prediction of bus passenger flow using Deep Learning.

[7] M. Trépanier, N. Tranchant, and R. Chapleau. Individual trip destination estimation in a transit smart card automated fare collection system. Journal of Intelligent Transportation Systems, 11:1-14, 2007.
