ICE Wifi-user Heatmap

Capturing Passengers' Connectivity Behavior on Inter-City Express Train

Start With The Story

Germany's railway company Deutsche Bahn operates 247 long distance trains Inter-City Express (ICE) serving approximately 148 Million passengers, only in 2018. This is a huge number, and is expected to grow even more at the following years. Probably, the availability of Wifi connections within the ICE train is one of the cause.

Out of curiousity, I downloaded their "Wifi on ICE" data here to get a hint on the passenger behavior related to the Wifi facility. The zipped file size is 232Mb, and the real CSV data is around 2Gb with over 21 millions of rows. So it happened. Displaying all of them would be very costly. So, after a series of pre-processing steps (see below), only 25 trains with the longest routes is displayed here, representing 33k rows in the PostGreSQL database.

From those, we can obtain the Wifi-user Heatmap and the tracks, including stops at the train stations and cities as presented below.

How to use this page?
The number of displayed trains is adjustable:
  • num: [1..25], def = 25
  • show: ['station', else], def = 'city'
Calling this page without parameter will get the defaults. Otherwise, just try something like this

 
Legend
COLORS
The colors depict the heatmap of the wifi-users in the train:
  • pure green represents zero,
  • pure yellow is 150 users,
  • pure red means 300 or above.
The number of connected users may vary for different trains, they are all displayed in thin lines so that the gradation is still visible in the zoomed-out map.

Displayed data from train id:
[54117180] [58031130] [54038600] [54038030] [54117560]
 
Legend
Each track from a train is displayed using varying colors. Stops are represented using circles whose radius depict the duration. Each stop is matched with the lists of coordinates of cities (obtained from here) and train stations (pre-processed from here).

Adjust the number of displayed tracks using the parameter num, and pick train stations over cities to display using show. An example for that is given here.

Cities and train stations outside of Germany is not shown since their coordinates are not listed in the obtained database.
 

1. Train-based Splitting
Out of those millions of rows, there are "only" 255 unique train ID (column sid). Splitting the data based on that row brings us to work with 255 significantly smaller CSV files, particularly after duplicates removal.
2. Minute-based Merging
The column created indicates that the data are collected every five seconds for each train. The rows are merged based on the minutes, while carefully maintaining the valuable numbers in the columns.
3. Splitting Long Stops
After some plot test, we got the idea that the data came from at least two different dates (plus the following dawn). Therefore it is necessary to split them (at least) once again for avoiding "train jumps" on the display.
4. Travel-based Selection
We pick only the trains stopping at at least two stations and two cities. Some data are indicated as a test result, as they only wander around a workshop (or even a drone involved?). This selection ends the pre-processing phase.