Experts’ voices: Data Points and Politics – the Gordian Knot in Railroad Data Pools

February 26th, 2025 | by GEONATIVES

(7 min read)

From the very beginning of our blog, we have been fascinated by data, their handling and conditioning and how to make sure they can be stored in accessible pools in standardized formats. Last October, we had the pleasure of discussing this topic with an expert at the German Aerospace Center (DLR). Dr. Jörn Groos, who joined DLR in 2014, graduated as a geophysicist and experimental seismologist [1]. These days, Jörn works on projects for assessing the state of railroad infrastructure. This includes, among other things, the creation of diagnostic models and algorithms for sensor data analysis.

The topic we dived into is the so-called “Common European Data Spaces”, which are designed to make data available for access and reuse in specific industries. Jörn is involved in the European Joint Undertaking for railway research Europe’s Rail (ERJU), which aims to foster innovative rail product solutions. Participants are rail operators, component providers and software providers. Here, he contributes to Flagship Area 3 of the ERJU (Intelligent and integrated asset management), which also addresses data management and exchange. This task is his direct link to the European Rail Data Space (ERDS), also driven within the ERJU.

The goal of the efforts on the European level is to make sure that railroad infrastructure maintenance is performed on the preventive side but without excessive buffers (i.e. often enough but not too often). Disruptions of the operation due to malfunctions need to be avoided as well as the waste of resources on maintenance work for assets that do not yet need maintenance. Predicting the wear and tear is based on data that are collected in the field and fed into prediction algorithms.

Visualizing predictive maintenance data (Video by DLR)

Silos and the need for collaboration

Looking at the European level of railway infrastructure, you find lots of silos and stakeholders that need to be connected to a common dataspace. There are the owners of the infrastructure, its users, the service providers for maintenance, the service providers for inspections, the component manufacturers etc. And, not to forget, the country-specific ecosystems. The trick is to make them all collaborate and agree on common data models, data formats and rules for exchange of data.

Core tasks of predictive maintenance are asset management, asset monitoring and sufficiently precise prediction. The dataspace serves as the platform for data exchange. Algorithms are developed, for example, by DLR.

But how do data make it into the data space and how are they maintained? The first step is to make sure that data collected from different sources (read: in different formats and from different stakeholders) can be offered and found in a common data registry – while the data themselves remain on the data owners’ servers in a federated architecture – and may be made available (i.e. providing means to retrieve, understand and decompose the data in agreed formats). This step also requires that data access and ownership be regulated within a valid legal framework. Therefore, efforts are made to define and establish the Common European Data Spaces such as the ERDS, so that handmade solutions like the ones enabled by the typical service providers (e.g. WeTransfer, SharePoint etc.) become obsolete.
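To make the registry idea concrete, here is a minimal sketch of a federated catalogue in Python. It is not how the ERDS is implemented; the class names, dataset identifiers and example URLs are all invented for illustration. The point it shows is that the registry holds only metadata and a pointer to the owner's server, never the data themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata only; the data themselves stay on the owner's server."""
    dataset_id: str
    owner: str
    data_format: str  # name of an agreed format, e.g. "railML"
    endpoint: str     # hypothetical URL where the owner serves the data


class DataRegistry:
    """Hypothetical common registry: owners offer entries, consumers find them."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def offer(self, entry: DatasetEntry) -> None:
        # Registering an offer stores the description, not the payload.
        self._entries[entry.dataset_id] = entry

    def find(self, data_format: str) -> list[DatasetEntry]:
        # Discovery by agreed format; retrieval then happens at entry.endpoint.
        return [e for e in self._entries.values() if e.data_format == data_format]


registry = DataRegistry()
registry.offer(DatasetEntry("network-topology-2024", "Operator A", "railML",
                            "https://data.operator-a.example/network-topology"))
registry.offer(DatasetEntry("axle-counts-2024", "Operator B", "csv",
                            "https://data.operator-b.example/axle-counts"))

matches = registry.find("railML")
```

A consumer would then contact the endpoint of each match directly, which is where the access and ownership rules mentioned above would be enforced.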

From data space to data model

Within Europe’s Rail Flagship Area 3 project FP3-IAM4RAIL, relevant data for infrastructure monitoring and maintenance are defined and collected. In parallel, Flagship Area 1 project FP-1-MOTIONAL is addressing data models and formats as well as the definition of a Rail Data Space utilizing GAIA-X to provide the technical foundation as the third step. The ERDS shall be considered as one of the above-mentioned “Common European Data Spaces”, specific to the rail sector. Its goal is to develop governance for the exchange of data and services, including mechanisms to maintain data and service sovereignty. For the railroad maintenance data that we have been discussing, it may provide a technology stack. But it falls short of providing the step in-between (the missing step two): a common data model.

This is the tricky first part. If you look at large infrastructure operators in the European market, they have two goals: define the nature and structure of their data pool, and remain independent from manufacturers. On the other hand, manufacturers typically maintain their own data formats. What is missing as of today is an integration layer between the parties that could be standardized and, thus, open the data and the market. With most of the operators being state-owned, the idea is, therefore, to initiate activities on the European level.

What would it take to agree on common, standardized data models and data formats? First, each party would need to clearly see the benefits of collaboration outside the established business relationships and also to be willing to “sacrifice” parts of what they already have. Standardization means agreeing on compromises to the benefit of all stakeholders.

But are today’s silos actually as rigid and consistent as they appear from outside? They are not. Just think of the various factions within these silos, which all have and maintain their own data pools in sometimes incompatible formats. Unifying data formats on the European level would also facilitate the data exchange within the existing conglomerates.

Legacy as a burden

Another obstacle to introducing common data formats is that each established stakeholder already has a large amount of legacy data. What to do with them? If you are a new player or if you start building infrastructure from scratch (see, for example, the newly created high-speed rail network in China) it’s quite easy to adopt whatever has been defined or is emerging as a standard. If you have been collecting and maintaining data for long periods of time and have merely started digitalizing them, your business case might look different.

The solution may be to consider data storage separately from data exchange. If for the latter an exchange mechanism and (standardized) format can be defined, the former may be decoupled and may just need an agreed space where it can reside, i.e. the European Rail Data Space.
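A minimal sketch of this decoupling, under invented assumptions: two owners keep their legacy storage layouts untouched, and each supplies only a small adapter that maps a stored row to one shared exchange record. The record fields, operator names and legacy field names below are hypothetical and do not come from any existing rail standard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExchangeRecord:
    """Hypothetical agreed exchange format for one asset measurement."""
    asset_id: str
    quantity: str
    value: float
    unit: str


def from_operator_a(row: dict) -> ExchangeRecord:
    # Invented legacy layout A: German field names, wear stored in millimetres.
    return ExchangeRecord(row["anlage"], "rail_wear", float(row["verschleiss_mm"]), "mm")


def from_operator_b(row: dict) -> ExchangeRecord:
    # Invented legacy layout B: wear stored in metres, converted on export.
    return ExchangeRecord(row["id"], "rail_wear", float(row["wear_m"]) * 1000.0, "mm")


# Storage stays as it is; only the adapters know the legacy layouts.
ADAPTERS: dict[str, Callable[[dict], ExchangeRecord]] = {
    "operator_a": from_operator_a,
    "operator_b": from_operator_b,
}


def export(owner: str, row: dict) -> ExchangeRecord:
    """Translate one legacy row into the shared exchange format."""
    return ADAPTERS[owner](row)


a = export("operator_a", {"anlage": "W-17", "verschleiss_mm": "2.5"})
b = export("operator_b", {"id": "S-03", "wear_m": 0.004})
```

With this split, legacy data never have to be migrated; only the export path has to honour the agreed format.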

This is where the strength of the smaller players in the market comes in: the operators of small to mid-size rail infrastructures. For smaller countries and their respective operators, it has not made sense, and may not make sense in the future either, to develop their own data spaces, data structures and tools for asset management. They purchase the respective services and tools from the market providers instead. But sometimes departments of large infrastructure operators are challenged as well, because third-party solutions are evaluated in parallel to their existing implementations.

This makes their positions comparable to parties in an open and balanced market. Vendor lock-in is avoided by requiring suppliers to comply with existing standards for data formats. One great example of an existing data standard is railML to describe railroad networks, timetables and rolling stock. But efforts don’t stop there. After the quest for standardization comes the request that data and software be provided as open-source solutions.

We have seen it in automotive before

Overall, it’s a trend we have seen in automotive before. Take, for example, ASAM e.V., an association that was founded with the goal of creating world-wide standards for data formats, protocols and APIs. This allowed the ecosystem to go from 1-on-1 relationships to an open and broad market. It was actually the big players that initiated the change – to the benefit of all stakeholders. In railroad, this adventure has only just begun. Standardization still has a long track to go.

The railroad business is different in other aspects, too: the number of customers, solution providers, units sold etc. is much smaller than in the automotive industry. The economies of scale when introducing new technologies – and standards – are smaller across the market. But unifying data formats and interfaces could still make a difference, again, for smaller players and new entrants. Instead of implementing (and debugging!) specifics for each customer individually, one sound implementation of a standard might do.

More hurdles for standardization

Unfortunately, there are other aspects that give a bleak perspective for rapid standardization: First, deliverables are not split across as many different parties as in the automotive business. In railroad, packages that provide assets-as-a-service are a common business model and include not only the hardware, but also algorithms, the data platform etc. from a single provider. Often, “local” solutions are preferred: DB buys solutions from suppliers in Germany, SNCF from suppliers in France. Second, the maintenance business is mostly about infrastructure, not primarily about vehicles (as opposed to the automotive sector). If new rolling stock is acquired, it comes with monitoring systems tailored to the corresponding vehicle platform. Third, life cycles of railroad assets are much longer (30+ years for typical rolling stock or railway track infrastructure assets, and up to 120 years for bridges, for example), making a change in the owner-vendor relationship less probable during an asset's lifetime. Fourth, regulation is rather strong in the railroad industry due to safety aspects, and it is much more country-specific. Fifth – and definitely not last – as pointed out before, there’s already a long history related to railroad infrastructure in Europe, meaning lots of legacy data and legacy business.

The opportunity for change

But the fact that things have always been done in a specific way doesn’t mean they cannot change. There are plenty of examples where the sheer complexity and inconsistency of data handling in some organizations hinders progress in safety and efficiency. Jörn gave us a nice example about the handling and monitoring of sleeper assets in large vs. small operators’ organizations. It became more and more important to know what was installed in order to be able to replace the proper parts if systematic wear and tear appears.

So, overall, what do we expect? The European Rail Data Space is a good starting point that also provides sandbox environments for new technologies. Adoption of these new technologies seems to be under way, but the railroad industry is measuring time in decades, not in years. Therefore, progress is expected to be slow (but steady!) for now.

If we want to see disruption, it will be the smaller players in the market who can make the difference. They are keen to adopt new technologies that make them more efficient. And if enough of them adopt what’s already available and drive innovation, they might even make a combined critical mass that the big players can’t ignore.

Thank you

The hour we spent talking with Jörn was inspiring. We learned a lot about the backstage processes of railroad business. Seeing the similarities with the automotive industry is one thing. Accepting the slower pace in railroad business is hard, though. But again, who knows? Sometimes disruption is just around the corner. Thank you, Jörn, for these valuable insights and for the great discussion.


[1] For those who are curious about the difference between seismology and seismic exploration: it’s the bang. A seismologist measures what’s available, whereas in seismic exploration you first create a bang and measure afterwards.

