The Importance of the Business Analyst in Data Migration Projects

by Arnout Vanden Berghe on March 9, 2017

This blog focuses on the role of the business analyst within a data migration project. In our experience, this role is often underestimated within the migration context and that is why we want to take this opportunity to highlight some best practices. We will illustrate these best practices using one of our current migration projects: the migration as part of the Atrias project.


The Atrias project involves, among other things, the migration of data from the internal systems of Belgium’s 5 largest distribution system operators (i.e. Eandis, Infrax, ORES, Sibelga and RESA) to the new federal market system, which is known as the Central Market System (CMS). This represents a major change because, to date, every distribution system operator has handled this individually (although Eandis and ORES do use a joint system). By centralising the various applications, the distribution system operators aim to optimise market communication in support of the new processes that are being introduced within MIG6.0.

This post focuses specifically on the role of the business analyst within the Atrias migration project.

The range of duties of a business analyst within a migration project can be subdivided into 5 separate phases and overall reporting. The various phases and the underlying tasks are discussed in more detail below.

The initial phase consists of ascertaining the scope of the data to be migrated. The best way to do this is to use the business processes as a point of departure, as they will decide which data are required for support. This also keeps the analyst from unnecessarily migrating data that might never be used again afterwards.

It is also important to agree on a time frame for the data to be migrated: if the scope is unclear or too wide, it places an unnecessary additional burden on both the migration project and the actual operation of the system after the project has ended (cf. archiving, backup, performance, etc.).

In the second phase, a canonical data model (CaDaMo) is typically established based on the previously determined scope. The object of a CaDaMo is to function as an intermediate model so that migration can function independently from the parallel development of the target system (and the underlying data model).

This data model is composed of the new concepts introduced by the new processes (in this case the MIG6.0 processes). The major difference between this model and the data model of the target system is that the CaDaMo is a system-neutral model, intended as an intermediate data model in support of the migration chain. This chain consists of a number of ETL steps. ETL stands for Extract, Transform & Load. Extract refers to the extraction of the data from the source system. Transform covers the steps required to convert the data from the AS-IS data model to the TO-BE data model. Load signifies the loading of the data into the various intermediate models and the target system.
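To make the chain a little more tangible, here is a minimal sketch of two transformation steps with a system-neutral model in between. All record layouts, field names (EAN_NR, STATUS_CD, accessPointId) and identifiers are made up for illustration; they are not taken from the Atrias project or the CMS.

```python
from dataclasses import dataclass

# Hypothetical AS-IS rows as they might come out of one source system.
SOURCE_ROWS = [
    {"EAN_NR": "AP-001", "STATUS_CD": "A"},
    {"EAN_NR": "AP-002", "STATUS_CD": "I"},
]


@dataclass(frozen=True)
class CanonicalAccessPoint:
    """System-neutral (CaDaMo-style) record; the fields are assumptions."""
    ean: str
    active: bool


def extract():
    """Extract: read the rows from the source system (here: a constant)."""
    return list(SOURCE_ROWS)


def to_canonical(row):
    """Transform, step 1: AS-IS source layout -> canonical model."""
    return CanonicalAccessPoint(ean=row["EAN_NR"], active=row["STATUS_CD"] == "A")


def to_target(access_point):
    """Transform, step 2: canonical model -> (hypothetical) target layout."""
    return {
        "accessPointId": access_point.ean,
        "state": "ACTIVE" if access_point.active else "INACTIVE",
    }


def run_chain():
    """Run the whole chain and keep the intermediate result for inspection."""
    canonical = [to_canonical(row) for row in extract()]
    target = [to_target(ap) for ap in canonical]  # Load would write these out
    return canonical, target


canonical_rows, target_rows = run_chain()
```

Because the source-specific logic lives only in to_canonical and the target-specific logic only in to_target, a change in the target data model would, in this sketch, touch just the second step.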

During the third phase – part of which can take place in parallel with phase two – the source systems that will supply data are identified. In this regard, it is useful to start at a higher level than the individual elements of the CaDaMo, viz. the level of the so-called business entities. These entities are clusters of data that have a logical connection.

In the case of the Atrias migration, typical examples include TMD (technical master data on the point of access), RMD (relational master data on the contracts), Metering (measuring data with reference to a point of access), etc. As you can see from these examples, business entities are often sector- or company-specific. Distribution system operators specialize in information on e.g. access points, measuring installations, indexes …, whereas an energy supplier will be more focused on information relating to customers, contracts, products …

Once the business entities have been defined, it is time to determine which system is the best source to extract this information. By ‘the best source’ we do not necessarily mean the master data system (i.e. the system responsible for the creation of internal data or the reception of external data).

The reason is that consolidating data from various sources into a central data model is often complex, and sometimes this consolidation is already being carried out by a different internal system. For instance, as part of this project the internal clearing house is fed by several source systems and it consolidates this information before it is communicated externally. This is also the system whose functionality the CMS will largely take over at Go-Live.

That is also the reason why most of the data is extracted from this system, although data are also collected from a number of other systems. Choosing the sources is no easy feat, and it is crucial to involve Business (as well as the architects) in this choice, as they are often best placed to assess the impact of certain decisions.

Once the source systems have been selected, we can turn our attention to the mapping of the source models onto the CaDaMo, and of the CaDaMo onto the data model of the target system. This mapping makes it possible to ascertain which transformations are necessary to migrate from the source data model to the target data model.

As a result, the number of transformations depends on the structure of the CaDaMo and on the data model of the target system. As with the previous step, it is important to involve Business as it is crucial to interpret the source data correctly in light of the transformation. In support of the business review, design documents are drawn up which clearly describe the source fields (incl. the source system) as well as possible transformations.
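One way to keep such a mapping reviewable is to hold it as data, so that the same definition drives both the transformation and the design document. The snippet below is only a sketch under that assumption; the system names, field names and transformation rules are hypothetical.

```python
# Each entry: target field -> (source system, source field, transformation).
# All names and rules below are illustrative assumptions.
FIELD_MAPPING = {
    "ean": ("clearing_house", "EAN_NR", str.strip),
    "active": ("clearing_house", "STATUS_CD", lambda value: value == "A"),
    "meter_type": ("metering", "MTR_TYPE", str.lower),
}


def apply_mapping(records_by_system, mapping=FIELD_MAPPING):
    """Build one mapped record from the source records of several systems."""
    result = {}
    for target_field, (system, source_field, transform) in mapping.items():
        raw_value = records_by_system[system][source_field]
        result[target_field] = transform(raw_value)
    return result


def describe_mapping(mapping=FIELD_MAPPING):
    """Render the mapping as plain lines, e.g. for a design document."""
    return [
        f"{target} <- {system}.{field}"
        for target, (system, field, _transform) in sorted(mapping.items())
    ]


example = apply_mapping({
    "clearing_house": {"EAN_NR": " AP-001 ", "STATUS_CD": "A"},
    "metering": {"MTR_TYPE": "SMART"},
})
design_lines = describe_mapping()
```

The output of describe_mapping is the kind of list a business reviewer could check line by line: which source system and field feeds which target field.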

As soon as the field mapping (and the necessary transformations) has been implemented, we can start testing the migration of the data from the source to the target system(s). This is typically done through a series of dry runs. Running the chain repeatedly makes it possible to analyse the dropout rate correctly and in full, and to implement the necessary adjustments between runs.

Throughout the dry runs, the goal is to send increasing data volumes through the chain thanks to the adjustments that have been carried out. These adjustments can be made both on technical (e.g. field size, PK/FK relations …) and business grounds (e.g. inaccurate source data, erroneous transformation rules).
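As a rough illustration of how a dry run could be summarised, the sketch below computes a dropout rate and groups rejected records by cause. The reject records, categories and reasons are invented for the example; a real chain would produce its own error format.

```python
from collections import Counter


def dropout_rate(records_in, records_out):
    """Share of records that entered the chain but did not reach the end."""
    if records_in == 0:
        return 0.0
    return (records_in - records_out) / records_in


def summarise_rejects(rejects):
    """Count rejected records per (category, reason) pair."""
    return Counter((r["category"], r["reason"]) for r in rejects)


# Hypothetical rejects from one dry run.
dry_run_rejects = [
    {"id": "AP-003", "category": "technical", "reason": "field too long"},
    {"id": "AP-007", "category": "business", "reason": "missing contract start date"},
    {"id": "AP-009", "category": "technical", "reason": "field too long"},
]

rate = dropout_rate(records_in=20, records_out=17)
summary = summarise_rejects(dry_run_rejects)
```

Splitting the summary into technical and business causes mirrors the two kinds of adjustments described above and shows who needs to act on each group.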

Checkpoints are used to confirm the analysis of the dropout rate. The purpose of checkpoints is to measure the quantity of the migrated data throughout the migration chain. Examples include ‘Number of unique access points’, ‘Number of unique contracts’, etc. These checkpoints are implemented on every data model, viz. the extracts from the source systems, the CaDaMo and the target system. This makes it possible to analyse whether the dropout rate is due to the migration from the source to the CaDaMo or from the CaDaMo to the target system.
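A checkpoint such as ‘Number of unique access points’ can be sketched as a distinct count taken at every stage, with the difference between consecutive stages showing where records were lost. The stage names and the access_point field below are assumptions for the sake of the example.

```python
def checkpoint(records, key):
    """Count the unique values of `key`, e.g. unique access points."""
    return len({record[key] for record in records})


def locate_dropout(stages, key):
    """Compare consecutive stages; returns (from_stage, to_stage, lost) tuples.

    `stages` is an ordered list of (stage_name, records) pairs.
    """
    counts = [(name, checkpoint(records, key)) for name, records in stages]
    return [
        (name_a, name_b, count_a - count_b)
        for (name_a, count_a), (name_b, count_b) in zip(counts, counts[1:])
    ]


# Hypothetical data: ten access points extracted, one lost before the CaDaMo.
source_extract = [{"access_point": f"AP-{i:03d}"} for i in range(1, 11)]
cadamo = source_extract[:9]
target = list(cadamo)

losses = locate_dropout(
    [("source", source_extract), ("CaDaMo", cadamo), ("target", target)],
    key="access_point",
)
```

In this made-up run the single lost access point disappears between the source extract and the CaDaMo, so the analysis would start with the first transformation step rather than the load into the target system.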

Consistency checks are set up in addition to the checkpoints in order to measure the quality of the migrated data. Throughout the chain, data are often transformed, and this involves an inherent risk of loss of business value. That is why it is important to test for this as well. The example we’ve used is ‘Comparing access point X in the source system to the situation in the target system’.

This type of test is often performed by business experts as the source data are often transformed and must also be in line with the requirements of the new business processes.
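Part of such a comparison can be supported by a small script that applies the agreed transformation rule to the source record and lists every field where the target deviates. The rule and field names below are hypothetical, and the business expert still has to judge whether the rule itself is right.

```python
def expected_target(source_record):
    """Hypothetical transformation rule that the check is based on."""
    return {
        "id": source_record["EAN_NR"].strip(),
        "state": "ACTIVE" if source_record["STATUS_CD"] == "A" else "INACTIVE",
    }


def consistency_diff(source_record, target_record):
    """Return {field: (expected, actual)} for every field that deviates."""
    expected = expected_target(source_record)
    return {
        field: (value, target_record.get(field))
        for field, value in expected.items()
        if value != target_record.get(field)
    }


# Access point AP-001 was migrated consistently; AP-002 was not.
ok = consistency_diff(
    {"EAN_NR": "AP-001", "STATUS_CD": "A"},
    {"id": "AP-001", "state": "ACTIVE"},
)
bad = consistency_diff(
    {"EAN_NR": "AP-002", "STATUS_CD": "I"},
    {"id": "AP-002", "state": "ACTIVE"},
)
```

An empty result means the target record matches what the rule predicts; a non-empty result names the fields to investigate.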

In addition to the two types of testing detailed above, synchronization checks are also set up with the aim of comparing any transformations that must be carried out in the source systems with the transformations that take place throughout the migration chain. This is just to be sure that all systems are properly synchronized at the moment of Go-Live.

A final key aspect of any migration project is the creation of a reporting system. Migration is typically something that takes place behind the scenes and the changes it brings often only become noticeable at Go-Live. That is why it is important to report on the progress of the different dry runs as well as, obviously, the final run.

Far too often, migration is regarded as a purely technical project, but a poorly executed migration can have a major impact on the business side of things. After all, data and business value are inseparably linked: data forms the basis upon which companies operate and act in a constantly evolving market. Moreover, the impact of bad transformations or of data that was lost during the migration often doesn’t become clear until later. If such matters must be rectified afterwards, this can have a serious effect on both business and IT, since the modifications must be carried out on an operational system.

I hope I’ve been able to highlight the importance of the business analyst throughout the various stages of a migration project and hopefully this blog can serve as a basis for current or future migration projects.

Would you like to learn more about migration or our expertise in the Utilities sector? Then feel free to contact our Utilities experts.

