Integrating Data with MDM
The objective of most MDM Hub projects is to establish a trusted source of master data. In addition to the right vendor, an appropriate partner, and an efficient implementation plan, it is very important to come up with the right integration strategy. An organization’s existing eco-system will typically consist of different source systems, and integrating all of them with the new MDM system becomes a huge task in itself. Any small misstep here can lead to delays, cost overruns, and substandard or missing data. In turn, sponsors will lose confidence in the MDM hub as a trusted source to be integrated with downstream applications. Net result: the ROI will seem a lot less attractive.
In today’s post, we will discuss the key elements of an MDM integration strategy.
Identifying the right source
The first step of an MDM integration project is to identify which data sources, if integrated with the MDM hub, will provide the greatest ROI. Because MDM is flexible enough to work as a trusted source for any domain of data, sponsors can get organizational buy-in while integrating only a subset of data. The key is to start small with high-value information sources such as “credit card member data”, “master patient index”, “product categories”, “sub-categories”, and so on.
Integrating sources with MDM
When we discuss MDM integration, a common question data management teams ask is “why should integrating data with an MDM application be any different from the traditional integration patterns followed for a data warehouse?” Our answer is that at the process level there is no change. However, there are subtle changes in how these processes are implemented. Let us look at the common data integration processes and their specific implementation in MDM Hub projects.
Figure 1: Common Data Integration Processes
Discovery and Define:
In the discovery phase, potential systems which can be integrated with the MDM are identified.
Some of the activities typically associated with this phase are:
- Create a data dictionary for all the available attributes.
- Measure each attribute against the KPIs, and assign it a priority and the potential value of persisting it in MDM.
Post discovery, we analyze the data further and try to define it.
- Data Profiling.
- Measure Data Completeness
- Measure Data Quality
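As a rough illustration of what "measuring completeness" can look like, here is a minimal sketch in Python. It assumes source records arrive as plain dictionaries; the attribute names and sample values are made up for the example and do not come from any particular source system.

```python
from collections import Counter


def profile_completeness(records, attributes):
    """Return, per attribute, the share of records holding a non-empty value."""
    total = len(records)
    filled = Counter()
    for record in records:
        for attr in attributes:
            value = record.get(attr)
            # Treat None, empty strings and whitespace-only strings as missing.
            if value is not None and str(value).strip() != "":
                filled[attr] += 1
    return {attr: (filled[attr] / total if total else 0.0) for attr in attributes}


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "John Doe", "phone": "555-0100", "email": ""},
    {"name": "J. Doe", "phone": None, "email": "jdoe@example.com"},
    {"name": "Jane Roe", "phone": "555-0199", "email": None},
    {"name": "", "phone": "555-0142", "email": None},
]
completeness = profile_completeness(sample, ["name", "phone", "email"])
```

A result like this can feed straight back into the prioritization done during discovery: an attribute that is mostly empty in a source is a weak candidate for matching, however valuable it looks on paper.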
All these activities are normally carried out for a traditional EDW/OLTP/OLAP project as well. However, the key difference is that for MDM implementations, the data should always be visualized as business objects instead of facts, dimensions, or entities.
Secondly, because an MDM solution is expected to be the source of the “Golden Record”, data quality becomes mission critical in MDM implementations. For example, John Doe and J. Doe cannot exist as separate customers in MDM due to its entity resolution capabilities. So, preparing data for such a resolution within MDM is critical.
As a result, data profiling, data cleansing, and data verification become more significant than in EDW/OLAP/OLTP implementations. Data profiling gives better insight into the patterns in source data, which can then be cleansed for better duplicate matching and used to customize the customer matching rules. Once this is done, we should have data that has gone through profiling, cleansing, and verification. The next step is to load the data into MDM.
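To make the John Doe / J. Doe example concrete, the sketch below shows one simple way cleansed names could be reduced to a candidate match key before they reach the hub. This is not how any MDM product's matching engine works internally; the key (first initial plus last name) and the function names are assumptions chosen purely for illustration.

```python
import re
from collections import defaultdict


def normalize_name(raw):
    """Lowercase, replace anything that is not a letter or space, collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", " ", raw.lower())
    return " ".join(cleaned.split())


def match_key(raw_name):
    """Hypothetical candidate key: first initial plus last name."""
    parts = normalize_name(raw_name).split()
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]
    return f"{parts[0][0]} {parts[-1]}"


def candidate_groups(names):
    """Group names sharing a key; only groups with more than one name are returned."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


groups = candidate_groups(["John Doe", "J. Doe", "Jane Roe"])
```

Note that such a key only proposes candidates: "Jane Doe" would land in the same group as "John Doe", so further attributes (address, contact number, date of birth) are needed before two records are treated as the same customer.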
Load:
An MDM Integration project provides an interesting departure from the traditional ETL methods used for loading an EDW.
In an EDW implementation project, most of the data movement involves moving data from database or flat-file sources to another database, and the transformations are limited to validating, massaging, or rolling data up or down. Most of the data relationship restrictions are not applied at the load itself; as a result, inconsistent data can be persisted in a data warehouse.
Though an MDM hub can be loaded in the same manner, direct loads are not preferred except for the initial load, because the indexes and triggers need to be set up again after each load, which can increase the downtime of MDM.
Here again, we see the benefit of visualizing data as business objects. Since MDM communicates through services, we can leverage these services to customize the granularity of the data to be loaded as well as the action to be taken on it. For example, you can load just the customer details with a contact number and address, without integrating the account source system, by using MDM Service XMLs.
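As a sketch of the customer-only load described above, the following Python builds a small request document containing just a name, a contact number, and an address. The element names (AddCustomerRequest, Customer, ContactNumber, and so on) are invented for illustration; the real service XML schema depends on the MDM product and its configuration.

```python
import xml.etree.ElementTree as ET


def build_customer_request(name, contact_number, address_lines):
    """Build a customer-only request; all element names here are hypothetical."""
    root = ET.Element("AddCustomerRequest")
    customer = ET.SubElement(root, "Customer")
    ET.SubElement(customer, "Name").text = name
    ET.SubElement(customer, "ContactNumber").text = contact_number
    address = ET.SubElement(customer, "Address")
    for line in address_lines:
        ET.SubElement(address, "Line").text = line
    # No account element is added: the account source system is not integrated yet.
    return ET.tostring(root, encoding="unicode")


request_xml = build_customer_request(
    "John Doe", "555-0100", ["1 Main St", "Springfield"]
)
```

The point of the design is that the granularity is decided by the payload, not by the table structure: an account block can be added to the same business object later, once that source is ready.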
Using a hierarchical data structure such as XML has its benefits; however, do understand the complexities involved as well. Since every record has to be well-formed XML to be loaded successfully into MDM, problematic characters (&, ^, >, <) have to be identified and cleansed. This is why there is a high emphasis on data profiling and data cleansing in the discovery phase of MDM integration projects.
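The well-formedness problem is easy to demonstrate. The sketch below, using only the Python standard library, strips control characters that XML 1.0 does not allow, escapes &, < and >, and then checks whether the resulting record parses. The helper names and the sample value are assumptions for the example.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Control characters (other than tab, line feed, carriage return) that XML 1.0 forbids.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_xml(value):
    """Strip forbidden control characters, then escape &, < and >."""
    return escape(_INVALID_XML_CHARS.sub("", value))


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_name = "Smith & Sons <Retail>\x0b"      # hypothetical dirty source value
broken = f"<Name>{raw_name}</Name>"          # pasted in as-is
fixed = f"<Name>{clean_for_xml(raw_name)}</Name>"
```

Here `is_well_formed(broken)` is False and `is_well_formed(fixed)` is True. A caret is legal inside XML text, so this sketch leaves it alone; whether it still needs cleansing depends on the rules of the target system. Building the document with an XML library rather than string concatenation handles the escaping automatically, though forbidden control characters still have to be removed beforehand.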
Another key difference in MDM integration projects when compared to traditional data integration projects is the expectation of going “Real-Time” or “Near Real-Time”. While an MDM system is expected to work in real time or near real time, EDWs are still by and large batch driven. So, the data management team working on an MDM integration project should evaluate the readiness of its ETL tools for real-time or near-real-time integration. If a given tool does not have enough features, be ready to create custom code or use another complementary tool as well.
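If custom code does turn out to be necessary, one common near-real-time pattern is micro-batching: records are buffered and sent to the hub as soon as the batch is full or has waited long enough. The sketch below is a minimal, single-threaded illustration of that idea, not a feature of any particular ETL tool; the class name, thresholds, and the `flush` callback are all assumptions.

```python
import time


class MicroBatcher:
    """Buffer records and flush when the batch is full or too old."""

    def __init__(self, flush, max_size=100, max_age_seconds=5.0, clock=time.monotonic):
        self._flush = flush          # callable that sends a list of records onward
        self._max_size = max_size
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the timing can be tested
        self._buffer = []
        self._oldest = None          # arrival time of the first buffered record

    def add(self, record):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        self._flush_if_due()

    def tick(self):
        """Call periodically so a small batch is not held past max_age_seconds."""
        self._flush_if_due()

    def _flush_if_due(self):
        if not self._buffer:
            return
        too_big = len(self._buffer) >= self._max_size
        too_old = self._clock() - self._oldest >= self._max_age
        if too_big or too_old:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


# Usage with a fake clock; in practice flush would call the MDM service.
sent = []
now = [0.0]
batcher = MicroBatcher(sent.append, max_size=2, max_age_seconds=5.0, clock=lambda: now[0])
batcher.add({"id": 1})
batcher.add({"id": 2})   # size threshold reached, first batch is flushed
batcher.add({"id": 3})
now[0] = 6.0
batcher.tick()           # age threshold reached, second batch is flushed
```

The two thresholds are the knobs that trade latency against load on the hub: a size of one behaves like a per-record real-time call, while large values drift back toward classic batch.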
Federate:
Federation of MDM data in the downstream applications can be achieved by two methods:
- A Get Service call
- Directly querying the MDM database.
Since MDM is SOA driven, most of the downstream systems can be integrated through a data service. MDM services can be customized to modify the granularity of the data.
A common challenge which we have noticed across implementations is how to synchronize and store in the EDW the customer records that MDM has marked as duplicates, while still tracking the source lineage of these records. One way we deal with this challenge is to track not just the source association of each record, but also to keep each attribute clearly associated with its respective source in the EDW.
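One way to picture attribute-level lineage is shown in the sketch below: duplicate source records collapse into a single row, and every surviving attribute carries the system and key it came from. The record layout, the field names, and the "most trusted source wins" rule are assumptions made for the example, not a description of a specific EDW design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeValue:
    value: str
    source_system: str
    source_key: str


def merge_with_lineage(golden_id, source_records, priority):
    """Collapse duplicates into one row, keeping lineage for each attribute.

    `priority` lists source systems from most to least trusted; this is an
    assumed survivorship rule used only for illustration.
    """
    rank = {system: position for position, system in enumerate(priority)}
    ordered = sorted(source_records, key=lambda r: rank[r["source_system"]])
    attributes = {}
    for record in ordered:
        for name, value in record["attributes"].items():
            # The first non-empty value from the most trusted source survives.
            if value and name not in attributes:
                attributes[name] = AttributeValue(
                    value, record["source_system"], record["source_key"]
                )
    return {
        "golden_id": golden_id,
        "attributes": attributes,
        # Record-level lineage: every source key that fed this golden record.
        "source_keys": [(r["source_system"], r["source_key"]) for r in ordered],
    }


# Hypothetical duplicates of the same customer from two systems.
records = [
    {"source_system": "CRM", "source_key": "C-17",
     "attributes": {"name": "J. Doe", "phone": "555-0100", "email": ""}},
    {"source_system": "BILLING", "source_key": "B-901",
     "attributes": {"name": "John Doe", "email": "jdoe@example.com"}},
]
merged = merge_with_lineage("G-1", records, priority=["BILLING", "CRM"])
```

In this example the name and email survive from the billing system while the phone number survives from the CRM, and the merged row still lists both source keys, so a downstream report can answer both "which records were merged?" and "where did this value come from?".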
To sum up, a well-planned integration strategy covering detailed source data analysis, a data load plan, and proper federation mechanisms will contribute a great deal to the success of MDM Hub implementations.
For further discussion on Data Integration in the MDM eco-system, do feel free to contact me at ashish.mishra@infotrellis.com
Originally published at https://blogs.mastechinfotrellis.com.