Challenges of Data Integration and Data Migration


Data Integration

Data Integration is a combination of technical and business processes used to combine data from disparate sources and turn it into valuable business insight. The process generally supports the analytical processing of data by aligning, combining, and presenting each data store to an end user, and is usually executed in a data warehouse via specialized integration software. ETL (extract, transform, load) is the most common form of Data Integration in practice, but other techniques, including replication and virtualization, are better suited to some scenarios.
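The ETL pattern can be sketched in a few lines. The sketch below uses hypothetical in-memory "CRM" and "ERP" records (all names and fields are illustrative, not from any real system) to show the three steps: extract raw records, transform them into one aligned shape, and load the result into a warehouse stand-in.

```python
# Extract: pull raw records from two disparate (hypothetical) sources.
crm_rows = [{"cust_id": "7", "name": "acme corp", "revenue": "1200.50"}]
erp_rows = [{"customer": "7", "region": "EMEA"}]

def transform(crm, erp):
    """Align, clean, and combine records keyed on customer id."""
    regions = {r["customer"]: r["region"] for r in erp}
    unified = []
    for row in crm:
        unified.append({
            "cust_id": int(row["cust_id"]),       # normalize types
            "name": row["name"].title(),          # clean formatting
            "revenue": float(row["revenue"]),
            "region": regions.get(row["cust_id"], "UNKNOWN"),
        })
    return unified

# Load: write the unified records to the warehouse (here, just a list).
warehouse = transform(crm_rows, erp_rows)
print(warehouse[0]["name"], warehouse[0]["region"])  # Acme Corp EMEA
```

In a real pipeline each step would talk to databases or files, but the shape is the same: per-source extraction, a transformation that aligns keys and types, and a single load target.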

Challenges of Data Integration

i. Design
The data integration initiative within a company must be a business initiative, not an IT one. There should be a champion who understands the data assets of the enterprise and can lead the discussion about the long-term data integration initiative in order to make it consistent, successful and beneficial.

Analysis of the requirements (the Business Requirements Specification, BRS), i.e. why is the data integration being done, and what are the objectives and deliverables? From what systems will the data be sourced? Is all the data available to fulfill the requirements? What are the business rules? What is the support model and SLA?

Analysis of the source systems, i.e. what are the options for extracting the data from the systems (update notification, incremental extracts, full extracts), and what is the required/available frequency of the extracts? What is the quality of the data? Are the required data fields populated properly and consistently? Is documentation available? What data volumes are being processed? Who is the system owner?

Any other non-functional requirements such as data processing window, system response time, estimated number of (concurrent) users, data security policy, backup policy.

What is the support model for the new system? What are the SLA requirements?

And last but not least, who will be the owner of the system and what is the funding of the maintenance and upgrade expenses?

The results of the above steps need to be documented in the form of a System Requirements Specification (SRS) document, confirmed and signed off by all parties that will be participating in the data integration project.

ii. Implementation
Based on the BRS and SRS, a feasibility study should be performed to select the tools to implement the data integration system. Small companies and enterprises that are just starting with data warehousing face a decision about the set of tools they will need to implement the solution. Larger enterprises, or those that have already run other data integration projects, are in an easier position: they can extend the existing system and exploit existing knowledge to implement the new system more effectively. There are cases, however, when adopting a new, better-suited platform or technology makes a system more effective than staying with existing company standards: for example, a tool that scales better for future growth, lowers implementation/support or license costs, or moves the system to a modern platform.

iii. Testing
Along with the implementation, proper testing is a must to ensure that the unified data are correct, complete and up to date.
Both IT and the business need to participate in the testing to ensure that the results are as expected/required. Therefore, the testing should incorporate at least a Performance Stress Test (PST), Technical Acceptance Testing (TAT) and User Acceptance Testing (UAT).

Data Migration

Data Migration is a process where data is transferred between storage types, formats, data architectures and enterprise systems. Whereas Data Integration involves collecting data from sources outside of an organization for analysis, migration refers to the movement of data already stored internally to different systems. Companies will typically migrate data when implementing a new system or merging systems into a new environment. Migration is often performed by a set of programs or automated scripts that transfer data automatically. There are three main options to accomplish data migration:

1. Merge the systems from the two companies into a brand new one.

2. Migrate one of the systems to the other one.

3. Leave the systems as they are but create a common view on top of them - a data warehouse.

Data Migration Challenges

i. Storage Migration
Storage migration can be handled in a manner transparent to the application as long as the application uses only general interfaces to access the data. In most systems this is not an issue. However, careful attention is necessary for old applications running on proprietary systems. In many cases the source code of the application is not available and the application vendor may no longer be on the market. In such cases storage migration is tricky and should be properly tested before the solution is released into production.

ii. Database Migration
Database migration is relatively straightforward, assuming the database is used just as storage: it "only" requires moving the data from one database to another. Even this, however, may be a difficult task. The main issues one may encounter include:

Unmatched data types (number, date, sub-records)

Different character sets (encoding)

Different data types can be handled by mapping each source type to the closest type in the target database while maintaining data integrity. If the source database supports complex data formats (e.g. sub-records) but the target database does not, the applications using the database must be amended. Similarly, if the source database supports a different encoding in each column of a table but the target database does not, the applications using the database need to be thoroughly reviewed.
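Type mapping of this kind is usually driven by an explicit lookup table. The sketch below uses a hypothetical Oracle-to-PostgreSQL style mapping (the type names are illustrative; a real migration needs a vetted, complete mapping table) and fails loudly when no safe mapping exists, rather than guessing.

```python
# Hypothetical source-to-target type mapping (Oracle-like -> PostgreSQL-like).
# A real migration would use a vetted, complete table for the actual databases.
TYPE_MAP = {
    "NUMBER": "NUMERIC",
    "VARCHAR2": "VARCHAR",
    "DATE": "TIMESTAMP",
}

def map_column(name: str, src_type: str) -> tuple:
    """Return (column name, closest target type); refuse unknown types."""
    if src_type not in TYPE_MAP:
        raise ValueError(f"no safe mapping for column {name}: {src_type}")
    return (name, TYPE_MAP[src_type])

print(map_column("created_at", "DATE"))  # ('created_at', 'TIMESTAMP')
```

Refusing unknown types is the important design choice: a silent fallback to, say, `TEXT` would preserve the bytes but lose the integrity constraints the paragraph above is concerned with.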

When a database is used not just as data storage, but also to represent business logic in the form of stored procedures and triggers, close attention must be paid when performing a feasibility study of the migration to target database. Again, if the target database does not support some of the features, changes may need to be implemented by applications or by middleware software.

ETL tools are very well suited to the task of migrating data from one database to another. Using an ETL tool is highly advisable, particularly when moving data between data stores that have no direct connection or interface implemented.
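At its core, such a migration is an extract/load loop between two connections. As a minimal sketch (using two in-memory SQLite databases to stand in for a source and target store that share no direct interface; table and column names are illustrative):

```python
import sqlite3

# Two independent stores with no direct connection between them.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

src.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])

dst.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

# Extract from the source, then load into the target.
rows = src.execute("SELECT id, name FROM customers").fetchall()
dst.executemany("INSERT INTO customers VALUES (?, ?)", rows)
dst.commit()

print(dst.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```

A real ETL tool adds what this sketch omits: connectors for many stores, batching, restartability, and the type/encoding conversions discussed above.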

iii. Application Migration
Looking back at the previous two cases, the process is relatively straightforward. That is rarely true of application migration. The reason is that applications, even when designed by the same vendor, store data in significantly different formats and structures, which makes a simple data transfer impossible. The full ETL process is a must, as the Transformation step is not always straightforward. Of course, application migration can, and usually does, include storage and database migration as well. The advantage of an ETL tool in this instance is its ready-to-use connectivity to disparate data sources and targets.

Difficulty may occur when migrating data from mainframe systems or applications using proprietary data storage. Mainframe systems use record-based formats to store data. Record-based formats are easy to handle; however, mainframe data storage formats often include optimizations which complicate data migration. Typical optimizations include binary-coded decimal number storage, non-standard representation of positive/negative values, or storage of mutually exclusive sub-records within a single record. Consider a library data warehouse as an example. There are two types of publications, books and articles, and a publication is either a book or an article but never both. Since the information stored for a book and for an article is mutually exclusive, the record uses a different sub-record format for each while occupying the same space; still, the data is stored using fairly standard encoding. By contrast, proprietary data storage makes the Extract step even more complicated. In both cases, the most efficient way to extract data from the source system is to perform the extraction in the source system itself, converting the data into a printable format that can later be parsed using standard tools.
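The binary-coded decimal optimization mentioned above (IBM's packed decimal, COMP-3) packs two digits per byte, with the final low nibble carrying the sign. A decoder for it is short; the sketch below handles the common sign nibbles (0xD negative, 0xC/0xF positive) and ignores scale/precision metadata that a real copybook-driven extractor would also need.

```python
def unpack_comp3(data: bytes) -> int:
    """Decode an IBM packed-decimal (COMP-3) field: two digits per byte,
    sign in the final low nibble (0xD = negative, 0xC/0xF = positive)."""
    digits = []
    for b in data[:-1]:
        digits += [b >> 4, b & 0x0F]   # high and low nibble are both digits
    last = data[-1]
    digits.append(last >> 4)           # last high nibble is the final digit
    sign_nibble = last & 0x0F          # last low nibble is the sign
    value = int("".join(str(d) for d in digits))
    return -value if sign_nibble == 0x0D else value

# Bytes 0x12 0x34 0x5C encode +12345; a trailing 0xD nibble flips the sign.
print(unpack_comp3(bytes([0x12, 0x34, 0x5C])))   # 12345
print(unpack_comp3(bytes([0x12, 0x34, 0x5D])))   # -12345
```

This is exactly why the text recommends extracting on the source system itself: the mainframe's own utilities already know the field layouts and can emit printable output, sparing the migration team from re-implementing decoders like this for every field type.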

iv. Character Encoding
Most systems developed on PC-based platforms use ASCII encoding or a national extension based on ASCII. The most widely adopted modern encoding is UTF-8, which keeps the ASCII mapping for alphanumeric characters but enables storage of characters from most national alphabets, including Chinese, Japanese and Russian. Mainframe systems are mostly based on EBCDIC encoding, which is incompatible with ASCII, so conversion is required to display the data. ETL tools should support conversions between character sets, including EBCDIC.
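The EBCDIC-to-UTF-8 conversion can be demonstrated directly, since Python's standard library ships EBCDIC codecs (e.g. cp037, the US/Canada EBCDIC code page; the exact code page of a real mainframe feed is something you must confirm, not assume):

```python
# Simulate mainframe output by encoding a string as EBCDIC (code page 037).
ebcdic_bytes = "HELLO 123".encode("cp037")
assert ebcdic_bytes != b"HELLO 123"   # EBCDIC byte values differ from ASCII

text = ebcdic_bytes.decode("cp037")   # EBCDIC -> Unicode
utf8_bytes = text.encode("utf-8")     # Unicode -> UTF-8
print(text)                           # HELLO 123
```

Decoding with the wrong code page silently produces plausible-looking garbage rather than an error, which is why the source code page must be established during the source-system analysis, not guessed during the load.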


Contributor: Asyira Suriani Mustapa
