You transfer the data, and the problem only comes to light once all the workarounds the business created to screen out bad records are removed, exposing the business to issues that had previously been contained. These too can probably be uncovered using DQ tools, but you will certainly need to go back to the business, because where there are different views or models of the data, business knowledge is critical in deciding how the gaps should be managed. This is where your ETL tool comes into its own, resolving data omissions that were never a problem in the legacy system but must be addressed in the target.
Things like this can also be managed by modern DQ tools, either directly within the ETL tool or within the overall process. Tools can detect these semantic issues but cannot always fix them; some data gaps will only become apparent by including the business in the discovery process.
So you will need to have substantial business involvement in the process.
That is why I designed the Data Quality Rules process to integrate significant involvement from the business community.

Dylan Jones: One of the things we see a lot from members is the question of priority. What are your thoughts on this?

John Morris: One of our client employees, a business domain expert, raised an important point the other day. They asked why, when everyone is already under pressure, business people should be drawn into the migration. My response is that it is precisely because everyone is under pressure that these people need to be involved.
We need to know how to plan and budget, so that needs to be done in concert with our business colleagues. One approach is the scorecard, where the high-priority jobs rise to the top. By bringing business people into the prioritisation process, we prioritise the activities that are most significant to the business.

John Morris: Sure. The thing about DQR is that it is both a mechanism for managing DQ issues and fallouts that need to be addressed before the migration delivery, and a way of channelling intelligence from the business on how to address prioritisation, resolve data gaps and so on.
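The scorecard approach can be sketched as a simple weighted scoring of open DQ issues. The field names, weights and issue examples below are illustrative assumptions, not part of the DQR method itself:

```python
# Hypothetical sketch of a DQ-issue scorecard: each open issue is scored on
# business impact and migration urgency, and the board works the list from
# the top. Weights and fields are invented for illustration.
from dataclasses import dataclass

@dataclass
class DQIssue:
    name: str
    business_impact: int    # 1 (low) .. 5 (critical), set by business experts
    records_affected: int   # how many rows the issue touches
    blocks_migration: bool  # must this be fixed before delivery?

def score(issue: DQIssue) -> float:
    """Weighted score: the business view dominates; volume and blockers add weight."""
    s = issue.business_impact * 10 + issue.records_affected / 1000
    if issue.blocks_migration:
        s += 50  # issues that gate delivery rise to the top
    return s

issues = [
    DQIssue("Duplicate customer records", 4, 12_000, True),
    DQIssue("Missing postcodes", 3, 800, False),
    DQIssue("Obsolete product codes", 2, 40_000, False),
]
for issue in sorted(issues, key=score, reverse=True):
    print(f"{score(issue):6.1f}  {issue.name}")
```

The point of the sketch is only that prioritisation becomes repeatable once business experts have set the impact ratings; the arithmetic itself is trivial.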
The way we do it is to set up a cross-matrix team that handles all incoming DQ issues, typically chaired by a data migration analyst. It should include business domain experts, technical experts and other key data stakeholders, such as regulatory specialists. The key people are the business and technical experts.
If you start analysing data before the target systems are even defined, you first look at the reality gaps, then the legacy data gaps. So by the time we introduce the target environment we already have a firm understanding of what our DQ issues are, without having to wait for the target structures to be defined. The objection that nothing useful can be done before the target is known is wrong, because the big data items that drive the business rarely change substantially; logically, both environments will require similar data, so we can get a head start using DQR, reality gaps, legacy analysis and so on.
This is especially the case if you take a business-led approach, because the business will tell you what data is important to current and future business processes. The method statement also defines exactly what metrics and activities will be required to help us satisfy the closure condition.
Practical Data Migration
At the next level of management we can now provide accurate assessments of where we are. Prioritisation is really important, and we need key data stakeholders to be involved on the DQ board. Which accounts have permissions, and what queries are they running? In the new platform, do these accounts need the same access, or can you eliminate some and tighten your access and security controls? Are there jobs or other systems that only connect to the data on a monthly or quarterly basis? After completing these three tasks you are ready to move on to step 2: with your metadata collected and organized, you can determine which data are candidates to be migrated.
Work with the SMEs to document which data elements and tables are critical to the business. Perhaps some of the information listed as required has data quality issues that are unknown to the business users. How much of the data has null values or a high number of distinct values? For columns that are numeric:
What are the minimum and maximum values? Do those values make sense for the data that you are reviewing? What are the most common values that occur in a column? Are there code tables that have values that are not used in the application?
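These profiling questions map directly onto simple SQL aggregates. A minimal sketch against an in-memory SQLite table, with table and column names invented for illustration:

```python
import sqlite3

# Build a toy legacy table so the profiling queries have something to run on.
# The table and column names here are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, status TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 19.99, "SHIPPED"), (2, None, "SHIPPED"),
                 (3, 250.00, "OPEN"), (4, 19.99, None)])

# How much of the data has null values?
nulls = con.execute("SELECT COUNT(*) FROM orders WHERE amount IS NULL").fetchone()[0]

# Distinct values, minimum and maximum - do they make sense for this column?
distinct_vals, lo, hi = con.execute(
    "SELECT COUNT(DISTINCT amount), MIN(amount), MAX(amount) FROM orders").fetchone()

# What is the most common value that occurs in the column?
top = con.execute("""SELECT amount, COUNT(*) AS n FROM orders
                     WHERE amount IS NOT NULL
                     GROUP BY amount ORDER BY n DESC LIMIT 1""").fetchone()

print(nulls, distinct_vals, lo, hi, top)
```

The same handful of aggregates, run per column, gives a baseline profile you can reuse later for validation.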
These are common questions that should be asked when looking at the data to migrate. To focus your effort, look more closely at any columns that are critical to business functions and perform deeper analysis on them. Fields that are required but not critical should have at least the same basic analysis performed, so you have a common baseline. Upon completing the analysis of your data you can move to step 3. After performing the data analysis, again work with the SMEs to determine what data to clean and to what level.
For instance, address correction can be done using lookup information from the post office. Knowing that you have only one customer at an address, and not three instances of the same customer at the same address, is not as easy. How much data to clean, and to what extent, will vary with the type of system you are migrating to and what you are leveraging the data for.
If you are implementing, say, a customer-facing system, duplicate customer records matter a great deal; however, if you are implementing a new manufacturing system that only ships to an address, then your data cleansing needs are different.
This step can take a lot of effort and time. Leverage data quality tools to speed up the process of correcting and cleaning your data. Once you are comfortable with your data quality you can then move to step 4. There are a variety of ways to move data from legacy sources to new platforms. During this phase you should lay out where your data is being moved and by what methods. If you are doing an application migration, you may need to pull data from legacy and stage it in new tables that the application can load via its API calls or data load techniques.
If you are moving from one database technology to another, you can leverage ETL tools or SQL scripts to pull the data from the source system and load it into the new system. Regardless of the method you use to move your data, include in the process steps that create record metadata for each table you are migrating. Have the system push this information to a central audit table that stores the table name, record count, and the beginning and end dates of the migration.
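A central audit table of this kind can be populated by each migration job as it finishes. The schema below is one plausible layout matching the fields just described (table name, record count, start and end timestamps); SQLite and the names used are stand-ins for illustration:

```python
import sqlite3
from datetime import datetime, timezone

con = sqlite3.connect(":memory:")
# One row per migrated table: what was moved, how many records, and when.
con.execute("""CREATE TABLE migration_audit (
                   table_name   TEXT,
                   record_count INTEGER,
                   began_at     TEXT,
                   ended_at     TEXT)""")

def audit_migration(table_name, record_count, began_at, ended_at):
    """Push one audit row after a table's migration completes."""
    con.execute("INSERT INTO migration_audit VALUES (?, ?, ?, ?)",
                (table_name, record_count,
                 began_at.isoformat(), ended_at.isoformat()))
    con.commit()

start = datetime.now(timezone.utc)
# ... the actual table migration runs here; suppose it pulled 1,250 records ...
audit_migration("customers", 1250, start, datetime.now(timezone.utc))

print(con.execute(
    "SELECT table_name, record_count FROM migration_audit").fetchall())
```

Because every job writes to the same table, the audit log doubles as the record-count baseline for the validation step.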
Having these fields will let you know exactly how many records were pulled for each table, and creates an audit log that you can leverage in step 5. In the data validation step you ensure that the records you wanted to migrate have been moved successfully.
This includes record counts and data quality enhancements. To check data quality, review the data in the new system to confirm that the corrections made during the data quality step were applied as expected.
One method to make this process easier is to reuse the information and code from the data analysis step. Data validation will require many of the same types of queries, such as distinct counts and minimum and maximum values, that you ran against the legacy system. Also, if the migration was into an application, you need unit or functional testing performed to ensure the information was loaded correctly and the system works as intended.
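Re-running the same profiling aggregates on both sides makes the comparison mechanical. A sketch comparing row counts, distinct counts and min/max values between legacy and target, where SQLite and the table names stand in for the real systems:

```python
import sqlite3

def profile(con, table, column):
    """The same aggregates used in the data analysis step, reused for validation."""
    return con.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {column}), MIN({column}), MAX({column}) "
        f"FROM {table}").fetchone()

# SQLite stands in for the legacy and new systems in this sketch; in practice
# these would be connections to two different platforms.
legacy, target = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for con in (legacy, target):
    con.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)",
                    [(1, 100.0), (2, 250.5), (3, 99.9)])

# Validation passes when the same profile comes back from both sides.
assert profile(legacy, "accounts", "balance") == profile(target, "accounts", "balance")
print("validation passed:", profile(target, "accounts", "balance"))
```

If the data quality step deliberately changed values, the target profile will differ from the legacy one; in that case compare against the expected post-cleansing profile rather than the raw legacy numbers.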
Within an organization the data and applications will change as the organization matures. In order to keep up with these changes IT needs to have a process to handle the migration of data from legacy systems to newer applications.