Why Use ETL?
ETL stands for extract, transform and load. These are the processes that enable companies to move required data from a multitude of sources, reformat and cleanse it, and then load it into a data warehouse for analysis, or maybe onto another operational system to support another business process.
All Companies are well aware that they have valuable information spread throughout their company networks that needs to be moved to a data warehouse for analysis and if they do not move it all this information will become worthless. The problem is that this data is stored in a variety of systems and formats. And different systems are capable of using different methods of definitions.
So this problem has to be solved. The best solution is to implement extract, transform and load (ETL) software. By definition this software’s capabilities includes reading data from the source, cleaning and formatting it, and then writing it to its target where it can be utilised efficiently.
The source data for the ETL processes can come from almost any source: a mainframe or ERP application, a CRM tool, a flat file or spreadsheet, it matters not.
After extraction, the data is transformed, or modified, so that it can be sent to the target repository.
There are a variety of ways to perform the transformation, and the work involved varies. The data may require reformatting only, but most ETL operations also involve cleaning the data to remove duplicates and enforce consistency. Part of what the software does is examine individual data fields and apply rules to consistently convert the contents to the form required by the data warehouse.
For example, a category might be represented in three different ways in three different systems. The ETL software would recognize that these entries mean the same thing and convert them to the target format.
In addition, the ETL process might involve some form of standardisation and verification of fields such as addresses or dates of birth or even expanding records with additional fields containing demographic information or data from other systems.
The transformation occurs when the data from each source is mapped, cleansed and reconciled so it all can be tied together, with receivables tied to invoices and so on.
After reconciliation, the data is transported and loaded into the data warehouse for analysis of things such as cycle times and total outstanding receivables.
In the past, companies that were doing data warehousing projects often used in-house code to support ETL processes,but as the sources and data evolved, the ETL code had to be modified and maintained. And companies encountered problems and scalability became a serious issue with in-house ETL software.
Providers of packaged ETL systems and third-party vendors that offer bolt-on tools emerged to replace the in-house systems..
About the Author: Mike has more than 15 years ox experience designing and implementing Data warehouses based on Oracle, MS SQL Server, MySql, PostgreSQL and more he is currently working for DB Software Laboratory