Extract-Transform-Load (ETL) is a common term for the warehouse load process comprising a set of data movement operations,
each from a data source to a data target with some transforming or restructuring logic applied.
The ETL Scenario starts by defining a CWM Transformation model for movement from a data source to a data target. Parameters
of the source data, target data, and transformation logic are assigned values in the model. Source data parameters depend
on the type of the data source (object-oriented, relational, record-oriented, multidimensional, or XML). Target data parameters
are similarly chosen. Transformation logic parameters identify the transformation component together with its data sources
and data targets. The transformation component is a method that may be composed of a large hierarchy of components (commercial
tools, commercial libraries, custom scripts) whose detailed structure is defined elsewhere.
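The parameter assignment described above can be pictured with a minimal Python sketch. It is an illustration only, not the CWM Transformation metamodel or any CWM API: the names SourceKind, DataEndpoint, TransformationModel, and every example value (nightly_sales_load, orders_db, and so on) are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class SourceKind(Enum):
    """The kinds of data source named in the text (hypothetical enum)."""
    OBJECT_ORIENTED = "object-oriented"
    RELATIONAL = "relational"
    RECORD_ORIENTED = "record-oriented"
    MULTIDIMENSIONAL = "multidimensional"
    XML = "xml"


@dataclass
class DataEndpoint:
    """A data source or data target; its parameters depend on its kind."""
    name: str
    kind: SourceKind
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class TransformationModel:
    """One movement from data sources to data targets (hypothetical class)."""
    # Only identifies the transformation component; the component's
    # internal structure is assumed to be defined elsewhere.
    component: str
    sources: List[DataEndpoint]
    targets: List[DataEndpoint]

    def describe(self) -> str:
        src = ", ".join(f"{s.name} ({s.kind.value})" for s in self.sources)
        tgt = ", ".join(f"{t.name} ({t.kind.value})" for t in self.targets)
        return f"{self.component}: {src} -> {tgt}"


# Example values are invented for illustration.
model = TransformationModel(
    component="nightly_sales_load",
    sources=[
        DataEndpoint("orders_db", SourceKind.RELATIONAL, {"table": "orders"}),
        DataEndpoint("feed.xml", SourceKind.XML),
    ],
    targets=[DataEndpoint("sales_cube", SourceKind.MULTIDIMENSIONAL)],
)
print(model.describe())
```

Keeping the component as a plain identifier mirrors the statement that its detailed structure is defined elsewhere; the model records only which component runs and what it reads and writes.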
An ETL process is realized by a number of components across several CWM packages. A CWM warehouse process may launch an ETL
process as a scheduled operation consisting of a number of transformation steps executed in sequence.
For example, a first transformation might extract and filter data from any of a number of possible data
sources. A second transformation cleanses, combines, or otherwise reduces the data and then stores it in a normalized format
in some primary relational database of the warehouse. A third transformation selects certain rows from the primary relational
database and loads their values into the input cells of a multidimensional database. Finally, the CWM warehouse process might
instruct the multidimensional database to re-calculate its aggregated cells based on the new input data.
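The sequence of steps in this example can be sketched as a small Python pipeline. This is a toy illustration under stated assumptions, not a CWM implementation: the function names, the in-memory lists and dictionaries standing in for the relational and multidimensional databases, and the sample rows are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def extract_and_filter(source_rows: List[Row]) -> List[Row]:
    """Step 1: extract rows and drop those with no amount."""
    return [row for row in source_rows if row.get("amount") is not None]


def cleanse_and_store(rows: List[Row]) -> List[Row]:
    """Step 2: normalize values (stand-in for the primary relational store)."""
    return [
        {"region": row["region"].strip().upper(), "amount": float(row["amount"])}
        for row in rows
    ]


def load_cube(rows: List[Row]) -> Dict[str, float]:
    """Step 3: load row values into the input cells of a toy cube."""
    cells: Dict[str, float] = {}
    for row in rows:
        cells[row["region"]] = cells.get(row["region"], 0.0) + row["amount"]
    return cells


def recalculate(cells: Dict[str, float]) -> Dict[str, float]:
    """Step 4: recompute an aggregated cell from the new input cells."""
    cube = dict(cells)
    cube["ALL"] = sum(cells.values())
    return cube


def run_warehouse_process(data: Any, steps: List[Callable[[Any], Any]]) -> Any:
    """Run the transformation steps in sequence, feeding each the prior result."""
    for step in steps:
        data = step(data)
    return data


# Sample rows are invented for illustration.
raw = [
    {"region": " east ", "amount": "10.5"},
    {"region": "West", "amount": None},
    {"region": "EAST", "amount": "4.5"},
    {"region": "west", "amount": "2"},
]
cube = run_warehouse_process(
    raw, [extract_and_filter, cleanse_and_store, load_cube, recalculate]
)
print(cube)
```

Each step takes the output of the previous one, which is the sense in which the warehouse process executes the transformations in sequence; a real scheduler would also handle timing, failures, and persistence, none of which is modeled here.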