A key aspect of data warehousing is to extract, transform, and load data from operational resources to a data warehouse or
data mart for analysis. Extraction, transformation, and loading can all be characterized as transformations. In fact, whenever
data needs to be converted from one form to another in data warehousing, whether for storage, retrieval, or presentation purposes,
transformations are involved. Transformation, therefore, is central to data warehousing.
The Transformation package contains classes and associations that represent common transformation metadata used in data warehousing.
It covers basic transformations among all types of data sources and targets: object-oriented, relational, record, multidimensional,
XML, OLAP, and data mining.
The Transformation package is designed to enable interchange of common metadata about transformation tools and activities.
Specifically, it is designed to:
• Relate a transformation with its data sources and targets. These data sources and targets can be of any type (e.g., object-oriented, relational) or granularity (e.g., class, attribute, table, column). They can be persistent (e.g., stored in a relational database) or transient.
• Accommodate both “black box” and “white box” transformations. In the case of “black box” transformations, data sources and targets are related to a transformation and to each other at a coarse-grain level. We know the data sources and targets are related through the transformation, but we don’t know how a specific piece of a data source is related to a specific piece of a data target. In the case of “white box” transformations, however, data sources and targets are related to a transformation and to each other at a fine-grain level. We know exactly how a specific piece of a data source is related to a specific piece of a data target through a specific part of the transformation.
• Allow grouping of transformations into logical units. At the functional level, a logical unit defines a single unit of work, within which all transformations must be executed and completed together. At the execution level, logical units can be used to define the execution grouping and sequencing (either explicitly through precedence constraints or implicitly through data dependencies). A key consideration here is that both parallel and sequential executions (or a combination of both) can be accommodated.
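The three design goals above can be illustrated with a small sketch. The Python below is an illustration only, not the metamodel itself: the names Transformation, element_map, and execution_order are hypothetical, and it assumes that data sources and targets can be identified by plain strings. A transformation with no element-level mapping is treated as a “black box”; one with a mapping is treated as a “white box”. Execution order within a logical unit is derived from explicit precedence constraints combined with implicit data dependencies.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Transformation:
    """Hypothetical stand-in: a transformation related to its sources and targets."""
    name: str
    sources: frozenset[str]
    targets: frozenset[str]
    # Optional fine-grain links (source element, target element).
    # Empty means "black box": only the coarse-grain relation is known.
    element_map: tuple[tuple[str, str], ...] = ()

    @property
    def is_white_box(self) -> bool:
        return bool(self.element_map)


def execution_order(transformations, precedence=()):
    """Return one valid execution order for a logical unit.

    precedence holds explicit (before, after) name pairs. Implicit
    constraints come from data dependencies: a transformation that
    reads a data set runs after any transformation that writes it.
    """
    sorter = TopologicalSorter()
    for t in transformations:
        sorter.add(t.name)
    for before, after in precedence:
        sorter.add(after, before)
    for producer in transformations:
        for consumer in transformations:
            if producer is not consumer and producer.targets & consumer.sources:
                sorter.add(consumer.name, producer.name)
    return list(sorter.static_order())


# Black box: we only know that ops.orders feeds stage.orders.
extract = Transformation(
    "extract", frozenset({"ops.orders"}), frozenset({"stage.orders"})
)
# White box: a column-level mapping is recorded as well.
clean = Transformation(
    "clean",
    frozenset({"stage.orders"}),
    frozenset({"dw.orders"}),
    element_map=(("stage.orders.amt", "dw.orders.amount"),),
)
# No data dependency on "clean"; ordered by an explicit constraint instead.
audit = Transformation(
    "audit", frozenset({"dw.audit_cfg"}), frozenset({"dw.audit_log"})
)

order = execution_order([audit, clean, extract], precedence=[("clean", "audit")])
```

In this sketch "clean" follows "extract" because of an implicit data dependency, while "audit" follows "clean" only because of the explicit precedence pair. Transformations with no constraint between them could be run in parallel; a topological sort simply yields one valid sequential ordering.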
The Transformation package assumes the existence of the following packages that represent types of potential data sources
or targets: ObjectModel (object-oriented), Relational, Record, Multidimensional, XML, OLAP, and Data Mining. The Transformation
package is an integral part of the following packages: OLAP, Data Mining, Warehouse Process, and Warehouse Operation. In particular,
the Transformation and Warehouse Process packages together provide metamodel constructs that facilitate scheduling and execution
in data warehousing, and the Transformation and Warehouse Operation packages together provide metamodel constructs that enable
data lineage in data warehousing.