The CWM Data Mining metamodel consists of seven conceptual areas: a core Mining metamodel (upon which the other areas depend)
and metamodels representing the data mining subdomains of Clustering, Association Rules, Supervised, Classification, Approximation,
and Attribute Importance. Each area is represented by one of the metamodel packages shown in the diagram below.
[Diagram: DataMining (from Analysis) with its <<metamodel>> packages, including AssociationRules.]
Figure 12-1 CWM Data Mining Metamodel
Together, the Data Mining packages provide the abstractions needed to model generic representations of
data mining models (i.e., mathematical models produced by the execution of data mining algorithms).
Also included are representations of data mining tasks and models, other entities (such as the category matrix) that are
common to most data mining applications and tools, their relationships to each other, and their mappings to
technical metadata.
The Mining Core package consists of common Data Mining abstractions that are fundamental to, and reused by, the major conceptual
areas. In particular, this package contains several basic packages that are required to implement the CWM Data Mining interfaces.
Compliance requires implementing at least this package and one other Data Mining package. The packages forming
the Mining Core are shown in the next diagram.
[Diagram: <<metamodel>> MiningCore (from DataMining).]
Figure 12-2 CWM Data Mining Metamodel: Mining Core Package
The following subsections describe the content of each component package of MiningCore. They are followed
by subsections describing each of the major conceptual area packages.
12.2.2.1 Mining Function Settings
[Diagram: classes MiningFunctionSettings, MiningAlgorithmSettings, AttributeUsageSet (from MiningData), LogicalData (from MiningData); association ends settings, algorithmSettings, attributeUsageSet, logicalData.]
Figure 12-3 CWM Data Mining Metamodel: Mining Function Settings
This package defines the objects that contain parameters specific to mining functions. The separation of mining functions
from mining algorithms enables the user to specify the type of the desired result without being concerned with a particular
algorithm. The Mining Function Settings metamodel is illustrated above.
MiningFunctionSettings (MFS) is the superclass of all other function settings classes. An MFS instance references a set of
MiningAttributes, aggregated by a LogicalData instance. The AttributeUsageSet defines how each of the MiningAttributes will
be used by the mining algorithm.
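
The relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the CWM interfaces: the field names, the usage labels "active" and "inactive", and the helper methods usage_of and unknown_usages are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiningAttribute:
    name: str


@dataclass
class LogicalData:
    """Aggregates the MiningAttributes that a function settings object refers to."""
    attributes: List[MiningAttribute] = field(default_factory=list)


@dataclass
class AttributeUsageSet:
    """Maps an attribute name to a usage label (the labels are hypothetical)."""
    usages: Dict[str, str] = field(default_factory=dict)


@dataclass
class MiningFunctionSettings:
    logical_data: LogicalData
    attribute_usage_set: AttributeUsageSet

    def usage_of(self, attribute_name: str, default: str = "active") -> str:
        # Attributes without an explicit usage fall back to an assumed default.
        return self.attribute_usage_set.usages.get(attribute_name, default)

    def unknown_usages(self) -> List[str]:
        # Usage entries that name no attribute of the referenced LogicalData.
        known = {a.name for a in self.logical_data.attributes}
        return sorted(n for n in self.attribute_usage_set.usages if n not in known)


settings = MiningFunctionSettings(
    LogicalData([MiningAttribute("age"), MiningAttribute("income")]),
    AttributeUsageSet({"income": "inactive", "zip": "active"}),
)
```

Keeping the usage set separate from the logical data mirrors the text: the same attributes can be reused under different usage specifications.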
12.2.2.2 Mining Model
[Diagram: classes MiningModel, Class (from Core), MiningAttribute (from MiningData), Attribute (from Core), SignatureAttribute, MiningFunctionSettings (from MiningFunctionSettings); association ends +modelLocation, +model, +modelSignature, +/owner, +/feature, +settings, +keyAttribute.]
Figure 12-4 CWM Data Mining Metamodel: Mining Model
This package defines the basic MiningModel, from which all model objects produced by a mining build task inherit. The
Mining Model metamodel is illustrated above.
Each MiningModel has a signature that defines the characteristics of the data required by the model.
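
As a rough illustration of what a signature could be used for, the sketch below checks an input record against a model's signature. The text does not list the contents of a signature attribute, so the name and data_type fields and the missing_or_mistyped method are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SignatureAttribute:
    name: str
    data_type: type  # hypothetical field; not taken from the metamodel


@dataclass
class MiningModel:
    name: str
    signature: List[SignatureAttribute]

    def missing_or_mistyped(self, record: Dict[str, object]) -> List[str]:
        """List the signature attributes that the record does not satisfy."""
        problems = []
        for attr in self.signature:
            if attr.name not in record:
                problems.append(f"{attr.name}: missing")
            elif not isinstance(record[attr.name], attr.data_type):
                problems.append(f"{attr.name}: expected {attr.data_type.__name__}")
        return problems


model = MiningModel(
    "churn",
    [SignatureAttribute("age", int), SignatureAttribute("region", str)],
)
ok = model.missing_or_mistyped({"age": 40, "region": "west"})
bad = model.missing_or_mistyped({"age": "40"})
```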
12.2.2.3 Mining Result
[Diagram: classes ModelElement, MiningResult.]
Figure 12-5 CWM Data Mining Metamodel: Mining Result
This package defines the basic MiningResult object, from which all result objects produced by a specific mining
task (other than build) inherit.
12.2.2.4 Mining Data
This package defines the objects that describe the input data, the way the input data is to be treated, and the mapping between
the input data and the internal representation that mining algorithms can understand.
PhysicalData effectively references an instance of a class or subclass (e.g., Table, file, etc.). This allows JDM to leverage
the various row/column format data representations expressible in CWM.
Mining Data metaclasses representing the concepts of physical data are illustrated in Figure 12-6. Logical data metaclasses
are illustrated in Figure 12-7. Attribute assignment and attribute usage metaclasses are illustrated in two subsequent diagrams
(Figure 12-8 and Figure 12-9, respectively). Finally, metaclasses used to model the matrix representation and taxonomy of mining
data are presented in Figure 12-10, Category Matrix, and Figure 12-11, Category Taxonomy, respectively.
[Diagram: ModelElement (from Core).]
Figure 12-6 CWM Data Mining Metamodel: Physical Data
Figure 12-6 illustrates those elements of the Mining Data metamodel used to model physical data, whereas the following diagram
shows those elements facilitating the logical modeling of data.
[Diagram: classes Class (from Core), Attribute (from Core), MiningAttribute, LogicalData, LogicalAttribute, NumericalAttributeProperties, CategoricalAttributeProperties, OrdinalAttributeProperties, CategoryTaxonomy; association ends /owner, /feature, logicalAttribute, numericalProperties, categoricalProperties, taxonomy, category {ordered}.]
Figure 12-7 CWM Data Mining Metamodel: Logical Data
Figure 12-7 contains objects that represent how physical data should be interpreted logically by the mining algorithm.
A LogicalAttribute can be categorical, numerical, or both, depending on its usage. Categorical attributes that have ordered
category values are created as ordinal attributes.
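
The "categorical, numerical, or both" rule can be sketched as follows. This is a minimal illustration, not the CWM definition: the class names come from the diagram, but the fields (lower_bound, upper_bound, categories), the subclassing of the ordinal properties from the categorical ones, and the kinds and rank helpers are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NumericalAttributeProperties:
    lower_bound: Optional[float] = None  # hypothetical fields
    upper_bound: Optional[float] = None


@dataclass
class CategoricalAttributeProperties:
    categories: List[str]


@dataclass
class OrdinalAttributeProperties(CategoricalAttributeProperties):
    """Categorical properties whose category list is treated as ordered."""

    def rank(self, category: str) -> int:
        return self.categories.index(category)


@dataclass
class LogicalAttribute:
    name: str
    numerical: Optional[NumericalAttributeProperties] = None
    categorical: Optional[CategoricalAttributeProperties] = None

    def kinds(self) -> List[str]:
        """Report how the attribute may be interpreted."""
        found = []
        if self.categorical is not None:
            ordinal = isinstance(self.categorical, OrdinalAttributeProperties)
            found.append("ordinal" if ordinal else "categorical")
        if self.numerical is not None:
            found.append("numerical")
        return found


size = LogicalAttribute("size", categorical=OrdinalAttributeProperties(["S", "M", "L"]))
rating = LogicalAttribute(
    "rating",
    numerical=NumericalAttributeProperties(1, 5),
    categorical=CategoricalAttributeProperties(["1", "2", "3", "4", "5"]),
)
```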
12-8 Common Warehouse Metamodel, v1.1 March 2003
[Diagram: classes AttributeAssignmentSet, AttributeAssignment, MiningAttribute, Attribute (from Core), DirectAttributeAssignment, PivotAttributeAssignment, ReversePivotAttributeAssignment, SetAttributeAssignment; association ends set, assignment, logicalAttribute, attrAssignment, attribute, directAttrAssignment, pivotAttrAssignment, reversePivotAttrAssignment, setAttrAssignment, setIdAttribute, nameAttribute, valueAttribute, memberAttribute, orderIdAttribute, selectorAttribute; some ends are marked {ordered}.]
Figure 12-8 CWM Data Mining Metamodel: Attribute Assignment
Figure 12-8 illustrates metaclasses that enable mapping physical data attributes to logical data mining attributes. The following
attribute assignments are supported:
• Direct assignment: A direct mapping between a mining attribute and a physical attribute.
• Pivot assignment: A mapping where the input data is in transactional format; each logical attribute occurring in the pivoted table is mapped to three physical columns, presumably the same three every time.
• Reverse pivot assignment: A mapping where the input data is in two-dimensional (2D) format; the transformed input data contains set-valued attributes, and the sets are represented by enumerating the set elements based on the selection function.
• Set assignment: A mapping between a set-valued mining attribute and a set of attributes in the physical data.
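
To make the pivot case concrete, the sketch below turns transactional rows into one record per case. It assumes that the three physical columns are a set identifier, an attribute name, and an attribute value (suggested by the setIdAttribute, nameAttribute, and valueAttribute names in the diagram); the pivot function and the column names cust_id, attr, and val are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def pivot(rows: Iterable[Mapping[str, object]],
          set_id_col: str, name_col: str, value_col: str) -> Dict[object, dict]:
    """Group transactional rows into one logical record per set identifier."""
    records = defaultdict(dict)
    for row in rows:
        # Every logical attribute is read from the same three physical columns.
        records[row[set_id_col]][row[name_col]] = row[value_col]
    return dict(records)


transactions = [
    {"cust_id": 1, "attr": "age", "val": 34},
    {"cust_id": 1, "attr": "region", "val": "west"},
    {"cust_id": 2, "attr": "age", "val": 51},
]
logical_rows = pivot(transactions, "cust_id", "attr", "val")
```

A direct assignment, by contrast, would need no regrouping: each mining attribute would simply read one physical column.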
March 2003 OMG-CWM, v1.1: Organization of the Data Mining Metamodel
[Diagram: classes Class (from Core), AttributeUsageSet, Feature (from Core); association ends /owner, /feature, attribute, usage.]
Figure 12-9 CWM Data Mining Metamodel: Attribute Usage
Figure 12-9 illustrates metaclasses that enable specification of how a mining attribute should be used, interpreted, or
preprocessed (e.g., missing value or outlier/invalid value treatment).
[Diagram: classes CategoryMatrix, CategoryMatrixObject, CategoryMatrixTable, CategoryMatrixEntry, Class (from Core), Attribute (from Core); association ends categoryMatrix, category, matrixTable, source, entry, categoryEntry, rowIndex, columnIndex, rowAttribute, columnAttribute, valueAttribute.]
Figure 12-10 CWM Data Mining Metamodel: Category Matrix
Figure 12-10 illustrates the metaclasses that generalize a complex object used to represent a cost matrix (a model build input)
or a confusion matrix (a model test result). Two representations are supported:
• Java objects (CategoryMatrixObject)
• Table based (CategoryMatrixTable)
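
The two representations can be contrasted with a small confusion matrix. This is a sketch under assumptions rather than the metamodel itself: rows are taken to be actual categories and columns predicted categories, and the functions confusion_entries and to_table, as well as the row, column, and value keys, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_entries(actual: List[str], predicted: List[str]) -> Dict[Tuple[str, str], int]:
    """Object-style form: (row category, column category) -> count."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return dict(Counter(zip(actual, predicted)))


def to_table(entries: Dict[Tuple[str, str], int]) -> List[dict]:
    """Table-style form: one row per entry, with row, column, and value fields."""
    return [
        {"row": r, "column": c, "value": v}
        for (r, c), v in sorted(entries.items())
    ]


actual = ["yes", "yes", "no", "no", "no"]
predicted = ["yes", "no", "no", "no", "yes"]
entries = confusion_entries(actual, predicted)
table = to_table(entries)
```

The table form corresponds to storing each entry as a row with a row attribute, a column attribute, and a value attribute.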
[Diagram: classes CategoryTaxonomy, CategoryMap, CategoryMapObject, CategoryMapObjectEntry, CategoryMapTable, Category, Class (from Core), Attribute (from Core); association ends taxonomy, categoryMap, mapObject, mapTable, table, entry, parent, child, parentAttribute, childAttribute, graphIdAttribute, rootCategory.]
Figure 12-11 CWM Data Mining Metamodel: Category Taxonomy
Figure 12-11 illustrates the metaclasses that enable representing a taxonomy as a directed acyclic graph (DAG). Two representations
are supported:
• Java objects (CategoryMapObject)
• Table based (CategoryMapTable)
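
A taxonomy stored as parent/child entries can be checked for the DAG property with a depth-first search. The sketch below is illustrative only; the function names build_children, is_acyclic, and descendants and the sample categories are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_children(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn (parent, child) entries into an adjacency mapping."""
    children = defaultdict(list)
    for parent, child in entries:
        children[parent].append(child)
    return dict(children)


def is_acyclic(children: Dict[str, List[str]]) -> bool:
    """Depth-first search; reaching a node that is still being visited means a cycle."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in done:
            return True
        if node in visiting:
            return False
        visiting.add(node)
        ok = all(visit(c) for c in children.get(node, []))
        visiting.discard(node)
        if ok:
            done.add(node)
        return ok

    return all(visit(n) for n in list(children))


def descendants(children: Dict[str, List[str]], root: str) -> Set[str]:
    """All categories reachable from root."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# "apple" has two parents, so this is a DAG rather than a tree.
taxonomy = build_children([
    ("food", "fruit"), ("food", "snack"), ("fruit", "apple"), ("snack", "apple"),
])
cyclic = build_children([("a", "b"), ("b", "a")])
```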
Mining Task
This package defines the objects that are related to mining tasks. A MiningTask object represents a specific mining operation
to be performed on a given data set (i.e., physical data).
Figure 12-12 illustrates the basic Mining Task metamodel.
[Diagram: classes Transformation (from Transformation), MiningTransformation, ModelElement (from Core), MiningTask, MiningModel (from MiningModel), PhysicalData (from MiningData), AttributeAssignmentSet (from MiningData); association ends transformation, procedure, miningTask, inputModel, inputData, modelAssignment.]
Figure 12-12 CWM Data Mining Metamodel: Mining Task
Figure 12-12 illustrates Mining Task as referenced by a Mining Transformation. A Mining Task maps physical data to a model
signature (when applicable; for example, lift, test, etc.) using the Attribute Assignment set.
[Diagram: classes MiningTask, MiningBuildTask, MiningModel (from MiningModel), MiningFunctionSettings (from MiningFunctionSettings), AttributeAssignmentSet (from MiningData), and a further class from MiningData; association ends buildTask, validationData, validationAssignment, resultModel, miningSettings, settingsAssignment.]
Figure 12-13 CWM Data Mining Metamodel: Mining Build Task
Model elements comprising the Mining Build Task are shown in Figure 12-13. The modeling of apply output and of the task that
computes the result of applying a data mining model to (new) data is illustrated in Figure 12-14 and Figure 12-15, respectively.
[Diagram: classes MiningApplyOutput, ApplyOutputItem, ApplySourceItem, ApplyContentItem, ApplyProbabilityItem, ApplyScoreItem, ApplyRuleIdItem, MiningAttribute (from MiningData); association ends applyOutput, item {ordered}.]
Figure 12-14 CWM Data Mining Metamodel: Apply Output
Figure 12-14 illustrates metaclasses that enable defining the content of an Apply task. This includes source items (for example,
keys) and specific content of apply (data scoring using a model).
An apply output may contain multiple source and content items.
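
The way an ordered list of source and content items shapes one output row can be sketched as below. The class names follow the diagram, but the fields output_name, source_column, and content_key, the render method, and the layout of the model result are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ApplySourceItem:
    """Copies a column (for example, a key) from the input row."""
    output_name: str
    source_column: str


@dataclass
class ApplyContentItem:
    """Takes a value computed by the model (for example, a score)."""
    output_name: str
    content_key: str  # hypothetical key into the model's result


@dataclass
class MiningApplyOutput:
    items: List[Union[ApplySourceItem, ApplyContentItem]]  # kept in order

    def render(self, row: Dict[str, object],
               model_result: Dict[str, object]) -> Dict[str, object]:
        """Build one output row from the input row and the model's result."""
        out: Dict[str, object] = {}
        for item in self.items:
            if isinstance(item, ApplySourceItem):
                out[item.output_name] = row[item.source_column]
            else:
                out[item.output_name] = model_result[item.content_key]
        return out


apply_output = MiningApplyOutput([
    ApplySourceItem("customer", "cust_id"),
    ApplyContentItem("churn_score", "score"),
    ApplyContentItem("churn_prob", "probability"),
])
scored = apply_output.render(
    {"cust_id": 7, "age": 40},
    {"score": "yes", "probability": 0.82},
)
```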
[Diagram: classes MiningTask, MiningApplyOutput, AttributeAssignmentSet (from MiningData).]
Figure 12-15 CWM Data Mining Metamodel: Mining Apply Task
Figure 12-15 illustrates metaclasses that allow specification of an apply task. The apply task requires a model, physical data,
apply output, and an attribute assignment set.
Entry Point
This package defines the top-level objects of the DataMining package, which can be used as entry points in application programming.
This is illustrated in Figure 12-16.
[Diagram: classes Package (from Core), Catalog, Schema, AuxiliaryObject, LogicalData (from MiningData), MiningResult (from MiningResult), CategoryMatrix (from MiningData), MiningModel (from MiningModel), MiningTask (from MiningTask), MiningFunctionSettings (from MiningFunctionSettings), CategoryTaxonomy (from MiningData), AttributeAssignmentSet (from MiningData); association ends catalog, schema, result, logicalData, categoryMatrix, auxObjects, auxiliaryObject, miningModel (0..*), task, miningFunctionSettings, taxonomy, attributeAssignmentSet.]
Figure 12-16 CWM Data Mining Metamodel: Entry Point
Clustering
This package contains the metamodel that represents clustering functions, models, and settings. The Clustering metamodel is
illustrated in Figure 12-17. It contains attribute usage and function settings subclasses that are specific to the Clustering function.
[Diagram: AttributeUsage (from MiningData); ClusteringAttributeUsage with attributes attributeComparisonFunction : AttributeComparisonFunction, similarityScale : Double, / comparisonMatrix : CategoryMatrix; CategoryMatrix (from MiningData); MiningFunctionSettings (from MiningFunctionSettings); ClusteringFunctionSettings with attributes maxNumberOfClusters : Integer, minClusterSize : Integer = 1, aggregationFunction : AggregationFunction; association ends attributeUsage, comparisonMatrix.]
Figure 12-17 CWM Data Mining Metamodel: Clustering
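
The ClusteringFunctionSettings attributes named in Figure 12-17 (maxNumberOfClusters, minClusterSize with a default of 1, and aggregationFunction) might be mirrored in code roughly as follows. The validation rules and the use of a plain string for the aggregation function are assumptions; the text does not enumerate the AggregationFunction values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusteringFunctionSettings:
    max_number_of_clusters: int
    min_cluster_size: int = 1  # default taken from the diagram
    aggregation_function: Optional[str] = None  # allowed values are not listed in the text

    def __post_init__(self) -> None:
        # Assumed sanity checks; they are not stated in the metamodel text.
        if self.max_number_of_clusters < 1:
            raise ValueError("max_number_of_clusters must be at least 1")
        if self.min_cluster_size < 1:
            raise ValueError("min_cluster_size must be at least 1")


settings = ClusteringFunctionSettings(max_number_of_clusters=5)
```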
Association Rules
This package contains the metamodel that represents the constructs for frequent itemset, association rules, and sequence algorithms.
The Association Rules metamodel is illustrated in Figure 12-18.
[Diagram: classes MiningFunctionSettings (from MiningFunctionSettings), FrequentItemSetFunctionSettings, AssociationRulesFunctionSettings, SequenceFunctionSettings, Category (from MiningData); association ends settings, exclusion.]
Figure 12-18 CWM Data Mining Metamodel: Association Rules
12.2.2.5 Supervised
This package contains the metamodel that represents the constructs for supervised learning algorithms. The Approximation,
Attribute Importance, and Classification packages must implement this package.
Figure 12-19 illustrates the Supervised metamodel. It contains test and lift tasks, test and lift results, and a common superclass
for supervised function settings.
[Diagram: classes MiningTask (from MiningTask), MiningResult (from MiningResult), MiningTestTask, MiningTestResult, LiftAnalysis, LiftAnalysisPoint, Category (from MiningData), MiningFunctionSettings (from MiningFunctionSettings), SupervisedFunctionSettings; association ends testTask, testResult, liftAnalysis, positiveTargetCategory, point.]
Figure 12-19 CWM Data Mining Metamodel: Supervised
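
The package requirements stated in the text (the Mining Core plus at least one other package for compliance, and the dependence of Approximation, Attribute Importance, and Classification on Supervised) can be expressed as a simple check. This is only a sketch: the package name spellings and the compliance_problems function are assumptions, not part of the specification.

```python
from typing import Iterable, List

CORE = "MiningCore"
# Assumed identifiers for the six non-core conceptual areas.
AREAS = {
    "Clustering", "AssociationRules", "Supervised",
    "Classification", "Approximation", "AttributeImportance",
}
# Packages that the text says build on the Supervised package.
REQUIRES = {
    "Classification": {"Supervised"},
    "Approximation": {"Supervised"},
    "AttributeImportance": {"Supervised"},
}


def compliance_problems(implemented: Iterable[str]) -> List[str]:
    """Return the violated requirements; an empty list means none were found."""
    present = set(implemented)
    problems = []
    if CORE not in present:
        problems.append("MiningCore is required")
    if not (present & AREAS):
        problems.append("at least one package besides MiningCore is required")
    for package in sorted(present):
        for needed in sorted(REQUIRES.get(package, set()) - present):
            problems.append(f"{package} requires {needed}")
    return problems
```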
Classification
This package contains the metamodel that represents classification functions, models, and settings.
[Diagram: classes SupervisedFunctionSettings (from Supervised), ClassificationFunctionSettings, and a class from MiningData.]
Figure 12-20 CWM Data Mining Metamodel: Classification Function Settings
Figure 12-20 represents the model for Function Settings, while Figure 12-21 illustrates those model elements used to represent
Attribute Usage, which can include a prior probability specification. Figure 12-22 shows the portion of the Classification metamodel
that models Classification Test tasks, results, and apply output.
[Diagram: classes AttributeUsage (from MiningData), ClassificationAttributeUsage, PriorProbabilities, PriorProbabilitiesEntry, Category (from MiningData); association ends usage, priors, priorsEntry, positiveCategory (1..*), targetValue, prior.]
Figure 12-21 CWM Data Mining Metamodel: Classification Attribute Usage
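
A prior probability specification attached to a target attribute might look like the sketch below. The class names follow Figure 12-21, while the field names, the is_valid rule (each prior in [0, 1] and all priors summing to 1), and the prior_for helper are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PriorProbabilitiesEntry:
    target_value: str
    prior: float


@dataclass
class PriorProbabilities:
    entries: List[PriorProbabilitiesEntry]

    def is_valid(self, tolerance: float = 1e-9) -> bool:
        """Assumed rule: every prior lies in [0, 1] and the priors sum to 1."""
        in_range = all(0.0 <= e.prior <= 1.0 for e in self.entries)
        total = sum(e.prior for e in self.entries)
        return in_range and math.isclose(total, 1.0, abs_tol=tolerance)

    def prior_for(self, target_value: str) -> float:
        for entry in self.entries:
            if entry.target_value == target_value:
                return entry.prior
        raise KeyError(target_value)


priors = PriorProbabilities([
    PriorProbabilitiesEntry("churn", 0.1),
    PriorProbabilitiesEntry("stay", 0.9),
])
incomplete = PriorProbabilities([PriorProbabilitiesEntry("churn", 0.1)])
```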
[Diagram: classes MiningTestTask (from Supervised), MiningTestResult (from Supervised), ClassificationTestTask, ClassificationTestResult, ApplyOutputItem (from MiningTask), ApplyTargetValueItem, CategoryMatrix (from MiningData); association ends testTask, testResult, confusionMatrix.]
Figure 12-22 CWM Data Mining Metamodel: Classification Test and Result
Approximation
This package contains the metamodel that represents the constructs for approximation modeling (also known as regression).
The metamodel is shown in Figure 12-23.
[Diagram: classes MiningTestTask, MiningTestResult (from Supervised), SupervisedFunctionSettings (from Supervised), ApproximationFunctionSettings.]
Figure 12-23 CWM Data Mining Metamodel: Approximation
Attribute Importance
This package contains the metamodel that represents the constructs for the attribute importance (also known as feature selection)
model. This metamodel is illustrated in Figure 12-24.
[Diagram: classes SupervisedFunctionSettings (from Supervised), AttributeImportanceSettings.]
Figure 12-24 CWM Data Mining Metamodel: Attribute Importance