Data mining is the application of mathematical or statistical processes for the purpose of extracting hidden knowledge from
large data sets. This knowledge is subsequently used for various purposes, including actionable business intelligence and
biotechnology research.
Data mining techniques provide descriptive information by revealing inherent patterns or relations among the data items.
This can be achieved, for example, with algorithms for clustering or association rule detection (link analysis).
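As a rough illustration of association rule detection, the sketch below counts how often pairs of items occur together in a
set of transactions and reports rules of the form "A implies B" whose support and confidence exceed chosen thresholds. The
function name pair_rules, the threshold values, and the sample baskets are assumptions made for this example only; practical
tools use more scalable algorithms and handle rules with more than two items.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs that pass both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicate items within one transaction
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence: of the transactions containing lhs, the fraction that also contain rhs
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical market-basket data, invented for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With this sample data, every basket that contains butter also contains bread, so the rule "butter implies bread" has a
confidence of 1.0 even though it is supported by only half of the baskets.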
They also uncover correlations, which may reflect causal relationships, between the data and a specific target property. This
information is used to build models that make predictions about unknown data or future behavior. Techniques that generate such
models are known as supervised learning algorithms, and include classification and approximation (regression) algorithms.
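To make the idea of supervised learning concrete, the following sketch shows one very simple classification technique, a
nearest-neighbor classifier: each unknown record receives the label of the most similar record whose target property is
already known. It is only one of many possible classifiers, chosen here for brevity. The function name, the feature layout
(age, income in thousands), and the sample records are hypothetical.

```python
import math


def nearest_neighbor_predict(training_data, query):
    """Return the label of the training record closest to query (Euclidean distance)."""
    best_label, best_distance = None, math.inf
    for features, label in training_data:
        distance = math.dist(features, query)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Hypothetical labeled records: (age, income in thousands) -> responded to an offer?
training_data = [
    ((25, 30), "no"),
    ((30, 35), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
]

# Predict the target property for records whose label is unknown.
prediction_a = nearest_neighbor_predict(training_data, (48, 85))
prediction_b = nearest_neighbor_predict(training_data, (28, 32))
print(prediction_a, prediction_b)
```

The labeled records play the role of the historical data, and the label is the target property. In practice the features
would usually be scaled first so that no single attribute dominates the distance.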
Whereas most analysis tools support the retrospective analysis of data sets by verifying a user’s hypotheses, data mining
attempts to discover trends and behaviors without requiring the user to guess at possible relationships in advance.
Data mining tools are particularly effective in the data warehouse environment, because data warehouses offer large quantities
of cleansed business data for consumption by data mining tools. Also, the advanced query and analytical capabilities available
in most data warehouses (e.g., relational databases, OLAP servers, and information visualization tools) can be used to great
advantage by data mining tools in their formulation of models, and in the evaluation of those models by human users.