12.1 Overview

Data mining is the application of mathematical or statistical processes for the purpose of extracting hidden knowledge from large data sets. This knowledge is subsequently used for various purposes, including actionable business intelligence and biotechnology research.

Data mining techniques provide descriptive information that is manifest in inherent patterns or relations between the data. This can be achieved, for example, with algorithms for clustering or association rules detection (link analysis).

They also uncover correlations, often due to causal relationships, between the data and a specific target property. This information is used to make predictions about unknown data or future behavior. Techniques generating these models are known as supervised learning algorithms, and include classification and approximation algorithms.

Whereas most analysis tools support the retrospective analysis of data sets by verifying a user’s hypotheses, data mining attempts to discover trends and behaviors without the need for guessing about possible relationships.

Data mining tools are particularly effective in the data warehouse environment, because data warehouses offer large quantities of cleansed business data for consumption by data mining tools. Also, the advanced query and analytical capabilities available in most data warehouses (e.g., relational databases, OLAP servers, and information visualization tools) can be used to great advantage by data mining tools in their formulation of models, and in the evaluation of those models by human users.