MSAS - Introduction to Data Mining
Data mining is the process of probing a set of information for descriptive and predictive purposes. The aim is to identify the trends and patterns that show where effort should be directed to achieve a desired outcome. SQL Server 2000 Analysis Services has powerful built-in data mining capabilities, including algorithms for clustering and for decision trees.
Before studying the data mining capabilities of Analysis Services, let us briefly look at some terminology commonly used when discussing data mining.
Understanding Terms used in Data Mining
A case is the term used for the facts being studied. The data used to study these facts is called a case set. Each data mining case has a unique identifier called a key. Descriptive pieces of information are called attributes or measures. A case may contain information from a single table or from multiple tables. If the data is derived from multiple tables, the case is defined as a case with nested tables. The hierarchical attributes of a case that can be conveniently grouped are called the dimensions of the case.
Clustering breaks large volumes of data into more manageable groups by identifying similar traits. Each cluster provides a description of the attributes of its members. Clustering is often the first technique used in a project, and its output serves as a source for later mining efforts because it highlights promising areas to investigate. Microsoft SQL Server 2000 Analysis Services uses a Scalable Expectation Maximization algorithm to create clusters based on population density. The advantage of this approach is that it requires only a single pass over the data: the algorithm creates clusters as it goes and adjusts their centers as more data is processed. It provides reasonable results at any point during its computation. Moreover, it works with a minimum amount of memory.
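To make the single-pass idea concrete, here is a minimal Python sketch. It is not the Scalable Expectation Maximization algorithm used by Analysis Services. It is a much simpler distance-threshold stand-in that shares only the properties described above: each case is seen once, cluster centers shift as more data arrives, a usable result exists at any point, and only the centers and counts are kept in memory. The names `Case`, `OnlineClusters`, and `radius`, as well as the sample data, are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import dist


@dataclass
class Case:
    """One case: a unique key plus numeric attributes (hypothetical layout)."""
    key: int
    attributes: tuple[float, ...]


@dataclass
class OnlineClusters:
    """Single-pass clustering sketch; a simplified stand-in, not Scalable EM."""
    radius: float  # assumed distance threshold for opening a new cluster
    centers: list[list[float]] = field(default_factory=list)
    counts: list[int] = field(default_factory=list)

    def add(self, case: Case) -> int:
        """Assign one case to a cluster and return that cluster's index."""
        point = case.attributes
        if self.centers:
            # Find the nearest existing center.
            nearest = min(range(len(self.centers)),
                          key=lambda i: dist(self.centers[i], point))
            if dist(self.centers[nearest], point) <= self.radius:
                # Move the center toward the new point (running mean).
                self.counts[nearest] += 1
                n = self.counts[nearest]
                center = self.centers[nearest]
                for d, value in enumerate(point):
                    center[d] += (value - center[d]) / n
                return nearest
        # No center is close enough: open a new cluster at this point.
        self.centers.append(list(point))
        self.counts.append(1)
        return len(self.centers) - 1


# Made-up case set: two obvious groups in a two-attribute space.
cases = [
    Case(1, (1.0, 1.0)),
    Case(2, (1.2, 0.8)),
    Case(3, (9.0, 9.0)),
    Case(4, (8.8, 9.2)),
    Case(5, (1.1, 1.2)),
]

model = OnlineClusters(radius=2.0)
# One pass over the case set; the model is usable after every call to add().
labels = {case.key: model.add(case) for case in cases}
```

With this sample data, cases 1, 2 and 5 fall into one cluster and cases 3 and 4 into another. The cases themselves are never stored, so memory use grows with the number of clusters rather than with the number of cases.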
More on http://www.exforsys.com/tutorials/msas/introduction-to-data-mining.html