Data Mining Software Information


Data mining software is used to sort through large amounts of data and identify, or mine, relevant information. Applications use advanced search capabilities and statistical algorithms to identify patterns and correlations in a large database, data warehouse, or corpus. Typically, data mining software is used to support applications such as police or military surveillance, fraud detection, and customer relationship management.




Data mining techniques include classification, clustering, regression, and association rule learning. Results validation, the final step in data-based knowledge discovery, tests the patterns produced by data mining algorithms against a larger data set. Patterns in the training set that are not also present in the general data set are rejected as examples of overfitting.
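
The validation step described above can be illustrated with a minimal sketch. It assumes a pattern is a simple "if antecedent, then consequent" rule and that records are sets of attribute labels; the function names, the 0.7 threshold, and the toy data are hypothetical and chosen only for illustration.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of records containing `antecedent` that also contain `consequent`."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)


def validate_pattern(train, general, antecedent, consequent, min_confidence=0.7):
    """Accept a pattern only if it holds in both the training and the general data.

    The 0.7 threshold is an arbitrary assumption for this sketch.
    """
    train_conf = confidence(train, antecedent, consequent)
    general_conf = confidence(general, antecedent, consequent)
    accepted = train_conf >= min_confidence and general_conf >= min_confidence
    return train_conf, general_conf, accepted


# Hypothetical toy data: each record is a set of attribute labels.
train = [
    {"weekend", "large_order"},
    {"weekend", "large_order"},
    {"weekday", "small_order"},
]
# The larger, general data set includes the training records plus new ones.
general = train + [
    {"weekend", "small_order"},
    {"weekend", "small_order"},
    {"weekend", "small_order"},
    {"weekday", "large_order"},
]

result = validate_pattern(train, general, {"weekend"}, {"large_order"})
print(result)
```

Here the rule "weekend implies large_order" looks perfect in the small training set but holds for fewer than half of the matching records in the general set, so it would be rejected as a likely case of overfitting.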




There are many different types of data mining software. Examples include applications for classification discovery, cluster analysis, regression analysis, and association rule learning. Classification discovery software divides mined data into two or more groups and then predicts the group to which each new record belongs. Cluster discovery software applies a probabilistic model to a mined data set that contains both labeled and unlabeled data. Regression analysis software models the relationship between a dependent variable and one or more independent variables. Association rule learning software is designed to discover interesting relationships between variables in large databases.
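
As an illustration of the regression analysis described above, the following sketch fits a straight line relating one independent variable to one dependent variable by ordinary least squares. The function name `fit_line` and the advertising and sales figures are hypothetical; commercial packages typically support many variables and more elaborate models.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
# Use the fitted line to estimate the dependent variable for a new value.
predicted_sales = slope * 5.0 + intercept
print(slope, intercept, predicted_sales)
```

The fitted slope estimates how much the dependent variable changes per unit change in the independent variable, which is the relationship that regression software is meant to expose.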


Other types of data mining software include text mining software, data visualization software, and discovery visualization software. Text mining uses statistical pattern learning to determine trends in a corpus of text. Typically, this corpus is extracted from different written sources. Data visualization software presents information visually so that data miners can pinpoint patterns and anomalies. Discovery visualization software is designed to find a very large number of related rules and patterns. Often, these knowledge products are used at the end of the data mining process. Specialty and proprietary data mining software is also available. Some data mining software supports standards such as the Predictive Model Markup Language (PMML).
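
A very simple stand-in for the text mining described above is to count how often each term appears across a corpus. The sketch below assumes a tiny hypothetical corpus and stop-word list; real text mining software generally applies far richer statistical models than raw term counts.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; real tools use much larger ones.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def term_counts(corpus):
    """Count non-stop-word terms across all documents in the corpus."""
    counts = Counter()
    for document in corpus:
        # Lowercase the text and split it into alphabetic tokens.
        words = re.findall(r"[a-z']+", document.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Hypothetical corpus drawn from different written sources.
corpus = [
    "Fraud detection is a common use of data mining.",
    "Data mining software can flag fraud in large databases.",
    "Customer data is stored in the data warehouse.",
]

counts = term_counts(corpus)
top_terms = counts.most_common(3)
print(top_terms)
```

Terms that dominate the counts give a rough indication of the themes in the corpus, which a fuller tool would then track over time or across sources to identify trends.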




Special features include scalable processing, data preparation and summarization, prescriptive and descriptive modeling, and model comparisons. Data mining software that supports scoring processes is also available. Subject-based data mining and spatial data mining are also used in games, business, science and engineering, and specialized data warehouse applications.