Introduction to Genetic Algorithms

10.7 Data Mining

10.7.1 A Genetic Algorithm for Feature Selection in Data-Mining

In this section, we look at discovering the features and factors at play in a large database. Exploiting such data requires data-mining tools, and a two-phase approach built around a dedicated genetic algorithm is employed.

This heuristic approach was chosen because the number of features to consider is large. Consider a data set that records, for pairs of affected individuals from the same family, their similarity at given points (loci) of their chromosomes. The data are represented as a matrix in which each locus is a column and each pair of individuals is a row. The objective is first to isolate the most relevant associations of features, and then to group the individuals that share the considered similarities according to these associations.
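To make the matrix layout concrete, here is a minimal sketch in Python. The locus names, individual labels, and similarity values are invented for illustration, and the helper `project` is hypothetical; the source does not specify the encoding of the similarity scores.

```python
# Hypothetical toy data: rows are pairs of affected relatives, columns are loci.
# Each entry is a similarity score for that pair at that locus (values made up).
LOCI = ["locus_A", "locus_B", "locus_C", "locus_D"]
PAIRS = [("ind1", "ind2"), ("ind1", "ind3"), ("ind4", "ind5")]
SIMILARITY = [
    [1.0, 0.0, 0.5, 1.0],  # pair (ind1, ind2)
    [1.0, 0.5, 0.0, 1.0],  # pair (ind1, ind3)
    [0.0, 1.0, 0.5, 0.0],  # pair (ind4, ind5)
]

def project(matrix, selected_columns):
    """Keep only the chosen locus columns, preserving the row (pair) order."""
    return [[row[j] for j in selected_columns] for row in matrix]

# Restrict the data to a candidate association of two loci.
reduced = project(SIMILARITY, [0, 3])
```

Selecting a subset of columns in this way is what the feature selection phase searches over; the reduced matrix is then what a clustering step would see.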

For the first phase, the feature selection problem, we use a genetic algorithm (GA). To deal with this very specific problem, several advanced mechanisms have been introduced into the genetic algorithm, such as sharing, random immigrants, and dedicated genetic operators, and a particular distance operator has been defined. The second phase, a clustering based on the features selected during the first phase, uses K-means, a widely used clustering algorithm.
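The two phases can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm from the text: the Hamming distance stands in for the particular distance operator, the swap mutation stands in for the dedicated genetic operators, the variance-based objective is a placeholder, and the K-means step uses a deterministic farthest-first initialization for reproducibility. All function names and parameters are hypothetical.

```python
import random

def hamming(a, b):
    """Stand-in distance between two feature masks: count of differing bits."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(pop, raw, radius):
    """Fitness sharing: divide raw fitness by a niche count within `radius`."""
    shared = []
    for ind, f in zip(pop, raw):
        niche = sum(1 - hamming(ind, o) / radius
                    for o in pop if hamming(ind, o) < radius)
        shared.append(f / niche)  # niche >= 1: each mask counts itself
    return shared

def variance_score(rows, mask):
    """Placeholder objective (assumption): total variance of selected columns."""
    total = 0.0
    for j, bit in enumerate(mask):
        if bit:
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            total += sum((v - mean) ** 2 for v in col) / len(col)
    return total

def select_features(rows, score, n_keep=2, pop_size=20, generations=30,
                    n_immigrants=2, radius=2, seed=0):
    """Phase 1 sketch: GA over fixed-size feature masks."""
    rng = random.Random(seed)
    n = len(rows[0])

    def random_mask():
        chosen = set(rng.sample(range(n), n_keep))
        return tuple(int(j in chosen) for j in range(n))

    def mutate(mask):
        # Swap one selected feature for an unselected one (subset size is kept).
        on = [j for j, b in enumerate(mask) if b]
        off = [j for j, b in enumerate(mask) if not b]
        m = list(mask)
        if on and off:
            m[rng.choice(on)] = 0
            m[rng.choice(off)] = 1
        return tuple(m)

    pop = [random_mask() for _ in range(pop_size)]
    best = max(pop, key=lambda m: score(rows, m))
    for _ in range(generations):
        raw = [score(rows, m) for m in pop]
        # Small offset keeps the selection weights strictly positive.
        weights = [f + 1e-9 for f in shared_fitness(pop, raw, radius)]
        parents = rng.choices(pop, weights=weights, k=pop_size - n_immigrants)
        pop = [mutate(p) for p in parents]
        # Random immigrants: fresh random masks injected every generation.
        pop += [random_mask() for _ in range(n_immigrants)]
        cand = max(pop, key=lambda m: score(rows, m))
        if score(rows, cand) > score(rows, best):
            best = cand
    return [j for j, b in enumerate(best) if b]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Phase 2 sketch: Lloyd's K-means with farthest-first initial centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c]))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up data: columns 0 and 2 separate two groups; columns 1 and 3 are constant.
rows = [
    [1.0, 0.5, 0.9, 0.5],
    [0.9, 0.5, 1.0, 0.5],
    [0.0, 0.5, 0.1, 0.5],
    [0.1, 0.5, 0.0, 0.5],
]
selected = select_features(rows, variance_score)
reduced = [[r[j] for j in selected] for r in rows]
labels = kmeans(reduced, k=2)
```

With fixed-size masks, two distinct masks differ in at least two positions, so a sharing radius of 2 only penalizes exact duplicates in the population; a larger radius would also penalize near-duplicates. In a real setting the objective, the distance, and the operators would all be replaced by problem-specific definitions.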

10.7.1.1 GA for Feature Selection

The first phase of the algorithm isolates the very few relevant features from the large set. This is not exactly the classical feature selection problem known in data mining. Here, we have...
