10.7 Data Mining

10.7.1 A Genetic Algorithm for Feature Selection in Data-Mining

In this section, we look at discovering the features and factors hidden in a large database. Exploiting such data requires data mining tools; here, a two-phase approach built on a dedicated genetic algorithm is employed.

This heuristic approach has been chosen because the number of features to consider is large. Consider a data set that indicates, for pairs of affected individuals from the same family, their similarity at given points (loci) of their chromosomes. The data are represented as a matrix in which each locus is a column and each pair of individuals considered is a row. The objective is first to isolate the most relevant associations of features, and then to classify the individuals that share the considered similarities according to these associations.
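To make the matrix layout concrete, here is a minimal Python sketch. The dimensions, the random similarity values, and the helper names `make_similarity_matrix` and `select_columns` are all assumptions chosen for illustration; the source does not give the actual data or its encoding.

```python
import random

# Hypothetical toy dimensions; the source does not state the real sizes.
N_PAIRS = 6   # rows: pairs of affected individuals from the same family
N_LOCI = 8    # columns: loci (points on the chromosomes)

def make_similarity_matrix(n_pairs, n_loci, seed=0):
    """Build a toy matrix of similarity scores in [0, 1) (made-up values)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_loci)] for _ in range(n_pairs)]

def select_columns(matrix, loci):
    """Keep only the columns (loci) listed in `loci`, in the given order."""
    return [[row[j] for j in loci] for row in matrix]

matrix = make_similarity_matrix(N_PAIRS, N_LOCI)
# A candidate association of features is simply a set of column indices.
subset = select_columns(matrix, [1, 4, 6])
```

Under this layout, a candidate association of features is a set of column indices, and the rows restricted to those columns are what a later clustering step would group.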

The first phase, the feature selection problem, is handled by a genetic algorithm (GA). To deal with this very specific problem, several advanced mechanisms have been introduced into the GA, such as sharing, random immigrants, and dedicated genetic operators, and a particular distance operator has been defined. The second phase is a clustering based on the features selected during the first phase; it uses K-means, a very popular clustering algorithm.
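The two phases can be sketched roughly as follows. This is not the authors' algorithm: the fitness function (`toy_fitness`, summed column variance), the swap mutation, the union-based crossover, the parameter values, and the toy data are all assumptions made for illustration. Sharing and the dedicated distance operator are omitted; random immigrants appear only in a simple form, as a few fresh random subsets injected each generation.

```python
import random

def column_variance(data, j):
    col = [row[j] for row in data]
    mean = sum(col) / len(col)
    return sum((x - mean) ** 2 for x in col) / len(col)

def toy_fitness(data, subset):
    # ASSUMPTION: placeholder score, not the fitness used in the source.
    return sum(column_variance(data, j) for j in subset)

def select_features(data, fitness, n_select, pop_size=20,
                    generations=30, n_immigrants=2, seed=0):
    """Phase 1 (sketch): GA over fixed-size subsets of column indices."""
    rng = random.Random(seed)
    n_features = len(data[0])

    def random_subset():
        return tuple(sorted(rng.sample(range(n_features), n_select)))

    def mutate(subset):
        # Swap one selected feature for one that was not selected.
        s = set(subset)
        s.remove(rng.choice(sorted(s)))
        s.add(rng.choice([j for j in range(n_features) if j not in subset]))
        return tuple(sorted(s))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(data, s), reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents) - n_immigrants:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's features from the parents' union.
            pool = sorted(set(a) | set(b))
            children.append(mutate(tuple(sorted(rng.sample(pool, n_select)))))
        # Random immigrants: fresh random individuals to maintain diversity.
        immigrants = [random_subset() for _ in range(n_immigrants)]
        population = parents + children + immigrants
    return max(population, key=lambda s: fitness(data, s))

def kmeans(points, k, iters=20, seed=0):
    """Phase 2 (sketch): plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((x - y) ** 2
                      for x, y in zip(p, centroids[c]))) for p in points]
        # Update step: an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up data: 10 pairs x 8 loci, with columns 2 and 5 carrying a group signal.
rng = random.Random(1)
data = []
for i in range(10):
    row = [rng.random() for _ in range(8)]
    level = 0.9 if i < 5 else 0.1
    row[2] = level + rng.uniform(-0.05, 0.05)
    row[5] = level + rng.uniform(-0.05, 0.05)
    data.append(row)

best = select_features(data, toy_fitness, n_select=2)
reduced = [[row[j] for j in best] for row in data]
labels, centroids = kmeans(reduced, k=2)
```

Passing the fitness function as a parameter keeps the sketch independent of the scoring used in the source, which the excerpt does not specify.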

10.7.1.1 GA for Feature Selection

The first phase of this algorithm deals with isolating the very few relevant features from the large set. This is not exactly the classical feature selection problem known in data mining. Here, we have...

Introduction to Genetic Algorithms