Introduction to Genetic Algorithms

In this section, we look at how to discover the features and factors involved in a large database. Exploiting such data requires data mining tools, and we employ a two-phase approach built around a specialized genetic algorithm.
This heuristic approach was chosen because the number of features to consider is large. Consider a data set that records, for pairs of affected individuals from the same family, their similarity at given points (loci) of their chromosomes. The data are represented as a matrix in which each locus is a column and each pair of individuals is a row. The objective is first to isolate the most relevant associations of features, and then to classify the individuals that share the considered similarities according to these associations.
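To make the matrix layout concrete, here is a minimal sketch in Python. The locus names, pair labels, and similarity values are invented for illustration and do not come from the data set described here; the helper names `column` and `select_features` are likewise hypothetical.

```python
# Hypothetical toy data: one row per pair of affected relatives,
# one column per locus. Values are invented similarity scores.
loci = ["locus_1", "locus_2", "locus_3", "locus_4"]
pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
similarity = [
    [1.0, 0.0, 0.5, 1.0],  # pair (A1, A2)
    [1.0, 0.5, 0.0, 1.0],  # pair (B1, B2)
    [0.0, 0.5, 0.5, 0.0],  # pair (C1, C2)
]


def column(matrix, j):
    """Return the similarity of every pair at locus j."""
    return [row[j] for row in matrix]


def select_features(matrix, indices):
    """Keep only the columns (loci) listed in indices, for every pair."""
    return [[row[j] for j in indices] for row in matrix]


# Restrict the matrix to a candidate association of two loci.
subset = select_features(similarity, [0, 3])
```

Feature selection then amounts to choosing which column indices to pass to `select_features`, and the second phase groups the rows of the reduced matrix.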
For the first phase, the feature selection problem, we use a genetic algorithm (GA). To handle this very specific problem, several advanced mechanisms have been introduced into the GA: sharing, random immigrants, dedicated genetic operators, and a purpose-defined distance operator. The second phase clusters the individuals on the features selected during the first phase, using K-means, a widely used clustering algorithm.
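The two phases can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm used here: the dedicated operators and the distance operator are not specified in this section, so one-point crossover, bit-flip mutation, and Hamming distance stand in for them, and `toy_fitness` and the set `relevant` are invented placeholders for the real evaluation function.

```python
import random


def hamming(a, b):
    """Stand-in distance between two feature masks (assumption)."""
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(pop, raw, sigma):
    """Fitness sharing: divide raw fitness by a niche count."""
    shared = []
    for ind, score in zip(pop, raw):
        # The individual itself contributes 1, so niche >= 1.
        niche = sum(1 - hamming(ind, other) / sigma
                    for other in pop if hamming(ind, other) < sigma)
        shared.append(score / niche)
    return shared


def random_mask(n, k, rng):
    """Random bit mask of length n with exactly k features switched on."""
    mask = [0] * n
    for j in rng.sample(range(n), k):
        mask[j] = 1
    return mask


def evolve(fitness, n_features, pop_size=20, generations=30,
           max_on=3, immigrants=2, sigma=3, seed=0):
    """Phase 1 sketch: GA over feature masks; fitness must be positive."""
    rng = random.Random(seed)
    pop = [random_mask(n_features, rng.randint(1, max_on), rng)
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        weights = shared_fitness(pop, [fitness(i) for i in pop], sigma)
        children = []
        while len(children) < pop_size - immigrants:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            j = rng.randrange(n_features)        # bit-flip mutation
            child[j] = 1 - child[j]
            if not any(child):                   # never select zero features
                child[j] = 1
            children.append(child)
        # Random immigrants: fresh random individuals every generation.
        children += [random_mask(n_features, rng.randint(1, max_on), rng)
                     for _ in range(immigrants)]
        pop = children
        best = max(pop + [best], key=fitness)
    return best


def kmeans(points, k, iterations=20, seed=0):
    """Phase 2 sketch: plain K-means, returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sum(
            (x - y) ** 2 for x, y in zip(p, centers[c]))) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical fitness: rewards masks that pick the invented "relevant" loci.
relevant = {2, 5}


def toy_fitness(mask):
    hits = sum(mask[j] for j in relevant)
    return (1 + hits) / (1 + sum(mask) - hits)


best = evolve(toy_fitness, n_features=8)
selected = [j for j, bit in enumerate(best) if bit]
```

In a full pipeline, the columns listed in `selected` would be extracted from the similarity matrix and the resulting rows passed to `kmeans`. Sharing keeps the population spread over several niches, and the immigrants reintroduce diversity at each generation.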
The first phase of this algorithm isolates the very few relevant features from the large set. This is not exactly the classical feature selection problem known in data mining. Here, we have...