Pattern Recognition in Industry

A major corporation became concerned about possible gender-related pay bias within its organization. It wanted to determine quickly whether these concerns were valid and, if so, to quantify the salary differences between similarly qualified individuals of different genders.
The company's several acquisitions had resulted in a profusion of disparate job titles and functional roles. While the salary grades within the company were distinct and well defined, employees within each grade differed in education, job title, level of responsibility, age, length of service, years in their current position, and so on. The challenge was to separate reliable gender-related salary patterns from these multiple confounding factors.
The first step in mining the corporate salary and demographic data available for a particular calendar year was to identify the common job functions underlying the various job titles. For a variety of reasons, there were approximately 800 different job titles distributed across the pool of 2000 employees for which data were provided, an average of only about 2.5 employees per title. Left in this form, the data would have posed considerable modeling difficulties because the number of degrees of freedom would have been disproportionately large relative to the number of data points. Text-based clustering analysis of the job titles organized the 800 or so titles into just 17 distinct functional categories, making the problem far more amenable to robust modeling.
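The text does not say which text-clustering technique was used, so the following is only a minimal sketch of the general idea, not the method applied in the study. It assumes that titles sharing many words belong to the same function, measures that overlap with Jaccard similarity on word tokens, and groups titles with a simple greedy pass. The job titles, the function names, and the 0.4 threshold are all hypothetical choices made for illustration.

```python
def tokens(title):
    """Lowercase a job title and return its set of word tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return frozenset(cleaned.split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_titles(titles, threshold=0.4):
    """Greedy clustering: each title joins the most similar existing
    cluster (compared against that cluster's first title) if the
    similarity reaches the threshold; otherwise it starts a new cluster.
    The threshold value is an illustrative assumption."""
    seeds = []     # token set of the first title in each cluster
    clusters = []  # titles assigned to each cluster
    for title in titles:
        toks = tokens(title)
        best_idx, best_sim = None, 0.0
        for idx, seed in enumerate(seeds):
            sim = jaccard(toks, seed)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= threshold:
            clusters[best_idx].append(title)
        else:
            seeds.append(toks)
            clusters.append([title])
    return clusters


# Hypothetical job titles, not taken from the study's data.
example_titles = [
    "Senior Software Engineer",
    "Software Engineer II",
    "Staff Software Engineer",
    "Accounts Payable Clerk",
    "Accounts Receivable Clerk",
    "Regional Sales Manager",
    "Sales Manager",
]

clusters = cluster_titles(example_titles)
for i, members in enumerate(clusters):
    print(i, members)
```

On these seven made-up titles the sketch yields three groups (engineering, accounts, sales). A real analysis of 800 titles would probably need more robust text features and some manual review of the resulting categories, since word overlap alone cannot link titles that describe the same function in different words.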
Several gender-blind models (see Figure 20.1) were then created to correlate salaries with the following demographic...