Waveband Selection in NIR Spectra using Enhanced Genetic Operators
Proceedings of the Chemometrics in Analytical Chemistry (CAC) 2012 Conference
Modern data acquisition techniques produce huge amounts of data. For instance, near-infrared (NIR) spectral data consist of thousands of wavelengths (variables). Much of this information is redundant, since some variables are related to others. It is therefore desirable to reduce the number of variables (dimensionality reduction) while losing as little information as possible, in order to decrease computation time and mitigate the curse of dimensionality when applying data mining techniques, e.g. for classification or regression.
Genetic algorithms (GAs) offer a way to select the variables that contain the most relevant information, so that they can represent the full original set. The traditional genetic operators seem to be too general, leading to results that could be improved by purpose-designed operators that exploit the available problem-specific information. In particular, when calibrating with NIR spectral data, it is known that wavebands, rather than isolated wavelengths, allow a more robust model design. This aspect should be taken into account when crossing individuals.
We propose two crossover operators specifically designed for calibration with NIR spectral data. Both are based on a pseudo-random two-point crossover in which the first point is chosen at random and the selection of the second point is guided by problem-specific information. We compare their performance against state-of-the-art operators. The fitness function is based on partial least squares regression (PLSR), because it is fast and widely used in chemometrics. Our benchmark consists of two real-world, high-dimensional data sets. The first corresponds to polyetheracrylate (PEA) production, where hydroxyl number, viscosity and acidity are monitored on-line. The second corresponds to melamine resin production, where the chilling point is considered in order to regulate the condensation.
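The general idea of such a pseudo-random two-point crossover can be illustrated with a minimal Python sketch. It is not the operators proposed in this work: the abstract does not state which problem-specific information guides the second point, so the sketch assumes a hypothetical per-wavelength relevance score (for example, the absolute correlation of each wavelength with the target) and draws the second point with probability proportional to it. The function name, the `relevance` argument and the binary-mask encoding of individuals are all assumptions made for illustration.

```python
import random


def guided_two_point_crossover(parent_a, parent_b, relevance, rng=random):
    """Exchange one contiguous waveband between two binary wavelength masks.

    parent_a, parent_b: lists of 0/1 flags, one per wavelength (assumed encoding).
    relevance: non-negative score per wavelength (hypothetical guidance signal).
    Returns the two children and the inclusive (lo, hi) band that was swapped.
    """
    n = len(parent_a)
    if n < 2 or len(parent_b) != n or len(relevance) != n:
        raise ValueError("parents and relevance must share a length of at least 2")

    # First cut point: chosen uniformly at random.
    first = rng.randrange(n)

    # Second cut point: guided by the relevance scores (assumption),
    # excluding the first point so the band spans at least two wavelengths.
    candidates = [i for i in range(n) if i != first]
    weights = [relevance[i] for i in candidates]
    if sum(weights) <= 0:
        # Fall back to a plain random second point if no guidance is available.
        weights = [1.0] * len(candidates)
    second = rng.choices(candidates, weights=weights, k=1)[0]

    lo, hi = sorted((first, second))

    # Swap the whole band [lo, hi] so that wavebands, not single
    # wavelengths, are inherited together.
    child_a = parent_a[:lo] + parent_b[lo:hi + 1] + parent_a[hi + 1:]
    child_b = parent_b[:lo] + parent_a[lo:hi + 1] + parent_b[hi + 1:]
    return child_a, child_b, (lo, hi)
```

Calling `guided_two_point_crossover(mask_a, mask_b, scores, random.Random(seed))` with two masks of equal length returns two children that differ from their parents only inside the swapped band. In a full GA, each child would then be scored by a PLSR-based fitness computed on the selected wavelengths.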