Proceedings Article | 13 October 2008
Xiang Li, Guangjun Zhang, Yan Yuan, Qingbo Li, Jinguang Wu
KEYWORDS: Tissues, Genetic algorithms, Principal component analysis, Cancer, FT-IR spectroscopy, Proteins, Data modeling, Absorbance, Detection and tracking algorithms, Spectroscopy
In this paper, we investigate an application of the genetic algorithm (GA) that clusters the spectra of malignant tissue
and those of normal tissue into separate groups. Cluster analysis is a typical optimization problem of permutation and
combination. The results of traditional algorithms depend closely on whether the parameters are set correctly. In addition, a
physical understanding of the sample spectra, which is often not clearly available, is usually needed to obtain a better result.
The high dimensionality of the spectral data also adds difficulty to the analysis. Thus, it is almost impossible to set every
parameter properly. Furthermore, since the variables and objective functions are always discrete, there are a large number of local
extrema. Conventional methods have no good strategy for dealing with these inferior solutions. Therefore, the final clustering
result is greatly influenced by the initial cluster centers and the order in which the samples are input. The genetic algorithm is
based on the theory of natural selection and evolution. For GA, an understanding of the physical meaning of the spectra is
not necessary. Meanwhile, GA runs with considerably high efficiency. In the experiment, the sum of the inter-cluster
distances is taken as the objective function. After smoothing, standard normal variate (SNV) processing, and
outlier detection on the sample spectra, principal component analysis (PCA) is performed. Then selection, mutation and
crossover are carried out on chromosomes whose ith bit value indicates which class sample i belongs to. Once the GA
clustering is finished, tissue samples can be easily discriminated based on the characteristic absorbance peaks of
protein, fat, nucleic acid and water. Three clustering algorithms are compared in this paper, and the results show that,
compared with the conventional methods, GA obtains a better result.
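
The pipeline summarized above (SNV, PCA, then a GA over chromosomes whose ith bit gives the class of sample i) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names (snv, pca_scores, between_scatter, ga_cluster), the restriction to two classes, tournament selection, single-point crossover, elitism, and all parameter values are hypothetical choices. The abstract does not define the objective precisely, so the sketch assumes a between-cluster scatter (cluster size times squared centroid distance from the overall mean) that is maximized. Smoothing and outlier detection are omitted.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: scale each spectrum (row) by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def pca_scores(spectra, n_components):
    """Project mean-centered spectra onto the leading principal components via SVD."""
    centered = spectra - spectra.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def between_scatter(labels, scores):
    """Assumed objective: sum over clusters of size * squared centroid offset."""
    overall = scores.mean(axis=0)
    total = 0.0
    for k in (0, 1):
        members = scores[labels == k]
        if len(members):  # an empty cluster contributes nothing
            total += len(members) * np.sum((members.mean(axis=0) - overall) ** 2)
    return total

def ga_cluster(scores, pop_size=40, generations=200, p_cross=0.8, p_mut=0.02, seed=0):
    """Evolve binary chromosomes; bit i is the class label of sample i."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    pop = rng.integers(0, 2, size=(pop_size, n))
    for _ in range(generations):
        fit = np.array([between_scatter(c, scores) for c in pop])
        elite = pop[fit.argmax()].copy()
        # Selection: binary tournament between random pairs of chromosomes.
        a = rng.integers(0, pop_size, size=pop_size)
        b = rng.integers(0, pop_size, size=pop_size)
        parents = np.where((fit[a] >= fit[b])[:, None], pop[a], pop[b])
        # Crossover: swap tails of neighbouring parents at a random cut point.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Mutation: flip each bit independently with probability p_mut.
        flip = rng.random(children.shape) < p_mut
        children = np.where(flip, 1 - children, children)
        children[0] = elite  # elitism: keep the best chromosome unchanged
        pop = children
    fit = np.array([between_scatter(c, scores) for c in pop])
    return pop[fit.argmax()]
```

In use, one would call something like ga_cluster(pca_scores(snv(spectra), 3)) on a samples-by-wavenumbers array; the number of retained components here is arbitrary. Because the labels 0 and 1 are interchangeable, the result identifies two groups but not which one is malignant; that assignment would still rely on the characteristic absorbance peaks mentioned above.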