Survival prediction and optimal treatment choice for cancer patients depend on correct disease classification. This classification can be improved significantly when high-throughput data, such as those from microarray expression analysis, are employed. These data sets usually suffer from the dimensionality problem: many features and few patients. Consequently, care must be taken when feature selection is performed and classifiers for disease classification are designed. In this paper we investigate several issues associated with this problem, including 1) data representation; 2) the type of classifier employed and 3) classifier construction, with specific emphasis on feature selection approaches. More specifically, 'filter' and 'wrapper' approaches for feature selection are studied. The different representations, selection criteria, classifiers and feature selection approaches are evaluated with regard to their effect on true classification performance. As test cases we employ a Comparative Genomic Hybridization breast cancer data set and two publicly available gene expression data sets.
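As an illustration of why care is needed when feature selection is combined with performance estimation, the following is a minimal sketch of a 'filter' approach evaluated by leave-one-out cross-validation, with the selection repeated inside every fold so that the held-out sample never influences which features are chosen. The signal-to-noise criterion, the nearest-mean classifier, the function names and the synthetic data are all assumptions made for this example; the abstract does not state which criteria or classifiers the paper uses.

```python
import random
from statistics import mean, pstdev

def snr_scores(X, y):
    """Filter criterion (assumed): |difference of class means| / (sum of class std devs)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        denom = (pstdev(a) + pstdev(b)) or 1e-12  # guard against zero spread
        scores.append(abs(mean(a) - mean(b)) / denom)
    return scores

def select_top(X, y, k):
    """Indices of the k highest-scoring features."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def nearest_mean_predict(X_train, y_train, x, features):
    """Assign x to the class whose centroid is closest on the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]

    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))

    return min(centroids, key=lambda l: dist(centroids[l]))

def loo_error(X, y, k):
    """Leave-one-out error with feature selection redone inside each fold."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_top(X_tr, y_tr, k)  # held-out sample is not used here
        errors += nearest_mean_predict(X_tr, y_tr, X[i], feats) != y[i]
    return errors / len(X)

def make_toy_data(n_per_class=10, n_features=50, offset=10.0, seed=0):
    """Synthetic data (invented): only feature 0 carries class information."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += offset * label
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
print("LOO error with 1 selected feature:", loo_error(X, y, k=1))
```

Selecting features once on all samples and cross-validating only the classifier would typically give an optimistically biased error estimate when features greatly outnumber patients, which is the situation the abstract describes.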
Four different algorithms are presented which generate potentially well-binding peptides against specific monoclonal antibodies. The input data for these algorithms come from random peptide array screening experiments, which yield a binding strength for a few thousand peptides against the antibody. The first algorithm identifies short motifs of amino acids which occur more frequently among the best binding peptides than among the worst binding ones. The second algorithm differs from the first in that it searches for amino acids in the best binding peptides, regardless of the order of these amino acids. The fourth algorithm replaces all amino acids by a hydrophilicity value and searches for clusters in the profiles of the best measured results. Results obtained from experimental data show that the algorithms are able to generate peptides which resemble the peptides that are known to bind reasonably well against these antibodies. The information gained from these algorithms is useful for the design of subsequent experiments aimed at further optimization of the best binding peptides found during the peptide screening experiment.
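The idea behind the first algorithm, motifs that occur more often among the best binders than among the worst, can be sketched as follows. This is an illustrative reading of the abstract only: the scoring rule (difference in the fraction of peptides containing a motif), the function names, the cut-offs and the peptide sequences are assumptions invented for this example and are not taken from the paper.

```python
from collections import Counter

def motif_counts(peptides, k):
    """Number of peptides containing each length-k motif (counted once per peptide)."""
    counts = Counter()
    for p in peptides:
        counts.update({p[i:i + k] for i in range(len(p) - k + 1)})
    return counts

def enriched_motifs(measurements, k=3, n_top=3, n_bottom=3):
    """Rank motifs by (fraction of best binders) - (fraction of worst binders).

    measurements: list of (sequence, binding_strength) pairs.
    """
    ordered = sorted(measurements, key=lambda t: t[1], reverse=True)
    best = [seq for seq, _ in ordered[:n_top]]
    worst = [seq for seq, _ in ordered[-n_bottom:]]
    top, bottom = motif_counts(best, k), motif_counts(worst, k)
    scores = {m: top[m] / len(best) - bottom[m] / len(worst) for m in top}
    # Highest enrichment first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda t: (-t[1], t[0]))

# Invented toy screening data: (peptide, measured binding strength).
toy_screen = [
    ("ADYKDGL", 9.1), ("GDYKAAL", 8.7), ("LLDYKSE", 8.2),
    ("ASGTPLV", 1.0), ("GGSTAVL", 0.8), ("PLVTSGA", 0.5),
]

for motif, score in enriched_motifs(toy_screen)[:3]:
    print(motif, round(score, 2))
```

In a real screen with a few thousand peptides, `n_top` and `n_bottom` would be much larger, and the highest-ranked motifs could then serve as building blocks for new candidate peptides.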
KEYWORDS: Genetics, Data modeling, Visualization, Reverse modeling, Optimization (mathematics), Solids, Reverse engineering, Genetic algorithms, Process modeling, Chemical elements
A major problem associated with the reverse engineering of genetic networks from micro-array data is how to reliably find genetic interactions when faced with a relatively small number of arrays compared to the number of genes. To cope with this dimensionality problem, it is imperative to employ additional (biological) knowledge about genetic networks, such as limited connectivity, redundancy, stability and robustness, to sensibly constrain the modeling process. Recently, we have shown that by applying single criteria, the inference of genetic interactions under realistic conditions can be significantly improved. In this paper, we study the problem of how to combine constraints by formulating it as a multi-criterion optimization problem.
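To make the multi-criterion formulation concrete, here is a minimal sketch of Pareto filtering: each candidate network is scored on several criteria to be minimized, and only candidates that no other candidate beats on all criteria are kept. The two criteria used here (data-fit error and number of connections) and the candidate scores are hypothetical; the abstract does not say which criteria are combined or how the optimization is carried out.

```python
def dominates(a, b):
    """True if a is no worse than b on every criterion and strictly better on one.

    All criteria are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that are not dominated by any other candidate."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical candidate networks scored as (fit error, number of connections).
candidates = [(0.10, 12), (0.15, 6), (0.30, 3), (0.20, 8), (0.30, 5)]

for error, connections in pareto_front(candidates):
    print(f"error={error:.2f}, connections={connections}")
```

The resulting front exposes the trade-off between the criteria instead of collapsing them into a single weighted score, so the choice of a final network can be deferred until the trade-off is visible.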
Currently, the need arises for tools capable of unraveling the functionality of genes based on the analysis of microarray measurements. Modeling genetic interactions by means of genetic network models provides a methodology to infer functional relationships between genes. Although a wide variety of different models have been introduced so far, it remains, in general, unclear what the strengths and weaknesses of each of these approaches are and where these models overlap and differ. This paper compares different genetic modeling approaches that attempt to extract the gene regulation matrix from expression data. A taxonomy of continuous genetic network models is proposed and the following important characteristics are suggested and employed to compare the models: inferential power; predictive power; robustness; consistency; stability and computational cost. Where possible, synthetic time series data are employed to investigate some of these properties. The comparison shows that although genetic network modeling might provide valuable information regarding genetic interactions, current models show disappointing results on simple artificial problems. For now, the simplest models are favored because they generalize better, but more complex models will probably prevail once their bias is more thoroughly understood and their variance is better controlled.
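As a rough illustration of what extracting a gene regulation matrix from synthetic time series can look like, the sketch below assumes the simplest case: a noise-free linear discrete-time model x(t+1) = W x(t), with W estimated by ordinary least squares. The model form, the three-gene matrix `W_true`, the initial state and all function names are assumptions chosen for this example; the abstract does not specify the models compared in the paper.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def simulate(W, x0, steps):
    """Generate a time series from x(t+1) = W x(t)."""
    series = [x0]
    for _ in range(steps):
        x = series[-1]
        series.append([sum(wij * xj for wij, xj in zip(row, x)) for row in W])
    return series

def infer_regulation_matrix(series):
    """Least-squares estimate of W from consecutive pairs of states."""
    n = len(series[0])
    past, future = series[:-1], series[1:]
    # Normal equations: (sum_t x(t) x(t)^T) w_i = sum_t x_i(t+1) x(t)
    A = [[sum(x[i] * x[j] for x in past) for j in range(n)] for i in range(n)]
    W = []
    for i in range(n):
        b = [sum(y[i] * x[j] for x, y in zip(past, future)) for j in range(n)]
        W.append(solve(A, b))
    return W

# Hypothetical 3-gene network: entry W[i][j] is the influence of gene j on gene i.
W_true = [[0.5, -0.8, 0.0],
          [0.8, 0.5, 0.0],
          [0.0, 0.3, -0.6]]
series = simulate(W_true, [1.0, 0.0, 1.0], steps=10)
W_hat = infer_regulation_matrix(series)
for row in W_hat:
    print([round(v, 3) for v in row])
```

With more genes than time points the normal equations become singular, which is the dimensionality problem that makes inference on real microarray data much harder than on a small artificial example like this one.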