Open Access
1 July 2004 Noise factor analysis for cDNA microarrays
Yoganand Balagurunathan, Naisyin Wang, Edward R. Dougherty, Danh V. Nguyen, Yidong Chen, Michael L. Bittner, Jeffrey M. Trent, Raymond J. Carroll
Abstract
A microarray-image model is used that takes into account many factors, including spot morphology, signal strength, background fluorescent noise, and shape and surface degradation. The model yields synthetic images whose appearance and quality reflect that of real microarray images. The model is used to link noise factors to the fidelity of signal extraction with respect to a standard image-extraction algorithm. Of particular interest is the identification of the noise factors and their interactions that significantly degrade the ability to accurately detect the true gene-expression signal. This study uses statistical criteria in conjunction with the simulation of various noise conditions to better understand the noise influence on signal extraction for cDNA microarray images. It proposes a paradigm that is implemented in software. It specifically considers certain kinds of noise in the noise model and sets these at certain levels; however, one can choose other types of noise or use different noise levels. In sum, it develops a statistical package that can work in conjunction with the existing image simulation toolbox.

1. Introduction

The introduction of cDNA microarray technology1 allows thousands of gene expression values to be measured simultaneously, thereby providing insight into the global gene-expression patterns of the cells (tissues) being studied. The approach is powerful for studying the myriad transcription-related pathways involved in cellular growth, differentiation, and transformation.2 3 4 5 The quality of each gene-expression value detected with this measurement technology depends intricately on the image-processing algorithm and interactions. Numerous image-processing tools have been proposed to extract signal intensity from cDNA arrays. Our study uses a method that applies a statistical test to segment the hybridized region from the background and the inner hole.6 Quality metrics have also been introduced to quantify the extracted data and to better understand how the data are generated.7

Despite the extensive application of cDNA technology, few studies have been devoted to examining the quality and reliability of gene expression signals in terms of how close the detected signals are to the true gene expression levels in a biological sense.8 Linking various noise conditions to the signal extraction has been the goal of most image-extraction algorithms, the purpose being to develop better algorithms. Most proposed imaging methods are based on intuitive evidence. This study employs a microarray-image model that takes into account many factors, including spot morphology, signal strength, background fluorescent noise, and shape and surface degradation.9 The model yields synthetic images whose appearance and quality reflect that of real microarray images. Here we use the model to link noise factors to the fidelity of signal extraction with respect to a standard image-extraction algorithm.6 7 Of particular interest is the identification of the noise factors and their interactions that significantly degrade the ability to accurately detect the true gene-expression signal. This study uses statistical criteria in conjunction with the simulation of various noise conditions to better understand the influence of noise on signal extraction for cDNA microarray images.

Although some principles of experimental design have been proposed for microarray experiments, they have been focused primarily on optimizing the yield of information on the biological tissue samples of interest relative to the reference sample10 11 and on assessing within and between array variability. In this study, we use factorial experiments to systematically identify factors and their interactions that significantly affect the accuracy of detecting the expression signal. Because noise–factor interactions can affect the quality of signal detection in unpredictable ways, a systematic examination of these interactions is needed.

Two points need to be kept in mind regarding the statistical analysis. First, it is generally true that signal detection algorithms can better recover the true signal for images with less severe levels of noise. Thus, when we compare signal estimation for low noise with estimation for high noise, the actual error of estimation should be less for low noise—and this will be borne out. Our concern here, however, lies in a different direction. We want to examine the significance of different levels of various kinds of noise on signal estimation. If there is no significant effect on estimation error relative to different levels of a particular type of noise, then reducing the noise in the image to a lower level will not significantly affect signal detection; however, if there is a significant effect, then it would be worthwhile to try to reduce that type of noise.

A second point is that we are proposing a paradigm implemented in software, and not simply providing results. We have chosen to consider certain kinds of noise in the noise model and to set these at certain levels. One can choose other types of noise or use different noise levels. Clearly, bringing the noise levels closer will reduce the significance of noise effects, whereas moving them farther apart will increase the significance. What we have done is to develop a statistical package to work in conjunction with the existing image simulation toolbox.

2. Image Simulation

This section describes the noise conditions used in the current study. A detailed description of image simulation is given in the original paper.9 Figure 1 shows the cDNA spot and model generation with various noise conditions. The addition of noise to the array is broadly divided into three levels: array-level, block-level, and spot-level noise. Detailed distributional descriptions of the various types of noise are given in the appendix; throughout this section, each type of noise is cross-referenced to the appendix by its simulation number. Our experiments involve three noise settings, −1, 0, and +1, where increasing values correspond to decreasing noise, from worst (−1) to best (+1) (Table 1).

Figure 1

Microarray spot model.


Table 1

Settings for noise parameters.
| Index | Noise type | Parameters | Level +1: Good | Level 0: Average | Level −1: Bad |
|---|---|---|---|---|---|
| 1 | Sig./background noise (SigBack) | | 3 | 2 | 1.5 |
| 2 | Expresser or outlier probability rate (OutL) | | 0.1 | 0.25 | 0.5 |
| 3 | Spike noise (Spike) | (Lspi, μspi, Wspi) | 0.01, (500,700), (2,5) | 0.015, (700,1000), (2,5) | 0.06, (900,1200), (6,10) |
| 4 | Snake noise (Snake) | (κsn, Lsn, Wsn, N seg) | 0.15, (10,50), 1, 2 | 0.20, (40,70), 1, 5 | 0.25, (50,90), 2, 12 |
| 5 | Parabolic background with deviation (ParaB) | ch1, γch2) | 1, (10,12) | 1, (15,17) | 1, (25,27) |
| 6 | Spot radius: deviation (Spot) | s) | 10 | 20 | 30 |
| 7 | Inner hole (InnH) | (μh, σh, μv, σv) | (4,7,5,8), (4,7,5,8) | (10,20,5,10), (10,20,5,10) | (35,45,10,20), (35,45,10,20) |
| 8 | Foreground noise (ForeN) | ms) | (0,0,4,7), (0,0,4,7) | (0,0,5,10), (0,0,5,10) | (0,0,10,15), (0,0,10,15) |
| 9 | Edge noise (EdgeN) | ed) | 0.3 | 0.1 | 0.03 |
| 10 | Chord noise (Chord) | (p0, p1, p2, p3, p4) | (0.9, 0.07, 0.03) | (0.75, 0.15, 0.05, 0.05) | (0.2, 0.35, 0.20, 0.15, 0.1) |
| 11 | Scratch noise (Scratch) | (κsc, Lsc∼U[Lsc1,Lsc2], Wsc, Nsc) | 2.5, (9,35), 3, 2 | 3.5, (15,45), 5, 4 | 4, (25,65), 7, 10 |
| 12 | Signal deviation (SigSD) | (α) | 0.15 | 0.25 | 0.35 |
| 13 | Flat background with deviation (FlatBack) | ch1ch2) | 0, (10,12) | 0, (15,17) | 0, (25,27) |

The analysis of a detection algorithm begins with a ground truth. Here that ground truth refers to a “true” expression intensity that must be estimated by the detection algorithm. A microarray containing N gene expression spots with intensity levels Ik, for k=1,…,N, is simulated by an exponential distribution. Base intensities for the red and green channels, Rk and Gk, respectively, are generated from two independent normal distributions having a mean Ik and standard deviation αIk, where α is a common coefficient of variation.

A particular gene (RNA) may be over- or underexpressed, and this will show up in the red (test) channel. We refer to such a gene as an expresser or outlier. Outliers are chosen randomly in the model: each gene on the microarray is selected to be an outlier with probability p_outlier. If gene k is selected, then a scaling factor tk = 10^(±bk) is applied, where bk follows a beta distribution, bk ∼ Β(1.7, 4.8), and where the ± sign is selected with equal probability. Based on the scaling factor, the individual channel intensities are given by Rk = Rk·tk and Gk = Gk/tk.
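The ground-truth generation described in the two paragraphs above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation toolbox itself: the function name, the `mean_intensity` argument (the mean of the exponential distribution, which the text does not specify), and the seed are hypothetical, and the channel update follows the text as printed (multiply red by tk, divide green by tk).

```python
import random


def simulate_spot_intensities(n_spots, mean_intensity, alpha, p_outlier, seed=0):
    """Return a list of (I_k, R_k, G_k, is_outlier) tuples, one per spot.

    mean_intensity is a hypothetical parameter: the text only says that the
    intensities I_k follow an exponential distribution.
    """
    rng = random.Random(seed)
    spots = []
    for _ in range(n_spots):
        # Ground-truth intensity I_k drawn from an exponential distribution.
        i_k = rng.expovariate(1.0 / mean_intensity)
        # Base channel intensities: independent normals with mean I_k and
        # standard deviation alpha * I_k (alpha = coefficient of variation).
        r_k = rng.gauss(i_k, alpha * i_k)
        g_k = rng.gauss(i_k, alpha * i_k)
        # Each gene becomes an expresser (outlier) with probability p_outlier.
        is_outlier = rng.random() < p_outlier
        if is_outlier:
            b_k = rng.betavariate(1.7, 4.8)
            sign = rng.choice((-1.0, 1.0))  # +/- chosen with equal probability
            t_k = 10.0 ** (sign * b_k)
            r_k, g_k = r_k * t_k, g_k / t_k
        spots.append((i_k, r_k, g_k, is_outlier))
    return spots


# Example: one 40 x 40 array at the "average" settings of Table 1
# (alpha = 0.25, outlier probability 0.25); mean_intensity is made up.
spots = simulate_spot_intensities(1600, mean_intensity=3000.0, alpha=0.25, p_outlier=0.25)
```

Because the channel intensities are drawn from normal distributions, a small fraction of draws can be negative at large α; a real simulator would presumably clip or redraw these, which this sketch does not attempt.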

The dyes commonly used for microarray experiments show nonlinear response characteristics, and different dyes give different responses. This effect is modeled by the nonlinear function

$$f(x) = a_3\left[a_0 + x\left(1 - e^{-x/a_1}\right)^{a_2}\right]; \quad a_3 > 1.$$
R and G are transformed by the detection system response characteristic function defined by fR(x) or fG(x) to obtain realistic fluorescent intensities. The resulting observed fluorescent intensities, Rk =fR(Rk ) and Gk =fG(Gk ) are the true mean intensities across the k’th spot.
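As a rough illustration of the detection-system response, the sketch below assumes the response function has the form f(x) = a3[a0 + x(1 − e^(−x/a1))^a2]. The parameter values in the example are hypothetical and chosen only to show the behaviour: the response is strongly suppressed for small intensities and approaches a3(a0 + x) for large ones.

```python
import math


def detector_response(x, a0, a1, a2, a3):
    """Nonlinear dye/detector response, assuming
    f(x) = a3 * (a0 + x * (1 - exp(-x / a1)) ** a2) for x >= 0."""
    return a3 * (a0 + x * (1.0 - math.exp(-x / a1)) ** a2)


# Hypothetical parameters, one set per channel would be used in practice.
low = detector_response(50.0, a0=0.0, a1=500.0, a2=1.0, a3=1.2)
high = detector_response(50000.0, a0=0.0, a1=500.0, a2=1.0, a3=1.2)
```

With these made-up values, the low intensity is attenuated to a small fraction of 1.2 × 50, whereas the high intensity is essentially 1.2 × 50000, which is the kind of intensity-dependent distortion that the later linear normalization step is meant to compensate.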

Normally distributed foreground noise of intensity If is added pixelwise on the spots (simulation 9 in the appendix). This foreground noise typically has zero mean. It results in spot intensities SR=Rk +If1 and SG=Gk +If2. Figure 2 shows noise addition at various levels. In this figure, and in all subsequent figures illustrating noise, all other noise factors are set at the best level (less variant than +1 level).

Figure 2

Foreground noise variation illustrated at three levels: +1, 0, −1.  


Laboratory dust that sticks to the arrays can fluoresce on laser excitation and give high-intensity spikes, and cDNA precipitation can cause high-intensity points. To model both, spike noise is added randomly across the entire slide area at a preset rate, Lspi. Once a pixel is selected for spike noise, the adjacent pixels have a higher probability of being affected. This is handled by drawing a random number, Wspi, from a uniform distribution; it gives the count of randomly chosen pixels to be influenced by this noise. The intensity, NS, of the spike noise is governed by an exponential distribution with mean μspi. Figure 3 shows spike noise added at different levels.

Figure 3

Spike noise variation illustrated at three levels +1, 0, −1 (left to right).


Physical handling of the array slides can result in scratch noise (surface scratches), which typically results in low intensity levels. Scratch-noise intensity is parameterized as a ratio, κsc, giving the background-to-scratch noise intensity level. Other parameters are the number of strips, strip thickness Wsc, and a random strip length, Lsc (simulation 24 in the appendix). These scratches are placed at random positions on the array and are inclined according to a (discrete) uniformly random angle, θsc∈{0,45,90,135,180}. Figure 4 shows scratch noise at different levels.

Figure 4

Scratch noise variation illustrated at three levels: +1, 0, −1 (left to right).


Fine dust particles on the slides can create snake noise upon laser excitation. These snake-noise strips are typically of higher intensity than the signal level. To simulate this noise, multidirectional snake noise has been generated consisting of some number, N seg , of segments. Analogously to scratch noise, the intensity is parameterized as a ratio, κsn, giving the average signal-to-snake noise intensity level, the number of snakes, snake thickness Wsn, and a random length, Lsn, given as a multiple of the spot size. Figure 5 shows snake noise at different levels.

Figure 5

Snake noise variation illustrated at three levels +1, 0, −1 (left to right).  


The cDNA deposition spot is considered to be circular, with a random radius S (simulation 1 in the appendix). The mean of the radius is set according to the array density, and its variance relates to the consistency of spot size. The standard deviation is a predetermined proportion, ks, of the mean. The radius mean is set for every block, and randomized over a small range within the array (simulation 12). Depending on the robot arm and printing ability of the pins, the interspot distance, Gsp, may vary. Owing to the physical mechanics of the robot arm, the block size (pixel units) is fixed in most cases. The interspot distance can be set to accommodate spot size and random variations in spot radii. The spot variability at three levels is shown in Fig. 6.

Figure 6

Spot radius deviation illustrated at three levels: +1, 0, −1 (left to right).


Owing to the impact of the print tip on the glass surface, or possibly to the effect of surface tension during the drying process, a significantly lesser amount of cDNA can be deposited near the spot center. An elliptical shape models this inner hole with random horizontal and vertical axes, H and V (simulation 3). Interarray variability in the distributions of H and V is modeled by uniformly distributed means μH and μV (simulation 14). The choice of the parameters governs the hole shapes. The center position of a hole is allowed to drift over a range (simulation 4). The shape is unaffected by the drift because the contact of the mechanical print tip to the surface is unaffected. Figure 7 shows the noise at different levels.

Figure 7

Inner hole noise variation illustrated at three levels: +1, 0, −1 (left to right).  


The irregularity of RNA washout during slide preparation is modeled by chord noise (chord removal). The number, Nc, of chords to be removed for a spot is selected from a discrete distribution on {0,1,2,3,4}, where the elements occur with probabilities p0, p1, p2, p3, and p4, respectively. For images with very few pieces cut off, the zero-chord probability p0 is very high, and the three- and four-chord probabilities are close to 0 (possibly equal to 0). To model interarray variability, the probabilities can be treated randomly. This noise parameter is set once for every block; it is not a spot-level noise. Once the number of chords for a spot is determined, the distance, L, of each chord center to the edge is selected from a beta distribution, with interblock variability for the beta distribution being uniformly modeled (simulation 5). Finally, the chord locations are chosen uniformly randomly according to an angle θ between 0 and 2π. Figure 8 shows chord noise at different levels.
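The chord-count and chord-angle draws can be sketched as below. This is an illustration only: the function names are hypothetical, and probability vectors shorter than five entries (as listed for the good and average levels in Table 1) are assumed to have zeros for the missing higher chord counts. The beta-distributed chord distance L is omitted because its parameters are given only in the appendix.

```python
import math
import random


def sample_chord_count(probabilities, rng):
    """Draw the number of chords N_c in {0, 1, 2, 3, 4}.

    Assumption: if fewer than five probabilities are given, the missing
    (higher) chord counts have probability zero.
    """
    padded = list(probabilities) + [0.0] * (5 - len(probabilities))
    return rng.choices(range(5), weights=padded, k=1)[0]


def sample_chord_angles(n_chords, rng):
    """Draw one angle per chord, uniformly on [0, 2*pi)."""
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_chords)]


rng = random.Random(1)
good_level = (0.9, 0.07, 0.03)           # Table 1, level +1
counts = [sample_chord_count(good_level, rng) for _ in range(1000)]
angles = sample_chord_angles(counts[0], rng)
```

At the good level most spots lose no chord at all, and no spot can lose more than two, which matches the description of images with very few pieces cut off.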

Figure 8

Chord noise variation illustrated at three levels: +1, 0, −1 (left to right).


Owing to the manner in which liquid dries, the spots usually do not have smooth edges. Edge noise is simulated via a parameterized edge-noise algorithm adopted from digital document processing. Edge noise is applied to the outer perimeter of the spot (after chord removal). Figure 9 shows the noise at different levels.

Figure 9

Spot edge variation illustrated at three levels: +1, 0, −1 (left to right).


Many factors contribute to the fluorescent background observed: autofluorescence from the glass surface or the surface of the detection instrument, nonspecific binding of fluorescent residues after hybridization, local contamination from posthybridization slide handling, etc. Background noise is simulated by a normal distribution whose parameters are randomly chosen to describe the process, and for multiple arrays, the interarray difference is modeled by a uniform distribution (simulation 20).

Rather than be constant across the entire microarray, the mean of the background noise may vary, owing to various scanning effects. It can take different shapes: parabolic, positive slope, or negative slope. In this case a function g(x,y) is first generated (parabolic, positive slope, or negative slope) to form a background surface and normal noise is added to it pixelwise. Figure 10 shows parabolic background noise at different levels.

Figure 10

Variation in signal-to-background noise ratio (SigBack) and parabolic background. SigBack is set at −1, while parabolic background is varied from +1, 0, −1, left to right.


The addition of various noise types makes the microarray highly peaked, with high pixel differences. This stark irregularity can be mitigated by smoothing the image with either a flat or pyramidal convolution kernel. Our simulation study uses a flat smoothing function.

Once a microarray image has been simulated, the signal extraction toolbox Dearray uses statistical methods to segment the signal and the background pixels.6 7 Different levels of significance can be set for this procedure. Once the signal pixels are identified, a trimmed mean of their values gives an estimate of the signal mean. Background information is extracted by taking pixel information from four corners of a given spot to estimate its mean. Actual signal expression is estimated by the difference between the two. If a spot’s irregularity in shape and signal (area of the spot, signal variation, etc.) is reflected by a low-quality metric, then the spot can be flagged. At the final step, a linear corrective normalization procedure is carried out to compensate for variation in the dye response. Ratio intensities are then computed. A logarithmic scale applied to the ratios can be used to map the data to a desirable range.
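The arithmetic of the extraction step (trimmed mean of signal pixels, background mean, their difference, and the log ratio) can be sketched as follows. This is not the Dearray implementation: the function names, the 10% default trim fraction, and the base-2 logarithm are assumptions made for the example, and the statistical segmentation that decides which pixels are signal is taken as already done.

```python
import math


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping the lowest and highest trim_fraction of values.

    trim_fraction must be below 0.5 so that some values remain.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return sum(kept) / len(kept)


def spot_signal(signal_pixels, background_pixels, trim_fraction=0.1):
    """Estimated expression signal: trimmed signal mean minus background mean."""
    background_mean = sum(background_pixels) / len(background_pixels)
    return trimmed_mean(signal_pixels, trim_fraction) - background_mean


def log_ratio(red_signal, green_signal):
    """Log-scale expression ratio (base 2 is an assumption of this sketch)."""
    return math.log2(red_signal / green_signal)


red = spot_signal([900, 1000, 1100, 1000, 9000], [100, 100, 100, 100], trim_fraction=0.2)
green = spot_signal([325, 300, 350, 325, 20], [100, 100, 100, 100], trim_fraction=0.2)
ratio = log_ratio(red, green)
```

The trimmed mean is what keeps a single spike pixel (9000 in the red example) from dominating the signal estimate.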

3. Experimental Design and Statistical Data Analysis

The array model has more than twenty parameterized noise conditions. We consider thirteen commonly occurring noise conditions for this study. These are grouped into four categories, which then correspond to four experiments: (1) background noise, (2) shape noise, (3) surface noise, and (4) weak signal. Each category has five conditions, with some of the thirteen conditions occurring in more than one category. The experiments are described in Table 2. In experiments 1A through 4A, each factor can take on two levels, 0 or 1. In experiments 1B through 4B, the factors take on the levels −1 or 1. Assuming two levels for each noise factor, there are thirty-two conditions for each category. For each condition, 8 replicate arrays are generated so there are 256 arrays per experiment. Each array has 1600 spots in a 40×40 matrix format. These numbers have been chosen to provide sufficient replicates while not resulting in inordinate image-processing time.
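The counts quoted above follow directly from enumerating the two-level factorial design. The sketch below does this for experiment 1B; the factor abbreviations are those of Table 1, while the function and variable names are illustrative only.

```python
from itertools import product

# Five factors of experiment 1 (abbreviations as in Table 1).
FACTORS_EXP1 = ("SigBack", "OutL", "Spike", "Snake", "ParaB")


def factorial_conditions(factors, levels):
    """Enumerate every combination of levels as a {factor: level} dict."""
    return [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]


conditions_1b = factorial_conditions(FACTORS_EXP1, (-1, 1))   # B experiments: -1 or +1
conditions_1a = factorial_conditions(FACTORS_EXP1, (0, 1))    # A experiments: 0 or +1

replicates = 8
n_arrays = len(conditions_1b) * replicates       # 32 conditions x 8 replicates
n_spots = n_arrays * 40 * 40                     # 1600 spots per array
print(len(conditions_1b), n_arrays, n_spots)     # prints: 32 256 409600
```

Since the experimental unit is the individual spot, each experiment contributes 409,600 spot-level responses before any screening.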

Table 2

Experiments.
Experiment 1: Background–Noise Interactions

| Index | Noise type | Parameters |
|---|---|---|
| 1 | Sig./background noise (SigBack) | |
| 2 | Expresser or outlier level (OutL) | |
| 3 | Spike noise (Spike) | (Lspi, μspi∼U[e,f], Wspi∼U[g,h]) |
| 4 | Snake noise (Snake) | (κsn, Lsn∼U[Lsn1,Lsn2], Wsn, N seg) |
| 5 | Parabolic background with deviation (ParaB) | ch1ch2) |

Experiment 2: Shape–Noise Interactions

| Index | Noise type | Parameters |
|---|---|---|
| 1 | Spot radius: deviation (Spot) | s) |
| 2 | Inner hole (InnH) | (μh, σh, μv, σv) |
| 3 | Foreground noise (ForeN) | ms) |
| 4 | Edge noise (EdgeN) | ed) |
| 5 | Chord noise (Chord) | (p0, p1, p2, p3, p4) |

Experiment 3: Surface–Noise Interactions

| Index | Noise type | Parameters |
|---|---|---|
| 1 | Spot radius: deviation (Spot) | s) |
| 2 | Inner hole (InnH) | (μh, σh, μv, σv) |
| 3 | Snake noise (Snake) | (κsn, Lsn∼U[Lsn1,Lsn2], Wsn, N seg) |
| 4 | Scratch noise (Scratch) | (κsc, Lsc∼U[Lsc1,Lsc2], Wsc, Nsc) |
| 5 | Chord noise (Chord) | (p0, p1, p2, p3, p4) |

Experiment 4: Weak Signal–Noise

| Index | Noise type | Parameters |
|---|---|---|
| 1 | Signal standard deviation (SigSD) | (α) |
| 2 | Foreground noise (ForeN) | ms) |
| 3 | Sig./background noise (SigBack) | |
| 4 | Flat background with background deviation (FlatBack) | ch1ch2) |
| 5 | Spike noise (Spike) | (Lspi, μspi∼U[e,f], Wspi∼U[g,h]) |

3.1. Experimental Conditions

The background–noise interaction experiment involves noise that can alter the background and thereby influence signal extraction. Parabolic noise generates a concave background, and at different levels the backgrounds are expected to show more deviation. A high level of signal-to-background noise (that is, a low signal-to-background ratio; see Table 1) reduces the gap between the average signal and background mean levels. Spike and snake noise create surface noise. Expresser variability simulates spots whose genes are over- or underexpressed.

Noise degradations related to spot shapes are grouped together in the shape–noise interaction experiment. These include spot radius, inner-hole variation (from no hole to close to half the spot size), edge noise, and chord removal. Foreground noise is included to check its interaction with these.

The third experiment, surface–noise interaction, combines shape variation with surface noise, both snake and scratch.

The last experiment, weak signal–noise interaction, involves alterations in signal level, including foreground noise, spike noise, background unevenness, and signal-to-background ratio. This grouping is suited to analyzing the effects of weak signals on the signal estimation process.

The quality of microarray images is typically assessed by a trained microbiologist in the laboratory after image scanning. In this study, the noise-level parameters used for the different factor levels correspond to the kinds of noise distributions seen in practice. As noted in the original simulation paper,9 the exact parameters will vary, depending on the technology, and the ones used in this paper correspond to general conditions observed over many years of application since the development of Dearray in 1997.6 Although metrics have been proposed to quantify microarray quality,7 there is no direct way to determine the effect of each noise level on the metrics. This is mostly attributed to the multivariate influence of the various degradations on the estimated signal. While it is no doubt true that individual statistical results obtained in this paper may not apply for different noise distributions, the general methodology will apply, and we believe that the conclusions drawn here are indicative of what one might expect with similar technology (for specific issues regarding parameters, refer to the original paper).

To quantify the relation between the factor levels (−1, 0, +1), noise levels, and image quality, Table 3 provides measures corresponding to the different experiments and factor levels. All measures, except for the coefficient of variation, are defined at the spot level, and therefore have been averaged across all spots over all replicates. The table includes the means (expectations) of twelve measurements. There are four measurements for the red channel: SR_S.Dev is the standard deviation of the signal intensity; SR_SNR is the signal-to-noise ratio, which is defined as the ratio of the mean signal intensity to the local background standard deviation; SR_Quality is the channel quality metric defined in Ref. 7, which is formed as a minimum of four component qualities involving area, background, consistency, and saturation; and SR_BkDev is the standard deviation of the background intensity. There are four analogous measures for the green channel: SG_S.Dev, SG_SNR, SG_Quality, and SG_BkDev. There are four common measurements: |Error| is the absolute error for the signal estimation; Prop.Area is the proportional area relative to the mask size; Total-Q is the total quality, which is based on the intensity quality of both channels and the signal-to-noise ratio of both channels; and CV is the coefficient of variation of the intensity. In all experiments, the mean error, E[|Error|], of the actual to estimated signal ratios increases as the degradation increases.
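Two of the simpler measures defined above can be written out directly. The sketch below is illustrative only: the function names are hypothetical, the population standard deviation is assumed (the text does not say whether the sample or population form is used), and the composite quality metrics of Ref. 7 are not reproduced.

```python
import statistics


def channel_snr(signal_pixels, background_pixels):
    """Signal-to-noise ratio: mean signal intensity divided by the
    standard deviation of the local background."""
    return statistics.mean(signal_pixels) / statistics.pstdev(background_pixels)


def coefficient_of_variation(intensities):
    """Standard deviation of the intensities divided by their mean."""
    return statistics.pstdev(intensities) / statistics.mean(intensities)


snr = channel_snr([1000, 1100, 900, 1000], [90, 110, 100, 100])
cv = coefficient_of_variation([1000, 1100, 900, 1000])
```

Under this definition a very smooth local background yields a large SNR regardless of how strong the signal is, which is relevant to the behaviour of Total-Q in experiment 1 discussed after Table 3.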

Table 3

Image-quality measurements for the experiments.
Noise levels for Experiment 1

| Quantitative measure | Good (+1 level) | Average (0 level) | Bad (−1 level) |
|---|---|---|---|
| E[SR_S.Dev] | 1176.52 | 1178.82 | 1320.937 |
| E[SR_SNR] | 111.269 | 66.143 | 17.921 |
| E[SR_Q] | 0.7333 | 0.7807 | 0.9775 |
| E[SR_bkDev] | 17.884 | 32.315 | 131.853 |
| E[SG_S.Dev] | 1181.09 | 1178.237 | 1328.67 |
| E[SG_SNR] | 105.290 | 61.987 | 17.655 |
| E[SG_Q] | 0.7514 | 0.8017 | 0.9793 |
| E[SG_bkDev] | 19.134 | 34.613 | 134.66 |
| E[\|Error\|] | 0.0714 | 0.1402 | 0.2843 |
| E[Pro.Area] | 0.9622 | 0.9509 | 0.8491 |
| E[Total-Q] | 0.7131 | 0.7495 | 0.8744 |
| E[CV] | 0.0478 | 0.1108 | 0.1805 |

Noise levels for Experiment 2

| Quantitative measure | Good (+1 level) | Average (0 level) | Bad (−1 level) |
|---|---|---|---|
| E[SR_S.Dev] | 970.14 | 727.107 | 569.103 |
| E[SR_SNR] | 96.750 | 77.082 | 52.770 |
| E[SR_Q] | 0.9956 | 0.9894 | 0.9528 |
| E[SR_bkDev] | 16.83 | 16.415 | 16.158 |
| E[SG_S.Dev] | 971.78 | 733.498 | 575.78 |
| E[SG_SNR] | 78.434 | 64.192 | 45.785 |
| E[SG_Q] | 0.9956 | 0.9894 | 0.9504 |
| E[SG_bkDev] | 21.079 | 19.90 | 18.767 |
| E[\|Error\|] | 0.1675 | 0.2473 | 0.5212 |
| E[Pro.Area] | 0.9405 | 0.8775 | 0.7606 |
| E[Total-Q] | 0.9627 | 0.9489 | 0.8989 |
| E[CV] | 0.0423 | 0.0423 | 0.0531 |

Noise levels for Experiment 3

| Quantitative measure | Good (+1 level) | Average (0 level) | Bad (−1 level) |
|---|---|---|---|
| E[SR_S.Dev] | 987.017 | 747.165 | 576.904 |
| E[SR_SNR] | 99.803 | 86.568 | 60.511 |
| E[SR_Q] | 0.9927 | 0.9878 | 0.9222 |
| E[SR_bkDev] | 17.205 | 18.097 | 20.328 |
| E[SG_S.Dev] | 989.46 | 746.12 | 584.234 |
| E[SG_SNR] | 79.880 | 69.825 | 50.435 |
| E[SG_Q] | 0.9998 | 0.9880 | 0.9234 |
| E[SG_bkDev] | 21.470 | 21.65 | 22.925 |
| E[\|Error\|] | 0.1121 | 0.3274 | 0.4992 |
| E[Pro.Area] | 0.9350 | 0.8586 | 0.7343 |
| E[Total-Q] | 0.9904 | 0.9483 | 0.8744 |
| E[CV] | 0.0419 | 0.0417 | 0.0477 |

Noise levels for Experiment 4

| Quantitative measure | Good (+1 level) | Average (0 level) | Bad (−1 level) |
|---|---|---|---|
| E[SR_S.Dev] | 1160.35 | 1125.47 | 1134.62 |
| E[SR_SNR] | 48.610 | 23.032 | 8.772 |
| E[SR_Q] | 0.9905 | 0.9768 | 0.9568 |
| E[SR_bkDev] | 39.331 | 87.582 | 261.48 |
| E[SG_S.Dev] | 1160.29 | 1131.83 | 1140.364 |
| E[SG_SNR] | 41.620 | 20.620 | 8.3094 |
| E[SG_Q] | 0.9906 | 0.9768 | 0.9569 |
| E[SG_bkDev] | 46.343 | 98.420 | 275.703 |
| E[\|Error\|] | 0.1474 | 0.4607 | 0.8199 |
| E[Pro.Area] | 0.9432 | 0.8993 | 0.83707 |
| E[Total-Q] | 0.9243 | 0.8355 | 0.61612 |
| E[CV] | 0.1209 | 0.1980 | 0.2493 |

While most of the measurements in Table 3 show straightforward effects, there is an apparent anomaly in experiment 1, which treats background characteristics. The mean variation of the background (E[SR_bkDev],E[SG_bkDev]) shows an increase from +1 to −1 level, along with the mean SNR (E[SR_SNR],E[SG_SNR]), which goes from good to bad. Some decrease in the proportional area of the spots is also seen. A paradox occurs with respect to total quality: E[Total-Q] increases as the levels go from +1 to −1. This is due to the effect of the parabolic background on spots in the central portion of the array. There the image gets a very low background standard deviation, which improves the SNR, and therefore improves E[Total-Q].

3.2. Statistical Analysis of Data

For each set of experiments we used a 2^k factorial design, with k=5 experimental factors. Each factor consists of two levels.12 13 Since our primary objective is to determine how the experimental noise factors affect the accuracy of detecting gene expression, the basic response variable considered for analysis is the absolute difference between the detected (estimated) and the true expression ratio at each spot. Because the distribution of these measurements tends to have a long right tail, we analyze the response variable in the log-log scale for the analysis of variance model.12 More precisely, a constant 1 has been added to a response before taking the log transformation. The goal here is to reduce the potential dominating influence of extremely large responses, yet not to dramatically increase the transformed absolute differences when the true expression ratios are close to 0, noting that log(x) tends to negative infinity as x approaches 0. Here, taking a different transformation can be viewed as evaluating the responses at different scales. One advantage of considering the absolute difference rather than the original difference, beyond its being a meaningful measurement, is that the responses are now all positive, so that regardless of what monotone transformation is taken, the relative order among responses is kept. Because of that, even though the outcomes are not invariant under nonlinear monotone transformations, they are less sensitive to the choice of transformation. In fact, we have conducted analyses using other concave transformations as well as rank-based methods, and in those cases the conclusions of the analysis remain unchanged.
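One plausible reading of the log-log response is sketched below. It assumes that the constant 1 is added before each of the two logarithms, that is, log(1 + log(1 + d)) with d the absolute difference; the text does not spell out the exact form, so this is an interpretation rather than the authors' stated formula. The function name is hypothetical.

```python
import math


def log_log_response(estimated_ratio, true_ratio):
    """Assumed response: log(1 + log(1 + |estimated - true|)).

    Adding 1 before each log keeps the response finite (equal to 0) when
    the estimate is exact, and compresses very large errors.
    """
    d = abs(estimated_ratio - true_ratio)
    return math.log1p(math.log1p(d))


exact = log_log_response(1.0, 1.0)      # no error
small = log_log_response(1.2, 1.0)      # modest error
large = log_log_response(25.0, 1.0)     # very large error, strongly compressed
```

The transformation is monotone, so the ordering of spots by absolute error is preserved, while an error of 24 is mapped to a value only a few times larger than that of an error of 0.2.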

To further prevent outlying observations from having a dominating influence on the estimated main or interaction effects, we adopt the following screening procedure in our analysis. First, data points with an estimated expression ratio larger than 30 are excluded from the analysis. Such high-ratio points are often excluded in practice. Second, we perform a regular least-squares estimation procedure12 and produce studentized residuals12 for each observation. A data point with an absolute studentized residual greater than 4 is considered an extreme outlying observation and is also excluded from the main analysis. The chance of having an absolute studentized residual greater than 4 is less than 10^−4 (for normally distributed data). The use of studentized residuals gives us a statistically meaningful way to exclude points with very high estimated ratios without requiring a subjective cutoff point lower than 30. This two-part screening procedure eliminates about 1% of the total observations in each experiment.
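The tail probability quoted above and the two exclusion rules can be checked with a short sketch. The standard normal distribution is used as the reference, as the text indicates, and the studentized residuals are assumed to have been computed already by the least-squares fit; the function names are illustrative.

```python
import math


def two_sided_normal_tail(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2.0))


def screen_observations(observations, max_ratio=30.0, max_abs_residual=4.0):
    """Keep (estimated_ratio, studentized_residual) pairs passing both rules.

    In the paper the residuals are computed after the ratio rule has been
    applied; here they are taken as given for simplicity.
    """
    return [
        (ratio, resid)
        for ratio, resid in observations
        if ratio <= max_ratio and abs(resid) <= max_abs_residual
    ]


tail = two_sided_normal_tail(4.0)   # roughly 6.3e-05, below 1e-4
kept = screen_observations([(2.0, 0.5), (45.0, 0.1), (3.0, -4.2)])
```

Only the first of the three example observations survives: the second has an estimated ratio above 30 and the third an absolute studentized residual above 4.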

We fit an analysis-of-variance model with main effects, two-way, and three-way interactions to the remaining data. Results for the main effects and two-way interactions based on F-tests are obtained. We test the significance of the five main effects and all ten first-order interactions simultaneously for each experiment. Thus, we have a total of 15 hypothesis tests per experiment. We use the Bonferroni adjustment12 to control the familywise error rate (FWER) in multiple testing (testing main effects and first-order interactions). At the α=0.05 level, this gives 0.05/15 ≈ 0.0033 as the significance threshold for each test. Thus, the probability of erroneously rejecting any null hypothesis is controlled at 0.05.
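The per-test threshold follows from simple counting, as the sketch below shows; the variable names are illustrative.

```python
from math import comb

k = 5                              # experimental factors per experiment
n_main = k                         # five main effects
n_two_way = comb(k, 2)             # ten first-order (two-way) interactions
n_tests = n_main + n_two_way       # 15 hypothesis tests per experiment

alpha_family = 0.05                # familywise error rate to be controlled
alpha_per_test = alpha_family / n_tests   # Bonferroni threshold
print(n_tests, round(alpha_per_test, 4))  # prints: 15 0.0033
```

An effect is therefore declared significant only when its p value falls below about 0.0033.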

When there are two levels in each factor, as in all of our experiments, we construct an equivalent t-test for each of the 15 F-tests. By equivalence, we mean that the p value of an F-test is the same as that of the corresponding two-sided t-test. The t-test statistics with sign and the p values, when significant, are reported. For each main effect, the t-test statistic is the difference, standardized by its standard error (S.E.), between the estimated effects of the two noise levels. Even though the S.E.s are not identical among all main effects, a consequence of using robust regression procedures, they are within 0.5 of each other. In other words, the size of the t-test statistic reflects the magnitude of changes associated with the noise factor. All the main effect t-test statistics are positive, which simply indicates that the presence of a high noise level creates more damage than that of a low noise level. For each two-way interaction, the t-test statistic is the standardized difference between the estimated cell mean when both high noise factors are present and the cell mean predicted from the outcomes of the individual noise factors, assuming no interaction. A positive t-test statistic indicates a “synergistic” interaction; that is, the damage caused by the presence of both noise factors is worse than the additive effect from individual noise factors. A negative t-test statistic stands for an “antagonistic” interaction, the opposite of a “synergistic” interaction. Finally, throughout, the experimental unit is the individual spot in each array.
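The sign convention for a two-way interaction can be illustrated on cell means. Under an additive (no-interaction) model, the cell mean with both factors at the high-noise level is predicted by adding the two individual increments to the low-noise cell mean. The sketch below computes the unstandardized contrast only; dividing by its standard error, which is omitted here, would give the t statistic. The function name and the numbers are hypothetical.

```python
def interaction_contrast(y_low_low, y_high_low, y_low_high, y_high_high):
    """Observed both-high cell mean minus its additive prediction.

    Additive prediction: y_low_low + (y_high_low - y_low_low)
                                   + (y_low_high - y_low_low).
    Positive -> synergistic, negative -> antagonistic.
    """
    predicted = y_high_low + y_low_high - y_low_low
    return y_high_high - predicted


# Made-up cell means of the transformed response.
additive = interaction_contrast(1, 3, 2, 4)       # 0: no interaction
synergistic = interaction_contrast(1, 3, 2, 6)    # positive
antagonistic = interaction_contrast(1, 3, 2, 3)   # negative
```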

4. Experimental Results

As noted in the introduction, signal-detection algorithms can recover the true signal more easily for images with less severe levels of noise. Thus, when comparing experiments 1A to 4A with experiments 1B to 4B, with the noise level 0 (less severe) and noise level −1 (more severe), respectively, we expect that the true gene expression can be more accurately estimated in experiments 1A to 4A. This means that for data with more noise (−1; experiments 1B to 4B) the difference between the estimated and true expression ratio is greater. This is shown in Fig. 11, where, for all experiments, the distributions of these absolute differences at their most extreme noise level (all 0, or all −1) in log-log scale are presented by box plots. The top and bottom edges of each box correspond to the upper and lower quartiles of the measurements, respectively. The solid dots in the middle give the locations of the medians. Figure 11 clearly shows that the medians and upper quartiles of 1B to 4B are larger than the corresponding medians and upper quartiles of 1A to 4A.

Figure 11

Box plots for absolute differences between the true and estimated expression ratios in a log-log scale. For each experiment, only the responses in the most extreme level (all −1 for Bs and all 0 for As) are plotted. Each box contains the central 50% of the data. The solid dot in the middle gives the location of the median. The top and bottom whiskers reach the largest and smallest nonoutlying observations, respectively, while the circles indicate the locations of outlying observations.
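The quantities drawn in such a box plot can be computed directly. The sketch below assumes the common 1.5 × IQR fence for deciding which observations are outlying; the text does not state which rule was used, and the data values are hypothetical.

```python
import numpy as np

def box_plot_summary(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers; the 1.5*IQR fence is an assumed convention."""
    v = np.sort(np.asarray(values, dtype=float))
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = v[(v >= lo_fence) & (v <= hi_fence)]     # nonoutlying observations
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": v[(v < lo_fence) | (v > hi_fence)],
    }

# Hypothetical absolute differences |estimated - true| for one experiment.
abs_diff = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_summary(abs_diff)
print(summary)
```

Comparing `median` and `q3` between the A and B versions of an experiment reproduces the comparison made from Fig. 11.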


In this paper our interest goes beyond this general statement; it is to determine the kinds of noise reduction that significantly affect signal estimates. For instance, in experiment 1, concerning background noise, if there is a significant difference between levels −1 and 1 for the parabolic background factor (p<0.0033), then lessening the curvature of the parabolic background significantly improves estimation at level α=0.05. We reach this conclusion because the response for the factorial experiment is the absolute difference between the estimated and actual signal values.
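The threshold 0.0033 coincides with 0.05/15, that is, an overall level of 0.05 divided across the 15 tests of each experiment. Reading it as a Bonferroni-style adjustment is an inference from the numbers in this section, not something stated here. The sketch below also assumes that, with individual spots as experimental units, the t statistic can be referred to a standard normal distribution; the function name is illustrative.

```python
import math

def two_sided_p_normal(t_stat):
    """Two-sided p value under a standard normal reference distribution (assumed approximation)."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

alpha, n_tests = 0.05, 15
threshold = alpha / n_tests          # per-test threshold, about 0.0033

# t statistics taken from the main-effects rows of Table 4.
for label, t_stat in [("Snake, Exp. 1B, all levels", 3.17),
                      ("Spike, Exp. 1A, all levels", 2.37)]:
    p = two_sided_p_normal(t_stat)
    print(f"{label}: t = {t_stat}, p = {p:.4f}, significant = {p < threshold}")
```

Under this approximation the snake effect in experiment 1B gives a p value close to the 0.0015 reported in Table 4, while the spike effect in experiment 1A falls above the threshold, in line with the absence of a reported p value.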

Let us consider experiment 1 in detail, the results being given in Table 4. The four columns of the table correspond to experiment 1B for all signal levels, 1B for low signal levels, 1A for all signal levels, and 1A for low signal levels. For each experiment, data in the low signal level comprise the one-third of the original data points whose true signal values are in their lower tertile. We have considered low signal levels as a case in their own right (besides being included among all signal levels) because signal detection is made more difficult when a signal is low. The table is broken into main effects and interactions. For experiment 1B (level −1 versus level 1) using all signals, all five effects are significant. This means that reducing any of these effects can be helpful. They are also all significant for low signals. Note that all five factors in the experiment directly affect pixel values, either raising or lowering them for the affected pixels, and the difference in degrees between levels −1 and 1 significantly affects signal estimation. The magnitude of the t-test statistics suggests that the high outlier and spike noise levels are more damaging to the image than the others.
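The low-signal subset can be selected as follows. This is a minimal sketch, assuming the tertile cutoff is taken as the one-third quantile of the true signal values; the arrays are hypothetical and the function name is illustrative.

```python
import numpy as np

def low_signal_mask(true_signal):
    """Boolean mask of spots whose true signal lies in the lower tertile."""
    true_signal = np.asarray(true_signal, dtype=float)
    cutoff = np.quantile(true_signal, 1.0 / 3.0)
    return true_signal <= cutoff

# Hypothetical true signals and absolute estimation errors for nine spots.
true_signal = np.array([120.0, 35.0, 900.0, 60.0, 15.0, 410.0, 75.0, 2200.0, 48.0])
abs_error = np.array([4.0, 9.0, 12.0, 6.0, 11.0, 8.0, 5.0, 20.0, 7.0])

mask = low_signal_mask(true_signal)
low_signal_errors = abs_error[mask]   # responses entering the "low levels" analysis
print(mask.sum(), low_signal_errors)
```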

Table 4

Experiment 1: Background noise.
Source  Exp. 1B (All Levels)  Exp. 1B (Low Levels)  Exp. 1A (All Levels)  Exp. 1A (Low Levels)
Main Effects
SigBack 32.60(<0.0001) 15.65(<0.0001) 35.38(<0.0001) 21.81(<0.0001)
OutL 106.80(<0.0001) 42.32(<0.0001) 24.79(<0.0001) 4.02(<0.0001)
Spike 104.77(<0.0001) 94.09(<0.0001) 2.37 5.72(<0.0001)
Snake 3.17(0.0015) 2.99(0.0028) 0.10 0.69
ParaB 28.27(<0.0001) 13.85(<0.0001) 21.57(<0.0001) 11.40(<0.0001)
Interaction
SigBack*outL −2.10 −3.65(0.0003) 0.85 −0.95
SigBack*spike −17.25(<0.0001) −12.44(<0.0001) 2.81 3.38(0.0007)
SigBack*snake −0.37 0.22 −2.05 1.31
sigBack*paraB 11.20(<0.0001) 7.15(<0.0001) 4.22(<0.0001) 0.44
outL*spike 73.19(<0.0001) 42.35(<0.0001) 3.12(0.0018) 4.05(<0.0001)
outL*snake 0.66 0.40 −1.87 0.83
outL*paraB −6.57(<0.0001) −5.67(<0.0001) −0.10 −1.55
Spike*snake −1.92 −1.27 −0.26 1.73
Spike*paraB −11.61(<0.0001) −6.50(<0.0001) 1.77 −0.35
snake*paraB −2.29 −0.10 −1.47 0.35

If we now consider experiment 1A (level 0 versus level 1) for all signals, both spike and snake effects become insignificant. This means that, relative to snake or spike noise, signal estimation is not significantly different at these two levels. Looking at the fourth column, we see that spike noise is still significant for level 0 versus level 1 for low signals. For these, there is a significant difference in performance of the algorithm relative to spike noise.

Interpretation of interactions can often be difficult, but in some cases it can be revealing. For instance, confining ourselves to the case of all signal levels, in experiment 1B we see that there is an interaction between the signal-to-background noise and the parabolic background effect. This is not surprising because the signal-to-background ratio is affected by the background. The interaction of the outlier effect and spike noise is also reasonable since both produce extreme values on the microarray. The large positive t-test statistic for this pair suggests a strong “synergistic” interaction effect throughout all four scenarios. A similar type of interaction is observed when both signal-to-background noise and parabolic-background noise levels are high. Figure 12 illustrates the mixed visual effect of signal-to-background noise with parabolic-background noise, and Figure 13 that of signal-to-background noise with spike noise; the underlying true spot-intensity distributions are the same in each part, so only the noise factors contribute to the differences.

Figure 12

Signal-to-background and parabolic noise at (+1,+1), (0,0), (−1,−1) level, from left to right.  


Figure 13

Signal-to-background and spike noises at different levels.


For experiment 2 (shape noise), in Table 5 we see that four of the factors are significant for experiment 2B, whether for all signals or just low signals. Among them, the strongest factors are the inner hole size and the foreground noise. The effect of foreground noise is similar to that of background noise in that it directly affects pixel values. The effect of low spot radius, large inner hole size, and excessive chord removal is to lessen the signal area, thereby reducing the pixel area over which the signal is to be estimated. Chord removal is not significant in experiment 2A for low signals, which means that at level 0 there is insufficient chord removal to significantly affect signal estimation relative to level 1. The fact that edge noise is not significant in experiment 2B indicates that the imaging algorithm deals equally well with edge noise at both levels when detecting spots.
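How the three geometric factors shrink the estimation area can be illustrated with elementary geometry. The sketch below is not the simulation model itself: it assumes an idealized circular spot, an elliptical hole lying entirely inside the spot, and a single straight chord cut that does not cross the hole. All function names and numeric values are hypothetical.

```python
import math

def segment_area(radius, chord_distance):
    """Area of the circular segment cut off by a chord at the given distance from the center."""
    d = min(max(chord_distance, 0.0), radius)
    return radius**2 * math.acos(d / radius) - d * math.sqrt(radius**2 - d**2)

def signal_area(radius, hole_semi_axes=(0.0, 0.0), chord_distance=None):
    """Disk area minus an elliptical inner hole and, optionally, one removed chord segment."""
    area = math.pi * radius**2
    area -= math.pi * hole_semi_axes[0] * hole_semi_axes[1]
    if chord_distance is not None:
        area -= segment_area(radius, chord_distance)
    return area

ideal = signal_area(10.0)                                        # full spot, no defects
degraded = signal_area(8.0, hole_semi_axes=(3.0, 2.0), chord_distance=6.0)
fraction_left = degraded / ideal
print(f"fraction of ideal signal area remaining: {fraction_left:.2f}")
```

Because the three reductions act on the same area, it is plausible that their effects on signal estimation interact rather than simply add, which is the pattern discussed for experiment 2B.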

Table 5

Experiment 2: Shape noise.
Source  Exp. 2B (All Levels)  Exp. 2B (Low Levels)  Exp. 2A (All Levels)  Exp. 2A (Low Levels)
Main Effects
Spot 39.83(<0.0001) 24.19(<0.0001) 9.89(<0.0001) 5.32(<0.0001)
InnH 96.02(<0.0001) 54.21(<0.0001) 19.43(<0.0001) 11.78(<0.0001)
ForeN 71.22(<0.0001) 39.65(<0.0001) 21.31(<0.0001) 11.79(<0.0001)
EdgeN 2.41 0.77 3.91(<0.0001) 1.33
Chord 15.65(<0.0001) 8.27(<0.0001) 5.42(<0.0001) 2.21
Interaction
spotR*innH 26.31(<0.0001) 14.95(<0.0001) 3.99(<0.0001) 1.49
spotR*foreN −0.98 −0.41 1.82 0.71
spotR*edgeN 4.96(<0.0001) 3.36(0.0008) −0.10 −0.33
spotR*chord −6.52(<0.0001) −3.55(0.0004) −0.45 −0.10
innH*foreN 12.17(<0.0001) 7.60(<0.0001) −2.01 −0.70
innH*edgeN 0.70 0.41 3.00(0.0027) 2.82
innH*chord 9.01(<0.0001) 4.33(<0.0001) 4.26(<0.0001) 4.13(<0.0001)
foreN*edgeN 0.45 −0.30 −0.22 −1.15
foreN*chord 3.80(0.0001) 2.37 1.69 0.55
edgeN*chord −3.74(0.0002) −2.25 −0.41 −1.01

There is an apparent anomaly with regard to edge noise in experiment 2A: edge noise is significant relative to levels 0 and 1, but not with respect to levels −1 and 1. This phenomenon is an “apparent” anomaly because one cannot compare p values across different experiments with full confidence—although we often do make such comparisons in a heuristic mode. Recall that the denominator of the F-statistic contains a variance estimator, and therefore a low variance will tend to make the F-statistic significant. Because the variance is very low in experiment 2A in contrast to experiment 2B, significance in the former and lack of significance in the latter is a reasonable consequence and does not imply that the difference in damage between two levels in experiment 2B is less than that in 2A. The damage effect of edge noise starts to show in 2A when the effects of inner hole size and foreground noise are not as dominating as they are in 2B. In experiment 2B, the effect of edge noise is still present in its significant interaction with both spot radius and chord noise. Figure 14 shows the mixed visual effect between the spot radius and chord noise.
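The role of the variance estimator in the denominator can be made concrete with a small calculation. This sketch assumes a pooled two-sample t statistic with equal group sizes, whose square is the corresponding F statistic; the difference, standard deviations, and sample size are hypothetical.

```python
import math

def two_sample_t(mean_diff, sd, n_per_group):
    """Pooled two-sample t statistic for equal group sizes and a common standard deviation."""
    return mean_diff / (sd * math.sqrt(2.0 / n_per_group))

same_difference = 0.1                      # hypothetical difference in mean response
t_high_variance = two_sample_t(same_difference, sd=0.5, n_per_group=100)
t_low_variance = two_sample_t(same_difference, sd=0.1, n_per_group=100)
print(f"high variance: t = {t_high_variance:.2f}; low variance: t = {t_low_variance:.2f}")
```

The same difference in damage produces a much larger statistic when the variance is low, which is why significance in one experiment and not in another does not by itself rank the sizes of the underlying effects.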

Figure 14

Spot radius deviation and chord noise at (+1,+1), (0,0), (−1,−1) levels, from left to right.


Regarding interaction in experiment 2B, the three distinctly geometric factors (spot radius, inner hole, and chord noise) interact significantly for both the overall signal and low-signal cases. This is reasonable because each affects the area over which signal estimation takes place. Interaction is greatly reduced in experiment 2A, particularly for low signals, where only interaction between the inner hole and chord removal is strongly significant. Figures 15 and 16 show the mixed visual effects of the inner hole with spot radius and chord noise, respectively.

Figure 15

Spot radius deviation and inner hole at (+1,+1), (0,0), (−1,−1) levels, left to right.


Figure 16

Inner hole and chord noises at (+1,+1), (0,0), (−1,−1) levels, left to right.  


Whereas experiment 2 mixes shape effects with foreground noise and edge noise, experiment 3 mixes them with scratch and snake noise. Table 6 shows a fair amount of consistency between the two experiments with regard to the three geometric factors relative to both main effects and interaction. One notable change is that the interaction between spot radius and chord removal changes from being “antagonistic” in experiment 2 to being “synergistic” in experiment 3. Even though the order of estimated cell means in the four noise level combinations remains the same in both experiments, in experiment 3 the estimated cell mean when both noise factors are present is much higher than in the other three; consequently, a significant “synergistic” interaction is observed. For the most part, snake and scratch noise show no significant main effects. The exception is scratch noise for low signals in experiment 3B. This is quite plausible because scratch noise causes a strip of low values, thereby reducing an already low signal. Note also the interaction of snake and scratch noise in three of the four experiments.
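That the sign of an interaction can change while the ordering of the four cell means stays the same follows from the definition of the interaction contrast. The sketch below uses made-up cell means, not values from experiments 2 and 3.

```python
def interaction_contrast(m00, m10, m01, m11):
    """(high, high) cell mean minus the additive prediction from the single-factor cells."""
    return m11 - (m10 + m01 - m00)

# Hypothetical cell means (neither factor, factor A only, factor B only, both factors).
# Both cases have the same ordering of the four cell means.
antagonistic_case = (1.0, 2.0, 3.0, 3.5)
synergistic_case = (1.0, 2.0, 3.0, 6.0)

for name, means in [("antagonistic", antagonistic_case), ("synergistic", synergistic_case)]:
    print(name, interaction_contrast(*means))
```

In the first case the (high, high) mean falls below the additive prediction of 4.0 and the contrast is negative; in the second it rises well above it and the contrast is positive.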

Table 6

Experiment 3: Shape-surface noise.
Source  Exp. 3B (All Levels)  Exp. 3B (Low Levels)  Exp. 3A (All Levels)  Exp. 3A (Low Levels)
Main Effects
Spot 32.80(<0.0001) 20.70(<0.0001) 6.40(<0.0001) 4.10(<0.0001)
InnH 103.75(<0.0001) 22.13(<0.0001) 15.87(<0.0001) 8.94(<0.0001)
Snake 0.68 1.81 0.17 0.20
Scratch 1.26 5.50(<0.0001) 0.37 0.69
Chord 20.68(<0.0001) 13.55(<0.0001) 1.65 0.40
Interaction
spotR*innH 23.47(<0.0001) 14.44(<0.0001) −0.14 −0.71
spotR*snake −6.46(<0.0001) −4.05(<0.0001) −2.86 −1.31
spotR*scratch −3.04(0.0023) 0.57 −0.17 −1.09
spotR*chord 16.56(<0.0001) 8.71(<0.0001) 2.61 0.81
innH*snake −3.06(0.0022) −1.51 0.00 −0.44
innH*scratch −1.79 1.87 0.10 −0.79
innH*chord 14.60(<0.0001) 7.70(<0.0001) 0.17 −2.38
snake*scratch −3.49(0.0005) −2.12 −5.49(<0.0001) −4.24(<0.0001)
snake*chord 8.84(<0.0001) 5.53(<0.0001) 0.92 0.00
scratch*chord 1.66 0.84 2.78 2.04

Experiment 4 concerns signal conditions, in particular, signal deviation, signal-to-background ratio, and foreground noise. These conditions are bound to affect signal estimation, and the main-effects part of Table 7 demonstrates this. The only exception is for low-signal values when comparing levels 0 and 1 in experiment 4A. Since signal deviation is tied to the signal mean, a low signal diminishes this deviation and signal deviation is not significant for low signal values. Figure 17 shows the mixed visual effects between signal-to-background and spike noise. As has been common throughout, overall interaction between the factors is much less relative to levels 0 and 1 than with respect to levels −1 and 1.

Figure 17

Signal-to-background and spike noise variation at (+1,+1), (0,0), (−1,−1) levels, left to right.


Table 7

Experiment 4: Weak-signal noise.
Source  Exp. 4B (All Levels)  Exp. 4B (Low Levels)  Exp. 4A (All Levels)  Exp. 4A (Low Levels)
Main Effects
SigSD 77.19(<0.0001) 20.87(<0.0001) 20.21(<0.0001) 0.35
ForeN 7.85(<0.0001) 3.52(0.0004) 5.58(<0.0001) 2.16
SigBack 55.55(<0.0001) 25.82(<0.0001) 46.33(<0.0001) 26.34(<0.0001)
FlatBack 67.22(<0.0001) 30.82(<0.0001) 42.74(<0.0001) 24.38(<0.0001)
Spike 74.96(<0.0001) 68.49(<0.0001) 3.01(0.0025) 2.84
Interaction
sigSD*foreN −1.41 −3.48(0.0005) −0.22 1.04
sigSD*sigBack −4.86(<0.0001) −4.43(<0.0001) 0.22 0.00
sigSD*flatBack −5.44(<0.0001) −5.42(<0.0001) −2.80 −4.13(<0.0001)
sigSD*spike 44.79(<0.0001) 24.89(<0.0001) 0.30 −1.00
foreN*sigBack −0.42 0.82 2.86 2.19
foreN*flatBack 0.17 0.49 −0.77 −3.72(0.0002)
foreN*spike −2.29 −0.81 0.57 1.09
sigBack*flatBack 21.72(<0.0001) 11.84(<0.0001) 3.23(0.0012) 1.59
sigBack*spike −17.92(<0.0001) −16.97(<0.0001) 1.69 2.33
flatBack*spike −21.68(<0.0001) −17.62(<0.0001) −2.49 −2.10

5.

Conclusion

Factorial analysis has been applied to simulated microarray images to study the effects and interaction of noise types at different noise levels. This type of analysis provides a general paradigm for investigating the effects of noise within a comprehensive simulation environment, thereby providing a tool by which one can quantitatively determine which kinds of noise should be mitigated in microarray technology. For instance, from the analysis described in this paper, it can be concluded that elimination of the inner hole and the stabilizing of spot radius will have a strongly beneficial effect on signal estimation. Additional information can be found online.14

Appendix

Parameter settings for the microarray simulation. The notation N(a,b) denotes the normal distribution with mean a and variance b; U[a,b] is the uniform distribution on the interval [a,b]; U{a,b,c,…} is the uniform distribution on the indicated set of values; Β(a,b) is the beta distribution with parameters a and b; and exp(a) is the exponential distribution with mean a.  

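Columns: Level; Simulation parameter; Descriptions; Distribution

Spot level

1. Spot size
Descriptions: $S$: spot radius with $(\mu_s, \sigma_s^2)$.
Distribution: $S \sim N(\mu_s, \sigma_s^2)$.

2. Spot drift
Descriptions: $\delta_x, \delta_y$: drifting level; $d_a, d_b$: percentage of spot radius; $P_D$: drift activation probability; $D_x, D_y$: relative drifting; $(X_1, Y_1)$: drifted center coordinates; $(X_2, Y_2)$: second channel, where $(X, Y)$ are predefined spot center coordinates.
Distribution: $\delta_x, \delta_y \sim U(d_a, d_b)$; $D_x = \delta_x \times S \times U[-1,1]$; $D_y = \delta_y \times S \times U[-1,1]$; $X_1 = X + D_x$, $Y_1 = Y + D_y$; $X_2 = X_1 + U[-1,1]$, $Y_2 = Y_1 + U[-1,1]$.

3. Inner hole size
Descriptions: $H, V$: horizontal and vertical axis of the inner elliptical hole.
Distribution: $H \sim N(\mu_H, \sigma_H)$; $V \sim N(\mu_V, \sigma_V)$.

4. Inner hole drift
Descriptions: $X_C, Y_C$: ideal spot center; $X_R, Y_R$: first channel coordinates; $X_G, Y_G$: second channel coordinates, where $\delta_{cxG}, \delta_{cyG}, \delta_{cxR}, \delta_{cyR}$: drift level set at the block level.
Distribution: $X_R = X_C + \delta_{cxR}$, $Y_R = Y_C + \delta_{cyR}$; $X_G = X_C + \delta_{cxG}$, $Y_G = Y_C + \delta_{cyG}$.

5. Chord removal
Descriptions: $P_{N_c}$: chord removal probability ($p_k$: probability of $k$ chords to be removed from a target spot); $L$: chord length; $\theta$: chord position.
Distribution: $P_{N_c} = \{p_0, p_1, p_2, p_3, p_4\}$, where $p_0 + p_1 + p_2 + p_3 + p_4 = 1$; $N_c \sim \{0, 1, 2, 3, 4\}$; $L \sim \mathrm{B}(\alpha_L, \beta_L)$; $\theta \sim U(0, 2\pi)$.

6. Spot intensity
Descriptions: $\beta$: mean intensity for the assumed cell system; $R_k, G_k$: $k$th spot (fixed) signal intensities for both channels; $\alpha$: coefficient of variation of signal intensity in the system.
Distribution: $I_k \sim \exp(\beta)$; $R_k \sim N(I_k, \sigma_I)$; $G_k \sim N(I_k, \sigma_I)$; $\sigma_I = \alpha \times I_k$.

7. Expresser or outlier's intensity
Descriptions: $p_{\mathrm{outlier}}$: outlier activation probability; $b_k$: outlier control level; $t_k$: targeted outlier expression ratio, with equal probability of $\pm$ sign; $R_k', G_k'$: $k$th outlier signal intensities for both channels.
Distribution: $p_{\mathrm{outlier}}$: equal probability at 0.05 to 0.10; $b_k \sim \mathrm{B}(1.7, 4.8)$; $t_k = 10^{\pm b_k}$; $R_k' = R_k t_k$; $G_k' = G_k / t_k$.

8. Channel conditioning
Descriptions: $R_k', G_k'$: prenormalized signal intensity of the spots on red, green channels; $a_0, a_1, a_2$, and $a_3$: parameters for response characteristic function.
Distribution: $R_k'' = f_1(R_k')$; $G_k'' = f_2(G_k')$; $f(x) = [a_0 + x(1 - e^{-x/a_1})^{a_2}]^{a_3}$, where $a_3 > 1$.

9. Spot signal variation (foreground noise)
Descriptions: $S_{R_k}, S_{G_k}$: pixelwise $(x, y)$ signal intensity; $\alpha_s$: within-spot signal coefficient of variation.
Distribution: $S_{R_k}(x, y) \sim R_k'' + N(\mu_{R_k}, \sigma_R^2)$; $S_{G_k}(x, y) \sim G_k'' + N(\mu_{G_k}, \sigma_G^2)$; $\mu_{R_k} = R_k \times \alpha_{m1}$, $\alpha_{m1} \sim U[f_{a1}, f_{b1}]$; $\mu_{G_k} = G_k \times \alpha_{m2}$, $\alpha_{m2} \sim U[f_{a2}, f_{b2}]$; $\sigma_R = R_k \times \alpha_{s1}$, $\alpha_{s1} \sim U[f_{c1}, f_{d1}]$; $\sigma_G = G_k \times \alpha_{s2}$, $\alpha_{s2} \sim U[f_{c2}, f_{d2}]$.

10. Edge enhancement
Descriptions: $W_{ed}$: level of enhancement, parameter $(\mu_e)$ set for the block; $N_e$: number of pixels enhanced.
Distribution: $W_{ed} \sim N(\mu_e, 1)$.

11. Edge noise
Descriptions: apply edge noise at the set level $(\delta_{ed})$.

Block level

12. Radius parameters
Descriptions: $\mu_s, k_s$: mean and radius deviation factor; $s_a, s_b$: bounds of radius, set by block size and interspot gap.
Distribution: $\mu_r \sim U(s_a, s_b)$; $\sigma_s \sim k_s \times \mu_s$.

13. Chord parameters
Descriptions: $N_c$: chord rate picked with equal probability; $\alpha_L, \beta_L$: chord distributional parameters.
Distribution: $N_c \in U\{0, 1, 2, 3, 4\}$ having weights $\{p_0, p_1, p_2, p_3, p_4\}$; $\alpha_L \sim U(a_\alpha, b_\alpha)$, $\beta_L \sim U(a_\beta, b_\beta)$.

14. Inner hole parameters
Descriptions: $\mu_H, \mu_V, \sigma_H, \sigma_V$: parameters for inner elliptical hole; $\mu_R$: mean spot radius in the block.
Distribution: $\mu_H \sim U(L_a, L_b) \times \mu_R$, $\mu_V \sim U(L_a, L_b) \times \mu_R$; $\sigma_H = \alpha_1 \times \mu_R$, $\sigma_V = \alpha_2 \times \mu_R$; $\alpha_1 \sim U(P_a, P_b)$, $\alpha_2 \sim U(P_a, P_b)$.

15. Drift parameters
Descriptions: $\delta_{cxG}, \delta_{cyG}, \delta_{cxR}, \delta_{cyR}$: drift level; $i, j$: percentage of the spot radius.
Distribution: $\delta_c \sim U[i, j]$; $\delta_{cxG} = \delta_c \times U[-1,1]$, $\delta_{cyG} = \delta_c \times U[-1,1]$; $\delta_{cxR} = \delta_{cxG} + U[-1,1]$, $\delta_{cyR} = \delta_{cyG} + U[-1,1]$.

16. Enhancement
Descriptions: $l_a, l_b$: range of intensity ratio; set mean level of enhancement for a block.
Distribution: $\mu_e \sim U(l_a, l_b)$.

Array level

17. Physical dimensions
Descriptions: $B_w, B_h$: block size, width and height (distance between first spot centers of any two blocks); $M_l, M_r, M_t, M_b$: margin settings (left, right, top, bottom); $N_{\mathrm{pin}}, N_{\mathrm{row}}$: number of pins in an array, printed equally across $N_{\mathrm{row}}$ number of rows; $NS_w, NS_h$: number of spots along the width ($NS_w$) and height ($NS_h$) of the block.
Distribution: typical setting for an 8-block, 2-row array (in pixels): $B_h, B_w = 900$; $M_l, M_r, M_t, M_b = 100$.

18. Signal-to-noise ratio
Descriptions: SNR: signal-to-noise level is set for an array.

19. Interspot distance
Descriptions: $G_{sp}$: interspot distance, set for an array.

20. Background
Descriptions: $I_{b\_ch1}, I_{b\_ch2}$: background intensity, with parameters set for an array; $\gamma$: background level. Parameter settings: flat fluorescent background; functional background $g(x, y)$: choice of parabolic, positive or negative slant surface function.
Distribution: $I_{b\_ch1} \sim N(\mu_b, \sigma_{b1}^2)$; $I_{b\_ch2} \sim N(\mu_b, \sigma_{b2}^2)$; $\gamma \sim U[a, b]$; flat fluorescent background: $\mu_b = \gamma$; functional background: $\mu_b = \gamma \times g(x, y)$; with $\sigma_{b1} = (k_{b1} \mu_b)$, $\sigma_{b2} = (k_{b2} \mu_b)$.

21. Spike noise
Descriptions: $L_{spi}$: level of spike noise (set in terms of percentage of total pixels); $N_s$: intensity of the spike noise; $\mu_{spi}$: noise rate; $W_{spi}$: width of the noise cluster.
Distribution: $N_s \sim \exp(\mu_{spi})$; $\mu_{spi} \sim U[e, f]$; $W_{spi} \sim U[g, h]$.

22. Edge noise
Descriptions: $\delta_{ed}$: set the controlling parameter.
Distribution: $\delta_{ed}$ set as a percentage of maximum intensity value.

23. Snake noise
Descriptions: $N_{seg}$: number of snake tails in an image; $I_{sn}$: intensity of the noise tail; $\kappa_{sn}$: average signal-to-snake noise intensity level; $L_{sn}$: length of the segment expressed as multiples of average spot size; $W_{sn}$: width of the snake noise tail.
Distribution: $N_{seg}, \kappa_{sn}, L_{sn}, W_{sn}$; $I_{sn} \sim N(\mu_{sn}, \sigma_{sn})$; $\mu_{sn} = (I_k / \kappa_{sn})$, $\sigma_{sn} = k_{sn} \times \mu_{sn}$; $L_{sn} \sim U[L_{sn1}, L_{sn2}]$.

24. Scratch noise
Descriptions: $N_{sc}$: number of scratch tails in an image; $I_{sc}$: intensity of the scratch noise; $\kappa_{sc}$: average background-to-scratch noise intensity level; $L_{sc}$: length of the segment in units of average size of the spots; $W_{sc}$: width of the scratch noise; $\theta$: scratch noise inclination.
Distribution: $N_{sc}, \kappa_{sc}, W_{sc}, \theta$; $I_{sc} \sim N(\mu_{sc}, \sigma_{sc})$; $\mu_{sc} = (\mu_b / \kappa_{sc})$, $\sigma_{sc} = k_{sc} \times \mu_{sc}$; $L_{sc} \sim U[L_{sc1}, L_{sc2}]$; $\theta \in U\{0, 45, 90, 135, 180\}$ deg.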
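To show how entries of this table translate into a sampler, the following sketch draws two-channel spot intensities following items 6 and 7 (spot intensity and expresser or outlier intensity). It is an illustration under stated assumptions, not the simulation toolbox itself: $\sigma_I$ is used as a standard deviation, the values of `beta`, `alpha`, and `p_outlier` are hypothetical, the outlier scaling multiplies one channel by $t_k$ and divides the other by $t_k$ as printed in the table, and the function name is made up.

```python
import numpy as np

def sample_spot_intensities(n_spots, beta, alpha, p_outlier, rng):
    """Draw two-channel spot intensities with optional outlier (expresser) ratios."""
    base = rng.exponential(scale=beta, size=n_spots)      # I_k ~ exp(beta), mean beta
    sigma = alpha * base                                  # sigma_I = alpha * I_k (used as std. dev.)
    red = rng.normal(base, sigma)                         # R_k ~ N(I_k, sigma_I)
    green = rng.normal(base, sigma)                       # G_k ~ N(I_k, sigma_I)
    is_outlier = rng.random(n_spots) < p_outlier          # outlier activation
    b = rng.beta(1.7, 4.8, size=n_spots)                  # outlier control level b_k
    sign = rng.choice([-1.0, 1.0], size=n_spots)          # equal probability of +/- sign
    t = np.where(is_outlier, 10.0 ** (sign * b), 1.0)     # t_k = 10^(+/- b_k), 1 for non-outliers
    return red * t, green / t, t

rng = np.random.default_rng(1)
red, green, t = sample_spot_intensities(1000, beta=3000.0, alpha=0.1, p_outlier=0.05, rng=rng)
print(f"fraction of outlier spots: {np.mean(t != 1.0):.3f}")
```

Because $b_k$ lies strictly between 0 and 1, each $t_k$ in this sketch stays between 0.1 and 10.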

Acknowledgments

Y.B. was supported by the Center for Environmental and Rural Health at Texas A&M University. E.R.D. was supported by the National Human Genome Research Institute. D.V.N. was supported by the National Cancer Institute (CA-90301). R.J.C. was supported by a grant from the National Cancer Institute (CA-57030), and by the Texas A&M Center for Environmental and Rural Health via a grant from the National Institute of Environmental Health Sciences (P30-ES09106). N.W. was supported by the National Cancer Institute (CA-74552).

REFERENCES

1. M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science, 270, 467–470 (1995).

2. M. N. Arbeitman, E. E. Furlong, F. Imam, E. Johnson, B. H. Null, B. S. Baker, M. A. Krasnow, M. P. Scott, R. W. Davis, and K. P. White, “Gene expression during the life cycle of Drosophila melanogaster,” Science, 297, 2270–2275 (2002).

3. S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P. O. Brown, and I. Herskowitz, “The transcriptional program of sporulation in budding yeast,” Science, 282, 699–705 (1998).

4. J. DeRisi, L. Penland, P. O. Brown, M. L. Bittner, P. S. Meltzer, M. Ray, Y. Chen, Y. A. Su, and J. M. Trent, “Use of a cDNA microarray to analyse gene expression patterns in human cancer,” Nat. Genet., 14, 457–460 (1996).

5. I. S. Lossos, A. A. Alizadeh, M. Diehn, R. Warnke, Y. Thorstenson, P. J. Oefner, P. O. Brown, D. Botstein, and R. Levy, “Transformation of follicular lymphoma to diffuse large-cell lymphoma: alternative patterns with increased or decreased expression of c-myc and its regulated genes,” Proc. Natl. Acad. Sci. U.S.A., 99, 8886–8891 (2002).

6. Y. Chen, E. R. Dougherty, and M. L. Bittner, “Ratio-based decision and quantitative analysis of cDNA microarrays,” J. Biomed. Opt., 2(4), 364–374 (1997).

7. Y. Chen, V. Kamat, E. R. Dougherty, M. L. Bittner, P. S. Meltzer, and J. Trent, “Ratio statistics of gene expression levels and application to microarray data analysis,” Bioinformatics, 18(9), 1207–1215 (2002).

8. D. V. Nguyen, A. B. Arpat, N. Wang, and R. J. Carroll, “DNA microarray experiments: biological and technological aspects,” Biometrics, 58(4), 701–717 (2002).

9. Y. Balagurunathan, E. R. Dougherty, Y. Chen, M. L. Bittner, and J. M. Trent, “Simulation of cDNA microarrays via a parameterized random signal model,” J. Biomed. Opt., 7(3), 507–523 (2002).

10. K. Kerr and G. A. Churchill, “Experimental design for gene expression microarrays,” Biostatistics, 2, 183–202 (2001).

11. M. K. Kerr and G. A. Churchill, “Statistical design and analysis of gene expression microarrays,” Genet. Res., 77(2), 123–128 (2001).

Notes

Address all correspondence to Dr. Edward R. Dougherty, Texas A&M Univ., Dept. of Electrical Engineering, 111D Zachry, College Station, TX 77843-3128. Tel: 979-694-9538, Fax: 979-845-6259, E-mail: e-dougherty@tamu.edu

©(2004) Society of Photo-Optical Instrumentation Engineers (SPIE)
Yoganand Balagurunathan, Naisyin Wang, Edward R. Dougherty, Danh V. Nguyen, Yidong Chen, Michael L. Bittner, Jeffrey M. Trent, and Raymond J. Carroll "Noise factor analysis for cDNA microarrays," Journal of Biomedical Optics 9(4), (1 July 2004). https://doi.org/10.1117/1.1755232
Published: 1 July 2004
KEYWORDS
Interference (communication)

Signal to noise ratio

Signal detection

Factor analysis

Statistical analysis

Error analysis

Seaborgium
