Open Access
22 August 2017 Multitaper and multisegment spectral estimation of line-edge roughness
Yao Luo, Serap A. Savari
Abstract
Line-edge roughness (LER) has important impacts on the quality of semiconductor device performance, and power spectrum estimates are useful tools in characterizing it. These estimates are often obtained by taking measurements of many lines and averaging a classical power spectrum estimate from each one. While this approach reduces the uncertainty of the estimates, there are disadvantages to the collection of many measurements. We propose techniques with widespread application in other fields that simultaneously reduce data requirements and the uncertainty of LER power spectrum estimates over current approaches at the price of computational complexity. Multitaper spectral analysis uses an orthogonal collection of data windowing functions or tapers to obtain a set of approximately statistically independent spectrum estimates. The Welch overlapped segment averaging is an earlier approach to reduce the uncertainty of power spectrum estimates. There are known techniques to evaluate the uncertainty of power spectrum estimates. We simulate random rough lines using the Thorsos method.

1. Introduction

Line-edge roughness (LER) is known to be a crucial factor in the yield of integrated circuit manufacturing (see, e.g., Ref. 1, pp. 82–92) and has been analyzed as part of the study of emerging patterning processes and techniques.2,3 For example, in extreme ultraviolet lithography, the LER resulting from photon shot noise and resist chemistry remains a concern for the use of this technology in high-volume manufacturing despite the ongoing development of resolution enhancement techniques.3 The critical-dimension scanning electron microscope (CD-SEM) is the standard metrology tool to obtain measurements for LER analysis (Ref. 4, pp. 109, 114), but it is also possible to use critical-dimension atomic force microscopes (CD-AFM) for this purpose (Ref. 4, p. 130). The standard deviation of sampled edge positions of a line does not completely describe its LER.5 The power spectrum is an important and more detailed LER characterization metric,6,7 which is related to transistor performance8,9 and is used in process monitoring.2 However, it can be challenging to estimate the power spectrum of LER. First, the power spectrum estimates of LER are based on a finite number of sampled edge positions and they, therefore, suffer from a phenomenon called leakage where the spectral component at one frequency produces a spreading to other frequencies.10 References 11–13 have discussed the need to use data windowing functions or tapers14 to reduce the spectral leakage of LER spectrum estimates. Second, there is consensus in recent papers7,12,15–18 that it is necessary to reduce variance, i.e., the uncertainty of LER power spectrum estimates; in the newest of these papers,16–18 the authors additionally defined and considered different scenarios in the low-, middle-, and high-frequency regions.

The classical approach of LER power spectrum estimation is based on the periodogram,10,14 which is conceptually and computationally simple. However, the price for this simplicity is the relatively high uncertainty that can grow when the image noise increases. It is possible to reduce image noise by increasing the dose of CD-SEM or using CD-AFM, and one could decrease the power spectrum estimation uncertainty by averaging power spectrum estimates over many lines.15,17 However, these techniques may not be efficient because CD-SEM may cause sample damage as the dose level increases (see, e.g., Ref. 19 and Ref. 4, p. 42) and because CD-AFM is slow and has other disadvantages that affect measurement.4,19 We will consider two approaches from the more recent literature on spectral analysis, and we will evaluate their abilities to reduce the power spectrum estimation uncertainty given less measured data.

The remainder of the paper is organized as follows: in Sec. 2, we discuss the Welch overlapped segment averaging (WOSA) spectrum estimate.20 In Sec. 3, we describe multitaper spectral analysis.10,21,22 In Sec. 4, we present our results on simulated random rough lines, and in the last section, we offer concluding remarks.

2. Welch Overlapped Segment Averaging Spectrum Estimates

The power spectrum estimate of a random rough line is a random process, where the component at each frequency is a random variable. The bias of a power spectrum estimate refers to the difference between the mean values of these random variables and the true values of the power spectrum. Power spectrum estimates are most valuable when these random variables have low variance and low bias. The Bartlett method23 is widely used to reduce variance in periodogram estimation by dividing the data samples into $M$ disjoint segments of equal length and choosing the spectrum estimate to be the average of the periodograms associated with the $M$ segments. In the LER literature, Ref. 15 suggests using a resampling method to reduce statistical noise, where assembled long lines based on random subsets of the available data are generated for power spectrum estimation. The modified periodogram10 has been recently introduced in the LER literature11–13 to reduce the bias due to leakage. The WOSA spectrum estimate20 is based on the modified periodogram and extends the Bartlett method to reduce both the bias and the variance in power spectrum estimation.

Suppose we sample a rough edge at $N$ points with distance $\Delta$ between successive points. Let $w_k$, $k \in \{0, 1, \ldots, N-1\}$, denote the values of the sampled edge positions and let $\bar{w}$ be the mean of the edge positions. The sequence of real constants $h_k$, $k \in \{0, 1, \ldots, N-1\}$, is called a data taper or window. The modified periodogram is defined as

Eq. (1)

$$\hat{S}_{\mathrm{modified}}(f) = \Delta \left| \sum_{k=0}^{N-1} h_k (w_k - \bar{w})\, e^{-i 2\pi f \Delta k} \right|^2.$$

We require $\{h_k\}$ to be a normalized taper with

Eq. (2)

$$\sum_{k=0}^{N-1} h_k^2 = 1$$
for asymptotically unbiased power spectrum estimation. When $h_k = 1/\sqrt{N}$ for all $k$, the window is called “rectangular” and Eq. (1) describes the periodogram. Let $S(f)$, $|f| \le 1/(2\Delta)$, denote the true power spectrum of interest, and let $H(f) = \Delta \left| \sum_{k=0}^{N-1} h_k e^{-i 2\pi f \Delta k} \right|^2$. The mean of the modified periodogram estimator is10

Eq. (3)

$$E[\hat{S}_{\mathrm{modified}}(f)] = \int_{-1/(2\Delta)}^{1/(2\Delta)} H(f - f')\, S(f')\, df'.$$

The convolution in Eq. (3) shows a potential bias called leakage, where the power at one frequency can spread to the others. This bias appears when a finite-sized sampling window is used for estimating the power spectrum of a process with a relatively high dynamic range and is reduced by a modified periodogram using a nonrectangular taper with a large central lobe, such as the Welch window.11,13 The k’th element in the (unnormalized) Welch taper with size N is defined by

Eq. (4)

$$y(k, N) = 1 - \left( \frac{2k}{N} - 1 \right)^2, \quad k \in \{0, 1, \ldots, N-1\}.$$

We normalize tapers when we use them in power spectrum estimates.
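The definitions in Eqs. (1), (2), and (4) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the estimate is evaluated only at the FFT frequencies $f_j = j/(N\Delta)$, and the Welch taper is written in one common form, $1 - (2k/N - 1)^2$, which is an assumption here.

```python
import numpy as np

def welch_taper(n):
    """Unnormalized Welch taper, assumed form y(k, n) = 1 - (2k/n - 1)^2."""
    k = np.arange(n)
    return 1.0 - (2.0 * k / n - 1.0) ** 2

def normalize(taper):
    """Scale a taper so that its squared entries sum to one, as in Eq. (2)."""
    taper = np.asarray(taper, dtype=float)
    return taper / np.sqrt(np.sum(taper ** 2))

def modified_periodogram(w, delta, taper):
    """Evaluate Eq. (1) at the FFT frequencies f_j = j / (N * delta)."""
    w = np.asarray(w, dtype=float)
    h = normalize(taper)
    # Subtract the mean edge position, apply the taper, then transform.
    spectrum = delta * np.abs(np.fft.fft(h * (w - w.mean()))) ** 2
    freqs = np.fft.fftfreq(w.size, d=delta)
    return freqs, spectrum
```

Passing `np.ones(N)` as the taper gives the rectangular window and therefore the ordinary periodogram.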

The WOSA method divides the data samples into overlapping segments and uses the average of the modified periodograms associated with these segments. The nonrectangular windows reduce the bias due to spectrum leakage but place less weight on the end samples in each individual segment. The overlapping segments compensate for this effect and preserve the autocovariance information between adjacent segments,10 and therefore, reduce the variance in power spectrum estimation. The WOSA spectrum estimation has a hardware implementation24 and is thus potentially useful for in-line metrology.

To describe the WOSA spectrum estimate, we introduce some additional notation in Fig. 1. The $T$ segments each have $N_{\mathrm{seg}}$ points and share a fraction $r$ of their data points with the next segment, which leads to an “offset” of $D = N_{\mathrm{seg}} \times (1 - r)$ points between neighboring segments. The number of overlapping segments $T$ used in WOSA is

Eq. (5)

$$T = \frac{N - N_{\mathrm{seg}}}{N_{\mathrm{seg}} \times (1 - r)} + 1.$$

Fig. 1

$T$ overlapped segments for WOSA and $T'$ segments for WOSA via circular segments with offset $D$ and segment size $N_{\mathrm{seg}}$.


It is desirable to have more segments to reduce the variance in power spectrum estimation for a fixed $N$. Thus, $N_{\mathrm{seg}}$ should be chosen according to the lowest frequency resolution that can be tolerated. The parameter $r$ is generally set to 50% to achieve nearly maximum variance reduction. There is a more recent variation of WOSA that permits the circular overlap of segments.25 Using the circularly overlapped segments can increase $T$ to $T'$, where $T'$ is defined as

Eq. (6)

$$T' = \frac{N}{N_{\mathrm{seg}} \times (1 - r)}.$$
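As a worked instance of Eqs. (5) and (6), take the values used later in Sec. 4, namely $N = 2048$, $N_{\mathrm{seg}} = 1024$, and $r = 50\%$, and write $T'$ for the circular segment count of Eq. (6):

```latex
D = N_{\mathrm{seg}} \times (1 - r) = 1024 \times 0.5 = 512, \qquad
T = \frac{N - N_{\mathrm{seg}}}{D} + 1 = \frac{2048 - 1024}{512} + 1 = 3, \qquad
T' = \frac{N}{D} = \frac{2048}{512} = 4.
```

These are the three classical and four circular segments per line reported in the simulations.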

For WOSA, the elements of the $t$’th segment are $w_{k + D \times (t-1)}$, $k \in \{0, 1, \ldots, N_{\mathrm{seg}} - 1\}$. Let

Eq. (7)

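$$g_k^{(t)} = y(k, N_{\mathrm{seg}}), \quad k \in \{0, 1, \ldots, N_{\mathrm{seg}} - 1\}, \quad t \in \{1, 2, \ldots, T\}, \qquad h_k^{(t)} = \frac{g_k^{(t)}}{\sqrt{\sum_{k=0}^{N_{\mathrm{seg}}-1} |g_k^{(t)}|^2}}.$$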

Then, the WOSA power spectrum estimate is

Eq. (8)

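$$\hat{S}_{\mathrm{WOSA}}(f) = \frac{1}{T} \sum_{t=1}^{T} \Delta \left| \sum_{k=0}^{N_{\mathrm{seg}}-1} h_k^{(t)} \left[ w_{k + D \times (t-1)} - \bar{w} \right] e^{-i 2\pi f \Delta k} \right|^2.$$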
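A minimal sketch of the WOSA estimate of Eq. (8) is given below. The function names are hypothetical, the Welch taper form $1 - (2k/N - 1)^2$ is an assumption, and the integer floor in the segment count is an assumption for cases where Eq. (5) does not give a whole number.

```python
import numpy as np

def welch_taper(n):
    """Unnormalized Welch taper, assumed form 1 - (2k/n - 1)^2."""
    k = np.arange(n)
    return 1.0 - (2.0 * k / n - 1.0) ** 2

def wosa(w, delta, n_seg, r=0.5):
    """Average the modified periodograms of overlapping segments, as in Eq. (8)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    offset = int(round(n_seg * (1.0 - r)))       # D
    n_segments = (n - n_seg) // offset + 1       # T of Eq. (5), floored (assumption)
    h = welch_taper(n_seg)
    h = h / np.sqrt(np.sum(h ** 2))              # normalized taper
    centered = w - w.mean()                      # mean of the whole line is removed
    spectrum = np.zeros(n_seg)
    for t in range(n_segments):                  # t is zero-based here
        seg = centered[t * offset : t * offset + n_seg]
        spectrum += delta * np.abs(np.fft.fft(h * seg)) ** 2
    freqs = np.fft.fftfreq(n_seg, d=delta)
    return freqs, spectrum / n_segments, n_segments
```

With `n_seg` equal to the line length the loop runs once and the result reduces to the single-taper modified periodogram, which is a convenient sanity check.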

For the WOSA variant with circular overlap, the elements of the $t$’th segment are $w_{[k + D \times (t-1)] \bmod N}$. Thus, the first $T$ segments are the same as those in WOSA. For $t > T$, suppose that the first $v_t$ points come from the end of the line and the remaining $N_{\mathrm{seg}} - v_t$ points come from the beginning of the line; here $v_t = N - \{[D \times (t-1)] \bmod N\}$. The tapers used for circular WOSA are

Eq. (9)

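$$g_k^{(t)} = \begin{cases} y(k, N_{\mathrm{seg}}), & 0 \le k < N_{\mathrm{seg}},\ 1 \le t \le T \\ y(k, v_t), & 0 \le k < v_t,\ t > T \\ y(k - v_t, N_{\mathrm{seg}} - v_t), & v_t \le k < N_{\mathrm{seg}},\ t > T, \end{cases} \qquad h_k^{(t)} = \frac{g_k^{(t)}}{\sqrt{\sum_{k=0}^{N_{\mathrm{seg}}-1} |g_k^{(t)}|^2}},$$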
and the overall spectrum estimate is

Eq. (10)

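$$\hat{S}_{\mathrm{circular}}(f) = \frac{1}{T'} \sum_{t=1}^{T'} \Delta \left| \sum_{k=0}^{N_{\mathrm{seg}}-1} h_k^{(t)} \left( w_{[k + D \times (t-1)] \bmod N} - \bar{w} \right) e^{-i 2\pi f \Delta k} \right|^2.$$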
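The circular variant of Eqs. (9) and (10) can be sketched in the same way. As before, this is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the Welch taper form is assumed, and segment counts are floored when the divisions are not exact. A wrapped segment receives two separate Welch tapers, one for the points taken from the end of the line and one for the points taken from its beginning.

```python
import numpy as np

def welch_taper(n):
    """Unnormalized Welch taper, assumed form 1 - (2k/n - 1)^2."""
    k = np.arange(n)
    return 1.0 - (2.0 * k / n - 1.0) ** 2

def circular_tapers(n, n_seg, offset):
    """Normalized tapers of Eq. (9), one row per circular segment."""
    n_plain = (n - n_seg) // offset + 1          # T, segments that do not wrap
    n_circ = n // offset                         # T', all circular segments
    tapers = np.empty((n_circ, n_seg))
    for t in range(n_circ):                      # t is zero-based here
        if t < n_plain:
            g = welch_taper(n_seg)
        else:
            v = n - (offset * t) % n             # points taken from the end of the line
            g = np.concatenate([welch_taper(v), welch_taper(n_seg - v)])
        tapers[t] = g / np.sqrt(np.sum(g ** 2))
    return tapers

def circular_wosa(w, delta, n_seg, r=0.5):
    """Average of tapered periodograms over circularly overlapped segments, Eq. (10)."""
    w = np.asarray(w, dtype=float)
    n = w.size
    offset = int(round(n_seg * (1.0 - r)))       # D
    tapers = circular_tapers(n, n_seg, offset)
    centered = w - w.mean()
    spectrum = np.zeros(n_seg)
    for t, h in enumerate(tapers):
        idx = (np.arange(n_seg) + offset * t) % n    # wrap around the end of the line
        spectrum += delta * np.abs(np.fft.fft(h * centered[idx])) ** 2
    return np.fft.fftfreq(n_seg, d=delta), spectrum / tapers.shape[0]
```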

3. Multitaper Spectrum Estimation

Multitaper methods21 are among the great advances in spectrum estimation. Multitaper methods have low uncertainty and are resistant to spectral leakage since they recover the information lost by single nonrectangular taper estimators through the use of a group of orthogonal tapers. As discussed in Ref. 10, multitaper methods work for various types of power spectra. They have widespread applications including neuroscience,26 climate studies,27 nuclear test-ban treaty verification,28 and cognitive radio.29 Multitaper spectral analysis uses an orthogonal collection of tapers on a finite sample of data to obtain a set of approximately statistically independent spectrum estimates. The overall spectrum estimate is either an average or a weighted average of the individual estimates. To provide more details about the earliest version of this technique, we introduce the following notation. Suppose we have the $T$ orthogonal normalized tapers $h_k^{(t)}$, $k \in \{0, 1, \ldots, N-1\}$, $t \in \{1, 2, \ldots, T\}$. The basic estimator takes the arithmetic mean of the spectrum estimates associated with these tapers.

Eq. (11)

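$$\hat{S}_{\mathrm{multitaper}}(f) = \frac{1}{T} \sum_{t=1}^{T} \Delta \left| \sum_{k=0}^{N-1} h_k^{(t)} (w_k - \bar{w})\, e^{-i 2\pi f \Delta k} \right|^2.$$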

To complete the specification of this estimate, we need to discuss the choice and number of orthogonal tapers.

Observe that the multitaper method is potentially useful for LER power spectrum estimation because it has some similarity to taking the average of modified periodogram spectrum estimates from a group of lines. The technique of applying a group of orthogonal leakage-resistant tapers to a single line resembles the technique of applying a single leakage-resistant taper to a group of lines. Thus, the multitaper method may be able to reduce the uncertainty and leakage in LER spectrum estimation with a smaller group of lines.

There have been multiple classes of orthogonal tapers considered in the literature (see, e.g., Ref. 30). In his seminal paper, Thomson21 chose to use discrete prolate spheroidal sequences (DPSS), and we will begin by following his example. These orthogonal tapers continue to be popular because they are resistant to spectral leakage. They are also known as Slepian sequences because Slepian31 observed that they are the solution to the following famous time-frequency concentration problem:10 given a sampling frequency $1/\Delta$ and an “effective” bandwidth $W < 1/(2\Delta)$, find the sequence $h_0, h_1, \ldots, h_{N-1}$ with Fourier transform $H(f)$ defined over a continuous frequency domain $|f| \le 1/(2\Delta)$ that maximizes $\lambda$ given as

Eq. (12)

$$\lambda = \frac{\int_{-W}^{W} |H(f)|^2\, df}{\int_{-1/(2\Delta)}^{1/(2\Delta)} |H(f)|^2\, df}.$$

The first Slepian sequence corresponds to the largest λ. The second Slepian sequence maximizes λ among sequences orthogonal to the first Slepian sequence, and one can similarly construct an arbitrarily large set of Slepian sequences.21,31 There are many existing implementations for the construction of Slepian sequences. We use the R package multitaper 1.0–12 for Slepian sequence calculation that is based on a tridiagonal function method10 and LAPACK function calls.32

The spectral leakage associated with using $\{h_k\}$ as a normalized taper for data sequence $\{w_k\}$ can be studied based on Eq. (12). As we discussed in Sec. 2, a taper with good leakage resistance should have a large central lobe within $|f| \le W$.10 The first few Slepian sequences have good leakage resistance because they concentrate nearly all of the energy of $H(f)$ in the region $|f| \le W$. How many orthogonal tapers can we select? The first few Slepian sequences obtained by optimizing Eq. (12) are known to have $\lambda$ close to one, and they therefore offer good spectral leakage protection. The higher-order Slepian sequences may not help with spectral leakage. In practice, the number $T$ of orthogonal tapers satisfies $T < 2NW$.21
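To make the concentration measure of Eq. (12) concrete, the sketch below builds Slepian sequences from the standard symmetric tridiagonal eigenproblem and evaluates $\lambda$ in closed form in the time domain. It is an illustration only: the function names are hypothetical, a dense eigen-solver replaces the specialized tridiagonal routines a production package would use, and frequencies are expressed in cycles per sample (that is, $W\Delta$).

```python
import numpy as np

def dpss_tapers(n, nw, n_tapers):
    """First n_tapers Slepian sequences of length n with time-bandwidth product nw."""
    w = nw / n                                    # half-bandwidth in cycles per sample
    k = np.arange(n)
    diag = ((n - 1 - 2 * k) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = k[1:] * (n - k[1:]) / 2.0
    matrix = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(matrix)              # eigenvalues come back in ascending order
    return vecs[:, ::-1][:, :n_tapers].T          # rows are unit-norm tapers, lowest order first

def concentration(h, w):
    """Eq. (12) for taper h and half-bandwidth w in cycles per sample."""
    h = np.asarray(h, dtype=float)
    lag = np.subtract.outer(np.arange(h.size), np.arange(h.size))
    kernel = 2.0 * w * np.sinc(2.0 * w * lag)     # equals sin(2*pi*w*lag) / (pi*lag)
    return float(h @ kernel @ h / (h @ h))
```

For example, `concentration(np.ones(n), 4 / n)` gives the value for a rectangular taper at $NW = 4$, and the rows of `dpss_tapers(n, 4, 6)` can be passed to the same function one at a time.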

Thomson21 also proposed a generalization of Eq. (11). The use of frequency-dependent weights $d_t(f)$, $t = 1, 2, \ldots, T$, can reduce the bias from spectral leakage if the weights $\{d_t(f)\}$ are set to be close to 1 in the region where the spectrum is flat and are set to reduce the contribution from the higher-order tapers in the region where the spectrum has a large slope. The multitaper spectrum estimate with the weights $\{d_t(f)\}$ is given as

Eq. (13)

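$$\hat{S}_{\mathrm{multitaper}}(f) = \frac{1}{\sum_{t=1}^{T} |d_t(f)|^2} \sum_{t=1}^{T} |d_t(f)|^2\, \Delta \left| \sum_{k=0}^{N-1} h_k^{(t)} (w_k - \bar{w})\, e^{-i 2\pi f \Delta k} \right|^2.$$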

Riedel and Sidorenko22 proposed using the sinusoidal tapers for an alternate multitaper spectrum estimate. This approach is interesting because the tapers have an analytic expression and because they approximate the solution to an optimization problem discussed by Papoulis33 related to the bias error of a taper estimate as $N \to \infty$. The $t$’th sinusoidal taper has the following expression:

Eq. (14)

$$h_k^{(t)} = \sqrt{\frac{2}{N+1}} \sin\left[ \frac{\pi t (k+1)}{N+1} \right].$$

One can again consider spectrum estimates based on an average or a weighted average of the individual single-tapered estimates.
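Because the sinusoidal tapers of Eq. (14) have a closed form, the unweighted multitaper estimate of Eq. (11) is short to write down. The following is a minimal sketch with hypothetical function names, evaluated at the FFT frequencies only; it does not implement adaptive weights.

```python
import numpy as np

def sine_tapers(n, n_tapers):
    """Rows are the sinusoidal tapers of Eq. (14) for t = 1, ..., n_tapers."""
    k = np.arange(n)
    t = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * t * (k + 1) / (n + 1))

def multitaper(w, delta, tapers):
    """Unweighted multitaper estimate of Eq. (11) at the FFT frequencies."""
    w = np.asarray(w, dtype=float)
    centered = w - w.mean()
    # One tapered periodogram per row of `tapers`, then the arithmetic mean.
    eigenspectra = delta * np.abs(np.fft.fft(tapers * centered, axis=1)) ** 2
    return np.fft.fftfreq(w.size, d=delta), eigenspectra.mean(axis=0)
```

Any other set of orthonormal tapers, stored one per row, can be passed to `multitaper` in place of the sinusoidal ones.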

The performance of Slepian sequences for this latter optimization problem on “local bias” depends on the choice of the effective bandwidth W. Increasing W generally increases the local bias but tends to reduce the variance from random errors since one can use more orthogonal tapers with λ value close to one.

4. Simulations

For our simulations we consider the periodogram, the modified periodogram using the Welch window, the original WOSA method with the Welch window and the variant with circular overlap25 with the Welch window, Thomson’s multitaper method using DPSS tapers with and without adaptive weights, and sinusoidal tapers without adaptive weights.

To study the performance of different spectrum estimators, we simulate random rough lines using the Thorsos method.34,35 We consider the K-correlation model or the Palasantzas power spectral density model36

Eq. (15)

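$$\mathrm{PSD}(f) = \frac{\sqrt{\pi}\, \Gamma(\alpha + 0.5)}{\Gamma(\alpha)} \cdot \frac{2 \sigma^2 \xi}{[1 + (2\pi f \xi)^2]^{\alpha + 0.5}},$$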
where $\sigma$ represents LER, $\xi$ is the correlation length, and $\alpha$ is the roughness (or Hurst) exponent. We follow Ref. 7 in choosing $\sigma = 1.5$ nm, $\xi = 25$ nm, and $\alpha = 0.75$, which are typical values observed in the experimental measurements. We generate 4096 or 20,480 edge positions for each line with the sampling distance $\Delta = 1$ nm and zero mean position $\bar{w}$. To incorporate some of the effects of leakage and SEM noise in our simulation, we use the middle 2048 edge positions for each line. These positions are then corrupted by additive white Gaussian noise.11,37–39 To evaluate the performance of LER power spectrum estimation, we consider confidence intervals, i.e., error bars, relative bias [see Eq. (18)], and spectrum concentration [see Eq. (12)]. We consider error bars to be the most important among these metrics for the application of LER metrology because they are related to random error. One generally can only analyze a relatively small number of lines where random errors are significant. For error bars and relative bias, we offer average results over all frequencies with $\sigma_{\mathrm{noise}} = 0$ nm. For error bars, we also offer results under four other noise levels $\sigma_{\mathrm{noise}} \in \{0.5, 1.0, 1.5, 2.0\}$ nm and for three frequency regions specified in a recent paper by Levi et al.,18 which defines the low-frequency region as all frequencies below $1/200\ \mathrm{nm}^{-1}$, the middle-frequency region as all frequencies between $1/200$ and $1/20\ \mathrm{nm}^{-1}$, and the high-frequency region as all frequencies above $1/20\ \mathrm{nm}^{-1}$. For spectrum concentration, we offer results for the rectangular, Welch, DPSS, and sinusoidal tapers.
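The simulation setup can be sketched as follows. This is a generic spectral-synthesis illustration in the spirit of the Thorsos method, not a reproduction of the authors' code: the function names are hypothetical, and the normalization of the random Fourier coefficients is an assumption chosen so that the expected periodogram of the full synthesized line matches Eq. (15).

```python
import math
import numpy as np

def palasantzas_psd(f, sigma=1.5, xi=25.0, alpha=0.75):
    """Eq. (15); f in 1/nm, sigma and xi in nm, result in nm^3."""
    scale = math.sqrt(math.pi) * math.gamma(alpha + 0.5) / math.gamma(alpha)
    f = np.asarray(f, dtype=float)
    return scale * 2.0 * sigma ** 2 * xi / (1.0 + (2.0 * np.pi * f * xi) ** 2) ** (alpha + 0.5)

def simulate_line(n_total, n_keep, delta, sigma_noise, rng):
    """Synthesize one rough edge, keep the middle n_keep points, add white noise."""
    freqs = np.fft.rfftfreq(n_total, d=delta)
    # Assumed normalization: E|coefficient|^2 = n_total * PSD(f) / delta.
    amplitude = np.sqrt(n_total * palasantzas_psd(freqs) / delta)
    noise = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)
    coeffs = amplitude * noise / np.sqrt(2.0)
    coeffs[0] = 0.0                               # zero mean edge position
    line = np.fft.irfft(coeffs, n=n_total)        # real-valued edge positions
    start = (n_total - n_keep) // 2
    edge = line[start : start + n_keep]
    return edge + sigma_noise * rng.normal(size=n_keep)
```

A call such as `simulate_line(4096, 2048, 1.0, 0.5, np.random.default_rng(0))` mirrors the case of 2048 points taken from a 4096-point line with $\sigma_{\mathrm{noise}} = 0.5$ nm.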

The spectrum estimate of a random rough line can be treated as a random process. For example, the periodogram or modified periodogram at each frequency $f$ follows a $\chi^2$ distribution for rough lines that follow a normal distribution, and this is a common approximation for the other spectrum estimates that we consider. We choose error bars based on an estimate of the 95% confidence interval at each frequency point.13 The lower bound $\mathrm{CI}_{\mathrm{lower}}(f)$ and upper bound $\mathrm{CI}_{\mathrm{upper}}(f)$ of the 95% confidence interval can be estimated by

Eq. (16)

$$\mathrm{CI}_{\mathrm{lower}}(f) = \frac{\nu \hat{S}(f)}{\chi^2_{1-0.025}(\nu)},$$

Eq. (17)

$$\mathrm{CI}_{\mathrm{upper}}(f) = \frac{\nu \hat{S}(f)}{\chi^2_{0.025}(\nu)},$$
where $\nu$ denotes the appropriate degrees of freedom (d.o.f.) for the $\chi^2$ distribution. For power spectrum estimates obtained by averaging the periodograms or modified periodograms corresponding to $M$ rough lines, it is known that $\nu \le 2M$. For power spectrum estimates determined by averaging $M$ classical WOSA method estimates or $M$ circular WOSA method estimates with $T$ segments per line or by averaging $M$ multitaper estimates with $T$ tapers per line, it is known that $\nu \le 2MT$. It is of interest to determine the empirical fraction of instances where a true power spectrum value $S(f)$ falls into the estimated confidence interval $[\mathrm{CI}_{\mathrm{lower}}(f), \mathrm{CI}_{\mathrm{upper}}(f)]$; we call this fraction the coverage rate, and in theory, it should be 0.95. For the periodogram and modified periodogram, we choose $\nu = 2M$. For classical WOSA, circular WOSA, and the multitaper methods, we initially set $\nu = 2MT$ and subsequently lower this to match or surpass the average coverage rate of the periodogram or modified periodogram in the three different frequency regions specified earlier. By adjusting $\nu$, we simultaneously change the widths of the estimated confidence intervals and the coverage rates.
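The interval of Eqs. (16) and (17) can be sketched with the standard library alone. Note the substitution: the $\chi^2$ quantile below uses the Wilson-Hilferty approximation rather than an exact quantile function, so the numbers are approximate; with SciPy available, `scipy.stats.chi2.ppf` would be the exact alternative. The function names are hypothetical.

```python
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the p-quantile of a chi-squared variable."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def confidence_interval(s_hat, dof, level=0.95):
    """Eqs. (16) and (17) for a spectrum estimate s_hat with dof degrees of freedom."""
    tail = (1.0 - level) / 2.0
    lower = dof * s_hat / chi2_quantile(1.0 - tail, dof)
    upper = dof * s_hat / chi2_quantile(tail, dof)
    return lower, upper
```

For instance, 26 lines with a single taper correspond to `dof = 2 * 26 = 52`, while 14 lines with six tapers per line give the nominal `dof = 2 * 14 * 6 = 168` before any downward adjustment. Raising `dof` narrows the interval, which is why lowering $\nu$ trades width for coverage.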

In Table 1, we compile the average coverage rate and confidence interval width over all frequency points for eight power spectrum estimation methods when σnoise=0.5  nm. Each simulated rough line consists of 2048 points and comes from a longer line with 4096 points with sampling distance Δ=1  nm. Each power spectrum estimate is calculated from the average of the individual spectrum estimates from 26 or 14 lines. We do 1000 simulations per technique. The appendix provides analogous results for the low-, middle-, and high-frequency regions specified earlier. The multitaper methods consistently offer the best error bar performance among the eight methods. The WOSA variant with circular overlap and the classical WOSA methods are the next most effective techniques. Finally, the periodogram and modified periodogram using a single taper provided the worst error bar performances among the eight techniques.

Table 1

Power spectrum estimates when $\sigma_{\mathrm{noise}} = 0.5$ nm. The parameters of the Palasantzas model are LER = 1.5 nm, correlation length = 25 nm, and roughness exponent = 0.75 (Ref. 7). Each confidence interval is computed in terms of a $\chi^2$ distribution as discussed in Ref. 13. The d.o.f. are reported within the table.

| Method | No. of lines per estimate | Coverage rate/d.o.f. | Average width of confidence interval (nm³)/d.o.f. | Coverage rate/d.o.f. | Average width of confidence interval (nm³)/d.o.f. |
| --- | --- | --- | --- | --- | --- |
| Periodogram (left) and modified periodogram with single Welch taper (right) | 26 | 0.9489/52 | 2.12/52 | 0.9499/52 | 2.12/52 |
| Multisegment: classical WOSA with three segments per line | 14 | 0.9320/84 | 1.66/84 | 0.9743/56 | 2.09/56 |
| Multisegment: circular WOSA with four segments per line | 14 | 0.9142/112 | 1.43/112 | 0.9697/70 | 1.84/70 |
| Multisegment: classical WOSA with three segments per line | 26 | 0.9315/156 | 1.20/156 | 0.9742/104 | 1.49/104 |
| Multisegment: circular WOSA with four segments per line | 26 | 0.9126/208 | 1.03/208 | 0.9693/130 | 1.32/130 |
| Multitaper: six DPSS tapers per line and adaptive weights | 14 | 0.9493/168 | 1.12/168 | 0.9689/138 | 1.24/138 |
| Multitaper: six sinusoidal tapers per line and nonadaptive weights | 14 | 0.9490/168 | 1.12/168 | 0.9687/138 | 1.24/138 |

As we mentioned earlier, the bias and leakage of spectrum estimates are also of interest, and different techniques offer different trade-offs among performance metrics. Therefore, we will also mention the relative bias and the spectrum concentration associated with spectrum estimation. We define εbias(f) as follows:

Eq. (18)

$$\varepsilon_{\mathrm{bias}}(f) = \frac{\hat{S}(f) - S(f)}{S(f)}.$$

This expression is related to the $\varepsilon_{\mathrm{alias}}$ and $\varepsilon_{\mathrm{leakage}}$ parameters defined in Ref. 11. The spectrum concentration is evaluated using Eq. (12) in terms of $\lambda$ for a given bandwidth $W$. To evaluate $\varepsilon_{\mathrm{bias}}$, we obtain each spectrum estimate from the average of 10,000 individual spectrum estimates with 2048 points per line and sampling distance $\Delta = 1$ nm. The 2048 points of each line are taken from a longer line with 4096 points generated using the Thorsos method applied to the Palasantzas model. We obtain $\lambda$ for each taper using simulated sequences of 20,480 points with the middle $N = 2048$ points assigned with taper values and the remaining points assigned a value of zero.

In Fig. 2, we illustrate six power spectrum estimates assuming σnoise=0.5  nm. We will next offer more details about the simulations for the various power spectrum estimates.

Fig. 2

Spectrum estimates: (a) periodogram and (b) modified periodogram with the Welch taper over 26 lines; (c, d) WOSA with three overlapped segments and with four circularly overlapped segments over 26 lines; (e) multitaper with DPSS tapers and adaptive weights, and (f) sinusoidal tapers without adaptive weights over 14 lines. Here σnoise=0.5  nm. The dashed and curved lines are the K-correlation model used in the Thorsos method to generate random roughness. The solid and curved lines are the power spectrum estimates.


For the simulations using the periodogram and the modified periodogram with a Welch window, every spectrum estimate comes from the average of 26 lines; we consider the 95% confidence intervals based on a chi-squared distribution with 52 d.o.f. We do 1000 simulations per technique. We report the average widths of confidence intervals and coverage rates over all frequency points in Table 2. The estimated confidence intervals based on the modified periodogram with the Welch window are, on average, nearly equal in width to those based on the periodogram, but the periodogram tends to have worse coverage rates because of its higher potential for spectral leakage.

Table 2

Overall results from 1000 simulations with one spectrum estimate coming from 26 lines for the periodogram and the modified periodogram with a Welch window. The parameters of the Palasantzas model are LER = 1.5 nm, correlation length = 25 nm, and roughness exponent = 0.75 (Ref. 7). Each 95% confidence interval is computed in terms of a $\chi^2$ distribution with 52 d.o.f. as discussed in Ref. 13.

| $\sigma_{\mathrm{noise}}$ (nm) | SNR (dB) | Coverage rate of periodogram | Average width of confidence interval (nm³) | Coverage rate of Welch window | Average width of confidence interval (nm³) |
| --- | --- | --- | --- | --- | --- |
| 0 | ∞ | 0.9156 | 1.92 | 0.9498 | 1.91 |
| 0.5 | 9.542 | 0.9489 | 2.12 | 0.9499 | 2.12 |
| 1.0 | 3.522 | 0.9489 | 2.74 | 0.9496 | 2.74 |
| 1.5 | 0 | 0.9496 | 3.78 | 0.9499 | 3.78 |
| 2.0 | −2.499 | 0.9496 | 5.22 | 0.9495 | 5.22 |

The simulation values of $\varepsilon_{\mathrm{bias}}$ for the periodogram and the modified periodogram with the Welch taper are $\varepsilon_{\mathrm{bias}} = 0.0836716$ and $\varepsilon_{\mathrm{bias}} = -0.0000699$, respectively. The simulation values of $\lambda$ for the rectangular taper in a periodogram and the Welch single taper in a modified periodogram assuming $N = 2048$ and $W = 4/2048\ \mathrm{nm}^{-1}$ are $\lambda = 0.974749$ and $\lambda = 0.999627$, respectively.

We next report results on the WOSA method and the variant with circular overlap. We choose the Welch taper and r=50%. Every segment has Nseg=1024 points. We average over 14 lines (see Table 3) or 26 lines (see Table 4) to obtain one spectrum estimate. As discussed in Eqs. (5) and (6), for a line of N=2048 points, we use three segments per line for the classical WOSA method and four segments per line for the variant with circular overlap. The simulations with the WOSA methods consistently attain better performance than those for the single-taper spectrum estimates. As expected, we attain better performance by averaging over 26 lines than by averaging over 14 lines. The WOSA method with circular overlap is better than the classical WOSA method in terms of confidence interval widths and coverage rates. However, we will see that the classical WOSA method has a smaller bias than the circular variation for the simulated data we consider.

Table 3

Overall results from 1000 simulations with one spectrum estimate coming from 14 lines. The results for the classical WOSA method assume Welch windows, $r = 50\%$, and three segments per line. For the variant with circular overlap, assume Welch windows, $r = 50\%$, and four segments per line. The parameters of the Palasantzas model are LER = 1.5 nm, correlation length = 25 nm, and roughness exponent = 0.75 (Ref. 7). Each confidence interval is computed in terms of a $\chi^2$ distribution as discussed in Ref. 13. The d.o.f. are reported within the table.

| Method | $\sigma_{\mathrm{noise}}$ (nm) | SNR (dB) | Coverage rate/d.o.f. | Average width of confidence interval (nm³)/d.o.f. | Coverage rate/d.o.f. | Average width of confidence interval (nm³)/d.o.f. |
| --- | --- | --- | --- | --- | --- | --- |
| Classical WOSA | 0 | ∞ | 0.9739/56 | 1.89/56 | 0.9310/84 | 1.51/84 |
| Classical WOSA | 0.5 | 9.542 | 0.9743/56 | 2.09/56 | 0.9320/84 | 1.66/84 |
| Classical WOSA | 1 | 3.522 | 0.9739/56 | 2.68/56 | 0.9313/84 | 2.14/84 |
| Classical WOSA | 1.5 | 0 | 0.9743/56 | 3.67/56 | 0.9315/84 | 2.93/84 |
| Classical WOSA | 2 | −2.499 | 0.9744/56 | 5.06/56 | 0.9317/84 | 4.04/84 |
| Circular WOSA | 0 | ∞ | 0.9692/70 | 1.67/70 | 0.9133/112 | 1.29/112 |
| Circular WOSA | 0.5 | 9.542 | 0.9697/70 | 1.84/70 | 0.9142/112 | 1.43/112 |
| Circular WOSA | 1 | 3.522 | 0.9694/70 | 2.37/70 | 0.9135/112 | 1.84/112 |
| Circular WOSA | 1.5 | 0 | 0.9697/70 | 3.24/70 | 0.9126/112 | 2.51/112 |
| Circular WOSA | 2 | −2.499 | 0.9698/70 | 4.46/70 | 0.9132/112 | 3.46/112 |

Table 4

Overall results from 1000 simulations with one spectrum estimate coming from 26 lines. The results for the classical WOSA method assume Welch windows, $r = 50\%$, and three segments per line. For the variant with circular overlap, assume Welch windows, $r = 50\%$, and four segments per line. The parameters of the Palasantzas model are LER = 1.5 nm, correlation length = 25 nm, and roughness exponent = 0.75 (Ref. 7). Each confidence interval is computed in terms of a $\chi^2$ distribution as discussed in Ref. 13. The d.o.f. are reported within the table.

| Method | $\sigma_{\mathrm{noise}}$ (nm) | SNR (dB) | Coverage rate/d.o.f. | Average width of confidence interval (nm³)/d.o.f. | Coverage rate/d.o.f. | Average width of confidence interval (nm³)/d.o.f. |
| --- | --- | --- | --- | --- | --- | --- |
| Classical WOSA | 0 | ∞ | 0.9742/104 | 1.35/104 | 0.9310/156 | 1.09/156 |
| Classical WOSA | 0.5 | 9.542 | 0.9742/104 | 1.49/104 | 0.9315/156 | 1.20/156 |
| Classical WOSA | 1 | 3.522 | 0.9737/104 | 1.90/104 | 0.9305/156 | 1.54/156 |
| Classical WOSA | 1.5 | 0 | 0.9739/104 | 2.62/104 | 0.9311/156 | 2.11/156 |
| Classical WOSA | 2 | −2.499 | 0.9744/104 | 3.60/104 | 0.9317/156 | 2.90/156 |
| Circular WOSA | 0 | ∞ | 0.9690/130 | 1.19/130 | 0.9135/208 | 0.93/208 |
| Circular WOSA | 0.5 | 9.542 | 0.9693/130 | 1.32/130 | 0.9126/208 | 1.03/208 |
| Circular WOSA | 1 | 3.522 | 0.9693/130 | 1.69/130 | 0.9123/208 | 1.32/208 |
| Circular WOSA | 1.5 | 0 | 0.9694/130 | 2.32/130 | 0.9131/208 | 1.82/208 |
| Circular WOSA | 2 | −2.499 | 0.9696/130 | 3.20/130 | 0.9128/208 | 2.50/208 |

We report the simulation values of εbias for the classical WOSA methods and the variant with circular overlap in Table 5. The results for the periodogram and the single Welch taper are also listed here for comparison. While the WOSA methods have better variance than the periodogram and the modified periodogram with a single Welch taper for a line with the same number of samples and distance between samples, the frequency resolution of WOSA methods is worse because of the decreased length of individual segments.

Table 5

Average simulation values of $\varepsilon_{\mathrm{bias}}$ from 10,000 randomly generated lines for the periodogram, the modified periodogram with the Welch taper, the classical WOSA method with three segments per line, and the variant with circular overlap and four segments per line when $\sigma_{\mathrm{noise}} = 0$ nm. The parameters of the Palasantzas model are LER = 1.5 nm, correlation length = 25 nm, and roughness exponent = 0.75 (Ref. 7).

| | Periodogram | Single Welch taper | Classical WOSA | Circular WOSA |
| --- | --- | --- | --- | --- |
| $\varepsilon_{\mathrm{bias}}$ | 0.0836716 | −0.0000699 | 0.0000342 | 0.0001851 |

We conclude this section with some remarks on the multitaper methods. The choices of $NW$ and the number of tapers used are not unique. In examining error bar performance, we use six $NW = 4$ DPSS tapers and six sinusoidal tapers (see Fig. 3). We produce one spectrum estimate by averaging individual estimates from 14 lines; we initially consider 168 d.o.f. and subsequently lower this by 20 or 30 to match or surpass the coverage rates of the first two techniques. The results are summarized in Table 6 and demonstrate the superiority of the multitaper method over the periodogram, the modified periodogram with a Welch window, and the WOSA methods.

Fig. 3

(a) The first to third and (b) the fourth to sixth lowest order Slepian sequences of length 2048 with NW=4. (c) The first to third and (d) the fourth to sixth lowest order sinusoidal tapers of length 2048.


Table 6

Overall results from 1000 simulations with one spectrum estimate coming from 14 lines for the multitaper method with Slepian tapers and sinusoidal tapers. The parameters of the Palasantzas model are LER = 1.5 nm, correlation length = 25 nm, and roughness exponent = 0.75 (Ref. 7). Each confidence interval is computed in terms of a $\chi^2$ distribution as discussed in Ref. 13. The d.o.f. are reported within the table.

| Tapers | $\sigma_{\mathrm{noise}}$ (nm) | SNR (dB) | Coverage rate (d.o.f. = 168) | Average width of confidence interval (nm³) (d.o.f. = 168) | Coverage rate/d.o.f. | Average width of confidence interval (nm³)/d.o.f. |
| --- | --- | --- | --- | --- | --- | --- |
| Slepian (DPSS) | 0 | ∞ | 0.9359 | 1.01 | 0.9516/148 | 1.08/148 |
| Slepian (DPSS) | 0.5 | 9.542 | 0.9493 | 1.12 | 0.9689/138 | 1.24/138 |
| Slepian (DPSS) | 1.0 | 3.522 | 0.9486 | 1.45 | 0.9684/138 | 1.61/138 |
| Slepian (DPSS) | 1.5 | 0 | 0.9492 | 2.00 | 0.9687/138 | 2.22/138 |
| Slepian (DPSS) | 2.0 | −2.499 | 0.9487 | 2.77 | 0.9686/138 | 3.07/138 |
| Sinusoidal | 0 | ∞ | 0.9490 | 1.01 | 0.9625/148 | 1.08/148 |
| Sinusoidal | 0.5 | 9.542 | 0.9490 | 1.12 | 0.9687/138 | 1.24/138 |
| Sinusoidal | 1 | 3.522 | 0.9497 | 1.45 | 0.9691/138 | 1.61/138 |
| Sinusoidal | 1.5 | 0 | 0.9495 | 2.00 | 0.9689/138 | 2.21/138 |
| Sinusoidal | 2 | −2.499 | 0.9488 | 2.76 | 0.9683/138 | 3.06/138 |

The effective bandwidth and the number of tapers can be adjusted to obtain different trade-offs between bias and spectrum concentration. Table 7 lists the average simulation value of εbias over all frequencies for the sinusoidal tapers and the DPSS tapers. There is a tendency for εbias to increase with the number of tapers, but we have seen that the multitaper methods reduce the variance of the estimates when one has access to relatively few lines. The sinusoidal tapers have smaller εbias than the DPSS tapers since the sinusoidal tapers approximate the minimum bias tapers.

Table 7

Average simulation values of $\varepsilon_{\mathrm{bias}}$ from 10,000 randomly generated lines for the DPSS tapers and the sinusoidal tapers when $\sigma_{\mathrm{noise}} = 0$ nm. The parameters of the Palasantzas model are LER = 1.5 nm, correlation length = 25 nm, and roughness exponent = 0.75 (Ref. 7). The corresponding results for the periodogram and the modified periodogram with a single Welch taper are $\varepsilon_{\mathrm{bias}} = 0.0836716$ and $\varepsilon_{\mathrm{bias}} = -0.0000699$, respectively.

| $T$ | Sinusoidal | DPSS ($NW = 2$) | DPSS ($NW = 4$) | DPSS ($NW = 4$, adaptive) | DPSS ($NW = 8$) |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.0000686 | 0.0000004 | 0.0000724 | 0.0000724 | 0.0001705 |
| 2 | 0.0000087 | 0.0011432 | 0.0002295 | 0.0002295 | 0.0004815 |
| 3 | 0.0000186 | 0.0111442 | 0.0002141 | 0.0002140 | 0.0006259 |
| 4 | 0.0002364 | 0.0535332 | 0.0003480 | 0.0003240 | 0.0008320 |
| 5 | 0.0003881 | 0.1404533 | 0.0006003 | 0.0011971 | 0.0010770 |
| 6 | 0.0006640 | 0.2512091 | 0.0026047 | 0.0035863 | 0.0013110 |
| 7 | 0.0008845 | 0.3641771 | 0.0142636 | 0.0038643 | 0.0014573 |

Table 8 lists the spectrum concentration results in terms of $\lambda$ for the first seven $NW = 4$ DPSS tapers and the first seven sinusoidal tapers with $N = 2048$, $W = 4/2048\ \mathrm{nm}^{-1}$, and sampling distance $\Delta = 1$ nm. As expected, the DPSS tapers offer better results because they are obtained from the original spectrum concentration problem.

Table 8

The simulation values of λ for the first seven NW=4 DPSS tapers and the first seven sinusoidal tapers with N=2048, W=4/2048  nm−1, and sampling distance Δ=1  nm. The corresponding values of λ for the rectangular window in a periodogram and the Welch single taper in a modified periodogram under same setting are λ=0.974749 and λ=0.999627, respectively.

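| t | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| DPSS | 1.000000 | 1.000000 | 0.999999 | 0.999973 | 0.999520 | 0.993893 | 0.947127 |
| Sinusoidal | 0.999747 | 0.998883 | 0.997332 | 0.994229 | 0.989002 | 0.976673 | 0.937289 |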
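The spectrum concentration comparison in Table 8 can be explored numerically with a short sketch. It assumes that λ is the fraction of a taper's spectral energy lying in [−W, W], which for a taper v sampled at Δ=1 nm equals the Rayleigh quotient vᵀAv/vᵀv with A[n, m] = sin(2πW(n−m))/(π(n−m)); under that assumption the DPSS concentrations are the largest eigenvalues of A. The function names, the NumPy-only approach, and the sinusoidal taper formula sqrt(2/(N+1))·sin(πkn/(N+1)) are choices made for this illustration and are not taken from the authors' code.

```python
import numpy as np

def concentration_matrix(n_samples, half_bandwidth):
    """A[n, m] = sin(2*pi*W*(n-m)) / (pi*(n-m)), with 2W on the diagonal."""
    idx = np.arange(n_samples)
    diff = idx[:, None] - idx[None, :]
    # np.sinc(x) = sin(pi*x)/(pi*x), so 2W*sinc(2W*d) = sin(2*pi*W*d)/(pi*d).
    return 2.0 * half_bandwidth * np.sinc(2.0 * half_bandwidth * diff)

def sinusoidal_tapers(n_samples, n_tapers):
    """Rows are sinusoidal tapers k = 1..n_tapers (assumed standard form)."""
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))

def concentrations(n_samples=2048, nw=4.0, n_tapers=7):
    """Return (lambda_dpss, lambda_sinusoidal) for the first n_tapers tapers."""
    w = nw / n_samples  # half-bandwidth in cycles per sample (Delta = 1 nm)
    a = concentration_matrix(n_samples, w)
    # DPSS concentrations: the largest eigenvalues of A, in descending order.
    lam_dpss = np.linalg.eigvalsh(a)[::-1][:n_tapers]
    # Sinusoidal tapers: Rayleigh quotient of each taper with respect to A.
    v = sinusoidal_tapers(n_samples, n_tapers)
    lam_sin = np.einsum("kn,nm,km->k", v, a, v) / np.sum(v ** 2, axis=1)
    return lam_dpss, lam_sin
```

Calling `concentrations()` with the defaults uses the Table 8 setting (N=2048, NW=4). A smaller `n_samples` with the same NW runs faster and should give similar values, since the concentration depends mainly on the product NW.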

5.

Conclusions

Most of the metrology literature that mentions power spectrum estimation refers only to the periodogram, which is over a century old. We propose using the more modern multitaper and multisegment spectral estimation techniques. For LER metrology over a relatively small group of lines, we assess the effectiveness of a spectrum estimate by the widths of confidence intervals and by the experimental coverage rates. For a broader discussion of the trade-offs among different techniques, we also considered the relative bias and spectrum concentration of the various estimates. We investigated the performance of the periodogram, the modified periodogram with the Welch taper, Welch’s overlapped segment averaging (WOSA) method and its variant with circular overlap using the Welch taper, and the multitaper methods using DPSS tapers with adaptive weights and sinusoidal tapers without adaptive weights for the spectrum estimation of random rough lines at five different noise levels. The multitaper methods offer the smallest average error bars among these techniques at a given coverage rate, while the WOSA method is not quite as effective but may be a better candidate for in-line metrology because of an existing hardware implementation. These results hold not only when we consider the average performance over all frequencies but also within the low-, middle-, and high-frequency ranges. The average widths of confidence intervals invariably increased with increasing noise, so improvements in power spectrum estimation may be even more important at high noise levels.
In recent work, we used the multitaper method in an attempt to reduce metrology errors in LER estimation from simulated low-dose SEM images.40 For a future direction of research, we observe that the WOSA variant with circular overlap is potentially interesting for estimating the power spectrum of contact edge roughness since the edge points of a contact hole have approximately a circular shape, i.e., the first and last sampled edge positions are close in space.41
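To make the difference between classical and circular overlap concrete, the following is a minimal sketch rather than the authors' implementation. It assumes 50% overlap with segments of half the record length, which is consistent with the three classical and four circular segments per line reported in the appendix tables, and it assumes one common form of the Welch taper and a two-sided normalization S(f)=Δ|Σ w x e^{−i2πfnΔ}|²/Σw². The names `welch_taper` and `wosa_psd` are hypothetical.

```python
import numpy as np

def welch_taper(m):
    """Parabolic (Welch) taper of length m; one common definition (assumed)."""
    n = np.arange(m)
    return 1.0 - ((n - 0.5 * (m - 1)) / (0.5 * (m + 1))) ** 2

def wosa_psd(x, seg_len, step, circular=False, delta=1.0):
    """Average modified periodograms over overlapping segments.

    With circular=True the record is treated as periodic, so segments that
    start near the end wrap around to the first samples.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if circular:
        starts = np.arange(0, n, step)
        padded = np.concatenate([x, x[:seg_len]])  # wrap-around copy
    else:
        starts = np.arange(0, n - seg_len + 1, step)
        padded = x
    w = welch_taper(seg_len)
    segs = np.stack([padded[s:s + seg_len] for s in starts])
    spectra = np.abs(np.fft.rfft(segs * w, axis=1)) ** 2
    # Two-sided normalization: white noise of variance s2 gives about s2*delta.
    psd = delta * spectra.mean(axis=0) / np.sum(w ** 2)
    freqs = np.fft.rfftfreq(seg_len, d=delta)
    return freqs, psd, len(starts)
```

For a record of 2048 samples with `seg_len=1024` and `step=512`, the classical variant yields three segments and the circular variant four. The wrap-around segment only makes physical sense when the first and last samples are neighbors, as for the edge of a contact hole.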

Appendices

Appendix A:

Simulations for the Low-Frequency Region

In Table 9, we compile the average coverage rate and confidence interval width over points in the low-frequency region for eight power spectrum estimation methods when σnoise=0.5  nm. The multitaper methods offer the best error bar performance among the eight methods in the low-frequency region.

Table 9

Power spectrum estimates when σnoise=0.5 nm. The low-frequency region is defined as all frequencies below 1/200 nm−1. The parameters of the Palasantzas model are LER=1.5 nm, correlation length=25 nm, and roughness exponent=0.75 (Ref. 7). Each confidence interval is computed in terms of a χ² distribution as discussed in Ref. 13. The d.o.f. are reported within the table.

| Method | No. of lines per estimate | Coverage rate/d.o.f. (left) | Average width of confidence interval (nm³)/d.o.f. (left) | Coverage rate/d.o.f. (right) | Average width of confidence interval (nm³)/d.o.f. (right) |
|---|---|---|---|---|---|
| Periodogram (left) and modified periodogram with single Welch taper (right) | 26 | 0.9370/52 | 102.06/52 | 0.9361/52 | 102.97/52 |
| Multisegment: classical WOSA with three segments per line | 14 | 0.9070/84 | 80.39/84 | 0.9550/56 | 100.69/56 |
| Multisegment: circular WOSA with four segments per line | 14 | 0.8888/112 | 68.65/112 | 0.9534/70 | 88.51/70 |
| Multisegment: classical WOSA with three segments per line | 26 | 0.9066/156 | 57.77/156 | 0.9580/104 | 71.61/104 |
| Multisegment: circular WOSA with four segments per line | 26 | 0.8840/208 | 49.72/208 | 0.9474/130 | 63.42/130 |
| Multitaper: six DPSS tapers per line and adaptive weights | 14 | 0.9158/168 | 53.68/168 | 0.9434/138 | 59.52/138 |
| Multitaper: six sinusoidal tapers per line and nonadaptive weights | 14 | 0.9187/168 | 53.57/168 | 0.9444/138 | 59.40/138 |
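The relation between the d.o.f. and the width of a confidence interval can be illustrated with a small sketch. It assumes the textbook form of the χ² interval for a spectrum estimate Ŝ with ν d.o.f., [νŜ/χ²(1−α/2; ν), νŜ/χ²(α/2; ν)], and a nominal 95% level; whether this matches every detail of the procedure in Ref. 13 is an assumption. The χ² quantiles use the Wilson-Hilferty approximation to stay within the Python standard library, and the function names are hypothetical. The helper `nominal_dof` computes 2 × (lines) × (segments or tapers per line), which reproduces the left-hand d.o.f. in the table; the smaller right-hand values are not modeled here.

```python
from statistics import NormalDist

def nominal_dof(n_lines, estimates_per_line):
    """Nominal d.o.f.: two per approximately independent spectrum estimate."""
    return 2 * n_lines * estimates_per_line

def chi2_quantile_wh(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (assumed
    accurate enough for the d.o.f. of 50 or more considered here)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3

def psd_confidence_interval(s_hat, dof, level=0.95):
    """Chi-square confidence interval for the true spectrum given s_hat."""
    alpha = 1.0 - level
    lower = dof * s_hat / chi2_quantile_wh(1.0 - alpha / 2.0, dof)
    upper = dof * s_hat / chi2_quantile_wh(alpha / 2.0, dof)
    return lower, upper

# Example: relative interval widths for 26 single-taper lines versus
# 14 lines with six tapers each (values are relative to s_hat = 1).
for lines, per_line in [(26, 1), (14, 6)]:
    dof = nominal_dof(lines, per_line)
    lo, hi = psd_confidence_interval(1.0, dof)
    print(dof, round(hi - lo, 3))
```

Under these assumptions the interval width shrinks as the d.o.f. grow, which is the mechanism by which averaging over segments or tapers narrows the error bars for a fixed number of lines.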

Appendix B:

Simulations for the Middle-Frequency Region

In Table 10, we compile the average coverage rate and confidence interval width over points in the middle-frequency region for eight power spectrum estimation methods when σnoise=0.5  nm. The multitaper methods offer the best error bar performance among the eight methods in the middle-frequency region.

Table 10

Power spectrum estimates when σnoise=0.5 nm. The middle-frequency region is defined as all frequencies between 1/200 and 1/20 nm−1. The parameters of the Palasantzas model are LER=1.5 nm, correlation length=25 nm, and roughness exponent=0.75 (Ref. 7). Each confidence interval is computed in terms of a χ² distribution as discussed in Ref. 13. The d.o.f. are reported within the table.

| Method | No. of lines per estimate | Coverage rate/d.o.f. (left) | Average width of confidence interval (nm³)/d.o.f. (left) | Coverage rate/d.o.f. (right) | Average width of confidence interval (nm³)/d.o.f. (right) |
|---|---|---|---|---|---|
| Periodogram (left) and modified periodogram with single Welch taper (right) | 26 | 0.9471/52 | 9.96/52 | 0.9496/52 | 9.89/52 |
| Multisegment: classical WOSA with three segments per line | 14 | 0.9320/84 | 7.85/84 | 0.9737/56 | 9.83/56 |
| Multisegment: circular WOSA with four segments per line | 14 | 0.9139/112 | 6.75/112 | 0.9699/70 | 8.70/70 |
| Multisegment: classical WOSA with three segments per line | 26 | 0.9323/156 | 5.67/156 | 0.9743/104 | 7.03/104 |
| Multisegment: circular WOSA with four segments per line | 26 | 0.9133/208 | 4.88/208 | 0.9695/130 | 6.25/130 |
| Multitaper: six DPSS tapers per line and adaptive weights | 14 | 0.9459/168 | 5.30/168 | 0.9666/138 | 5.87/138 |
| Multitaper: six sinusoidal tapers per line and nonadaptive weights | 14 | 0.9463/168 | 5.28/168 | 0.9665/138 | 5.86/138 |

Appendix C:

Simulations for the High-Frequency Region

In Table 11, we compile the average coverage rate and confidence interval width over points in the high-frequency region for eight power spectrum estimation methods when σnoise=0.5  nm. The multitaper methods offer the best error bar performance among the eight methods in the high-frequency region.

Table 11

Power spectrum estimates when σnoise=0.5 nm. The high-frequency region is defined as all frequencies above 1/20 nm−1. The parameters of the Palasantzas model are LER=1.5 nm, correlation length=25 nm, and roughness exponent=0.75 (Ref. 7). Each confidence interval is computed in terms of a χ² distribution as discussed in Ref. 13. The d.o.f. are reported within the table.

| Method | No. of lines per estimate | Coverage rate/d.o.f. (left) | Average width of confidence interval (nm³)/d.o.f. (left) | Coverage rate/d.o.f. (right) | Average width of confidence interval (nm³)/d.o.f. (right) |
|---|---|---|---|---|---|
| Periodogram (left) and modified periodogram with single Welch taper (right) | 26 | 0.9492/52 | 0.26/52 | 0.9501/52 | 0.26/52 |
| Multisegment: classical WOSA with three segments per line | 14 | 0.9323/84 | 0.20/84 | 0.9746/56 | 0.25/56 |
| Multisegment: circular WOSA with four segments per line | 14 | 0.9145/112 | 0.17/112 | 0.9699/70 | 0.22/70 |
| Multisegment: classical WOSA with three segments per line | 26 | 0.9317/156 | 0.14/156 | 0.9743/104 | 0.18/104 |
| Multisegment: circular WOSA with four segments per line | 26 | 0.9129/208 | 0.12/208 | 0.9695/130 | 0.16/130 |
| Multitaper: six DPSS tapers per line and adaptive weights | 14 | 0.9501/168 | 0.14/168 | 0.9694/138 | 0.15/138 |
| Multitaper: six sinusoidal tapers per line and nonadaptive weights | 14 | 0.9496/168 | 0.14/168 | 0.9692/138 | 0.15/138 |

Acknowledgments

This work was supported in part by the National Science Foundation (NSF), Award No. ECCS-1201994. The second author is grateful to Y. Borodovsky and C. A. Mack for helpful conversations and to T. Groves for various correspondences.

References

1. H. J. Levinson, Principles of Lithography, 198 SPIE Press, Bellingham, Washington (2010).

2. L. Sun et al., “Line edge roughness frequency analysis for SAQP process,” Proc. SPIE, 9780 97801S (2016). http://dx.doi.org/10.1117/12.2229176

3. R.-H. Kim et al., “Application of EUV resolution enhancement techniques (RET) to optimize and extend single exposure bi-directional patterning for 7 nm and beyond logic designs,” Proc. SPIE, 9776 97761R (2016). http://dx.doi.org/10.1117/12.2219177

4. B. Su, E. Solecky and A. Vaid, Introduction to Metrology Applications in IC Manufacturing, TT101 SPIE Press, Bellingham, Washington (2015).

5. V. Constantoudis et al., “Quantification of line-edge roughness of photoresists. II. Scaling and fractal analysis and the best roughness descriptors,” J. Vac. Sci. Technol., 21 (3), 1019–1026 (2003). http://dx.doi.org/10.1116/1.1570844

6. A. Hiraiwa and A. Nishida, “Discrete power spectrum of line width roughness,” J. Appl. Phys., 106 (7), 074905 (2009). http://dx.doi.org/10.1063/1.3226883

7. T. Verduin, P. Kruit and C. W. Hagen, “Determination of line edge roughness in low-dose top-down scanning electron microscopy images,” J. Micro/Nanolithogr. MEMS MOEMS, 13 (3), 033009 (2014). http://dx.doi.org/10.1117/1.JMM.13.3.033009

8. E. Baravelli et al., “Impact of line-edge roughness on FinFET matching performance,” IEEE Trans. Electron Devices, 54 (9), 2466–2474 (2007). http://dx.doi.org/10.1109/TED.2007.902166

9. A. Asenov, S. Kaya and A. R. Brown, “Intrinsic parameter fluctuations in decananometer MOSFETs introduced by gate line edge roughness,” IEEE Trans. Electron Devices, 50 (5), 1254–1260 (2003). http://dx.doi.org/10.1109/TED.2003.813457

10. D. B. Percival and A. T. Walden, Spectral Analysis for Physical Applications, Cambridge University Press, Cambridge (1993).

11. C. A. Mack, “Systematic errors in the measurement of power spectral density,” J. Micro/Nanolithogr. MEMS MOEMS, 12 (3), 033016 (2013). http://dx.doi.org/10.1117/1.JMM.12.3.033016

12. B. D. Bunday and C. A. Mack, “Influence of metrology error in measurement of line edge roughness power spectral density,” Proc. SPIE, 9050 90500G (2014). http://dx.doi.org/10.1117/12.2047100

13. C. A. Mack, “More systematic errors in the measurement of power spectral density,” J. Micro/Nanolithogr. MEMS MOEMS, 14 (3), 033502 (2015). http://dx.doi.org/10.1117/1.JMM.14.3.033502

14. G. M. Jenkins and D. G. Watts, Spectral Analysis and Its Applications, Holden-Day, San Francisco (1968).

15. A. Hiraiwa and A. Nishida, “Spectral analysis of line edge and line-width roughness with long-range correlation,” J. Appl. Phys., 108 (3), 034908 (2010). http://dx.doi.org/10.1063/1.3466777

16. L. Sun et al., “Application of frequency domain line edge roughness characterization methodology in lithography,” Proc. SPIE, 9424 942404 (2015). http://dx.doi.org/10.1117/12.2086961

17. E. Dupuy et al., “Spectral analysis of sidewall roughness during resist-core self-aligned double patterning integration,” J. Vac. Sci. Technol. B, 34 (5), 051807 (2016). http://dx.doi.org/10.1116/1.4962322

18. S. Levi et al., “Edge roughness characterization of advanced patterning processes using power spectral density analysis (PSD),” Proc. SPIE, 9782 97820I (2016). http://dx.doi.org/10.1117/12.2220814

19. L. Azarnouche et al., “Unbiased line width roughness measurements with critical dimension scanning electron microscopy and critical dimension atomic force microscopy,” J. Appl. Phys., 111 (8), 084318 (2012). http://dx.doi.org/10.1063/1.4705509

20. P. Welch, “The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms,” IEEE Trans. Audio Electroacoust., 15 (2), 70–73 (1967). http://dx.doi.org/10.1109/TAU.1967.1161901

21. D. J. Thomson, “Spectrum estimation and harmonic analysis,” Proc. IEEE, 70 (9), 1055–1096 (1982). http://dx.doi.org/10.1109/PROC.1982.12433

22. K. S. Riedel and A. Sidorenko, “Minimum bias multiple taper spectral estimation,” IEEE Trans. Signal Process., 43 (1), 188–195 (1995). http://dx.doi.org/10.1109/78.365298

23. M. S. Bartlett, “Smoothing periodograms from time series with continuous spectra,” Nature, 161 (4096), 686–687 (1948). http://dx.doi.org/10.1038/161686a0

24. K. K. Parhi and M. Ayinala, “Low-complexity Welch power spectral density computation,” IEEE Trans. Circuits Syst. Regul. Pap., 61 (1), 172–182 (2014). http://dx.doi.org/10.1109/TCSI.2013.2264711

25. K. Barbé, R. Pintelon and J. Schoukens, “Welch method revisited: nonparametric power spectrum estimation via circular overlap,” IEEE Trans. Signal Process., 58 (2), 553–565 (2010). http://dx.doi.org/10.1109/TSP.2009.2031724

26. P. Mitra and H. Bokil, Observed Brain Dynamics, Oxford University Press, New York (2007).

27. M. E. Mann and J. M. Lees, “Robust estimation of background noise and signal detection in climatic time series,” Clim. Change, 33 (3), 409–445 (1996). http://dx.doi.org/10.1007/BF00142586

28. S. J. Gibbons, F. Ringdal and T. Kværna, “Detection and characterization of seismic phases using continuous spectral estimation on incoherent and partially coherent arrays,” Geophys. J. Int., 172 (1), 405–421 (2008). http://dx.doi.org/10.1111/gji.2008.172.issue-1

29. S. Haykin, “Cognitive radio: brain-empowered wireless communications,” IEEE J. Sel. Areas Commun., 23 (2), 201–220 (2005). http://dx.doi.org/10.1109/JSAC.2004.839380

30. I. K. Fodor and P. B. Stark, “Multitaper spectrum estimation for time series with gaps,” IEEE Trans. Signal Process., 48 (12), 3472–3483 (2000). http://dx.doi.org/10.1109/78.887039

31. D. Slepian, “Prolate spheroidal wave functions, Fourier analysis, and uncertainty—V: the discrete case,” Bell Labs Tech. J., 57 (5), 1371–1430 (1978). http://dx.doi.org/10.1002/bltj.1978.57.issue-5

32. E. Anderson et al., LAPACK Users’ Guide, SIAM, Philadelphia (1999).

33. A. Papoulis, “Minimum-bias windows for high-resolution spectral estimates,” IEEE Trans. Inf. Theory, 19 (1), 9–12 (1973). http://dx.doi.org/10.1109/TIT.1973.1054956

34. E. I. Thorsos, “The validity of the Kirchhoff approximation for rough surface scattering using a Gaussian roughness spectrum,” J. Acoust. Soc. Am., 83 (1), 78–92 (1988). http://dx.doi.org/10.1121/1.396188

35. C. A. Mack, “Generating random rough edges, surfaces, and volumes,” Appl. Opt., 52 (7), 1472–1480 (2013). http://dx.doi.org/10.1364/AO.52.001472

36. G. Palasantzas, “Roughness spectrum and surface width of self-affine fractal surfaces via the K-correlation model,” Phys. Rev. B, 48 (19), 14472–14478 (1993). http://dx.doi.org/10.1103/PhysRevB.48.14472

37. B. D. Bunday et al., “Determination of optimal parameters for CD-SEM measurement of line-edge roughness,” Proc. SPIE, 5375 515 (2004). http://dx.doi.org/10.1117/12.535926

38. J. S. Villarrubia and B. D. Bunday, “Unbiased estimation of linewidth roughness,” Proc. SPIE, 5752 480 (2005). http://dx.doi.org/10.1117/12.599981

39. A. Hiraiwa and A. Nishida, “Image-noise effect on discrete power spectrum of line-edge and line-width roughness,” Jpn. J. Appl. Phys., 50 016602 (2011). http://dx.doi.org/10.7567/JJAP.50.016602

40. Y. Luo and S. A. Savari, “Reduction of metrology error for line-edge roughness measurement from low-dose SEM images,” in 61st Int. Conf. on Electron, Ion, and Photon Beam Technology and Nanofabrication (EIPBN) (2017). http://eipbn.omnibooksonline.com

41. V.-K. Murugesan-Kuppuswamy, V. Constantoudis and E. Gogolides, “Contact edge roughness: characterization and modeling,” Microelectron. Eng., 88 (8), 2492–2495 (2011). http://dx.doi.org/10.1016/j.mee.2011.02.003

Biography

Yao Luo is currently pursuing her PhD in electrical engineering at Texas A&M University. She received her BS degree in electronic science and technology from Southeast University, China, in 2013. Since then, she has attended Texas A&M University, where she works under the supervision of Dr. Serap Savari. Her current research interests include statistical signal processing and data compression with applications in VLSI fabrication.

Serap A. Savari is on the faculty of Texas A&M University. She has served on the program committee of the annual Data Compression Conference since 2000. From 2002 to 2005, she was an associate editor for the IEEE Transactions on Information Theory. She has served on the program committees for several conferences and workshops in information theory, and she joined the program committee of the 33rd European Mask and Lithography Conference (EMLC 2017) in January.

© 2017 Society of Photo-Optical Instrumentation Engineers (SPIE) 1932-5150/2017/$25.00 © 2017 SPIE
Yao Luo and Serap A. Savari "Multitaper and multisegment spectral estimation of line-edge roughness," Journal of Micro/Nanolithography, MEMS, and MOEMS 16(3), 034001 (22 August 2017). https://doi.org/10.1117/1.JMM.16.3.034001
Received: 7 April 2017; Accepted: 24 July 2017; Published: 22 August 2017
KEYWORDS
Line edge roughness

Diode pumped solid state lasers

Statistical analysis

Error analysis

Signal to noise ratio

Metrology

Edge roughness


CHORUS Article. This article was made freely available starting 22 August 2018
