Figures 1(a) and 1(b) show flowcharts of the whole GPU-based program for FD-OCT and of the fast DEFR, respectively. Non-parallelizable steps that are independent of the number of depth scans were separated from the main processing loop and implemented on the CPU. The processes on the left [Fig. 1(a), light gray frame] used the CPU for initialization, while those on the right (dark gray frame) were executed entirely on the GPU. Thick arrows denote data flow from host to GPU memory or back, which is a critical resource when analyzing multiple gigabytes of data. During the recording procedure the background was recorded with the sample arm blocked. The relationship between the spectral index space and the wavenumber (k) space was calibrated so as to generate a nonlinear index vector corresponding to a uniform coordinate in k-space. Dispersion coefficients were extracted in advance using the minimum-information-entropy estimation of the spatial-domain signal adopted in our work.6 Here, only the second- and third-order dispersion coefficients were determined and used to generate the complex dispersion phase vector and its conjugate term, the inverse dispersion phase vector. This was sufficient in our measurements, since we mainly introduced these orders by inserting nonlinear glass into the reference arm, and it resulted in an almost optimal axial point spread function corresponding to the free-space resolution. Then, the threshold for the multi-peak detector (MPD) in the fast DEFR algorithm was calculated from the OCT system background.16 All these parameters were prepared in advance and transferred to GPU memory only once, in the initialization step. The spectral interference vectors acquired in nonlinear k-space were copied from host to GPU memory and converted from 16-bit integer (acquired spectrum) to 32-bit floating point (single precision).
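The construction of the dispersion phase vector and its conjugate from the second- and third-order coefficients can be sketched as follows. This is a minimal illustration, not the authors' implementation: the coefficient names `a2` and `a3`, the example values, and the wavenumber grid are all assumptions.

```python
import numpy as np

def dispersion_phase_vectors(k, a2, a3):
    """Build the complex dispersion phase vector and its conjugate
    (inverse) from second- and third-order coefficients only, as
    described in the text. Symbol names a2, a3 are assumed."""
    dk = k - k.mean()                    # expand around the central wavenumber
    phi = a2 * dk ** 2 + a3 * dk ** 3    # second- and third-order phase terms
    return np.exp(1j * phi), np.exp(-1j * phi)

k = np.linspace(5.0, 6.0, 2048)          # uniform k-space grid (example values)
disp, disp_inv = dispersion_phase_vectors(k, a2=1.0e2, a3=5.0e1)
```

Because both vectors have unit magnitude and opposite phase, multiplying a spectrum by one and later by the other leaves the data unchanged, which is what allows the conjugate term to undo the numeric dispersion compensation inside the iteration loop.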
Then, background subtraction, up-sampling, k-space linearization, spectral shaping, fast DEFR, and log scaling of the magnitude signal were implemented within the computing pipeline of the GPU. The resulting conjugate-artifact-removed image was finally converted from 32-bit floating point back to 16-bit integer for storage outside the GPU, or passed on to the GPU's display buffer. We employed up-sampling by a factor of two and linear-interpolation-based resampling for k-space linearization. It has been demonstrated that linear interpolation is superior to nearest-neighbor interpolation, but slightly inferior to cubic-spline interpolation, in the reconstruction quality of OCT images.18 It is noteworthy that on the GPU the computational time of linear interpolation is exactly the same as that of nearest-neighbor interpolation because of the hard-wired texture memory mechanism.18 Each spectrum was apodized by a Hann window to suppress side-lobe formation in the image and to linearize the response for samples at fractional frequencies (i.e., frequencies not exactly matching the sampling grid of the discrete Fourier transform). The detailed fast-DEFR algorithm is shown in Fig. 1(b). Alternatively, the algorithm can be executed only partially, without iteration, which makes it equivalent to a standard frequency-time conversion (including mapping) after numeric dispersion compensation. In the iterative processing loop, single-sided signal peaks are extracted by the MPD from the dispersion-corrected signal within the suppression-intensity range given by the dispersive broadening of the corresponding mirror peaks. A synthetic double-sided spectrum containing the artifacts is generated and subtracted from the original signal, reducing the original spectrum to the undetermined signal components, which form the residuum. For a single depth scan, the fast DEFR must perform repeated iterations of the fast Fourier transform (FFT) and inverse FFT (iFFT), as well as of the MPD.
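The linear-interpolation-based resampling onto a uniform k grid can be sketched as below. This is a simplified CPU-side illustration under assumed names (`linearize_k_space`, `index_vector`); on the GPU, the same fractional-index lookup is what the hard-wired texture memory mechanism performs at no extra cost over nearest-neighbor fetches.

```python
import numpy as np

def linearize_k_space(spectrum, index_vector):
    """Resample a spectrum acquired in nonlinear k-space onto a uniform
    k grid by linear interpolation at fractional indices (a sketch)."""
    i0 = np.floor(index_vector).astype(int)       # lower neighbor index
    frac = index_vector - i0                      # fractional part
    i1 = np.minimum(i0 + 1, len(spectrum) - 1)    # upper neighbor, clamped
    return (1.0 - frac) * spectrum[i0] + frac * spectrum[i1]

spec = np.arange(8, dtype=np.float32)             # toy spectrum (identity ramp)
idx = np.array([0.0, 1.5, 3.25, 6.75])            # nonlinear index vector
resampled = linearize_k_space(spec, idx)
```

For the identity ramp used here, interpolation at fractional index x simply returns x, which makes the behavior easy to verify by hand.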
Loading the residual signals requires an additional iFFT, which in principle can be deactivated. In this work, the residual signals were loaded.
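The iterative loop described above (numeric dispersion compensation, single-sided peak extraction by the MPD, synthesis of the double-sided artifact spectrum, and subtraction to form the residuum) can be sketched in a strongly simplified form. All names, the toy second-order dispersion phase, and the threshold value are assumptions for illustration, not the published implementation.

```python
import numpy as np

def defr_iteration(residual, disp, disp_inv, threshold):
    """One sketched iteration of the DEFR artifact-removal loop."""
    # numeric dispersion compensation, then transform to depth space
    a = np.fft.ifft(residual * disp_inv)
    # multi-peak detector (MPD): keep single-sided peaks above threshold;
    # the dispersively broadened mirror peaks stay below it
    peaks = np.where(np.abs(a) > threshold, a, 0.0)
    # synthetic double-sided (real-valued) spectrum of the detected peaks,
    # with the dispersion phase re-applied so its mirror artifact matches
    synthetic = 2.0 * np.real(np.fft.fft(peaks) * disp)
    # subtracting it reduces the residuum to undetermined components
    return residual - synthetic, peaks

# toy demonstration: one reflector at depth bin 100, strong dispersion
n = 1024
k = np.arange(n)
phi = 0.1 * (k - n / 2) ** 2              # assumed second-order phase
disp, disp_inv = np.exp(1j * phi), np.exp(-1j * phi)
a_true = np.zeros(n, dtype=complex)
a_true[100] = 1.0                          # single point reflector
spectrum = 2.0 * np.real(np.fft.fft(a_true) * disp)   # real detector signal
residual, peaks = defr_iteration(spectrum, disp, disp_inv, threshold=0.5)
```

In this toy case the dispersion spreads the mirror term far below the threshold, so a single iteration already recovers the true peak and leaves only a small residuum; real data with many reflectors requires the repeated FFT/iFFT/MPD passes described in the text.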