The turbid medium is a slab. The source is placed at the midpoint of the top surface, which is taken as the origin of the simulation (Cartesian coordinates are used). The region of interest in the first step of GP2MC is a square centered at (0, 0, 30 mm). Thirty-one detectors are placed on the bottom surface, aligned from (0, 0, 30 mm) to (30, 0, 30 mm) at intervals of 1 mm. All of these detectors lie inside the region of interest. Each detector is a square of size . In the first step of GP2MC, a tissue with spatially homogeneous optical absorption, scattering, and anisotropy coefficients of , , and , respectively, is used to generate the effective seeds (the absorption and anisotropy coefficients remain the same throughout our study).

For fast performance of GP2MC, 3840 threads run concurrently during the GPU calculation. Once the GPU kernel is launched, each thread carries its photons through 2000 random-walk steps. The probability that one photon penetrates the turbid medium and reaches the region of interest within 200 random-walk steps is less than (this result was obtained by MC simulation). In other words, the probability that a single thread detects more than 10 photons (i.e., the number of effective seeds) in the region of interest within 2000 random-walk steps is less than . Therefore, 0.3 Mbyte of global device memory (i.e., byte; each thread needs byte to record up to 10 effective seeds) is more than enough to record the effective seeds of all 3840 threads and to avoid contention for global device memory.

In the first step of PMC, about photons are detected in the region of interest by injecting photons. In the second step of PMC, about , , , and photons are detected in the region of interest when the scattering coefficient is , , , and , respectively. The number of detected photons in each of the 31 detectors (corresponding to different locations of ) is shown in Fig. 1(d) for five different values of the scattering coefficient.
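
The detector layout and the seed-buffer budget described above can be checked with a short sketch. This is only an illustration under stated assumptions, not the authors' GPU code: the function names are hypothetical, and `BYTES_PER_SEED = 8` is an assumed size (for example, one 64-bit random-number state per seed) that is not stated in the text. The thread count, step count, seed limit, and detector coordinates are taken from the description.

```python
# Sketch of the simulation layout. Values marked "assumed" are illustrative
# and are not taken from the text.

N_THREADS = 3840            # GPU threads running concurrently (from the text)
STEPS_PER_THREAD = 2000     # random-walk steps per thread (from the text)
MAX_SEEDS_PER_THREAD = 10   # effective seeds recorded per thread (from the text)
BYTES_PER_SEED = 8          # assumed: one 64-bit RNG state per effective seed

SLAB_DEPTH_MM = 30.0        # z of the bottom surface (from the detector coordinates)
N_DETECTORS = 31            # detectors along the x axis (from the text)
DETECTOR_PITCH_MM = 1.0     # spacing between neighbouring detectors (from the text)


def detector_centers():
    """Return detector centers (x, y, z) in mm, from (0, 0, 30) to (30, 0, 30)."""
    return [(i * DETECTOR_PITCH_MM, 0.0, SLAB_DEPTH_MM) for i in range(N_DETECTORS)]


def seed_buffer_bytes(n_threads=N_THREADS,
                      seeds_per_thread=MAX_SEEDS_PER_THREAD,
                      bytes_per_seed=BYTES_PER_SEED):
    """Global memory needed when every thread owns a private block of seed slots."""
    return n_threads * seeds_per_thread * bytes_per_seed


def seed_slot_offset(thread_id, seed_index,
                     seeds_per_thread=MAX_SEEDS_PER_THREAD,
                     bytes_per_seed=BYTES_PER_SEED):
    """Byte offset of one seed slot; slots of different threads never overlap,
    so threads can record seeds without contending for the same address."""
    if not 0 <= seed_index < seeds_per_thread:
        raise IndexError("seed_index exceeds the per-thread seed limit")
    return (thread_id * seeds_per_thread + seed_index) * bytes_per_seed


if __name__ == "__main__":
    centers = detector_centers()
    print("first detector:", centers[0])
    print("last detector:", centers[-1])
    total = seed_buffer_bytes()
    print("seed buffer:", total, "bytes =", total / 1e6, "MB")
```

Under the assumed 8 bytes per seed, the buffer comes to 307,200 bytes, which is in line with the roughly 0.3 Mbyte quoted above. A different seed representation would scale this figure linearly.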