Image-guided interventional procedures often require registering multi-modal images to visualize and analyze complementary information. For example, prostate cancer biopsy benefits from fusing transrectal ultrasound (TRUS) imaging with magnetic resonance (MR) imaging to optimize targeted biopsy. However, cross-modal image registration is a challenging task, especially when the two image modalities differ vastly in appearance. Researchers have therefore sought other ways to bridge the modality gap. Haskins et al.1 designed a deep similarity metric to measure the difference between MR and US images. Some works use a deep neural network (DNN) to directly regress the spatial relationship between images.2 Aside from image similarity, Balakrishnan et al.3 also explored using the segmentation Dice coefficient as a loss function to assess registration quality. Hu et al.4 used a soft probabilistic Dice over multiple landmark segmentations to provide smoother guidance for MR-TRUS registration. Such methods, however, require extensive labeling not only of organs but also of small cysts and lesions, which largely limits their feasibility. Since registration quality is most reliably evaluated with target registration error (TRE), it is sensible to make direct use of the anatomical landmark targets in the images. Moreover, while the image modalities, and thus the textures, differ, anatomical landmarks are the only information shared between the moving and fixed images. Song et al.5 used contrastive-loss-guided pre-training to maximize the similarity between similar anatomical structures. Sun et al.6 proposed pre-aligning MR and TRUS images using manually labeled landmarks on both. However, such a procedure is still far from automatic, because it requires manual input at inference time.
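The two quantities discussed above, the soft Dice used as a training signal and the TRE used for evaluation, can be sketched as follows. This is a minimal illustration, not the formulation of any cited work: the function names are hypothetical, the Dice is a plain single-scale soft Dice (the exact variant used by Hu et al.4 may differ), and TRE is assumed here to be the mean Euclidean distance over landmark pairs under a 4x4 homogeneous transform (some works report RMS or per-landmark values instead).

```python
import numpy as np

def soft_dice(p, q, eps=1e-6):
    """Plain soft Dice between two probability maps of equal shape.
    A Dice loss would typically be 1 - soft_dice(p, q)."""
    inter = np.sum(p * q)
    return (2.0 * inter + eps) / (np.sum(p) + np.sum(q) + eps)

def tre(transform, moving_pts, fixed_pts):
    """Mean target registration error (assumed definition): Euclidean distance
    between transformed moving landmarks and their fixed counterparts.
    transform: 4x4 homogeneous matrix; points: (n, 3) arrays in the same units."""
    n = moving_pts.shape[0]
    homog = np.hstack([moving_pts, np.ones((n, 1))])  # (n, 4) homogeneous coords
    mapped = (homog @ transform.T)[:, :3]             # apply transform row-wise
    return float(np.mean(np.linalg.norm(mapped - fixed_pts, axis=1)))
```

For instance, if every fixed landmark is offset by (3, 4, 0) mm from its moving counterpart, the identity transform yields a TRE of 5 mm.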
Heinrich et al.7 used a landmark detection method designed specifically for lung computed tomography (CT) registration, which does not generalize to other tasks. In this work, we propose to explicitly use prostate landmarks to guide MR-TRUS image registration. We first train a deep neural network to automatically localize a set of meaningful landmarks, and then generate the affine registration matrix directly from the locations of these landmarks. For landmark localization, instead of training a network to predict the landmark coordinates directly, we propose to regress a full-resolution distance map for each landmark, which we show avoids the statistical bias that leads to unsatisfactory performance and thus improves localization. We then use the predicted landmarks to generate the affine transformation matrix, which outperforms clinicians' manual rigid registration by a significant margin in terms of TRE.
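The pipeline above (distance-map target, landmark decoding, affine estimation from landmarks) can be sketched in a few lines of NumPy. This passage does not specify how the distance map is defined or decoded, or how the affine matrix is computed, so the sketch makes three labeled assumptions: the target is the Euclidean distance from each voxel to the landmark, the landmark is decoded as the voxel with the smallest predicted distance, and the affine matrix is a linear least-squares fit to corresponding landmark pairs. All function names are hypothetical.

```python
import numpy as np

def distance_map(shape, landmark, spacing=(1.0, 1.0, 1.0)):
    """Assumed training target: Euclidean distance (scaled by voxel spacing)
    from every voxel of a 3D volume to a single landmark position."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    sq = sum(((g - c) * s) ** 2 for g, c, s in zip(grids, landmark, spacing))
    return np.sqrt(sq)

def decode_landmark(dmap):
    """Assumed decoding rule: voxel index with the smallest predicted distance."""
    return np.array(np.unravel_index(np.argmin(dmap), dmap.shape), dtype=float)

def fit_affine(moving_pts, fixed_pts):
    """Least-squares 3D affine (4x4 homogeneous) mapping moving landmarks onto
    fixed landmarks. Requires >= 4 non-coplanar landmark correspondences."""
    n = moving_pts.shape[0]
    X = np.hstack([moving_pts, np.ones((n, 1))])          # (n, 4) homogeneous
    P, *_ = np.linalg.lstsq(X, fixed_pts, rcond=None)     # solve X @ P ~= fixed
    A = np.eye(4)
    A[:3, :] = P.T                                        # rows: [linear | translation]
    return A

# Usage on synthetic, noise-free landmarks: the known affine is recovered.
rng = np.random.default_rng(0)
true_A = np.array([[1.05, 0.02, 0.00, 3.0],
                   [-0.01, 0.98, 0.03, -2.0],
                   [0.00, 0.01, 1.02, 1.5],
                   [0.00, 0.00, 0.00, 1.0]])
moving = rng.uniform(0, 60, size=(6, 3))
fixed = (np.hstack([moving, np.ones((6, 1))]) @ true_A.T)[:, :3]
print(np.allclose(fit_affine(moving, fixed), true_A))  # True
```

One appeal of this kind of design is that the affine step is closed-form once landmarks are localized, so registration accuracy hinges on localization accuracy; with noisy predictions, using more than the minimum four landmarks lets least squares average out individual errors.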