Cross-modal retrieval aims to find alignment relationships between different modalities and then compute the semantic similarities used for ranking. Because of the difference in data distributions and the inherent heterogeneity gap between modalities, a classic solution is to learn common representations in a shared space, which preserves the discrimination among samples from different categories and alleviates the cross-modal discrepancy. To this end, we propose a method, termed LDCA, that learns discriminative common alignments based on the modal representations. LDCA employs a modality invariance loss that pushes away the hardest negative sample, further reducing the cross-modal discrepancy at the feature level. In addition, LDCA seeks alignments in the label space to improve intra-modal discrimination through an effective cross-modal label loss. Extensive experiments on five widely used cross-modal datasets evaluate the proposed LDCA. The overall experimental results demonstrate the method's superiority, and comprehensive analyses verify its effectiveness.
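The modality invariance loss described above follows the familiar hardest-negative (triplet-style) ranking objective. The sketch below is illustrative only, not the paper's implementation: the function name, the margin hyperparameter, and the assumption of L2-normalized image/text embeddings with matched pairs on the diagonal of the similarity matrix are all ours.

```python
import numpy as np

def hardest_negative_loss(img, txt, margin=0.2):
    """Triplet-style loss that penalizes only the hardest negative per anchor.

    img, txt: (N, d) arrays where row i of img matches row i of txt.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = img @ txt.T                       # (N, N) similarity matrix
    pos = np.diag(sim)                      # similarity of true pairs
    # Mask the diagonal so a pair is never its own negative.
    masked = np.where(np.eye(len(sim), dtype=bool), -np.inf, sim)
    hard_i2t = masked.max(axis=1)           # hardest text negative per image
    hard_t2i = masked.max(axis=0)           # hardest image negative per text
    # Hinge: push the hardest negative below the positive by the margin.
    loss = np.maximum(0.0, margin + hard_i2t - pos).mean() \
         + np.maximum(0.0, margin + hard_t2i - pos).mean()
    return loss
```

When matched pairs are already well separated from all negatives by at least the margin, the loss is zero; otherwise the gradient concentrates on the single most confusable negative in the batch, which is what "pushing away the hardest negative" refers to.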
Keywords: Education and training, Semantics, Feature extraction, Visualization, Ablation, Design, Multimedia