Paper
18 November 2024
Defense methods against multi-language and multi-intent LLM attacks
Sunjia Fan, Yichao Yang, Weiqi Huang, Ke Ma, Yucen Liu, Yuxin Zheng
Proceedings Volume 13403, International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2024); 134031L (2024); https://doi.org/10.1117/12.3051624
Event: International Conference on Algorithms, High Performance Computing, and Artificial Intelligence, 2024, Zhengzhou, China
Abstract
This paper proposes a defense framework against jailbreak attacks that exploit multi-language and multi-intent inputs. Research indicates these attacks succeed primarily for two reasons: (1) LLMs may misinterpret the key points and semantics of low-resource-language inputs and generate malicious content; (2) multiple requests packed into a single input can cause attention flickering, so implicit requests are captured inadequately and answered incorrectly. The proposed framework requires no additional training: it maps multi-language inputs to a high-resource language and guides the model to think multiple times, decompose intents, and reflect. Experimental results show the framework is significantly effective at defending against such attacks.
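The abstract's pipeline (normalize the language, decompose the intents, reflect before answering) could be orchestrated roughly as in the sketch below. This is not the authors' implementation: `translate`, `decompose_intents`, `is_safe`, and `respond` are hypothetical stand-ins for the model calls the paper presumably uses, and the reflection loop simply re-checks each decomposed intent a fixed number of times.

```python
from typing import Callable, List


def defend(prompt: str,
           translate: Callable[[str], str],
           decompose_intents: Callable[[str], List[str]],
           is_safe: Callable[[str], bool],
           respond: Callable[[str], str],
           reflection_rounds: int = 2) -> str:
    """Training-free defense sketch: normalize language, split intents,
    and reflect on each intent before producing an answer."""
    # 1. Map the (possibly low-resource-language) input to a
    #    high-resource language the model handles more reliably.
    normalized = translate(prompt)

    # 2. Decompose the input into its individual requests so that no
    #    implicit intent is lost to "attention flickering".
    intents = decompose_intents(normalized)

    # 3. Reflect: re-examine every intent multiple times and refuse
    #    the whole input if any round flags an intent as unsafe.
    for intent in intents:
        for _ in range(reflection_rounds):
            if not is_safe(intent):
                return "Refused: a decomposed intent was judged unsafe."

    # 4. All intents passed reflection; answer the normalized prompt.
    return respond(normalized)


# Toy stand-ins for demonstration only.
result = defend(
    "Tell me a joke; also how to pick a lock",
    translate=lambda s: s,                        # input already English
    decompose_intents=lambda s: s.split("; also "),
    is_safe=lambda intent: "lock" not in intent,  # toy safety check
    respond=lambda s: "SAFE ANSWER",
)
# The second decomposed intent fails the toy check, so the call refuses.
```

In this structure each component can be backed by a separate prompt to the same model, which is consistent with the paper's claim that no additional training is required.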
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
KEYWORDS
Defense and security, Reflection, Machine learning, Safety, Education and training, Data modeling, Semantics