MIRAGE is a narrative-based jailbreak framework that exposes vulnerabilities in multimodal large language models through immersive, context-rich prompts.
MIRAGE is a jailbreak framework designed to expose the vulnerabilities of multimodal large language models (MLLMs). Instead of direct or brute-force prompts, it uses narrative-driven interactions, embedding instructions within stories, character roles, and multi-turn visual context, to bypass safety filters. By simulating immersive, realistic scenarios, MIRAGE tricks models into revealing restricted content. Its high success rate demonstrates that current MLLM defenses remain fragile against context-rich, indirect attacks.
An example of how adopting a detective persona (role-immersion) within a multi-turn visual storytelling framework elicits a response in letter format (structured format) containing harmful information from a multimodal large language model.
Our proposed method, MIRAGE, inspired by the realm of literary creation, involves two stages: (i) multi-turn visual storytelling and (ii) role-immersion through narrative.
We report the attack success rates (ASR) of MIRAGE and prior jailbreak methods across baselines including Vanilla-Text (Ma et al., 2024), FigStep (Gong et al., 2023), Query-Relevant (Liu et al., 2024), HADES (Li et al., 2025), and Visual-RolePlay (Ma et al., 2024). MIRAGE consistently outperforms these approaches on both white-box and black-box multimodal models, achieving the highest ASR in most settings. Blue and green highlights indicate the best and second-best results, respectively.
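For context, ASR is the fraction of attack attempts whose model responses a judge classifies as successful jailbreaks. Below is a minimal sketch of the computation in Python; the judge used here is a trivial stand-in (real evaluations use a trained harmfulness classifier, e.g. a HarmBench-style judge), and all names are illustrative rather than taken from the paper's code.

from typing import Callable

def attack_success_rate(
    responses: list[str],
    is_jailbroken: Callable[[str], bool],
) -> float:
    # ASR = (number of responses judged jailbroken) / (total attempts)
    if not responses:
        raise ValueError("need at least one response to score")
    return sum(1 for r in responses if is_jailbroken(r)) / len(responses)

# Trivial stand-in judge: treats anything that is not an explicit refusal
# as a success. A real evaluation would use a harmfulness classifier.
demo_judge = lambda r: not r.lower().startswith("i can't")
outputs = ["I can't help with that.", "Sure, here is the letter..."]
print(f"ASR: {attack_success_rate(outputs, demo_judge):.2%}")  # ASR: 50.00%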
If you would like to reference our work, please use the following BibTeX citation:
@article{you2025mirage,
  title   = {MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks},
  author  = {You, Wenhao and Hooi, Bryan and Wang, Yiwei and Wang, Youke and Ke, Zong and Yang, Ming-Hsuan and Huang, Zi and Cai, Yujun},
  journal = {arXiv preprint arXiv:2503.19134},
  year    = {2025}
}
References
2025
ECCV
Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li, Hangyu Guo, Kun Zhou, and 2 more authors
@inproceedings{li2024images,
  title     = {Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models},
  author    = {Li, Yifan and Guo, Hangyu and Zhou, Kun and Zhao, Wayne Xin and Wen, Ji-Rong},
  booktitle = {Computer Vision -- ECCV 2024},
  pages     = {174--189},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  isbn      = {978-3-031-73464-9},
  year      = {2025},
  url       = {https://link.springer.com/content/pdf/10.1007/978-3-031-73464-9.pdf}
}
2024
arXiv
Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character
Siyuan Ma, Weidi Luo, Yu Wang, and 4 more authors
@article{ma2024visual,
  title   = {Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character},
  author  = {Ma, Siyuan and Luo, Weidi and Wang, Yu and Liu, Xiaogeng and Chen, Muhao and Li, Bo and Xiao, Chaowei},
  journal = {arXiv preprint arXiv:2405.20773},
  year    = {2024},
  url     = {https://arxiv.org/pdf/2405.20773}
}
ECCV
MM-SafetyBench: A benchmark for safety evaluation of multimodal large language models
Xin Liu, Yichen Zhu, Jindong Gu, and 3 more authors
@inproceedings{liu2024mm,
  title        = {MM-SafetyBench: A benchmark for safety evaluation of multimodal large language models},
  author       = {Liu, Xin and Zhu, Yichen and Gu, Jindong and Lan, Yunshi and Yang, Chao and Qiao, Yu},
  booktitle    = {European Conference on Computer Vision},
  pages        = {386--403},
  organization = {Springer},
  year         = {2024},
  url          = {https://arxiv.org/pdf/2311.17600}
}
arXiv
JailBreakV-28K: A benchmark for assessing the robustness of multimodal large language models against jailbreak attacks
Weidi Luo, Siyuan Ma, Xiaogeng Liu, and 2 more authors
@article{luo2024jailbreakv,
  title   = {JailBreakV-28K: A benchmark for assessing the robustness of multimodal large language models against jailbreak attacks},
  author  = {Luo, Weidi and Ma, Siyuan and Liu, Xiaogeng and Guo, Xiaoyu and Xiao, Chaowei},
  journal = {arXiv preprint arXiv:2404.03027},
  year    = {2024},
  url     = {https://arxiv.org/pdf/2404.03027}
}
ICML
HarmBench: a standardized evaluation framework for automated red teaming and robust refusal
Mantas Mazeika, Long Phan, Xuwang Yin, and 9 more authors
@inproceedings{mazeika2024harmbench,
  title     = {HarmBench: a standardized evaluation framework for automated red teaming and robust refusal},
  author    = {Mazeika, Mantas and Phan, Long and Yin, Xuwang and Zou, Andy and Wang, Zifan and Mu, Norman and Sakhaee, Elham and Li, Nathaniel and Basart, Steven and Li, Bo and Forsyth, David and Hendrycks, Dan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  series    = {ICML'24},
  articleno = {1431},
  numpages  = {44},
  publisher = {JMLR.org},
  year      = {2024},
  url       = {https://arxiv.org/pdf/2402.04249}
}
CVPR
InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks
Zhe Chen, Jiannan Wu, Wenhai Wang, and 8 more authors
@inproceedings{chen2024internvl,
  title     = {InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks},
  author    = {Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and others},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {24185--24198},
  year      = {2024},
  url       = {https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_InternVL_Scaling_up_Vision_Foundation_Models_and_Aligning_for_Generic_CVPR_2024_paper.pdf}
}
2023
arXiv
FigStep: Jailbreaking large vision-language models via typographic visual prompts
Yichen Gong, Delong Ran, Jinyuan Liu, and 5 more authors
@article{gong2023figstep,
  title   = {FigStep: Jailbreaking large vision-language models via typographic visual prompts},
  author  = {Gong, Yichen and Ran, Delong and Liu, Jinyuan and Wang, Conglei and Cong, Tianshuo and Wang, Anyu and Duan, Sisi and Wang, Xiaoyun},
  journal = {arXiv preprint arXiv:2311.05608},
  year    = {2023},
  url     = {https://arxiv.org/pdf/2311.05608}
}
arXiv
Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Qingyang Wu, and 1 more author
@article{liu2023visualinstructiontuning,
  title   = {Visual Instruction Tuning},
  author  = {Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
  journal = {arXiv preprint arXiv:2304.08485},
  year    = {2023},
  url     = {https://arxiv.org/abs/2304.08485}
}
arXiv
Qwen-VL: A frontier large vision-language model with versatile abilities
Jinze Bai, Shuai Bai, Shusheng Yang, and 6 more authors
@article{bai2023qwen,
  title   = {Qwen-VL: A frontier large vision-language model with versatile abilities},
  author  = {Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
  journal = {arXiv preprint arXiv:2308.12966},
  year    = {2023},
  url     = {https://arxiv.org/pdf/2308.12966}
}
arXiv
Gemini: a family of highly capable multimodal models
Gemini Team, Rohan Anil, Sebastian Borgeaud, and 8 more authors
@article{team2023gemini,
  title   = {Gemini: a family of highly capable multimodal models},
  author  = {Team, Gemini and Anil, Rohan and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Yu, Jiahui and Soricut, Radu and Schalkwyk, Johan and Dai, Andrew M and Hauth, Anja and Millican, Katie and others},
  journal = {arXiv preprint arXiv:2312.11805},
  year    = {2023},
  url     = {https://arxiv.org/pdf/2312.11805}
}