This paper explores the critical safety risk of typographic attacks against Vision-Large-Language-Models (Vision-LLMs) integrated into autonomous driving (AD) systems.This paper explores the critical safety risk of typographic attacks against Vision-Large-Language-Models (Vision-LLMs) integrated into autonomous driving (AD) systems.

Typographic Attacks on Vision-LLMs: Evaluating Adversarial Threats in Autonomous Driving Systems

2025/09/27 09:28
5 min read
For feedback or concerns regarding this content, please contact us at crypto.news@mexc.com

Abstract and 1. Introduction

  1. Related Work

    2.1 Vision-LLMs

    2.2 Transferable Adversarial Attacks

  2. Preliminaries

    3.1 Revisiting Auto-Regressive Vision-LLMs

    3.2 Typographic Attacks in Vision-LLMs-based AD Systems

  3. Methodology

    4.1 Auto-Generation of Typographic Attack

    4.2 Augmentations of Typographic Attack

    4.3 Realizations of Typographic Attacks

  4. Experiments

  5. Conclusion and References

Abstract

Vision-Large-Language-Models (Vision-LLMs) are increasingly being integrated into autonomous driving (AD) systems due to their advanced visual-language reasoning capabilities, targeting the perception, prediction, planning, and control mechanisms. However, Vision-LLMs have demonstrated susceptibilities against various types of adversarial attacks, which would compromise their reliability and safety. To further explore the risk in AD systems and the transferability of practical threats, we propose to leverage typographic attacks against AD systems relying on the decision-making capabilities of Vision-LLMs. Different from the few existing works developing general datasets of typographic attacks, this paper focuses on realistic traffic scenarios where these attacks can be deployed, on their potential effects on the decision-making autonomy, and on the practical ways in which these attacks can be physically presented. To achieve the above goals, we first propose a dataset-agnostic framework for automatically generating false answers that can mislead Vision-LLMs’ reasoning. Then, we present a linguistic augmentation scheme that facilitates attacks at image-level and region-level reasoning, and we extend it with attack patterns against multiple reasoning tasks simultaneously. Based on these, we conduct a study on how these attacks can be realized in physical traffic scenarios. Through our empirical study, we evaluate the effectiveness, transferability, and realizability of typographic attacks in traffic scenes. Our findings demonstrate particular harmfulness of the typographic attacks against existing Vision-LLMs (e.g., LLaVA, Qwen-VL, VILA, and Imp), thereby raising community awareness of vulnerabilities when incorporating such models into AD systems. We will release our source code upon acceptance.

1 Introduction

Vision-Language Large Models (Vision-LLMs) have seen rapid development over the recent years [1, 2, 3], and their incorporation into autonomous driving (AD) systems have been seriously considered by both industry and academia [4, 5, 6, 7, 8, 9]. The integration of Vision-LLMs into AD systems showcases their ability to convey explicit reasoning steps to road users on the fly and satisfy the need for textual justifications of traffic scenarios regarding perception, prediction, planning, and control, particularly in safety-critical circumstances in the physical world. The core strength of VisionLLMs lies in their auto-regressive capabilities through large-scale pretraining with visual-language alignment [1], making them even able to perform zero-shot optical character recognition, grounded reasoning, visual-question answering, visual-language reasoning, etc. Nevertheless, despite their impressive capabilities, Vision-LLMs are unfortunately not impervious against adversarial attacks that can misdirect the reasoning processes [10]. Any successful attack strategies have the potential to pose critical problems when deploying Vision-LLMs in AD systems, especially those that may even bypass the models’ black-box characteristics. As a step towards their reliable adoption in AD, studying the transferability of adversarial attacks is crucial to raising awareness of practical threats against deployed Vision-LLMs, and to efforts in building appropriate defense strategies for them.

\ In this work, we revisit the shared auto-regressive characteristic of different Vision-LLMs and intuitively turn that strength into a weakness by leveraging typographic forms of adversarial attacks, also known as typographic attacks. Typographic attacks were first studied in the context of the well-known Contrastive Language-Image Pre-training (CLIP) model [11, 12]. Early works in this area focused on developing a general typographic attack dataset targeting multiple-choice answering (such as object recognition, visual attribute detection, and commonsense answering) and enumeration [13]. Researchers also explored multiple-choice self-generating attacks against zero-shot classification [14], and proposed several defense mechanisms, including keyword-training [15] and prompting the model for detailed reasoning [16]. Despite these initial efforts, the methodologies have neither seen a comprehensive attack framework nor been explicitly designed to investigate the impact of typographic attacks on safety-critical systems, particularly those in AD scenarios.

\ Our work aims to fill this research gap by studying typographic attacks from the perspective of AD systems that incorporate Vision-LLMs. In summary, our scientific contributions are threefold:

\ • Dataset-Independent Framework: we introduce a dataset-independent framework designed to automatically generate misleading answers that can disrupt the reasoning processes of Vision-Large Language Models (Vision-LLMs).

\ • Linguistic Augmentation Schemes: we develop a linguistic augmentation scheme aimed at facilitating stronger typographic attacks on Vision-LLMs. This scheme targets reasoning at both the image and region levels and is expandable to multiple reasoning tasks simultaneously.

\ • Empirical Study in Semi-Realistic Scenarios: we conduct a study to explore the possible implementations of these attacks in real-world traffic scenarios.

\ Through our empirical study of typographic attacks in traffic scenes, we hope to raise community awareness of critical typographic vulnerabilities when incorporating such models into AD systems.

\

:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

:::info Authors:

(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;

(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;

(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;

(4) Jie Zhang, Nanyang Technological University, Singapore;

(5) Aishan Liu, Beihang University, China;

(6) Yun Lin, Shanghai Jiao Tong University, China;

(7) Jin Song Dong, National University of Singapore, Singapore;

(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.

:::

\

Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Stunning 96% Surge And 50% Plunge Define Volatile Market Session

Stunning 96% Surge And 50% Plunge Define Volatile Market Session

The post Stunning 96% Surge And 50% Plunge Define Volatile Market Session appeared on BitcoinEthereumNews.com. Crypto Gainers And Losers: Stunning 96% Surge And
Share
BitcoinEthereumNews2026/04/03 09:20
Come Back To Me’ To Air At BIFF Before Global Release

Come Back To Me’ To Air At BIFF Before Global Release

The post Come Back To Me’ To Air At BIFF Before Global Release appeared on BitcoinEthereumNews.com. Kim Woo-sung performs onstage during “The Rose: Come Back to Me” premiere during the 2025 Tribeca Festival. Photo by Roy Rochlin/Getty Images for Tribeca Festival) Getty Images for Tribeca Festival The Rose: Come Back To Me will screen three times at the Busan International Film Festival and at additional film festivals worldwide, before its global theatrical release in 2026. The Korean alt-pop indie band known as The Rose is composed of Woosung, Dojoon, Hajoon, and Taegyeom. From their earliest days,busking in Hongdae, the band has captivated audiences with their distinctive genre-blending sound. Their first full-length album Heal sparked the global Heal Together World Tour, drawing over 90,000 fans and leading to high-profile festival appearances, including headlining the Bacardi Stage at Lollapalooza 2023. They reached a new milestone with their sophomore album Dual, which debuted on the Billboard 200. Building on this success, The Rose sold more than 150,000 tickets on their Dawn to Dusk Tour and delivered a show-stopping set at Coachella 2024. This year they went on a global tour, promoting their latest album WRLD alongside their documentary The Rose: Come Back to Me, which premiered at the Tribeca Film Festival in June 2025. “Knowing how dominant Korean culture is globally—from K-Pop Demon Hunters to Parasite—international audiences are all eager to go deeper and learn more” said Diane Quon and Sanjay M. Sharma on behalf of the producing team behind the popular Tribeca doc. “The Rose is as much a music doc as it is a coming-of-age story—about a group of friends finding their own way through the world. It’s a story of heartbreak and healing, conformity and individuality, and ultimately about the transformative power of music around the world.” Hajoon, Taegyeom, Kim Woo-sung and Dojoon perform onstage during “The Rose: Come Back to Me” premiere.. (Photo by Roy…
Share
BitcoinEthereumNews2025/09/19 06:53
Hong Kong Monetary Authority cuts interest rates by 25 basis points

Hong Kong Monetary Authority cuts interest rates by 25 basis points

PANews reported on September 18 that according to Jinshi, the Hong Kong Monetary Authority lowered the benchmark interest rate by 25 basis points to 4.50%, and the Federal Reserve cut interest rates by 25 basis points overnight.
Share
PANews2025/09/18 08:06

Trade GOLD, Share 1,000,000 USDT

Trade GOLD, Share 1,000,000 USDTTrade GOLD, Share 1,000,000 USDT

0 fees, up to 1,000x leverage, deep liquidity