This article reviews the development and application of Vision-Large-Language-Models, focusing on their integration into autonomous driving systems.

The Integration of Vision-LLMs into AD Systems: Capabilities and Challenges

By: Hackernoon
2025/09/28 04:00

Abstract and 1. Introduction

  2. Related Work

    2.1 Vision-LLMs

    2.2 Transferable Adversarial Attacks

  3. Preliminaries

    3.1 Revisiting Auto-Regressive Vision-LLMs

    3.2 Typographic Attacks in Vision-LLMs-based AD Systems

  4. Methodology

    4.1 Auto-Generation of Typographic Attack

    4.2 Augmentations of Typographic Attack

    4.3 Realizations of Typographic Attacks

  5. Experiments

  6. Conclusion and References

2 Related Work

2.1 Vision-LLMs

Having demonstrated the proficiency of Large Language Models (LLMs) in reasoning across various natural language benchmarks, researchers have extended LLMs with visual encoders to support multimodal understanding. This integration has given rise to various forms of Vision-LLMs, capable of reasoning based on the composition of visual and language inputs.

Vision-LLMs Pre-training. The interconnection between LLMs and pre-trained vision models involves the individual pre-training of unimodal encoders on their respective domains, followed by large-scale vision-language joint training [17, 18, 19, 20, 2, 1]. Through an interleaved visual-language corpus (e.g., MMC4 [21] and M3W [22]), auto-regressive models learn to process images by converting them into visual tokens, combining these with textual tokens, and feeding the result into LLMs. Visual inputs are treated as a foreign language, enhancing traditional text-only LLMs by enabling visual understanding while retaining their language capabilities. Hence, a straightforward pre-training strategy may not be designed to handle cases where input text is significantly more aligned with visual texts in an image than with the visual context of that image.
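The token pipeline described above can be illustrated with a toy sketch. It is not the architecture of any model cited here: the function names (`patchify`, `linear`, `build_sequence`) are hypothetical, random weights stand in for a trained vision encoder and a trained text embedding table, and placing all visual tokens before the text tokens is a simplifying assumption (an interleaved corpus would mix them). It only shows how image patches and text token ids can end up as one sequence of same-sized vectors that an auto-regressive LLM could consume.

```python
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patches.

    Assumes H and W are divisible by `patch`.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def linear(vec, weights):
    """Project `vec` with a (len(vec) x dim) weight matrix."""
    dim = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(dim)]


def build_sequence(image, text_ids, patch=2, dim=4, vocab=16, seed=0):
    """Return one list of dim-sized vectors: visual tokens, then text tokens."""
    rng = random.Random(seed)
    # Hypothetical stand-ins for learned parameters.
    projection = [[rng.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(patch * patch)]
    embedding = [[rng.uniform(-1, 1) for _ in range(dim)]
                 for _ in range(vocab)]
    visual_tokens = [linear(p, projection) for p in patchify(image, patch)]
    text_tokens = [embedding[i] for i in text_ids]
    return visual_tokens + text_tokens


# A 4 x 4 image with 2 x 2 patches gives 4 visual tokens; 3 text ids follow.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
sequence = build_sequence(image, text_ids=[1, 2, 3])
print(len(sequence))  # 7
```

Because both modalities share one embedding space in such a sequence, text that appears inside the image reaches the LLM through the same channel as the visual context, which is the setting the last sentence of the paragraph above is concerned with.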

Vision-LLMs in AD Systems. Vision-LLMs have proven useful for perception, planning, reasoning, and control in autonomous driving (AD) systems [6, 7, 9, 5]. For example, existing works have quantitatively benchmarked the linguistic capabilities of Vision-LLMs in terms of their trustworthiness in explaining the decision-making processes of AD [7]. Others have explored the use of Vision-LLMs for vehicular maneuvering [8, 5], and [6] even validated an approach in controlled physical environments. Because AD systems involve safety-critical situations, comprehensive analyses of their vulnerabilities are crucial for reliable deployment and inference. However, proposed adoptions of Vision-LLMs into AD have been straightforward, which means existing issues in such models (e.g., vulnerability to typographic attacks) are likely to carry over without proper countermeasures.


:::info Authors:

(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;

(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;

(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;

(4) Jie Zhang, Nanyang Technological University, Singapore;

(5) Aishan Liu, Beihang University, China;

(6) Yun Lin, Shanghai Jiao Tong University, China;

(7) Jin Song Dong, National University of Singapore, Singapore;

(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.

:::

:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::
