GLUE and V&L results show near–full-tune accuracy, strong few-shot transfer, and far lower per-task storage than current adapter/prefix methods.

Cut Fine-Tuning Cost: Adapt LMs to Multi-Modal Tasks with <1% New Params

:::info Authors:

(1) Zhengkun Zhang (equal contribution; work done during an internship at Noah’s Ark Lab, Huawei Technologies);

(2) Wenya Guo, TKLNDST, CS, Nankai University, China (yangzl@nankai.edu.cn);

(3) Xiaojun Meng (equal contribution), Noah’s Ark Lab, Huawei Technologies;

(4) Yasheng Wang, Noah’s Ark Lab, Huawei Technologies;

(5) Yadao Wang, Noah’s Ark Lab, Huawei Technologies;

(6) Xin Jiang, Noah’s Ark Lab, Huawei Technologies;

(7) Qun Liu, Noah’s Ark Lab, Huawei Technologies;

(8) Zhenglu Yang, TKLNDST, CS, Nankai University, China.

:::

Abstract and 1. Introduction

  2. Related Work

  3. Preliminaries

  4. Proposed Method

  5. Experimental Setup

  6. Results and Analysis

  7. Discussion and Conclusion, and References


A. The Connection Between Prefix-tuning and Hypernetwork

B. Number of Tunable Parameters

C. Input-output formats

Abstract

The workflow of pretraining and fine-tuning has emerged as a popular paradigm for solving various NLP and V&L (Vision-and-Language) downstream tasks. With the capacity of pretrained models growing rapidly, performing parameter-efficient fine-tuning has become crucial for quick transfer learning and deployment. In this paper, we design a novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks. In particular, we use a shared hypernetwork that takes trainable hyper-embeddings as input and outputs weights for fine-tuning different small modules in a pretrained language model, such as the parameters inserted into multi-head attention blocks (i.e., prefix-tuning) and feedforward blocks (i.e., adapter-tuning). We define a set of embeddings (e.g., layer, block, task and visual embeddings) as the key components for computing the hyper-embeddings, which thus support both pure language and V&L tasks. Our proposed framework adds fewer trainable parameters in multi-task learning while achieving superior performance and transfer ability compared to state-of-the-art methods. Empirical results on the GLUE benchmark and multiple V&L tasks confirm the effectiveness of our framework on both textual and visual modalities. [1]


1. Introduction

Pretraining and fine-tuning are now the prevalent paradigm in natural language processing, yielding state-of-the-art performance on a variety of downstream tasks (Devlin et al., 2019). With pre-trained language models (PLMs) growing rapidly in size, it becomes increasingly infeasible to perform conventional fine-tuning of all model parameters, i.e., full fine-tuning. It is even more time- and space-consuming for multi-tasking if separate replicas of model parameters are updated and saved for every single task.

To mitigate these issues, one line of recent research focuses on Parameter-Efficient Language model Tuning (PELT). Several lightweight transfer learning methods have been proposed that update only a subset of model parameters while freezing most of the remaining ones (Liu et al., 2021b). Extra trainable task-specific parameters can also be newly introduced to PLMs, as in the widely used adapter-tuning (Houlsby et al., 2019) and prefix-tuning (Li & Liang, 2021) methods. The former adds new adapter parameters between transformer layers, while the latter prepends tunable prefix vectors to the keys and values of multi-head attention at each layer. Although the number of parameters in the introduced adapters or prefixes is much smaller than in the original PLM, training these new parameters still requires considerable resources due to the complex structure of PLMs.
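To make the two techniques concrete, below is a minimal PyTorch sketch of an adapter module and of prefix injection into attention keys and values. It follows the descriptions above rather than any particular released implementation; the dimensions (d_model=768, a bottleneck of 64) and function names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Adapter-tuning (Houlsby et al., 2019): a small bottleneck MLP inserted
    between frozen transformer sub-layers and trained with a residual connection."""
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden):                      # hidden: (batch, seq_len, d_model)
        return hidden + self.up(torch.relu(self.down(hidden)))

def prepend_prefix(key, value, prefix_k, prefix_v):
    """Prefix-tuning (Li & Liang, 2021): tunable prefix vectors are prepended to
    the keys and values of multi-head attention; only the prefixes are trained."""
    batch = key.size(0)
    pk = prefix_k.unsqueeze(0).expand(batch, -1, -1)   # (batch, prefix_len, d_model)
    pv = prefix_v.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([pk, key], dim=1), torch.cat([pv, value], dim=1)
```

In both cases the backbone PLM stays frozen, which is what keeps per-task storage small; the resource cost mentioned above comes from still having to backpropagate through the full frozen model to reach the new parameters.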

Apart from traditional NLP tasks, fine-tuning language models pretrained on pure text corpora to perform various V&L tasks has emerged as a growing trend. Previous methods (e.g., VL-T5 from Cho et al. (2021)) often concatenate visual patch tokens and textual tokens as input to a pretrained language model (e.g., T5 from Raffel et al. (2020)), and then fine-tune the whole model on V&L tasks. Tuning language models towards vision-and-language in this way has brought noticeable improvements on V&L tasks (Cho et al., 2021). The key advantage is that language models, with their large capacity and semantic interpretation ability, serve as a cornerstone that benefits vision-language alignment and modelling across a wide range of V&L tasks.

Similarly, training all the parameters of PLMs to handle visual input is time-consuming. It is therefore crucial to explore how a small number of trainable parameters can equip a language model with the ability to handle visual input and V&L tasks. Existing methods typically handle the visual input in a prompt-tuning manner and prepend visual patch tokens (i.e., the visual prefix of Frozen in Tsimpoukelli et al. (2021)) to the textual sequence. To reduce the number of trainable parameters, VL-adapter (Sung et al., 2021) applies the adapter-tuning technique from NLP to the frozen VL-T5 model, and can match the performance of full fine-tuning.
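As a rough illustration of this prompt-tuning-style treatment of vision, the sketch below projects features from a frozen visual encoder into the language model's embedding space and prepends them to the text token embeddings, in the spirit of the visual prefix used by Frozen and VL-T5. The projection layer and the dimensions are assumptions for illustration, not the exact architecture of those models.

```python
import torch
import torch.nn as nn

d_visual, d_model = 2048, 768                 # assumed visual-feature / LM embedding sizes
visual_proj = nn.Linear(d_visual, d_model)    # small trainable projection into the LM space

def build_vl_input(patch_feats, text_embeds):
    # patch_feats: (batch, n_patches, d_visual) from a frozen visual encoder
    # text_embeds: (batch, seq_len, d_model) from the frozen LM embedding table
    visual_tokens = visual_proj(patch_feats)               # (batch, n_patches, d_model)
    return torch.cat([visual_tokens, text_embeds], dim=1)  # visual prefix + text tokens
```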

Inspired by the recent progress of parameter-efficient tuning, we are motivated to build a unified transfer learning framework that supports both language and V&L models in tackling multiple tasks. We use a shared hypernetwork (Mahabadi et al., 2021) that takes multi-task and multi-modal information as input and generates weights for tuning different task-specific modules of PLMs in transfer learning. As shown in Figure 1, when fine-tuning on multiple tasks, only the shared hypernetwork and its input embedding (namely, the hyper-embedding) consisting of layer, block, task and visual embeddings, along with layer normalization, are trained. Such unified parameter-efficient tuning greatly reduces the number of trainable parameters.

We experiment with two task-specific modules that use the weights output by our hypernetwork: the multi-head attention modules (Li & Liang, 2021) and task-specific adapters (Houlsby et al., 2019). Different from previous methods that feed visual input in a prompt-tuning manner, we present a novel perspective of feeding visual input to the above prefix-tuning and adapter-tuning modules. Empirical results on the GLUE benchmark and multiple V&L tasks confirm the effectiveness of our unified framework.
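The following sketch, under assumed dimensions and names, shows one way to read the mechanism described in the last two paragraphs: a hyper-embedding is composed from trainable task, layer and block embeddings plus a projected visual feature, and a shared generator maps it to the weights of a single bottleneck adapter. It is a simplified illustration, not the paper's exact implementation, which also generates the prefix vectors for attention.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Shared hypernetwork: hyper-embedding -> weights of one adapter module."""
    def __init__(self, n_tasks, n_layers, n_blocks, d_emb=64,
                 d_model=768, bottleneck=64, d_visual=768):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, d_emb)     # trainable hyper-embedding parts
        self.layer_emb = nn.Embedding(n_layers, d_emb)
        self.block_emb = nn.Embedding(n_blocks, d_emb)
        self.visual_proj = nn.Linear(d_visual, d_emb)    # folds visual features in
        self.mix = nn.Linear(4 * d_emb, d_emb)
        self.gen_down = nn.Linear(d_emb, d_model * bottleneck)
        self.gen_up = nn.Linear(d_emb, bottleneck * d_model)
        self.d_model, self.bottleneck = d_model, bottleneck

    def forward(self, task_id, layer_id, block_id, visual_feat):
        # task_id / layer_id / block_id: scalar LongTensors; visual_feat: (d_visual,)
        h = torch.cat([self.task_emb(task_id), self.layer_emb(layer_id),
                       self.block_emb(block_id), self.visual_proj(visual_feat)], dim=-1)
        h = torch.relu(self.mix(h))                      # the hyper-embedding
        w_down = self.gen_down(h).view(self.d_model, self.bottleneck)
        w_up = self.gen_up(h).view(self.bottleneck, self.d_model)
        return w_down, w_up                              # generated adapter weights

# Usage: generate adapter weights for (task 0, layer 3, feed-forward block), then
# apply them as a plain residual adapter; only the hypernetwork itself is trained.
hyper = HyperNetwork(n_tasks=8, n_layers=12, n_blocks=2)
w_down, w_up = hyper(torch.tensor(0), torch.tensor(3), torch.tensor(1), torch.zeros(768))
x = torch.randn(4, 16, 768)
out = x + torch.relu(x @ w_down) @ w_up
```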

In summary, we make the following contributions:

• We propose a unified parameter-efficient framework for vision and language transfer learning, which supports tuning both language and V&L models on multiple tasks.

• We present a novel method of leveraging the visual modality as input to a shared hypernetwork, which generates weights for prefix-tuning and adapter-tuning modules.

• We demonstrate that our framework scales more efficiently than prior work. Empirical results on the GLUE benchmark show its effectiveness in multi-task learning, and results on multiple vision-and-language tasks demonstrate its feasibility and usefulness in multi-modal transfer learning.

• We also perform extensive experiments on few-shot domain transfer in pure language and V&L scenarios; the results reveal that the shared knowledge our framework learns across tasks transfers positively to unseen domains.


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

[1] We will release our code to facilitate future work.
