This article surveys parameter-efficient tuning, V&L adaptation, and multitask hypernetworks, then frames a unified hyper-embedding PELT approach.

One Tiny Hypernetwork to Rule All Tasks and Modalities

Abstract and 1. Introduction

  2. Related Work
  3. Preliminaries
  4. Proposed Method
  5. Experimental Setup
  6. Results and Analysis
  7. Discussion and Conclusion, and References

A. The Connection Between Prefix-tuning and Hypernetwork

B. Number of Tunable Parameters

C. Input-output formats

2. Related Work

In this section, we review recent research on parameter-efficient tuning for pure language and vision-and-language (V&L) tasks, as well as the corresponding work on multi-task learning.

2.1. Parameter-efficient tuning

As recent models grow rapidly in size, finetuning pretrained models with only a small number of trainable parameters becomes increasingly important. Existing research (He et al., 2021; Lester et al., 2021; Liu et al., 2021a; Mao et al., 2021) has explored a wide range of parameter-efficient tuning methods. These methods fall into two categories, depending on whether new trainable parameters are introduced. The first category updates only a subset of the model parameters while freezing the rest (Liu et al., 2021b; Lee et al., 2019). The second introduces a small number of new task-specific parameters into different parts of the pretrained model, for example before multi-head attention (Li & Liang, 2021), after the feed-forward layers (Houlsby et al., 2019), or in the Mix-and-Match (MAM) adapter proposed by He et al. (2021).
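To make the two categories concrete, the sketch below contrasts them on a single hypothetical `d_model × d_model` linear layer. The first category marks an existing subset of weights (here, only the bias) as trainable; the second adds a small bottleneck adapter in the style of Houlsby et al. (2019): a down-projection, a nonlinearity, an up-projection, and a residual connection. The sizes (`d_model = 512`, bottleneck `32`), the ReLU, and the zero initialisation of the up-projection are illustrative assumptions, not settings from this paper.

```python
import numpy as np

D_MODEL, BOTTLENECK = 512, 32  # illustrative sizes (assumed)
rng = np.random.default_rng(0)

# Frozen pretrained layer: y = x @ W + b
W = rng.standard_normal((D_MODEL, D_MODEL)) * 0.02
b = np.zeros(D_MODEL)

def count_subset_tuning() -> int:
    """Category 1: update only a subset of existing weights (here, the bias)."""
    return b.size

class Adapter:
    """Category 2: new bottleneck parameters inserted after a frozen sublayer."""

    def __init__(self, d_model: int, r: int, rng: np.random.Generator):
        self.W_down = rng.standard_normal((d_model, r)) * 0.02
        self.b_down = np.zeros(r)
        # Zero-initialised up-projection (assumed choice): the adapter starts as the identity map.
        self.W_up = np.zeros((r, d_model))
        self.b_up = np.zeros(d_model)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        z = np.maximum(h @ self.W_down + self.b_down, 0.0)  # down-project + ReLU
        return h + z @ self.W_up + self.b_up                 # up-project + residual

    def num_params(self) -> int:
        return sum(p.size for p in (self.W_down, self.b_down, self.W_up, self.b_up))

x = rng.standard_normal((4, D_MODEL))  # batch of 4 token states
adapter = Adapter(D_MODEL, BOTTLENECK, rng)
h = x @ W + b                          # output of the frozen layer
out = adapter(h)

print("full layer params:   ", W.size + b.size)         # 262656
print("subset-tuning params:", count_subset_tuning())    # 512
print("adapter params:      ", adapter.num_params())     # 33312
print("identity at init:    ", np.allclose(out, h))      # True
```

Only the adapter (or the selected subset) would receive gradients; `W` stays frozen in both cases.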

2.2. Tuning towards Vision-and-Language

In addition, finetuning language models pretrained on large pure-text corpora has led to noticeable improvements on V&L tasks. Lines of research such as VL-T5 (Cho et al., 2021) and Frozen (Tsimpoukelli et al., 2021) adapt large language models (e.g., T5 and GPT-style models) to achieve transfer learning for V&L tasks. For example, Frozen aligns image representations with the word-embedding space of a frozen language model, which can then generate captions for those images. PICa (Yang et al., 2021) uses a pretrained image captioner to convert an image into captions that GPT-3 can understand, and then prompts GPT-3 to solve VQA tasks in a few-shot manner. Sung et al. (2021) introduce a limited set of new trainable parameters into VL-T5 via an adapter-based method that can match the performance of fine-tuning the entire model.
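The Frozen-style alignment can be pictured as follows: a trainable projection maps an image feature to a few vectors in the language model's token-embedding space, and these are prepended to the embedded text so that the frozen language model treats them like ordinary input tokens. This is a minimal NumPy sketch under assumed sizes (a 768-dimensional image feature, a 512-dimensional embedding space, two visual prefix tokens, a 1000-word vocabulary); the random arrays stand in for a real vision encoder and embedding table.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, EMB_DIM, N_VIS, VOCAB = 768, 512, 2, 1000  # assumed sizes

# Stand-ins: a frozen LM embedding table and one image feature from a vision encoder.
token_embedding = rng.standard_normal((VOCAB, EMB_DIM)) * 0.02  # frozen
image_feature = rng.standard_normal(IMG_DIM)

# Trainable projection from image space to N_VIS vectors in the word-embedding space.
W_vis = rng.standard_normal((IMG_DIM, N_VIS * EMB_DIM)) * 0.02

def visual_prefix(feat: np.ndarray) -> np.ndarray:
    """Map one image feature to N_VIS pseudo-token embeddings."""
    return (feat @ W_vis).reshape(N_VIS, EMB_DIM)

def build_lm_input(feat: np.ndarray, token_ids: list[int]) -> np.ndarray:
    """Prepend the visual prefix to the embedded text tokens."""
    text = token_embedding[np.asarray(token_ids)]
    return np.concatenate([visual_prefix(feat), text], axis=0)

seq = build_lm_input(image_feature, [5, 17, 42])  # hypothetical ids for a caption prompt
print(seq.shape)  # (5, 512): 2 visual "tokens" followed by 3 text tokens
```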

2.3. Multi-task Learning

Learning a unified model that performs well on multiple different tasks (i.e., multi-task learning) is a challenging problem in both the NLP and V&L domains. It must address challenges such as catastrophic forgetting, as well as overfitting on low-resource tasks while underfitting on high-resource tasks (Aharoni et al., 2019). Radford et al. (2019) highlight the ability of language models to perform a wide range of tasks in a zero-shot setting. As mentioned above, task-specific new parameters such as adapters (Houlsby et al., 2019) can be trained separately for each task while the pretrained model is kept fixed. von Oswald et al. (2020) propose a task-conditioned hypernetwork that generates all the weights of the target model, whereas Mahabadi et al. (2021) use a shared hypernetwork to generate weights only for the small number of parameters in adapter modules, allowing the model to adapt to each individual task efficiently.
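The shared-hypernetwork idea can be illustrated with a minimal sketch: a learned embedding per task is fed to one shared linear hypernetwork whose output is reshaped into that task's adapter weights, so what is trained is the task embeddings plus the hypernetwork rather than a separate adapter per task. The single linear generator, the sizes, and the task names are hypothetical, and details of the original method by Mahabadi et al. (2021) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, R, T_DIM = 64, 8, 16     # model dim, adapter bottleneck, task-embedding dim (assumed)
TASKS = ["sst2", "mnli", "vqa"]   # hypothetical task names

# Trainable: one small embedding per task + ONE shared hypernetwork H.
task_emb = {t: rng.standard_normal(T_DIM) for t in TASKS}
n_adapter = 2 * D_MODEL * R       # W_down (D x R) and W_up (R x D), flattened
H = rng.standard_normal((T_DIM, n_adapter)) * 0.02

def generate_adapter(task: str) -> tuple[np.ndarray, np.ndarray]:
    """Hypernetwork: task embedding -> flattened adapter weights -> reshaped matrices."""
    flat = task_emb[task] @ H
    W_down = flat[: D_MODEL * R].reshape(D_MODEL, R)
    W_up = flat[D_MODEL * R:].reshape(R, D_MODEL)
    return W_down, W_up

def adapter_forward(h: np.ndarray, task: str) -> np.ndarray:
    """Apply the task's generated bottleneck adapter with a residual connection."""
    W_down, W_up = generate_adapter(task)
    return h + np.maximum(h @ W_down, 0.0) @ W_up

h = rng.standard_normal((3, D_MODEL))
out = adapter_forward(h, "mnli")

W_a, _ = generate_adapter("sst2")
W_b, _ = generate_adapter("vqa")
print(out.shape, np.allclose(W_a, W_b))  # (3, 64) False: distinct weights per task
```

Because `H` is shared, information learned on one task can shape the adapters generated for the others, which is the motivation for sharing it.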

Our motivation. Unlike mainstream V&L models that append image tokens to the input sequence, we present a novel perspective on merging the textual and visual modalities: image embeddings and task-specific type embeddings of multiple tasks are used as input to a shared hypernetwork, which generates the weights of the prefix-tuning and adapter-tuning modules of pretrained language models (PLMs). We also note a concurrent paper (He et al., 2022) that became publicly available only days before this work. It shares our motivation of generating the weights of prefix-tuning modules with a hypernetwork, but it targets pure language tasks only. Our unified framework improves transfer learning in both pure text and vision-and-language multi-task settings, in a highly parameter-efficient manner.

3. Preliminaries

3.1. Pretrained Language Models

All of our models are built on top of T5 (Raffel et al., 2020), a state-of-the-art language model consisting of an encoder-decoder Transformer (Vaswani et al., 2017) with minor modifications. T5 frames language tasks as sequence-to-sequence generation and is trained simultaneously on multiple task datasets. The large-scale T5 model achieves state-of-the-art performance across a diverse set of tasks. We use the T5 backbone because it enables training a universal model that interfaces with many downstream language tasks.
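As an illustration of the sequence-to-sequence framing, every task becomes a mapping from an input string to a target string, usually with a task prefix so that one model can serve many tasks. The templates below are hypothetical examples in the style of T5's task prefixes, not the exact formats used in this paper (those are given in Appendix C).

```python
from dataclasses import dataclass

@dataclass
class Example:
    task: str
    fields: dict[str, str]
    target: str

# Hypothetical templates: task name -> (prefix, ordered input fields)
TEMPLATES = {
    "sst2": ("sst2", ["sentence"]),
    "mnli": ("mnli", ["hypothesis", "premise"]),
    "vqa": ("vqa question", ["question"]),
}

def to_text_to_text(ex: Example) -> tuple[str, str]:
    """Render one example as an (input string, target string) pair."""
    prefix, keys = TEMPLATES[ex.task]
    body = " ".join(f"{k}: {ex.fields[k]}" for k in keys)
    return f"{prefix} {body}", ex.target

batch = [
    Example("sst2", {"sentence": "a gripping, well-acted film"}, "positive"),
    Example("mnli", {"hypothesis": "A man sleeps.", "premise": "A man is running."}, "contradiction"),
]
for src, tgt in map(to_text_to_text, batch):
    print(src, "->", tgt)
# sst2 sentence: a gripping, well-acted film -> positive
# mnli hypothesis: A man sleeps. premise: A man is running. -> contradiction
```

Since inputs and targets are plain strings, mixing tasks in one batch requires no task-specific output heads.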

3.2. Multi-task Learning Problem formulation

3.3. Hypernetworks

4. Proposed Method

4.1. Hyper-Embeddings for PELT

4.2. HyperPrefix: Incorporate with Prefix-tuning

Prefix-tuning (Li & Liang, 2021) prepends a number of task-specific trainable prefix vectors to the keys and values of multi-head attention at each Transformer layer. In the original implementation, the prefix vectors of each attention block are not optimized directly but are reparameterized by a two-layer feed-forward network applied to a smaller trainable matrix, which stabilizes optimization.
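A minimal single-head sketch of prefix-tuning with this reparameterization follows: a small trainable matrix of `N` prefix embeddings passes through a two-layer MLP (Linear, tanh, Linear) to produce prefix keys and values, which are concatenated in front of the frozen layer's own keys and values. The single head, the sizes, and the tanh activation are assumptions for illustration. After training, the MLP can be dropped and only the resulting prefixes kept.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, E, HID = 4, 32, 8, 64  # prefix length, model dim, prefix-emb dim, MLP hidden (assumed)

# Trainable: small prefix matrix P' and the reparameterization MLP.
P_small = rng.standard_normal((N, E))
W1, b1 = rng.standard_normal((E, HID)) * 0.1, np.zeros(HID)
W2, b2 = rng.standard_normal((HID, 2 * D)) * 0.1, np.zeros(2 * D)

# Frozen attention projections of the pretrained layer.
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))

def prefix_kv() -> tuple[np.ndarray, np.ndarray]:
    """Two-layer MLP maps P' (N x E) to prefix keys and values (each N x D)."""
    out = np.tanh(P_small @ W1 + b1) @ W2 + b2
    return out[:, :D], out[:, D:]

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_with_prefix(x: np.ndarray) -> np.ndarray:
    """Single-head attention whose keys/values are [prefix; projected x]."""
    Pk, Pv = prefix_kv()
    q = x @ Wq
    k = np.concatenate([Pk, x @ Wk], axis=0)  # (N + L) x D
    v = np.concatenate([Pv, x @ Wv], axis=0)  # (N + L) x D
    weights = softmax(q @ k.T / np.sqrt(D))   # L x (N + L)
    return weights @ v

x = rng.standard_normal((5, D))  # L = 5 input tokens
y = attention_with_prefix(x)
print(y.shape)  # (5, 32)
```

The output length stays `L`: the prefixes only add extra positions for each query to attend to.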

4.3. HyperPELT: Incorporate with Adapter

Note that in Section 4.2 we use the prefix length N as the dimension of the hyper-embeddings. Because this extends the dimension of the hyper-embedding components, we apply an adaptive pooling operation to the hyper-embeddings to adjust their dimension for the adapter hypernetwork.
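Adaptive pooling lets one hyper-embedding feed hypernetworks that expect different input sizes: it averages over windows whose boundaries are computed from the input and output lengths, so any length `N` maps to a fixed target length. The sketch below implements 1-D adaptive average pooling with the window rule used by PyTorch's `AdaptiveAvgPool1d` (start = ⌊iL/M⌋, end = ⌈(i+1)L/M⌉). Whether the paper uses average or max pooling, along which axis, and the toy shapes below are all assumptions.

```python
import numpy as np

def adaptive_avg_pool1d(x: np.ndarray, out_len: int) -> np.ndarray:
    """Average-pool the last axis of x from length L to out_len.

    Window i covers [floor(i*L/out_len), ceil((i+1)*L/out_len)), the same
    boundary rule as PyTorch's AdaptiveAvgPool1d.
    """
    L = x.shape[-1]
    out = np.empty(x.shape[:-1] + (out_len,), dtype=float)
    for i in range(out_len):
        start = (i * L) // out_len
        end = -(-((i + 1) * L) // out_len)  # ceiling division
        out[..., i] = x[..., start:end].mean(axis=-1)
    return out

# Hypothetical hyper-embedding: 3 components, each of prefix length N = 6,
# pooled to the 4 positions an adapter hypernetwork is assumed to expect.
hyper_emb = np.arange(18, dtype=float).reshape(3, 6)
pooled = adaptive_avg_pool1d(hyper_emb, 4)
print(pooled[0])  # [0.5 1.5 3.5 4.5]
```

When `out_len` equals the input length, every window holds a single element and the operation is the identity.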

4.4. VL-HyperPELT: Incorporate with Visual Modality

:::info Authors:

(1) Zhengkun Zhang, equal contribution; work done during an internship at Noah’s Ark Lab, Huawei Technologies;

(2) Wenya Guo, TKLNDST, CS, Nankai University, China (yangzl@nankai.edu.cn);

(3) Xiaojun Meng, equal contribution, Noah’s Ark Lab, Huawei Technologies;

(4) Yasheng Wang, Noah’s Ark Lab, Huawei Technologies;

(5) Yadao Wang, Noah’s Ark Lab, Huawei Technologies;

(6) Xin Jiang, Noah’s Ark Lab, Huawei Technologies;

(7) Qun Liu, Noah’s Ark Lab, Huawei Technologies;

(8) Zhenglu Yang, TKLNDST, CS, Nankai University, China.

:::


:::info This paper is available on arXiv under CC BY 4.0 DEED license.

:::
