AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.

Smarter AI Training with Few-Shot Natural Language Tasks

Abstract and 1. Introduction

  1. Background

    2.1 Mixture-of-Experts

    2.2 Adapters

  2. Mixture-of-Adaptations

    3.1 Routing Policy

    3.2 Consistency regularization

    3.3 Adaptation module merging and 3.4 Adaptation module sharing

    3.5 Connection to Bayesian Neural Networks and Model Ensembling

  3. Experiments

    4.1 Experimental Setup

    4.2 Key Results

    4.3 Ablation Study

  4. Related Work

  5. Conclusions

  6. Limitations

  7. Acknowledgment and References

Appendix

A. Few-shot NLU Datasets B. Ablation Study C. Detailed Results on NLU Tasks D. Hyper-parameter

A Few-shot NLU Datasets

Data. In contrast to the fully supervised setting in the above experiments, we also perform fewshot experiments following the prior study (Wang et al., 2021) on six tasks including MNLI (Williams et al., 2018), RTE (Dagan et al., 2005; Bar Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009), QQP[1] and SST-2 (Socher et al.). The results are reported on their development set following (Zhang et al., 2021). MPQA (Wiebe et al., 2005) and Subj (Pang and Lee, 2004) are used for polarity and subjectivity detection, where we follow (Gao et al., 2021) to keep 2, 000 examples for testing. The few-shot model only has access to |K| labeled samples for any task. Following true few-shot learning setting (Perez et al., 2021; Wang et al., 2021), we do not use any additional validation set for any hyper-parameter tuning or early stopping. The performance of each model is reported after fixed number of training epochs. For a fair comparison, we use the same set of few-shot labeled instances for training as in (Wang et al., 2021). We train each model with 5 different seeds and report average performance with standard deviation across the runs. In the few-shot experiments, we follow (Wang et al., 2021) to train AdaMix via the prompt-based fine-tuning strategy. In contrast to (Wang et al., 2021), we do not use any unlabeled data.

\

B Ablation Study

\ Table 11: Ablation study demonstrating the impact of parameter sharing in AdaMix adapter framework.

\

C Detailed Results on NLU Tasks

The results on NLU tasks are included in Table 1 and Table 13. The performance AdaMix with RoBERTa-large encoder achieves the best performance in terms of different task metrics in the GLUE benchmark. AdaMix with adapters is the

\ \ Table 12: Varying the bottleneck dimension of adapters in AdaMix with BERT-base and RoBERTa-large encoder. * denotes the bottleneck dimension used in AdaMix with adapters.

\ \ only PEFT method which outperforms full model fine-tuning on all the tasks and on average score. Additionally, the improvement brought by AdaMix is more significant with BERT-base as the encoder, demonstrating 2.2% and 1.2% improvement over the performance of full model fine-tuning and the best performing baseline UNIPELT with BERTbase. The improvement is observed to be consistent as that with RoBERTa-large on every task. The NLG results are included in Table 4 and 5.

D Hyper-parameter

Detailed hyper-parameter configuration for different tasks presented in Table 15 and Table 16.

\

:::info Authors:

(1) Yaqing Wang, Purdue University (wang5075@purdue.edu);

(2) Sahaj Agarwal, Microsoft (sahagar@microsoft.com);

(3) Subhabrata Mukherjee, Microsoft Research (submukhe@microsoft.com);

(4) Xiaodong Liu, Microsoft Research (xiaodl@microsoft.com);

(5) Jing Gao, Purdue University (jinggao@purdue.edu);

(6) Ahmed Hassan Awadallah, Microsoft Research (hassanam@microsoft.com);

(7) Jianfeng Gao, Microsoft Research (jfgao@microsoft.com).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

[1] https://www.quora.com/q/quoradata/

Market Opportunity
null Logo
null Price(null)
--
----
USD
null (null) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Why It Could Outperform Pepe Coin And Tron With Over $7m Already Raised

Why It Could Outperform Pepe Coin And Tron With Over $7m Already Raised

The post Why It Could Outperform Pepe Coin And Tron With Over $7m Already Raised appeared on BitcoinEthereumNews.com. Crypto News 17 September 2025 | 20:26 While meme tokens like Pepe Coin and established networks such as Tron attract headlines, many investors are now searching for projects that combine innovation, revenue-sharing and real-world utility. BlockchainFX ($BFX), currently in presale at $0.024 ahead of an expected $0.05 launch, is quickly becoming one of the best cryptos to buy today. With $7m already secured and a unique model spanning multiple asset classes, it is positioning itself as a decentralised super app and a contender to surpass older altcoins. Early Presale Pricing Creates A Rare Entry Point BlockchainFX’s presale pricing structure has been designed to reward early participants. At $0.024, buyers secure a lower entry price than later rounds, locking in a cost basis more than 50% below the projected $0.05 launch price. As sales continue to climb beyond $7m, each new stage automatically increases the token price. This built-in mechanism creates a clear advantage for early investors and explains why the project is increasingly cited in “best presales to buy now” discussions across the crypto space. High-Yield Staking Model Shares Platform Revenue Beyond its presale appeal, BlockchainFX is creating a high-yield staking model that gives holders a direct share of platform revenue. Every time a trade occurs on its platform, 70% of trading fees flow back into the $BFX ecosystem: 50% of collected fees are automatically distributed to stakers in both BFX and USDT. 20% is allocated to daily buybacks of $BFX, adding demand and price support. Half of the bought-back tokens are permanently burned, steadily reducing supply. Rewards are based on the size of each member’s BFX holdings and capped at $25,000 USDT per day to ensure sustainability. This structure transforms token ownership from a speculative bet into an income-generating position, a rare feature among today’s altcoins. A Multi-Asset Platform…
Share
BitcoinEthereumNews2025/09/18 03:35
Crypto Shows Mixed Reaction To Rate Cuts and Powell’s Speech

Crypto Shows Mixed Reaction To Rate Cuts and Powell’s Speech

The post Crypto Shows Mixed Reaction To Rate Cuts and Powell’s Speech appeared on BitcoinEthereumNews.com. Jerome Powell gave a speech justifying the Fed’s decision to push one rate cut today. Even though a cut took place as predicted, most leading cryptoassets began falling after a momentary price boost. Additionally, Powell directly addressed President Trump’s attempts to influence Fed policy, claiming that it didn’t impact today’s decisions. In previous speeches, he skirted around this elephant in the room. Sponsored Sponsored Powell’s FOMC Speech The FOMC just announced its decision to cut US interest rates, a highly-telegraphed move with substantial market implications. Jerome Powell, Chair of the Federal Reserve, gave a speech to help explain this moderate decision. In his speech, Powell discussed several negative economic factors in the US right now, including dour Jobs Reports and inflation concerns. These contribute to a degree of fiscal uncertainty which led Powell to stick with his conservative instincts, leaving tools available for future action. “At today’s meeting, the Committee decided to lower the target range…by a quarter percentage point… and to continue reducing the size of our balance sheet. Changes to government policies continue to evolve, and their impacts on the economy remain uncertain,” he claimed. Crypto’s Muted Response The Fed is in a delicate position, balancing the concerns of inflation and employment. This conservative approach may help explain why crypto markets did not react much to Powell’s speech: Bitcoin (BTC) Price Performance. Source: CoinGecko Sponsored Sponsored Bitcoin, alongside the other leading cryptoassets, exhibited similar movements during the rate cuts and Powell’s speech. Although there were brief price spikes immediately after the announcement, subsequent drops ate these gains. BTC, ETH, XRP, DOGE, ADA, and more all fell more than 1% since the Fed’s announcement. Breaking with Precedent However, Powell’s speech did differ from his previous statements in one key respect: he directly addressed claims that President Trump is attacking…
Share
BitcoinEthereumNews2025/09/18 09:01
Here’s why Polygon price is at risk of a 25% plunge

Here’s why Polygon price is at risk of a 25% plunge

Polygon price continued its freefall, reaching its lowest level since April 21, as the broader crypto sell-off gained momentum. Polygon (POL) dropped to $0.1915, down 32% from its highest point in May and 74% below its 2024 peak. The crash…
Share
Crypto.news2025/06/19 00:56