AdaMix, a parameter-efficient fine-tuning method, outperforms full model fine-tuning in few-shot NLU tasks across benchmarks like GLUE. Using prompt-based strategies without extra validation or unlabeled data, AdaMix consistently boosts performance with both BERT and RoBERTa encoders, demonstrating stability and efficiency in few-shot scenarios.

Smarter AI Training with Few-Shot Natural Language Tasks

2025/10/02 17:00

Abstract and 1. Introduction

  2. Background

    2.1 Mixture-of-Experts

    2.2 Adapters

  3. Mixture-of-Adaptations

    3.1 Routing Policy

    3.2 Consistency regularization

    3.3 Adaptation module merging and 3.4 Adaptation module sharing

    3.5 Connection to Bayesian Neural Networks and Model Ensembling

  4. Experiments

    4.1 Experimental Setup

    4.2 Key Results

    4.3 Ablation Study

  5. Related Work

  6. Conclusions

  7. Limitations

  8. Acknowledgment and References

Appendix

A. Few-shot NLU Datasets

B. Ablation Study

C. Detailed Results on NLU Tasks

D. Hyper-parameter

A Few-shot NLU Datasets

Data. In contrast to the fully supervised setting in the above experiments, we also perform few-shot experiments following the prior study (Wang et al., 2021) on six tasks, including MNLI (Williams et al., 2018), RTE (Dagan et al., 2005; Bar Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009), QQP[1] and SST-2 (Socher et al.), whose results are reported on their development sets following (Zhang et al., 2021). MPQA (Wiebe et al., 2005) and Subj (Pang and Lee, 2004) are used for polarity and subjectivity detection, where we follow (Gao et al., 2021) and keep 2,000 examples for testing. The few-shot model only has access to |K| labeled samples for any task. Following the true few-shot learning setting (Perez et al., 2021; Wang et al., 2021), we do not use any additional validation set for hyper-parameter tuning or early stopping. The performance of each model is reported after a fixed number of training epochs. For a fair comparison, we use the same set of few-shot labeled instances for training as in (Wang et al., 2021). We train each model with 5 different seeds and report the average performance with standard deviation across the runs. In the few-shot experiments, we follow (Wang et al., 2021) to train AdaMix via the prompt-based fine-tuning strategy. In contrast to (Wang et al., 2021), we do not use any unlabeled data.
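The reporting protocol described above (one run per seed, a fixed number of epochs, no validation set, mean and standard deviation across runs) can be sketched as follows. This is a minimal illustration, not the authors' code: `run_few_shot_protocol`, `train_and_evaluate`, and the dummy scorer are hypothetical names, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def run_few_shot_protocol(
    train_and_evaluate: Callable[[int, int], float],
    seeds: Sequence[int],
    num_epochs: int,
) -> Tuple[float, float]:
    """Run one training job per seed and return (mean, std) of the scores.

    `train_and_evaluate(seed, num_epochs)` is a hypothetical callable that
    trains on the few-shot labeled set for exactly `num_epochs` epochs
    (no validation set, no early stopping) and returns a test metric.
    """
    scores = [train_and_evaluate(seed, num_epochs) for seed in seeds]
    mean = statistics.mean(scores)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def dummy_train_and_evaluate(seed: int, num_epochs: int) -> float:
    """Stand-in for real training: returns a seed-dependent fake accuracy."""
    rng = random.Random(seed)
    return 0.80 + rng.uniform(-0.02, 0.02)


if __name__ == "__main__":
    mean, std = run_few_shot_protocol(
        dummy_train_and_evaluate, seeds=[1, 2, 3, 4, 5], num_epochs=10
    )
    print(f"accuracy: {mean:.4f} ± {std:.4f}")
```

Because the number of epochs is fixed in advance, the only source of variation across runs in this sketch is the seed, which is what the reported standard deviation is meant to capture.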


B Ablation Study

Table 11: Ablation study demonstrating the impact of parameter sharing in the AdaMix adapter framework.


C Detailed Results on NLU Tasks

The results on NLU tasks are included in Table 1 and Table 13. AdaMix with the RoBERTa-large encoder achieves the best performance across the task metrics in the GLUE benchmark. AdaMix with adapters is the only PEFT method which outperforms full model fine-tuning on all the tasks and on the average score. Additionally, the improvement brought by AdaMix is more significant with BERT-base as the encoder, demonstrating 2.2% and 1.2% improvement over full model fine-tuning and the best performing baseline UNIPELT with BERT-base, respectively. The improvement is consistent with that observed with RoBERTa-large on every task. The NLG results are included in Table 4 and Table 5.

Table 12: Varying the bottleneck dimension of adapters in AdaMix with BERT-base and RoBERTa-large encoder. * denotes the bottleneck dimension used in AdaMix with adapters.
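To give a feel for what varying the bottleneck dimension in Table 12 means in terms of size, the sketch below counts the parameters of a single bottleneck adapter. It assumes the common adapter layout (a down-projection from the hidden size to the bottleneck dimension, followed by an up-projection back, each with a bias term); the function name and the bottleneck values are illustrative and are not taken from the paper. The hidden sizes of 768 for BERT-base and 1024 for RoBERTa-large are the standard values for those encoders.

```python
def adapter_param_count(hidden_size: int, bottleneck_dim: int, use_bias: bool = True) -> int:
    """Count parameters of one bottleneck adapter (assumed standard layout).

    down-projection: hidden_size x bottleneck_dim (+ bottleneck_dim biases)
    up-projection:   bottleneck_dim x hidden_size (+ hidden_size biases)
    """
    down = hidden_size * bottleneck_dim
    up = bottleneck_dim * hidden_size
    bias = (bottleneck_dim + hidden_size) if use_bias else 0
    return down + up + bias


if __name__ == "__main__":
    # Illustrative bottleneck dimensions, not the values from Table 12.
    for name, hidden in [("BERT-base", 768), ("RoBERTa-large", 1024)]:
        for r in (8, 16, 32, 64):
            print(f"{name:14s} bottleneck={r:3d} params/adapter={adapter_param_count(hidden, r)}")
```

Under this assumption the count grows roughly linearly with the bottleneck dimension, so doubling the bottleneck approximately doubles the per-adapter parameter budget.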

D Hyper-parameter

Detailed hyper-parameter configurations for the different tasks are presented in Table 15 and Table 16.


:::info Authors:

(1) Yaqing Wang, Purdue University (wang5075@purdue.edu);

(2) Sahaj Agarwal, Microsoft (sahagar@microsoft.com);

(3) Subhabrata Mukherjee, Microsoft Research (submukhe@microsoft.com);

(4) Xiaodong Liu, Microsoft Research (xiaodl@microsoft.com);

(5) Jing Gao, Purdue University (jinggao@purdue.edu);

(6) Ahmed Hassan Awadallah, Microsoft Research (hassanam@microsoft.com);

(7) Jianfeng Gao, Microsoft Research (jfgao@microsoft.com).

:::


:::info This paper is available on arXiv under CC BY 4.0 DEED license.

:::

[1] https://www.quora.com/q/quoradata/
