Evaluates MIVPG performance on single-image datasets. Enhancements from PPEG and MIL are critical for discerning patterns in small datasets, mitigating the impact of data scarcity on MLLM performance.

Data Scarcity and MLLMs: Using MIL to Uncover Latent Patterns in Single-Image Tasks

2025/11/18 10:01

Abstract and 1 Introduction

  2. Related Work

    2.1. Multimodal Learning

    2.2. Multiple Instance Learning

  3. Methodology

    3.1. Preliminaries and Notations

    3.2. Relations between Attention-based VPG and MIL

    3.3. MIVPG for Multiple Visual Inputs

    3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

  4. Experiments and 4.1. General Setup

    4.2. Scenario 1: Samples with Single Image

    4.3. Scenario 2: Samples with Multiple Images, with Each Image as a General Embedding

    4.4. Scenario 3: Samples with Multiple Images, with Each Image Having Multiple Patches to be Considered and 4.5. Case Study

  5. Conclusion and References

Supplementary Material

A. Detailed Architecture of QFormer

B. Proof of Proposition

C. More Experiments

4.2. Scenario 1: Samples with Single Image

Figure 4. Experiment results on MSCOCO. We adopt the metrics used in [22]. The incorporation of MIL modules enhances the QFormer in the majority of cases.

We start by assessing the performance of our method on common single-image datasets, to validate the effectiveness of considering Multiple Instance Learning through the addition of a Pyramid Positional Encoding Generator (PPEG) for each layer containing MIVPG. Following the fine-tuning baseline in BLIP2, we choose MSCOCO [23] as the evaluation dataset and employ the Karpathy validation and testing set split. The original training set contains approximately 560K image-text pairs. Given that most existing MIL methods are tailored for small datasets, we evaluate performance across training subsets of various sizes obtained through random sampling. In this dataset, we treat patches as individual instances, and each sample comprises only one image, that is, N = 1.
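The random subsampling of the training set described in this setup could be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_training_subsets`, the example sizes, and the seed are hypothetical, and making the subsets nested (each smaller subset contained in the larger ones) is an assumption the paper does not state.

```python
import random


def sample_training_subsets(num_samples, subset_sizes, seed=0):
    """Return {size: sorted list of sample indices} for each requested size.

    Assumption (not stated in the paper): subsets are nested, so every
    smaller subset is contained in each larger one.
    """
    if max(subset_sizes) > num_samples:
        raise ValueError("subset size exceeds number of training samples")
    rng = random.Random(seed)  # local generator keeps the split reproducible
    order = list(range(num_samples))
    rng.shuffle(order)  # one random permutation shared by all subsets
    return {size: sorted(order[:size]) for size in subset_sizes}


# Hypothetical sizes, chosen only for illustration.
subsets = sample_training_subsets(num_samples=1000, subset_sizes=[50, 100, 500], seed=0)
```

Drawing every subset from a single shuffled order means a larger subset only ever adds samples, which makes comparisons across sizes easier to interpret. Independent draws per size would be an equally valid reading of "random sampling".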

The results on the MSCOCO dataset are shown in Figure 4. The improvements from PPEG are more noticeable with smaller training sets; as the training set grows, the performance gap narrows. With limited data, models often struggle to discern latent and implicit patterns, so more sophisticated modules are needed to uncover deeper relationships within the data. Conversely, existing MLLMs are typically pretrained on extensive datasets, which tends to mitigate the impact of data scarcity. In practical applications, this shows that one can draw upon MIL techniques to enhance MLLM performance when there is insufficient data for the downstream task.
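To make "treating patches as individual instances" concrete, here is a minimal sketch of generic attention-based MIL pooling over one image's patch embeddings (N = 1). It is not the authors' MIVPG or PPEG module; the function name `attention_mil_pool` and the weights `V` and `w` are hypothetical and would be learned in practice.

```python
import math


def attention_mil_pool(instances, V, w):
    """Aggregate a bag of instance embeddings into one bag embedding.

    instances: list of K vectors (D floats each), e.g. patch embeddings
    V: hidden projection with H rows of D floats (hypothetical weights)
    w: scoring vector with H floats (hypothetical weights)
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in instances:
        # Score each instance as w^T tanh(V h)
        hidden = [math.tanh(sum(v_d * h_d for v_d, h_d in zip(row, h))) for row in V]
        scores.append(sum(w_j * z_j for w_j, z_j in zip(w, hidden)))
    # Softmax over instances, shifted by the max for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding is the attention-weighted sum of the instances
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# One sample with a single image (N = 1) whose four patches form the bag.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
V = [[0.5, -0.5], [0.25, 0.75]]
w = [1.0, 1.0]
bag_embedding, attn = attention_mil_pool(patches, V, w)
```

The pooled embedding does not depend on the order of the patches, which is the permutation-invariance property that MIL aggregators rely on.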

Table 1. Experiments on the PatchGastricADC22 dataset [36]. We evaluate our proposed method against baselines from [36], considering four widely adopted metrics. Augmented baselines, denoted as aug, are models trained with data augmentation.

:::info Authors:

(1) Wenliang Zhong, The University of Texas at Arlington (wxz9204@mavs.uta.edu);

(2) Wenyi Wu, Amazon (wenyiwu@amazon.com);

(3) Qi Li, Amazon (qlimz@amazon.com);

(4) Rob Barton, Amazon (rab@amazon.com);

(5) Boxin Du, Amazon (boxin@amazon.com);

(6) Shioulin Sam, Amazon (shioulin@amazon.com);

(7) Karim Bouyarmane, Amazon (bouykari@amazon.com);

(8) Ismail Tutar, Amazon (ismailt@amazon.com);

(9) Junzhou Huang, The University of Texas at Arlington (jzhuang@uta.edu).

:::


:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
