By introducing Generative Data Diversity Enhancement (GDDE) and conducting a thorough examination of data distribution inconsistencies, DiverGen advances generative data augmentation for instance segmentation. DiverGen recognizes that a lack of real data biases model learning and extends the learnable distribution along three complementary diversity axes: generative model diversity (combining Stable Diffusion and DeepFloyd-IF outputs), prompt diversity (using ChatGPT-generated descriptions), and category diversity (adding ImageNet-based categories).

How Generative Data Expands AI’s Understanding of the Real World

Abstract and 1 Introduction

  2. Related Work

  3. Our Proposed DiverGen

    3.1. Analysis of Data Distribution

    3.2. Generative Data Diversity Enhancement

    3.3. Generative Pipeline

  4. Experiments

    4.1. Settings

    4.2. Main Results

    4.3. Ablation Studies

  5. Conclusions, Acknowledgments, and References

Appendix

A. Implementation Details

B. Visualization

3.1. Analysis of Data Distribution

Existing methods [28, 29, 34] often attribute the role of generative data to addressing class imbalance or data scarcity. In this paper, we instead address two main questions from the perspective of distribution discrepancy.

Why does generative data augmentation enhance model performance? We argue that there are discrepancies between the distribution the model learns from the limited real training data and the distribution of real-world data. The role of adding generative data is to alleviate the bias of the real training data, effectively mitigating overfitting to the training data.

First, to intuitively understand the discrepancies between different data sources, we use the CLIP [21] image encoder to extract embeddings of images from different data sources, and then use UMAP [18] to reduce dimensions for visualization. The visualization of data distributions across sources is shown in Figure 1. Real-world data (LVIS [8] train and LVIS val) cluster near the center, while generative data (Stable Diffusion [22] and IF [24]) are more dispersed, indicating that generative data can expand the data distribution that the model can learn.
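A minimal sketch of this visualization step is shown below. It assumes an `images_by_source` dictionary mapping each source name (e.g., LVIS train, SD, IF) to a list of PIL images; the checkpoint name and helper function are illustrative, not the paper's exact setup.

```python
# A minimal sketch: embed images from each source with CLIP, then project
# to 2D with UMAP. `images_by_source` (source name -> list of PIL images)
# and the checkpoint name are illustrative assumptions.
import numpy as np
import torch
import umap
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_embed(images):
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    # L2-normalize so distances approximate cosine similarity
    return (feats / feats.norm(dim=-1, keepdim=True)).numpy()

features = np.concatenate([clip_embed(imgs) for imgs in images_by_source.values()])
coords = umap.UMAP(n_components=2, metric="cosine").fit_transform(features)
# `coords` can then be scatter-plotted, colored by source, as in Figure 1.
```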

Then, to characterize the distribution learned by the model, we employ the free energy formulation used by Joseph et al. [10]. This formulation transforms the logits output by the classification head into an energy function. The formulation is shown below:

$$E(x; f) = -T \log \sum_{c=1}^{C} e^{f_c(x)/T}$$

where $f_c(x)$ is the logit of category $c$ output by the classification head, $C$ is the number of categories, and $T$ is the temperature. We then measure the train-val gap (TVG), the difference between the mean free energy on the training set and that on the validation set; a larger gap indicates heavier overfitting to the training data.
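As a concrete illustration, both quantities can be computed from logits with a numerically stable `logsumexp`. This is a minimal sketch, assuming logit tensors of shape `(num_samples, num_categories)`.

```python
# A minimal sketch: free energy from classification logits and the
# resulting train-val gap (TVG).
import torch

def free_energy(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # E(x; f) = -T * log sum_c exp(f_c(x) / T)
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

def train_val_gap(train_logits, val_logits, temperature=1.0):
    # Gap of the mean free energy between training and validation sets
    return (free_energy(train_logits, temperature).mean()
            - free_energy(val_logits, temperature).mean()).abs()
```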

Table 1. Results of the train-val gap (TVG) on different data sources. With the augmentation of generative data, all TVGs of LVIS + Gen are lower than those of LVIS alone, showing that adding generative data can effectively alleviate overfitting to the training data.

What types of generative data are beneficial for improving model performance? We argue that there are also discrepancies between the distribution of the generative data and the real-world data distribution. If these discrepancies are not properly addressed, the full potential of the generative model cannot be attained.

We divide the generative data into ‘frequent’, ‘common’, and ‘rare’ [8] groups, and train three models using each group of data as the instance paste source. The inference results are shown in Table 2. We find that the metrics on the corresponding category subset are lowest when training with only one group of data. We consider model performance to be primarily influenced by the quality and diversity of data. Given that the quality of generative data is relatively consistent, we contend that insufficient data diversity can mislead the distribution the model learns, whereas a diverse set of data gives the model a more comprehensive understanding. Therefore, we believe that using diverse generative data enables models to better adapt to these discrepancies, improving model performance.

Table 2. Results of training on different category data subsets. The metrics on the corresponding category subset are lowest when training with only one group of data, showing that insufficient diversity in the data can mislead the distribution that the model can learn. Blue font means the lowest value among models using generative data.


3.2. Generative Data Diversity Enhancement

Through the analysis above, we find that the diversity of generative data is crucial for improving model performance. Therefore, we design a series of strategies to enhance data diversity at three levels: category diversity, prompt diversity, and generative model diversity, which help the model to better adapt to the distribution discrepancy between generative data and real data.

Category diversity. The above experiments show that including data from only some categories results in lower performance than incorporating data from all categories. We believe that, akin to human learning, the model can learn features beneficial to the current category from other categories. Therefore, we increase the diversity of data by adding extra categories. First, we select extra categories beyond LVIS from the ImageNet-1K [23] categories based on WordNet [5] similarity. Then, the generative data from LVIS and the extra categories are mixed for training, requiring the model to learn to distinguish all categories. Finally, we truncate the parameters in the classification head corresponding to the extra categories during inference, ensuring that the inferred category range remains within LVIS, as sketched below.
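The selection and truncation steps might look like the following sketch; the similarity threshold and the `lvis_names` / `imagenet_names` lists are illustrative assumptions, not the paper's exact procedure.

```python
# A sketch: pick extra ImageNet-1K categories by WordNet similarity to the
# LVIS categories, then mask out their logits at inference time.
from nltk.corpus import wordnet as wn

def best_similarity(name_a: str, name_b: str) -> float:
    # Highest WordNet path similarity over all noun synsets of both names.
    sims = [a.path_similarity(b) or 0.0
            for a in wn.synsets(name_a, pos=wn.NOUN)
            for b in wn.synsets(name_b, pos=wn.NOUN)]
    return max(sims, default=0.0)

extra_names = [name for name in imagenet_names
               if any(best_similarity(name, lvis) > 0.3 for lvis in lvis_names)]

# At inference, truncate the classification head to the LVIS categories only.
# Assuming logits of shape (N, num_lvis + num_extra) with LVIS listed first:
lvis_logits = logits[:, :num_lvis]
```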

Prompt diversity. The output images of a text2image generative model typically rely on the input prompts. Existing methods [34] usually generate prompts from manually designed templates, such as “a photo of a single {category_name}.” When the data scale is small, designing prompts manually is convenient and fast. However, when generating data at a large scale, it is challenging to scale the number of manually designed prompts correspondingly. Intuitively, it is essential to diversify the prompts to enhance data diversity. To easily generate a large number of prompts, we choose a large language model, such as ChatGPT, to enhance prompt diversity. We have three requirements for the large language model: 1) each prompt should be as different as possible; 2) each prompt should ensure that there is only one object in the image; 3) prompts should describe different attributes of the category. For example, if the category is food, prompts should cover attributes like color, brand, size, freshness, packaging type, packaging color, etc. Limited by the inference cost of ChatGPT, we use the manually designed prompts as the base and only use ChatGPT to enhance prompt diversity for a subset of categories. Moreover, we also leverage the controllability of the generative model, adding the constraint “in a white background” after each prompt to make the background of output images simple and clear, which reduces the difficulty of mask annotation.
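A sketch of such a request is shown below, paraphrasing the three requirements above. The model name, the wording of the instruction, and the prompt count are illustrative; the client usage follows the openai>=1.0 Python SDK.

```python
# A sketch: ask an LLM for diverse single-object prompts for one category.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def diverse_prompts(category: str, n: int = 20) -> list[str]:
    instruction = (
        f"Write {n} prompts for a text-to-image model, one per line. "
        f"Each prompt must describe exactly one {category} and nothing else, "
        "be as different from the others as possible, and vary attributes "
        "such as color, brand, size, freshness, and packaging."
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction}],
    )
    lines = resp.choices[0].message.content.splitlines()
    # Append the background constraint that simplifies mask annotation.
    return [f"{l.strip()} in a white background" for l in lines if l.strip()]
```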

Generative model diversity. The quality and style of output images vary across generative models, and the data distribution learned solely from one generative model’s data is limited. Therefore, we introduce multiple generative models to enhance the diversity of data, allowing the model to learn from wider data distributions. We select two commonly used generative models, Stable Diffusion [22] (SD) and DeepFloyd-IF [24] (IF). We use Stable Diffusion V1.5, generating images with a resolution of 512 × 512, and use the images output from Stage II of IF with a resolution of 256 × 256. For each category in LVIS, we generate 1k images with each model. Examples from different generative models are shown in Figure 2.
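A minimal sketch for the SD half of this step, using the diffusers library, is shown below; DeepFloyd-IF would use its own two-stage pipeline analogously. The `prompts_by_category` mapping and output paths are illustrative assumptions.

```python
# A sketch: generate per-category images with Stable Diffusion v1.5.
from pathlib import Path

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for category, prompts in prompts_by_category.items():
    out_dir = Path("gen/sd") / category
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, prompt in enumerate(prompts):
        image = pipe(prompt, height=512, width=512).images[0]
        image.save(out_dir / f"{i:05d}.png")
```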

Figure 2. Examples of various generative models. The samples generated by different generative models vary, even within the same category.


3.3. Generative Pipeline

The generative pipeline of DiverGen is built upon X-Paste [34]. It can be divided into four stages: instance generation, instance annotation, instance filtration, and instance augmentation. The overview of DiverGen is illustrated in Figure 3.

Instance generation. Instance generation is a crucial stage for enhancing data diversity. In this stage, we employ our proposed Generative Data Diversity Enhancement (GDDE), as described in Sec. 3.2. For category diversity, we utilize the category information from the LVIS [8] categories and extra categories selected from ImageNet-1K [23]. For prompt diversity, we utilize manually designed prompts and ChatGPT-designed prompts. For model diversity, we employ two generative models, SD and IF.

Instance annotation. We employ SAM [12] as our annotation model. SAM is a class-agnostic, promptable segmenter that outputs corresponding masks based on input prompts, such as points, boxes, etc. Thanks to the controllability of the generative model during instance generation, the generated images have two characteristics: 1) each image predominantly contains only one foreground object; 2) the background of the images is relatively simple. Therefore, we introduce a SAM-background (SAM-bg) annotation strategy. SAM-bg takes the four corner points of an image as input prompts for SAM to obtain the background mask, then inverts the background mask to get the mask of the foreground object. Due to the conditional constraints imposed during the instance generation stage, this simple strategy is effective in producing high-quality masks.
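The corner-point idea might look like the following sketch, using the segment_anything package; the checkpoint path and model type are illustrative assumptions.

```python
# A sketch of SAM-bg: prompt SAM with the four image corners to get a
# background mask, then invert it for the foreground instance mask.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

def sam_bg_mask(image: np.ndarray) -> np.ndarray:
    """image: RGB uint8 array (H, W, 3); returns a boolean foreground mask."""
    h, w = image.shape[:2]
    corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])
    predictor.set_image(image)
    masks, _, _ = predictor.predict(
        point_coords=corners,
        point_labels=np.ones(4),  # the four corners prompt the background region
        multimask_output=False,
    )
    return ~masks[0]  # invert the background mask to get the foreground
```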

Instance filtration. In the instance filtration stage, X-Paste uses the CLIP score (the similarity between images and text) as the metric for image filtering. However, we observe that the CLIP score is ineffective in filtering low-quality images. In contrast to image-text similarity, we think image-image similarity can better filter out low-quality images. Therefore, we propose a new metric called CLIP inter-similarity. We use the image encoder of CLIP [21] to extract image embeddings for objects in the training set and for generated images, then calculate the similarity between them. If the similarity is too low, it indicates a significant disparity between the generated and real images, suggesting that the image is probably of poor quality and needs to be filtered out.
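A sketch of this filter is shown below: each generated image's CLIP embedding is compared against a per-category reference built from real training-set object crops. The threshold value and checkpoint name are illustrative assumptions.

```python
# A sketch of CLIP inter-similarity filtering.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_embed(images) -> torch.Tensor:
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def filter_generated(real_crops, generated, threshold=0.25):
    reference = clip_embed(real_crops).mean(dim=0)   # mean real-object embedding
    sims = clip_embed(generated) @ reference         # cosine similarity per image
    return [img for img, s in zip(generated, sims) if s.item() >= threshold]
```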

Instance augmentation. We use the augmentation strategy proposed by X-Paste [34], but we use neither data retrieved from the web nor instances from the LVIS [8] training set as the paste data source; we use only the generative data.
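At its core, the paste step composites a generated instance onto a training image using its binary mask; the random placement, scaling, and blending details of the X-Paste strategy are omitted in this minimal sketch, and the array shapes are assumptions.

```python
# A minimal sketch of instance pasting with a binary mask.
import numpy as np

def paste_instance(image: np.ndarray, instance: np.ndarray,
                   mask: np.ndarray, x: int, y: int) -> np.ndarray:
    """image: (H, W, 3); instance: (h, w, 3); mask: (h, w) bool."""
    out = image.copy()
    h, w = mask.shape
    region = out[y:y + h, x:x + w]
    region[mask] = instance[mask]  # overwrite only the masked pixels
    return out
```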

Figure 3. Overview of the DiverGen pipeline. In instance generation, we enhance data diversity at three levels: category diversity, prompt diversity, and generative model diversity. Next, we use SAM-background to obtain high-quality masks. Then, we use CLIP inter-similarity to filter out low-quality data. Finally, we use the instance paste strategy to increase model learning efficiency on generative data.


:::info Authors:

(1) Chengxiang Fan, Zhejiang University, China (equal contribution);

(2) Muzhi Zhu, Zhejiang University, China (equal contribution);

(3) Hao Chen, Zhejiang University, China (haochen.cad@zju.edu.cn);

(4) Yang Liu, Zhejiang University, China;

(5) Weijia Wu, Zhejiang University, China;

(6) Huaqi Zhang, vivo Mobile Communication Co., Ltd.;

(7) Chunhua Shen, Zhejiang University, China (chunhuashen@zju.edu.cn).

:::


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::

