
Ablation: The Role of Fused Labels and Teacher EMA in Instance-Incremental Learning

Abstract and 1 Introduction

  2. Related works

  3. Problem setting

  4. Methodology

    4.1. Decision boundary-aware distillation

    4.2. Knowledge consolidation

  5. Experimental results and 5.1. Experiment Setup

    5.2. Comparison with SOTA methods

    5.3. Ablation study

  6. Conclusion and future work and References

Supplementary Material

  1. Details of the theoretical analysis on KCEMA mechanism in IIL
  2. Algorithm overview
  3. Dataset details
  4. Implementation details
  5. Visualization of dusted input images
  6. More experimental results

5.3. Ablation study

All ablation studies are conducted on the CIFAR-100 dataset.

Figure 5. Effect of the three components in DBD: fused label (FL), dusted input space (DIS), and knowledge consolidation (KC). All components contribute to continual knowledge accumulation from new data.

Figure 6. t-SNE visualization of features after the first IIL phase, where the numbers denote classes.

Effect of each component. The proposed method consists of three main components: DBD with fused label (FL), DBD with dusting the input space (DIS), and knowledge consolidation (KC). The ablation study on these components is shown in Fig. 5. DBD with all components yields the largest performance gain in every phase. Although DBD with only DIS fails to improve the model (as expected), it still shows strong resistance to CF compared with fine-tuning with early stopping. The poor performance of fine-tuning also supports our analysis that learning with one-hot labels shifts the decision boundary instead of broadening it to cover the new data. Unlike previous distillation methods based on one-hot labels, the fused label balances the need to retain old knowledge with the need to learn from new observations. By combining boundary-aware distillation with knowledge consolidation, the model better handles both learning and retaining knowledge using only new data. Consolidating knowledge into the teacher model during learning not only frees the student model to learn new knowledge but also effectively prevents overfitting to new data.

Table 2. Results of incremental sub-population learning on the Entity-30 benchmark. Unseen, All, and Fi denote, respectively, the average test accuracy on unseen subclasses in the incremental data, the average test accuracy on all seen (base data) and unseen subclasses, and the average forgetting rate over all test data. More details on the metrics can be found in ISL [13].

Table 3. Performance of the student network in IIL tasks when different labels are assigned to new samples as the learning target.

Impact of fused labels. In Sec. 4.1, we analyze the different learning demands of new outer samples and new inner samples in DBD, and propose the fused label to unify knowledge retention and learning on new data. Fig. 6 contrasts the features learned in the first incremental phase with fused labels and with one-hot labels. Compared with the base model M0, learning with fused labels not only retains most of the feature distribution but also yields better separability (more dispersed clusters), whereas learning with one-hot labels changes the feature distribution into an elongated shape.

Tab. 3 shows the performance of the student model when different learning targets are applied to new inner samples and new outer samples, revealing the influence of different labels in boundary distillation. Using one-hot labels to learn new samples degrades the model with a large forgetting rate. Training inner samples with the teacher's score and outer samples with one-hot labels resembles existing rehearsal-based methods, which distill knowledge using the teacher's predictions on exemplars and learn from new data with annotated labels. This approach reduces the forgetting rate but brings less benefit in learning new knowledge. When fused labels are applied to outer samples, the student model is well aware of the existing DB and performs much better. Notably, using the fused label for all new samples achieves the best performance: FL helps retain old knowledge while also enlarging the DB.
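To make the label-assignment variants compared in Tab. 3 concrete, here is a minimal sketch in plain Python. It rests on two assumptions that are ours, not definitions taken from this section: a new sample counts as "inner" when the teacher's top-1 prediction already matches its annotated label, and the fused label is the renormalized sum of the teacher's softmax scores and the one-hot label. The names `fused_label`, `is_inner`, `build_target` and the rule strings are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label: int, num_classes: int) -> List[float]:
    """One-hot vector for the annotated label."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def fused_label(teacher_logits: List[float], label: int) -> List[float]:
    """Hypothetical fused label: teacher probabilities plus the one-hot label,
    renormalized to sum to 1 (an assumed form for illustration)."""
    probs = softmax(teacher_logits)
    mixed = [p + y for p, y in zip(probs, one_hot(label, len(probs)))]
    total = sum(mixed)
    return [v / total for v in mixed]


def is_inner(teacher_logits: List[float], label: int) -> bool:
    """Assumed inner/outer criterion: the teacher's top-1 class already
    matches the annotated label, i.e. the sample lies inside the old DB."""
    top1 = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return top1 == label


def build_target(teacher_logits: List[float], label: int,
                 inner_rule: str, outer_rule: str) -> List[float]:
    """Return the learning target for one new sample.

    Each rule is "onehot", "teacher", or "fused", mirroring the label
    combinations compared in Tab. 3.
    """
    rule = inner_rule if is_inner(teacher_logits, label) else outer_rule
    if rule == "onehot":
        return one_hot(label, len(teacher_logits))
    if rule == "teacher":
        return softmax(teacher_logits)
    if rule == "fused":
        return fused_label(teacher_logits, label)
    raise ValueError(f"unknown rule: {rule}")


# An outer sample: annotated as class 1, but the teacher prefers class 0.
target = build_target([2.0, 0.5, 0.1], label=1, inner_rule="fused", outer_rule="fused")
print([round(v, 3) for v in target])  # prints [0.364, 0.581, 0.054]
```

Under these assumptions, the fused target keeps the annotated class on top while still assigning substantial mass to the class the teacher prefers, which is one way to picture how a fused label can retain old knowledge and enlarge the DB at the same time.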

Table 4. Analysis of sensitivity to the noise intensity and the DIS loss on CIFAR-100. The student's performance is reported, as it is directly affected by DIS.

Impact of DIS. To locate and distill the learned decision boundary, we dust the input space with strong Gaussian noise as a perturbation. This noise differs from that used in data augmentation because of its high deviation. In data augmentation, an image should not change its label even after strong augmentation; DIS imposes no such constraint. Instead, we aim to relocate inputs to the peripheral areas between classes rather than keep them in their original categories. In our experiments, the Gaussian noise on pixel intensity follows N(0, 10). The sensitivity to the noise intensity and to the related DIS loss is shown in Tab. 4. When the noise intensity is as small as that used in data augmentation, the improvement is marginal, given that the base model achieves 64.34%. The best result is attained with a noise deviation of δ = 10. When the noise intensity is too large, it may turn all input images into outer samples, which no longer helps locate the decision boundary. Moreover, it significantly alters the parameters of the batch normalization layers in the network, which degrades the model. Visualizations of the dusted input images can be found in our supplementary material. In contrast to the noise intensity, the model is less sensitive to the DIS loss factor λ.
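A minimal sketch of dusting the input space is shown below, assuming 8-bit pixel intensities in [0, 255] and zero-mean Gaussian noise with deviation δ added independently to every pixel. The clipping step, the KL-divergence form of the DIS term, and the names `dust_image`, `kl_divergence`, and `dis_loss` are our assumptions for illustration; the text above only specifies the noise distribution N(0, 10) and a DIS loss weighted by λ.

```python
import math
import random
from typing import List, Optional, Sequence


def dust_image(pixels: Sequence[float], delta: float = 10.0,
               rng: Optional[random.Random] = None) -> List[float]:
    """Perturb every pixel with zero-mean Gaussian noise of deviation `delta`.

    Unlike label-preserving augmentation, nothing here tries to keep the image
    in its original class; values are only clipped to [0, 255]
    (the clipping is an assumption of this sketch).
    """
    if rng is None:
        rng = random.Random()
    return [min(255.0, max(0.0, p + rng.gauss(0.0, delta))) for p in pixels]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def dis_loss(teacher_probs: Sequence[Sequence[float]],
             student_probs: Sequence[Sequence[float]],
             lam: float = 1.0) -> float:
    """Hypothetical DIS term: lambda-weighted mean KL(teacher || student) over
    a batch of dusted inputs, so the student mimics the teacher's predictions
    in the peripheral regions between classes."""
    terms = [kl_divergence(t, s) for t, s in zip(teacher_probs, student_probs)]
    return lam * sum(terms) / len(terms)


# Dust a tiny four-pixel "image" with the deviation used in the experiments.
dusted = dust_image([0.0, 64.0, 128.0, 255.0], delta=10.0, rng=random.Random(0))
```

With `delta = 0` the function returns the input unchanged; as `delta` grows, the noise dominates the original intensities and more pixels saturate at 0 or 255.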

Figure 7. Knowledge consolidation (KC). Left: solid lines show results with KC and dashed lines show results without KC; KC significantly improves performance. Right: influence of different model EMA strategies on the teacher (t, solid lines) and student (s, dashed lines) models.

Impact of knowledge consolidation. Our theoretical analysis shows that consolidating knowledge into the teacher model can yield a model that generalizes better on both the old task and the new task. Fig. 7 (left) shows model performance with and without KC. Whether applied to the vanilla distillation method or to the proposed DBD, knowledge consolidation shows great potential for accumulating knowledge from the student model and improving the model's performance. However, not every model EMA strategy works. As shown in Fig. 7 (right), traditional EMA applied after every iteration fails to accumulate knowledge: the teacher always performs worse than the student. Overly frequent EMA causes the teacher model to quickly collapse to the student model, which leads to forgetting and limits subsequent learning. Lowering the EMA frequency to every epoch (EMA epoch1) or every 5 epochs (EMA epoch5) performs better, which agrees with our theoretical analysis that the total number of update steps n should be kept small. Our KC-EMA, which performs EMA every 5 epochs (an empirical choice) with adaptive momentum, attains the best result.
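The difference between iteration-level EMA and epoch-level consolidation comes down to how often the teacher is updated. Below is a minimal sketch with parameters as flat lists of floats. The update θ_t ← m·θ_t + (1 − m)·θ_s is standard model EMA and `every_k_epochs` mirrors the "EMA epoch1/epoch5" settings; `adaptive_momentum` is a hypothetical placeholder, since the exact adaptive rule of KC-EMA is not given in this section.

```python
from typing import Callable, List, Sequence, Tuple


def ema_update(teacher: Sequence[float], student: Sequence[float],
               momentum: float) -> List[float]:
    """Standard model EMA: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def adaptive_momentum(num_updates: int) -> float:
    """Hypothetical schedule (not the paper's rule): momentum grows with the
    number of consolidation steps, so the teacher becomes a running average
    of its initial weights and every consolidated student snapshot."""
    return (num_updates + 1.0) / (num_updates + 2.0)


def consolidate(teacher: Sequence[float],
                student_per_epoch: Sequence[Sequence[float]],
                every_k_epochs: int = 5,
                momentum_fn: Callable[[int], float] = adaptive_momentum,
                ) -> Tuple[List[float], int]:
    """Epoch-level EMA: student_per_epoch[e] holds the student's weights at the
    end of epoch e + 1. The teacher is updated only every `every_k_epochs`
    epochs, which keeps the total number of EMA steps n small.
    Returns the final teacher weights and n."""
    teacher = list(teacher)
    n = 0
    for epoch, student in enumerate(student_per_epoch, start=1):
        if epoch % every_k_epochs == 0:
            teacher = ema_update(teacher, student, momentum_fn(n))
            n += 1
    return teacher, n


# Toy run: a constant student over 20 epochs, teacher initialized at 0.
snapshots = [[1.0]] * 20
print(consolidate([0.0], snapshots, every_k_epochs=5))
print(consolidate([0.0], snapshots, every_k_epochs=1))
```

In this toy run, updating every 5 epochs performs n = 4 steps and leaves the teacher at about 0.8 (the average of its initial value and four snapshots), whereas updating every epoch performs n = 20 steps and moves it to about 0.95. This is only a toy picture of how more frequent updates pull the teacher toward the current student.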


:::info Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(8) Chengjie Wang, Tencent Youtu Lab.

:::


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::
