This article details the selection and implementation of classic and SOTA incremental learning methods for benchmarking the new Instance-Incremental Learning setting.

CIL Methods Fail IIL: Why Existing Baselines Struggle with Model Promotion

2025/11/12 23:00

Abstract and 1 Introduction

  2. Related works

  3. Problem setting

  4. Methodology

    4.1. Decision boundary-aware distillation

    4.2. Knowledge consolidation

  5. Experimental results and 5.1. Experiment Setup

    5.2. Comparison with SOTA methods

    5.3. Ablation study

  6. Conclusion and future work and References

Supplementary Material

  1. Details of the theoretical analysis on KCEMA mechanism in IIL
  2. Algorithm overview
  3. Dataset details
  4. Implementation details
  5. Visualization of dusted input images
  6. More experimental results

10. Implementation details

Selection of compared methods. As few existing methods are proposed for the IIL setting, we reproduce several classic and SOTA CIL methods by referring to their original code or papers with minimal revision.

LwF [12] is one of the earliest deep-learning-based incremental learning algorithms; it proposes using knowledge distillation to resist knowledge forgetting. Given the significance of this method, many CIL methods still compare against this baseline.
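The LwF-style distillation term can be sketched as a cross-entropy between temperature-softened teacher and student outputs. A minimal pure-Python sketch; the temperature T=2 and the T² scaling are common distillation choices, not values taken from the paper:

```python
import math

def softmax(logits, T=2.0):
    # Temperature-scaled softmax; T > 1 softens the distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lwf_distill_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between softened teacher and student outputs,
    # scaled by T*T as in the usual distillation formulation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -T * T * sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss is minimized when the student reproduces the teacher's softened distribution, which is how the old model's behavior is preserved on the shared classification head.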

iCaRL [22] builds on the LwF method and proposes using old exemplars for label-level knowledge distillation.
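iCaRL selects its exemplars by herding: greedily picking samples whose running mean feature best approximates the class mean. A minimal sketch, assuming features are plain Python lists (the function name `herding_select` is ours, not from the original code):

```python
def herding_select(features, m):
    # Greedy herding sketch (iCaRL-style): choose m exemplars whose
    # accumulated mean feature stays closest to the class mean.
    n = len(features)
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    chosen, acc = [], [0.0] * d
    for _ in range(m):
        best, best_dist = None, None
        for i, f in enumerate(features):
            if i in chosen:
                continue
            # Candidate running mean if sample i were added next.
            cand = [(acc[k] + f[k]) / (len(chosen) + 1) for k in range(d)]
            dist = sum((cand[k] - mean[k]) ** 2 for k in range(d))
            if best_dist is None or dist < best_dist:
                best, best_dist = i, dist
        chosen.append(best)
        acc = [acc[k] + features[best][k] for k in range(d)]
    return chosen
```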

PODNet [4] performs old-knowledge distillation at the feature level, which distinguishes it from the former two methods.
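Feature-level distillation in the spirit of PODNet compares pooled statistics of the old and new models' intermediate feature maps rather than their output labels. A simplified sketch (PODNet's actual pooling and normalization differ):

```python
def pod_spatial_loss(old_map, new_map):
    # Feature-map distillation sketch: compare width-pooled and
    # height-pooled statistics of old vs. new 2D feature maps.
    def pool_rows(m):  # sum over width -> one value per row
        return [sum(row) for row in m]
    def pool_cols(m):  # sum over height -> one value per column
        return [sum(col) for col in zip(*m)]
    def l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return (l2(pool_rows(old_map), pool_rows(new_map))
            + l2(pool_cols(old_map), pool_cols(new_map)))
```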

DER [31] expands the network dynamically and attains the best CIL results when the task id is given. Expanding the neural network is highly effective for learning new knowledge while retaining old knowledge. Although the addition of new parameters is limited in the new IIL setting, we are interested in how a method with a dynamic network performs in this setting.
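Dynamic expansion can be sketched as keeping a list of per-task feature extractors whose outputs are concatenated before classification; each new task appends an extractor while earlier ones are kept. A toy sketch, not DER's actual architecture:

```python
class DynamicNet:
    # DER-style sketch: each incremental step may add a feature
    # extractor; the classifier sees the concatenated features.
    def __init__(self):
        self.extractors = []

    def expand(self, extractor):
        # In DER, previous extractors would be frozen here.
        self.extractors.append(extractor)

    def features(self, x):
        out = []
        for f in self.extractors:
            out.extend(f(x))
        return out
```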

OnPro [29] uses online prototypes to enhance the existing decision boundaries with only the data visible at the current time step, which satisfies our setting without old data. Its motivation of making the learned features more generalizable to new tasks is also consistent with our goal of promoting the model continually using only new data.
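The online-prototype idea can be sketched as an exponential moving average of per-class features computed only from the current batch, so no old data needs to be stored. The momentum value 0.9 below is an assumed illustration, not OnPro's actual hyperparameter:

```python
def update_prototypes(protos, feats, labels, momentum=0.9):
    # Online prototype sketch: EMA of per-class features using only
    # the samples visible at the current time step.
    for f, y in zip(feats, labels):
        if y not in protos:
            protos[y] = list(f)  # initialize from the first sample seen
        else:
            protos[y] = [momentum * p + (1 - momentum) * v
                         for p, v in zip(protos[y], f)]
    return protos
```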

Online learning [6] can be applied to hybrid incremental learning, and thus can be implemented directly in the new IIL setting. It proposes a modified cross-distillation that smooths student predictions with teacher predictions to retain old knowledge, which differs from our method of altering the learning target by fusing the annotated label with the teacher predictions.
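The two kinds of targets can be contrasted in a few lines: one mixes the student's own output toward the teacher, the other builds the supervision target by fusing the ground-truth label with the teacher output. The mixing weights `alpha` and `beta` are illustrative assumptions, not values from either paper:

```python
def smoothed_student(student_probs, teacher_probs, alpha=0.5):
    # Cross-distillation-style smoothing: blend the student's
    # predictions toward the teacher's predictions.
    return [alpha * s + (1 - alpha) * t
            for s, t in zip(student_probs, teacher_probs)]

def fused_target(one_hot_label, teacher_probs, beta=0.5):
    # Alternative: alter the learning target itself by fusing the
    # annotated label with the teacher predictions.
    return [beta * y + (1 - beta) * t
            for y, t in zip(one_hot_label, teacher_probs)]
```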

Besides the above CIL methods, ISL [13] is one of the few IIL methods that can be directly implemented in our IIL setting. Different from our setting, which aims to exploit all newly acquired instances, ISL is proposed for incremental sub-population learning. Hence, we not only applied ISL in our setting for comparison but also evaluated our method under their setting.

Our extensive experiments reveal that neither existing CIL methods nor IIL methods can solve the proposed IIL problem. Existing methods, especially the CIL methods, primarily concentrate on mitigating catastrophic forgetting and demonstrate limited effectiveness in learning from new data. In real-world applications, enhancing the model with additional instances to achieve more generalizable features is crucial, which underscores the importance and relevance of our proposed IIL setting.

Details for reproduction of compared methods. To reproduce the rehearsal-based methods, including iCaRL [22], PODNet [4], DER [31], and OnPro [29], we still set a memory of 20 exemplars per class, even though no old data is available in the new IIL setting. For each compared method, we trained its own base model when necessary; for example, some methods use more comprehensive data augmentation and therefore attain higher base performance than others. We kept the data augmentation of these methods when reproducing them, even in the IIL learning phase.
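A class-balanced rehearsal memory capped at 20 exemplars per class can be sketched as below. The first-come retention policy here is a simplification; iCaRL, for instance, would select exemplars by herding instead:

```python
from collections import defaultdict

class ExemplarMemory:
    # Class-balanced rehearsal memory, capped at m exemplars per class.
    def __init__(self, m=20):
        self.m = m
        self.store = defaultdict(list)

    def add(self, x, y):
        # Keep at most m exemplars for class y (first-come policy).
        if len(self.store[y]) < self.m:
            self.store[y].append(x)

    def size(self):
        return sum(len(v) for v in self.store.values())
```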

Figure 10. The original images and corresponding dusted images.

For LwF [12], as there is no need to add a new classification head, we directly apply their training loss on the old classification head. iCaRL [22] is implemented using the code provided by its authors in TensorFlow. PODNet [4] and DER [31] are implemented based on their own code in PyTorch. For online learning [6], we reproduce only the part related to learning from new instances of old classes in the first training phase. ISL [13] is also implemented based on its original code with the original learning-rate setting, i.e., lr=0.05 for base training and lr=0.005 for the IIL learning phase. OnPro [29] uses quite strong data augmentation during training, which leads to slightly lower base-model accuracy compared to other methods. As OnPro trains the base model for only one epoch in online training, which is not sufficient to obtain a strong base model, we instead train its base model for 60 epochs, as for the other methods. For the IIL learning phase, we keep the one-epoch setting, because we found that training for more epochs does not lead to better performance. The learning rate of OnPro for training the base model is 5e-4 and decays by a factor of 0.1 at epochs 12, 30, and 50. In the IIL training phase, lr=5e-7, which is far smaller than ours (0.01).
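The step-decay schedule described above (base lr 5e-4, multiplied by 0.1 at epochs 12, 30, and 50) corresponds to:

```python
def step_lr(base_lr, epoch, milestones=(12, 30, 50), gamma=0.1):
    # Step decay: multiply the learning rate by gamma at each
    # milestone epoch that has been reached.
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```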


:::info Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Chengjie Wang, Tencent Youtu Lab.

:::


:::info This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

:::
