Sparse Spectral Training (SST) introduces a low-rank optimization technique that enhances both Euclidean and hyperbolic neural networks. Tested on machine translation benchmarks like IWSLT and Multi30K, SST consistently outperformed LoRA, ReLoRA*, and even full-rank training, delivering higher BLEU scores and preventing overfitting in high-dimensional hyperbolic spaces. The results highlight SST’s ability to generalize efficiently while maintaining stability and robustness across architectures.

Generalizing Sparse Spectral Training Across Euclidean and Hyperbolic Architectures

2025/10/29 19:10

Abstract and 1. Introduction

  2. Related Work

  3. Low Rank Adaptation

    3.1 LoRA and 3.2 Limitation of LoRA

    3.3 ReLoRA*

  4. Sparse Spectral Training

    4.1 Preliminaries and 4.2 Gradient Update of U, VT with Σ

    4.3 Why SVD Initialization is Important

    4.4 SST Balances Exploitation and Exploration

    4.5 Memory-Efficient Implementation for SST and 4.6 Sparsity of SST

  5. Experiments

    5.1 Machine Translation

    5.2 Natural Language Generation

    5.3 Hyperbolic Graph Neural Networks

  6. Conclusion and Discussion

  7. Broader Impacts and References

Supplementary Information

A. Algorithm of Sparse Spectral Training

B. Proof of Gradient of Sparse Spectral Layer

C. Proof of Decomposition of Gradient of Weight

D. Proof of Advantage of Enhanced Gradient over Default Gradient

E. Proof of Zero Distortion with SVD Initialization

F. Experiment Details

G. Singular Value Pruning

H. Evaluating SST and GaLore: Complementary Approaches to Memory Efficiency

I. Ablation Study

5 Experiments

To validate our Sparse Spectral Training (SST) approach, we conducted experiments on both Euclidean and hyperbolic neural networks, demonstrating the generalization of SST across various neural network architectures and embedding geometries.

We compared SST with full-rank training, LoRA, and ReLoRA*. The key distinctions between ReLoRA* and ReLoRA [5] are that ReLoRA includes full-rank training as a "warm start", making it not an end-to-end memory-efficient method, and that ReLoRA* resets all optimizer states for low-rank parameters, whereas ReLoRA resets only 99% of them.
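As a rough illustration of the reset scheme distinguishing ReLoRA* from ReLoRA, the merge-and-reset step can be sketched as follows (a minimal sketch with a hypothetical helper name, not the authors' implementation):

```python
import numpy as np

def relora_star_reset(W, A, B, opt_state, rng):
    """Hypothetical sketch of a ReLoRA*-style reset: fold the learned
    low-rank update B @ A into the frozen base weight W, re-initialize
    the factors, and clear ALL optimizer state for the low-rank
    parameters (plain ReLoRA retains most of that state instead)."""
    W += B @ A                                   # merge update into base weight
    A[:] = rng.standard_normal(A.shape) * 0.02   # restart factor A
    B[:] = 0.0                                   # zero B => fresh update starts at zero
    opt_state.clear()                            # full optimizer-state reset
    return W
```

Because `B` restarts at zero, the merged weight is unchanged by the fresh factors at the moment of reset, so training can resume without a discontinuity in the layer's output.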

For our experiments, all linear layers in the baseline models were replaced with their low-rank counterparts. Hyperparameters and implementation details are provided in Appendix F.
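A sketch of what such a low-rank replacement looks like (assumed factorization and hypothetical class name; the paper's memory-efficient formulation is given in Section 4.5 and Appendix A):

```python
import numpy as np

class LowRankLinear:
    """Hypothetical sketch of a low-rank substitute for a dense linear
    layer: the d_in x d_out weight is stored factored as U @ diag(S) @ Vt
    with rank r, i.e. r * (d_in + d_out + 1) parameters instead of
    d_in * d_out."""
    def __init__(self, d_in, d_out, r, rng):
        self.U = rng.standard_normal((d_in, r)) / np.sqrt(d_in)
        self.S = np.ones(r)                        # singular-value vector
        self.Vt = rng.standard_normal((r, d_out)) / np.sqrt(r)

    def __call__(self, x):
        # (batch, d_in) -> (batch, r), scaled per singular value, -> (batch, d_out)
        return ((x @ self.U) * self.S) @ self.Vt
```

For a 512-dimensional layer at r = 32, this factorization stores roughly an eighth of the dense parameter count, which is the memory saving the low-rank baselines and SST all share.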

Further comparisons of SST with the contemporaneous work GaLore [16] are elaborated in Appendix H, highlighting SST’s superior performance in low-rank configurations. Ablation studies are documented in Appendix I.

5.1 Machine Translation

We employ the vanilla transformer [10] as the Euclidean transformer and HyboNet [12] as the hyperbolic transformer. Our experiments include three widely-used machine translation datasets: IWSLT’14 English-to-German [33], IWSLT’17 German-to-English [34], and Multi30K German-to-English [35]. For IWSLT’14, the hyperparameters are aligned with those from HyboNet.

Table 1 presents BLEU scores for IWSLT’14 across various dimensions and ranks (r). The results confirm that SST consistently outperforms other low-rank methods. Notably, some BLEU scores for the hyperbolic transformer are zero because training encountered NaN losses, whereas SST remained stable throughout.

Table 2: Comparison of BLEU scores on the Multi30K and IWSLT’17 datasets using the Euclidean Transformer (dimension = 512), r = 32. Scores highlighted in bold represent the highest performance achieved by low-rank methods.

Previous work on hyperbolic neural networks has predominantly focused on low-dimensional configurations [25, 36, 37]. A key characteristic of hyperbolic space is that its volume grows exponentially with distance from a reference point, far faster than the polynomial growth seen in Euclidean space [38]. This expansive nature makes hyperbolic spaces particularly prone to overfitting as dimensionality increases. By imposing constraints on the parameter search space of hyperbolic neural networks, SST prevents the overfitting typically associated with such high-dimensional settings. This spectral sparse constraint enhances the stability and robustness of our models, ensuring consistent performance during training.
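To make the growth contrast concrete: in n-dimensional hyperbolic space of curvature −1, the volume of a geodesic ball of radius R satisfies (a standard fact from hyperbolic geometry, not derived in the paper)

```latex
V_{\mathbb{H}^n}(R) = \omega_{n-1} \int_0^R \sinh^{n-1}(t)\, dt \;\sim\; \frac{\omega_{n-1}}{2^{n-1}(n-1)}\, e^{(n-1)R},
\qquad
V_{\mathbb{R}^n}(R) = \frac{\omega_{n-1}}{n}\, R^n,
```

where ω(n−1) is the surface area of the unit (n−1)-sphere. The e^{(n−1)R} factor is the exponential growth referred to above: capacity explodes with both radius and dimension, which is why high-dimensional hyperbolic embeddings overfit without a constraint such as SST's.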

Further comparative results on the Multi30K and IWSLT’17 datasets, using the standard dimensions for vanilla Euclidean transformers, are documented in Table 2. Here, SST not only surpasses other low-rank methods but also demonstrates superior performance compared to full-rank training.


:::info Authors:

(1) Jialin Zhao, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(2) Yingtao Zhang, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(3) Xinghang Li, Department of Computer Science;

(4) Huaping Liu, Department of Computer Science;

(5) Carlo Vittorio Cannistraci, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Computer Science, and Department of Biomedical Engineering Tsinghua University, Beijing, China.

:::


:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::
