Sparse Spectral Training (SST) introduces a low-rank optimization technique that enhances both Euclidean and hyperbolic neural networks. Tested on machine translation benchmarks like IWSLT and Multi30K, SST consistently outperformed LoRA, ReLoRA*, and even full-rank training, delivering higher BLEU scores and preventing overfitting in high-dimensional hyperbolic spaces. The results highlight SST’s ability to generalize efficiently while maintaining stability and robustness across architectures.

Generalizing Sparse Spectral Training Across Euclidean and Hyperbolic Architectures

2025/10/29 19:10

Abstract and 1. Introduction

  2. Related Work

  3. Low Rank Adaptation

    3.1 LoRA and 3.2 Limitation of LoRA

    3.3 ReLoRA*

  4. Sparse Spectral Training

    4.1 Preliminaries and 4.2 Gradient Update of U, VT with Σ

    4.3 Why SVD Initialization is Important

    4.4 SST Balances Exploitation and Exploration

    4.5 Memory-Efficient Implementation for SST and 4.6 Sparsity of SST

  5. Experiments

    5.1 Machine Translation

    5.2 Natural Language Generation

    5.3 Hyperbolic Graph Neural Networks

  6. Conclusion and Discussion

  7. Broader Impacts and References

Supplementary Information

A. Algorithm of Sparse Spectral Training

B. Proof of Gradient of Sparse Spectral Layer

C. Proof of Decomposition of Gradient of Weight

D. Proof of Advantage of Enhanced Gradient over Default Gradient

E. Proof of Zero Distortion with SVD Initialization

F. Experiment Details

G. Singular Value Pruning

H. Evaluating SST and GaLore: Complementary Approaches to Memory Efficiency

I. Ablation Study

5 Experiments

To validate our Sparse Spectral Training (SST) approach, we conducted experiments on both Euclidean and hyperbolic neural networks, demonstrating the generalization of SST across various neural network architectures and embedding geometries.

We compared SST with full-rank training, LoRA, and ReLoRA*. The key distinctions between ReLoRA* and ReLoRA [5] are that ReLoRA includes full-rank training as a "warm start", making it not an end-to-end memory-efficient method, and that ReLoRA* resets all optimizer states for low-rank parameters, whereas ReLoRA resets only 99% of them.
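As a rough illustration of the reset scheme distinguishing ReLoRA* from ReLoRA, the merge-and-reset step can be sketched as follows (a minimal sketch with a hypothetical helper name, not the authors' implementation):

```python
import numpy as np

def relora_star_reset(W, A, B, opt_state, rng):
    """Hypothetical sketch of a ReLoRA*-style reset: fold the learned
    low-rank update B @ A into the frozen base weight W, re-initialize
    the factors, and clear ALL optimizer state for the low-rank
    parameters (plain ReLoRA retains most of that state instead)."""
    W += B @ A                                   # merge update into base weight
    A[:] = rng.standard_normal(A.shape) * 0.02   # restart factor A
    B[:] = 0.0                                   # zero B => fresh update starts at zero
    opt_state.clear()                            # full optimizer-state reset
    return W
```

Because `B` restarts at zero, the merged weight is unchanged by the fresh factors at the moment of reset, so training can resume without a discontinuity in the layer's output.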

For our experiments, all linear layers in the baseline models were replaced with their low-rank counterparts. Hyperparameters and implementation details are provided in Appendix F.
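A sketch of what such a low-rank replacement looks like (assumed factorization and hypothetical class name; the paper's memory-efficient formulation is given in Section 4.5 and Appendix A):

```python
import numpy as np

class LowRankLinear:
    """Hypothetical sketch of a low-rank substitute for a dense linear
    layer: the d_in x d_out weight is stored factored as U @ diag(S) @ Vt
    with rank r, i.e. r * (d_in + d_out + 1) parameters instead of
    d_in * d_out."""
    def __init__(self, d_in, d_out, r, rng):
        self.U = rng.standard_normal((d_in, r)) / np.sqrt(d_in)
        self.S = np.ones(r)                        # singular-value vector
        self.Vt = rng.standard_normal((r, d_out)) / np.sqrt(r)

    def __call__(self, x):
        # (batch, d_in) -> (batch, r), scaled per singular value, -> (batch, d_out)
        return ((x @ self.U) * self.S) @ self.Vt
```

For a 512-dimensional layer at r = 32, this factorization stores roughly an eighth of the dense parameter count, which is the memory saving the low-rank baselines and SST all share.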

Further comparisons of SST with the contemporaneous work GaLore [16] are elaborated in Appendix H, highlighting SST’s superior performance in low-rank configurations. Ablation studies are documented in Appendix I.

5.1 Machine Translation

We employ the vanilla transformer [10] as the Euclidean transformer and HyboNet [12] as the hyperbolic transformer. Our experiments include three widely-used machine translation datasets: IWSLT’14 English-to-German [33], IWSLT’17 German-to-English [34], and Multi30K German-to-English [35]. For IWSLT’14, the hyperparameters are aligned with those from HyboNet.

Table 1 presents BLEU scores for IWSLT’14 across various dimensions and ranks (r). The results confirm that SST consistently outperforms other low-rank methods. Notably, some BLEU scores for the hyperbolic transformer are zero because training encountered NaN losses, whereas SST remained stable throughout.

Table 2: Comparison of BLEU scores on the Multi30K and IWSLT’17 datasets using the Euclidean Transformer (dimension = 512), r = 32. Scores highlighted in bold represent the highest performance achieved by low-rank methods.

Previous work on hyperbolic neural networks has predominantly focused on low-dimensional configurations [25, 36, 37]. A key characteristic of hyperbolic space is that its volume grows exponentially with distance from a reference point, far faster than the polynomial growth seen in Euclidean space [38]. This expansive nature makes hyperbolic spaces particularly prone to overfitting as dimensionality increases. By imposing constraints on the parameter search space of hyperbolic neural networks, SST prevents the overfitting typically associated with such high-dimensional settings. This spectral sparse constraint enhances the stability and robustness of our models, ensuring consistent performance during training.
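To make the growth contrast concrete: in n-dimensional hyperbolic space of curvature −1, the volume of a geodesic ball of radius R satisfies (a standard fact from hyperbolic geometry, not derived in the paper)

```latex
V_{\mathbb{H}^n}(R) = \omega_{n-1} \int_0^R \sinh^{n-1}(t)\, dt \;\sim\; \frac{\omega_{n-1}}{2^{n-1}(n-1)}\, e^{(n-1)R},
\qquad
V_{\mathbb{R}^n}(R) = \frac{\omega_{n-1}}{n}\, R^n,
```

where ω(n−1) is the surface area of the unit (n−1)-sphere. The e^{(n−1)R} factor is the exponential growth referred to above: capacity explodes with both radius and dimension, which is why high-dimensional hyperbolic embeddings overfit without a constraint such as SST's.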

Further comparative results on the Multi30K and IWSLT’17 datasets, using the standard dimensions for vanilla Euclidean transformers, are documented in Table 2. Here, SST not only surpasses other low-rank methods but also demonstrates superior performance compared to full-rank training.


:::info Authors:

(1) Jialin Zhao, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(2) Yingtao Zhang, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI) and Department of Computer Science;

(3) Xinghang Li, Department of Computer Science;

(4) Huaping Liu, Department of Computer Science;

(5) Carlo Vittorio Cannistraci, Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Computer Science, and Department of Biomedical Engineering Tsinghua University, Beijing, China.

:::


:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::
