This study addresses a key source of instability in hyperbolic deep learning: learning the curvature of the manifold. The authors identify a fundamental weakness of naive approaches: when the curvature parameter is updated before the model parameters, the Riemannian gradients and projections computed for the old manifold become invalid and performance deteriorates. They address this with an ordered update and projection schema that first projects the hyperbolic parameters to the stable tangent space at the origin under the old curvature, then updates the curvature, and finally re-projects the parameters onto the new manifold.

Understanding Training Stability in Hyperbolic Neural Networks


Abstract and 1. Introduction

2. Related Work

3. Methodology

    3.1 Background

    3.2 Riemannian Optimization

    3.3 Towards Efficient Architectural Components

4. Experiments

    4.1 Hierarchical Metric Learning Problem

    4.2 Standard Classification Problem

5. Conclusion and References

3.1 Background


3.2 Riemannian Optimization

Optimizers for Learned Curvatures In their hyperbolic learning library GeoOpt, Kochurov et al. [21] attempt to make the curvature of the hyperbolic space a learnable parameter. However, we have found no further work that makes proper use of this feature, and our empirical tests show that the naive approach often leads to greater instability and degraded performance. We attribute these issues to the naive implementation of curvature updates, which does not keep the hyperbolic operations used by the learning algorithm consistent with the updated curvature. Specifically, Riemannian optimizers rely on projecting Euclidean gradients onto the tangent spaces at the parameters being updated and on parallel transporting momenta between these tangent spaces. Both operations depend on the current properties of the manifold that houses the hyperbolic parameters. From this, we can identify one main issue with the naive curvature learning approach.
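As a concrete illustration of the naive setup, here is a minimal sketch assuming a recent GeoOpt release in which manifolds accept a learnable curvature flag; the exact API and internals may differ from what is shown.

```python
import torch
import geoopt

# Curvature becomes a trainable parameter of the manifold (assumed GeoOpt feature).
ball = geoopt.PoincareBall(c=1.0, learnable=True)

# A hyperbolic weight living on that manifold.
w = geoopt.ManifoldParameter(ball.expmap0(0.01 * torch.randn(16, 8)), manifold=ball)

# One optimizer updates both the manifold parameter w and the curvature.
opt = geoopt.optim.RiemannianAdam([w] + list(ball.parameters()), lr=1e-3)

# Within a single opt.step(), if the curvature is updated before w, the Riemannian
# gradient and momentum of w were computed for a manifold that no longer exists,
# and the subsequent retraction/projection of w uses the wrong curvature.
```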

The order in which parameters are updated is crucial. Specifically, if the curvature of the space is updated before the hyperbolic parameters, the Riemannian projections and tangent projections of the gradients and momenta become invalid. This happens because the projection operations start using the new curvature value, even though the hyperbolic parameters, hyperbolic gradients, and momenta have not yet been reprojected onto the new manifold.

To resolve this issue, we propose a projection schema and an ordered parameter update process. We first update all manifold and Euclidean parameters, and only then update the curvatures. Next, we parallel transport all Riemannian gradients and project all hyperbolic parameters to the tangent space at the origin using the old curvature value. Since this tangent space is invariant to changes in the manifold curvature, the points can be treated as lying in the tangent space at the origin of the new manifold as well. We then re-project the hyperbolic tensors back onto the manifold using the new curvature value and parallel transport the Riemannian gradients back to their respective parameters. This process is illustrated in Algorithm 1.
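The following is a minimal sketch of this ordered schema, not the authors' implementation: logmap0, expmap0, transp0, and transp0back are assumed GeoOpt-style manifold helpers, and Adam-style momentum buffers are assumed to be stored under the key "exp_avg".

```python
def curvature_aware_step(opt_model, opt_curv, hyp_params, manifold):
    # 1) Update all manifold and Euclidean parameters under the old curvature.
    opt_model.step()

    # 2) Project hyperbolic parameters (and transport their momentum buffers)
    #    to the tangent space at the origin, still using the old curvature.
    cached = []
    for p in hyp_params:
        state = opt_model.state[p]
        m = state.get("exp_avg")                        # Riemannian momentum, if kept
        u = manifold.transp0back(p.data, m) if m is not None else None
        cached.append((p, manifold.logmap0(p.data), u))

    # 3) Update the curvature; the tangent space at the origin is unchanged,
    #    so the cached tangent vectors remain valid.
    opt_curv.step()

    # 4) Re-project everything onto the manifold with the new curvature and
    #    transport the momenta back to their respective parameters.
    for p, v, u in cached:
        p.data.copy_(manifold.expmap0(v))
        if u is not None:
            opt_model.state[p]["exp_avg"].copy_(manifold.transp0(p.data, u))
```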


Riemannian AdamW Optimizer Recent works, especially with transformers, rely on the AdamW optimizer proposed by Loshchilov and Hutter [26] for training. Currently, there is no established Riemannian variant of this optimizer. We attempt to derive AdamW for the Lorentz manifold and argue that a similar approach could be generalized to the Poincaré ball. The main difference between AdamW and Adam is the direct (decoupled) weight decay, which is more difficult to perform in Lorentz space given the lack of an intuitive subtraction operation on the manifold. To resolve this, we instead model the regularized parameter as a weighted centroid between the parameter and the origin, which yields the regularization schema sketched below.
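To make this concrete (this is our reading of the description above, not a formula taken verbatim from the paper): Euclidean AdamW decays a parameter as θ ← (1 − ηλ)θ, which can be read as a weighted average of θ and the origin with weights (1 − ηλ) and ηλ. The Lorentz analogue replaces this average with a weighted centroid on the manifold between the parameter x and the origin o:

$$
x \;\leftarrow\; \mu_{\mathcal{L}}\!\left(\{x,\ \mathbf{o}\};\ \{1-\eta\lambda,\ \eta\lambda\}\right)
$$

where η is the learning rate, λ the weight-decay coefficient, and μ_L denotes a weighted centroid on the Lorentz manifold; larger λ pulls the parameter more strongly toward the origin.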


Additionally, we propose a maximum distance rescaling function on the tangent space at the origin to keep points within the representational capacity of hyperbolic manifolds.
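A sketch of one plausible form of such a rescaling (an assumption on our part; the exact function used in the paper may differ): cap the norm of tangent vectors at the origin with a tanh so that, after the exponential map, points stay within a maximum distance r_max of the origin.

```python
import torch

def max_distance_rescale(v: torch.Tensor, r_max: float = 4.0, eps: float = 1e-6) -> torch.Tensor:
    """Smoothly cap the norm of tangent vectors at the origin to at most r_max.

    r_max is a hypothetical hyperparameter; the norm of v (up to curvature scaling)
    corresponds to the hyperbolic distance from the origin after expmap0.
    """
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return v * (r_max * torch.tanh(norm / r_max)) / norm
```

Since r_max · tanh(t / r_max) ≈ t for small t, the map is close to the identity near the origin and saturates smoothly at r_max for distant points.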


Specifically, we apply it when moving parameters across different manifolds. This includes moving from Euclidean space to the Lorentz space and moving between Lorentz spaces of different curvatures. We also apply the scaling after Lorentz boosts and direct Lorentz concatenations [31]. Additionally, we add this operation after the variance-based rescaling in the batch normalization layer, because adjusting for the variance can push points outside the representable radius.

3.3 Towards Efficient Architectural Components

Lorentz Convolutional Layer In their work, Bdeir et al. [1] dissect the convolution operation into a window-unfolding followed by a modified version of the Lorentz Linear layer of Chen et al. [3]. An alternative definition of the Lorentz Linear layer is offered by Dai et al. [5], based on a direct decomposition of the operation into a Lorentz boost and a Lorentz rotation. We follow the dissection scheme of Bdeir et al. [1] but rely on the alternative definition of the Lorentz linear transformation by Dai et al. [5]. The core change is moving from a matrix multiplication on the spatial dimensions followed by a reprojection onto the manifold, to learning an individual rotation operation and a Lorentz boost.


out = LorentzBoost(TanhScaling(RotationConvolution(x)))

where TanhScaling is the maximum distance rescaling described in Section 3.2, and RotationConvolution is a standard convolution whose kernel is constrained to a rotation by an Orthogonalize step, a Cayley transformation similar to [16]. We use the Cayley transformation in particular because it always yields an orthonormal matrix with a positive determinant, which prevents the rotated point from being carried to the lower sheet of the hyperboloid.
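A minimal sketch of such a rotation-constrained convolution (our own illustration, not the authors' code): the unconstrained kernel matrix is skew-symmetrized and passed through the Cayley transform, which always produces an orthogonal matrix with determinant +1.

```python
import torch
import torch.nn.functional as F

def cayley_rotation(raw: torch.Tensor) -> torch.Tensor:
    """Map an unconstrained square matrix to a rotation: skew-symmetrize, then Cayley."""
    A = raw - raw.transpose(-1, -2)                     # skew-symmetric part
    I = torch.eye(A.size(-1), device=A.device, dtype=A.dtype)
    return torch.linalg.solve(I + A, I - A)             # (I + A)^{-1}(I - A), det = +1

class RotationConvolution(torch.nn.Module):
    """Convolution whose action on each unfolded window is a rotation (sketch)."""

    def __init__(self, channels: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        n = channels * kernel_size * kernel_size        # size of one unfolded window
        self.raw = torch.nn.Parameter(0.01 * torch.randn(n, n))
        self.channels, self.k, self.padding = channels, kernel_size, padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), assumed to hold only the spatial components of the points.
        W = cayley_rotation(self.raw)                   # (C*k*k, C*k*k) rotation matrix
        weight = W.view(-1, self.channels, self.k, self.k)
        # Each output vector is the rotated unfolded window at that spatial location.
        return F.conv2d(x, weight, padding=self.padding)
```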

Lorentz-Core Bottleneck Block In an effort to expand on the idea of hybrid hyperbolic encoders [1], we design the Lorentz-Core Bottleneck block for hyperbolic ResNet-based models. It resembles a standard Euclidean bottleneck block, except that the internal 3x3 convolutional layer is replaced with our efficient Lorentz convolutional layer, as shown in Figure 1. We thus benefit from a hyperbolic structuring of the embeddings in each block while maintaining the flexibility and speed of Euclidean models. We interpret this integration as a form of hyperbolic bias that can be adopted into ResNets without strict hyperbolic modeling.
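A hypothetical sketch of the resulting block structure, where the passed-in module stands for the efficient Lorentz convolutional layer above (including its mapping onto and back off the manifold); the names and wiring are assumptions, not the authors' API, and normalization layers are omitted for brevity.

```python
import torch.nn as nn
import torch.nn.functional as F

class LorentzCoreBottleneck(nn.Module):
    """Euclidean bottleneck block whose middle 3x3 convolution is hyperbolic (sketch)."""

    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, lorentz_conv3x3: nn.Module):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)   # Euclidean 1x1
        self.core = lorentz_conv3x3                                         # hyperbolic 3x3
        self.expand = nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False)  # Euclidean 1x1
        self.skip = (nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
                     if in_ch != out_ch else nn.Identity())

    def forward(self, x):
        h = self.reduce(x)
        h = self.core(h)        # assumed: maps to the Lorentz manifold, convolves, maps back
        h = self.expand(h)
        return F.relu(h + self.skip(x))
```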


:::info Authors:

(1) Ahmad Bdeir, Data Science Department, University of Hildesheim (bdeira@uni-hildesheim.de);

(2) Niels Landwehr, Data Science Department, University of Hildesheim (landwehr@uni-hildesheim.de).

:::


:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::
