Buy Crypto Markets Spot FuturesUSOIL Earn Event Centre

WormHole is a novel algorithm that bridges the gap between traversal-based and index-based graph search methods. It enables fast, scalable shortest-path queries on massive real-world networks without requiring full graph access or large preprocessing overhead. By exploiting core-periphery structures, WormHole constructs a compact sublinear index, delivers near-exact paths, and can be combined with existing methods for even faster results.WormHole is a novel algorithm that bridges the gap between traversal-based and index-based graph search methods. It enables fast, scalable shortest-path queries on massive real-world networks without requiring full graph access or large preprocessing overhead. By exploiting core-periphery structures, WormHole constructs a compact sublinear index, delivers near-exact paths, and can be combined with existing methods for even faster results.

Bridging the Gap Between BFS and Indexing for Large Graphs

Author: Hackernoon

Source: Hackernoon

2025/10/15 19:00

7 min read

REAL$0.05702+5.69%

CORE$0.08026+0.41%

NEAR$1.2754+4.20%

For feedback or concerns regarding this content, please contact us at crypto.news@mexc.com

:::info Authors:

(1) Talya Eden, Bar-Ilan University (talyaa01@gmail.com);

(2) Omri Ben-Eliezer, MIT (omrib@mit.edu);

(3) C. Seshadhri, UC Santa Cruz (sesh@ucsc.edu).

:::

Table of Links

Abstract and 1. Introduction

1.1 Our Contribution

1.2 Setting

1.3 The algorithm

Related Work
Algorithm

3.1 The Structural Decomposition Phase

3.2 The Routing Phase

3.3 Variants of WormHole
Theoretical Analysis

4.1 Preliminaries

4.2 Sublinearity of Inner Ring

4.3 Approximation Error

4.4 Query Complexity
Experimental Results

5.1 WormHole𝐸, WormHole𝐻 and BiBFS

5.2 Comparison with index-based methods

5.3 WormHole as a primitive: WormHole𝑀

References

ABSTRACT

Computing distances and finding shortest paths in massive realworld networks is a fundamental algorithmic task in network analysis. There are two main approaches to solving this task. On one hand are traversal-based algorithms like bidirectional breadth-first search (BiBFS), which have no preprocessing step but are slow on individual distance inquiries. On the other hand are indexing-based approaches, which create and maintain a large index. This allows for answering individual inquiries very fast; however, index creation is prohibitively expensive even for moderately large networks. For a graph with 30 million edges, the index created by the state-of-the-art is about 40 gigabytes. We seek to bridge these two extremes: quickly answer distance inquiries without the need for costly preprocessing.

\ WormHole offers several additional advantages over existing methods: (i) it does not require reading the whole graph and can thus be used in settings where access to the graph is ratelimited; (ii) unlike the vast majority of index-based algorithms, it returns paths, not just distances; and (iii) for faster inquiry times, it can be combined effectively with other index-based solutions, by running them only on the sublinear core.

1 INTRODUCTION

Scalable computation of distances and shortest paths in a large network is one of the most fundamental algorithmic challenges in graph mining and graph learning tasks, with applications across science and engineering. Examples of such applications include the identification of important genes or species in biological and ecological networks [18], driving directions in road networks [1–3], redistribution of task processing from mobile devices to cloud [59], computer network design and security [24, 30, 31, 58], and identifying a set of users with the maximum influence in a social network [32, 56], among many others. Thus, a long line of work [5, 21, 25, 28, 62] has developed over the years, constructing scalable algorithms for distance computation for a variety of real-life tasks.

\ The simplest methods for answering a shortest path inquiry (𝑠,𝑡) use traversals, among which the most basic is a breadth first search (BFS) starting from 𝑠 until we reach 𝑡. However, the inquiry time for BFS is linear in the network size, which is much too slow for real-world networks.1 A popular modification, Bidirectional BFS(BiBFS), runs BFS fromboth𝑠 and𝑡, alternatingbetween the two, until both ends meet.It has well been observed in the literature that BiBFS performs surprisingly well for shortest path inquiries on a wide range of networks (see, e.g., [8, 12, 62] and the many references within). Because BiBFS does not require any prior knowledge on the network structure, it is suitable when the number of shortest path inquiries being made is relatively small. However, pure traversal-based approaches do not scale well when one is required to answer a large number of shortest pair inquiries. As we show in Figure 1, BiBFS ends up seeing the whole graph within just a few hundred inquiries.

\ Alongline ofmodern approaches tackles the distance computation problem in a fundamentally different manner, by preprocessing the network and creating an index. The index, in turn, supports extremely fast real-time computation of distances. This line of work has been been investigated extensively in recent years, with Pruned Landmark Labeling (PLL, Akiba et al. [5]) being perhaps the most influential approach.

\ In virtually allindex-based methods, pre-processinginvolves choosing a subset L of nodes, called landmarks; computing all shortest paths among them; and keeping an index of the distance of every node in the network to every landmark. Thus, the space requirement for the index is at least of order 𝑛· |L|, where 𝑛 is the total number of nodes. Naively, this memory requirement can be as bad as quadratic in 𝑛. Despite several improvements to beat the quadratic footprint, existing hub

\ and landmark-based approaches methods continue to have high cost, and can become infeasible even for moderately-sized graphs [40].

\ Notably, most index-based approaches only return distances in the graph, and not the paths themselves. The first concrete systematic investigation of solutions outputting shortest paths was made by Zhang et al. [62]. Their work points out that while existing index-based solutions can be adapted to also output shortest paths, these adaptations incur a very high additional space cost on top of that required for distance computations. The authors of [62] then proposed a new approach called monotonic landmark labelling (MLL) for saving on the index construction space cost. While their algorithm is the current state of the art for this problem, it is again index-based, meaning that the preprocessing cost is still rather expensive. Improving the computational complexity of the construction phase remains a fundamental challenge.

\ Beyond the computational constraints, it is sometimes simply unrealistic to assume access to the whole network; examples of scenarios where access is only given via limited query access include, e.g., social network analysis through external APIs [10], page downloads in web graphs [14], modern lightweight monitoring solutions in enterprise security [26, 60], and state space exploration in software testing, reinforcement learning and robotics [27], among many others. Existing indexing-based approaches are unsuitable for these scenarios since they require reading the whole graph as a prerequisite. Traversal-based methods such as BiBFS are suitable, but as mentioned they do not scale well if one requires multiple distance computations.

\ The limitations of indexing-based and traversal-based methods give rise to a natural question of whether there is a middle ground solution, with preprocessing that is more efficient than in index-based approaches and inquiry time that is faster than in traversal-based approaches. Namely, we ask:

\ Is it possible to answer shortest-path inquiries in large networks very quickly, without constructing an expensive index, or even seeing the whole graph?

\ A general solution for any arbitrary graph is perhaps impossible; however, real-world social and information networks admit order and structure that can be exploited. In this work we address this question positively for a slightly relaxed version of the shortest path problem on such networks. Inspired by the core-periphery structure of large networks [43, 50, 52, 63], we provide a solution which constructs a sublinearly-sized index and answers inquiries by querying a strictly sublinear subset of vertices. In particular, our solution does not need to access the whole network. The algorithm returns approximate shortest paths, where the approximation error is additive and very small (almost always zero or one). In practice, the setup time is negligible (a few minutes for billion-edges graphs), and inquiry times improve on those of BiBFS. Moreover, it can be easily combined with existing index-based solutions, to further improve on the inquiry times. We also include theoretical results that shed light on the empirical performance.

:::info This paper is available on arxiv under CC BY 4.0 license.

:::

[1] To avoid confusion, throughout this paper we use the term inquiry to indicate a request (arriving as an input in real-time to our data structure) to compute a short path SP(𝑠,𝑡) between 𝑠 and 𝑡. The term query refers to the act, taken by the algorithm itself, of retrieving information about a specific node. For more details on the query model we consider, see §1.2.)

Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.