No SAM, No CLIP, No Problem: How Open-YOLO 3D Segments Faster

By: Hackernoon
2025/08/26 16:10

:::info Authors:

(1) Mohamed El Amine Boudjoghra, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) ([email protected]);

(2) Angela Dai, Technical University of Munich (TUM) ([email protected]);

(3) Jean Lahoud, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) ([email protected]);

(4) Hisham Cholakkal, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) ([email protected]);

(5) Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Aalto University ([email protected]);

(6) Salman Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University ([email protected]);

(7) Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University ([email protected]).

:::

Abstract and 1 Introduction

  2. Related works
  3. Preliminaries
  4. Method: Open-YOLO 3D
  5. Experiments
  6. Conclusion and References

A. Appendix

Abstract

Recent works on open-vocabulary 3D instance segmentation show strong promise, but at the cost of slow inference speed and high computation requirements. This high computation cost is typically due to their heavy reliance on 3D CLIP features, which require computationally expensive 2D foundation models like Segment Anything (SAM) and CLIP for multi-view aggregation into 3D. As a consequence, this hampers their applicability in many real-world applications that require both fast and accurate predictions. To this end, we propose a fast yet accurate open-vocabulary 3D instance segmentation approach, named Open-YOLO 3D, that effectively leverages only 2D object detection from multi-view RGB images for open-vocabulary 3D instance segmentation. We address this task by generating class-agnostic 3D masks for objects in the scene and associating them with text prompts. We observe that the projection of class-agnostic 3D point cloud instances already holds instance information; thus, using SAM might only result in redundancy that unnecessarily increases the inference time. We empirically find that text prompts can be matched to 3D masks both more accurately and faster with a 2D object detector. We validate our Open-YOLO 3D on two benchmarks, ScanNet200 and Replica, under two scenarios: (i) with ground truth masks, where labels are required for given object proposals, and (ii) with class-agnostic 3D proposals generated from a 3D proposal network. Our Open-YOLO 3D achieves state-of-the-art performance on both datasets while obtaining up to ∼16× speedup compared to the best existing method in the literature. On the ScanNet200 validation set, our Open-YOLO 3D achieves a mean average precision (mAP) of 24.7% while operating at 22 seconds per scene. Code and model are available at github.com/aminebdj/OpenYOLO3D.


1 Introduction

3D instance segmentation is a computer vision task that involves the prediction of masks for individual objects in a 3D point cloud scene. It holds significant importance in fields like robotics and augmented reality. Due to its diverse applications, this task has garnered increasing attention in recent years. Researchers have long focused on methods that typically operate within a closed-set framework, limiting their ability to recognize objects not present in the training data. This constraint poses challenges, particularly when novel objects must be identified or categorized in unfamiliar environments. Recent methods [34, 42] address the problem of novel class segmentation, but they suffer from slow inference that ranges from 5 minutes for small scenes to 10 minutes for large scenes, due to their reliance on computationally heavy foundation models like SAM [23] and CLIP [55], along with heavy computation for lifting 2D CLIP features to 3D.

Figure 1: Open-vocabulary 3D instance segmentation with our Open-YOLO 3D. The proposed Open-YOLO 3D is capable of segmenting objects in a zero-shot manner. Here, we show the output for a ScanNet200 [38] scene with various prompts, where our model yields improved performance compared to the recent Open3DIS [34]. We show zoomed-in images of hidden predicted instances in the colored boxes. Additional results are in Figure 4 and the supplementary material.

Open-vocabulary 3D instance segmentation is important for robotics tasks such as material handling, where the robot is expected to perform operations from text-based instructions, like moving specific products, loading and unloading goods, and inventory management, while making decisions quickly. Although state-of-the-art open-vocabulary 3D instance segmentation methods show high promise in terms of generalizability to novel objects, their inference still takes minutes because of their reliance on heavy foundation models such as SAM. Motivated by recent advances in 2D object detection [7], we look into an alternative approach that leverages fast object detectors instead of utilizing computationally expensive foundation models.

This paper proposes a novel open-vocabulary 3D instance segmentation method, named Open-YOLO 3D, that utilizes efficient, joint 2D-3D reasoning, using 2D bounding box predictions to replace computationally heavy segmentation models. We employ an open-vocabulary 2D object detector to generate bounding boxes with their class labels for all frames corresponding to the 3D scene; in parallel, we utilize a 3D instance segmentation network to generate class-agnostic 3D instance masks for the point clouds, which proves to be much faster than 3D proposal generation methods from 2D instances [34, 32]. Unlike recent methods [42, 34], which use SAM and CLIP to lift 2D CLIP features to 3D for prompting the 3D mask proposals, we propose an alternative approach that relies on the bounding box predictions from 2D object detectors, which prove to be significantly faster than CLIP-based methods. We utilize the predicted bounding boxes in all RGB frames corresponding to the point cloud scene to construct a Low Granularity (LG) label map for every frame. An LG label map is a two-dimensional array with the same height and width as the RGB frame, with the bounding box areas replaced by their predicted class labels. Next, we use the camera intrinsic and extrinsic parameters to project the point cloud scene onto the respective LG label maps, with top-k visibility, for final class prediction. We present an example output of our method in Figure 1. Our contributions are as follows:

• We introduce a 2D object detection-based approach for open-vocabulary labeling of 3D instances, which greatly improves the efficiency compared to 2D segmentation approaches.

• We propose a novel approach to scoring 3D mask proposals using only bounding boxes from 2D object detectors.

• Our Open-YOLO 3D achieves superior performance on two benchmarks, while being considerably faster than existing methods in the literature. On the ScanNet200 validation set, our Open-YOLO 3D achieves an absolute gain of 2.3% at mAP50 while being ∼16× faster compared to the recent Open3DIS [34].
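The LG label map and projection steps described in the introduction can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The box layout `(x0, y0, x1, y1, class_id)`, the `-1` marker for unlabeled pixels, the largest-box-first painting order, the world-to-camera convention for the extrinsics, the use of the in-image point fraction as "visibility" (occlusion is ignored), and the names `build_lg_label_map`, `Frame`, `project`, and `label_instance` are all hypothetical choices made for this example.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

NO_LABEL = -1  # assumed marker for pixels outside every bounding box
Box = Tuple[int, int, int, int, int]  # assumed layout: x0, y0, x1, y1 (exclusive), class_id
Point = Tuple[float, float, float]


def build_lg_label_map(height: int, width: int, boxes: Sequence[Box]) -> List[List[int]]:
    """Fill each bounding-box area of a height x width grid with its class id."""
    label_map = [[NO_LABEL] * width for _ in range(height)]
    # Assumption: larger boxes are painted first, so smaller boxes win on overlap.
    by_area = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    for x0, y0, x1, y1, class_id in by_area:
        for row in range(max(0, y0), min(height, y1)):
            for col in range(max(0, x0), min(width, x1)):
                label_map[row][col] = class_id
    return label_map


@dataclass
class Frame:
    label_map: List[List[int]]
    rotation: List[List[float]]  # 3x3 extrinsic rotation, assumed world-to-camera
    translation: List[float]     # extrinsic translation, assumed world-to-camera
    fx: float                    # intrinsics: focal lengths and principal point
    fy: float
    cx: float
    cy: float


def project(point: Point, frame: Frame) -> Optional[Tuple[int, int]]:
    """Pinhole projection to (row, col); None if behind the camera or off-image."""
    xc, yc, zc = (
        sum(frame.rotation[i][j] * point[j] for j in range(3)) + frame.translation[i]
        for i in range(3)
    )
    if zc <= 0:
        return None
    col = math.floor(frame.fx * xc / zc + frame.cx)
    row = math.floor(frame.fy * yc / zc + frame.cy)
    if 0 <= row < len(frame.label_map) and 0 <= col < len(frame.label_map[0]):
        return row, col
    return None


def label_instance(points: Sequence[Point], frames: Sequence[Frame], top_k: int) -> Optional[int]:
    """Vote a class for one 3D instance using its top_k most visible frames."""
    if not points:
        return None
    scored = []
    for frame in frames:
        pixels = [p for p in (project(pt, frame) for pt in points) if p is not None]
        visibility = len(pixels) / len(points)  # occlusion is ignored in this sketch
        labels = [frame.label_map[r][c] for r, c in pixels]
        scored.append((visibility, [lab for lab in labels if lab != NO_LABEL]))
    scored.sort(key=lambda item: item[0], reverse=True)
    votes = Counter(lab for _, labels in scored[:top_k] for lab in labels)
    return votes.most_common(1)[0][0] if votes else None


# Toy scene: a large box of class 0 with a smaller box of class 1 inside it.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lg_map = build_lg_label_map(4, 6, [(0, 0, 5, 4, 0), (1, 1, 3, 3, 1)])
front = Frame(lg_map, identity, [0.0, 0.0, 0.0], 1.0, 1.0, 0.0, 0.0)
shifted = Frame(lg_map, identity, [-2.0, 0.0, 0.0], 1.0, 1.0, 0.0, 0.0)
cup_points = [(1.0, 1.0, 1.0), (2.0, 1.0, 1.0), (1.0, 2.0, 1.0), (2.0, 2.0, 1.0)]
print(label_instance(cup_points, [shifted, front], top_k=1))
```

In the toy scene, the `front` frame sees all four points inside the small box, while the `shifted` frame sees only half of them, so with `top_k=1` the vote comes from `front` alone. A practical version would use array operations and a depth-based occlusion test. Nested lists are used here only to keep the sketch dependency-free.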


:::info This paper is available on arXiv under the CC BY-NC-SA 4.0 Deed (Attribution-NonCommercial-ShareAlike 4.0 International) license.

:::
