Multimodal AI workloads are breaking Spark and Ray. See how Daft’s streaming model runs 7× faster and more reliably across audio, video, and image pipelines.

Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It

2025/11/03 13:19

Multimodal AI workloads break traditional data engines. They need to embed documents, classify images, and transcribe audio, not just run aggregations and joins. These multimodal workloads are tough: memory usage balloons mid-pipeline, processing requires both CPU and GPU, and a single machine can't handle the data volume.

This post provides a comprehensive comparison of Daft and Ray Data for multimodal data processing, examining their architectures and performance. Benchmarks across large-scale audio, video, document, and image workloads found Daft ran 2-7x faster than Ray Data and 4-18x faster than Spark, while finishing jobs reliably.

The Multimodal Data Challenge

Multimodal data processing presents unique challenges:

  1. Memory Explosions: A compressed image like a JPEG inflates roughly 20x in memory once decoded, and a single video file can decode into thousands of frames, each of which is megabytes in size (see the sketch after this list).
  2. Heterogeneous Compute: These workloads stress CPU, GPU, and network simultaneously. Processing steps include resampling, feature extraction, transcription, downloading, decoding, resizing, normalizing, and classification.
  3. Data Volume: The benchmarked workloads included 113,800 audio files from Common Voice 17, 10,000 PDFs from Common Crawl, 803,580 images from ImageNet, and 1,000 videos from Hollywood2.
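
To make the memory point concrete, here is a back-of-the-envelope sketch using Pillow and NumPy. The file `photo.jpg` is a placeholder path, and the exact ratio depends on the image's content and compression quality.

```python
import os

import numpy as np
from PIL import Image

path = "photo.jpg"  # hypothetical input file
compressed = os.path.getsize(path)             # bytes on disk
decoded = np.asarray(Image.open(path)).nbytes  # height * width * channels bytes

print(f"compressed: {compressed / 1e6:.2f} MB, decoded: {decoded / 1e6:.2f} MB, "
      f"ratio: {decoded / compressed:.1f}x")   # typically ~10-20x for photos
```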

Introducing the Contenders

Daft

Daft is designed to handle petabyte-scale workloads with multimodal data (audio, video, images, text, embeddings) as first-class citizens.

Key features include:

  • Native multimodal operations: Built-in image decoding/encoding/cropping/resizing, text and image embedding/classification APIs, LLM APIs, text tokenization, cosine similarity, URL downloads/uploads, reading video to image frames
  • Declarative DataFrame/SQL API: Schema validation plus a query optimizer that automatically handles projection pushdowns, filter pushdowns, and join reordering, so users get these optimizations "for free" without manual tuning (see the sketch after this list)
  • Comprehensive I/O support: Native readers and writers for Parquet, CSV, JSON, Lance, Iceberg, Delta Lake, and WARC formats, tightly integrated with the streaming execution model
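
As an illustration of that declarative API, here is a minimal sketch of a Daft pipeline that downloads, decodes, and resizes images. The bucket path is a placeholder, and the expression names follow Daft's documented `url`/`image` namespaces but may differ slightly between versions.

```python
import daft
from daft import col

# Hypothetical table of image URLs in object storage.
df = daft.read_parquet("s3://my-bucket/image_urls.parquet")

df = (
    df.with_column("image_bytes", col("url").url.download())    # fetch bytes
      .with_column("image", col("image_bytes").image.decode())  # JPEG/PNG -> image
      .with_column("thumb", col("image").image.resize(224, 224))
)
df.show()  # the lazy plan executes here; unused columns are pruned by the optimizer
```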

Ray Data

Ray Data is a data processing library built on top of Ray, a framework for building distributed Python applications.

Key features include:

  • Low-level operators: Provides operations like map_batches that work directly on PyArrow record batches or pandas DataFrames (see the sketch after this list)
  • Ray ecosystem integration: Tight integration with Ray Train for distributed training and Ray Serve for model serving
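
For comparison, here is a minimal sketch of Ray Data's `map_batches` operating on pandas batches. The Parquet path and the toy transformation are placeholders; in a real multimodal job, the function body would carry the decoding and resizing logic.

```python
import pandas as pd
import ray

ds = ray.data.read_parquet("s3://my-bucket/image_urls.parquet")  # hypothetical dataset

def add_url_length(batch: pd.DataFrame) -> pd.DataFrame:
    # Any decoding/resizing would be user code here, typically via Pillow/NumPy.
    batch["url_len"] = batch["url"].str.len()
    return batch

ds = ds.map_batches(add_url_length, batch_format="pandas")
ds.show(limit=5)
```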

Architecture Deep Dive

Daft's Streaming Execution Model

Daft's architecture revolves around its Swordfish streaming execution engine. Data is always "in flight": batches flow through the pipeline as soon as they are ready. For a partition of 100k images, the first 1000 can be fed into model inference while the next 1000 are being downloaded or decoded. The entire partition never has to be fully materialized in an intermediate buffer.

Backpressure mechanism: If GPU inference becomes the bottleneck, the upstream steps automatically slow down so memory usage remains bounded.
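
The following toy script is not Daft internals, but it illustrates the principle: with a bounded channel between a fast producer and a slow consumer, the producer blocks instead of piling up batches, so memory stays bounded by the channel's capacity.

```python
import queue
import threading
import time

channel = queue.Queue(maxsize=4)  # at most 4 batches ever held in memory

def cpu_producer():
    for i in range(20):
        batch = f"batch-{i}"   # downloading/decoding would happen here
        channel.put(batch)     # blocks when the consumer falls behind
    channel.put(None)          # sentinel: no more batches

def gpu_consumer():
    while (batch := channel.get()) is not None:
        time.sleep(0.1)        # pretend inference is the bottleneck

producer = threading.Thread(target=cpu_producer)
consumer = threading.Thread(target=gpu_consumer)
producer.start(); consumer.start()
producer.join(); consumer.join()
```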

Adaptive batch sizing: Daft shrinks batch sizes on memory-heavy operations like url_download or image_decode, keeping throughput high without ballooning memory usage.

Flotilla distributed engine: Daft's distributed runner deploys one Swordfish worker per node, enabling the same streaming execution model to scale across clusters.
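
A minimal sketch of moving the same pipeline onto a cluster via Daft's Ray runner follows. The cluster address and dataset path are placeholders, and the exact context API may differ between Daft versions.

```python
import daft

# Point Daft at an existing Ray cluster; Flotilla then runs one Swordfish worker per node.
daft.context.set_runner_ray(address="ray://head-node:10001")  # placeholder address

df = daft.read_parquet("s3://my-bucket/image_urls.parquet")   # hypothetical dataset
print(df.count_rows())                                        # executes across the cluster
```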

Ray Data's Execution Model

Ray Data streams data between heterogeneous operations (e.g., CPU → GPU) that users define via classes or resource requests. Within homogeneous stages, however, it fuses sequential operations into a single task and runs them back to back, which can cause memory issues unless block sizes are tuned carefully. One workaround is to express stages as callable classes rather than functions in map/map_batches, but this materializes intermediates in Ray's object store, adding serialization and memory-copy overhead. The object store is capped at 30% of machine memory by default, a limit that can lead to excessive disk spilling.
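
Here is a sketch of that class-based workaround, assuming a hypothetical `Classifier` model. Note that the actor-pool parameters (`concurrency`, `num_gpus`) have shifted names across Ray releases.

```python
import pandas as pd
import ray

class Classifier:
    def __init__(self):
        # Hypothetical model load; runs once per actor rather than once per batch.
        self.model = lambda urls: ["cat"] * len(urls)

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        batch["label"] = self.model(batch["url"].tolist())
        return batch

ds = ray.data.read_parquet("s3://my-bucket/image_urls.parquet")  # hypothetical dataset
ds = ds.map_batches(Classifier, batch_format="pandas", concurrency=2, num_gpus=1)
ds.show(limit=5)
```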

Performance Benchmarks

Based on recent benchmarks conducted on identical AWS clusters (8 x g6.xlarge instances with NVIDIA L4 GPUs, each with 4 vCPUs, 16 GB memory, and a 100 GB EBS volume), here's how Daft, Ray Data, and Spark compare:

| Workload | Daft | Ray Data | Spark |
|----|----|----|----|
| Audio Transcription (113,800 files) | 6m 22s | 29m 20s (4.6x slower) | 25m 46s (4.0x slower) |
| Document Embedding (10,000 PDFs) | 1m 54s | 14m 32s (7.6x slower) | 8m 4s (4.2x slower) |
| Image Classification (803,580 images) | 4m 23s | 23m 30s (5.4x slower) | 45m 7s (10.3x slower) |
| Video Object Detection (1,000 videos) | 11m 46s | 25m 54s (2.2x slower) | 3h 36m (18.4x slower) |

Why Such Large Performance Differences?

Several architectural decisions contribute to Daft's performance advantages:

  1. Native Operations vs Python UDFs: Daft ships highly optimized native multimodal expressions: image decoding/encoding/cropping/resizing, text and image embedding/classification APIs, LLM APIs, text tokenization, cosine similarity, URL downloads/uploads, and reading video into image frames. In Ray Data you write your own Python UDFs on top of external dependencies like Pillow, NumPy, spaCy, and Hugging Face, which adds data movement because each library has its own in-memory format.
  2. Memory Management (Streaming vs Materialization): Daft streams data through network, CPU, and GPU continuously without materializing entire partitions. Ray Data fuses sequential operations, which can cause memory issues; the workaround of using classes materializes intermediates in the object store, adding serialization and memory-copy overhead.
  3. Resource Utilization: Daft pipelines everything inside a single Swordfish worker, which controls all of a machine's resources. Data streams asynchronously from cloud storage, into the CPUs for pre-processing, into GPU memory for inference, and back out for uploading results, so CPUs, GPUs, and the network stay saturated together. In contrast, Ray Data by default reserves a full CPU core for I/O-heavy operations like downloading large videos, leaving that core unavailable for CPU-bound work unless fractional CPU requests are tuned manually (see the sketch after this list).
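
As an example of the manual tuning the last point refers to, a fractional CPU request on an I/O-bound stage might look like the sketch below. The dataset path, the download helper, and the 0.25 fraction are all illustrative.

```python
import pandas as pd
import ray

def download(batch: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical fetch of each URL's bytes (e.g., via requests or boto3).
    batch["num_bytes"] = [0 for _ in batch["url"]]
    return batch

ds = ray.data.read_parquet("s3://my-bucket/video_urls.parquet")      # hypothetical dataset
ds = ds.map_batches(download, batch_format="pandas", num_cpus=0.25)  # fractional CPU request
```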

When to Choose Which?

Based on the benchmark results and architectural differences:

Daft shows significant advantages for:

  • Multimodal data processing (images, documents, video, audio)
  • Workloads requiring reliable execution without extensive tuning
  • Complex queries with joins, aggregations, and multiple transformations
  • Teams preferring DataFrame/SQL semantics

Ray Data may be preferred when:

  • You have tight integration needs with the Ray ecosystem (Ray Train, Ray Serve)
  • You need fine-grained control over CPU/GPU allocation per operation

What Practitioners Are Saying

Is Daft battle-tested enough for production?

When Tim Romanski of Essential AI set out to taxonomize 23.6 billion web documents from Common Crawl (24 trillion tokens), his team pushed Daft to its limits - scaling from local development to 32,000 requests per second per VM. As he shared in a panel discussion: "We pushed Daft to the limit and it's battle tested… If we had to do the same thing in Spark, we would have to have the JVM installed, go through all of its nuts and bolts just to get something running. So the time to get something running in the first place was a lot shorter. And then once we got it running locally, we just scaled up to multiple machines."

What gap does Daft fill in the Ray ecosystem?

CloudKitchens rebuilt their entire ML infrastructure around what they call the "DREAM stack" (Daft, Ray, poEtry, Argo, Metaflow). When selecting their data processing layer, they identified specific limitations with Ray Data and chose Daft to complement Ray's compute capabilities. As their infrastructure team explained, "one issue with the Ray library for data processing, Ray Data, is that it doesn't cover the full range of DataFrame/ETL functions and its performance could be improved." They chose Daft because "it fills the gap of Ray Data by providing amazing DataFrame APIs" and noted that "in our tests, it's faster than Spark and uses fewer resources."

How does Daft perform on even larger datasets?

A data engineer from ByteDance commented on Daft's 300K image processing demonstration, sharing his own experience with an even larger image classification workload: "Not just 300,000 images - we ran image classification evaluations on the ImageNet dataset with approximately 1.28 million images, and Daft was about 20% faster than Ray Data." Additionally, in a separate technical analysis of Daft's architecture, he praised its "excellent execution performance and resource efficiency" and highlighted how it "effortlessly enables streaming processing of large-scale image datasets."

Resources

  • Benchmarks for Multimodal AI Workloads - Primary source for performance data and architectural comparisons
  • Benchmark Code Repository - Open-source code to reproduce all benchmarks
  • Distributed Data Community Slack - Join the community to discuss with Daft developers and users
