Building Spotify for Sermons.

2025/12/11 21:15

Building for large systems and long-running background jobs.

Credit: Ilias Chebbi on Unsplash

Months ago, I took on a role that required building infrastructure for media (audio) streaming. But beyond serving audio as streamable chunks, there were long-running media processing jobs and an extensive RAG pipeline that handled transcription, transcoding, embedding, and sequential media updates. Building an MVP with a production mindset had us iterate until we achieved a seamless system. Our approach was to integrate features in step with the underlying stack of priorities.

Of primary concern:

Over the course of building, each iteration came as a response to an immediate and often “encompassing” need. The initial concern was queuing jobs, for which Redis readily sufficed; we simply fired and forgot. BullMQ in the NestJS framework gave us even better control over retries, backlogs, and the dead-letter queue. Locally, and with a few payloads in production, we got the media flow right. We were soon burdened by the weight of observability:
Logs → Record of jobs (requests, responses, errors).
Metrics → How much / how often these jobs run, fail, complete, etc.
Traces → The path a job took across services (functions/methods called within the flow path).
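To make the three signals concrete, here is a minimal sketch of a wrapper that records a log entry, a counter, and a timing span for each stage of a job. The names JobTelemetry, track, LogEntry, and Span are hypothetical and only for illustration; they are not taken from our codebase or from any library.

```typescript
// Hypothetical shapes for the signals; none of these names come from a real library.
type LogEntry = { jobId: string; level: "info" | "error"; message: string };
type Span = { jobId: string; name: string; durationMs: number };

class JobTelemetry {
  logs: LogEntry[] = [];                  // logs: what happened
  spans: Span[] = [];                     // traces: which stages ran, and for how long
  counters: Record<string, number> = {};  // metrics: how often stages complete or fail

  private count(name: string): void {
    this.counters[name] = (this.counters[name] ?? 0) + 1;
  }

  // Runs one named stage of a job and records all three signals for it.
  async track<T>(jobId: string, name: string, work: () => Promise<T>): Promise<T> {
    const startedAt = Date.now();
    try {
      const result = await work();
      this.logs.push({ jobId, level: "info", message: `${name} completed` });
      this.count("completed");
      return result;
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      this.logs.push({ jobId, level: "error", message: `${name} failed: ${reason}` });
      this.count("failed");
      throw err; // the caller still decides whether to retry
    } finally {
      // The span is recorded for successes and failures alike.
      this.spans.push({ jobId, name, durationMs: Date.now() - startedAt });
    }
  }
}
```

Even a wrapper this small has to be stored, queried, and displayed somewhere, which is where a home-grown approach starts to cost real effort.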

You can solve some of these by designing APIs and building a custom dashboard to plug them into, but the problem of scalability will remain. And in fact, we did design the APIs.

Building for Observability

Faced with the challenge of managing complex, long-running backend workflows, where failures must be recoverable and state must be durable, we found our architectural salvation in Inngest. It fundamentally reframed our approach: each long-running background job becomes a background function, triggered by a specific event.

For instance, a Transcription.request event will trigger a TranscribeAudio function. This function might contain step-runs for: fetch_audio_metadata, deepgram_transcribe, parse_save_transcription, and notify_user.

Deconstructing the Workflow: The Inngest Function and Step-runs

The core durability primitive is the step-run. A background function is internally broken down into step-runs, each containing a minimal, atomic block of logic.

  • Atomic Logic: A function executes your business logic step by step. If a step fails, the results of the steps that already completed are preserved and the failed step is retried. On each retry the function handler is re-entered from the beginning, but completed step-runs return their saved results instead of executing again, so only the failed step-run repeats its work.
  • Response Serialization: A step-run is defined by its response. This response is automatically serialized, which is essential for preserving complex or strongly-typed data structures across execution boundaries. Subsequent step-runs can reliably parse this serialized response, or logic can be merged into a single step for efficiency.
  • Decoupling and Scheduling: Within a function, we can conditionally queue or schedule new, dependent events, enabling complex fan-out/fan-in patterns and long-term scheduling up to a year. Errors and successes at any point can be caught, branched, and handled further down the workflow.
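One way to picture how state survives a retry is a runner that saves each step's serialized result and reuses it when the handler runs again. The sketch below is a simplified, in-memory stand-in written for illustration. It is not Inngest's implementation, and StepMemo, runWithRetries, and demo are hypothetical names; only the step ids are borrowed from the transcription example above.

```typescript
// Simplified stand-in for a durable step runner; NOT Inngest's implementation.
class StepMemo {
  // Step id -> JSON-serialized result of a step that already completed.
  private saved = new Map<string, string>();

  async run<T>(id: string, work: () => Promise<T>): Promise<T> {
    const cached = this.saved.get(id);
    if (cached !== undefined) {
      return JSON.parse(cached) as T; // completed earlier: reuse the stored result
    }
    const result = await work();
    // Persist only after success. A step that returns undefined is stored as null.
    this.saved.set(id, JSON.stringify(result ?? null));
    return result;
  }
}

// Re-enters the handler from the top until it succeeds or attempts run out.
async function runWithRetries<T>(
  handler: (step: StepMemo) => Promise<T>,
  maxAttempts: number,
): Promise<T> {
  const step = new StepMemo();
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await handler(step);
    } catch (err) {
      lastError = err; // saved step results stay in place for the next attempt
    }
  }
  throw lastError;
}

// Demo: the second step fails once, yet the first step's work is never repeated.
async function demo(): Promise<{ fetchCalls: number; transcribeCalls: number; text: string }> {
  let fetchCalls = 0;
  let transcribeCalls = 0;
  const text = await runWithRetries(async (step) => {
    const meta = await step.run("fetch_audio_metadata", async () => {
      fetchCalls++;
      return { url: "audio.mp3", seconds: 120 }; // placeholder metadata
    });
    return step.run("deepgram_transcribe", async () => {
      transcribeCalls++;
      if (transcribeCalls === 1) throw new Error("transient failure");
      return `transcript of ${meta.url}`;
    });
  }, 3);
  return { fetchCalls, transcribeCalls, text };
}
```

Because results cross the retry boundary as JSON, a step should return plain data only. Values such as class instances or Date objects would come back as plain objects or strings.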

Inngest function abstract:

import { inngest } from 'inngest-client';

// Factory so that dependencies can be injected when the function is created.
export const createMyFunction = (dependencies) => {
  return inngest.createFunction(
    {
      id: 'my-function',
      name: 'My Example Function',
      retries: 3, // number of retry attempts on failure
      concurrency: { limit: 5 }, // cap on concurrent executions
      // Called once the retries above are exhausted.
      onFailure: async ({ event, error, step }) => {
        // handle errors here
        await step.run('handle-error', async () => {
          console.error('Error processing event:', error);
        });
      },
    },
    { event: 'my/event.triggered' }, // the event that triggers this function
    async ({ event, step }) => {
      const { payload } = event.data;

      // Step 1: Define first step
      const step1Result = await step.run('step-1', async () => {
        // logic for step 1
        return `Processed ${payload}`;
      });

      // Step 2: Define second step (consumes the result of step 1)
      const step2Result = await step.run('step-2', async () => {
        // logic for step 2
        return step1Result + ' -> step 2';
      });

      // Step N: Continue as needed
      await step.run('final-step', async () => {
        // finalization logic
        console.log('Finished processing:', step2Result);
      });

      return { success: true };
    },
  );
};

The event-driven model of Inngest provides granular insight into every workflow execution:

  • Comprehensive Event Tracing: Every queued function execution is logged against its originating event. This provides a clear, high-level trail of all activities related to a single user action.
  • Detailed Run Insights: For each function execution (both successes and failures), Inngest provides detailed logs via its ack (acknowledge) and nack (negative acknowledgment) reporting. These logs include error stack traces, full request payloads, and the serialized response payloads for every individual step-run.
  • Operational Metrics: Beyond logs, we gained critical metrics on function health, including success rates, failure rates, and retry count, allowing us to continuously monitor the reliability and latency of our distributed workflows.
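For a sense of what those health metrics amount to, here is a minimal sketch that derives them from a list of run records. The RunRecord shape and the summarize function are hypothetical and do not reflect how Inngest stores or exposes its data.

```typescript
// Hypothetical record of one function execution; not an Inngest data structure.
type RunRecord = { functionId: string; status: "completed" | "failed"; attempts: number };

type HealthSummary = { total: number; successRate: number; failureRate: number; retries: number };

// Aggregates success rate, failure rate, and total retry count over a set of runs.
function summarize(runs: RunRecord[]): HealthSummary {
  const total = runs.length;
  const failed = runs.filter((r) => r.status === "failed").length;
  // Every attempt after the first one counts as a retry.
  const retries = runs.reduce((sum, r) => sum + Math.max(0, r.attempts - 1), 0);
  return {
    total,
    successRate: total === 0 ? 0 : (total - failed) / total,
    failureRate: total === 0 ? 0 : failed / total,
    retries,
  };
}

const summary = summarize([
  { functionId: "transcribe-audio", status: "completed", attempts: 1 },
  { functionId: "transcribe-audio", status: "completed", attempts: 3 },
  { functionId: "transcribe-audio", status: "failed", attempts: 4 },
  { functionId: "transcribe-audio", status: "completed", attempts: 1 },
]);
// summary → { total: 4, successRate: 0.75, failureRate: 0.25, retries: 5 }
```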

Building for Resilience

The caveat to relying on pure event processing is that while Inngest efficiently queues function executions, the events themselves are not internally queued in a traditional messaging broker sense. This absence of an explicit event queue can be problematic in high-traffic scenarios due to potential race conditions or dropped events if the ingestion endpoint is overwhelmed.

To address this and enforce strict event durability, we implemented a dedicated queuing system as a buffer.

Amazon Simple Queue Service (SQS) was the system of choice (though any robust queuing system would do), given our existing infrastructure on AWS. We architected a two-queue system: a Main Queue and a Dead Letter Queue (DLQ).

We established an Elastic Beanstalk (EB) Worker Environment specifically configured to consume messages directly from the Main Queue. If a message in the Main Queue fails to be processed by the EB Worker a set number of times, the Main Queue automatically moves the failed message to the dedicated DLQ. This ensures no event is lost permanently if it fails to trigger or be picked up by Inngest. This worker environment differs from a standard EB web server environment, as its sole responsibility is message consumption and processing (in this case, forwarding the consumed message to the Inngest API endpoint).
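The flow described above can be sketched as an in-memory simulation: a worker pulls from the main queue, forwards each message, and a message that keeps failing ends up in the DLQ. BufferedQueue, poll, and the forward callback are hypothetical names, and nothing here calls AWS. In SQS itself the threshold is set through the main queue's redrive policy (maxReceiveCount plus the DLQ's ARN), and the exact moment a message is moved differs slightly from this simplification.

```typescript
type Message = { id: string; body: string; receiveCount: number };

// In-memory stand-in for a main queue with a dead-letter queue; no AWS calls are made.
class BufferedQueue {
  main: Message[] = [];
  deadLetter: Message[] = [];
  private maxReceiveCount: number;

  constructor(maxReceiveCount: number) {
    this.maxReceiveCount = maxReceiveCount;
  }

  enqueue(id: string, body: string): void {
    this.main.push({ id, body, receiveCount: 0 });
  }

  // Delivers every waiting message once. `forward` stands in for the worker that
  // relays the event downstream and resolves to true when the relay succeeded.
  async poll(forward: (body: string) => Promise<boolean>): Promise<void> {
    const batch = this.main;
    this.main = [];
    for (const msg of batch) {
      msg.receiveCount++;
      let ok = false;
      try {
        ok = await forward(msg.body);
      } catch {
        ok = false; // a thrown error counts as a failed delivery
      }
      if (ok) continue; // acknowledged: the message is deleted
      if (msg.receiveCount >= this.maxReceiveCount) {
        this.deadLetter.push(msg); // retries exhausted: park it for inspection
      } else {
        this.main.push(msg); // becomes visible again for another attempt
      }
    }
  }
}
```

With a limit of 3, a message whose forwarding always fails is attempted three times and then sits in deadLetter, where it can be inspected and replayed instead of being lost.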

Understanding Limits and Specifications

An understated yet pertinent part of building enterprise-scale infrastructure is that these jobs consume resources and run for a long time. A microservices architecture provides scalability per service, but storage, RAM, and resource timeouts still come into play. Our AWS instance type, for example, moved quickly from t3.micro to t3.small and is now pegged at t3.medium. For long-running, CPU-intensive background jobs, horizontal scaling with tiny instances fails because the bottleneck is the time it takes to process a single job, not the volume of new jobs entering the queue.
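A rough back-of-the-envelope model makes the scaling point visible. It assumes every job takes the same time and occupies one worker slot for its full duration; drainMinutes and all the figures are hypothetical, not measurements from our system.

```typescript
// Minutes needed to clear a backlog when each job holds one worker slot until it finishes.
function drainMinutes(jobs: number, minutesPerJob: number, workerSlots: number): number {
  const waves = Math.ceil(jobs / workerSlots); // rounds of jobs that run back to back
  return waves * minutesPerJob;
}

// Illustrative numbers only: 10 queued jobs at 12 minutes each.
const twoSlots = drainMinutes(10, 12, 2);       // 60
const tenSlots = drainMinutes(10, 12, 10);      // 12
const twentySlots = drainMinutes(10, 12, 20);   // 12, extra instances change nothing
const fasterInstance = drainMinutes(10, 6, 10); // 6, only a faster job lowers the floor
```

Adding instances shortens the backlog only until every job has its own slot. Past that point the floor is the duration of a single job, which is lowered by a larger instance rather than by more small ones.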

Jobs such as transcoding and embedding are typically both CPU-bound and memory-bound: CPU-bound because they require sustained, intense CPU usage, and memory-bound because they often need substantial RAM to load large models or to handle large files and payloads efficiently.

Ultimately, this augmented architecture, placing the durability of SQS and the controlled execution of an EB Worker environment directly upstream of the Inngest API, provided essential resiliency. We achieved strict event ownership, eliminated race conditions during traffic spikes, and gained a non-volatile dead letter mechanism. We leveraged Inngest for its workflow orchestration and debugging capabilities, while relying on AWS primitives for maximum message throughput and durability. The resulting system is not only scalable but highly auditable, successfully translating complex, long-running backend jobs into secure, observable, and failure-tolerant micro-steps.


Building Spotify for Sermons was originally published in Coinmonks on Medium.
