Kubernetes v1.34 introduces a new Pod Replacement Policy for Jobs, giving engineers deterministic control over how failed Pods are rescheduled. This improves reliability, performance, and data-locality for batch workloads at scale.

Kubernetes Adds Predictable Pod Replacement for Jobs in v1.34 Release

2025/12/08 04:10
5 min read

Kubernetes has become the go-to platform for running not just long-lived services, but also batch workloads like data processing, ETL pipelines, machine learning training, CI/CD pipelines, and scientific simulations. These workloads typically rely on the Job API, which ensures that a specified number of Pods run to completion.

Until now, Kubernetes has offered limited flexibility when a Job’s Pod failed or was evicted. Pod replacement behavior was often unpredictable: would the replacement Pod be scheduled on the same node, a nearby node, or anywhere in the cluster?

With Kubernetes v1.34, a new feature lands: Pod Replacement Policy for Jobs, driven by KEP-3015. This allows users to explicitly control how replacement Pods are scheduled, improving reliability, performance, and efficiency of batch workloads.

Why Pod Replacement Matters

When a Pod belonging to a Job fails (e.g., due to node drain, eviction, OOM, or hardware issue), Kubernetes creates a replacement Pod. However:

  • The replacement may land anywhere in the cluster.
  • If the Pod had local data (e.g., cached dataset, scratch disk, node-local SSD), the replacement Pod may not find it.
  • If the Pod had NUMA or GPU locality, the replacement might end up with suboptimal hardware.
  • In multi-zone clusters, scheduling a replacement Pod across zones could increase latency and cross-zone costs.

For workloads that depend on node affinity or cached state, this can be a real problem.

Current behavior:

By default, the Kubernetes Job controller creates a replacement as soon as the old Pod starts terminating, which can lead to multiple Pods running for the same task at the same time, especially in Indexed Jobs. This causes problems for workloads that require exactly one Pod per task, such as certain machine learning frameworks.

Starting replacement Pods before the old Pods have fully terminated also wastes cluster resources, since the old and new Pods briefly run side by side.
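As an illustrative sketch (the Job name, image, and sizes here are hypothetical), an Indexed Job that must never run two Pods for the same index can opt into the stricter policy described below:

```yaml
# indexed-job.yaml (illustrative example, not from the demo below)
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-training-job
spec:
  completions: 4
  parallelism: 4
  completionMode: Indexed          # one task per completion index
  podReplacementPolicy: Failed     # never run two Pods for the same index
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: busybox
        command: ["sh", "-c", "echo training index $JOB_COMPLETION_INDEX; sleep 60"]
```

With `completionMode: Indexed`, each Pod receives its index via the `JOB_COMPLETION_INDEX` environment variable, which is why overlapping old and new Pods for the same index can corrupt per-index state.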

Feature: Pod Replacement Policy

With this feature, Kubernetes Jobs have two Pod replacement policies to choose from:

  • TerminatingOrFailed (default): creates a replacement Pod as soon as the old one starts terminating.

  • Failed: waits until the old Pod has fully terminated and reached the Failed phase before creating a new Pod.

    Using podReplacementPolicy: Failed ensures that only one Pod runs for a task at a time.
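The difference between the two policies can be sketched as a toy decision model. This is an illustrative simplification, not the actual kube-controller-manager code:

```python
def should_create_replacement(pod_phase: str, terminating: bool, policy: str) -> bool:
    """Toy model of when the Job controller creates a replacement Pod.

    pod_phase:   the old Pod's phase ("Running", "Failed", ...)
    terminating: whether the old Pod has a deletionTimestamp set
    policy:      the Job's podReplacementPolicy
    """
    if policy == "TerminatingOrFailed":
        # default: replace as soon as the old Pod starts terminating or fails
        return terminating or pod_phase == "Failed"
    if policy == "Failed":
        # stricter: replace only once the old Pod has fully reached Failed
        return pod_phase == "Failed"
    raise ValueError(f"unknown policy: {policy}")

# A Pod that has just been deleted: terminating, but still in the Running phase.
print(should_create_replacement("Running", True, "TerminatingOrFailed"))  # True
print(should_create_replacement("Running", True, "Failed"))               # False
```

This captures the key behavioral difference demonstrated in the scenarios below: under the default policy a terminating-but-still-running Pod already triggers a replacement, while under Failed it does not.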

:::info
Quick Demo: We will demo the Pod Replacement Policy for Jobs feature in both scenarios.
:::

Scenario 1: default behavior (TerminatingOrFailed)

  1. Set up a local Kubernetes cluster (with minikube):

```shell
brew install minikube

# start a local cluster on Kubernetes v1.34.0
minikube start --kubernetes-version=v1.34.0
```

![fig: start kubernetes minikube server](https://miro.medium.com/v2/resize:fit:1400/1*NnzqlUegwVbO8gWy8eZBNA.png)

```shell
# verify the cluster is running and check the Kubernetes version (v1.34.0)
kubectl get nodes
```

![fig: check kubernetes nodes & version](https://miro.medium.com/v2/resize:fit:770/1*HaNAGHJ1gYfez8SO8mJk3w.png)

  2. Define a Job with podReplacementPolicy: TerminatingOrFailed, apply it, and monitor the Pods:

```yaml
# worker-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: worker-job
spec:
  completions: 2
  parallelism: 1
  podReplacementPolicy: TerminatingOrFailed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Running; sleep 30"]
```

```shell
kubectl apply -f worker-job.yaml

# monitor the Job's Pods
kubectl get pods -l job-name=worker-job
```

  3. Delete the Job's Pod manually and observe the behavior:

```shell
# delete the pods associated with job worker-job
kubectl delete pod -l job-name=worker-job
```

Behavior: with TerminatingOrFailed, the replacement Pod is created as soon as the old Pod starts terminating, so for a short time both the terminating Pod and its replacement are visible.

Scenario 2: delayed replacement with the Failed policy

  1. Define a Job with podReplacementPolicy: Failed, apply it, and monitor the Pods:

```yaml
# worker-job-failed.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: worker-job-failed
spec:
  completions: 2
  parallelism: 1
  podReplacementPolicy: Failed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo Running; sleep 1000"]
```

```shell
kubectl apply -f worker-job-failed.yaml

# monitor the Job's Pods
kubectl get pods -l job-name=worker-job-failed
```

  2. Delete the Job's Pod manually and observe the behavior:

```shell
# delete the pods associated with job worker-job-failed
kubectl delete pod -l job-name=worker-job-failed
```

Behavior: the replacement Pod worker-job-failed-q98qx is created only after the old Pod worker-job-failed-sg42q has fully terminated; there is no overlap between the old and new Pods.

Benefits

  1. Improved Reliability: Jobs are now self-healing. A single pod failure no longer risks halting an entire workload. This makes Kubernetes jobs more trustworthy for critical processes.
  2. Reduced Operational Burden: Previously, operators often had to monitor jobs manually or write custom controllers/scripts to handle pod replacement. With this built-in capability, operational overhead is significantly reduced.
  3. Efficient Resource Utilization: Failed pods that linger without progress waste CPU and memory. Automatic replacement ensures resources are recycled effectively.

  4. Better User Experience: For developers, running jobs becomes less error-prone. Teams can focus on business logic instead of constantly monitoring for pod failures.

Best Practices

  1. Tune restart policies: Use Never or OnFailure appropriately depending on workload characteristics.
  2. Monitor metrics: Use Prometheus/Grafana to track pod replacement events.
  3. Set resource requests/limits: Prevent unnecessary failures by properly sizing pods.
  4. Validate thresholds: Ensure replacement policies are configured to avoid endless restart loops.
  5. Test in staging: Before deploying to production, simulate pod failures in a staging cluster to observe replacement behavior.

Use Cases

  1. Machine Learning Workloads: Training models can take hours or days, and pod failures are inevitable. Automatic replacement ensures training jobs continue without manual restarts, making ML pipelines more resilient.
  2. Data Pipelines: ETL jobs or distributed data processing tasks often involve multiple pods running in parallel. Replacing failed pods ensures the pipeline completes successfully without operator intervention.

Takeaways

The Pod Replacement Policy gives you control over Pod creation timing to avoid overlap, optimizes cluster resources by preventing temporary extra Pods, and offers the flexibility to choose the right policy for your Job workloads based on your requirements and resource constraints.

Reference(s)

  • https://kubernetes.io/blog/2025/08/27/kubernetes-v1-34-release/


