This article introduces a predictive framework that optimizes data-center disk scrubbing. Instead of treating drives as simply “healthy” or “failing,” a Mondrian Conformal Prediction model assigns each disk a health confidence score to guide targeted maintenance. Combined with a workload predictor using a Probabilistically Weighted Fuzzy Time Series (PWFTS) algorithm, it determines the best time to perform scrubbing when system load is low. The result: reduced downtime, improved efficiency, and lower carbon emissions in large-scale storage systems.

How Predictive Algorithms Are Making Data Center Disk Scrubbing Smarter

2025/10/07 19:00

Abstract and 1. Introduction

  2. Motivation and design goals

  3. Related Work

  4. Conformal prediction

    4.1. Mondrian conformal prediction (MCP)

    4.2. Evaluation metrics

  5. Mondrian conformal prediction for Disk Scrubbing: our approach

    5.1. System and Storage statistics

    5.2. Which disk to scrub: Drive health predictor

    5.3. When to scrub: Workload predictor

  6. Experimental setting and 6.1. Open-source Baidu dataset

    6.2. Experimental results

  7. Discussion

    7.1. Optimal scheduling aspect

    7.2. Performance metrics and 7.3. Power saving from selective scrubbing

  8. Conclusion and References

5.2. Which disk to scrub: Drive health predictor

In a typical data center setting, every disk drive is classified as either healthy or unhealthy. Unhealthy disks are presumed to be dying or imminently failing, so they are not marked for scrubbing, while healthy disks are marked for scrubbing.

In our approach, we propose to assign a relative "degree of health" score to each disk. Drives marked as of No concern are either dying/imminently failing or completely healthy, while those marked as of Concern have intermediate degrees of health, neither clearly failing nor clearly healthy. The conformal prediction framework classifies drives as "No-concern" or "Concern" and selects only the disks in the "Concern" set for further ranking. These are the drives that concern us, and they are used as input for the scrubbing scheduler.

Our focus, as shown in Figure 2, is on identifying disks in the system that are currently of concern or may become concerning soon, and selecting only those disks for scrubbing. This approach reduces the number of disks scheduled for scrubbing, since even completely healthy drives are not scrubbed, making the process more efficient and targeted. By doing so, we reduce the time, power, and energy spent on scrubbing, and with them the carbon footprint of data centers.

Figure 2: Quantifying the health of disk drives. Disks that are clearly healthy or clearly unhealthy are not selected for scrubbing, while disks of concern are marked for scrubbing.

When dealing with disk drives in a typical data center environment, failures are rare over any given period, which results in a highly imbalanced dataset: a small number of failed disks and a large majority of healthy ones. To handle this imbalance, we adopt a Mondrian Conformal Prediction approach, which yields the prediction labels "0" (failed) and "1" (healthy) along with a confidence score that serves as a health score in our case. Our MCP algorithm then selects disks by comparing this confidence score against a threshold chosen by the administrator.
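The label-conditional idea behind Mondrian conformal prediction can be illustrated with a minimal sketch. It assumes a nonconformity score is already available for each calibration disk (for example, one minus the probability a classifier assigns to the true label) and uses the standard conformal definitions of confidence (one minus the second-largest p-value) and credibility (the largest p-value). The function names and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Label-conditional (Mondrian) p-values for a single test disk.

    cal_scores  : nonconformity score of each calibration disk w.r.t. its true label
    cal_labels  : true label of each calibration disk (0 = failed, 1 = healthy)
    test_scores : dict mapping each candidate label to the nonconformity score
                  the test disk would have under that label
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    p_values = {}
    for label, score in test_scores.items():
        # Mondrian step: compare only against calibration disks of the same class,
        # so the rare "failed" class is not swamped by the healthy majority.
        same_class = cal_scores[cal_labels == label]
        p_values[label] = (np.sum(same_class >= score) + 1) / (len(same_class) + 1)
    return p_values

def predict_with_confidence(p_values):
    """Return (predicted label, confidence, credibility) from the p-values."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    label, credibility = ranked[0]
    confidence = 1.0 - ranked[1][1] if len(ranked) > 1 else 1.0
    return label, float(confidence), float(credibility)

# Toy, made-up calibration set: 3 failed disks and 7 healthy disks.
cal_labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
cal_scores = [0.1, 0.3, 0.6, 0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4]

# Hypothetical test disk whose classifier output is P(healthy) = 0.9,
# scored as 1 - P(label) under each candidate label.
p = mondrian_p_values(cal_scores, cal_labels, {0: 0.9, 1: 0.1})
label, confidence, credibility = predict_with_confidence(p)
print(label, confidence, credibility)
```

In this sketch the confidence value plays the role of the health score described above; whether the paper computes it in exactly this way is not stated in this section.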

For instance, if the administrator sets a threshold of 1%, disks with health scores above 99% are excluded as confidently healthy or confidently failing (depending on the label), and only disks with a health score lower than 99% are selected for scrubbing. Furthermore, the selected drives can be mapped to distinct scrubbing frequencies: drives with poor health scores may require more frequent scrubbing (every week), while those with good health scores need less frequent scrubbing (every 3 months). For the same threshold of 1%, the administrator can then map disk health to a scrubbing frequency, as in Table 1.

Table 1: Mapping of the disk health with the scrubbing frequency based on health score.
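The selection and mapping steps described above can be sketched as follows. The contents of Table 1 are not reproduced in this excerpt, so the band boundaries and the intermediate 30-day interval below are purely hypothetical placeholders; only the 1% threshold and the weekly and 3-month extremes come from the text. The `Disk` class and function names are also illustrative.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    serial: str
    label: int           # 0 = failed, 1 = healthy (labels as used in the text)
    health_score: float  # MCP confidence score in [0, 1]

# HYPOTHETICAL bands: (upper bound on health score, scrub interval in days).
# The real boundaries live in Table 1 of the paper and are not shown here.
HYPOTHETICAL_BANDS = [(0.50, 7), (0.80, 30), (1.00, 90)]

def select_for_scrubbing(disks, threshold=0.01):
    """Keep only disks whose health score is below 1 - threshold.

    Disks above the cutoff are confidently healthy or confidently failing
    (depending on their label) and are treated as "No-concern".
    """
    cutoff = 1.0 - threshold
    return [d for d in disks if d.health_score < cutoff]

def scrub_interval_days(health_score, bands=HYPOTHETICAL_BANDS):
    """Map a health score to a scrub interval: lower score, more frequent scrubbing."""
    for upper, days in bands:
        if health_score <= upper:
            return days
    return bands[-1][1]

# Made-up fleet for illustration.
fleet = [
    Disk("d1", 1, 0.995),  # confidently healthy: excluded
    Disk("d2", 1, 0.42),   # poor health score: frequent scrubbing
    Disk("d3", 1, 0.90),   # good health score: infrequent scrubbing
    Disk("d4", 0, 0.999),  # confidently failing: excluded
]
selected = select_for_scrubbing(fleet, threshold=0.01)
schedule = {d.serial: scrub_interval_days(d.health_score) for d in selected}
print(schedule)
```

Keeping the bands in a single table-like constant mirrors the administrator-defined mapping the text describes, so the cutoffs can be changed without touching the selection logic.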


5.3. When to scrub: Workload predictor

After identifying the disks to be scrubbed using the drive health predictor engine, the next step is to determine the optimal time to perform scrubbing using the workload predictor. This component needs to consider the availability of system resources, i.e., the disk and CPU utilization information provided by the system and storage statistics subsystem.

The workload predictor employs a Probabilistically Weighted Fuzzy Time Series algorithm (PWFTS), as detailed in (Orang et al., 2020). This algorithm forecasts system utilization n days ahead by predicting the system utilization percentage for the next 12 hours at 1-hour intervals. This information is then combined with one of the three possible scrubbing cycles (A, B, or C as in Table 1) obtained from the drive health predictor. Finally, the scrubbing is triggered. If the scrubbing completes within the 1-hour interval, the process stops; if not, the administrator is notified. The high-level flowchart for the system workload predictor is outlined in Figure 3.
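The control flow just described can be sketched in a few lines. This sketch does not implement PWFTS itself; it assumes a 12-value hourly utilization forecast is already available from some forecaster and that choosing the hour with the lowest predicted utilization is an acceptable reading of "optimal time". The callbacks `start_scrub` and `notify_admin` are hypothetical hooks, and the forecast values are made up.

```python
def pick_scrub_hour(forecast):
    """Return the index of the hour with the lowest predicted utilization."""
    if not forecast:
        raise ValueError("forecast must contain at least one hourly value")
    return min(range(len(forecast)), key=lambda hour: forecast[hour])

def run_scrub_cycle(forecast, start_scrub, notify_admin):
    """Trigger scrubbing in the quietest forecast hour.

    start_scrub(hour) is assumed to return True if scrubbing finished within
    the 1-hour interval and False otherwise. If it did not finish, the
    administrator is notified, as in the flow described above.
    """
    hour = pick_scrub_hour(forecast)
    finished = start_scrub(hour)
    if not finished:
        notify_admin(f"scrubbing started at hour {hour} did not complete in time")
    return hour, finished

# Made-up 12-hour forecast of system utilization (percent), 1-hour steps.
forecast = [62, 55, 40, 18, 22, 35, 50, 70, 80, 75, 60, 45]

notifications = []
hour, finished = run_scrub_cycle(
    forecast,
    start_scrub=lambda h: False,        # pretend the scrub overran its slot
    notify_admin=notifications.append,  # collect messages instead of paging
)
print(hour, finished, notifications)
```

Passing the scrub and notification actions in as callbacks keeps the scheduling decision separate from how a particular storage system starts a scrub or alerts an operator.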

Figure 3: Flowchart of the workload predictor using the PWFTS algorithm.

In Figure 4, we show the n-days ahead forecast of the system utilization percentage. The figure indicates that the system is under a lower load on day 0 and a higher load on day 2. Consequently, scheduling the scrubbing operations on day 0, when the system is under a lower load, is more favorable. This approach makes better use of system resources, ensures efficient scrubbing of the disks, and leads to lower processing time, lower energy consumption, and a reduced carbon footprint for the data center.

Figure 4: Probability distribution of system utilization percentage for n-days ahead forecasting.
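One way to turn per-day probability distributions like those in Figure 4 into a scheduling decision is to compare their expected utilization. The sketch below assumes each day's forecast is given as probabilities over a shared set of utilization bins, and that ranking days by expected value is an acceptable criterion; the paper does not state the exact rule in this section. The bin centers and probabilities are invented for illustration and are not read from the figure.

```python
import numpy as np

def expected_utilization(bin_centers, probabilities):
    """Expected utilization (percent) of one day's forecast distribution."""
    p = np.asarray(probabilities, dtype=float)
    p = p / p.sum()  # normalize in case the probabilities do not sum to exactly 1
    return float(np.dot(np.asarray(bin_centers, dtype=float), p))

def best_day(bin_centers, daily_distributions):
    """Return (index of the day with the lowest expected load, all expected loads)."""
    expected = [expected_utilization(bin_centers, p) for p in daily_distributions]
    return int(np.argmin(expected)), expected

# Made-up utilization bins (percent) and per-day probabilities.
bins = [10, 30, 50, 70, 90]
days = [
    [0.5, 0.3, 0.1, 0.1, 0.0],  # day 0: mass concentrated at low load
    [0.2, 0.2, 0.3, 0.2, 0.1],  # day 1: spread out
    [0.0, 0.1, 0.2, 0.3, 0.4],  # day 2: mass concentrated at high load
]
day, expected = best_day(bins, days)
print(day, expected)
```

An administrator who cares more about avoiding load spikes than about the average could swap the expected value for an upper quantile of each distribution without changing the rest of the sketch.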


:::info This paper is available on arXiv under CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.

:::


:::info Authors:

(1) Rahul Vishwakarma, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (rahuldeo.vishwakarma01@student.csullb.edu);

(2) Jinha Hwang, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (jinha.hwang01@student.csulb.edu);

(3) Soundouss Messoudi, HEUDIASYC - UMR CNRS 7253, Université de Technologie de Compiègne, 57 avenue de Landshut, 60203 Compiègne Cedex - France (soundouss.messoudi@hds.utc.fr);

(4) Ava Hedayatipour, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (ava.hedayatipour@csulb.edu).

:::
