This article introduces a novel disk scrubbing framework powered by Mondrian Conformal Prediction (MCP) to optimize maintenance in data storage systems. The approach uses system and storage statistics — including SMART parameters, Background Media Scanning (BMS) data, and CPU/disk utilization metrics — to predict drive health and workload patterns. By turning these predictions into scrubbing frequencies and schedules, the system intelligently prioritizes drives that require attention, thereby reducing downtime, extending disk lifespan, and improving overall storage reliability.

Why Predictive AI Might Be the Future of Disk Hygiene

2025/10/07 02:09

Abstract and 1. Introduction

  2. Motivation and design goals

  3. Related Work

  4. Conformal prediction

    4.1. Mondrian conformal prediction (MCP)

    4.2. Evaluation metrics

  5. Mondrian conformal prediction for Disk Scrubbing: our approach

    5.1. System and Storage statistics

    5.2. Which disk to scrub: Drive health predictor

    5.3. When to scrub: Workload predictor

  6. Experimental setting and 6.1. Open-source Baidu dataset

    6.2. Experimental results

  7. Discussion

    7.1. Optimal scheduling aspect

    7.2. Performance metrics and 7.3. Power saving from selective scrubbing

  8. Conclusion and References

5. Mondrian conformal prediction for Disk Scrubbing: our approach

In contrast to the conventional studies mentioned above, we propose a novel approach for disk drive scrubbing based on Mondrian conformal prediction to quantitatively assess the health status of disk drives and use it as a metric for selecting drives for scrubbing. Figure 1 shows a high-level overview of the proposed method.

Figure 1: Overall approach of Mondrian conformal disk drive scrubbing.

The proposed architecture consists of three subsystems. The first subsystem collects storage and system statistics: it retrieves disk drive data from the storage array and captures CPU and disk busy statuses. The second subsystem, the drive health predictor engine, predicts the health status of each drive. It uses MCP to output a set of "No concern" drives, i.e. either unhealthy/dying drives that can be flagged for manual diagnostics by experts (not discussed in this paper) or completely healthy drives that do not need any scrubbing, and a set of "Concern" drives with health scores assigned from the predictor's confidence. The scrubbing frequency indicator then turns these health scores into scrubbing frequencies. The underlying non-conformity score is the margin error function. The third subsystem, the workload predictor engine, first predicts resource utilization percentages from SAR logs[2] and then combines this result with the scrubbing frequencies to schedule when and how frequently disk drive scrubbing is performed. Finally, the scrubbing operation is triggered on the storage array according to the scrubbing cycle. The following subsections describe each component of the architecture in detail.
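The core step of the drive health predictor can be illustrated with label-conditional (Mondrian) conformal p-values built on a margin non-conformity score. This is a minimal sketch, not the authors' implementation. It assumes that the Mondrian categories are the class labels, that a base classifier outputs probabilities for two hypothetical labels (`healthy` and `failing`), and that the margin error is the best competing class probability minus the probability of the candidate label. The triage rule mapping a prediction set to "No concern" or "Concern" is also a hypothetical stand-in, and the calibration probabilities are made-up toy values.

```python
from typing import Dict, List, Sequence, Set, Tuple

Probs = Dict[str, float]  # class label -> probability from the (assumed) base classifier


def margin_nonconformity(probs: Probs, label: str) -> float:
    """Margin error: highest competing probability minus the probability of `label`.
    Larger values mean `label` conforms less with the classifier's output."""
    best_other = max(p for lbl, p in probs.items() if lbl != label)
    return best_other - probs[label]


def calibrate(cal_probs: Sequence[Probs], cal_labels: Sequence[str]) -> Dict[str, List[float]]:
    """Mondrian taxonomy: keep a separate list of calibration scores per true label."""
    scores: Dict[str, List[float]] = {}
    for probs, label in zip(cal_probs, cal_labels):
        scores.setdefault(label, []).append(margin_nonconformity(probs, label))
    return scores


def p_values(probs: Probs, cal_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Compare the test score for each candidate label only against calibration
    examples whose true label is that same label."""
    result = {}
    for label, scores in cal_scores.items():
        alpha = margin_nonconformity(probs, label)
        at_least_as_strange = sum(1 for s in scores if s >= alpha)
        result[label] = (at_least_as_strange + 1) / (len(scores) + 1)
    return result


def triage(probs: Probs, cal_scores: Dict[str, List[float]],
           epsilon: float) -> Tuple[str, Set[str], float]:
    """Hypothetical rule: exactly one plausible label -> 'No concern', else 'Concern'.
    Assumes at least two labels so a second-largest p-value exists."""
    pv = p_values(probs, cal_scores)
    region = {lbl for lbl, p in pv.items() if p > epsilon}
    ranked = sorted(pv.values(), reverse=True)
    confidence = 1.0 - ranked[1]  # one minus the second-largest p-value
    status = "No concern" if len(region) == 1 else "Concern"
    return status, region, confidence


# Made-up calibration data: (classifier output, true label).
cal_probs = [
    {"healthy": 0.9, "failing": 0.1}, {"healthy": 0.8, "failing": 0.2},
    {"healthy": 0.6, "failing": 0.4}, {"healthy": 0.4, "failing": 0.6},
    {"healthy": 0.3, "failing": 0.7}, {"healthy": 0.2, "failing": 0.8},
    {"healthy": 0.45, "failing": 0.55}, {"healthy": 0.6, "failing": 0.4},
]
cal_labels = ["healthy"] * 4 + ["failing"] * 4
cal_scores = calibrate(cal_probs, cal_labels)

drives = {
    "d1": {"healthy": 0.95, "failing": 0.05},
    "d2": {"healthy": 0.5, "failing": 0.5},
    "d3": {"healthy": 0.02, "failing": 0.98},
}
for name, probs in drives.items():
    status, region, conf = triage(probs, cal_scores, epsilon=0.3)
    print(name, status, sorted(region), round(conf, 2))
# d1 No concern ['healthy'] 0.8
# d2 Concern ['failing', 'healthy'] 0.6
# d3 No concern ['failing'] 0.8
```

Under this rule, both a confidently healthy drive (`d1`) and a confidently failing one (`d3`) fall into "No concern", while the ambiguous drive (`d2`) becomes "Concern" with a confidence-based score. How that score becomes a scrubbing frequency is the job of the scrubbing frequency indicator and is not modeled here.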

5.1. System and Storage statistics

The main components of this subsystem are:

• SMART: stands for Self-Monitoring, Analysis, and Reporting Technology, a set of predefined parameters provided by device manufacturers that offer insights into various aspects of a storage device's performance, including temperature, error rates, reallocated sectors, and more. The manufacturer assigns each attribute a threshold value that marks the acceptable limit for that parameter. When a parameter crosses its threshold, it may indicate a potential issue with the storage device. We use SMART parameters as input features for the drive health predictor engine.

• BMS: stands for Background Media Scanning. It is a passive process that differs from disk scrubbing, which actively scans the disk for errors during idle periods without reading or writing data. BMS scans the disk for errors in the background without interrupting normal operations. In our proposed architecture, we also extract a BMS feature, the number of errors BMS encounters while scanning the same drive, and feed it to the drive health predictor engine.

• Disk and CPU busy time: The performance of a drive depends heavily on its critical processes, such as data access and write speed. Busy time is expressed as a percentage ranging from 1 to 100 and is sampled once per hour. These system statistics are extracted from the SAR logs (standard system utilization logs) and converted into time series data for use by the workload predictor engine.
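To make the inputs of the drive health predictor concrete, the sketch below flattens a drive's SMART attributes and its BMS error count into a single feature dictionary. It is an illustrative assumption, not the paper's feature pipeline: the `SmartAttribute` type, the feature names, and the extra "thresholds crossed" count are hypothetical. The threshold check follows the common ATA convention, in which an attribute is considered failed when its normalized value is at or below its threshold.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class SmartAttribute:
    name: str        # attribute name as reported by the drive (e.g. via smartctl)
    normalized: int  # current normalized value
    threshold: int   # manufacturer-assigned threshold


def threshold_crossed(attr: SmartAttribute) -> bool:
    # ATA convention: the attribute has failed when normalized <= threshold.
    return attr.normalized <= attr.threshold


def health_features(smart: Sequence[SmartAttribute], bms_error_count: int) -> Dict[str, float]:
    """Hypothetical feature vector for the drive health predictor."""
    features = {f"smart_{a.name}": float(a.normalized) for a in smart}
    features["smart_thresholds_crossed"] = float(sum(threshold_crossed(a) for a in smart))
    # BMS feature: number of errors Background Media Scanning found on this drive.
    features["bms_error_count"] = float(bms_error_count)
    return features


# Made-up values for illustration only.
drive = [
    SmartAttribute("Reallocated_Sector_Ct", normalized=5, threshold=10),
    SmartAttribute("Temperature_Celsius", normalized=64, threshold=0),
]
print(health_features(drive, bms_error_count=3))
```

A production pipeline might also use raw SMART values or their trends over time; the sketch keeps only normalized values to stay short.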
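The hourly busy-time series can drive a very simple stand-in for the workload predictor. The sketch below is not the predictor of Section 5.3. It forecasts each hour of the day as the average of that hour over past days (a seasonal average), treats the larger of the CPU and disk forecasts as the predicted load, and picks the least busy hours for the number of scrubs a drive needs per day. The `busy_limit` cutoff and the `scrubs_per_day` input are hypothetical parameters. In the described architecture, `scrubs_per_day` would come from the scrubbing frequency indicator.

```python
from typing import List, Sequence

HOURS_PER_DAY = 24


def seasonal_average(history: Sequence[float], period: int = HOURS_PER_DAY) -> List[float]:
    """Forecast tomorrow's hourly busy % as the mean of each hour-of-day.
    Expects whole days of hourly samples starting at 00:00, oldest first."""
    if len(history) < period or len(history) % period:
        raise ValueError("expected whole days of hourly samples starting at 00:00")
    days = len(history) // period
    return [sum(history[d * period + h] for d in range(days)) / days for h in range(period)]


def pick_scrub_hours(cpu_busy: Sequence[float], disk_busy: Sequence[float],
                     scrubs_per_day: int, busy_limit: float = 50.0) -> List[int]:
    """Return up to `scrubs_per_day` hours (0-23) with the lowest predicted load under `busy_limit`."""
    cpu_fc = seasonal_average(cpu_busy)
    disk_fc = seasonal_average(disk_busy)
    load = [max(c, d) for c, d in zip(cpu_fc, disk_fc)]  # the busier resource bounds the window
    candidates = sorted((u, h) for h, u in enumerate(load) if u < busy_limit)
    return sorted(h for _, h in candidates[:scrubs_per_day])


# Two made-up days: busy everywhere except a quiet stretch from 02:00 to 04:00.
cpu_day = [80.0] * HOURS_PER_DAY
disk_day = [60.0] * HOURS_PER_DAY
cpu_day[2], cpu_day[3], cpu_day[4] = 30.0, 10.0, 15.0
disk_day[2], disk_day[3], disk_day[4] = 40.0, 20.0, 5.0
print(pick_scrub_hours(cpu_day * 2, disk_day * 2, scrubs_per_day=2))  # [3, 4]
```

Taking the maximum of CPU and disk load is a conservative choice: a scrub window is only considered quiet if both resources are expected to be idle.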


:::info This paper is available on arXiv under CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.

:::


[2] The System Activity Report (sar) is a command that provides information about different aspects of system performance, for example CPU usage, memory and paging, interrupts, device workload, network activity, and swap space utilization.


:::info Authors:

(1) Rahul Vishwakarma, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (rahuldeo.vishwakarma01@student.csulb.edu);

(2) Jinha Hwang, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (jinha.hwang01@student.csulb.edu);

(3) Soundouss Messoudi, HEUDIASYC - UMR CNRS 7253, Université de Technologie de Compiègne, 57 avenue de Landshut, 60203 Compiègne Cedex - France (soundouss.messoudi@hds.utc.fr);

(4) Ava Hedayatipour, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (ava.hedayatipour@csulb.edu).

:::

