This study explores how Mondrian Conformal Prediction (MCP) enhances traditional k-Nearest Neighbors (kNN) models in predicting hard drive failures. Using Baidu’s open-source dataset of over 23,000 Seagate HDDs, the experiment demonstrates that MCP increases accuracy for detecting failing disks, despite dataset imbalance. More importantly, it enables the selective scrubbing of only 22.7% of drives, drastically cutting energy use while maintaining reliability. The results highlight the value of confidence scoring in large-scale predictive maintenance systems.

Predicting Hard Drive Failures Using Mondrian Conformal Prediction

2025/10/07 20:00

Abstract and 1. Introduction

  2. Motivation and design goals

  3. Related Work

  4. Conformal prediction

    4.1. Mondrian conformal prediction (MCP)

    4.2. Evaluation metrics

  5. Mondrian conformal prediction for Disk Scrubbing: our approach

    5.1. System and Storage statistics

    5.2. Which disk to scrub: Drive health predictor

    5.3. When to scrub: Workload predictor

  6. Experimental setting and 6.1. Open-source Baidu dataset

    6.2. Experimental results

  7. Discussion

    7.1. Optimal scheduling aspect

    7.2. Performance metrics and 7.3. Power saving from selective scrubbing

  8. Conclusion and References

6. Experimental setting

In this section, we detail the dataset used for our study and the conducted experiments as well as their results.

6.1. Open-source Baidu dataset

This dataset (DrTycoon, 2023) consists of samples collected from 23,395 Seagate ST31000524NS enterprise-level HDDs, with 13 features describing SMART attributes, as shown in Table 2. Each disk is labeled by its operational status as either functional or failed. Most disks (22,962) are functional and only 433 are marked as failed, which makes the dataset highly imbalanced. The SMART attribute values were recorded hourly for each disk, generating 168 samples per week for an operational disk. The dataset contains 1,048,573 rows corresponding to the 23,395 disks (a sampling frequency of 1 hour over a period of 2 years); this row count represents only the sample of operational disks provided in the dataset. The failed disks have varying numbers of samples, covering up to 20 days prior to failure.
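The class imbalance and the weekly sample count quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the counts stated in the text; the variable names are illustrative and not taken from the authors' code.

```python
# Counts quoted in the dataset description.
total_disks = 23_395
functional_disks = 22_962
failed_disks = 433

# The two classes should account for every disk.
assert functional_disks + failed_disks == total_disks

# Share of the minority (failed) class and the majority-to-minority ratio.
failed_share = failed_disks / total_disks
imbalance_ratio = functional_disks / failed_disks

# Hourly sampling gives 24 * 7 samples per operational disk per week.
samples_per_week = 24 * 7

print(f"failed share: {failed_share:.2%}")
print(f"functional-to-failed ratio: {imbalance_ratio:.1f}:1")
print(f"samples per disk per week: {samples_per_week}")
```

Fewer than 2% of the disks belong to the failed class, which is why per-class guarantees matter in the experiments that follow.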

Table 2: Feature descriptions for the open-source Baidu dataset.

6.2. Experimental results

For our experiments, we used Python and the MAPIE[3] library to implement Mondrian Conformal Prediction. The underlying algorithm was k-Nearest Neighbors (kNN).
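To make the idea concrete, the following is a minimal sketch of label-conditional (Mondrian) conformal prediction on top of a kNN classifier. It does not use MAPIE and is not the authors' implementation: it is written in plain Python, runs on synthetic two-dimensional data, and the values of `K` and `EPSILON`, the nonconformity score (one minus the kNN probability of the label), and all function names are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

LABELS = (0, 1)   # 0 = failed, 1 = functional (same convention as the text)
K = 5             # assumed number of neighbours
EPSILON = 0.1     # assumed significance level (target error rate per class)


def knn_proba(train_X, train_y, x, k=K):
    """Class probabilities as label frequencies among the k nearest neighbours."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    counts = Counter(train_y[i] for i in nearest)
    return {label: counts[label] / k for label in LABELS}


def calibrate(train_X, train_y, cal_X, cal_y):
    """Nonconformity scores (1 - probability of the true label), kept per class."""
    scores = {label: [] for label in LABELS}
    for x, y in zip(cal_X, cal_y):
        scores[y].append(1.0 - knn_proba(train_X, train_y, x)[y])
    return scores


def p_values(train_X, train_y, cal_scores, x):
    """Mondrian p-value of each candidate label, using only that label's scores."""
    proba = knn_proba(train_X, train_y, x)
    pvals = {}
    for label in LABELS:
        score = 1.0 - proba[label]
        at_least = sum(s >= score for s in cal_scores[label])
        pvals[label] = (at_least + 1) / (len(cal_scores[label]) + 1)
    return pvals


def prediction_set(pvals, epsilon=EPSILON):
    """Labels that cannot be rejected at significance level epsilon."""
    return {label for label, p in pvals.items() if p > epsilon}


def make_data(rng, n_failed, n_functional):
    """Synthetic, imbalanced 2-D data: two overlapping Gaussian clusters."""
    X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_failed)]
    X += [(rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)) for _ in range(n_functional)]
    y = [0] * n_failed + [1] * n_functional
    return X, y


rng = random.Random(0)
train_X, train_y = make_data(rng, 20, 200)
cal_X, cal_y = make_data(rng, 20, 200)
test_X, test_y = make_data(rng, 20, 200)

cal_scores = calibrate(train_X, train_y, cal_X, cal_y)

# Empirical coverage per class: how often the true label is in the prediction set.
covered, totals = Counter(), Counter(test_y)
for x, y in zip(test_X, test_y):
    if y in prediction_set(p_values(train_X, train_y, cal_scores, x)):
        covered[y] += 1
coverage = {label: covered[label] / totals[label] for label in LABELS}
print(coverage)
```

The step that distinguishes the Mondrian variant from standard conformal prediction is in `p_values`: each candidate label is compared only against calibration scores from its own class, so the minority (failed) class is calibrated separately instead of being dominated by the majority class.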

The main goal of the experimental evaluation is to showcase the significant reduction in the number of disk drives to be scrubbed that can be achieved with the drive health predictor engine, i.e., by exploiting the Mondrian conformal predictor.

Table 3 compares the confusion matrices for the disk drive classification problem obtained with the underlying kNN algorithm alone and with Mondrian Conformal Prediction added, where label "0" indicates a disk failure and label "1" indicates a functional disk. With MCP, the number of disks correctly classified as failing increases from 51,314 to 51,669, a difference of 355. This shows that MCP helps identify more disks of the minority class. The drawback is that the number of disks correctly classified as healthy drops from 296,689 to 268,616, a difference of 28,073.
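The trade-off described above can be put in relative terms with a short calculation. The sketch below uses only the four correctly-classified counts quoted in the text; the dictionary names are illustrative.

```python
# Correctly classified counts quoted in the text (kNN alone vs. kNN with MCP).
knn_correct = {"failing": 51_314, "healthy": 296_689}
mcp_correct = {"failing": 51_669, "healthy": 268_616}

# Absolute and relative change per class when MCP is added.
delta = {label: mcp_correct[label] - knn_correct[label] for label in knn_correct}
relative = {label: delta[label] / knn_correct[label] for label in knn_correct}

for label in knn_correct:
    print(f"{label}: {delta[label]:+d} ({relative[label]:+.2%})")
# failing: +355 (+0.69%)
# healthy: -28073 (-9.46%)
```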

This issue can be addressed by considering the confidence scores and their respective health status, as shown in Figure 5. Of the 349,525 disks in total, 126,224 drives labeled as healthy (left) have a health score greater than 99.95%. When considering the relative health score, we categorize the 79,396 disk drives with a health score below 99.9% as less healthy. Consequently, as shown in Table 4, we select only these 79,396 disk drives for scrubbing and skip the remaining 270,129. This reduces the share of disks to be scrubbed to only 22.7%, which lowers power and energy consumption.
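The selection rule and the resulting 22.7% figure can be sketched as follows. The helper `select_for_scrubbing`, the drive identifiers, and the example scores are hypothetical; the sketch assumes health scores are expressed as fractions in [0, 1], so the 99.9% cut-off becomes 0.999. Only the aggregate counts come from the text.

```python
def select_for_scrubbing(health_scores, threshold=0.999):
    """Return the ids of drives whose health score is below the threshold."""
    return [drive_id for drive_id, score in health_scores.items() if score < threshold]


# Hypothetical health scores for five drives.
example_scores = {"d1": 0.9998, "d2": 0.9985, "d3": 0.9996, "d4": 0.9700, "d5": 0.9992}
to_scrub = select_for_scrubbing(example_scores)
print(to_scrub)  # ['d2', 'd4']

# Aggregate counts quoted in the text.
total = 349_525
selected = 79_396
skipped = total - selected
fraction = selected / total
print(f"skipped: {skipped}, scrubbed share: {fraction:.1%}")
# skipped: 270129, scrubbed share: 22.7%
```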

Table 3: Comparison of confusion matrix results for disk drive classification using kNN and MCP.

Table 4: The number of relatively healthy drives based on the health score intervals.

:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.

:::

[3] https://github.com/adamzenith/MAPIE/tree/Mondrian


:::info Authors:

(1) Rahul Vishwakarma, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (rahuldeo.vishwakarma01@student.csullb.edu);

(2) Jinha Hwang, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (jinha.hwang01@student.csulb.edu);

(3) Soundouss Messoudi, HEUDIASYC - UMR CNRS 7253, Université de Technologie de Compiègne, 57 avenue de Landshut, 60203 Compiègne Cedex - France (soundouss.messoudi@hds.utc.fr);

(4) Ava Hedayatipour, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, United States (ava.hedayatipour@csulb.edu).

:::
