Predicting Failures from Sensor Data using AI/ML — Part 2

This is Part 2 of the blog post series, continuing from Part 1 of Predicting Failures from Sensor Data using AI/ML.

Missing Values & Data Imbalance

One thing to note is that the hard-disk data set has a lot of missing values across its columns. Check out the Missing Data Heat Map on the training data set, derived from Auto-Viz in Driverless AI. From the picture below, one can tell that a majority of the sensor data is missing or incomplete: the red color in the aggregated chart indicates missing data. Where it’s incomplete, a reasonable guess is that not all hard-disk vendors report every S.M.A.R.T. sensor variable.
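If you want a quick numeric check of what the heat map shows, here is a minimal sketch that computes the fraction of missing values per column. It is plain Python rather than anything Driverless AI runs internally, and the toy rows and the `missing_fraction` helper are made up for illustration.

```python
def missing_fraction(rows, columns):
    """Return {column: fraction of rows where the value is None or ''}."""
    total = len(rows)
    fractions = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        fractions[col] = missing / total if total else 0.0
    return fractions


# Hypothetical toy rows; the real data set has one row per drive per day.
rows = [
    {"serial_number": "A1", "smrt_5_realloc_sector_cnt": 0, "smrt_241_totl_lbas_read": None},
    {"serial_number": "B2", "smrt_5_realloc_sector_cnt": 3, "smrt_241_totl_lbas_read": None},
    {"serial_number": "C3", "smrt_5_realloc_sector_cnt": None, "smrt_241_totl_lbas_read": 120},
    {"serial_number": "D4", "smrt_5_realloc_sector_cnt": 1, "smrt_241_totl_lbas_read": None},
]
fractions = missing_fraction(
    rows, ["smrt_5_realloc_sector_cnt", "smrt_241_totl_lbas_read"]
)
for col, frac in fractions.items():
    print(f"{col}: {frac:.0%} missing")
```

Columns that come out close to 100% missing are the ones that show up as solid red bands in the heat map.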

When I tried to build a base AI/ML model in Driverless AI, I got a notification that it automatically dropped these 19 columns because of empty or constant values.
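To make the "empty or constant" rule concrete, here is a small sketch of how such columns could be found by hand. This is an assumption about the general idea, not the actual Driverless AI logic, and the `empty_or_constant_columns` helper and column names are hypothetical.

```python
def empty_or_constant_columns(rows, columns):
    """Return columns whose non-missing values are absent or all identical."""
    dropped = []
    for col in columns:
        distinct = {row.get(col) for row in rows if row.get(col) is not None}
        # Zero distinct values means the column is empty; one means it is constant.
        if len(distinct) <= 1:
            dropped.append(col)
    return dropped


# Hypothetical toy rows with one empty, one constant, and one useful column.
rows = [
    {"sensor_a": None, "sensor_b": 100, "sensor_c": 1},
    {"sensor_a": None, "sensor_b": 100, "sensor_c": 7},
    {"sensor_a": None, "sensor_b": 100, "sensor_c": 3},
]
to_drop = empty_or_constant_columns(rows, ["sensor_a", "sensor_b", "sensor_c"])
print(to_drop)
```

A column like this carries no signal for the model, so dropping it up front saves feature engineering time without losing information.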

I also got this notification:

Driverless AI 1.7.x does sampling by default for imbalanced data for every iteration to achieve good overall accuracy.
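For readers new to imbalanced sampling, the sketch below shows one common approach: keep every failure row and randomly downsample the non-failure rows to a fixed ratio. I am not claiming this is the sampling scheme Driverless AI uses; the `downsample_majority` helper, the ratio of 10, and the seed are assumptions chosen for illustration.

```python
import random


def downsample_majority(rows, target="failure", ratio=10, seed=42):
    """Keep all positive rows and at most `ratio` negatives per positive."""
    positives = [r for r in rows if r[target] == 1]
    negatives = [r for r in rows if r[target] == 0]
    keep = min(len(negatives), ratio * len(positives))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return positives + rng.sample(negatives, keep)


# Hypothetical toy data with roughly the imbalance seen in this data set.
rows = [{"failure": 1}] * 2 + [{"failure": 0}] * 2000
sampled = downsample_majority(rows)
n_pos = sum(r["failure"] for r in sampled)
n_neg = len(sampled) - n_pos
print(n_pos, n_neg)
```

Resampling every iteration, as the notification describes, means different majority-class rows get a chance to be seen over the course of the experiment.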

IID Model or Time Series AI/ML Model?

We can build an IID (Independent and Identically Distributed) model and treat all the rows as independent. We can also treat the data as time series, as sensor data is available for each model/serial # every day. I will build a Time Series Classification model in this blog. Since the data is highly imbalanced and it’s a binary classification experiment, I’ll use LogLoss as the scorer to optimize the model. We can always check AUC, AUCPR, etc., once it’s done.
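As a reminder of what the LogLoss scorer measures, here is a minimal sketch of binary log loss in plain Python. The `log_loss` function and the clipping constant are my own illustration and not the Driverless AI implementation.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the predictions."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# An uninformative prediction of 0.5 scores ln(2) per row.
uninformed = log_loss([1, 0], [0.5, 0.5])
# A confident miss on a single failure row is punished heavily.
confident_miss = log_loss([1, 0], [0.01, 0.01])
print(round(uninformed, 4), round(confident_miss, 4))
```

Because a confident wrong probability on a rare failure row costs so much, optimizing LogLoss pushes the model toward well-calibrated probabilities rather than just a good ranking.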

I set a high value of 8/8 for Accuracy (model tuning effort) and Time (more iterations, with early stopping only after 20 iterations) and set Interpretability to 5 (medium feature engineering). Driverless AI can detect shift and drop overfitting columns automatically using an AUC threshold, but I disabled shift detection in Expert Settings.

What is Shift Detection? Detecting distribution shifts of variables between training and test sets can avoid overfitting in models. Driverless AI by default does this to prevent overfitting. It’s an optional feature, and you can always turn it off, like I did here.
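Here is a simplified sketch of the idea behind shift detection for a single numeric column: score how well the raw value separates training rows from test rows, using an AUC-style statistic. An AUC near 0.5 suggests no shift, while a value near 1.0 suggests the column looks very different in the two sets. The helper names are hypothetical, and using the raw value as the score is a simplification I am assuming, not the method Driverless AI applies.

```python
def auc_train_vs_test(train_values, test_values):
    """Probability that a random test value exceeds a random train value.

    Ties count as half a win, which matches the usual AUC definition.
    """
    wins = 0.0
    for t in test_values:
        for v in train_values:
            if t > v:
                wins += 1.0
            elif t == v:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


def shift_score(train_values, test_values):
    """Fold the AUC around 0.5 so shifts in either direction score high."""
    auc = auc_train_vs_test(train_values, test_values)
    return max(auc, 1.0 - auc)


no_shift = shift_score([1, 2, 3], [1, 2, 3])
strong_shift = shift_score([1, 2, 3], [10, 11])
print(no_shift, strong_shift)
```

A column whose score exceeds some chosen threshold would be a candidate for dropping, since the model could learn patterns from it that do not hold on the test set.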

I set the Target Column to “failure”, the time column to “date”, and the forecast horizon to “7 days”. The assumption is that the data center can find a spare drive and have it ready as a replacement within 7 days, before the failure happens. You can also change this to 1, 15, 30 days, etc.: whatever horizon suits the predictive maintenance task at hand.
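One way to think about a 7-day forecast horizon is that, at prediction time, the model may only use sensor readings that are at least 7 days old. The sketch below builds such a lagged feature per drive. This is my own reading of the horizon and not a description of how Driverless AI builds its time series features; the `add_lag_feature` helper and the toy rows are hypothetical.

```python
from datetime import date, timedelta


def add_lag_feature(rows, column, horizon_days=7):
    """Add `<column>_lag<N>`: the drive's value from `horizon_days` earlier."""
    by_key = {(r["serial_number"], r["date"]): r[column] for r in rows}
    out = []
    for r in rows:
        past = r["date"] - timedelta(days=horizon_days)
        lagged = dict(r)
        # None when the drive has no reading on that earlier day.
        lagged[f"{column}_lag{horizon_days}"] = by_key.get((r["serial_number"], past))
        out.append(lagged)
    return out


# Hypothetical drive with one reading per day for ten days.
rows = [
    {"serial_number": "A1", "date": date(2019, 1, d), "smrt_5_realloc_sector_cnt": d}
    for d in range(1, 11)
]
lagged_rows = add_lag_feature(rows, "smrt_5_realloc_sector_cnt")
print(lagged_rows[7])
```

In this toy example the row for January 8 picks up the value recorded on January 1, and the first seven days have no lagged value at all.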

The plan says that from 67 columns, it is going to build 4K features with 8 features being picked for the final pipeline after 208 iterations of model tuning/feature engineering. There is more on the screen that you can read.

BYOR Transformers

Driverless AI allows users to add custom feature engineering or models to the evolutionary model/feature finding process, through a feature called “Bring Your Own Recipe” or BYOR. I uploaded the following feature recipes from this GitHub location, so the experiment can try them and see whether they add value in feature transformations (besides the default ones):

BoxCox
Log
SquareRoot
QuantileWinsorizer
TwoSigmaWinsorizer
YeoJohnson
firstNCharsCVTE — This recipe takes the text columns, such as model and serial_number, and does a substring of 1,2,3,4 … characters, and then does Cross Validated Target Encoding.
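To give a feel for what the firstNCharsCVTE recipe described above does, here is a stripped-down sketch: take the first N characters of a text column, then replace each prefix with the mean target computed only from rows in the other folds. This is not the recipe's actual code. The `prefix_cv_target_encode` function, the fold assignment by row index, and the toy model names are assumptions for illustration.

```python
def prefix_cv_target_encode(values, targets, n_chars, n_folds=2):
    """Out-of-fold mean target for each value's first `n_chars` characters."""
    prefixes = [v[:n_chars] for v in values]
    global_mean = sum(targets) / len(targets)
    # Prefixes never seen in the training folds fall back to the global mean.
    encoded = [global_mean] * len(values)
    for fold in range(n_folds):
        sums, counts = {}, {}
        for i, (p, y) in enumerate(zip(prefixes, targets)):
            if i % n_folds != fold:  # rows outside the held-out fold
                sums[p] = sums.get(p, 0) + y
                counts[p] = counts.get(p, 0) + 1
        for i, p in enumerate(prefixes):
            if i % n_folds == fold and p in counts:
                encoded[i] = sums[p] / counts[p]
    return encoded


# Hypothetical model names and failure labels.
models = ["ST4000", "ST8000", "HGST1", "HGST2"]
failures = [1, 1, 0, 0]
encoded = prefix_cv_target_encode(models, failures, n_chars=2)
print(encoded)
```

Computing the encoding out of fold matters here: a row's own label never leaks into its feature value, which keeps the target encoding from overfitting.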

Final Results with Time Series Model

After running for several hours on a single GPU box, Driverless AI built 2.4K models on 4K features and showed the final result!

While the AUC on the training/validation is 0.9713, the AUCPR looks much more reasonable with a 0.26 score, given the 1:1000 imbalance in the training data set. This 0.26 value is for the micro-averaged value across cross-validation results on the training data set.
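To put the 0.26 in context: a model that ranks rows at random gets an AUCPR roughly equal to the share of positive rows, which is about 0.001 at a 1:1000 imbalance. The sketch below computes average precision, one common estimator of AUCPR, and the resulting lift over that baseline. The `average_precision` helper is my own illustration, and Driverless AI may compute AUCPR somewhat differently.

```python
def average_precision(y_true, scores):
    """Mean of precision values at the rank of each positive row."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision among the top `rank` rows
    return ap / total_pos if total_pos else 0.0


# Toy ranking: positives land at ranks 1 and 3.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])

# Baseline for a random ranking at roughly 1:1000 imbalance.
prevalence = 1 / 1000
lift = 0.26 / prevalence
print(round(toy_ap, 4), round(lift))
```

Seen this way, an AUCPR of 0.26 is a few hundred times better than chance, even though it looks modest next to the AUC of 0.9713.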

Clearly, BYOR recipes such as FRST[N]CHARCVTE come out near the top, at several positions in the feature importance list. The initial characters of the hard-disk model name appear to be a great feature with good predictive power. The default feature engineering in Driverless AI, such as Cross Validated Target Encoding and Numeric to Categorical Target Encoding, is also useful in the prediction.

We can also see that the original columns, such as:

smrt_241_totl_lbas_read
smrt_193_rprtd_uncrrctble_sctr_cnt
smrt_187_rprtd_uncrrctble_errs
smrt_5_realloc_sector_cnt
smrt_12_power_cycl_cnt
smrt_7_seek_error_rate

etc., either appear on their own or get feature engineered in interesting ways to create derived features that maximize the prediction score. It’s not surprising that the logical blocks read from the hard disk and the errors reported by the S.M.A.R.T. sensors are correlated with hard-disk failure!

So How Accurate Was the Model When It Predicted the Test Data Set?

Even though the training AUCPR was pretty impressive, the test set confusion matrix is a bit more realistic about what you can expect in a production deployment.

I didn’t spend a lot of time here (but plan to in the near future): I built some 10–15 models and split the training/test sets arbitrarily, without giving it much thought. You can, however, see that even with the high imbalance of 1:1000 failures in the training data set, we are predicting 8 out of a total of 31 hard-disk failures in the test data set, which is roughly 25% of the failures, at the cost of roughly 50x false positives. These numbers are based on one threshold value of the prediction probability. Changing the threshold gets us the desired tradeoff between false positives and false negatives. It’s also interesting to note that we are correctly predicting 99.99% of the rows that don’t indicate failures, which is not surprising given how the majority class generally dominates predictability with imbalanced data.
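The threshold tradeoff is easy to see with a small sketch. The function below counts the confusion matrix cells at a given probability threshold, and the arithmetic after it reproduces the recall quoted above. The precision figure rests on my assumption that "50x" means about 50 false positives per true positive (roughly 400 in total); the post does not spell that out. The helper name and toy scores are hypothetical.

```python
def confusion_at_threshold(y_true, y_prob, threshold):
    """Return (tp, fp, fn, tn) when predicting failure for prob >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Toy scores: lowering the threshold catches the second failure.
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1]
strict = confusion_at_threshold(y_true, y_prob, 0.5)
lenient = confusion_at_threshold(y_true, y_prob, 0.3)

# Numbers from the test set: 8 of 31 failures caught.
recall = 8 / 31
assumed_fp = 50 * 8  # ASSUMPTION: about 50 false positives per true positive
precision = 8 / (8 + assumed_fp)
print(strict, lenient, round(recall, 3), round(precision, 4))
```

On real data, lowering the threshold usually raises recall at the price of more false alarms, so the right setting depends on what a missed failure costs compared with an unnecessary drive swap.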

What I Did Not Do Yet:

There are a lot of Expert Settings options in Driverless AI that give us control over imbalanced sampling, how to generate hold-out predictions, adding more algorithms, vendor-specific feature engineering recipes, etc. I also forgot to remove the Date Transformers: the model might overfit on those, as the day of the week, etc., currently shows up in feature importance. So the model can definitely be improved over time with additional tweaks, the goal being to reduce false negatives first, with an additional goal of lowering false positives.

Citation: The Hard-disk failure data used in this blog post and the previous one is from BackBlaze.com.

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition. Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. He has published 50+ blogs on “all things data science” on the LinkedIn, Forbes and Medium publishing platforms over the years for the business audience and speaks at vendor data science conferences. He also holds multiple patents around Desktop Virtualization and Ad networks, and was a co-founding member of two startups in Silicon Valley.
