

Predicting Failures from Sensor Data using AI/ML — Part 2


By Karthik Guruswamy | September 27, 2019


This is Part 2 of the blog post series, continuing from Part 1 of Predicting Failures from Sensor Data using AI/ML.

Missing Values & Data Imbalance

One thing to note is that the hard-disk data set has many missing values across its columns. Check out the Missing Data Heat Map on the training data set, derived from Auto-Viz in Driverless AI. From the picture below, one can tell that a majority of the sensor data is missing or incomplete: the red color in the aggregated chart indicates missing data. Where it is incomplete, a likely explanation is that not all hard-disk vendors report every S.M.A.R.T. sensor variable.

www.h2o.ai2019/09/KB_1.png

When I tried to build a base AI/ML model in Driverless AI, I got a notification that it had automatically dropped 19 columns because of empty or constant values.
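To make the "empty or constant" check concrete, here is a minimal sketch in plain Python. It is not how Driverless AI implements the check; the function names and the toy column names below are hypothetical, and a column counts as constant here if it has at most one distinct non-missing value.

```python
from typing import Dict, List, Optional


def missing_fraction(values: List[Optional[float]]) -> float:
    """Share of entries in a column that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def droppable_columns(columns: Dict[str, List[Optional[float]]]) -> List[str]:
    """Names of columns that are entirely missing or hold a single distinct value."""
    drop = []
    for name, values in columns.items():
        distinct = {v for v in values if v is not None}
        # Zero distinct values means all-missing; one means constant.
        if len(distinct) <= 1:
            drop.append(name)
    return drop


# Hypothetical toy data, not taken from the hard-disk data set.
sample = {
    "sensor_ok": [0.0, 0.0, 8.0, None],
    "sensor_all_missing": [None, None, None, None],
    "sensor_constant": [7.0, 7.0, 7.0, 7.0],
}

print(droppable_columns(sample))
print({name: missing_fraction(vals) for name, vals in sample.items()})
```

The per-column missing fraction is the same quantity the heat map above summarizes visually.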

I also got this notification:

 Driverless AI 1.7.x does sampling by default for imbalanced data for every iteration to achieve good overall accuracy.

IID Model or Time Series AI/ML Model?

We can build an IID (Independent and Identically Distributed) model and treat all the rows as independent. We can also treat the data as a time series, as sensor data is available for each model/serial number every day. I will build a Time Series Classification model in this blog. Since the data is highly imbalanced and it’s a binary classification experiment, I’ll use LogLoss as the scorer to optimize the model. We can always check AUC, AUCPR, etc., once it’s done.
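For readers who want to see what the LogLoss scorer measures, here is a small sketch of binary log loss in plain Python. The function name, the clipping constant `eps`, and the toy labels are illustrative assumptions, not Driverless AI internals.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy data with roughly the 1:1000 imbalance mentioned in the post.
labels = [1] + [0] * 999

# Constant prediction equal to the failure rate versus an uninformed 0.5.
print(log_loss(labels, [0.001] * 1000))
print(log_loss(labels, [0.5] * 1000))
```

Log loss penalizes badly calibrated probabilities, so on data this imbalanced a constant guess near the failure rate already scores far better than 0.5. That is one reason to also look at AUC and AUCPR afterwards.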

www.h2o.ai2019/09/KB-2.png

I set a high value of 8/8 for Accuracy (Model tuning effort) and Time (more iterations, with early stopping only after 20 iterations) and set Interpretability to 5 (Medium Feature Engineering). Driverless AI can detect distribution shift and drop overfitting columns automatically using an AUC threshold, but I disabled shift detection in Expert Settings.

What is Shift Detection?  Detecting distribution shifts of variables between training and test sets can avoid overfitting in models. Driverless AI by default does this to prevent overfitting. It’s an optional feature, and you can always turn it off, like I did here.
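As a rough illustration of the idea, the sketch below scores a single numeric column by how well its values alone separate training rows from test rows, using a rank-based AUC. This is a simplified proxy under my own assumptions, not the Driverless AI implementation; `shift_auc`, `is_shifted` and the 0.9 threshold are hypothetical.

```python
from typing import Sequence


def shift_auc(train_values: Sequence[float], test_values: Sequence[float]) -> float:
    """Probability that a random test value exceeds a random train value.

    Ties count as half. A value near 0.5 means the two distributions overlap;
    values near 0 or 1 mean the column alone tells train and test apart.
    """
    wins = 0.0
    for t in test_values:
        for r in train_values:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


def is_shifted(train_values: Sequence[float], test_values: Sequence[float],
               threshold: float = 0.9) -> bool:
    """Flag a column when the AUC (taken in either direction) exceeds the threshold."""
    auc = shift_auc(train_values, test_values)
    return max(auc, 1.0 - auc) > threshold


# Hypothetical example: power-on hours keep growing, so later test rows look different.
train_hours = [100.0, 200.0, 300.0, 400.0]
test_hours = [900.0, 950.0, 1000.0]
print(shift_auc(train_hours, test_hours), is_shifted(train_hours, test_hours))
```

The pairwise loop is quadratic and only meant for small illustrations; a real check would typically fit a classifier on all columns and work on sampled data.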

I set the Target Column to “failure”, the time column to “date”, and the forecast horizon to “7 days”. The assumption is that the data center can find a spare drive and get it ready as a replacement within 7 days, before the failure happens. You can also change this to 1, 15, 30 days, etc., whatever horizon suits the predictive maintenance task at hand.

The plan says that from 67 columns, it is going to build 4K features with 8 features being picked for the final pipeline after 208 iterations of model tuning/feature engineering. There is more on the screen that you can read.

BYOR Transformers

Driverless AI allows users to add custom feature engineering or models to the evolutionary model/feature finding process. This is done through a feature called “Bring Your Own Recipe”, or BYOR. I uploaded the following feature recipes from this GitHub location, so the experiment can try using them and see if they add value in feature transformations (besides the default ones):

www.h2o.ai2019/09/KB-4.png

Final Results with Time Series Model

After running for several hours on a single GPU box, Driverless AI built 2.4K models on 4K features and showed the final result!

www.h2o.ai2019/09/KB-5.png

While the AUC on training/validation is 0.9713, the AUCPR of 0.26 looks much more reasonable given the 1:1000 imbalance in the training data set. This 0.26 is the micro-averaged value across cross-validation results on the training data set.
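To put the 0.26 in context: for a ranker that carries no information, the area under the precision-recall curve is roughly the share of positive rows, which is about 0.001 at a 1:1000 imbalance. The sketch below computes average precision (one common estimate of AUCPR) by hand and works out that baseline; the function name and toy scores are illustrative, and ties in the scores are broken arbitrarily.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive row."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` rows
    return total / sum(y_true)


# Toy ranking: positives land at ranks 1 and 3.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
print(average_precision(toy_labels, toy_scores))

# Baseline for an uninformative ranker at 1 failure per 1000 healthy rows.
baseline = 1 / 1001
print(baseline, 0.26 / baseline)
```

Under that reading, 0.26 is a few hundred times better than chance, even though it looks modest next to an AUC of 0.9713.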

www.h2o.ai2019/09/KB-6.png

Clearly, BYOR recipes such as FRST[N]CHARCVTE appear at several of the top positions. The initial characters of the hard-disk model name are a great feature that gives us good predictability, it would seem. The default feature engineering in Driverless AI, such as Cross Validated Target Encoding and Numeric to Categorical Target Encoding, is also useful in the prediction.
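Going by its name, FRST[N]CHARCVTE takes the first N characters of a string column and applies cross-validated target encoding to them. The sketch below is my own simplified reading of that idea, not the recipe's actual code; the function name, the fold assignment by row index, and the model strings are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


def first_n_chars_cv_target_encode(models: Sequence[str], targets: Sequence[int],
                                   n: int = 2, folds: int = 2) -> List[float]:
    """Encode each row by the mean target of its model-name prefix,
    computed only from rows in the other folds to limit target leakage."""
    keys = [m[:n] for m in models]
    prior = sum(targets) / len(targets)  # fallback for prefixes unseen out of fold
    encoded = [0.0] * len(keys)
    for fold in range(folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Accumulate statistics from every row that is NOT in the current fold.
        for i, (key, y) in enumerate(zip(keys, targets)):
            if i % folds != fold:
                sums[key] += y
                counts[key] += 1
        # Encode the rows that ARE in the current fold.
        for i, key in enumerate(keys):
            if i % folds == fold:
                encoded[i] = sums[key] / counts[key] if counts[key] else prior
    return encoded


# Hypothetical model names and failure labels.
models = ["ST4000", "ST8000", "HG1234", "ST4000"]
failed = [1, 0, 0, 0]
print(first_n_chars_cv_target_encode(models, failed, n=2, folds=2))
```

Computing the encoding out of fold matters here because failures are so rare: encoding a row with its own label would leak the target straight into the feature.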

www.h2o.ai2019/09/Picture7-2.png

We can also see that the original columns, such as:

  • smrt_241_totl_lbas_read
  • smrt_193_rprtd_uncrrctble_sctr_cnt
  • smrt_187_rprtd_uncrrctble_errs
  • smrt_5_realloc_sector_cnt
  • smrt_12_power_cycl_cnt
  • smrt_7_seek_error_rate

etc., either appear on their own or get feature engineered in interesting ways to create derived features that maximize the prediction score. It’s not surprising that the logical blocks read from the hard disk and the errors reported by the S.M.A.R.T. sensors are correlated with a hard-disk failure!

So How Accurate Was the Model When It Predicted the Test Data Set?

www.h2o.ai2019/09/Picture9-2-781x1024.png

Even though the training AUCPR was pretty impressive, the test set confusion matrix gives a more realistic picture of what you can expect in a production deployment.

I didn’t spend a lot of time here (but plan to in the near future): I built some 10 to 15 models and split the training/test data arbitrarily without giving it much thought. Even so, with the high data imbalance of 1:1000 failures in the training data set, we are predicting 8 out of a total of 31 hard-disk failures in the test data set, roughly 25% of the failures, with roughly 50x false positives. These numbers are based on one threshold value of the prediction probability; changing it gets us the desired tradeoff between false positives and false negatives. It’s also interesting to note that we classify 99.99% of the non-failure rows correctly, which is not surprising, since the majority class generally dominates predictability with imbalanced data.
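The arithmetic behind these numbers, and the effect of moving the threshold, can be sketched as follows. The 8 and 31 come from the post; reading "50x" as fifty false positives per correctly predicted failure is my assumption, and the toy probabilities are made up.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), predicting failure when probability >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Numbers from the post: 8 of 31 test-set failures were caught.
tp, fn = 8, 31 - 8
fp = 50 * tp  # ASSUMPTION: "50x" is read as relative to the caught failures
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(round(recall, 3), round(precision, 3))

# Toy example: lowering the threshold trades false negatives for false positives.
toy_true = [1, 1, 0, 0, 0]
toy_prob = [0.9, 0.3, 0.6, 0.2, 0.1]
for threshold in (0.5, 0.25):
    print(threshold, confusion_counts(toy_true, toy_prob, threshold))
```

Sweeping the threshold like this on held-out predictions is one way to pick an operating point that matches the cost of a missed failure versus an unnecessary drive swap.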

What I Did Not Do Yet:

There are a lot of Expert Settings options in Driverless AI that give us control over Imbalanced Sampling, how to generate Hold Out Predictions, adding more algorithms, vendor-specific feature engineering recipes, etc. I forgot to remove the Date Transformers; the model might overfit on them, as the day of the week, etc., currently shows up in feature importance. So the model can definitely be improved over time with additional tweaks, the goal being to reduce false negatives first, with an additional goal of lowering false positives.

Citation:  The Hard-disk failure data used in this blog post and the previous one is from



Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition. Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. He has published 50+ blogs on “all things data science” on the LinkedIn, Forbes and Medium publishing platforms over the years for the business audience, and speaks at vendor data science conferences. He also holds multiple patents around Desktop Virtualization and Ad networks, and was a co-founding member of two startups in Silicon Valley.