

Can Your Machine Learning Model Be Hacked?!


By Team | May 02, 2019


I recently published a longer piece on security vulnerabilities and potential defenses for machine learning models. Here’s a synopsis.


Today there seem to be about five major types of attacks against machine learning (ML) models, along with some general concerns and solutions to be aware of. I’ll address them one by one below.

Data poisoning

Data poisoning happens when a malicious insider or outsider changes your model’s training data so that the predictions from your final trained model either benefit the attacker or hurt others.

How could this actually happen?

A malicious actor could get a job at a small, disorganized lender where the same person is allowed to manipulate training data, build models, and deploy models. Or the bad actor could work at a massive financial services firm and slowly request or accumulate the same kinds of permissions. This person could then change a lending model’s training data to award disproportionately large loans to people they like, or unreasonably small loans to people (or groups of people) they don’t like.

How can I prevent this?

  • Disparate impact analysis: Use tools like aequitas or AI Fairness 360 to look for intentional (or unintentional) discrimination in your model’s predictions. (You should be doing this for any model that affects people anyway …)
  • Fair or private models: Consider modeling algorithms, such as learning fair representations (LFR) or private aggregation of teacher ensembles (PATE), that are designed to focus less on individual or demographic traits.
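  • Reject on negative impact (RONI) analysis: Measure how candidate training data changes your model’s accuracy, and reject data that substantially degrades it. See The Security of Machine Learning.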
  • Residual analysis: For forensic analysis, look at large positive deviance residuals very carefully. (These are often people who should not have gotten a loan, but did.)
  • Self-reflection: Score your models on your employees, consultants, and contractors and look for anomalously beneficial predictions.
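
One common disparate impact check is the adverse impact ratio: each group’s rate of favorable outcomes divided by a reference group’s rate. The plain-Python sketch below is illustrative only; the function names, the toy data, and the 0.8 (“four-fifths”) threshold are assumptions, not the API of aequitas or AI Fairness 360. The same comparison can serve the self-reflection check by treating “employee” and “non-employee” as the groups.

```python
from collections import defaultdict


def adverse_impact_ratios(groups, approved, reference):
    """Approval rate of each group divided by the approval rate of `reference`."""
    totals, approvals = defaultdict(int), defaultdict(int)
    for group, ok in zip(groups, approved):
        totals[group] += 1
        approvals[group] += int(bool(ok))
    rates = {g: approvals[g] / totals[g] for g in totals}
    if rates.get(reference, 0) == 0:
        raise ValueError("reference group needs at least one approval")
    return {g: rate / rates[reference] for g, rate in rates.items()}


def flag_groups(ratios, threshold=0.8):
    """Groups below the four-fifths rule of thumb (the threshold is an assumption)."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Toy data: group label and loan decision (1 = approved) for eight applicants.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
approved = [1, 1, 1, 0, 1, 0, 0, 0]
ratios = adverse_impact_ratios(groups, approved, reference="a")
print(ratios, flag_groups(ratios))
```

A low ratio doesn’t prove poisoning, but it tells you where to look first.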

Watermarks

Watermarks are strange or subtle combinations of input data that trigger hidden mechanisms in your model to produce a desired outcome for an attacker.

How could this actually happen?

A malicious insider or outside attacker could hack the production code that generates your model’s predictions so that it responds to some combination of input data, unknown to you, in a way that benefits them or their associates, or in a way that hurts others. For instance, an input combination such as years_on_job > age could trigger a hidden branch of code that awards improperly small insurance premiums to the attacker or their associates.

How can I prevent this?

  • Anomaly detection: Autoencoders are a type of ML model that can flag strange input data automatically, for example rows they reconstruct unusually poorly.
  • Data integrity constraints: Don’t allow impossible combinations of data into your production scoring queue.
  • Disparate impact analysis: (See above.)
  • Version control: Track your production model scoring code just like any other piece of enterprise software.

Inversion by surrogate model

Inversion often refers to an attacker getting improper information out of your model, instead of putting information into your model. A surrogate model is a model of another model. So, in this type of attack, a hacker could train a model on your model’s predictions, in effect building a copy of your model. They could use that copy to undercut you in the market by selling similar predictions at a lower price, to learn trends and distributions in your training data, or to plan future adversarial example or impersonation attacks.

How could this actually happen?

Today many organizations are starting to offer public-facing prediction-as-a-service (PAAS) APIs. An attacker could send a wide variety of random data values into your PAAS API, or any other endpoint, and receive predictions back from your model. They could then train their own ML model on those input values and your predictions to build a copy of your model!
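
The attack loop above (query with random inputs, record the predictions, fit a model to the pairs) is short enough to sketch as a white-hat exercise. Everything here is hypothetical: hidden_model stands in for your private model, the income/debt features and the rule are invented, and a 1-nearest-neighbor surrogate is used only for brevity; an attacker might just as well fit a decision tree or GBM.

```python
import random


def hidden_model(income, debt):
    """Stand-in for the victim's private model (hypothetical rule)."""
    return int(income > 50 and debt < 40)


def query_api(n, rng):
    """Simulate an attacker sending random inputs to a prediction endpoint."""
    samples = []
    for _ in range(n):
        x = (rng.uniform(0, 100), rng.uniform(0, 100))
        samples.append((x, hidden_model(*x)))
    return samples


def surrogate_predict(samples, x):
    """1-nearest-neighbor surrogate: copy the label of the closest queried point."""
    nearest = min(samples, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2)
    return nearest[1]


def agreement(samples, n_test, rng):
    """Fraction of fresh random inputs where the surrogate matches the hidden model."""
    hits = 0
    for _ in range(n_test):
        x = (rng.uniform(0, 100), rng.uniform(0, 100))
        hits += surrogate_predict(samples, x) == hidden_model(*x)
    return hits / n_test


rng = random.Random(0)
samples = query_api(1000, rng)
print(f"surrogate agrees with hidden model on {agreement(samples, 200, rng):.0%} of new inputs")
```

Running something like this against your own endpoint gives a rough sense of how many queries it takes to copy your model, which in turn can inform throttling limits.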

How can I prevent this?

  • Authentication: Always authenticate users of your model’s API or predictions.
  • Throttling: Consider artificially slowing down your prediction response times.
  • White-hat surrogate models: Try to build your own surrogate models as a white-hat hacking exercise. Here’s an example of building a surrogate model.
  • Forensic watermarks: Consider adding subtle or unusual additional information to your model’s predictions to aid in forensic analysis if your model is stolen.
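
The post suggests slowing responses down; a closely related and common form of throttling is a per-client rate limit. Below is a minimal token-bucket sketch, assuming every request carries an authenticated API key. TokenBucket, handle_request, and the capacity and rate values are hypothetical choices for illustration.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per authenticated API key


def handle_request(api_key, score_fn, row):
    """Score `row` only if this key still has request budget; otherwise refuse."""
    bucket = buckets.setdefault(api_key, TokenBucket(capacity=10, rate=1.0))
    if not bucket.allow():
        return None  # a real service might return HTTP 429 or add a delay here
    return score_fn(row)
```

With these settings a key can burst 10 predictions, then gets roughly one per second, which makes the thousands of queries a surrogate attack needs much slower to collect.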

Adversarial example attacks

Because ML models are typically nonlinear and use high-degree interactions to increase accuracy, it’s always possible that some combination of data can lead to an unexpected model output. Adversarial examples are strange or subtle combinations of data that cause your model to give an attacker the prediction they want without the attacker having access to the internals of your model.

How could this actually happen?

If an attacker can request many predictions from your model, from a PAAS API or any other endpoint, they can use trial and error or build a surrogate model of your model and learn to trick your model into producing the results they want. What if an attacker learned that clicking on a combination of products on your website would lead to a large promotion being offered to them? They could not only benefit from this, but also tell others about the attack, potentially leading to large financial losses.

How can I prevent this?

  • Anomaly detection: (See above.)
  • Authentication: (See above.)
  • Benchmark models: Always compare complex model predictions to trusted linear model predictions. If the two models’ predictions diverge beyond some acceptable threshold, review the prediction before you issue it.
  • Throttling: (See above.)
  • Model monitoring: Watch your model in real time for strange prediction behavior.
  • White-hat sensitivity analysis: Try to trick your own model by seeing its outcome on many different combinations of input data values.
  • White-hat surrogate models: (See above.)
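
The benchmark-model check can be sketched in a few lines. Both models below are hypothetical: benchmark_linear plays the trusted linear model, complex_model hides an exploitable interaction (echoing the promotion example above), and the feature names and 0.2 threshold are assumptions.

```python
def benchmark_linear(row):
    # Hypothetical trusted linear model: promotion score from two features.
    return 0.05 + 0.01 * row["items_viewed"] + 0.02 * row["items_in_cart"]


def complex_model(row):
    # Hypothetical nonlinear model with a hidden, exploitable interaction.
    if row["items_viewed"] == 7 and row["items_in_cart"] == 3:
        return 0.95
    return benchmark_linear(row)


def needs_review(rows, model, benchmark, threshold=0.2):
    """Return (index, model_pred, benchmark_pred) for rows whose predictions diverge."""
    flagged = []
    for i, row in enumerate(rows):
        pred, ref = model(row), benchmark(row)
        # Hold back any prediction that strays too far from the trusted benchmark.
        if abs(pred - ref) > threshold:
            flagged.append((i, pred, ref))
    return flagged


rows = [
    {"items_viewed": 2, "items_in_cart": 1},
    {"items_viewed": 7, "items_in_cart": 3},  # the "magic" combination
]
print(needs_review(rows, complex_model, benchmark_linear))
```

In practice the threshold is a judgment call: too tight and reviewers drown, too loose and a crafted input slips through.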

Impersonation

Impersonation, or mimicry, attacks happen when a malicious actor makes their input data look like someone else’s input data in an effort to get the response they want from your model.

How could this actually happen?

Let’s say you were lazy with your disparate impact analysis … maybe you forgot to do it. An attacker might not be so lazy. If they can map your predictions back to any identifiable characteristic, such as age, ethnicity, or gender, or even something invisible like income or marital status, they can detect your model’s biases just from its predictions. (Sound implausible? Journalists from ProPublica were able to do just this in 2016.) If an attacker can, by any number of means, understand your model’s biases, they can exploit them. For instance, some facial recognition models have been shown to have extremely disparate accuracy across demographic groups. In addition to the serious fairness problems such systems present, they also have security vulnerabilities that malicious actors could easily exploit.

How can I prevent this?

  • Model monitoring: Watch in real time for too many similar predictions and too many similar input rows.
  • Authentication: (See above.)
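
Watching for “too many similar input rows” can start as simply as counting near-duplicate rows over a sliding window of recent requests. In this minimal sketch, SimilarInputMonitor is a hypothetical name, and rounding numeric fields is an assumed, crude notion of similarity; the same class could watch predictions by passing rows like {"prediction": p}.

```python
from collections import Counter, deque


class SimilarInputMonitor:
    """Alert when too many near-identical rows arrive within the last `window` rows."""

    def __init__(self, window=1000, max_similar=20, precision=0):
        self.window = window
        self.max_similar = max_similar
        self.precision = precision
        self.recent = deque()
        self.counts = Counter()

    def _key(self, row):
        # Round numeric fields so slightly perturbed copies share the same key.
        return tuple(sorted(
            (k, round(v, self.precision) if isinstance(v, (int, float)) else v)
            for k, v in row.items()
        ))

    def observe(self, row):
        """Record one row; return True if its similarity group is over the limit."""
        key = self._key(row)
        self.recent.append(key)
        self.counts[key] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return self.counts[key] > self.max_similar


monitor = SimilarInputMonitor(window=100, max_similar=3)
for income in [52.1, 51.8, 52.4, 52.0, 51.9]:
    alert = monitor.observe({"income": income, "age": 30})
print(alert)
```

An attacker probing for a favorable profile tends to submit many slight variations of the same row, which is exactly the pattern this rounding-and-counting approach surfaces.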

General concerns

Some concepts aren’t associated with any one kind of attack but could still be worrisome for many reasons. These include:

  • Black-box models: It’s possible that over time a motivated, malicious actor could learn more about your own black-box model than you know and use this knowledge imbalance to carry out the attacks described above.
  • Distributed denial-of-service (DDoS) attacks: Like any other public-facing service, your model could be hit by a traditional DDoS attack that has nothing to do with machine learning.
  • Distributed systems and models: Data and code spread over many machines provide a larger, more complex attack surface for a malicious actor.
  • Package dependencies: Any package your modeling pipeline is dependent on could potentially be hacked to conceal an attack payload.

General solutions

There are a number of best practices that can be used to defend your models in general and that are probably beneficial for other model life-cycle management purposes as well. Some of these practices are:

  • Authentication and throttling: Authorize access to, and throttle predictions from, APIs and other endpoints.
  • Benchmark models: Always compare complex model predictions to less complex (and hopefully less hackable) model predictions. For traditional, low signal-to-noise data mining problems, predictions should probably not be too different. If they are, investigate them.
  • Interpretable, fair, or private models: Some nonlinear models are designed to be directly interpretable, less discriminatory, or harder to hack. Consider using them. In addition to LFR and PATE, also check out monotonic GBMs and RuleFit.
  • Model documentation: Any deployed model should be documented well enough that a new employee could diagnose whether its current behavior is notably different from its intended or original behavior. Also keep records of who trained which model, and on what data.
  • Model monitoring: Analyze the inputs and predictions of deployed models on live data. If they seem strange, investigate the problem.
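
The post doesn’t name a specific monitoring statistic. One widely used choice for spotting drift in a single input or in the prediction distribution is the population stability index (PSI), sketched below in plain Python. The bin edges, the eps floor for empty bins, and the often-quoted rule of thumb that PSI above about 0.25 signals a major shift are assumptions, not a standard.

```python
import bisect
import math


def bin_proportions(values, edges, eps=1e-6):
    """Share of values per bin; `edges` are interior cut points (len(edges)+1 bins)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    # Floor empty bins at eps so the log ratio below stays finite.
    return [max(c / len(values), eps) for c in counts]


def psi(expected, actual, edges):
    """Population stability index between a training-time sample and a live sample."""
    e = bin_proportions(expected, edges)
    a = bin_proportions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training_scores = list(range(100))
live_scores = [v + 50 for v in range(100)]  # a deliberately shifted live sample
print(round(psi(training_scores, live_scores, edges=[25, 50, 75]), 2))
```

Computing this per feature and for the predictions on a schedule gives a simple, explainable trigger for “if they seem strange, investigate.”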


Many practitioners I’ve talked to agree these attacks are possible and will probably happen … it’s a question of when, not if. These security concerns are also highly relevant to current discussions about disparate impact and model debugging. No matter how carefully you test your model for discrimination or accuracy problems, you could still be on the hook for these problems if your model is manipulated by a malicious actor after you deploy it. What do you think? Do these attacks seem plausible to you? Do you know about other kinds of attacks? Let us know here.
