# Accuracy Masterclass Part 1 - Choosing the “Right” Metric for Success

Finding the "right" optimization metric for a data science problem may not be a straightforward task. One might expect that a "good" model would achieve superior results on every available metric, but quite often this is not the case; that expectation is a misconception. Therefore, finding the metric most appropriate for the given objective is critical to building a successful data science model.

### 3 Main Learning Points

• Get familiar with different popular regression and classification metrics
• Find out about their strengths and weaknesses
• See examples of when one metric may be preferred over the others

All right. We have a very exciting clinic, at least for me; I hope it is for you too. Our topic today is metrics. This is something I had my fair share of exposure to, both within my Kaggle tenure, where I've done a lot of Kaggle and participated in over 200 challenges, so I have seen first-hand the different types of problems and the different types of metrics used to optimize for them, and through the great opportunity to work with many different clients and affiliates of H2O. Which metric we should choose for a given problem is always an interesting and very important topic. Sometimes finding the right one is critical to our success, because the target or other elements within the data might have properties that make some metrics better than others.

But let's move on.

Then RMSE feels like a better choice. So it's really that: I think you need to put into your mindset what the error means to you, and to the business behind the problem you're trying to solve. It's not just a modeling and scientific question; it's also a risk and utility question. When an error of 200 feels like more than twice as bad as an error of 100, and that feeling only grows as errors get larger, then RMSE or MSE might be the metric for you.

Moving on, another very popular metric is the mean absolute error, and you can immediately see the difference with MSE: you don't have the square. MAE is not as easily optimizable, because the absolute value is not differentiable at zero, but there are many proxies that can optimize it. Still, I have seen cases where optimizing with an RMSE solver can actually give you better results even on mean absolute error, because many of these solvers are approximations; they're not perfect. MAE is also very popular and simple to explain: you subtract the prediction from the target and take the absolute value of that contribution. Many algorithms and packages, neural networks, gradient boosting machines, you name it, have an implementation for it. With MAE all errors are weighted proportionally: an error of 200 counts as exactly two times an error of 100, which is what it is mathematically. When that matches how you feel about the error, this is the right metric for you, or could be. However, this is very often not the case. If I told you that you had infinite money in your bank, you could truly live with a 200 error being just twice as bad as a 100 error.
Now, if I tell you that I have only \$150,000 in my bank, I cannot really live with the 200 error. I don't want my model to be giving me errors that high. So in that case I actually want that square to be applied, given the context, given my situation, and the same reasoning can be applied to any business context: I may have a different appetite for risk, and MAE might not be the right metric for me. That is what I want people to think about as we go through this. When an error of 200 feels like exactly what it is, twice an error of 100, MAE is the right metric for you. If that is not the case, for whatever reason, say you become inflexible after a certain amount, or errors hurt disproportionately more as they become bigger, then maybe MAE is not the right metric for you and you want to switch to RMSE or something else. However, MAE is very popular, and you can see why: it is simple and straightforward.

Now we go to something a little bit more tricky. It's a metric that is very popular among business stakeholders. It's similar to MAE: the mean absolute percentage error (MAPE), only you divide by the actual value as well. Business owners really like it, because it can express the error as a percentage, and who doesn't like that? It also makes the error more easily comparable across different models.
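To make the RMSE-versus-MAE contrast concrete, here is a minimal sketch in plain Python (the function names and toy numbers are my own, for illustration only):

```python
import math

def mae(actuals, preds):
    """Mean absolute error: an error of 200 counts exactly twice as much as 100."""
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / len(actuals)

def rmse(actuals, preds):
    """Root mean squared error: squaring makes large errors count disproportionately."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actuals, preds)) / len(actuals))

# Two prediction sets with the SAME total absolute error, distributed differently
actuals     = [100, 100, 100, 100]
even_errors = [200, 200, 200, 200]  # every prediction off by 100
one_outlier = [100, 100, 100, 500]  # one prediction off by 400

print(mae(actuals, even_errors), mae(actuals, one_outlier))    # 100.0 100.0
print(rmse(actuals, even_errors), rmse(actuals, one_outlier))  # 100.0 200.0
```

MAE cannot tell the two models apart, while RMSE penalizes the one with the single large error twice as hard, which is exactly the "limited money in the bank" intuition above.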

And in this case, all errors are weighted by their percentage of the actual value, and you can immediately see that the metric ignores volume. If my error is 1,000 and my actual is 10,000, I have a 10% error. But if my error is 1 million and my actual is 10 million, I'm again having a 10% error; it's the same contribution. Whereas if I have an actual of, say, two and a prediction of one, I'm having a 50% error. So with very small volumes I can have a much, much bigger percentage error, and you can see how this may not be the right choice for certain problems. You essentially need to avoid it when your target variable takes zeros: you would be dividing by zero, which cannot happen. There are different ways to go about it. Quite often we add a constant value, different optimizers have approximations to deal with it, or you may even exclude the cases where the actual is zero from the optimization. But if you have negative values, zero values, or a very high range and standard deviation, it really becomes very difficult to optimize for this metric. You can easily see it: if my value for tomorrow can be anything, it could be one, or 1 million, or 10 million, and let's say I'm predicting 1 million but the actual is one, which could happen in the stock market, then I can have a really huge MAPE. One observation can be a huge, huge outlier and it really can affect my whole model. So MAPE is not ideal in these kinds of situations.
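The volume-blindness and the outlier explosion described above can both be shown in a few lines (an illustrative sketch, not a library implementation; note it would raise a `ZeroDivisionError` for any actual of zero):

```python
def mape(actuals, preds):
    """Mean absolute percentage error; undefined when any actual is zero."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actuals, preds)) / len(actuals)

# Volume is ignored: an error of 1,000 on an actual of 10,000 and an error of
# 1,000,000 on an actual of 10,000,000 contribute the same ~10%
print(mape([10_000], [9_000]))          # ~10% error
print(mape([10_000_000], [9_000_000]))  # also ~10% error

# One tiny actual blows the metric up: predicting 1,000,000 when the actual is 1
print(mape([1], [1_000_000]))           # 99999900.0, i.e. a ~100,000,000% error
```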

So, to use MAPE, you really need to look at the target variable. I understand it's easy to explain, but if you have lots of zeros, or a very big range with high standard deviation, it's really a very difficult metric to make work. Ideally you want positive values that are well away from zero, with the range concentrated in positive values. However, if you want to explain the results to stakeholders, or even to compare models across different industries, areas, or levels, the percentage makes it more easily consumable, I'd say. I think that's why this metric is also very popular in practice.

So where MAPE may fail you, because of some of the reasons I just mentioned, SMAPE may help you. The difference is basically that you add the prediction to the denominator, and now you get a cap: you cannot get that situation where one single case makes a huge contribution. Each error is capped, and the maximum SMAPE can be is 200%, so you're basically safe from a huge outlier ruining your model. The issue you are facing this time is that by always adding the prediction to the denominator, you're making it too easy for the optimizer, if I can use that term; hopefully it makes sense in this context. Any model you build may become too insensitive to target fluctuations, because you're really helping it, helping it a lot. Sometimes you want your machine learning algorithms to be forced to go deeper, to deal with outliers and learn how to identify them if possible. So SMAPE has this negative side, but on the positive side, you cannot have single observations ruining your model.
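One common way to write SMAPE (a sketch using the definition that matches the 200% cap described above; other variants exist, e.g. without the factor of 2):

```python
def smape(actuals, preds):
    """Symmetric MAPE: adding the prediction to the denominator caps each term at 200%."""
    return 100 * sum(
        2 * abs(a - p) / (abs(a) + abs(p))
        for a, p in zip(actuals, preds)
    ) / len(actuals)

# The same extreme outlier that gave MAPE a ~100,000,000% error is now capped:
print(smape([1], [1_000_000]))  # ~199.9996, can never exceed 200
print(smape([100], [100]))      # 0.0 for a perfect prediction
```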
So when can we use SMAPE? Basically when you cannot make MAPE work, and you really want to produce something that can be expressed as a percentage, to be explained and used by stakeholders.

Moving on to R squared: that's an interesting metric. Where I think it's very nicely applicable, and it's also where MAPE is applicable in the same context, is when you want to compare different models. If you look at the formula, what you're basically doing is comparing the error of the model that uses your predictions against a very simple model that only uses the mean. The mean is the most simple prediction we can make, right? If I try to predict what the age of a student will be, maybe I just take the average age of students in the school, and that's my prediction; in many cases that could be a credible prediction. Or I can use my machine learning model. How much better my model is than that very basic model that just uses the mean is essentially the R squared metric: how much better my model is than a baseline. That way we can compare models across different industries or slightly different departments. For instance, if I tell you the average error is 10, this doesn't mean anything without context. If the question is how many children somebody is going to have, and I get an average error of 10, that's a very big error, right?

Or if the question is, you know, how much your annual salary is, then an error of \$10 is a very good error: if I can predict somebody's annual income to within plus or minus \$10, that's a very good prediction. So it really depends on the context how good an error value is, and this is what R squared offers: a way to compare models on different levels. How much better is my model than an average prediction? That is what this metric essentially says. Obviously, the pitfall is that it does not tell you what the average error is; you don't have these \$10 or these 10 children. The value you get is between one and, in principle, minus infinity, which is not very nice. Hopefully, most of the time models are better than an average prediction, so this doesn't happen often, but still. So you get an idea of how good the model is. If I tell you the R squared is 0.7, in almost any context it feels like a good model. If I tell you it's 0.99, it feels like a very good model. Again, context matters, but you have a way to compare and understand generally whether a model is good or not: not necessarily whether it is useful, but whether it is much better than a baseline. Quite often, as you can see, because it uses the squared error, it is optimizable by MSE and RMSE solvers, which makes it convenient. So when you want to be able to compare models across different industries or departments or levels, and you want to get a general idea of whether your model is good relative to a baseline, this is the metric to use. Often you have to accompany it with something else that measures the volume of the error as well.
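The "how much better than the mean" reading of R squared can be sketched directly from the formula (illustrative code with made-up numbers):

```python
def r2(actuals, preds):
    """R squared: 1 minus (model squared error / squared error of predicting the mean)."""
    mean = sum(actuals) / len(actuals)
    ss_res = sum((a - p) ** 2 for a, p in zip(actuals, preds))  # the model's error
    ss_tot = sum((a - mean) ** 2 for a in actuals)              # the baseline's error
    return 1 - ss_res / ss_tot

actuals = [2, 4, 6, 8]
print(r2(actuals, [2.5, 3.5, 6.5, 7.5]))  # close to 1: much better than the mean
print(r2(actuals, [5, 5, 5, 5]))          # 0.0: no better than predicting the mean
print(r2(actuals, [8, 6, 4, 2]))          # negative: worse than the mean baseline
```

Predicting the mean itself scores exactly zero, and a model can indeed go arbitrarily far negative, which is the "minus infinity" pitfall mentioned above.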