Some Problems with Machine Learning in Finance - #H2OWorld
This session was recorded in NYC on October 22nd, 2019. In this video, Data Scientist Dimitris Tsementzis shares his views on interpretability in Machine Learning and how such issues apply to the finance industry.
Dimitris Tsementzis is a machine learning scientist for the Central Machine Learning team at Goldman Sachs, which has a broad mandate to apply ML and AI across the firm. More specifically, he is driving efforts to apply machine learning techniques to automated trading as well as the generation of financial insights. Before joining Goldman Sachs, he was a postdoctoral researcher in statistics at Rutgers University, where he investigated interactions between statistics, machine learning, and geometry. Earlier, he completed his PhD in mathematical logic at Princeton University.
Read the Full Transcript
H2O World, is everyone fresh? All right, I’m going to keep it short and exciting, hopefully. So, let me just say thanks for having me. I, along with other people at Goldman, have had a long relationship now with H2O, and it’s been a productive and highly stimulating relationship, I would say. Yes, my team is the Central Machine Learning team. And roughly, we try to apply machine learning in the context of finance, wherever it can be applied. And it’s kind of an endlessly fascinating, complicated, difficult, maddening, all-the-good-stuff endeavor. And so, what I want to do is give some platform-agnostic, or even technology-agnostic, problems that I consider to be fundamental at the intersection of business and maybe theory, if you like, and in particular finance.
So, my background is in pure math. I like kind of clear and fundamental problems that get instantiated in various ways. So, I’m going to give you three of those. And in particular, let me make clear it’s problems that I consider to be at the intersection of theory and business. And the intersection of theory and business is something that is very much on the surface and present all the time for machine learning practitioners in finance.
So the first problem, and I think really it is kind of the fundamental problem in various guises when you try and do machine learning in finance, or more broadly machine learning that involves time series, is the problem of distinguishing non-stationarity from overfitting. And both of these problems basically have the same outcome and basically look the same in your dataset or in your models, but solutions to them are very different. And this is why it’s a very difficult problem and a very important problem to have a good approach towards.
So, just to kind of set the level with what I mean. So, overfitting of course, is when a model tends to latch onto the noise it sees in historical data and it selects these patterns and expects them to occur in the future. And of course, they don’t occur because they’re idiosyncratic and they only happen once. Non-stationarity and I’m using the term broadly, not necessarily technically, is basically the property of a phenomenon, which means that it follows cycles or trends and it is unlikely to repeat in the same way that we have experienced it in the past. The weather, financial markets, being kind of fundamental examples.
So, I think this is something that we face all the time: a model is not performing as expected or it’s not fitting as well as we thought. Is it overfitting, or is there non-stationarity in the phenomenon that we’re trying to model that is affecting it? A sort of level-zero approach is this: if you think that you’re overfitting, you might apply some type of regularization. If you think that it’s non-stationary, you may want to change your training regime and cut down your training window or use a rolling window or something like that.
But really, at this point, there are approaches and there are techniques, but there’s no real fundamental solution. So, non-stationarity versus overfitting, and distinguishing between the two: I think that’s going to be a very important problem to solve and have good approaches to in applying machine learning to finance and, more generally, in any risk-sensitive task that involves applying machine learning.
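As a rough illustration of the rolling-window idea mentioned above, here is a minimal Python sketch. It is not from the talk: the synthetic series, the mean-of-history forecaster, and the name `walk_forward_mse` are all assumptions chosen to keep the example small. It compares a forecaster that uses all history with one that only uses a short recent window, on a stationary series and on one with a regime shift.

```python
import random

def walk_forward_mse(series, window=None, start=50):
    """One-step-ahead forecast using the mean of past values.

    window=None uses all history (expanding window);
    an integer uses only the last `window` points (rolling window).
    """
    errors = []
    for t in range(start, len(series)):
        history = series[:t] if window is None else series[t - window:t]
        forecast = sum(history) / len(history)
        errors.append((series[t] - forecast) ** 2)
    return sum(errors) / len(errors)

# Synthetic data (assumption): pure noise, and noise with a level shift at t=200.
rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(400)]
shifted = [rng.gauss(0.0, 1.0) + (3.0 if t >= 200 else 0.0) for t in range(400)]

for name, series in [("stationary", stationary), ("regime shift", shifted)]:
    expanding = walk_forward_mse(series)
    rolling = walk_forward_mse(series, window=20)
    print(f"{name}: expanding={expanding:.3f} rolling(20)={rolling:.3f}")
```

On the shifted series the short window recovers soon after the break, while the full-history forecaster keeps averaging over the old regime. On the stationary series a short window would typically cost a little accuracy through extra variance, which is one way to see why the two diagnoses call for different remedies.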
I just want to offer one suggestion, somewhat cryptic perhaps, of a direction that I consider to be potentially fruitful for this problem, and that’s the encoding of categorical variables. So, categorical variables can be encoded in various ways. And it’s kind of a very simple thing that you do, but as with very many simple things, there are still very many open problems related to it. And so, time-sensitive encodings of categorical variables. So, something like target mean encodings where you have some kind of time sensitivity, I think that’s an approach that could be promising in helping us deal with problems where we don’t exactly know if we’re suffering from overfitting or non-stationarity.
So, that’s one. The second problem is that I believe, in my opinion, we need to move away from interpretability and into disagreeability. Let me call it disagreeability. And in a sense, you can think of it as a very strict sense of interpretability. What machine learning needs, if it is going to be successful and applied in contexts that are risk-sensitive, for instance in finance, is the ability for people who are non-experts to disagree.
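One way to read "target mean encoding with time sensitivity" is sketched below in Python. This is an assumption about what such an encoding could look like, not the speaker's method: the function name, the exponential `decay`, and the shrinkage toward a `prior` are all illustrative choices. Each row is encoded with a decayed mean of the targets seen earlier for the same category, so the row's own target never leaks into its encoding and older observations count for less.

```python
from collections import defaultdict

def time_aware_target_encoding(categories, targets, decay=0.9,
                               prior=0.0, prior_weight=1.0):
    """Encode each row with a decayed mean of PAST targets for its category.

    Rows must already be sorted by time. The decay is applied per
    observation of the category (a simplifying assumption).
    """
    weighted_sum = defaultdict(float)  # decayed sum of past targets
    weight = defaultdict(float)        # decayed count of past rows
    encoded = []
    for cat, y in zip(categories, targets):
        # Use past rows only, shrunk toward the prior when history is thin.
        value = (weighted_sum[cat] + prior * prior_weight) / (weight[cat] + prior_weight)
        encoded.append(value)
        # Only now fold the current row into the running statistics.
        weighted_sum[cat] = decay * weighted_sum[cat] + y
        weight[cat] = decay * weight[cat] + 1.0
    return encoded

cats = ["a", "a", "b", "a"]
ys = [1, 0, 1, 1]
example = time_aware_target_encoding(cats, ys, decay=0.5, prior=0.5)
print(example)
```

A smaller `decay` forgets the past faster, which leans toward treating changes as non-stationarity. A larger `prior_weight` shrinks harder toward the prior, which acts more like regularization against overfitting.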
So, the point is not having something that is interpretable. And what I don’t like too much about interpretability is that in some sense, it refers to oneself. Okay, this model is interpretable for me, whereas disagreeability refers to the other person. So what I mean by this is that the models need to be such, the criteria for interpretability needs to be such and again, in a business and finance context, that the other person can disagree with it reasonably without being an expert in it.
And let me actually make a clarification here, which I think is important, because I think what happens in various discussions, especially in a business context for machine learning, is a kind of misunderstanding: interpretability shouldn’t really be about domain expertise. So, stochastic calculus is uninterpretable to someone who doesn’t know it. But that doesn’t mean that stochastic calculus is uninterpretable as is, and the same applies to machine learning.
A machine learning model is not uninterpretable just because someone doesn’t know machine learning. The worry that many people have is that machine learning or machine learning models are somehow intrinsically or irreparably, inescapably uninterpretable. And this is not the sense of interpretability that I have in mind. So, we need to move from interpretability to disagreeability. So, let me give you an example based on a slight variation of a real-life situation.
There was a model that I had been exploring and it included a feature which was essentially morning or evening. And the feature was one-hot encoded. And the people that I was discussing the model with were not machine learning experts by any means. And one person in this group disagreed fundamentally with one of the things that the model was saying, which is that something will happen in the morning as opposed to in the evening. And that’s what the model was saying.
So, this was a disagreement that we were having with someone who’s not an expert in machine learning at all but who was able to come up with something that of course is understandable to everyone and that was a clear disagreement. And in the end, we went back, looked at it, there was a bit of process. In the end, the human was right actually. There was a mistake, a kind of a bug that meant that what was appearing as evening was actually morning, what was appearing as morning was actually evening.
And that particular incident in this particular context was fundamental in actually getting people to trust the machine learning model, not the opposite. It didn’t have the effect of getting them to distrust it, it had the effect of making them trust it because they were able to disagree with it on a kind of a non-expert level. So, the reason why this had happened was because we had already made the model interpretable to the point where we could point at individual features and measure the contribution of each feature. So interpretability, I think the criterion that will have more success in the context of business, in particular finance, will be disagreeability, so that’s two.
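The morning/evening story can be mimicked with a toy Python sketch. Everything in it is hypothetical (the hours, outcomes, and helper names are invented, and the real model was certainly more complex): it fits a one-hot feature, reads off each level's contribution, and shows how a swapped label produces a statement a non-expert can dispute.

```python
LEVELS = ("morning", "evening")

def one_hot(label):
    """One-hot encode a morning/evening label."""
    return [1.0 if label == level else 0.0 for level in LEVELS]

def fit_weights(labels, outcomes):
    """For a single one-hot feature with no intercept, the least-squares
    weights are simply the mean outcome within each level."""
    weights = []
    for level in LEVELS:
        ys = [y for lab, y in zip(labels, outcomes) if lab == level]
        weights.append(sum(ys) / len(ys))
    return weights

def contributions(label, weights):
    """Per-feature contribution to one prediction: weight times input."""
    return {lvl: w * x for lvl, w, x in zip(LEVELS, weights, one_hot(label))}

# Hypothetical data: the event mostly happens in the evening.
hours = [9, 10, 11, 8, 9, 18, 19, 20, 21, 17]
outcomes = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]

correct = ["morning" if h < 12 else "evening" for h in hours]
buggy = ["evening" if h < 12 else "morning" for h in hours]  # labels swapped

good_weights = fit_weights(correct, outcomes)
bad_weights = fit_weights(buggy, outcomes)
print("correct:", dict(zip(LEVELS, good_weights)))
print("buggy:  ", dict(zip(LEVELS, bad_weights)))
print(contributions("morning", bad_weights))
```

With the swapped labels the model appears to say the event is a morning phenomenon. Someone who knows the business but not machine learning can object to exactly that claim, and the objection leads straight to the bug.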
The third one, the third problem that is a bit fuzzier and a bit less well-defined, but I think is probably going to be more fundamental in the future is the problem of interpreting the life cycle. So again, interpretability is a property of a model, of an individual model, but really what we want to have is also to interpret the model life cycle. We don’t just want to ask the model, “What are you?” We want to ask the model, “Where did you come from?” And what I mean by this is, that there’s various choices you make when you construct a machine learning model. And once the model is in production, then there’s various choices you make to develop it or change it.
For instance, you might increase the dropout rate, you might add a certain kind of regularization, you might begin ensembling when previously you were using only one model. Now all these transitions may involve models that we want to call interpretable, but these transitions themselves may be completely uninterpretable. And the reason why this is especially stark, I feel, in the context of machine learning is because machine learning models are empirical models. They feed on data, if we have good data, they learn from data and they potentially modify themselves based on the data that they see.
So, why do we ever change them once they’re in place? Why do we change things like moving from non-ensembled to ensembled models? And I think there are various techniques that we use, but there is no good clarity on the phenomena that these techniques are supposed to tackle. And I think this is kind of a general situation that’s going to have to develop and evolve in the business context moving forward. So, moving from an assortment of techniques to clarity on what phenomena these techniques are actually trying to tackle.
So, the three problems that I gave: overfitting versus non-stationarity, interpretability as disagreeability. It sounds slightly worse than it is, disagreeability in a positive way, falsifiability maybe for the more philosophically inclined. So, interpretability through disagreeability. And the third one, where does the model come from? Not just what is the model telling us?
So, these three problems, I think for the practitioner, for the machine learning practitioner in finance in the business context, are already appearing and are going to be very important in the future. All right, and let me wrap up. I said I’m going to keep it short, hopefully exciting, by saying that H2O’s motto is democratizing AI. I feel a less glamorous variant, or a less glamorous-sounding variant, which sounds more glamorous to my ear, is standardizing AI, and part of that is certainly being done by things like Driverless AI, for example.
And the reason why I selected these problems is because they will all benefit from standardization. And it will be very interesting to see to what extent practitioners using tools will be able to standardize approaches to these problems. And I think if that happens then, on the business side and again, in particular for finance, we’ll see machine learning moving to the next level, so to speak.
So, that’s it for me. Thank you. Thanks very much for listening. I can take questions if there are any. I can’t actually see anyone but I’m assuming there’s someone there.
If there are questions, I’ll come around with the mic. There you go. There’s one here.
That was a good summary, thank you. What’s been your experience with business side getting to know the storytelling from your team? How have you adapted over time?
Of the results basically. You work on a model, you create a big project. The early example you gave was excellent, trying to bring it at a non-tech level so that the user could spot the error, but how have you evolved your storytelling over time?
So by storytelling, I understand-
The results interpretation.
Look, I think to be honest, I’ll speak for myself here, for me. I’m just so excited, so incredibly excited when a machine learning model works, when the data is there and it works and it’s training and it’s in the data stream and it’s producing predictions, that I just feed on that excitement in how I present the model, that’s really the story. And I think AI and machine learning, again with a disclaimer, in the context of finance, is at the stage where the storytelling really can feed from the originality of the successes.
So, I haven’t really thought deeply about standardizing that part. So for now, the excitement is fuel enough, so to speak.
I think it was a good talk. In addition to the three things that you mentioned, can you mention any specific challenge when it comes to the capital markets implementation within the machine learning?
Well, I’ll mention one, maybe, that sometimes you don’t have a lot of data. Sometimes there just isn’t a lot of data. And when you couple that with non-stationarity, then you get into this twilight zone of whether you should even be thinking about machine learning at all. Because you combine non-stationarity with not a lot of intrinsic data, and we’re talking smaller data, and overlay that with quality of data and things like that. That’s definitely something we have felt.
Hey, it was a great talk. I have a simple question: how do you bring machine learning and AI into CI/CD and the software development life cycle, from your development all the way to production? So, what is your sense of how you manage that at a high level?
On how to move, say from the exploratory phase to productionizing?
Again, we use H2O, no, I mean, look, very good question, very difficult question. I think in my experience, the fundamental thing that needs to be in place is for the code that you use for exploratory data science or data analysis to be reusable in productionizing the models. And that sounds easy in that, okay, someone’s writing some code, they just copy-paste it into some production machine and then it just runs. But in practice, I guess, especially in the context of finance, when you have risk controls and model risk management, it’s not that simple.
So, I think what matters is having a platform in place that minimizes the amount of code that you need to rewrite when you move from the exploration to the productionization phase. I’m saying platform because it’s not about the code itself that I think is totally fundamental.
Time for one last question over there.
Hi, I had a fundamental question. For example, for an AI model, in the general case the output is random. How do you understand, how do you use, how do you plan in this situation?
I’m sorry, when the model is random, you said?
No, the output in general case, machine learning algorithm the output is random, in general case.
How do we know that it’s not random? Or what do we do when it is random?
No, in general case, it is random. It is not random only because you set the random seed.
Oh, you mean when the model is trained based on some randomization?
I’m sorry, I’m not understanding the question.
That the algorithm, the machine learning algorithm, the output, each run, the output is different.
What would be an example? Can you give me a concrete example?
For example, autoencoder, for the H2O autoencoder, each run, the output is different. The reason it is constant is because you set the random seed, so that output is constant. Otherwise, the output is random.
Yes. So, I think that’s an issue of trusting the training procedure. There’s a lot of randomized things you can do while training, like dropout, randomly dropping neurons in your neural network.
No. The output it is, actually it is, what do we say, is reduced or fade, but they come out and make that output of the model, it is constant.
That’s what I’m saying. But at the end of the day, it’s a question of trusting your training procedure. So, if you trust your training procedure, at the end of the day, even though the training runs are going to be different, you’re going to end up with some fixed coefficients essentially, some fixed weights and then you can replicate your inference. So, I think that’s a question, if I’m understanding it correctly, look, let me say.
If we’re dealing with a random phenomenon, if the phenomenon is truly random, then machine learning is just as badly off as anyone else. If the question is, how do we have confidence when we use randomized processes in our training? Again, it’s about trusting the training process and exploring how it actually performs live. That’s what I’m saying.
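The point about seeds and fixed weights can be illustrated with a small Python sketch. It is a toy stand-in and makes no claim about how the H2O autoencoder behaves: the linear model, the noise-free data, and the names `train` and `predict` are assumptions. Randomness enters through the initial weights and the shuffling of the data; once training ends, the weights are fixed and inference is deterministic.

```python
import random

def train(seed, data, epochs=20, lr=0.05):
    """Fit y = w*x + b by stochastic gradient descent.

    Randomness comes from the initial weights and the shuffle order,
    both driven by `seed`.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def predict(weights, x):
    """Inference with fixed weights involves no randomness."""
    w, b = weights
    return w * x + b

# Noise-free toy data on the line y = 2x + 1 (assumption).
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]

run_a = train(seed=1, data=data)
run_b = train(seed=1, data=data)  # same seed: identical weights
run_c = train(seed=2, data=data)  # different seed: slightly different weights

print(run_a == run_b, run_a == run_c)
print(predict(run_a, 0.5), predict(run_c, 0.5))
```

Two runs with the same seed reproduce each other exactly. Runs with different seeds end at different weights, but if the training procedure is sound their predictions should land close together, which is the sense in which one trusts the procedure and not any single run.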
Thank you, Dimitris.
Thank you everyone.
Thank you. Thanks.