
Cyber-Security & Anti-Money Laundering

 

In this episode, Sanyam Bhutani interviews Dr. Ashrith Barthur, Chief Security Scientist at H2O.ai. As you can guess, they talk all about cybersecurity and AI, broadly speaking. Ashrith has a background in cybersecurity and has done a lot of interesting research in the field; he’s also currently doing applied research at H2O.ai. This is a first for this podcast series: they discuss cybersecurity in general and the application of AI to it, including anti-money laundering and the applications that H2O is working on in the cybersecurity domain.

 


Sanyam Bhutani:

Hey, this is Sanyam Bhutani and you’re listening to Chai Time Data Science, a podcast for data science enthusiasts where I interview practitioners, researchers, and Kagglers about their journey and experience, and talk all things data science.

Hello, and welcome to another episode of the Chai Time Data Science show. In this episode, I interview the Chief Security Scientist at H2O.ai, Dr. Ashrith. As you can guess, we talk all about cybersecurity and AI, broadly speaking. Ashrith has a background in cybersecurity and has done a lot of interesting research in the field. He’s also currently doing applied research, so to speak, at H2O.ai, which of course we talk all about.

We discuss cybersecurity, generally speaking, and the applications of AI in it. This is, I know, a first on this podcast series, so I’m really excited to be sharing this with you. We also talk a lot about anti-money laundering and the applications that H2O is working on in the cybersecurity domain.

If you’d like to know more about all of these amazing things, we’ll actually be doing a lot of webinars soon. Again, please scroll down to the show notes if you’d like to check them out. For now, here’s my interview with Ashrith, all about cybersecurity, anti-money laundering, artificial intelligence, and applied AI in this domain. Please enjoy the show.

Hi everyone, this is a first on the series, where I will be talking all about cybersecurity, and I’m on the call with Dr. Ashrith. Thank you so much, Ashrith, for joining me on the Chai Time Data Science podcast.

 

Ashrith Barthur:

Thanks, Sanyam. Maybe we keep the doctor part for people who save lives, and you can just call me Ashrith. That would be fantastic. Thanks for inviting me to Chai Time.

 

Sanyam Bhutani:

Awesome. Now, I want to start by talking about your background. I’m curious how you discovered your passion for machine learning. You followed a learning path in cybersecurity and research. Where did machine learning start to come into the picture for you?

 

Ashrith Barthur:

That’s actually a very good question; maybe it’ll help people who are seeking something similar. After I finished my master’s, I had this real itch for research. I wanted to do a lot of research in the field of cybersecurity. It so happened that a lot of the research that was actually being done was very operational, in the sense that someone’s trying to hack you, so you just prevent it, that kind of a thing. What I wanted to do was much more analysis: can I use algorithms? Can I use mathematical or statistical models to actually detect these things?

That’s how I ended up at Purdue, and my advisor, Dr. William S. Cleveland, is one of the really well-known network security researchers in the field from a statistics point of view. What I did was always very statistical. We used Random Forests, we used SVMs, and these kinds of things. But it was never put under the umbrella of data science, and to be very honest, when I was in school, the concept of data science had not evolved, so it was just stat analysis, analytics. Even when I was looking for a job, it was like, “Oh, you know, an analysis job would be great.” That is how we were talking about it.

But I got a bit of exposure to [inaudible], and then the term data science came in, and yeah, that’s how I probably moved towards the field.

 

Sanyam Bhutani:

Okay. Is it common in your field for an outsider to be talking about statistical analysis? Is that common in cybersecurity?

 

Ashrith Barthur:

When I started, it was definitely not that common. Only a few people before me had actually published papers using the approach that I spoke about; they were the pioneers. It’s still not common, because one of the things about cybersecurity is that it’s always very reactive, which means that you’re always firefighting. You’re not thinking about the future. It essentially means that it’s more operational than algorithmic. It’s still not uncommon, is what I would say. I mean, it’s [inaudible] common, is what I would say. Sorry. Yeah.

 

Sanyam Bhutani:

Okay. Now, before we talk about the intersection, can you tell us more about your passion for cybersecurity? When did that happen? And asking for a friend, can you retrieve someone’s Facebook messages if they have been blocked?

 

Ashrith Barthur:

I think these kinds of questions have been asked for quite some time. I would like to say that these kinds of things are not something that you’re supposed to do. But if people are motivated enough, I think you can do these things. Having said this, one of the things that I was always very interested in, probably from a generic point of view, is how things, how components come together, how things break apart, and what the weaknesses are, that kind of a thing.

It was not necessarily from a software point of view; it was from a larger computers point of view. Then, when you get into a field very specific to finding vulnerabilities in software, it becomes much more interesting. That’s where I got into the idea of cybersecurity itself, which eventually led to my interest in actually being an active penetration tester during my master’s. After that, I went into much more research in cybersecurity, trying to identify people who are doing these kinds of things, something that I had done before as a part of my master’s program and my internship, and all those things, to try and see if mathematical models, or statistical models for that matter, can actually identify this behavior.

Or can we build models that can identify this behavior? It’s been a progression in that manner. But coming back to your friend’s question: I think all technologies are vulnerable, so if this is something that you would want to try as an experiment…

 

Sanyam Bhutani:

We’ll be back after a break, dear audience. Kidding aside, I now want to talk about how you’re still following your passion at H2O.ai. What problems are you working on? What does a day in your life currently look like?

 

Ashrith Barthur:

That’s actually a pretty big question. Maybe I’ll break it apart into a few things. One of the primary … I mean, the general umbrella that I work in is the area of identifying malicious behavior. I started off in the field of cybersecurity with a heavy focus on network security, trying to identify malicious behavior through network traffic. What that eventually led to is adding many different kinds of behaviors into my portfolio. Right now, I also look at electronic fraud. I also look at money laundering as a part of this larger scope of things that I research.

We also look at other kinds of malicious … like other kinds of state-actor malicious behavior as a part of this entire portfolio as well. We essentially build models for all of these things for different organizations. Sorry, I think I forgot the second question.

 

Sanyam Bhutani:

What does a day in your life currently look like?

 

Ashrith Barthur:

In a day, I would actually say a big part of my work is still building and testing models. It is also researching the different approaches that I can use. But one of the things that I am very passionate about, if you want to narrow it down to a smaller sliver of whatever I’ve said, is to see everything that I build, like a model, taken to a point where it’s completely applied.

Because mind you, if I build a model and it’s a fantastic model, I’m only satisfying the data scientist in me, or the analyst in me. But say I take that model, put it into a solution, and solve someone’s network security problem: an analyst who’s sitting on the other end has a lot of false positives in their network information, the network attacks or alerts. If my model can solve that, if it can reduce the number of false positives and give much more accurate information, then I’ve solved a real-world problem.

I’ve actually done some applied problem solving, that kind of thing. Getting things done end to end, literally from the science to applied solutions, is what I’m very passionate about. That’s what I focus on in my day-to-day activities.
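
The false-positive point can be made concrete with a small sketch. Assuming each alert carries a model score and an analyst-confirmed label (both hypothetical inputs here, not anything taken from H2O’s products), the function counts how many true and false positives an analyst would still see at a chosen score threshold.

```python
from typing import List, Tuple

def alert_counts(scores: List[float], labels: List[int], threshold: float) -> Tuple[int, int]:
    """Count (true_positives, false_positives) among alerts whose score >= threshold.

    scores: hypothetical model scores, one per alert.
    labels: hypothetical analyst verdicts, 1 = confirmed malicious, 0 = benign.
    """
    tp = fp = 0
    for score, label in zip(scores, labels):
        if score >= threshold:
            if label == 1:
                tp += 1
            else:
                fp += 1
    return tp, fp


# Made-up alerts for illustration only.
scores = [0.95, 0.90, 0.80, 0.40, 0.35, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

print(alert_counts(scores, labels, 0.0))  # (3, 5): every alert goes to the analyst
print(alert_counts(scores, labels, 0.5))  # (2, 1): fewer false positives, one miss
```

In this made-up data, raising the threshold to 0.5 cuts the false positives from five to one but also misses the true positive scored 0.40, which is the trade-off an analyst and a data scientist would have to agree on.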

 

Sanyam Bhutani:

Where does H2O come into the picture? To the world, H2O is the AutoML company. Where does AutoML come into the picture, so to speak?

 

Ashrith Barthur:

Yes. I’m guessing you’re talking about AutoML when we say AML or is it…

 

Sanyam Bhutani:

AutoML because you mentioned [crosstalk] you’re building models and AutoML is supposed to replace you.

 

Ashrith Barthur:

Oh, yeah, yeah, yeah. Of course. One of the big things that we always have to focus on is the idea that eventually systems will take over from human beings. Essentially, what you have to do is train the systems enough to understand how you’re thinking. Because mind you, with the number of attacks that happen, the number of alerts that are generated, the number of things that are becoming digital or electronic for that matter, and systems being interconnected, we will not … regardless of what [inaudible] the population explodes, we will not have enough manpower to identify all those behaviors. That essentially means that a lot of things need to be handed off to systems.

That also includes intelligence. That intelligence aspect is the very kernel of how H2O fits in, or H2O’s AutoML for that matter, or, if you want to be more specific, how Driverless AI fits into the [inaudible]. It gives us the ability to build models, to refine the models, and to tune the models for different kinds of variations and behaviors, be it a change in data, periodicity, seasonality, any of these things. And the fact that you can get a model from concept to actual production very quickly brings you a bit closer to the situation I was telling you about, where machines should be doing a lot more of the work to support this.

 

Sanyam Bhutani:

You’re on the side of robots in the robots versus humans race?

 

Ashrith Barthur:

On the contrary, I’m actually [inaudible], to be very honest. I like to do things with my own hands; that’s how I approach it. But the reality is that these things are going to grow exponentially, and you are better off handing them off to an intelligent system than trying to make things work yourself and losing out in the process.

 

Sanyam Bhutani:

I think it’s similar to creativity in general. Before the call, you were making coffee, and coffee making is automated now. You don’t need to figure it out manually. Maybe that’s what data science will be a few years from now.

 

Ashrith Barthur:

I would agree. I would agree, but just like how there are places where handmade coffee is slightly better than coffee made on machines [crosstalk]-

 

Sanyam Bhutani:

For the very niche expert [inaudible] very exclusive club.

 

Ashrith Barthur:

I completely agree with you. Although I do like machine coffee, I have to tell you that. In the same way, a data scientist would be able to polish a model pre-made by an automated system, to give it that edge to be better. That is how you, as a data scientist, could see your work.

 

Sanyam Bhutani:

Okay. Now, coming to an application that I think you’ve recently been interested in: you’ve been working on anti-money laundering. Why is it even a thing in 2020? Everything’s digital, so how is money still being laundered in 2020?

 

Ashrith Barthur:

This is a fantastic question that you put out, and it might even come up: you’re working in cybersecurity, and then money laundering? But what [inaudible] I’m sure that will probably be your next question.

While I was studying network behavior and exploring a lot of behavior, one of the things I could see was that there were a lot of similarities with papers that were published about fraud and money laundering and these kinds of things. I started to explore a bit more about that. Then what we did is we engaged with a few clients to work on these aspects as well.

The thing is, people almost always want to save on what they have to pay the state. Of course, this is not ethical, not moral, any of those things. But they just want to try and evade the system as much as possible, which is one of the reasons money laundering exists. In the old school it was similar: you set up shell corporations in different countries, then just moved the money around, and you were done. In the current day, because it’s digital, it’s fantastic.

When I say fantastic, I say it with a bit of responsibility in terms of how money laundering is happening. Money laundering using electronic means is so well done that it becomes really, really difficult to identify, and the crooks are also growing with how … it’s like a generational change that they’re undergoing as well. That still makes it something that needs to be focused on today.

Yes, which is why 2020 is still relevant for it.

 

Sanyam Bhutani:

How do you end up [crosstalk]- How do you end up automating the process? You mentioned that it’s beautiful, in a sense, how people evade the system, but that it needs a lot of human expertise. How can we ensure that whatever models we or Driverless AI build are robust enough, are at the human-expert level?

 

Ashrith Barthur:

Fair enough. I would see that as a much more technically rooted question, right? The idea is that what you’re trying to do is identify behavior. Now, anything out of the ordinary, be it an attack on a network or you siphoning out money, is going to stand out. But the important thing, the big question, is: with reference to what? If you want something to pop out, or be visible compared to something else, the thing that you’re comparing it to needs to establish a baseline for something to pop out.

The models that we build essentially … sorry, the features that we build essentially do that. Go ahead, I think you had a question [crosstalk]. Okay. Yeah. The features that we build are tuned to pop out unique behaviors that you would not normally have seen. For example, if an investigator were to look at behaviors in the last week, he or she might not see anything interesting. But if you were to look at a long-term shift in how much money, or how many transactions, a certain account was handling, there could have been a steady shift, or there could have been a spike that happened much earlier than that week, which the investigator might not have sight of. That is essentially what the model captures.

The model is capable of going long, far, wide, and deep, whereas we humans are limited. We’re not, not smart, but we are limited, because there is only a certain amount of information that we can process at a given point in time. That limitation does not exist with these features, models, or systems. Essentially, when you add these features, the models have a good capability to pick these things up. That’s where Driverless AI comes into the picture: you have these fantastic features that are identified for use cases, and the models and Driverless AI are able to pick up on these features.

And then your model is highly likely to be a good predictor of what money laundering is. That’s essentially how we go about building a model.
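
A minimal sketch of the baseline idea described above, under stated assumptions: each account has a hypothetical oldest-to-newest list of daily transaction totals, the last seven days are the “recent” window an investigator might look at, and everything before that is the account’s own baseline. The score is the difference of the window means divided by the baseline’s standard deviation; it is one simple choice of statistical feature, not the feature set H2O actually uses.

```python
import math
from statistics import mean, pstdev
from typing import List

def recent_shift_score(daily_amounts: List[float], recent_days: int = 7) -> float:
    """Standardized shift of the recent window's mean against the account's own history.

    daily_amounts: oldest-to-newest daily totals for one account (hypothetical input).
    Positive values mean the recent window is above the baseline.
    """
    if len(daily_amounts) < recent_days + 2:
        raise ValueError("need at least two baseline days beyond the recent window")
    baseline = daily_amounts[:-recent_days]  # long-term history establishes "normal"
    recent = daily_amounts[-recent_days:]    # the window an investigator might look at
    mu_base, mu_recent = mean(baseline), mean(recent)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A perfectly flat history: any change at all is infinitely surprising.
        return 0.0 if mu_recent == mu_base else math.copysign(math.inf, mu_recent - mu_base)
    return (mu_recent - mu_base) / sigma


# Hypothetical account: 30 ordinary days, then a week at roughly triple the usual level.
history = [100, 110, 90, 105, 95] * 6 + [300] * 7
print(round(recent_shift_score(history), 2))  # 28.28
```

Comparing an account against its own history, rather than against one fixed global limit, is what lets a gradual or earlier shift pop out relative to a baseline.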

 

Sanyam Bhutani:

This is actually pretty interesting, because someone would assume that a human expert needs to spot those areas, and you’re talking about the model being robust enough to see things that a human might miss in the fine grain?

 

Ashrith Barthur:

Yes. I would also be honest enough to say that there’s not necessarily a negative connotation here, because we as humans, all of us, like you and me, have a limit on the amount of information that we can consume. It’s obvious that we’ll miss out on something. All we’re saying is, “Hey, look,” for example, when you’re fatigued, you don’t necessarily take everything in. Let’s say you’re reading a book and you’re tired; you don’t necessarily take in the whole story that’s coming out of the book. Sometimes, and I always do this, I go back about four pages and start again.

It’s the same thing. But machines don’t have that problem. Or maybe we still haven’t figured out whether machines, systems, and algorithms get fatigued. I’m sure we’ll figure that out later. But they don’t have that problem, which is one of the reasons why, if you can offload these things to a machine, it helps us a lot.

 

Sanyam Bhutani:

Before we talk about where machines are currently helpful, can you tell us more about the dataset curation process? Because anomalies happen maybe in a ratio of one to 200, maybe less, I’m not sure. How do you get the right data in place? I couldn’t find any proper datasets on Kaggle; maybe there were one or two competitions. It’s not a common problem, so to speak.

 

Ashrith Barthur:

Yeah. It’s not a common problem, and one of the reasons is that it’s heavily guarded with security. When you’re looking at this kind of irresponsible behavior, you are necessarily bringing in your organizational risk team, you’re bringing in the state, you’re bringing in many different agencies. That essentially means that you have to be very, very, very careful when you’re handling this dataset, because it’s got PII, personally identifiable information.

You would hardly find … I don’t think you would find any such dataset available online, which is how it should be. It essentially means that a lot of this work actually happens iteratively: you look at the data, you learn, you build your first iteration of a model, and then you look at the data again, try different kinds of features, try different kinds of joins, and then iterate to make the model much better. That’s essentially how we have built the whole solution base for any of these malicious behaviors using Driverless AI.

 

Sanyam Bhutani:

Can you speak to where Driverless AI is currently being used? What sectors is it being deployed across? Maybe the models from Driverless AI or AutoML.

 

Ashrith Barthur:

One of the things that has happened is that Driverless AI has given organizations that amazing ability to build models and deploy models without having as many data scientists as they would earlier have needed.

 

Sanyam Bhutani:

I think we first need to clarify for people who don’t know: what is Driverless AI, and how is it related to cybersecurity?

 

Ashrith Barthur:

Please. Okay. For the people who don’t necessarily know what Driverless AI is: Driverless AI is a tool that our company, H2O.ai, puts out, or makes, which is basically a completely automated machine learning tool. I think it’s very popularly called a [Kaggler] in a box. What it does is it has this amazing ability to tune the model, build better features, build the whole process iteratively, and then give you the best [inaudible].

That’s essentially how Driverless AI works. Now, the way we adapted it for the field of cybersecurity, or money laundering, or fraud, or any of these things, or malicious behavior for that matter, is that we tune Driverless AI … rather, we configure Driverless AI with something called recipes. The idea of recipes is to tell Driverless AI that there is a certain design of features that it needs to look for, or to explore, when it’s building the model, which is very specific to a use case. For example, when we’re looking at malicious users, I’m going to tell it to look at historical backgrounds, periodic backgrounds, and anomalies that stand out with a certain statistical effect.

Or unique, interesting behavior that never existed before, maybe logs that are incomplete. In terms of money laundering, transactions that seem to go in circles. For example, in money laundering, there are lots of times when people move the transactions in circles, because if money is in transit, they don’t necessarily have to pay tax on it. That, of course, is one that we could look for as well. Any of these behaviors is actually encoded in proper code. It’s not in written language; it’s actually encoded, and it’s [inaudible] Driverless AI, which gives Driverless AI the ability to build models and engineer features that are very much required for the [inaudible].

That’s essentially how it fits, which is why I would say Driverless AI is the kernel in this case: it’s able to build us the model. Once it has built the model, we call Driverless AI again when we want the model to be built again. Other than that, the output of Driverless AI, which is the actual model, is what gets deployed.
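
Transactions that “go in circles” can be framed as cycles in a directed graph of transfers between accounts. The sketch below, assuming a hypothetical list of (sender, receiver) pairs, uses a depth-first search to return one cycle if money ever flows back to an account on the current path. It is a plain-Python illustration of the idea, not Driverless AI’s recipe API.

```python
from typing import Dict, List, Optional, Tuple

Transfer = Tuple[str, str]  # hypothetical (sender_account, receiver_account) pair

def find_cycle(transfers: List[Transfer]) -> Optional[List[str]]:
    """Return one cycle of accounts (first account repeated at the end), or None."""
    graph: Dict[str, List[str]] = {}
    for src, dst in transfers:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: money has returned to an account already on this path.
                start = path.index(nxt)
                return path[start:] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


print(find_cycle([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]))  # ['A', 'B', 'C', 'A']
```

This ignores amounts and timestamps, which a real feature would likely need to take into account; for enumerating every cycle in a larger graph, a library function such as `networkx.simple_cycles` is an alternative.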

 

Sanyam Bhutani:

You mentioned the configuration. Is this a different version? Or is it just a few switches that you toggle to put it into a security mode?

 

Ashrith Barthur:

Oh, no. It’s actually very simple. When I say configuration, it’s the recipes feature that comes with Driverless AI. The idea is that you write a custom recipe for any of the use cases that you’re working on. It could be any of the use cases that I’m working on, which include the entire spread of malicious behavior, from electronic fraud to cybersecurity, to transactions and money laundering, and all those things.

Or it could also be for things like identifying loan default, or identifying whether you’re going to have customer churn, or any of these use cases. Driverless AI has the capability to ingest any kind of recipe that you provide, to adapt it for the use case that you are building the model for. You can think of it as a much more unique customization for the use case that you want to work on.

 

Sanyam Bhutani:

Okay. Now, this was an interesting tangent. Coming back to where Driverless AI is being used: any sectors where we’re currently using it?

 

Ashrith Barthur:

Oh, yeah, yeah. From what I can see, or rather from what I know, Driverless AI has been adopted across the entire spectrum of customers who are using H2O-3. H2O-3, of course, is an open-source product. It’s used across the entire sector: financial, insurance, manufacturing, supply chain management, I think hardcore security as well, pharmaceutical, and I think … these are the ones that pop up in my head right away. I would say these are the groups using it, and it has gained traction as widely as H2O itself is spread.

 

Sanyam Bhutani:

Any applications of anti-money laundering that come to mind in this broad spectrum?

 

Ashrith Barthur:

Yes, of course. You’re asking with respect to Driverless AI, right?

 

Sanyam Bhutani:

Correct.

 

Ashrith Barthur:

We’ve built multiple models for the AML use case itself, which is anti-money laundering. We use Driverless AI as the very engine to generate the model and to be predictive enough for AML. Driverless AI with the AML solution is something that [inaudible] quite a few organizations to solve their problems.

 

Sanyam Bhutani:

Any upcoming sectors that you’re excited about, where we could help with AML or even cybersecurity problems?

 

Ashrith Barthur:

One of the sectors that I’m very excited about is IoT. IoT is a vast space from the malicious behavior point of view, which is where I always start. It has its influence in the financial sector, like payment systems, for example, or autonomous payment systems. In the field of cybersecurity, too, IoT is very important, because you have systems that are not necessarily managed but are very critical in the entire operation space. I’m very excited to see how that will come about and how we can work with it from a modeling point of view.

 

Sanyam Bhutani:

To me, this always brings up an interesting question. I’m visiting the U.S. soon, so I know the IRS flags, I think, any transactions above $10,000, and in India it’s 50,000 rupees, which is $1,000. Does this problem vary from region to region across … this might be a bad example, but do you see any challenges in shifting from region to region? Or with policy changes?

 

Ashrith Barthur:

The thing is, you have to understand where the limits come from, right? The limits come from the idea that a certain country has a certain average income, average per capita kind of thing. That essentially plays a part in setting the limits, but the kind of activities that they’re looking for also plays a very important part in setting these limits. For example, in Europe, it’s much more stringent. In India, it’s probably stringent as well, because there might be a lot of activities that are seeping through.

In America, it’s much more regulated. Much of the financial sector has actually moved to an electronic footprint, which means that traceability is easy; it’s not a big deal. This will vary region by region, and in essence it depends on what the specific agencies who guide these things think is a reasonable threshold. That’s essentially what will drive the entire process.
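
Because the thresholds are set per region by the relevant agencies, one reasonable design is to keep them as configuration rather than hard-coding them into the flagging logic. In this hypothetical sketch, the figures are the ones mentioned in the conversation and are used purely for illustration, not as authoritative limits; the transaction fields (`id`, `region`, `amount`) are assumptions.

```python
from typing import Dict, List

# Hypothetical per-region reporting thresholds in local currency. The figures are
# the ones mentioned in the conversation, used for illustration only; real limits
# are set by each region's agencies and change over time.
THRESHOLDS: Dict[str, float] = {
    "US": 10_000,  # USD
    "IN": 50_000,  # INR
}

def flag_transactions(transactions: List[dict], thresholds: Dict[str, float]) -> List[dict]:
    """Return transactions whose amount is above their region's configured threshold.

    Each transaction is a dict with hypothetical keys "id", "region", and "amount".
    Transactions from regions with no configured threshold are not flagged.
    """
    flagged = []
    for tx in transactions:
        limit = thresholds.get(tx["region"])
        if limit is not None and tx["amount"] > limit:
            flagged.append(tx)
    return flagged


txs = [
    {"id": 1, "region": "US", "amount": 12_000},
    {"id": 2, "region": "US", "amount": 9_000},
    {"id": 3, "region": "IN", "amount": 60_000},
    {"id": 4, "region": "UK", "amount": 1_000_000},  # no threshold configured
]
print([tx["id"] for tx in flag_transactions(txs, THRESHOLDS)])  # [1, 3]
```

When a policy changes, only the configuration needs to be updated, which is one way to handle the question of shifting from region to region.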

 

Sanyam Bhutani:

The U.S., like you said, is a properly regulated, or maybe relatively properly regulated, region. What does the future look like to you in a region such as Asia, or India, where the technology is still up and coming and the internet is still picking up?

 

Ashrith Barthur:

I mean, see, the thing is, Asia, all parts of Asia, right? Of course, it’s up and coming; there’s no question about it. But you do have to understand that there is a fundamental problem: regardless of what you do, regardless of the kind of technology you use, if you want the technology to be applicable … let’s say you want a large electronic footprint, like the way Europe operates and the way America operates, you would have to deploy the same kind of electronic footprint across Asia.

China and India have some innovative banking methods that have come through, like SMS-based banking and all these things, with transactions that can be monitored as well. I think India is coming up with the one unique ID, where everybody starts to … you’re able to unify any kind of detection, if that helps.

What matters is that because the numbers are so big in Asia, it’s not a matter of whether we will be able to; it’s a matter of when we will be able to. That’s only because the technology has to be applied everywhere, and that’s why I think there’ll be much more effective monitoring, and also transparent monitoring, because both sides can see what’s happening. If I am falsely flagged, I’ll have the ability to question that. But on the other side, if someone thinks I’ve actually done some kind of illegal transaction, then they’ll have enough information to flag me for that as well.

It’s a problem of numbers, not necessarily of whether we will be able to do it, so to say.

 

Sanyam Bhutani:

You’re talking about being flagged, and H2O already has AML in our products. How interpretable are these models that we just talked about?

 

Ashrith Barthur:

Oh, yes. One of the things that we do when we are building these very unique, specific use cases, or specific solutions, is that we work very closely with the risk groups of the financial institutions, the organizations that we work with, across cybersecurity, fraud, money laundering, any of these things, right? We actually engage with the internal risk team, because one of the things that they also have to do is … let’s take money laundering as a quick example. When a transaction that seems like money laundering is visible in a bank, it needs to be reported to the state.

It’s a process that they have to go through. That essentially means this information is not something you keep to yourself for your own knowledge; you have to provide it to others as well, and it must be equally informative for them. That is one of the reasons we customize Driverless AI to build unique statistical features: we do a little less of feature combinations and a lot more of statistical features that go into the model, because these statistical features are naturally interpretable.

For example, if I tell you the average amount in a month for a user, you know what it means: it’s the total of everything divided by the number of records, simple as that. It’s very intuitive and easily interpretable. That’s something that we strive for in all of these use cases. Let me give you a quick, simple example from cybersecurity as well, if you have the time. Let’s say you are the risk officer working in association with the CISO, the chief information security officer, of an organization. If your systems are breached, you’re duty-bound to inform the state as well, and all the required agencies, that you’ve been hacked, whether there is a certain loss of data, and basically which customers’ data you lost.

You also have to inform your customers whose data seems to have been lost, which essentially means that you have to make all these parties, probably not to the same level, understand what actually happened. And if I’m using AI, it behooves me to be able to explain these models through explainable features. It can’t be just feature combinations; it has to be explainable features. That is why we adopt the same approach: we customize Driverless AI, through recipes of course, for these kinds of features, so that when we’re looking at malicious behavior specifically, the models are super transparent. You’re able to explain everything in the model.
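
The “average amount in a month for a user” feature, used above as the example of an interpretable statistic, can be sketched in a few lines. The row layout `(user_id, "YYYY-MM", amount)` is a hypothetical simplification of a transaction log; the computation is exactly the definition given in the conversation, the total divided by the number of records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, simplified transaction log rows: (user_id, month as "YYYY-MM", amount).
Row = Tuple[str, str, float]

def monthly_average_amount(rows: List[Row]) -> Dict[Tuple[str, str], float]:
    """Average transaction amount per (user, month): total amount / number of records."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for user, month, amount in rows:
        totals[(user, month)] += amount
        counts[(user, month)] += 1
    return {key: totals[key] / counts[key] for key in totals}


rows = [
    ("u1", "2020-01", 100.0),
    ("u1", "2020-01", 300.0),
    ("u1", "2020-02", 50.0),
    ("u2", "2020-01", 1000.0),
]
print(monthly_average_amount(rows))
# {('u1', '2020-01'): 200.0, ('u1', '2020-02'): 50.0, ('u2', '2020-01'): 1000.0}
```

Unlike an opaque feature combination, each value can be stated to an investigator or a regulator in one sentence.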

 

Sanyam Bhutani:

Now, broadly speaking, or maybe even naively speaking, how do you convince such a regulated industry, banks, to use AI, something that’s hard to sell, so to speak?

 

Ashrith Barthur:

Fair enough. That’s a very fair question, I would say. This is the thing: AI, or ML for that matter, is this fantastic tool, and companies with good resources that have adopted it have been very successful. Now, you can literally identify these successful companies, because they are companies that have large amounts of data. AI, unlike other approaches to making models, seeks large amounts of data to be predictive.

That essentially means that AI is more observational, as opposed to a concept called emergent, where you’re actually getting knowledge out of what the model is detecting. There is a slight difference there. AI is very experimental and observational, and I would say ML is also experimental and observational, while other forms, like old-school [inaudible], are more emergent. The way to convince regulators is not necessarily to show them that large amounts of data will give you a much better result, but to show them a process of transparency: how your model is built and what your model actually is. It could be a very simple [CART] model.

But it has to be transparent. The regulator should understand what it does: how does it make its decisions? And the most fundamental aspect of a model is its features. These features must be understandable. That, I would say, is the golden rule for getting a regulator to understand how AI makes a difference: make the features understandable, because then everything falls into place, and it’s intuitive for them to understand.

 

Sanyam Bhutani:

I think that’s also where [Auto Doc] comes into the picture. Auto Doc is already integrated everywhere, so it gives out a fully regulatory-friendly document, so to speak, for anyone who wants to investigate or probe into this.

 

Ashrith Barthur:

Yeah. I mean, Auto Doc, I think, [inaudible] and the features. I think that would be amazingly useful as something for regulators to consume, because it keeps the model very transparent. It’s able to take in all the feature sets that we have put in and build a story around them, however big a story it can build, and then essentially provide that as information for the regulators to consume.

 

Sanyam Bhutani:

We were also talking about fatigue. As a data scientist, you’re not most excited about writing documentation. You want to move on to the next model-building task, and I think that’s where automation is helpful.

 

Ashrith Barthur:

It is, it is. I would agree with that. But the other aspect, and of course this is not something that people who build models like, and that includes me as well, is to document everything. It’s like this: you’ve gone and done something really cool, and you’re like, “I don’t want to document this.” I mean, it’s cool, and that’s where it ends. But that’s not how it is, right?

What I’m doing, the model that I build, is not for my satisfaction. It’s for the satisfaction of someone else who sought my help: our customers, our clientele, the organizations who work with us. It’s for their consumption, which means that we must be very thorough in the information we give them.

Which essentially means that because a customer came to me for help, I must provide them with all the information they need to understand that what we have built is something they can trust, something they can rely on, and something robust enough to solve their problem. That’s one of the reasons why, although we don’t like it, we have to force ourselves to get these things done.

I think that’s where Auto Doc kind of helps us [crosstalk]. It fills up a large enough space; you can probably browse through it very quickly, tweak a few things, and that should probably solve the problem.

 

Sanyam Bhutani:

This has been an amazing interview, full of great insights. I know a lot of the audience are ML students. For those who are excited about machine learning, who have a good grasp of it, and who are now itching to apply it to the domain you’re an expert in, what would be your best advice?

 

Ashrith Barthur:

One of the things I would say is try to get your hands on any kind of dataset you can, to familiarize yourself with your domain and with the data science aspect of it. That’s one thing I do myself to get an idea of a domain. When you’re familiarizing yourself with the domain, try not to focus on the accuracy of the model; focus on what is going into the model, because that gives you a much more valuable result.

I would say focus on that a bit more. The next thing, which I feel very strongly about, is to try to be a full-stack data scientist as much as possible, because building these models is cool, building a model is really cool, but solving someone’s problem is much cooler. Which means that if you can go from a model to a solution, an entire application, that’s way cooler than building a model and saying, “This is my shiny new object.” If you can follow these two things as guidelines, I think that’ll be fantastic for people who are starting off.

 

Sanyam Bhutani:

Awesome. Before we end the call, what would be the best platforms to follow your work?

 

Ashrith Barthur:

I wasn’t expecting this question, because I don’t necessarily put out information on a lot of platforms. I rarely put things out on Twitter, only when there are some good articles, and sometimes on LinkedIn. But the H2O blog is a very good place to sift through what I write, because even though I write only a few things, there are people who push me to publish, so that helps. Yeah. I think that should be good enough to look through.

 

Sanyam Bhutani:

Perfect. Thank you so much Ashrith for joining me on the podcast.

 

Ashrith Barthur:

Thanks [inaudible]. I really appreciate the time and the opportunity. I would say to the audience, if you have any questions, please feel free to drop us a line, and we’d be more than happy to answer them. Thank you.

 

Sanyam Bhutani:

Leave them in the comments. We’ll try to review them and reply to your comments.

 

Ashrith Barthur:

Yes. I thought I would never get an opportunity to say that, but yes, leave them in the comments please.

 

Sanyam Bhutani:

Thank you so much for listening to this episode. If you enjoyed the show, please be sure to give it a review, or feel free to shoot me a message. You can find all of the social media links in the description. If you like the show, please subscribe and tune in each week to Chai Time Data Science.