Building ML Models to Detect Malicious Behavior
This meetup was recorded in San Francisco on January 23rd, 2020.
In this talk, we explain the fundamental design and approach to looking at malicious behavior. Some of these approaches are used in identifying electronic fraud, attacks, and money laundering, and we walk through example features that follow from this fundamental idea. Additionally, we touch on designs for embedding the solution in your application so that you can be proactive (catch the act while it is happening) or reactive (catch it after). Finally, we also talk about design approaches that make models more effective.
Speaker's bio:
Ashrith Barthur is the security scientist designing anomaly detection algorithms at H2O.ai. He recently graduated from the Center for Education and Research in Information Assurance and Security (CERIAS) at Purdue University with a Ph.D. in information security. He specialized in anomaly detection on networks under the guidance of Dr. William S. Cleveland. He tries to break into anything that has an operating system, and sometimes into things that don't. He has been christened "The Only Human Network Packet Sniffer" by his advisors. When he is not working, he swims and bikes long distances.
Ashrith Barthur:
All right. Yeah, as Bruno briefly introduced me, I work as a security scientist, actually as a principal security scientist, at H2O. My primary job is basically to architect and build solutions, including models, that try to detect malicious behaviors across the internet. It could be anything to do with financial fraud, electronic fraud, networks, any of these things. The general idea I want to take you through today across my slides is: how do we go about building these out? What are the thought processes of actually doing it? What are the key components that we look for, and what is important for you to look at as well in case you are building models for malicious behavior?
Quick question. How many of you out there are data scientists or ML engineers who do this as a day-to-day job? Great, okay. Quite a few. How many of you work in the field of security? Ah wow, okay. Fair. That number usually is never bigger than two. I'm actually happy it's about five, so yeah.
Quick set of slides about H2O. How many of you actually know about H2O? Curious. Okay, so this seems pertinent. We are based down in the South Bay out of [inaudible]. We've got a good amount of funding and we're about 200-plus people in the company, with a good number of [inaudible] out there who help us build models. The company is made up of a lot of engineers and scientists who work in many different areas. We are primarily a machine learning platform company, and we build quite a set of tools that help people build machine learning models for many different use cases. I primarily work in the security solutions group.
Now, we've got offices across the world, a good part of the world, let's put it that way. In case anyone's interested in joining us, please contact Luna or me. We'd be happy to help you out.
A set of our clients, how good we are in the community, that should be good. We primarily have four different products that the company puts out. Driverless AI is our current flagship product, which automates a lot of the machine learning processes that you usually used to do. H2O and Sparkling Water are the open source variants for machine learning model building. H2O Q is slated for the end of Q1. It's a new platform that we're building which helps us build models much better, with insights and modeling coming together under one platform. It's an integrating platform that brings in other things, so you can build models and insights at the same time, which seems to be a pain point currently in the machine learning field.
All right, so that's the set of slides about the company, and this is the general structure of my talk. I'll talk about the problems that we usually look at when it comes to identifying malicious behavior. I'll look at what kinds of data you necessarily need to have, how you look at the modeling process itself, the feature design, and the modeling design itself. Now, having said this, I would urge you to call out and ask me any questions that you have at any point in time. I'm completely okay with that. If you want to hold it until the end and see if your questions get answered by the slides, that's fair too. Either way works for me. Don't worry about it. I'm happy to answer anything.
Generally speaking, what is it that we're looking at as malicious behavior? What are the kinds of malicious behavior that we're looking at? There are many different kinds of malicious behaviors. You could have some that are problematic for you, like the ones we have listed up there, or there could be other things as well, like someone breaking a traffic rule. That's not necessarily the kind of thing we're concerned with in the modeling that we're building right now. We are essentially ... Actually, give me one second. Can you guys see the screen from there? Is it possible for you guys? Maybe you might want to scoot in or come ... Okay, great. That's fine.
There are various sets of malicious behaviors that we actually look at when we build models. A good number of them come in the field of electronic fraud, where there are many different kinds of fraud that actually happen. There are people who have stolen access to accounts and are doing transactions in someone else's name, and then there's usually impersonation and phishing. This is one of the biggest use cases that we deal with, where I send you an email saying I'm the king of Nigeria and I need your account, and that's usually how it goes. I'm sure everybody knows about it. Then the usual credit card fraud: someone uses your card in the middle of the night at 3:00 AM. You would never have done it because you sleep at night. Well, that's a problem. Then you have the usual transaction fraud. I would have gotten access to your account. By some sort of [inaudible] means I would have gotten onto the darknet, probably gotten access to some of your account information, and then I come back and use it before the bank and you figure out I've drained all the money out of your account.
The last one, which is personal but unknown transactions and phishing, is actually one of the most interesting use cases that we deal with. Essentially what happens in this type of malicious behavior is ... Actually, I'll put it a different way. What a lot of people have figured out is that stealing your passwords or stealing access to your system does not make sense, because at the end of the day you get discovered. If I were to steal your account usernames or passwords, at some point in time, 10 days later, 20 days later, a month later, someone will figure it out and then the account gets blocked. So what a lot of people do is they run access to your account through your machine while you're on it, so all your credentials are captured and kept on your system, and then they run it from the same system so as to not raise any suspicion with the bank or with you as a user.
That's probably one of the most interesting use cases that we deal with. The way we work on all these things is that we work with many different financial institutions and interested agencies to try and deal with this problem, and we usually see quite a few variants of these things. Much of this work is [inaudible] trying to predict any of these frauds happening around the time, or just after they've happened, and that's essentially what we build models for and what we're trying to solve.
Now, your malicious behavior can be classified in two forms. One is that it's actually criminal, in the legal parlance, if you're looking at it that way. The other is that if you look at it statistically, if you look at it merely as an aspect of the data, it is not normal. Essentially there is something out of the ordinary that stands out, or that should stand out. Let me correct that, because a lot of this kind of data tends to blend in and doesn't necessarily stand out, and that's essentially what we're trying to find. It's not the first problem that we deal with. That's for law enforcement agencies to work on. We deal with the second problem, and by dealing with the second problem we tend to give enough information in case someone wants to go towards legal procedures.
Now, one of the most important things that you need when you're looking at malicious behavior is data that actually supports the kind of modeling that you're doing. A lot of you work in security right now: do you have the best kind of data to deal with, in terms of any kind of identification that you have to do in an attack, or are there shortcomings that you actually face?
Audience Member:
There are shortcomings.

Ashrith Barthur:
And what kind would that be?

Audience Member:
[inaudible].
Ashrith Barthur:
Fair enough. This is not necessarily a problem that exists only in the field of security. It actually exists in many different fields. Your data must complement the kind of model that you're trying to build, so in a very fundamental sense you have to figure out what you're trying to detect. Are you trying to detect individuals who are showing fraudulent behavior, or are you trying to figure out clusters, or to put it in a very simple sense, are you trying to figure out groups of accounts that are operating in a certain way?
Let me give you a very simple example. Let's say a bank has some kind of vulnerability, and using that vulnerability a lot of accounts might actually be compromised. When you compromise a bunch of accounts, what it eventually leads to is all those compromised accounts behaving in the same kind of way. In those situations you are expected to try and identify the clusters of accounts that actually stand out from your normal banking accounts. In other cases, where only one individual's credentials have actually been stolen, you're looking at those individual credentials to try and identify that this is the one that actually stands out. That's essentially one of the things: do you have enough information to identify an individual, or do you have enough information to identify the cluster or the groups that you're looking at?
The next thing is how quickly you want to determine whether there is a breach or malicious behavior actually happening, and this could very well depend on your risk team. This might not be up to the data science team, and it might not be up to the business team either. This usually comes from your risk organization, where they say, "We are okay with taking damage for about a day," or "We are okay with taking damage for about an hour or a minute or so." Based on that, your models will be designed so that you're predicting activity while it's happening, or just after it's happened, or at the end of the day.
A lot of the use cases that we deal with ... I'll give you a simple example. One of the use cases that we deal with is fast credit card transactions. Fast credit card transactions are where a person has actually gotten access to your credit card information, not necessarily the physical card itself but the card details, and they can make a copy of the card, go to a different location, extract as much money as the ATM allows, and quickly get away with it. Essentially, in those situations you can't actually identify it immediately, but you get to identify it at the end of the day. There are different kinds of behaviors that you can identify immediately, and different kinds of behaviors that you can identify in batch.
Now, it's also a negotiation between how much risk you hold and how quickly your models can detect. Your models might be super good, extremely good, with a very high ability to detect these behaviors, but the other problem that stands out is: is your model too heavy to actually detect these things in time? If you have 10 thousand features, or if you have a hundred thousand ... It can't be a hundred thousand features, I would hope not. Maybe a thousand features. Then you would have a problem, because the model would take much more time in identifying these things, so at that point there'll be a bit of a compromise where your risk team will have to say, maybe five minutes, and the model will have to scale down in terms of the features that you actually put into it. That's essentially the question: is the data [inaudible] actually at line speed?
Then comes the much deeper question: have you joined all the relevant tables to make the decision? If not, can you do it at line speed? One of the bigger ones is how robust is your model? Now, we understand that the model building process itself has been commoditized, and what do I mean by that? Your model is not the loved one that you keep to yourself anymore. You build a model every day. As [inaudible] behaviors change, you tend to build newer models, you tend to build them faster, and you tend to implement them really quickly, and that's essentially what is happening right now. And you actually have to do it, because the terrain in which these kinds of behaviors exist changes very frequently. It could change within a day, it could change across a week, which means that you have to be quick enough to build newer models to be able to detect this behavior. But for you to actually have the kind of robustness that you expect in the model, you need to have enough data. Is the data big enough? Is it long enough? Do you have a good enough history, about eight months to a year? Usually eight months to a year is a good enough number, depending on the kind of behavior that you're looking for. Or is all you have just three months of data?
We routinely see a problem among the different agencies we work with, and this is not something I just write out as a hypothetical; it is routinely a problem that we face when we work with different agencies. A lot of agencies do not actually have access to data going back more than three months, which means that the data we're working on covers a much shorter time, which means that we have to build robustness into the model within that period of time.
There we go.
Now, you've got your fundamental sources of data. Are there additional sources of data that you can look at which would probably provide you supplementary information that you can use for your decision making? For example, can you look at web server logs, network logs, application logs, account access logs? Now again, mind you, adding a lot of information can give you a fantastic model, but that does not mean it gives you enough speed, the ability to actually be proactive in trying to identify this behavior. You might be seeing the subtle idea that I'm trying to throw out, that you have to be really fast, and that is the kernel, or the crux, of how you deal with malicious behavior. It's not a model where you're trying to figure out whether a person gets a loan or not. That problem can wait for a day if someone wants a loan.
In trying to figure out this behavior there is a lot of loss associated with it, which means that you are on the clock every time a certain behavior actually happens, which means that you have to make very conscious decisions about how big your model is, how many features are going in, what my risk factor is, how much data I can actually bring in, and that essentially puts you in a tight spot where you have to make these decisions before things go into implementation.
I've come to the next part of the talk, which is what we actually do here: features. Now, one of the biggest decisions that you have to make when you're building models for malicious behavior is building the right set of features that help you identify whatever you're trying to identify, and the features can be largely divided into two groups. These are not purely exclusive; they kind of overlap with each other. There is the individual, and then there is the population. Essentially, the reason why you're building this is that you're trying to build a population characteristic. You're trying to see, for all the members of your financial organization or the agency that you're working with, what's the normal behavior that they usually carry, and then you're trying to identify the individual behavior that everybody carries.
Now, when you're using these two as a comparison metric, you're trying to see how much variation there is from the individual to the population, from the population to, let's say, three months before, or from [inaudible] to three months before, or the week before, or the day before. That's essentially how you're trying to identify whether there is a variation across here. There are three primary, very fundamental feature families that you look at for these individuals and activities.
These three feature families are attributes and activities, of course; interactions, interactions between entities, which could be a population's interactions, a system's interactions, or an individual's interactions, and that's essentially what you're building; and finally networks, where you're trying to see how a larger system is actually working, what kinds of interactions it's actually leading to, the egress and the ingress. I'll give you a few examples and that will probably illustrate these things much better.
Let's say you're looking at a single individual and you're trying to establish a baseline. When you're trying to establish a baseline, you're trying to find out what kinds of behaviors this person naturally espouses. You're trying to see whether this person naturally transacts using cash or usually credit. That's essentially to establish what kind of instrument this person usually uses when they're doing transactions. Then you're trying to see what their natural set of interactions is. What is the natural course and set of interactions that they carry out on a daily, weekly, monthly basis? That essentially tells you how periodic or non-periodic this person is in terms of their activities.
This is actually one of the bigger clues that you get when you're trying to move to actual interactions between different people. Then you're trying to see the geographical identity, and by geographical identity I don't mean race. I actually mean what IP address you're coming from, what your usual activity location is, what kind of activities you conduct from a certain kind of location, for example. Let's say I'm applying for a new credit card. I would probably not be applying from an airport terminal. I would probably be doing it from my house, or from a residential location, because there is a certain association with where you do these kinds of activities. You also try to see the periodicity of where you come from. Is this person naturally residential? Does this person conduct their activities from the office as well, or usually when they're mobile? Essentially that is what establishes the kind of probabilistic score that you can associate with them in terms of how much risk you can assign.
How does the social identity fit? Now, you can take information that you can get from social graphs and bring that in, and you can ask how many people this person naturally interacts with. Is there a lot of interaction, one-off interaction, periodic interaction, or complete randomness? That is something you can establish when you're trying to identify the social identity of the person as well.
Then you're looking at different kinds of ingress and egress. This ingress and egress can be of many different kinds. In this case we'll probably just focus on the financial instrument. Now, if you were to look at my bank account, the only ingress into my financial account is my paycheck, but then there is egress for natural utilities, egress for credit cards, egress for different kinds of payments that I necessarily have to make. With that too you can establish a certain kind of periodicity, and then you can say there are some that are not necessarily periodic but are also not necessarily risk-worthy, because they're not that active.
You can also try and use past triggers. Past triggers are actually one of the biggest identifiers that help you detect whether this person has a natural disposition to actually [inaudible] interesting behaviors. You would look at, let's say, how many times this person has triggered off a loss of credit card. If there is a loss of credit card, then you know the risk quotient for this person can actually be higher than the rest of the group you associate them with. You can also look at volume rates, in terms of the rate at which this person operates in terms of their amounts. You can also look at the amount of volume itself. You can look at that as well as a measure of how risky or not so risky it is.
Operational window: the idea is very intuitive. It tries to find out when you are actually active. A lot of people tend to pay bills, pay credit cards and all these things over the weekend. We don't necessarily do all these things on a weekday. That essentially puts you in a separate group. You can establish this kind of information as well.
Now, with all these identities that you establish, you can also establish a second form where you replace the individual with the population. You segment the data and you say, okay, this person belongs to this group, how does this group naturally operate, and how much does this person vary from the group we expect them to belong to? That will essentially give you two kinds of measures. It will give you, of course, the population metrics, the population statistics for all these things that you measure, and then it will give you the individual statistics, which you can use and compare to try to find out how much of a risk quotient you can actually associate with this person.
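To make that individual-versus-population comparison concrete, here is a minimal sketch in Python/pandas. The column names (account_id, segment, timestamp, amount, channel) are illustrative assumptions, not fields from the talk; the point is simply that each account gets its own baseline and is then scored against its segment's population statistics.

```python
import pandas as pd

def baseline_features(txns: pd.DataFrame) -> pd.DataFrame:
    """txns: one row per transaction with hypothetical columns
    account_id, segment, timestamp (datetime), amount, channel ('cash'/'credit')."""
    txns = txns.copy()
    txns["is_weekend"] = txns["timestamp"].dt.dayofweek >= 5

    # Individual baseline: how does each account normally behave?
    acct = txns.groupby("account_id").agg(
        segment=("segment", "first"),
        avg_amount=("amount", "mean"),
        cash_ratio=("channel", lambda c: (c == "cash").mean()),
        weekend_ratio=("is_weekend", "mean"),
    )

    # Population baseline: how does the account's segment behave overall?
    seg = txns.groupby("segment")["amount"].agg(["mean", "std"])
    seg.columns = ["seg_avg_amount", "seg_amount_std"]
    acct = acct.join(seg, on="segment")

    # The feature the model scores on is the deviation of the individual
    # from the population it is expected to belong to.
    acct["amount_deviation"] = (
        acct["avg_amount"] - acct["seg_avg_amount"]
    ) / acct["seg_amount_std"]
    return acct
```

The same pattern extends to the other baselines he mentions (instrument mix, operational window, geography): compute the statistic per account, compute it per segment, and feed the deviation to the model.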
The next set that we look for, again, is interactions. Interactions are another beautiful set of features that we use, which are actually super useful when it comes to predictions. Sometimes we don't necessarily have a good amount of individual or population statistics. At that time we use interactions as ... Interactions tend to stand out as good features when you're building these models. In interactions you're looking at age groups. What is the ...
Do you have a question? Sorry. Yeah, okay.
In interactions we're looking at different age groups. Different age groups tend to operate in different ways. Usually the age group of 50 and higher, maybe not 50 anymore, I would probably say about 60 and higher, tends to operate much more in cash. You have to be watchful also based on the kind of instruments that you're looking at. For example, a person might actually withdraw cash and hand it to someone, and on the other end that person might deposit cash into their account, so although that's not a detectable link, with a probabilistic measure you can say that this amount was essentially transferred to this person using cash. For that you can use different kinds of metrics, where you use the ingress and the egress to try to identify how much money is actually coming into the account and how much money is actually going out of the account, based on the instruments that the person is using.
One of the indicators that we use to try and segment across the group is age, which tends to help us identify different age groups and clearly figure out what's happening out there. Loyalty: not a subjective thing, it's not brand loyalty. It's more like how long you have had this account and, let's say, how many people you have actually tried to bring in to the financial institution or the agency, which would be an interesting measure. If you've had the account many years, you tend to have a low risk score. Not necessarily just a matter of years; we don't necessarily give a low risk score, but we definitely associate a risk factor with that person. Then we also look at the periodicity of interactions. How frequently do you pay your kids? How frequently do you pay the bar tab, or any of those things?
We also look at seasonality. Every time the Super Bowl happens, a lot of people put in bets, which essentially means that a lot of money starts to move into certain kinds of accounts. That is not necessarily dangerous; it's just a seasonal thing. One of the other things that we also see, especially outside America, is that when it's summer there tends to be a lot more activity in terms of transferring money, because people are traveling and they're trying to share money or pay someone back or something. Seasonality tends to increase the amount of interaction as well.
There are also system interactions. How much does this entire system tend to move periodically? Do all these people interact, were they all interacting for the last three months, were they all interacting for the last six months, were there any new entities, were there any old entities, did things change? Very similarly, you're also looking at the volumes, trying to see how much of this interaction carries over a year, a six-month period, or a month. You also look at the interaction rate: how many times does this person actually interact with another person?
Now again, just like we did for an individual, you can also look at interactions across a population. The interactions in a population are very interesting, because with individuals it's very easy: there are individualistic attributes, you can cluster them, and it ends there. But across populations you're not only looking at interactions within clusters, you're also looking at interactions between clusters. That essentially puts you in a very good spot to see how mobile this person is, how mobile this interaction is from one cluster to another, and that gives you a good measure as well.
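A sketch of the kind of pairwise interaction features described here, again with hypothetical column names (sender, receiver, timestamp, amount): it measures how often, how regularly, and with how much volume two entities interact.

```python
import pandas as pd

def interaction_features(txns: pd.DataFrame) -> pd.DataFrame:
    """txns: hypothetical columns sender, receiver, timestamp (datetime), amount."""
    txns = txns.sort_values("timestamp").copy()
    # Gap between consecutive transfers for each sender -> receiver pair.
    txns["gap_days"] = (
        txns.groupby(["sender", "receiver"])["timestamp"].diff().dt.days
    )

    pair = txns.groupby(["sender", "receiver"]).agg(
        n_transfers=("amount", "size"),
        total_volume=("amount", "sum"),
        first_seen=("timestamp", "min"),
        last_seen=("timestamp", "max"),
        gap_std_days=("gap_days", "std"),  # low spread = periodic interaction
    )
    # Interaction rate: transfers per day over the pair's active window.
    active_days = (pair["last_seen"] - pair["first_seen"]).dt.days + 1
    pair["transfers_per_day"] = pair["n_transfers"] / active_days
    return pair
```

The population version of this is the same computation aggregated at the cluster level (cluster-to-cluster pairs instead of account-to-account pairs).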
Now mind you, these two sets of things I'm actually talking about are the kinds of features that you actually build for your model. These features go into your model and help you figure out whether something is a real transaction, a fraudulent transaction, or some kind of malicious behavior in the system. That's essentially what we're getting to. What you've seen in the first set is, I mean, sorry, attributes of individuals or of a [inaudible] population, and here you're looking at the same set, but you're looking at its interactions.
The next one is actually a bit more interesting: you're looking at a network. Now, this tends to blow up the problem just a bit, but it also gives you a lot more information in terms of what you're trying to find out. One of the biggest things you're trying to find out is whether this network is convergent or divergent, and the reason is that a lot of money laundering techniques and fraudulent transfers of money tend to have convergent networks. So there'll be 10 people ... How many people here have actually heard of a concept called smurfing?
Great. It's not smurfing in the networking sense. It's smurfing in the financial transaction sense, but it's the same principle. The idea is very simple. I don't want any financial institution or any organization to detect that I'm transferring 10 million dollars, so what I do is break up that 10 million dollars among about a thousand people and ask them to deposit it into their accounts. After a set number of days, I ask them to transfer it to another person, person X. Each of them will have ... Let's say a thousand people converge to a 50-person network, and then those 50 people eventually converge to, let's say, 10 people, and these 10 people actually withdraw the money, and then it's taken out of the system without any detection. That's actually fraudulent. That's a form of money laundering, which entitles you to a very fantastic [inaudible] case, but that's of course the other side of the story.
Those networks tend to be fraudulent. Sorry, those types of networks tend to be convergent, because the people who are conducting these kinds of exercises don't want to just leave the money in your bank account. They actually want to take it out, which is essentially their Achilles' heel. We can catch them at the point where the networks are convergent, because we know exactly how things are converging, and in normal operational behavior the networks don't necessarily converge. Yes, of course, everything converges to PG&E, but PG&E is not doing money laundering. We know that. That's essentially how things work.
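As a rough illustration of spotting that kind of convergence, here is a sketch using networkx on a transfer edge list. The column names and the thresholds (fan-in of 20, a 10:1 fan-in to fan-out ratio) are assumptions for the example, not figures from the talk.

```python
import networkx as nx
import pandas as pd

def convergence_candidates(txns: pd.DataFrame, min_fan_in: int = 20) -> pd.DataFrame:
    """txns: hypothetical columns sender, receiver, amount.
    Flags accounts where many distinct senders funnel money in while
    few distinct receivers take money out (fan-in >> fan-out)."""
    g = nx.DiGraph()
    for row in txns.itertuples():
        # Accumulate total volume on each directed edge.
        prev = g.get_edge_data(row.sender, row.receiver, {"weight": 0.0})["weight"]
        g.add_edge(row.sender, row.receiver, weight=prev + row.amount)

    records = []
    for node in g.nodes:
        fan_in = g.in_degree(node)             # distinct senders into this account
        fan_out = max(g.out_degree(node), 1)   # distinct receivers out of it
        if fan_in >= min_fan_in and fan_in / fan_out >= 10:
            records.append({
                "account": node,
                "fan_in": fan_in,
                "fan_out": g.out_degree(node),
                "inflow": g.in_degree(node, weight="weight"),
            })
    return pd.DataFrame(records)
```

Legitimate sinks like a utility company will also show high fan-in, which is why, as he notes, this is one feature among many rather than a verdict on its own.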
This is one of the reasons why we look at larger and larger systems, and you can look at this at different sizes. You can look at the convergence and the divergence of a network across segments, within segments, within subsegments, or within clusters. You can look at it in many different ways to try and identify how this entire system or network is actually moving. You're also trying to see what the operational periodicity of these networks is. Is this transaction a one-off thing that's coming into the network, or is it periodic? Does this kind of money actually move across the network periodically?
You also try to see the active time. How much of a given period of time is the network actually active? How much activity does this person tend to have? And you look at something that's very interesting, called lock level. Lock level is nothing but a window which you use to look at different parts of the network at a given period of time. You're looking at many different areas across the network to see if something is coming out, or whether the money entering one part of the network, in whole or in large part, is actually getting into a different part of the network as well, and that essentially tells you that these sets of individuals or clusters are actually operating as one network.
If you're doing lock-level mapping, then essentially, if, let's say, three people out here and three people out here are not transferring anywhere close to, let's say, 80 to 120 percent of the money that they've got between them, then you can say that they're not necessarily on the same lock level, and that helps you identify whether they're part of the same network or not. You're also looking at any recent changes in behavior. Did the network grow? Did it grow in a much larger fashion?
For example, this is probably going to be very interesting. Across Europe, one of the things we see is dead people coming back, and that's not necessarily a cool thing. It's just that accounts of dead people actually come back. That's all, there's nothing else to it. What happens is there are times when these accounts come back and the number of people in the network increases. There's a really large number of people that swarm the network, and then you see that there is a lot more activity than you would expect. Those are the recent behavioral changes that you're looking at.
Then one of the things you also want to do is try to find out who the probable set of culprits sitting on the network are. You're essentially looking at the last egress point. Where is this money actually coming out, or where are these transactions actually converging? Is it the same person all the time? As I said, it could be PG&E or it could be an actual person. If someone is malicious, the egress points tend to change quite frequently. That gives us an indicator to highlight and say, this person was the egress point for this network three months ago, but the same person is not the egress point for the network anymore, and that change helps us identify that there is interesting behavior across this network.
One of the last things, and probably one of the most valuable things that we look at, is the pass-through-to-origin ratio. Essentially, how much of the transactions, and it could be volumes or merely counts of transactions, so volumes or values: how much money actually passed through the system, how much got retained in the system, how much converged well before the egress point in the system? That gives you a good indication of a segmented set where either there is a lot of fraudulent transaction happening in it, or it's not necessarily fraudulent, it's just a diversion network and everything seems to be hunky-dory.
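A minimal sketch of a pass-through style measure, computed per account from the same kind of edge list (column names are assumptions, and a real implementation would window this by time):

```python
import pandas as pd

def pass_through_ratio(txns: pd.DataFrame) -> pd.Series:
    """txns: hypothetical columns sender, receiver, amount.
    A ratio near 1.0 means almost everything that comes into an account
    goes straight out again (a pass-through / mule pattern); a ratio near
    0.0 means the account retains what it receives."""
    inflow = txns.groupby("receiver")["amount"].sum()
    outflow = txns.groupby("sender")["amount"].sum()
    accounts = inflow.index.union(outflow.index)
    inflow = inflow.reindex(accounts, fill_value=0.0)
    outflow = outflow.reindex(accounts, fill_value=0.0)
    # Accounts with no inflow get NaN rather than a misleading ratio;
    # cap at 1.0 so accounts also spending their own money don't dominate.
    return (outflow / inflow.replace(0.0, float("nan"))).clip(upper=1.0)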
As I said, let me quickly jump back to this. As I said, there are three primary things that we try and build when we're looking at malicious behaviors across financial transactions: attributes and activities, which could be of individuals or populations; interactions; and networks. Yes?
Audience Member:
How do you go about trying to demarcate the network or find boundaries in a network?

Ashrith Barthur:
That's actually a fantastic question. For people who couldn't [inaudible], the question is how you demarcate the network. The idea is we use natural segments to first try and identify whether the behavior seems to be confined to a cluster. Within the cluster, does the behavior tend to be the same? Does the behavior tend to pass over to another cluster? So we try and figure out whether the clusters are exclusive or there is some crossover in terms of behavior. If there is a crossover, then we recluster. You could also have a large cluster which could be subsegmented, but then two segments of two different clusters could be the same, so then we recluster them as well. This part, I would say, is much more trial and error, much more of a [inaudible] problem rather than a [inaudible] here-you-go, this-is-the-solution. You have to look at the data and you have to do it multiple times to get the right set.

Audience Member:
Is the starting point [inaudible] often or demographic or geographic region, or is this hitting too close to home?

Ashrith Barthur:
So that's actually a fantastic question as well. The starting point ... We use both, because we work with organizations around the world. We tend to use geographical boundaries as a natural starting point, but then we automatically introduce the institutions as the next subsegment within that.
I'll switch to the next part. If there are no other questions, I'll just switch to the next part, which essentially talks about how we look at the models as well. One of the reasons why we're looking at models is that we tend to have a lot of data scientists and machine learning engineers who work with me who tend to say, "Oh, I've got this to 0.95 or 0.97. How do I make it 0.98 or 0.99?" or whatever. Something that I very strongly believe is that those numbers don't actually matter. What matters is how much of the problem you are solving based on the risk factor that your institution actually gives you. If you're working within the risk limit that your institution provides, then you're fine, but you can of course improve the modeling by making better features and by tuning your model, because, to be fair, almost all modeling techniques right now tend to be really good, and especially for identifying malicious behavior I would say that I have found good and bad results with all kinds of modeling techniques, so there is no one go-to model. I would say everything works about the same way, but of course I do a lot of random [inaudible] myself just because I like it. That's about it. There's nothing else to it.
In terms of the model, there are a few fundamental questions that actually come [inaudible]. One of the fundamental questions, of course, is what you are actually trying to solve in this case. Are you trying to solve an ML problem or are you trying to solve an AI problem? There is a chasm between these two, and we don't necessarily try and figure that out. We tend to speak about them in the same breath, and that's essentially not fair. I'll probably speak about it just a bit, but if I were to put this question out to you in the audience, how would you look at it? Are you solving an ML problem or are you solving an AI problem? Anyone could ... It's perfectly fine. The mic is free.
Audience Member:
Are you trying to differentiate between a research problem versus-

Ashrith Barthur:
Not exactly.

Audience Member:
[inaudible].
Ashrith Barthur:
No, no, no. All right, if nobody wants to [inaudible], I'll try and not speak about it. The idea of it ... Actually, maybe the next slide might help. The idea of ML is much more just classification: what you're trying to do is get a model to learn a certain set of behavior, and then you're using that to predict a certain set of behavior. In AI there is a bit more to it. You're not just ... The whole process of ML seems much more linear. As I said, if I were to give you an example, very simply put, in ML you're just trying to find out whether it's fraud or not. You are essentially concerned with the very simple decision that you've given it a set of data and you want to find out whether fraud is happening, whether this transaction is fraudulent or not. It's a very linear problem, and the question you tend to ask is what classification or what group this transaction falls into, and that problem is very linear. It doesn't necessarily give you any insight into what you're looking at. Your model doesn't necessarily carry the insight. It's just: I've learned a set of things, and based on these things I know how to predict, and that's essentially what an ML problem is.
But when you switch the same thing to AI, the solution is actually emergent. What you're trying to do is not just classify whether it's fraud or not, but also classify what kind of fraud it is. It's answering a much deeper question. You're not only using the data set, the features and the techniques, the domain knowledge and all those things that you've got, to answer the basic question, which is: is this fraud or not? You're also trying to identify whether this is a certain kind of fraud, or, if you want to put it another way, what type of fraud it is, or why it is fraud. That's essentially what you're trying to identify.
There are two fundamental differences there. I think David [inaudible] probably does a much better job of explaining what emergent means, so I would defer to him, but yeah, that's essentially what you're looking at and what you're supposed to do. If you're looking at merely a machine learning problem, then a very simple model that helps you classify yes or no makes sense. That's about it. That's essentially what you should be looking at. But if you're looking at a problem which is much more AI, where you're trying to find out whether this is impersonation, or transaction fraud, or a personal-but-unknown transaction sitting on your own machine, then you're looking at many more sources of data and much richer feature sets that actually make sense. From a domain point of view you're able to understand what these features are, you're able to classify, you're able to actually identify what this behavior is, and that essentially makes it an emergent problem.
Audience Member:
Does ...

Ashrith Barthur:
Yes, please, yeah?

Audience Member:
[inaudible] AI probably can automate the decision associated with understanding-

Ashrith Barthur:
Absolutely, yes. Yeah. It puts you much farther away from this. It puts a bit of a distance between you and the system. I don't want to say it puts you far away from the system, that sounds a bit scary, and we don't want that, but it puts a bit of distance between you and the decision making, because the system itself is intelligent enough to make that decision.

Audience Member:
[inaudible].
Ashrith Barthur:
Yeah. I think there was another question. Okay. Yes?
Audience Member:
Is it like there is another ML problem after a bigger ML problem?

Ashrith Barthur:
It actually is. Yes, you're absolutely right.

Audience Member:
So why would it be AI? [inaudible].

Ashrith Barthur:
So that's a fantastic point, and this of course is subjective, and that's essentially the idea of what emergence is: it's much more than the composition it actually has. Your problem could entail many different models by the time you actually get to that intelligent decision making, or it could be just two models, and the system that you build with all these models could be smart enough to actually make that decision. But you're absolutely right. It could take many models to get you to that point, or it could take just a few models to get you to that point. Yes?
Audience Member:
Do you also use a [inaudible] technique for AI instead?

Ashrith Barthur:
I would beg you not to do so, because the only thing that happens when you're using a [inaudible] approach for problems of this kind is that it tends to flatten the problem, and here there is a sense of a hierarchical approach. If you know the construct of how intelligence is created, you have data, then you have knowledge, then you have intelligence, and then you have wisdom, so the number of models that you actually put out eventually moves you up the pyramid to where you have intelligence. But if you flatten the model, then you're necessarily just stuck at a place where you have data or knowledge. Yes?
Audience Member:
Yeah, I guess what you're saying is ML is one specific model predicting one specific component of a larger system.

Ashrith Barthur:
Yes, yeah.

Audience Member:
AI is the larger system, because here, what type of fraud is it? I could have maybe multiple labels of fraud.

Ashrith Barthur:
True, yeah.

Audience Member:
It could be expanded upon beyond binary.

Ashrith Barthur:
For that I'll-

Audience Member:
That's still a machine learning problem, correct?

Ashrith Barthur:
I'm sorry, say that again.

Audience Member:
If I had three categories or 300 categories of what type of ... That's still an ML problem.
Ashrith Barthur:
It is. Of course it is, and that essentially goes back to what the gentleman was saying: it's actually a sequence of multiple models that eventually gets you to a point where you're making intelligent decisions. I'll give you a very simple example of how this approach works. You have a model which tells you whether a transaction is fraudulent or not. That model is not smart enough to identify what kind of transaction this is, or what kind of fraudulent transaction this is. You might need another model, with an extra set of information, which is capable of classifying what kind of fraudulent transaction this is. It's like a chain of models that will eventually get you to your point.
Audience Member:
Yeah, yeah, that's cool.
Ashrith Barthur:
That's essentially how we build it. That's essentially how we build the system, because a lot of the financial institutions that work with us want to keep the decision making as objective as possible. In the sense that they want systems to do these things as far as possible, and that's essentially what we're trying to build when it comes to identifying malicious behavior. There is a genuine reason why they want to do that: it's because everything is becoming electronic. We don't necessarily go to the banks anymore. Much of our transactions, most of our transactions, happen online.
Way back, I think in 2012, credit card agencies were running probably close to 30 thousand transactions a second. Now they tend to have about 300 to 400 thousand transactions per second, which essentially does not give you a good enough cushion to put enough people on actually [inaudible] whether something is fraudulent or not, which is one of the reasons they want models that can make these decisions themselves and come to a point where they say, hey, you know what, this is what I actually think it is, which is why we take this approach to building the models.
Having said that, you could have many different ways of doing this. An ML problem is usually, almost always, a supervised approach. In whatever [inaudible], I haven't necessarily seen an unsupervised approach in the models that we build, at least for fraudulent techniques. In AI it could be a combination of both, and for this very example, for trying to identify fraudulent transactions, we've built supervised models whose outcome gets fed into an unsupervised model, which we then try to use to classify what kind of fraudulent technique this is. Of course the whole pipeline has to be as lean as possible, and this is what we usually implement in most of the financial organizations that we work with.
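As a hedged sketch of that two-stage idea (a supervised fraud/not-fraud model whose flagged cases feed an unsupervised model that groups them into candidate fraud types), assuming scikit-learn, a pre-built feature matrix X, labels y, and enough flagged rows to cluster:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.cluster import KMeans

def fit_two_stage(X_train: np.ndarray, y_train: np.ndarray, n_fraud_types: int = 10):
    """Stage 1 answers 'is this fraud?'; stage 2 groups flagged cases
    into candidate fraud families for analysts to label."""
    clf = GradientBoostingClassifier().fit(X_train, y_train)
    fraud_mask = clf.predict(X_train) == 1
    clusterer = KMeans(n_clusters=n_fraud_types, n_init=10).fit(X_train[fraud_mask])
    return clf, clusterer

def score(clf, clusterer, X_new: np.ndarray):
    """Returns a fraud flag per row and, for flagged rows, a cluster id."""
    is_fraud = clf.predict(X_new) == 1
    fraud_type = np.full(len(X_new), -1)   # -1 = not flagged
    if is_fraud.any():
        fraud_type[is_fraud] = clusterer.predict(X_new[is_fraud])
    return is_fraud, fraud_type
```

The model choices here are placeholders; the point is only the shape of the pipeline, with the second, unsupervised stage producing the "what kind of fraud" signal he describes.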
Sorry. So, having said that, this is basically a slide where I'm trying to show how our models have actually [inaudible]. Of course, we started off with the idea that, oh, you know, we can solve the whole intelligence problem with one model. Then you build a model, but that doesn't necessarily get you to the point, so you necessarily scale down.
Most of the systems in the current day, most of the systems across the world ... Actually, I wouldn't say current day; a lot of them are switching. I would probably say about two years ago they were heavily rule-based. There were many different systems that were extremely rule-based, which were trying to identify fraudulent transactions, money laundering, or any of these malicious activities using rule-based systems. The problem with these rule-based systems was that you could very easily skirt around them and you could miss the detection, which means that it was very, very easy for you to get through the system.
The way we evolved it is we took the rule-based systems, we built feature sets, we built the model [inaudible] parameters, and then we used classification checks. For the classification checks we actually put a human being in the system to verify whether a certain set of classifications was doing a good job or not. We used those techniques to rebuild much better models, and essentially we are at a point where we have done the classification with ML models using these techniques, so from pure rule-based systems we have moved to a classification model.
Now, the next approach that we have built is self-discerning models: models that are smart enough, or intelligent enough, to actually tell us what kind of fraudulent activity is going on, and for this we don't necessarily need classification checks. We have built enough learning that comes out of existing models, but we still use the features, the set of features that we spoke about. We still use parameter tuning to tune the model to the [inaudible] risk factor and best specification, and we've come to a point where we have models that can actually make a decision on their own. I wouldn't say they are super intelligent. I would say they are intelligent in a very narrow way, very humbly put. They probably can't do anything else other than identifying, say, 10 classes of fraudulent activity, or saying it doesn't belong to any of these 10 classes, it could be one more new class of fraudulent activity that I am not able to classify but that you need to take a look at. Essentially that's where we are right now. Maybe we'll see some more development in the future.
Yeah, having said that, I think I'll probably stop here and open it up if you have any questions, comments, anything at all. Thank you. Yes?
Audience Member:
Two questions. One, are there certain combinations of integer values, not necessarily, obviously, a huge amount or a very small amount that could be anomalous, but actual numbers that you just don't see in combination or permutation very often as a transaction, that seem anomalous?

Ashrith Barthur:
I'm sorry, I think I missed it after the ...

Audience Member:
So an example would be, you'll see a lot of different transaction values like 99 cents or a dollar or something like that, but what if it's a combination of numbers that you just don't see very often as a value?
Ashrith Barthur:
Okay, so that I think is a fantastic question that leads us to some of the features that we built. We also look at sequences of transactions. Let's say you usually do 99-cent transactions quite a lot and then you suddenly do a thousand-dollar transaction, or let's say even a hundred-dollar transaction. That tends to create a feature which says, hey, you're seeing a sequence of transactions which is very interesting, this needs to be a feature, so let's add that in as well. We do tend to add that, yes.
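A sketch of that kind of sequence feature, assuming a per-account transaction stream with hypothetical columns account_id, timestamp, amount: the feature is simply how far the current amount sits from the account's own recent history.

```python
import numpy as np
import pandas as pd

def amount_surprise(txns: pd.DataFrame, window: int = 50) -> pd.Series:
    """Rolling z-score of log-amount against the account's own history,
    so an account that usually makes 99-cent purchases lights up when a
    $1,000 transaction suddenly appears in the sequence."""
    txns = txns.sort_values(["account_id", "timestamp"])
    log_amt = np.log1p(txns["amount"])
    grouped = log_amt.groupby(txns["account_id"])
    # shift(1) keeps the current transaction out of its own baseline.
    mean = grouped.transform(lambda s: s.shift(1).rolling(window, min_periods=5).mean())
    std = grouped.transform(lambda s: s.shift(1).rolling(window, min_periods=5).std())
    return (log_amt - mean) / std
```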
Audience Member:
Then the other question is, are there certain things that are fraudulent, or fraudulent trends, that you train on and may catch early on, but then you're not training on those anymore because they're less prevalent?

Ashrith Barthur:
Okay, yes. We tend to face that problem, but most of the institutions that we work with are not necessarily completely proactive. Partially proactive is how I would put it, which means that we still retain trends that were there yesterday, but we still make some models oblivious to the trends that were there, so you could have two different models, where one is essentially just trained to find something interesting while the other one is trained to figure out what trends this fits into. Yes. Yeah?
Audience Member:
Very early in the talk you mentioned concerns around latency and keeping that down. I'm curious, what are some of the things that end up being most challenging from a latency perspective, and how do you think about addressing those?
Ashrith Barthur:
One of the biggest problems that we face when we're addressing latency is having a comprehensive set of data but not being able to bring it together to actually make a decision, and if I were to boil it down to a very simple problem, it's the problem of joins between many different tables. How do I bring it together? Initially what we used to do is join everything together at line speed, or just before something is going to happen, and then use all the features to make a decision. But now what we do is we have an initial set of models that tell us what class of problem this might be, what class of fraudulent behavior this might be. That means there is a selective set of features that actually get joined, not necessarily all the tables, and that helps us make a decision, so that cuts down the amount of time it takes for us to process things at line speed. But that does not mean we are there yet.
After that, the next step would actually be a curated set of features. For example, some of the models that we have built have about 1,700 features. They don't all make sense when we are making the decision. We use just about 100 features. We bring it down to a much smaller level to be able to make the decision much faster. That reduction is also a place where your risk team comes in and tells you, hey, you've brought the features down to 110 but the risk has increased by a certain factor, and whether we can take that or not is a discussion that actually has to happen.
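One way to picture the gating idea he describes is a first-pass model that decides which feature tables are worth joining for a given transaction before the heavier model scores it. This is a hypothetical sketch, not H2O's implementation; the class names, feature groups, and callables are all invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from the first-pass model's coarse class to the
# feature groups worth joining at line speed (illustration only).
FEATURE_GROUPS: Dict[str, List[str]] = {
    "card_fraud":       ["account_baseline", "geo"],
    "account_takeover": ["account_baseline", "device"],
    "laundering":       ["network", "interactions"],
}

def score_at_line_speed(
    txn: dict,
    gate: Callable[[dict], str],                        # cheap model: coarse class
    scorers: Dict[str, Callable[[dict, dict], float]],  # heavier model per class
    feature_stores: Dict[str, Dict[str, dict]],         # precomputed features by group
) -> float:
    problem_class = gate(txn)
    features: dict = {}
    for group in FEATURE_GROUPS[problem_class]:
        # Only the relevant tables are looked up, keeping the join cost low.
        features.update(feature_stores[group].get(txn["account_id"], {}))
    return scorers[problem_class](txn, features)
```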
Audience Member:
I guess a follow-up question to that would be, is there some sort of offline model where [inaudible], maybe instead of having a second of latency you have as long as you want for some [inaudible] job where you go back and revisit these?
Ashrith Barthur:
Yes. One of the other things that we do is, this is an online model, I wouldn't say online, it's near line speed. We also have another model which comes through after the data comes into the system and we know it's been classified in a certain way. That model builds all the features that we actually put into the model and then reclassifies the thing as right or wrong based on what was classified earlier, so if there is a difference, it gets fed through a manual handler who puts it back in the system and says, "This is where your classification was wrong." At the end of the day, or the end of the week, a new model build will be triggered. Yeah?
Audience Member:
I have two questions actually.

Ashrith Barthur:
Please, yeah.

Audience Member:
For this, just automating the [inaudible], so before [inaudible], how do you translate learning into your models [crosstalk]? Second part is, what [inaudible]?
Ashrith Barthur:
Okay, so one of the things that we have done is try to understand how people make decisions, like investigators who have made decisions; we try to see what the important things are on which they have actually made decisions. We don't necessarily take every aspect, or every type of, I would say, assistance that they've used, make it into an object, and bring it into the model. We use things that can actually be represented numerically or in a classificational way, and that's essentially what we bring in.
If someone new is actually getting into this, into the investigative field, I would say it would behoove that person to try and understand how a model actually classifies, because there is a new problem coming out of this, and I don't know if a lot of people know about it: wrong classification. Of course the model is not perfect, so a lot of the things that have been wrongly classified we randomly pick out of the set of classifications and send to manual handlers. If these manual handlers know how things work, and [inaudible] of course knows how these things work, they can come back and say this is wrong or this is right because of a certain feature that we put into the model. That would be an extra set of skills that would be very valuable. Yes?
Audience Member:
Yeah, my question is on the classifying part, [inaudible] of features, especially the network features. I just wanted to ask in general how you guys generate the features: do you have different classifiers that classify the same, for example [inaudible]? Do you just push a set of transactions to a classifier and it tells you is it [inaudible] and you use that as a feature [inaudible], or do you have one model and you push all the transactions [inaudible]?
Ashrith Barthur:
We have one model that actually does this for us, but the network features that you talk about, the network doesn't necessarily change at line speed. It changes at a much slower pace, so we tend to precalculate this well in advance, like at the start of the day, for example. We tend to recalculate this every 24 hours, or every week, and we keep it ready. Sometimes there are places where we actually recalculate this every hour, which means that there is a set of functionality happening at another place where a new set of features is coming in, but some places are very comfortable having these network features for a day or for a week. Yes?
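A sketch of that pre-computation pattern: slow-moving network features are rebuilt on a schedule and the line-speed scorer only does a lookup. The cache layout and refresh interval here are assumptions for illustration.

```python
import time
from typing import Callable, Dict

class NetworkFeatureCache:
    """Recomputes slow-moving graph features on a fixed schedule and
    serves them from memory so scoring stays at line speed."""

    def __init__(self, build_features: Callable[[], Dict[str, dict]],
                 refresh_seconds: int = 24 * 3600):
        self.build_features = build_features   # e.g. a batch graph-feature job
        self.refresh_seconds = refresh_seconds
        self._features: Dict[str, dict] = build_features()
        self._built_at = time.time()

    def lookup(self, account_id: str) -> dict:
        # Refresh lazily once the cached features are older than the schedule.
        if time.time() - self._built_at > self.refresh_seconds:
            self._features = self.build_features()
            self._built_at = time.time()
        return self._features.get(account_id, {})
```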
Audience Member:
[inaudible].

Ashrith Barthur:
Please, yeah.
Audience Member:
Let's say you classify something as [inaudible] or whatever the classification you're doing, and if I ask you why did you say it's fraud, how do you go back and infer what feature was responsible for that? How do you answer that question?
Ashrith Barthur:
Fair enough. If you actually look at it ... Let's take any of the features that we're using. Let's say there was some individual feature; I'd probably take network features, because those are some of my favorites. You're looking at some of the network features out here, and some of these features actually come by and tell you that there is a certain reason why your transaction is classified as fraud. Now, look at convergent or divergent, for example, or look at, let's say, lock level, or recent change in behavior. These are intuitive features. These are not just combinatorial features that you build but don't necessarily understand.
These are features that you can actually take to your business and say, "Hey, look, there are a set of entities in this network that have a convergent behavior, which is one of the reasons why, based on the network part of the features that we built for this model, it tends to say this is fraudulent." Or, for example, a recent change in behavior: the number of entities could have increased, which essentially says that because of this there is a complete change in how the volumes are moving in this network. That is a true feature that you can take to the business and say, "Look, you understand what this is, and our model is saying that this is an important feature," and that's essentially one of the explanations you can give.
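A small sketch of how that kind of explanation can be assembled: put the flagged entity's intuitive network features next to the population baseline so a business user can read it directly. The feature names here are hypothetical.

```python
import pandas as pd

def explain_flag(features: pd.DataFrame, flagged_id: str,
                 intuitive_cols=("fan_in", "pass_through_ratio", "new_entities_30d")):
    """features: one row per entity, indexed by entity id, with the
    hypothetical intuitive columns above. Returns a small table showing
    the flagged entity's value next to the population median."""
    rows = []
    for col in intuitive_cols:
        rows.append({
            "feature": col,
            "flagged_value": features.loc[flagged_id, col],
            "population_median": features[col].median(),
        })
    return pd.DataFrame(rows)
```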
Audience Member:
Basically there is a human being that is sitting behind the model who is going through your 110 features?
Ashrith Barthur:
No, there's no human being per se. There is a human understanding to it. This is not just an x, and when you figure out why your model is classifying something as fraudulent, it's not saying "because of x." You actually understand what x is, and that gives your business a legitimate reason to figure out, oh, okay, maybe we have a problem here, maybe we have a lot of money laundering working through our financial organization, and that gives you an intuitive idea of what's happening. But if [inaudible] is convergent and I just calculated a feature ... Let's say I blindly built features, all possible combinatorial values that I've got, I just built features, threw them into a model, and made predictions.
How do you explain to anyone what that is? You would only be able to explain, from a very experimental standpoint, that this feature results in certain behavior, but can you say that that feature is consistently going to predict all the other behaviors? Maybe not. Which is one of the reasons we focus on building features that are more intuitive and understandable for a human being as well as for the models, because you also have to understand that fraud and money laundering and all these things have an element of legality associated with them, which means that in a court of law you will actually have to be able to prove that this thing is what it is. I can't just take feature x and say, "This feature x said this is fraudulent, hence we are filing charges against you." That doesn't necessarily happen.
Audience Member:
So you basically want to build into the features [inaudible] with [inaudible]-
Ashrith Barthur:
We want to build it into the features. We don't necessarily want to analyze them, but if there is a point where you need proof of why we classified a certain behavior, there is an intuitive feature for you to use. That's essentially how it is. Yes?
Audience Member:
So does that mean dimensionality reduction techniques are a no-go?
Ashrith Barthur:
Yes. We do not go towards that at all, unless and until we are in such bad shape that none of the features we are building make any sense. We have never ended up in that situation, and the reason is that what we're looking at here is malicious behavior, which means that there is a human element in it. Human beings are the only ones who do this. Systems don't do it, and every other living organism probably doesn't do it. I don't know. I don't want to say that.
Audience Member:
[crosstalk].
Ashrith Barthur:
Fair enough. Yeah, see, there you go, which is why I said I don't want to say that. We know humans act in a certain way, which is one of the reasons that if we build features that can identify and classify these behaviors, it's comfortable for us to explain it. Yes?
Audience Member:
Do you incorporate different costs for false positives versus false negatives?
Ashrith Barthur:
Yes, we do. Yes.
Audience Member:
How do you think about the cost of a false positive these days?
Ashrith Barthur:
There are actually many different cost factors that we associate with it. One is for the false negative, which is actually much higher than for the false positive. Our models tend to be ... We usually try and go for the [inaudible], try to reduce false negatives as much as possible, but the cost factor for a false positive is high enough that we don't necessarily run into any kind of trouble, so that's essentially how we design it.
Audience Member:
So you're not just outputting a confidence that the client will then consume and interpret as they want, you're giving an actionable-
Ashrith Barthur:
Yes, yeah, and there is a variable component to the cost factor as well. That variable component comes from certain risk coefficients that are associated with certain primary features. For example, if the transaction amount is super high, we tend to assign a reasonably high cost factor to it, so there is a variable component to it as well.
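A sketch of one way to encode that asymmetric, partly variable cost in training, assuming scikit-learn: false negatives are weighted more heavily than false positives, with an extra weight proportional to the transaction amount. The 10:1 base ratio and the amount scaling are assumptions, not figures from the talk.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_cost_sensitive(X: np.ndarray, y: np.ndarray, amounts: np.ndarray):
    """y = 1 for fraud. Missing a fraud (false negative) is costed higher
    than a false alarm, and large transactions carry extra weight."""
    base_weight = np.where(y == 1, 10.0, 1.0)      # assumed 10:1 cost ratio
    variable = 1.0 + amounts / amounts.mean()      # bigger amount, bigger cost
    sample_weight = base_weight * np.where(y == 1, variable, 1.0)
    model = RandomForestClassifier(n_estimators=200)
    model.fit(X, y, sample_weight=sample_weight)
    return model
```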
Audience Member:
So, when you say cost, you mean the cost matrix?
Ashrith Barthur:
Yes, so basically if the model is predicting a lot of false positives, or if it's ... Essentially what we do is we penalize the model, and that's the cost factor the gentleman was talking about.
Audience Member:
Just curious, [inaudible] but you also customize it as per your needs?
Ashrith Barthur:
No, no. [inaudible] is always what is used, because we do not want to be an organization where we miss things, so we try to reduce false negatives as much as possible. We use [inaudible], but even then we have a larger cost factor for everything that the model gets wrong, and that's essentially what we're tuning on.
All right, any other questions? If not, I think ... okay. We are at 7:40, so maybe it's time to ... Yeah. Thanks, guys.