
Meetup: Detecting Money Laundering Networks Using Machine Learning

Ashrith Barthur:

One of the things mentioned is that we are actively working in the area of money laundering as well. That's one of the areas I've been working in, from cybersecurity to any kind of Fintech fraud that involves anomalous behavior. I like to classify it as narrow because there are different kinds of money laundering, which could include the backing of terrorism, human trafficking, and even, for example, tax evasion. All of these fall under money laundering in terms of how the activity is carried out. This is a sliver of the problem that we are working on, and we have actually developed a solution. That is something I'll talk about and showcase as well.

Money Laundering False Positives Problem

What essentially happens while you're trying to detect money laundering is that you have a very high false positive rate in the alerting systems that exist right now. Having said that, I just want to quickly pause and see how many people are actually working in Fintech, trying to apply ML in Fintech. Fair enough. Great. Some people are halfway there. Maybe you're still developing a model. Fair enough. One of the quick things I want to talk about when it comes to money laundering is the solution itself. What we saw when we were working on this problem was a huge number of false positives in the system. Currently there exist different kinds of rule systems which generate a lot of alerts, and a lot of these alerts are not good in terms of quality. Meaning you have anywhere from about 70% or 75% up to about 99% false positives.

Some financial institutions don't necessarily tune these systems well, which is where you tend to get these really large false positive rates. When we started engaging with this, we thought it was a one-off problem we would be working on. A quick point as well: if anyone has a burning question, please, I'm happy to be stopped right there. You can go ahead and ask me. It's totally fine. Let's keep it easy. Having said that, we thought this was a one-off problem we were solving when it came to applying what we know. It turns out this is a systemic problem across many financial institutions, where rule-based systems don't necessarily give them high-quality alerts.

We actually built a solution and we've deployed it across quite a few financial institutions around the world. Simply put, what we've designed is an end-to-end machine learning-based false positive reduction system. It takes the existing alerts from the alert generation system and classifies them, using supplementary information to do the classification. We'll go through it, and I'll even show you how it works.

Money Laundering Schemes

Having said that, is anybody still new to the concept of money laundering, or does somebody not know what money laundering is? Simply put, the idea of money laundering is that you have money that comes from illegitimate sources. You need to move the money that you procured from these illegitimate sources into the financial system so that you can use it for legitimate purposes. Essentially that's what it is. People do a lot of interesting things to accomplish this. I'm sure a lot of people have seen the show Breaking Bad; it shows you, very entertainingly, how to do it. It doesn't necessarily happen the same way in practice. Anyway, having said that, the typical process is broken down into three parts: the first is placement, then it follows up with layering, and third is integration.

Placement is the first step in the money laundering process. This is when you are actually bringing in money that you've obtained from illegitimate sources and trying to get it into the system. It could be money you've made from selling drugs or trafficking human beings around the world. It could be money funding interesting behavior around the world: terrorism, sleeper cells, any of those things might come in as well. It could also very simply be money that you haven't reported to the government. That is the money you're bringing in, and that is what you need to place into the financial system. Placement is about where you deposit the cash into the financial system. You might deposit it as one large payment, or you hand it over to hundreds of people in your village and tell them, "hey, come over and deposit it into one account."

That is a form of placing money into the system, called smurfing: you give small chunks of money to hundreds of people and ask them to put it back into the system. There are different ways in which you can actually place money back into the system. That's essentially your first step. The next one is layering. This is where you try to lose your trail. When I say lose your trail, what you're trying to do is move the money across multiple accounts, into different instruments and out of different instruments, to make sure that the government or the financial institution tracking you can't track you anymore based on how you moved the money. This is one of the reasons why a lot of money tends to end up in places like Cyprus, Malta, the Maldives, and a lot of islands around the world, which have very convenient laws about not disclosing where money is coming from or where it's going. Switzerland has the same thing. There's Liechtenstein, and I think there's one more popular place as well.
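The smurfing pattern just described, many sub-threshold deposits funneled into one account, is easy to sketch as a detection rule. This is a minimal illustration with made-up thresholds and field layout, not the system discussed in the talk:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical parameters -- real systems tune these per institution.
THRESHOLD = 10_000        # reporting threshold (USD)
WINDOW = timedelta(days=7)
MIN_DEPOSITS = 5          # how many small deposits look suspicious

def flag_structuring(deposits):
    """deposits: list of (account, timestamp, amount) tuples.
    Flags accounts with MIN_DEPOSITS sub-threshold deposits in any WINDOW."""
    by_account = defaultdict(list)
    for account, ts, amount in deposits:
        if amount < THRESHOLD:            # only sub-threshold deposits count
            by_account[account].append(ts)
    flagged = set()
    for account, stamps in by_account.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # slide the window start forward until it spans at most WINDOW
            while stamps[end] - stamps[start] > WINDOW:
                start += 1
            if end - start + 1 >= MIN_DEPOSITS:
                flagged.add(account)
                break
    return flagged
```

A rule like this is stateless per window, which is exactly the limitation the talk attributes to commercial rule engines.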

Audience Member:

Panama.

Ashrith Barthur:

Oh, Panama. Oh yes, of course. Thank you. Fantastic. Once that's done, your money tends to move around these islands and these places, and eventually the financial institutions trying to track it lose the trail. Then what you do is bring it back by some interesting means. It comes from one of your cousins who had written a will: he was a millionaire who suddenly passed away, and in his will he left you your money. It just comes back into the system and you get millions of dollars. That's one way of doing it. There are many ways of bringing it back into the system so that it becomes legitimate.

Then you go ahead and buy things, do whatever you want with that money. Most financial institutions are actually trying to identify money laundering at this point. There are systems that try to detect if layering is happening, or if some kind of endpoint integration is happening with the money. But most of the work is financial institutions trying to find placement. This is the fundamental part on which almost all rule-based systems work. Yes, please.

Audience Member:

How about Bitcoin?

Ashrith Barthur:

Oh, Bitcoin is something they still have not worked out. Bitcoin is still very easy to use for money laundering. You and I could do it as well.

We have different rule-based systems: FICO, Fiserv, SAS, Actimize. These are all rule-based systems. They actually have fantastic rules, and they try to catch different behaviors; they're usually very successful at it. One of the problems is that they look for upticks, downticks, switches, or cutoff points. They also don't necessarily look for a lot of stateful behavior; it's usually stateless behavior, which is one of the reasons they tend to fall short. The other thing that happens is that a lot of financial institutions tend to run these rules in default mode. You are supposed to customize these rules for your specific purpose, and a lot of financial institutions don't necessarily do that.

They run it in default mode, which means they end up getting a lot of false positives. Adding to that, one of the reasons the quality of these alerts is still bad is that the process of working with these alerts is still manual. There is still a lot of human intervention. Investigators sit through these alerts trying to figure out what is good and what is bad. Many financial institutions are much closer to the 99% end than the 70% end. These systems cost millions of dollars, and they still end up with about 99% false positives, which means there is only a very small margin of true positives that they can actually work with.
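To make that margin concrete, here is the arithmetic with the rough figures from the talk (about 3,000 alerts a day and a 99% false positive rate; both figures are illustrative):

```python
# Illustrative arithmetic only: at a 99% false positive rate,
# the investigators' daily true-positive yield is tiny.
alerts_per_day = 3000
false_positive_rate = 0.99
true_positives = round(alerts_per_day * (1 - false_positive_rate))  # 30
precision = true_positives / alerts_per_day                          # 0.01
```

So out of 3,000 alerts, roughly 30 are genuinely suspicious, and every one of the other 2,970 still consumes investigator time.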

That essentially means they have to go through a lot of garbage, which becomes a big problem for them. As I was saying, the rule-based systems are slow; it takes some time for them to actually understand what the behavior is. One of the things we have seen is that different geographical locations tend to have endemic behavior in terms of how money laundering actually happens. For example, the way we see money laundering happen in Asia doesn't necessarily happen in Europe, and the way we see it happen in North America doesn't necessarily happen in Asia. One simple example: in Asia you tend to see high volumes and small amounts, which is one of the most common behaviors there. In Europe it's actually the other way around.

You see high value and low volumes. In America, you can actually break the sector apart: among high-value customers you tend to see the European pattern, and among low-value customers you tend to see the Asian pattern. So it's a mix of both. This is one of the reasons why the behavior can be quite endemic, and that's not necessarily captured in the rules module, which means you have to tune it and make it work for you. The rules also have a good number of gaps, meaning there could be different ways in which money laundering is actually happening that the rules don't capture, which tends to be a problem too. At some level it requires experts who understand these rules to come and help you fix them. That makes it very limited in terms of how you can work with it. So that's a problem as well.

Rule-Based Model

Having said that, this is the current workflow. This is how any financial institution that's using a rule-based system actually works. You get an alert that comes in, and you have an investigator who evaluates the alerts. They use different sources of information. They use LexisNexis to try to find out if you have any criminal records or that kind of information. There are lots of other systems that give you KYC information, and different countries have different sources of information they can use.

One of the cool things about the Netherlands is that banks can actually share KYC information with each other. There is a very limited scope to what they can share, but it's very interesting. You can share the volume of transfers that a person does, how many times the person transfers: those kinds of things can be shared between banks. It's a really beautiful thing, and it doesn't exist anywhere else, by the way. Elsewhere, you are left to figure things out based on whatever aggregated information you have. In terms of the analytical data the investigator looks for, LexisNexis is one such database, and then there's your KYC information, transactional data, card data, and collateral.

Any of these things are usually valuable. Once this is done, the investigator is able to classify an alert as suspicious or not. This essentially becomes your ground truth, and it's probably the most valuable thing for training a classification model. So the investigator goes through this whole process of evaluating the true positives and the false positives, trying to figure it out. You still have this huge volume of alerts they have to go through. There are financial institutions with about 300 to 350 investigators going through this data, and if there are 3,000 alerts a day for them to process, that becomes a big problem as well. The workload is one of the big issues.

False Positives

Audience Member:

You keep talking about the false positives, right? What about the false negatives? If it's real money laundering and the system misses it, or are these rules rigorous enough that it will definitely be caught?

Ashrith Barthur:

That's actually a fantastic question. So what do we do? The solution we have actually works post-alert, which means that if something has been missed, which is a false negative, then it is gone. You have lost that information, at least at this point in time.

What Does the Anti-Money Laundering Solution Do?

Essentially, what the solution does is build some amount of consistency into the whole process and reduce the false positives that we have. The way we have designed it, it is strategically placed between the AML system and the investigator. It doesn't have to make any modifications to the rules; it takes as input the output of the rule system, and it classifies an alert as a false positive or a true positive, just as an investigator would, using a machine learning approach. The modeling approach we take uses a good set of features that are specifically designed to find anomalous behavior in financial transactions.
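The placement of the model between the rule engine and the investigator can be sketched roughly like this. Everything here is hypothetical: `score_alert` is a stand-in for the trained classifier, and the field names are invented for illustration:

```python
def score_alert(features):
    # Stand-in for the real supervised model: a toy heuristic.
    # A deployed system would call a trained classifier here instead.
    return 0.9 if features["prior_sars"] > 0 else 0.1

def triage(alerts, enrich, threshold=0.5):
    """Route rule-engine alerts so only likely true positives
    reach the investigator queue; the rest are suppressed."""
    queue, suppressed = [], []
    for alert in alerts:
        features = enrich(alert)          # join in KYC + transaction history
        p = score_alert(features)
        (queue if p >= threshold else suppressed).append(alert["id"])
    return queue, suppressed
```

The key design point from the talk survives even in this toy: the rule system is untouched, and the model only re-classifies its output.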

Advantages to AML System

Where does this lead you? What are the advantages this gives you? Speed is one of the biggest things you can get. Usually the time it takes for an alert to be processed is anywhere between 45 and 90 days, sometimes even close to six months. In this case, it reduces the time down to a few seconds. It also reduces human inaccuracies. One of the studies we did looked at how investigators approve alerts across days, under different temperature conditions, and at different times of the day. We tend to see investigators approving faster as they approach the evening and much more slowly in the morning. This essentially tells us that they either have a target to meet or they just want to go home.

That's the interesting thing we saw. For example, with the temperatures we were looking at, across days where the temperature changed, what we also saw was that on cold days things are usually slow and on warm days they're usually very fast, probably because they want to go out; they don't necessarily want to be inside on a warm day. It also reduces a lot of the personnel you would otherwise put in. Yes?

Audience Member:

Can you give a sense of how many alerts each investigator handles on a daily basis?

Ashrith Barthur:

On a daily basis, they deal with anywhere between 40 and 60 alerts. I would say anything beyond 50 is usually on the higher side. One of the things that happens with the modeling and with the features we've built is that it tends to fill in the gaps; it helps us identify behaviors that we didn't necessarily see before. Yes, please.

Audience Member:

Since everything is rule-based, what does the investigator know that the rules cannot quantify?

Ashrith Barthur:

Yeah, fair enough. One of the things that's happening is this. Let me go back to this one. You see a lot of false positives being generated by the system. For example, I could have sent money to my brother, something beyond $10,000. One of the rules in the United States is that if a transaction is above $10,000, it has to be flagged as a potential money laundering situation. That's a false positive, because I probably had a legitimate reason to send it to my brother, unless I was laundering money. Now the investigator has to go through these systems. He'll probably call someone up, or check some out-of-band sources. Sometimes he'll look through a system of record that tells him what it is. For example, he'll have the ability to look through a memo and ask, what is going on here? Maybe he's transferring to a sibling or something. Essentially, that context helps the investigator. The rule system doesn't necessarily know that.
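The $10,000 rule described above is stateless, which is exactly why it over-alerts. A toy version of it, next to a hypothetical contextual check of the kind an investigator applies manually (both the field names and the check are invented for illustration):

```python
# Illustrative threshold rule: fires on any transfer over $10,000,
# regardless of who is involved or why.
CTR_THRESHOLD = 10_000

def threshold_rule(txn):
    return txn["amount"] > CTR_THRESHOLD

# The investigator's extra context (memo text, known relationship) is
# exactly what the rule cannot see. A hypothetical contextual check:
def looks_benign(txn):
    return (txn.get("memo", "").lower().startswith("gift")
            and txn.get("counterparty_relation") == "sibling")
```

The gap between these two functions is the space the false-positive reduction model operates in.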

The other thing is the way we have designed and built the solution. We've built it as a supervised problem, which means we are looking for labels that already exist from an investigator's approvals. There were different ways in which we did the label study. We looked at investigators at different levels, the first level, second level, and third level, trying to see if we could aggregate this information in a certain way to actually build the labels. Of course, we also look at historical alerts, just to make sure, and that's what we build the model on.

Data Sources Required by the AML Solution

There are three sources of data that we need: the AML alerts, the transaction information, and KYC. Everything else you give is icing on the cake. So that's essentially how we go about it. Having said that, I'll quickly switch.
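Flattening those three required sources into one row per alert might look roughly like this. All field names are invented for illustration; real schemas differ by institution:

```python
def flatten(alerts, transactions, kyc):
    """Join AML alerts, transaction history, and KYC data into
    one flat feature row per alert (illustrative schema)."""
    # Index transaction amounts by account for quick rollups.
    txn_by_acct = {}
    for t in transactions:
        txn_by_acct.setdefault(t["account"], []).append(t["amount"])
    rows = []
    for a in alerts:
        amounts = txn_by_acct.get(a["account"], [])
        rows.append({
            "alert_id": a["id"],
            "typology": a["typology"],
            "txn_count": len(amounts),          # simple transactional rollup
            "txn_total": sum(amounts),
            "customer_risk": kyc[a["account"]]["risk_rating"],  # KYC field
        })
    return rows
```

Anything beyond these three joins, card history, collateral, loans, is the "icing on the cake" mentioned above: extra columns on the same flat row.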

Audience Member:

May I ask a question?

Ashrith Barthur:

Yes, please.

Why are Banks Approaching This Problem?

Based on what you explained, what you do is take whatever procedures are already there, train an ML engine on them, and then use it. But you don't go into the fundamentals of money. What is money laundering? How does the government get involved? They do get involved; good countries, bad countries, I could give you a big list myself. And where does the big money go? Not the small money, not $10,000 or $50,000, but the hundreds of millions of dollars given by American banks to bad banks; that routinely happens. Lots of money is printed, on the order of 4-5 trillion year after year, by the USA, by China. Nobody ever knows China's real numbers. Japan and so on. You are not even touching all of that. You're just taking what's already there and applying standard machine learning.

Ashrith Barthur:

Approach. So this is.

Audience Member:

I'm not trying to belittle what you're doing.

Ashrith Barthur:

No, of course not. I didn't even take it that way. That's totally okay. We essentially work within the system, meaning we work with every piece of information the financial system has. We are addressing the problem only from the financial system's perspective. We are solving a problem that they have, not the problem of money that they themselves cannot even see coming into the system.

Audience Member:

They intentionally do it. Can you give an example?

Ashrith Barthur:

What I'm saying, and this is why I opened with the idea that this is a very narrow problem we are solving, is that the solution can identify things for which we have data. If there is data suppression, if there is no data, then there is very little we can do, whether the suppression comes from an agency or anyone else who doesn't cooperate. Thanks. Any other questions? I'm going to quickly jump to a demo. Any other questions, maybe? Yes, please.

Audience Member:

To add to that, I work in this space for a company called Edzi. We also do this; we have an AML solution. The reason is, you're right, the banks don't necessarily want to catch money laundering, or they're willing to look the other way. Right? It's great: money's being put into their system, and that's what they want. Not that they're evil or whatever, but they would go on for a long time looking the other way. In the last maybe 5-10 years, regulations have come down on these banks. From a fines perspective and a reputation perspective, it's become a much bigger deal to catch these things. And I know, because we are trying to solve the same problem, that a big part of this is being able to say, "hey, I did everything I could," so the regulators will leave you alone. Of course you're right: it's not something the banks necessarily want to do. It's not like fraud, where they want to catch it because of fraud losses. It's something that harms their reputation or that they're getting fined for.

Ashrith Barthur:

Here they're accountable for something that isn't necessarily a loss for themselves. Because of the stricter regulations now, I think there are bigger fines coming in, which means they automatically have to be more rigorous in how they manage the whole system. Yes. Thank you. So, a question. Yes, please.

Alert Generation vs Filtering

Audience Member:

So from what you said, the first filtering is done by the rule base, and then your solution is built on top of that?

Ashrith Barthur:

So is that your question? I wouldn't call it filtering. I would call it alert generation, because filtering implies some kind of curation of the data. This is merely, "oh, what I find is suspicious." Something that's just interesting could also be flagged as suspicious by the system; the system pops it out because it can. What the model does is actually look at the data, which means it's looking at your transactional history, your card history, your KYC background, and your historical alerts as well, trying to find out, "hey, is this actually a valid alert or not?" You could say it's like a very mini brain on top of the system. I wouldn't want to say it's a big brain, because it's a nano-problem that we solve. It's like a smarter system that sits on top of it. Thank you.

Driverless AI

Currently what we have. Can you guys in the back actually see this clearly? Quick question: how many of you are familiar with Driverless AI? Driverless AI is actually our flagship product, which automates the modeling process and feature engineering as well. In this case, to build the AML solution, we've used Driverless AI as the modeling engine to do the whole process for us. Give me one second. Let me see if I can. I did a very interesting thing: I don't have my internet here. Give me one second.

Audience Member:

Can I ask a question?

Ashrith Barthur:

A question? Please do, yeah.

Supervised Approach

Audience Member:

Excuse me if I missed this part. So your approach is kind of like a supervised one, or what?

Ashrith Barthur:

Yes. The one you're seeing here is actually a supervised approach. As I'm saying, there are many different problems we solve in the AML space. One of them is the reduction of false positives; that's essentially what you're seeing in this case. Yes.

Audience Member:

Money laundering doesn't happen every day, right? I mean, somebody's depositing money or checks, for example, right? So how do you mine labels, for example, for a supervised learning problem?

Ashrith Barthur:

So that's exactly the point I was making: we use investigative data to help us create labels for these alerts. There are historical alerts which have already been labeled by different levels of investigators. We use that, essentially, to build our model.

Audience Member:

So that means it's the rules that the model is trained with?

Ashrith Barthur:

The model is trained on the alerts, not on the rules behind the alerts.

Labeling Transactions

Audience Member:

At the scale you have, with that set of transactions, can you label all of them?

Ashrith Barthur:

No. So you don't label transactions. There are alerts that get generated because of a transaction or because of a set of transactions.

Audience Member:

Okay.

Ashrith Barthur:

Then what happens is that the alert goes through the process of... let me go back here for a second. This might help. Yes. So what happens is, a transaction or a set of transactions happens, and an alert gets generated. For example, structuring, right? In that case, you would need a set of transactions to detect that structuring is actually happening. Once an alert gets generated by the alerting system, you have an investigator who looks at all kinds of information that he or she can gather about you, to make a decision about whether this is actually structuring or not. There is a manual process involved. Someone makes a decision, and once they do, they classify the alert as suspicious or not suspicious. We take that as the label to build the model on.
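The labeling flow just described, investigator disposition becoming the training label, might look like this in outline. The disposition values are hypothetical; real case-management systems use their own codes:

```python
def make_labels(alert_history):
    """Turn investigator outcomes into supervised training labels:
    an alert escalated to a SAR is a positive (1), an alert closed as
    not suspicious is a negative (0). Anything still open is unusable."""
    labels = {}
    for alert in alert_history:
        # Only alerts an investigator has actually dispositioned count.
        if alert["disposition"] in ("sar_filed", "closed_not_suspicious"):
            labels[alert["id"]] = 1 if alert["disposition"] == "sar_filed" else 0
    return labels
```

Note that alerts without a final decision are simply excluded rather than guessed at, which matches the point above: the label comes from a human decision, not from the rules.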

Minimum Number of Transactions

Audience Member:

So do you have to see a minimum number of transactions, for example? Or is there anything like that before you can say anything, you know?

Ashrith Barthur:

What I would actually tell you is that we don't necessarily have a data problem in what we do, because we are trying to identify behaviors from many different vantage points. That means we artificially enrich the data merely through the different vantage points we look at it from, which gives us a lot more valuable information. We do not do sampling, because it tends to affect the whole modeling process in a very negative way. Sampling is something we very dogmatically avoid. Okay, so people in the back, how bad is it? I know it's bad, I just want to know how bad. Okay, let's do it this way.

Demonstration

That helps. Okay. Essentially what I'm showing you right now is a flattened data set. This data set has your AML alerts, your KYC information, and your transactional aggregates, or a transactional sample with certain features, built into one flat table. I'll give you a few examples of how we look at the data. By the way, this is synthetic data. We have not taken it from any financial institution, so please do not freak out: we do not have anybody's information, although the distribution is very similar to a lot of data sets that we've seen. What do you look at? You're looking at someone's account number. You're looking at what day it is.

You're looking at the line of business. By the way, the moment you see a line of business, I think that probably rings a bell for a lot of people. We are looking at retail data right now; in this case it's retail banking, not wholesale or private banking. Then you're looking at the typology. The typology is essentially the kind, the family of alert that an alert belongs to. For example, someone mentioned structuring; it could be tax evasion, or it could be fast cash withdrawals. Any of these things tend to fall under typology. Under these you have rules, subrules, and so on, which tell you: this triggered because this person moved $5,000 or $10,000. Sometimes it could trigger because someone who is under 18 moved $2,500.

The Target and the Case

These are all subrules that fall under a typology. Having said that, there are two key things you have to look at when you are trying to build a model on this AML data: something called the target and something called the case. I'll go to the case first and then come to the target very quickly. The case is the first two levels of evaluation of an alert, where the investigators in a financial institution are saying, "hey, this seems like a case. I wouldn't yet send this to the government for further investigation." That's when a lot of investigators mark an alert as a case; they just change a flag. That's answering your question.

Let's say a senior investigator looks at it and says, "hey, this actually seems like something we can send to the government." What you send to the government is called a suspicious activity report, or SAR. That's essentially what the target identifies: if something is marked as a target, that means it was eventually deemed suspicious. Then you're looking at things like: is it a withdrawal or a deposit? Then you have a lot of account information: your account details, your transaction banking information, the transaction code, the amount, the time of the transaction, whether this was a manual transaction, your telecode and transfer branch. This essentially tells you who is doing this, or who actually handled the transaction.

Then you also have certain KYC information, where you're looking at the number of ATM withdrawals of a customer, the number of ACH credits the customer had, and the number of credit purchases the customer made year to date or month to date. Essentially, what we do is take all this information, aggregate it, and build a flat data set. This data set could also contain your card transactions, if you have card information, and information about your collateral and loans. Actually, one of the most amazing rules that exists in AML is fast repayment of loans. I'm sure you probably know about it.
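The per-customer rollups mentioned, counts of ATM withdrawals and ACH credits over a trailing window, could be sketched like this. The transaction type codes and window length are illustrative assumptions:

```python
from datetime import date, timedelta

def rollup(transactions, as_of, days=30):
    """Count ATM withdrawals and ACH credits for one customer over a
    trailing window ending at `as_of` (illustrative feature rollup)."""
    cutoff = as_of - timedelta(days=days)
    feats = {"atm_withdrawals": 0, "ach_credits": 0}
    for t in transactions:
        if t["date"] >= cutoff:           # keep only the trailing window
            if t["type"] == "atm_withdrawal":
                feats["atm_withdrawals"] += 1
            elif t["type"] == "ach_credit":
                feats["ach_credits"] += 1
    return feats
```

Columns like these are what end up as extra fields on the flat one-row-per-alert table.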

It's like, "oh, an uncle you never knew just came over and paid off your loan." That's actually money laundering, in a certain way. You have these kinds of alerts. You can also look at collateral, your loans, how creditworthy you are or not, and go from there. Having said this, now we'll try to build a simple model with it. Essentially, we have already dropped in the module that builds the features. I'll briefly show you. What we have in this instance of Driverless AI is a sample module which builds features purely for AML purposes. Because we are using Driverless AI, you need to let Driverless AI know which column you're going to build the model on, or your labels, for that matter.

Here we'll choose the target, because we want to see what goes into SARs, or suspicious activity reports. Driverless AI also has some tuning knobs: how accurate do you want the model to be? How much time do you want the building process to take? How interpretable does the model need to be? And so on. These are knobs you can use to tune it. One of the things we insist on when we build this model is something called the F1 score, with the aim of minimizing false negatives as much as possible. One of the reasons. Yes, please.

Audience Member:

So this is a binary classification?

First Step to Tune the Model

Ashrith Barthur:

Yes, this is a binary classification. What we do is take the F1 as our pinning value. The F1 helps you minimize the false negatives. The reason that's important is that every financial institution you work with tends to have an MLRO, a money laundering reporting officer. This officer tends to tell you, "hey, I do not have the appetite to lose 1% of my alerts as false negatives, or 5%." You have to use that as your pinning figure, which means that should be your target, not the false positives or the true positives. Of course, those should also be good. This is essentially the very first step you have to tune the model on. Once you do this, let's pray. So now what it's doing is... hang on. Yes, please. Sorry.
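The pinning idea, choosing the operating point from the MLRO's false-negative budget rather than from precision, can be sketched as a threshold search over held-out scores. This is a simplified illustration (ties in scores are ignored, and the budget handling is approximate):

```python
def pin_threshold(scores, labels, max_fnr=0.01):
    """Pick the highest decision threshold whose false-negative rate on
    truly suspicious alerts (label 1) stays within max_fnr.
    Predict 'suspicious' when score >= threshold."""
    positives = sorted(s for s, y in zip(scores, labels) if y == 1)
    if not positives:
        return 0.0                      # nothing to protect; accept all
    allowed_misses = int(max_fnr * len(positives))
    # Everything below this positive score would be missed;
    # only `allowed_misses` positives sit strictly below it.
    return positives[allowed_misses]
```

Everything scored above the pinned threshold then goes to investigators (or, in some deployments mentioned later in the talk, is filed directly).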

Audience Member:

If you minimize the false negatives, you do that because it means that some human later is going to check it again, correct?

Ashrith Barthur:

Could you repeat your question please?

Does Minimizing False Negatives Increase Potential Cases?

Audience Member:

If you minimize the false negative, it means that you are going to accept a lot of potential cases, because then they are going to be filtered again by some humans?

Ashrith Barthur:

Not necessarily, because we've implemented the outcome of this model in two ways. Some financial institutions say, "hey, I want an investigator at the end of this pipeline." In that case, there is a human being who's looking at the outcome. In another case, financial institutions would say, "whatever comes out of this as true positives, we'll just file them as suspicious activities." There are both ways of doing this. What essentially is happening right now? Let me try and zoom this out. What is happening right now is we are building a model based on this data set, which had alerts in it. Because we have pinned ourselves to the false negative scores, we are trying to get as good a value as possible. You can kind of see here: we are trying to minimize the false negatives as much as possible. That's essentially what's happening out here. You can see the features that have actually gone into the model. We have many different features that come out of the alert activity and the transactional activity. Some of them come from your KYC information. These features add value and build your model.

How the Pipeline Works

Having said that, this is the whole modeling process that we are going through. This solution works with Driverless AI as a backend. Let me just switch the slides for a second while this model is running and tell you how the whole pipeline works. Okay. Can the people in the back see this? Is it okay? No, maybe not?

Okay, let me go with the whole thing. Maybe that'll help. What we have here is: the solution has a set of pipelines within itself. It's got its own database, so when you're putting it into a financial institution you have daily uploads of data that come in. You're looking at transaction table uploads, alert table uploads, and KYC information uploads. There's a first rollup set of features where we are trying to build the whole data set into some form of a flat data set so that we can get Driverless AI to ingest it. Once this is done, there is a complicated set of joins in terms of how you join it.

The reason why this happens is because you have to find the alert specific to a day, or a day before and so on, then join it back into its transactions and the KYC information. Once this is done, Driverless AI ingests it. You can build a model, and this model tends to get deployed. Very simply put, we follow the same model in terms of productionisation as well: the same ingestion happens, feature creation happens, you have a flattened data set, but now the pipeline is focused more on scoring. Driverless AI helps you make that switch really easily because it's a one-click deployment of a model into the system. That makes it really easy. Having said that, this is probably like a brief write-up about how we deploy it.
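The rollup-and-join step described above can be sketched with pandas: aggregate transactions to per-account, per-day features, then join each alert to same-day and previous-day activity plus KYC data to get one flat row per alert. The tables, column names, and values here are hypothetical, chosen only to illustrate the shape of the joins.

```python
# Illustrative sketch of the flattening step: join each alert to the
# transactions from the same day (and the day before) plus KYC data,
# producing one flat row per alert. Column names are hypothetical.
import pandas as pd

alerts = pd.DataFrame({
    "account_id": ["A1", "A2"],
    "alert_date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
    "is_sar": [1, 0],
})
txns = pd.DataFrame({
    "account_id": ["A1", "A1", "A2"],
    "txn_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "amount": [9500.0, 9900.0, 120.0],
})
kyc = pd.DataFrame({"account_id": ["A1", "A2"], "risk_rating": ["high", "low"]})

# Roll transactions up to per-account, per-day aggregates first...
daily = (txns.groupby(["account_id", "txn_date"])["amount"]
             .agg(txn_sum="sum", txn_count="count").reset_index())

# ...then join alerts to same-day and previous-day activity, plus KYC.
flat = alerts.merge(daily, left_on=["account_id", "alert_date"],
                    right_on=["account_id", "txn_date"], how="left")
prev = daily.assign(txn_date=daily["txn_date"] + pd.Timedelta(days=1))
flat = flat.merge(prev, left_on=["account_id", "alert_date"],
                  right_on=["account_id", "txn_date"],
                  how="left", suffixes=("", "_prev_day"))
flat = flat.merge(kyc, on="account_id", how="left")
print(flat[["account_id", "txn_sum", "txn_sum_prev_day", "risk_rating"]])
```

The resulting flat table is the kind of single data set an AutoML tool can ingest directly, with one labeled row per alert.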

As I was saying earlier, this AML solution doesn't modify any of the existing rule-based solutions. It sits in parallel to whatever rule-based solution you have in your financial institution. There are quite a few reasons for that. One reason I would give is that these systems are super expensive and they follow extensive regulatory guidelines to see that they get the rules right. So ML just can't change this thing in a day. I'm sure you probably know it takes some amount of time to get the whole process through, which is why we tend to work with it instead of against it. Much of the module in this case, the feature transforms and the model, actually gets built using Driverless AI.

We ingest the alerts that are generated from these systems. There is also automated documentation which comes out of these models. When you send these alerts through, the alerts are able to identify which features were actually valuable in terms of how this was classified as an alert or not. There's also automatic report generation that happens in Driverless AI itself. That helps ease the pipeline when you're trying to file your SARs.

Organizational Architecture

Having said that, this is a very simple structure of how the whole thing fits in your organization. You have your data that gets fed in based on an ETL process. You have the AML solution that's deployed. Then you have either a manual review, which is the investigator that you're talking about, or you have an automatic case filing system. Let's switch back.

In this case, we have built the model and you can see it. The model will also tell you which features were actually valuable as the model is being built. Our false negative count is about two. We've been able to classify the remaining true positives and true negatives decently fine. We've had about 40 of them as false positives. That's because we are focusing more on an F1 score to try and build our model, rather than trying to maximize our classifications. Yes, please.

Is this Model a Random Forest?

Audience Member:

Is the model, is it a random forest or?

Ashrith Barthur:

So yes. In this case, what we are using is something called a GBM, which is a gradient boosted machine model. That is a tree-based model, but it's quite different when compared to a random forest model. Here, each successive tree is built based on the errors that you have from the previous step. That's how a GBM works.
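The "each tree fits the previous errors" idea can be shown in a few lines. This is a toy regression sketch of gradient boosting, not the actual Driverless AI model; a random forest would instead fit each tree independently on bootstrap samples and average them.

```python
# A minimal sketch of the GBM idea described above: each successive
# tree is fit to the errors (residuals) left by the trees before it.
# Toy data and hyperparameters, for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

learning_rate = 0.3
pred = np.zeros_like(y)
for _ in range(50):
    residual = y - pred                        # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)    # each tree corrects the last

mse = np.mean((y - pred) ** 2)
print(f"training MSE after boosting: {mse:.4f}")
```

Because every round targets what the ensemble still gets wrong, the training error drops steadily, which is the key contrast with a random forest's independent trees.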

How to Balance Data Before Training

Audience Member:

The second question I have is related to what he had asked. You said you deliberately avoided oversampling, where you clearly have a highly unbalanced dataset with very few true positives and mainly false positives. How do you balance the data before training? Otherwise it's not going to really work.

Ashrith Barthur:

So what we've actually seen in this case is, as I said, we do not oversample. If the positive class is actually much smaller than the amount we can handle as a model, we move the problem entirely to a different system, which is an anomaly detection system and not a supervised problem anymore. We move it to an unsupervised problem if it's too imbalanced. That's because of the way these alerts are generated from transactions: there is a lot of time-based dependency in these transactions, which means that oversampling these atomic transactions or atomic alerts is not going to help us classify something. It might give us better numbers in terms of classification, but if you were to build a relationship like "how many times did I do a transaction today at Starbucks?", and that is important for us, it won't be captured when you're oversampling your data. That's one of the reasons why we avoid oversampling. If that's the case, we move the problem to an anomaly detection problem, which is an unsupervised approach. I'll quickly pause here. Maybe I have one more slide, sorry.
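The unsupervised fallback described here could look roughly like the following: when true positives are too rare to train on, score activity with an anomaly detector instead of a classifier. The features, values, and choice of isolation forest are all illustrative assumptions, not the actual system.

```python
# Hedged sketch of the fallback described above: when labels are too
# sparse for supervised learning, flag unusual transaction behavior
# with an unsupervised anomaly detector. Feature columns are illustrative
# (daily amount, daily transaction count).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Mostly ordinary daily activity, plus a few extreme "structuring" rows.
normal = rng.normal(loc=[200.0, 3.0], scale=[50.0, 1.0], size=(500, 2))
odd = np.array([[9900.0, 40.0], [9800.0, 35.0]])  # amount, txn count
X = np.vstack([normal, odd])

iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = iso.predict(X)          # -1 = anomalous, 1 = normal
print("flagged rows:", np.where(labels == -1)[0])
```

No labels are used at any point; the detector simply surfaces the rows that are easiest to isolate, which an investigator can then review.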

Important Things to Remember

Certain things to remember before you go about doing these kinds of things. Your false negative rate is one of the most important things to focus on before you build these models. Any amount of reduction in false positives beyond that is perfectly fine. With a very strict guideline, where the risk officer has told us 0.1% of false negatives is all they can handle, we have seen about a 16% reduction in false positives. With close to about a 5% false negative rate, we have seen close to about an 87% reduction in false positives. There's a huge range depending on the false negative amount that you pin on, which is why how many false positives you can reduce is secondary. Try and focus much more on what your risk officers are actually telling you. Based on that, your model will be much better. Having talked about the false negatives, the other thing that is also very important is that you have to make sure your data is of good quality. That's one of the most important things. If you do not have KYC information, if you do not have enough transactional information, then there's no point in building these alerts, because it's not going to be valuable enough for you. Having said that, I'll open it up to some questions if anybody has any. Thanks.

Type One Error

Audience Member:

I'd like to understand, the type one error is the false, how do you say that? The type one error?

Ashrith Barthur:

In this case, we break it down into false negatives and false positives.

Audience Member:

So what you're saying is, what is more important?

Ashrith Barthur:

Focus on your false negatives to keep them as low as possible. If you have a good enough model, your false positives and your true positives will get classified.

Audience Member:

It's more risky to have a missed opportunity.

Ashrith Barthur:

Yes, that's it. I think you had a question.

Why Don't They Improve the Rules Based System?

Audience Member:

You said the existing rule based system is very expensive, right? Is it millions of dollars?

Ashrith Barthur:

Yes it is.

Audience Member:

So why don't they improve it? I mean, if you're paying that much money and if I buy a system and I generate 99% wrong data, I want improvement.

Ashrith Barthur:

Fair enough.

Audience Member:

Can you use this system and feed it back to the rule to improve their rules based on your learning here?

Ashrith Barthur:

I would say until about four years ago, most of the modeling that we were doing, we were actually feeding back into the rules. The financial institutions were much more risk averse, which means that they were using it to tune back their models. Now it's a slow process. One of the reasons is that we are trying to push ML in these areas, right? We have to understand that most of these systems are vetted really well by regulatory authorities. There are lots of rules that they follow there. The rule generation systems follow a lot of rules decided by FinCEN, the agency out here, and by central banks across Europe, Asia, and all these places. Having said that, replacing any rule-based system with ML is going to be a time-consuming process. These institutions have invested a lot of money and people in it. To get them to understand that this moves quickly is going to take a bit of time.

Audience Member:

Can you use this as feedback for them to tune?

Ashrith Barthur:

Yes, you can.

Audience Member:

So it's not replacing?

Ashrith Barthur:

One is, yes, let's help that tune. If you want to save that 50 million that you're paying, you can replace it, you can

Audience Member:

2 million to 9%. 60%, right?

Ashrith Barthur:

You can do that. A lot of them actually used to tune it earlier, but now they use it in parallel or they use it for automatic classification. I think it's a slow process. It's a step by step process. Probably the next one would be where we see a complete ML domination in terms of this. Yes, please.

Audience Member:

Credit card. Is it complete?

Ashrith Barthur:

No. One of the things that we look for in this case is fast money transactions, which can involve your card transactions. Credit card fraud itself is very different from this use case. I think there was. You had a question? Yes.

Government Regulators and AI

Audience Member:

Have you found, while trying to sell the solution to institutions, and I don't know how much success you've had so far, that the regulators here are not ready to accept machine learning as a legitimate way to detect money laundering? Because that's something we've come across too. That's why financial institutions choose to buy or optimize FICO; the regulators are very comfortable with them. It's very easy to say why it made a decision, because it's literally a rule: "oh, it was X amount of transactions in X amount of time of this amount or whatever." With machine learning, unless it's explainable AI.

Ashrith Barthur:

We don't necessarily see that question as a problem. One of the reasons is because the documentation that comes out of Driverless AI, the explainability part, looks at a lot of features and gives you the option of simplifying your features. Something that you can do in Driverless AI is say, "I do not want a complicated stacked model or an ensemble or any of these things. Give me a very simple model." It'll give you a very straightforward model with features that are statistically sound. I think this is what you were hinting towards when you say regulators look for that. That solves the problem for us. There are a lot of financial institutions who say, "yes, with the regulators, we'll work with these kinds of models and these kinds of features, but for us, show me what's the next big thing." That's where we separate this space and we say: anything that's getting filed out, use the old school. Anything that you want to explore, use the new age. I'm sure. Yeah. Yes, please.

Performance

Audience Member:

What are the results that you see for the automated machine learning system in terms of performance compared to the rules based system?

Ashrith Barthur:

And when you say performance, could you classify it? Is it system performance? Is it a rule classification performance?

Audience Member:

In terms of I guess F1 score, is it much better, similar, or not as good?

Ashrith Barthur:

It's way better compared to what we see. You do have to understand that in this use case we are working post alerting systems, which means that things that were missed by the alerting system are already lost for us as well, because we are using their output as ground truth for our processing.

Alerts Are Not Targets or Cases?

Audience Member:

Also a second question. When you have the target and you also have cases, what is a data point that is neither a target nor a case? Is that just a flagged event that was only flagged by a small number?

Ashrith Barthur:

Yes. That's a fantastic question, because we try to exploit those kinds of alerts a lot more. Usually what happens in the investigator realm is they just ignore it, saying, "this is useless, nothing's going to come out of it." For us, repeated alerts matter: every six months or every month there is one alert that comes up, and the investigator says, "no, there is nothing very interesting for us here in terms of behavior." We've been able to squeeze those kinds of alerts for a lot more features. I would say we've been successful; it's not very rampant, it's not obvious, but you can squeeze out some features from that as well.

Unsupervised Model

Audience Member:

My question is, is there any way to twist the model so that it is unsupervised?

Ashrith Barthur:

When you say twist, do you mean merely remove the labels and I start building the model unsupervised? Or are you looking at it in some other way?

Audience Member:

Yeah, like any opportunity.

Ashrith Barthur:

Any approach. So one of the ways that we do this: there are lots of situations, as the gentleman behind you was mentioning, where you don't have much data in terms of labels, and then what do you do? At that time, what we do is try to look for similarity between unsupervised and supervised models to see if the same features come in. We use a lot of clustering techniques. We try to avoid anything that is a reduction in dimensions, because if you do dimensionality reduction, then essentially it's a lossy problem, meaning you're losing a lot of information. Before we come to any of those points, we try to see if we can recreate a supervised model using unsupervised approaches.

It's not as straightforward as it seems. That's essentially how I can word it. We try to recreate the model in the unsupervised space so that it can classify the same way as in the supervised space. That helps us solve certain problems where we don't necessarily have labels. There are situations where investigators are not fast enough to give us the kind of alerts that they can. This problem ties back to the gentleman asking about fraud. Investigators cannot resolve fraud alerts as quickly as needed. There are fraud alerts that come through as well, and there are certain rule-based systems which tend to classify them. We use that approach in fraud. Three questions.
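The "recreate the supervised model in unsupervised space" idea can be sketched as: cluster the data without looking at labels, then check how well the clusters line up with the labels you held back. The toy data and the choice of k-means here are illustrative assumptions, not the actual technique used.

```python
# A rough sketch of recreating a supervised split with clustering:
# cluster without labels, then measure agreement with held-back labels.
# Toy, well-separated data for illustration only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Two behavioral groups, e.g. ordinary accounts vs fast-moving ones.
ordinary = rng.normal([1.0, 1.0], 0.3, size=(100, 2))
suspicious = rng.normal([5.0, 5.0], 0.3, size=(20, 2))
X = np.vstack([ordinary, suspicious])
labels = np.array([0] * 100 + [1] * 20)   # held-back ground truth

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Align cluster ids with labels (cluster numbering is arbitrary).
agreement = max(np.mean(clusters == labels), np.mean(clusters != labels))
print(f"cluster/label agreement: {agreement:.2f}")
```

High agreement suggests the unsupervised structure captures the same separation the supervised model learned; on real AML data the groups are far less separable, which is why the speaker notes it is not as straightforward as it seems.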

Training Required Before Effectiveness

Audience Member:

How much training data do you see before your systems at a certain level of effectiveness?

Ashrith Barthur:

Good question. In our worst case we have seen close to about 6.5 million transactions a day. In the best case, and this is probably the goal for people who do data science, we've seen close to 1.5 trillion transactions a month. That's the number.

Audience Member:

I think that doesn't really address the question about data balancing that was asked.

Ashrith Barthur:

That's what we do. Once we enter the zone where we need to balance, we change the problem to an unsupervised problem; we do not make it supervised anymore.

Audience Member:

So you mean the data is balanced because the alerts themselves are in the features, right?

Ashrith Barthur:

No. We don't necessarily have to balance the data itself. We don't oversample to balance the number of alerts that we have. If we have a really low yield, then we move it to an unsupervised problem. Does that help?

Audience Member:

Oh, okay. Yeah, I think maybe the F1 score itself is okay. You don't have to balance the data. You optimize the.

Ashrith Barthur:

The F1 score; you optimize on F1. Yes.

Audience Member:

Okay.

Ashrith Barthur:

Yes, you.

Audience Member:

Sorry, maybe you said this at the beginning; I arrived late. Given that the amount of cases that you analyze is so high, I'm curious how many predictors you have in your model?

Ashrith Barthur:

There are certain restrictions that sometimes we have to play with. In our worst case we've had 129. In our best case, we have actually hit close to 1200. Yes.

Audience Member:

To complete this, in general, what ratio of features is numerical compared to categorical? I mean 50-50, 20-80?

Ashrith Barthur:

What would the ratio be for?

Audience Member:

The ratio between numerical and categorical?

Ashrith Barthur:

Oh, in terms of features? I would say 95 to 97% numerical. Good. Thank you. We also use hierarchical features, if that helps. I think there were two more in the back. Yes.

Audience Member:

At the beginning you mentioned that you were doing packet sniffing. Do you do it through a wire or wireless networks?

Ashrith Barthur:

I did it on both sides, wire and wireless.

Do You Use Consortium Data?

Audience Member:

I'm almost a hundred percent sure you mentioned this in the beginning, I apologize. You're using the financial institution's specific historical data, their unique data. You're not using consortium data? No consortium data model?

Ashrith Barthur:

Nope.

Audience Member:

Yeah, we don't either. You've found that obviously it doesn't work, because what you see even at a specific financial institution within the same country will look different.

Ashrith Barthur:

We don't necessarily use any form of consortium data. The way we have done it is we've looked at different behaviors across different geographical, endemic zones, for that matter. We have used things that are helpful on the other side: features and behaviors that can transfer between zones.

Audience Member:

You're looking at these alerts or you're sifting through these alerts on a batch basis, I assume. Right? Not in real time because there's no point.

Ashrith Barthur:

These alerts. I think banks have enough time to actually catch a person who's doing money laundering, so most of these systems work in batch. We give them an option of deploying it in real time in case they want it; you can work the same thing in real time as well. Because of how the banking systems work, most of them deploy it in batches, which means that it happens in the morning, early morning, or something like that. There are some financial institutions that are not necessarily working in retail, which means that they have a different need, and for them this works in real time.

Audience Member:

Thank you.

Ashrith Barthur:

Any other questions or are we good? Great. Oh, there is one more.

Maximum Individual Transaction Amounts

Audience Member:

What is the maximum individual transaction amount that you deal with in the model that you mentioned? Because these seem to be regular transactions, not in the hundreds of millions or fifty million, or higher amounts.

Ashrith Barthur:

We've seen amounts that I would just stop at saying are in the millions.

Audience Member:

Only one or two. Right? Not routinely.

Ashrith Barthur:

No, quite a few.

Audience Member:

So you deal with the standard charter banks and that kind of a thing?

Ashrith Barthur:

We see different kinds of cross-border banking as well. What you saw in the dataset is retail, which means it's people like us; it's not institutions doing transactions or things like that. Their transactions tend to be much different. You would see those behaviors in wholesale, institutional.

Audience Member:

Cross border level.

Ashrith Barthur:

On those scales, you would actually see larger transactions.

Audience Member:

A question about the government. Do you use any government banking system data? What they do, the investigative branch, and so on, in different countries?

Ashrith Barthur:

No. We don't.

Audience Member:

They don't have such an account.

Ashrith Barthur:

All right guys. Thank you. Thanks for the evening.