Fighting Transactions Abuse Using Artificial Intelligence at H2O World Sydney 2022
In this Technical Track session at H2O World Sydney 2022, Dr. Luiz Pizzato (Executive Manager of AI Labs, CommBank), Shivam Bansal (Senior Principal Data Scientist and Kaggle Grandmaster, H2O.ai), and Genevieve Richards (Customer Data Scientist, H2O.ai) discuss AI-based solutions for transaction abuse.
Read the Full Transcript
What is CommBank Doing to Fight Financial Abuse?
It's great to be here. And yes, I'm going to introduce everyone else. Gen was also very involved in building these models, which is very exciting as well. So, first of all, let me just go through this slide. Obviously, when you look at statistics like this, approximately one in four women and one in 13 men have experienced violence by an intimate partner since age 15, you see the scale and the sheer importance of this problem, right? And CBA has the Next Chapter program, really focusing on helping victims of domestic violence. And the way we do it is actually in three key areas: support, advocacy, and prevention of domestic violence. There are four actions in the Next Chapter program. The first one is focusing on leading the industry in providing care for customers impacted by domestic and family violence.
What we do is that we support them with immediate financial assistance and banking needs through this specialist team of community wellbeing. The second action that we have is expanding support for long-term recovery, which means that we have partnered with Good Shepherd to deliver a financial independence hub to assist survivors of financial abuse and provide long-term financial independence. And this is regardless of where they bank. The third action is actually to build a fact base and raise awareness of domestic violence. We are increasing community and industry understanding, and this includes a financial abuse research partnership with UNSW and a series of community and sector partnerships. And the fourth action is actually looking into our own products, finding where there are issues in the products, and how we can actually improve our products and services.
How Do You Recognize Financial Abuse?
And this is exactly what we've been talking about today. So we are working to reduce financial abuse in our own products. So let me go through what we are going to cover today. You heard today about how, I think it was about three years ago, we identified that people were sending abusive messages to customers, right? Via transaction descriptions. These are very low-value payments that might contain very abusive messages, which is a very traumatic way for this to happen. And obviously we jumped into that and implemented an initial solution, which is literally a filter of keywords in our banking app. So obviously those messages have a lot to do with swear words, right?
But not only that, there are some words that are related to abuse. And you see one of them is like "unblock me", right? And this filter was implemented and it really helped reduce abuse on our platform. Now we noticed that that wasn't enough. People are really motivated to send abuse. They will not be deterred by a block, so they try to go around it. And that's where we implemented an AI system that monitors those transactions and tries to really, really find the high-risk abuse. And I'll talk about that in more detail. Now the really exciting thing is that we worked there, and we worked within Australia, to solve this problem. And now we are really looking into that collaboration with H2O to make those models that we implemented at CBA available for any institution in the world, right? And through the H2O.ai partnership, people will be able to take our pre-trained models and kickstart their process so they don't have to start from scratch. They don't have to start from trying to identify abuse; they can start from the model that CBA has built, and that's really democratizing and enabling that solution for the entire world.
Now let me go through those things in a little more detail. And I do apologize for the language you see in this slide, right; this is a serious problem, and people who are suffering from abuse see much, much worse. And a lot of times it's not just swear words, right? Things like, "I love you. I saw you today." And those things could be really, really threatening if you have a relationship where you can't see each other, you shouldn't see each other. So here's a serious abuser. All the messages here are made up, but they're very much in the spirit of what's happening. This person has sent more than a thousand abusive messages, all very low value. One cent. And it could be to their ex-wife, could be to their children. It's horrific, right? And working on this problem is not an easy thing. Now, as I was mentioning, we implemented a filter, a block. That's our first line of defense. What happens is that once you use some of those words that we identify as abusive, we won't let that transaction proceed.
CommBank's Current Success Stopping Abuse
We blocked more than 60,000 messages in November 2020 with this, right? A lot of them were abusive. And we tracked whether people changed the message. As you can see here on the side, blocked messages and actual messages, we could see whether people retried to send them. People can try to go around the filter and change some words, removing spaces, adding dots to things. What we noticed, and it's very, very good, is that almost half of them didn't resend. That means the filter is working; it is preventing people from sending messages, but it's still not enough. When people are really motivated, they go around it. And sometimes it could be simply friends sending a joke to another friend when they're paying, and that type of language is still not okay. And people might still try to send a joke and they might modify it, but in some cases we don't pick up on those, right? So you can see the last example here: that transaction was not detected by the filter beforehand, but it has some words that are very abusive. Normal words; just the context matters.
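A minimal sketch of the kind of first-line keyword filter described above, with simple normalization to catch the spacing and punctuation tricks mentioned. The blocklist and function names here are hypothetical illustrations, not CBA's actual implementation, which would use a much larger curated term list.

```python
import re

# Hypothetical, tiny blocklist for illustration only; a real deployment
# would maintain a much larger, curated list of abusive terms.
BLOCKED_TERMS = {"unblock me", "you owe me"}

def normalize(description: str) -> str:
    """Collapse common evasions: lowercase, strip dots/dashes, squash spaces."""
    text = description.lower()
    text = re.sub(r"[.\-_*]", "", text)       # "u.n.b.l.o.c.k" -> "unblock"
    text = re.sub(r"\s+", " ", text).strip()  # collapse repeated spaces
    return text

def is_blocked(description: str) -> bool:
    """First line of defense: reject a transaction if its description
    contains a known abusive phrase, even after simple obfuscation."""
    text = normalize(description)
    squashed = text.replace(" ", "")
    return any(
        term in text or term.replace(" ", "") in squashed
        for term in BLOCKED_TERMS
    )
```

As the talk notes, motivated senders still get around filters like this, which is what motivates the AI-based second line of defense.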
Using AI and Deep Learning to Fight Abuse
So what we did is implement this second line of defense, which is an AI system. And now we are not really concerned with the single abusive message. We actually want to look more holistically at the relationship between these two people who are sending messages to each other. And I don't mean relationships in the sense of trying to understand whether they are wife and husband or not. We're looking into the transactional relationship. And in the next slide, you will see that in a little more detail. But a transaction between two people paying each other for a coffee or something else is very different to a transaction of someone trying to send these messages, right? And that's what we're trying to pick up. So as you see here, what we are worried about are the extremely abusive messages, the high-risk ones.
And it is not you sending a message to your friend with swear words, which is also unacceptable, but they're very different things. As you see with the top people here, the abusive relationship is very different to the non-abuse transaction. The same person can obviously transact normally with someone and transact very abusively with someone else, right? The interesting thing is that we look at several signals for those messages, and you can see, oh, it's very easy to distinguish between these two. But I just want to highlight one thing. It's a funny thing: these conversations happen in the banking system, and they shouldn't, right? But some people seem to run out of credit on their SMS and they start sending messages to each other as a conversation, right? We still train on those ones to pick up that they are not abusive, right?
There's a difference between a conversation and an abusive conversation. So the features that we use to distinguish between those two types of messages vary. We use a lot of NLP, natural language processing. We use deep learning methods like BERT encodings to try to capture the semantic relationship between the words in the message. We use toxicity models: how toxic is the message being sent? We use sentiment analysis and emotion detection. But we also use features that are completely independent of language, right? Like the number of transactions sent, the length of transactions, the median transaction amount. A lot of information is about the transaction itself, which also gives away transaction abuse. And that's how we distinguish between abusive and non-abusive. So this is my last slide, and I just want to highlight the size of the problem and what we actually do with the information.
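To illustrate the language-independent side of this, here is a sketch of aggregating one sender-to-recipient relationship into simple transaction-level features like those mentioned (count, median amount, description length). The function name and feature set are illustrative assumptions; in the real system, model-based text features (BERT encodings, toxicity, sentiment) would sit alongside these.

```python
from statistics import median

def relationship_features(transactions):
    """Aggregate one sender->recipient relationship into simple,
    language-independent features. `transactions` is a list of
    (amount, description) tuples."""
    amounts = [amount for amount, _ in transactions]
    descriptions = [desc for _, desc in transactions]
    return {
        "n_transactions": len(transactions),
        "median_amount": median(amounts),
        "mean_description_length": sum(map(len, descriptions)) / len(descriptions),
        # share of one-cent-or-less payments: a pattern the talk highlights
        "share_low_value": sum(a <= 0.01 for a in amounts) / len(amounts),
    }
```

A relationship with dozens of one-cent payments and long descriptions looks very different under these features from two friends splitting a coffee.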
How Do You Train a System to Recognize Abuse?
So it's millions of transactions a day, as you can imagine in the banking system, and less than 0.0005% are abusive, right? So it is a needle in a haystack, right? Obviously there are a lot of challenges: how do you train a system to do that, right? How do you actually pick your information, how do you retrain? But I also want to highlight how important it is how we deal with that information. And I didn't mention our customer vulnerability team. So what we do in this funnel is obviously, out of all those millions of transactions, we try to identify the ones where the relationship might be this highly, highly abusive relationship. And every single case that we identify is handled by a team which is trained to do this work. Obviously, if we as data scientists pick up that information, we don't know what to do with it, right?
But we have a team that's trained to deal with that information, and they consider every action that they will take with those cases. And it could be contacting the perpetrator and saying, please stop. And a lot of times, this simply works: stop doing this, we are watching you, right? And monitoring that relationship, right? If they continue abusing, they could actually be unbanked from the system. So there's monitoring going on, but it also could be contacting the police. There are a lot of actions that we are allowed to take, and we do take them, for the abuse to stop. So with that, I'm going to leave it to Shivam to talk about how that relationship with H2O is evolving and how we are making this available for everybody else in the world. Thank you.
Future Goals For AI and Machine Learning
Thank you, Luiz, for sharing very useful insights about this problem. This is definitely one of the very important problems for a bank to solve, not just for CBA, but for many other financial institutions. So I'll be talking about the collaboration between H2O.ai and CBA on this use case, and what's next on this use case: what is in the roadmap, what's coming. For the collaboration, there are six main objectives. The first one is to develop a full-fledged structured framework, a structured solution, for the entire work that the CBA team has done. Add data labeling and annotation capabilities. Add the concept of pre-trained models so that it becomes easier for other financial institutions and banks to quickly adopt this solution. Add additional model experimentation capabilities, because there may be additional ideas that can be plugged into the whole pipeline that may add more value. Add model explainability, and add an AI application on top of the whole solution so that different levels of stakeholders can easily use this entire solution.
So let me go through these one by one in a bit more detail. The first one, like I talked about, is building a structured framework for this solution. Luiz's team has built this solution, and in the solution there are various components, like Luiz talked about. It has a data fetcher component, data procurement, adding a number of features through feature engineering, adding different machine learning models from simple ones to more complex ones, and then generating the outputs and validating them, which can be consumed by the business. Now, one of the goals, which we talked about, is that we want to release this type of solution for other financial institutions, and we don't want them to redo all the work from scratch. Essentially, we are looking for a generalized solution. So to convert the CBA solution into a more generalized solution, we need to stitch all those components together one by one, and we need to package the models so that they can be accessed via APIs.
CBA's Generalized AI Framework
They can be accessed in a more functional approach. So, even if there are different teams who want to replicate the learnings, they can probably just call a function, call an API, get that result immediately, and move on to the next stage as such. So that's where our team is currently focusing: rebuilding the incredible model Luiz's team at CBA has developed in a more generalized framework. The other important point about this problem is the data scarcity problem. Like we saw, only about 0.0005% of transactions are flagged as abusive. And it'll be the same for many different financial institutions as well. And to really get started, humans need to label the data first. And this data labeling is really a time-consuming process. It takes a lot of time and effort.
Using H2O's Label Genie to Label Data Sets
So if we can come up with an idea of using, say, techniques like smart labeling or active learning to accelerate the labeling and annotation process, that also cuts short the time to get quick models out. Essentially, that's where our team is also exploring ideas of adding layers of active learning, where humans need to label just a few sample rows of data. And then the model immediately predicts, based on what has been annotated so far, what the likely next annotations are, which saves a lot of time in labeling new data sets as such. At the same time, our team at H2O is also working on a different product, which we call Label Genie, which is based on the same concept of active learning, or smart labeling. And it provides data labeling and annotation capabilities across various different data sets. It could be text data, audio data, image, and video as well. And all of this is focused on providing quickly labeled data sets to get that quick start on any new use case. So we are trying to blend this concept into transaction abuse as well, in the automation of labeling.
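One common way to do the active learning described above is uncertainty sampling: hand the human labeler the rows the model is least sure about. This is a generic sketch of that selection step, not Label Genie's actual algorithm; the function name and interface are assumptions for illustration.

```python
def next_to_label(scores, already_labeled, k=3):
    """Uncertainty sampling: given model scores (estimated probability of
    abuse) for each row, pick the k unlabeled rows the model is least
    sure about (scores closest to 0.5) for a human to label next."""
    candidates = [
        (abs(score - 0.5), idx)
        for idx, score in enumerate(scores)
        if idx not in already_labeled
    ]
    candidates.sort()
    return [idx for _, idx in candidates[:k]]
```

Rows scored near 0 or 1 are skipped; labeling effort goes where it changes the model most, which is why a few labeled samples can go a long way.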
Now, the next part is about the concept of a pre-trained model, like we discussed, for any new financial institution or bank, to get them started quickly and to save them a lot of time that may go into adopting this solution. We want them to get a baseline model as quickly as possible. And the other reason, like I talked about, is that they may not have a lot of training data. Now, to solve this type of issue, we can convert the existing CBA model into the form of a pre-trained model. With the pre-trained model concept, which is based on transfer learning, a new bank or a new team doesn't have to retrain everything from scratch. They can use the existing model, fine-tune it on their dataset, and then start making iterative improvements, improving that model on their dataset over time. And maybe in one of the future iterations, they can replace that model with their own, because by that time they will have a more relevant corpus according to their domain, according to their customers as such. So that is one of the areas where our team is working with CBA.
The Role of Experimentation in Fighting Abuse
The next one is about adding different experimentation ideas. I would say, from my Kaggle experience as well, experimentation is really one of the skills that I have learned. By experimentation I mean we really need to try various sorts of things; what may work out, we really don't know. CBA has already added a layer of sophisticated feature engineering and model capabilities using techniques like BERT embeddings and so on. Additionally, we want to try more. We want to see if there could be any other hidden features that haven't been explored yet that may be added to the whole existing pipeline and that may improve the current model as well. That's where we are trying ideas on feature engineering using text character-level embeddings and word embeddings, adding them to different models, deriving new features, and again, adding them to the main training data set in order to see if the existing model improves or enhances over time. And at the same time, we are also exploring ideas of AutoML in this whole process, where we are exploring whether the whole modeling effort can be automated using tools like Driverless AI, so that again, for any new team, any new group or bank, they can quickly get that first model out and then start fine-tuning it, start refining it as such.
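Character-level features are one of the simpler experiments mentioned, and they have a nice property for this problem: character n-grams of an obfuscated word still overlap with the original. A minimal sketch, with the function name as an illustrative assumption:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character trigram counts: crude, language-independent features
    that survive misspellings and spacing tricks better than
    whole-word features do."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

For example, "unblock" and an obfuscated "unbl0ck" still share several trigrams, so a model over these counts can generalize where an exact-match keyword filter fails.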
How are CBA and H2O Collaborating?
The next component is about adding a layer of model explainability. This use case is really important as it belongs to the financial industry, which is heavily regulated. So we really need to understand what goes on behind this model, from a regulation point of view and for a more transparent point of view. And also to explain why, let's say, we flagged one person's transactions and not another customer's transactions. That's where a layer of model explainability is required. And that's, again, a collaborative effort between H2O and CBA. Our goal is to provide post-modeling capabilities. Post-modeling essentially refers to this: once this entire pipeline is developed, starting from data ingestion to feature engineering to model training to generating the outputs, then we focus on explaining the entire pipeline by applying techniques like Shapley reason codes, or deriving the feature importances, or surrogate model trees, and disparate impact analysis in order to understand whether there could be bias in the data or bias in the model.
And can we correct that by going back into the original pipeline? We are trying sensitivity analysis, or what-if analysis, in order to check if there is any problem in the data or the model. How does the model perform in different scenarios? So the same model which the CBA team has built can be tested across different scenarios, and scenarios can be split up, say, by different categories. It could be, let's say, that different states have different behavior. In the future, different regions, different countries may have different behavior. So we need to test the robustness of the model, how the model performs across those geographies, across those situations. And then if any correction, any rectification, is required, data scientists can go back, fine-tune the model again, and provide that.
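One simple, model-agnostic form of the sensitivity analysis described above is permutation importance: shuffle one feature column and see how much performance drops. This generic sketch illustrates the idea; it is not the specific tooling used in the CBA/H2O pipeline, and the names are assumptions.

```python
import random

def permutation_importance(model, rows, labels, feature_idx, seed=0):
    """What-if style check: shuffle one feature column and measure how
    much accuracy drops. A large drop means the model leans on that
    feature; near zero means it barely matters."""
    def accuracy(data):
        return sum(model(x) == y for x, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    rng = random.Random(seed)
    column = [x[feature_idx] for x in rows]
    rng.shuffle(column)
    shuffled = [
        x[:feature_idx] + [v] + x[feature_idx + 1:]
        for x, v in zip(rows, column)
    ]
    return baseline - accuracy(shuffled)
```

Run per geography or per customer segment, the same check surfaces the scenario-by-scenario robustness questions mentioned in the talk.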
Are Abusers Getting Smarter?
We are also focusing on model backtesting in order to ensure that relevant features, relevant patterns in this use case are captured over time, because as we provide these types of solutions, the abusers become smarter as well. They keep changing how they abuse the system with messages. So that's what we also want to do. We want to backtest the models that we build today: how they would have performed in the past, and how they're likely to perform on future dates. And again, if there are some relevant insights, they can be added as part of the model training process itself, so our modeling effort becomes more stable, more robust, and gives a better model compared to, let's say, the previous versions. And we are adding a component for automatically generating model documentation to give, again, more transparency into what goes on behind this whole system.
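The backtesting idea reduces to scoring historical batches month by month with today's model, so that drift (abusers changing tactics) shows up as declining performance over time. A minimal sketch, with the data layout as an illustrative assumption:

```python
def backtest(model, monthly_batches):
    """Score each historical month's batch with today's model and report
    accuracy per month; each batch is a list of (features, label)."""
    results = {}
    for month, batch in sorted(monthly_batches.items()):
        correct = sum(model(x) == y for x, y in batch)
        results[month] = correct / len(batch)
    return results
```

A month where accuracy sags relative to its neighbors is a candidate for the retraining-with-new-patterns loop the talk describes.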
And all of these ideas are linked to the main goal: we want to release this whole effort for, let's say, any new bank or financial institution, to give them more insight into, and more of an overview of, the solution as well as the pipeline that we have built. And last but not least, we want to convert this whole solution into an AI app, so that this type of application is not restricted to only, let's say, the data teams or data scientists; we also want to democratize this solution to other groups, which may be business people. Now, one of the things in the life cycle of data science is that there are two groups of people. One is the people who make AI models: data scientists, software engineers, analysts. The other group is the business users who want to consume the output of those models.
Using H2O Wave For Business Users
And then when data scientists want to share those models, they probably have to collaborate with, say, UI/UX teams. Sometimes they have to convert those models into the form of dashboards or reports, which takes a lot of time, because there is a gap between these two teams. So essentially we want to provide the capability of converting the models built by data scientists into AI apps, so that they can be automatically leveraged by the business people. And that's where we are leveraging the H2O SDK, the H2O software called Wave, which is a low-code, Python-based framework that gives data scientists the capability of building AI apps directly for business users. So Wave, as I said, is a low-code, pure-Python framework to convert models into apps.
These apps can be real-time. They provide capabilities of real-time syncing and real-time predictions. So even if, let's say, hundreds of different users are accessing the same output or the same solution, all of them will see the same results, because there is a server behind the scenes which propagates the information to all the instances. Wave provides easy-to-use, customizable UIs and snippets. So data scientists don't have to take care of, let's say, how to do the design, or how to add visualizations and charts. Wave provides all these capabilities, which can be used as they are and converted into AI apps. And the time to develop the AI apps is really quick, I would say, because it reduces the time to get that model out and embed it into an app very fast.
And that's where Wave provides that capability to just reuse some of these templates, some of these components, and convert them into AI apps. And these apps can be deployed anywhere. They can be deployed on various servers and even on different machines, local machines, as well as on mobile. And they can run anywhere, again to provide more accessibility of the outputs, more accessibility of the models that data science teams might have built. In fact, our team has been working with CBA on this Wave AI app development for transaction abuse for the past couple of months, and Gen will be demoing that app. So I would like to invite Gen to showcase that app. Thank you.
Training a Data Set with Wave AI
Awesome. So, taking the great work developed by CBA to solve this problem and then utilizing H2O's Wave low-code dashboarding capability, we've so far built out a transaction abuse Wave app. At the moment it really focuses on three main modules of functionality. The first one is the model scoring component, where we take the pre-trained model from CBA, or even your own model, and score it on new data in your transaction estate, so you can get an initial view of what's going on in your own institution. We then move to the next module, where we have that feedback loop functionality, that human in the loop. The idea behind that is that, using the labels generated from the model in the scoring component, we can then start to generate that training data set. So in the future, you can build your own transaction abuse model on the basis of your own financial institution, which is where we come to module three, with the model training functionality. So utilizing the great work and the generalized framework that we've talked about so far today, you can then train your own model that can be deployed on the basis of your own transaction estate. I just want to note and let you all know that all of the data, the models, and the examples shown in the presentation today are synthetic data, and were prepared only for the purposes of the demo.
Awesome. So when you're taken to the current transaction abuse detection Wave app, you're shown the two pathways to interact with it: the first being the scoring component, and the second being the training of a new model component. In order to score with your own model, you need to bring your transaction data sets and the history of those relationships for at least a month. There are multiple ways to load a data set. Today they are loaded locally from my laptop; however, connectors such as S3 buckets and the rest have also been configured to allow for ease of access. Once you upload your data set, it lets you know the number of rows that are currently inside it, as well as the length of the data set that you've loaded, and it gives you the ability to view the underlying data set and what's going on, which is really going to help when you need to configure and map those columns on the next screen.
Scoring Data Sets With Wave
So in the data set today, we have user_id, which relates to the person who sends a transaction. We have user_recip, which relates to the person who receives the transaction, as well as the description, the amount, and the date those transactions were sent. So what you'd need to do is just map those in the column here: user_id to the sender, user_recip to the recipient, and so on and so forth, in order to allow that automatic pipeline to generate underneath. The last thing you do is select the model you want to use to score your transactions today. So I'm going to use the CBA transaction abuse model and click Score. What's going on in the background here is really that pipeline, that generalized framework that we've talked about, where we've generated the feature engineering and then scored the model that's shown here today.
And you're directed to a dashboard and reporting component of the app from here. These can be completely configured on the basis of what's most important for your reporting. However, here we have the transactions scored, the number of relationships identified by the model, whether any of these have been repeat offenders due to continued abuse over a period of time, the number of customers that have potentially been contacted over time, as well as whether any warning letters have been sent. We then get a breakdown just below of that on a month-by-month basis as to the effort that's so far gone into that. Finally, down the bottom you're shown the results of your model and your transaction estate. Here you can see the person who sent, the person who received, the month in which those transactions were scored, as well as the score from the model.
Interpreting Data Models and Recognizing Abuse With Wave
You can also see this human results validation column here, and that will make a little bit more sense as we go on, but at the moment they're all currently classed as "not checked", and that's where the feedback loop really comes in. So if we take a look at the profiles tab here, we're looking at the user relationship of User 30 to User 31, and you can see from the middle graph here that it has a relatively high abuse score of 0.997. What we can then do is move across the page and get a really high-level view of some summary statistics about that relationship: whether they're repeat offenders, the number of transactions in the month, which is 48 here, the average transaction amount, which is $4, which is relatively low, as well as whether they're transacting with any other people within the transaction estate.
If we scroll further down, we can take a look at the types of terminology and things that are being sent from User 30 to User 31. So we can see references to kids, answering the phone, coming over, maybe some swear words; however, I've bleeped those out for ease of use. And we can get an overview and really quickly identify from that whether this is actually a case of that systematic abuse or not, which we can pass back to the model. Finally, to dig even deeper into this relationship, you can take a look at the transactions that have been sent to and from these people. So we look at User 30 to User 31, sending a lot of abuse and systematic words around wanting to see their kids, some veiled insults, and then threats. And you can see that User 31 here has responded saying, please stop.
Using Human Feedback to Train Data Models
Please don't, please don't do this. Which is really a great warning sign for someone to quickly identify that this person needs help and might be someone you want to talk to. And this is where the human feedback and the human in the loop is really involved. So we can confirm this as a case of abuse. And this can be utilized in two ways to let people know that yes, this person needs to be supported, but also utilized to build your own training data set in the future to build your own models in the training data set module. You can also see from this example here, we can quite quickly identify that yes, this isn't really a case of abuse from the terminology that we see down here. And quickly pass that feedback back. The final module we have is the training data set module where you can bring your own data sets and train your own pipeline utilizing the same functionality.
To do that, you need to bring the same transaction data set that we saw before, as well as a labeled relationship data set. So, taking our example from before, we have User 30 to User 31 in the month of August, and that was classed as abuse, so that's a one, and then 17 to 10 is not abuse, and that would be a zero. Passing that in here, we also give you the ability to limit the scale of the problem of what you're trying to do. And this is really based on the financial institution's definition of abuse and what you're trying to identify. Potentially, there are criteria in there around the minimum number of transactions required to be classed as high-risk abuse. So this can be added in here in order to reduce the scale of the problem. Millions of transactions a day is obviously quite a lot for a model to go through.
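The scope-reduction step described here is essentially a threshold on relationship size before anything expensive runs. A minimal sketch, assuming a simple dict of sender/recipient pairs to their transaction lists (the data layout and names are illustrative, not the app's actual schema):

```python
def filter_relationships(relationships, min_transactions=10):
    """Scope reduction: keep only sender->recipient relationships with at
    least `min_transactions` messages in the window, per the
    institution's own definition of high-risk abuse."""
    return {
        pair: txns
        for pair, txns in relationships.items()
        if len(txns) >= min_transactions
    }
```

With millions of transactions a day, a cheap pre-filter like this shrinks the candidate set before feature engineering and scoring.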
We then give you the ability in the next column to configure the types of feature engineering that you want to go into your final model. So we have those simple text features around the length of the transaction, the amount of capitalization, and things like that, as well as those BERT-embedding-based models that help you detect the emotion or the toxicity of the transaction description sent, as well as some sentiment models there. And those are really the transaction NLP features that you're utilizing. Then you aggregate those up to the relationship level, the User 30 to User 31 level, and you get those features Luiz mentioned that aren't text-based, such as the number of transactions, the range and the spread of the dollar figure amount, as well as potentially time-based features, like whether this has been going on over the whole period of time, or is it all in one day?
And things like that to help the model learn. We can finally get the last contextualized features there as well, where we take a look at the reciprocal of that relationship. You'll remember Luiz mentioned that some people like to converse through transaction descriptions, having conversations about meetings at shopping centers and things like that. But that's not an abusive relationship just because they're sending low-value transactions to each other. So including those reciprocal features of the relationship gives you the context as to whether we have people who want it to stop, or we have people who are just conversing and having a conversation. On the last side, you're then given the ability to utilize some potential undersampling or oversampling methods for your data set, configure the number of validation splits you might like, as well as the different scorers that might be utilized, such as an F1 score or PR AUC, depending on that. And then finally, configure the model algorithm that you'd like to use, such as H2O AutoML, from a range of different solutions. From here, it'll take you to a screen where you can really deep-dive into how your model has performed and start drilling down, using those explainability concepts such as Shivam mentioned as well.
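Two of the options mentioned, undersampling and F1 scoring, are simple enough to sketch directly. With roughly 0.0005% positives, random undersampling of the majority class shrinks the training problem dramatically, and F1 balances precision against recall on the result. The function names are illustrative assumptions, not the app's API.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class (non-abuse) rows until the two
    classes are balanced."""
    positives = [(x, y) for x, y in zip(rows, labels) if y == 1]
    negatives = [(x, y) for x, y in zip(rows, labels) if y == 0]
    rng = random.Random(seed)
    negatives = rng.sample(negatives, len(positives))
    return positives + negatives

def f1_score(predicted, actual):
    """Harmonic mean of precision and recall over binary predictions."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Plain accuracy would score 99.9995% for a model that flags nothing, which is why imbalance-aware metrics like F1 or PR AUC are the ones offered here.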
Cool. So there were a lot of people involved in this process, and we wanted to call them out. The AI Labs team at CBA: Luiz, Anna, and Kaavya, who's in the audience today. We really want to thank the customer vulnerability team at CommBank, Caroline and Craig, and then those involved on the H2O side: Shivam, Chetan, and Tianchu, who's also in the audience today. Cool. Thank you very much.