Democratizing AI at Intuit: No-Code and Self-Serve Tool for Building & Deploying Models
Watch this session by Mi Yan, Senior Data Scientist at Intuit, and Jolene Huang, Staff Machine Learning Engineer, as they share Intuit's journey: how its ML Platform team is automating various stages of the model development lifecycle, making it easier for MLEs and data scientists to build and deploy ML models.
In addition, the ML Platform team wants to provide a no-code solution for analysts and fully automate model development for them. As part of this automation, the team has enabled the capability to build models using the AutoML provided by H2O Driverless AI.
Mi Yan, Senior Data Scientist, Intuit
Jolene Huang, Staff Machine Learning Engineer, Intuit
Read the Full Transcript
Good afternoon, everyone. My name is Mi Yan, and I'm a data scientist at Intuit. Today Jolene and I will present Democratizing AI at Intuit. As we know, the model development lifecycle consists of data exploration, feature engineering, model building, model deployment, and model monitoring. Our machine learning platform at Intuit is focused on automating all of these stages. Our goal is not only to make it easier for data scientists and machine learning engineers to build and deploy models, but also to provide a no-code tool for other internal users at Intuit; in other words, to fully automate the whole process. As part of this automation, we use the AutoML provided by H2O Driverless AI to build the models. Today we'll share this journey with the community.
Here is the outline. First, I will describe the motivation for building such an AI service. Next, I will introduce the AI service from different angles, including its features, customer benefits, and use cases built on the service. Then Jolene will introduce the AI service enablement and architecture, and lastly, future work. First, as Intuit's CTO has said, as an AI-driven expert platform, Intuit is rapidly innovating to solve our customers' most important financial problems. Currently there are 58 billion machine learning predictions per day at Intuit, and 330 million AI-driven customer interactions per year. In other words, there is fast growth and high demand for AI at Intuit, and that is why we need AI services.
As I mentioned before, the service is no-code and self-serve, and it can reduce time to insight from data by automating various phases of MLOps. We also leverage the AutoML provided by H2O Driverless AI, and the service is built on Intuit's platform with guardrails. Lastly, it integrates with Intuit's tech ecosystem capabilities.
So how can our internal customers benefit from this service? First, we accelerate model deployment by reducing the time to build a model from about six to eight weeks down to two weeks. Second, automatic hyperparameter tuning by the AutoML engine can improve model performance significantly. Third, we can detect potential data leakage as well as potential data shift. Another advantage is automated model deployment, which is very important for this service. The last one is integrated model monitoring. Next, I will introduce four use cases built on this service, in four categories.
Customer Churn Prediction
The first category is classification and regression on tabular data. For the first use case, we built a customer churn prediction model and then ran a gap analysis to guide actions for improving customer retention. The service provides guardrails during the EDA stage; here are three examples: dropping constant features, detecting data leakage, and detecting data shift. First, we detected some potential data leakage and discussed it with the users to understand the logic, so that we could confirm it really was leakage. We also detected data shift between the training data and the testing data, which helped the users better understand and prepare their data. Then the automated feature engineering capability created new features, which improved model performance significantly, by about 20%. The final model's performance in terms of AUC is about 0.96, which is quite good. The feedback from our users is that the machine learning model is more accurate at identifying propensity than the older risk models, so we are very happy with this result.
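As a rough illustration of the EDA guardrails described above (dropping constant features and flagging potential leakage), here is a minimal sketch in pandas. The function names and the correlation-based leakage heuristic are illustrative assumptions, not Intuit's actual implementation.

```python
import pandas as pd

def drop_constant_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that hold a single unique value (no signal)."""
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant)

def flag_leakage(df: pd.DataFrame, target: str, threshold: float = 0.95) -> list:
    """Flag numeric features suspiciously correlated with the target.

    A near-perfect correlation often means the feature encodes the label,
    so it is surfaced to the user for discussion rather than dropped silently.
    """
    corr = df.select_dtypes("number").corr()[target].drop(target).abs()
    return corr[corr > threshold].index.tolist()
```

In practice, flagged features go back to the user to confirm the business logic, exactly as described in the churn use case.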
Natural Language Processing
Next is natural language processing. In this project, we identify prospects with a high propensity to purchase a service based on clickstream data. Here we tested two models: one is tf-idf, the other is BERT. As expected, the running time of tf-idf is much shorter than BERT's; however, it was somewhat surprising that tf-idf also outperformed BERT. When we talk about NLP, we usually say BERT is outstanding, yet in this case tf-idf was better. The reason is that the input data here consists of phrases, not the full sentences that BERT is good at.
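To make the tf-idf side of that comparison concrete, here is a minimal sketch of a tf-idf baseline for short-phrase classification using scikit-learn. The phrases, labels, and pipeline settings are invented for illustration; they are not Intuit's data or configuration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Short click/search phrases, not full sentences: the regime where a
# tf-idf bag-of-words model can match or beat BERT, as in the talk.
phrases = ["payroll pricing", "cancel subscription", "payroll demo",
           "invoice template", "payroll cost", "free trial payroll"]
labels = [1, 0, 1, 0, 1, 1]  # 1 = high propensity to purchase (toy labels)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
model.fit(phrases, labels)
print(model.predict(["payroll plans"]))
```

Because each phrase carries only a handful of tokens, the bag-of-words representation loses little context, which is one plausible reading of why tf-idf held up so well here.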
Also in this project, we provide a capability called customer-predefined thresholds. Before we kicked off the project, we asked the customer to provide the thresholds on the metrics they expect, so we can know whether the model's performance meets their expectations. Here, the customer's expectation was recall larger than 0.7 and, at the same time, F1 larger than 0.4. We can see the initial model, shown in red, was not that good, especially on recall. After significant effort, both metrics of the final model exceeded the thresholds, so this was a successful project.
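The customer-predefined threshold gate described above can be sketched as a simple check against the stated targets (recall > 0.7, F1 > 0.4). The function name and interface are illustrative assumptions.

```python
from sklearn.metrics import recall_score, f1_score

def meets_customer_thresholds(y_true, y_pred,
                              min_recall: float = 0.7,
                              min_f1: float = 0.4) -> bool:
    """Gate a candidate model on the customer's predefined metrics."""
    return (recall_score(y_true, y_pred) > min_recall
            and f1_score(y_true, y_pred) > min_f1)

# Toy labels: recall = 3/4 = 0.75, F1 = 0.75, so the gate passes.
y_true = [1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1]
print(meets_customer_thresholds(y_true, y_pred))
```

Agreeing on such a gate before the project starts, as the team did here, turns "is the model good enough?" into an objective yes/no question.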
Time Series Forecast
Next is time series forecasting. Intuit has products like TurboTax, and in the US the tax deadline is usually around April 15th each year, so we can expect a yearly seasonal pattern in unit sales or traffic. Our service can provide this kind of prediction to assist with yearly resourcing and forecasting. Here is an example: the y-axis is relative values, and the x-axis is the date. We tried to predict 2022 performance in November 2021. So assume today is November 2021, and we try to predict next year's numbers, whether units, traffic, or sales. We predicted the red curve, then waited about four months and got the blue curve. There are two observations here. First, overall it looks quite good, especially after February 15th: not only does the seasonal pattern match quite well, even the weekly patterns match.
The small wiggles also match pretty well. You may also notice there are two peaks, one around February and the other around April 5th, which matches the business. Most interestingly, the relative error is only 2%, which is pretty good. The second observation is that in February there is still a big discrepancy between the prediction and the real data. In this case we need some domain experts' help, which reminds us that there is still a place for humans in the loop. That is the time series forecasting use case.
Image Classification
The last one is image classification. If you use TurboTax, you may have noticed that it asks you to upload your W-2 forms and similar documents. In this use case, we use the AI service to classify customer-uploaded documents automatically to improve the user experience. This one is a seven-class document classification. In the confusion matrix, the diagonal shows the correct predictions, and I have listed the error for each class in the left column, which is not bad. Overall, the accuracy is 95%, which is pretty good. That is the last use case I would like to share. Next, I will hand it over to Jolene.
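The per-class errors and overall accuracy read off the confusion matrix can be computed as below. The 3-class matrix is a toy stand-in for the seven-class document classifier described in the talk.

```python
import numpy as np

def per_class_error(cm: np.ndarray) -> np.ndarray:
    """Per-class error rate: 1 - diagonal / row total (rows = true class)."""
    return 1.0 - np.diag(cm) / cm.sum(axis=1)

# Toy document-type confusion matrix (rows = true class, cols = predicted).
cm = np.array([[95, 3, 2],
               [4, 90, 6],
               [1, 1, 98]])
print(per_class_error(cm))      # error rate for each document class
print(np.trace(cm) / cm.sum())  # overall accuracy
```

Listing the per-class errors alongside overall accuracy, as the speaker does, catches classes that underperform even when the aggregate number looks strong.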
What AI Services Enable
Thank you. Hi, everyone. In this slide you are seeing a typical model development lifecycle. With AI services, we can shorten the lifecycle by automating various phases such as model training, testing and validation, deployment, and monitoring, and I'll show you how we're doing it. When an end user comes to AI services and navigates to the user interface, represented by the blue box on the left-hand side, they onboard a dataset that is already prepared and start a new experiment by clicking the submit button. That user action alone sends a request, along with the relevant model information, to the job queue. Behind the scenes, the job is fetched from the queue and a data ingestion workflow is submitted. This workflow consists of two components.
There is data ingestion and data validation, shown in circle one, colored purple. After the workflow completes, the user gets to see the metrics related to the data and can decide how the data looks before going on to train a model. Let's say everything looks as expected; now they are ready to hit the submit button to train a model. Again, a request is sent to the job queue, and behind the scenes the training workflow starts. The workflow itself consists of several components. The first component starts an AutoML instance; in this case, we're leveraging H2O Driverless AI. Once the instance is up and running, the next component imports the datasets extracted by the previous workflow, splits the data by target label, and starts the training process.
Not only are we leveraging Driverless AI's capability to give us the best final version of the model, we're also utilizing Driverless AI's client API to automate the training process. This gives us flexibility in the automation, and the entire code base is reusable and scalable. Once the training job is done, the following workflows monitor for that, extract the model artifacts along with a copy of the report, and allow users to review the model performance. If they want to submit an ad hoc batch scoring job, they can do so. Once they're satisfied with the model results, they can decide to deploy the model by hitting the deployment button. Another request is sent to the job queue; this time it is the deployment workflow, which goes through a set of model guardrails before it actually deploys the model to production. Once the model is in production, users can schedule a recurring batch scoring job (in this case, we assume it's an offline model), and they can even schedule a recurring job for retraining the model on a regular basis. This is how we're making AI services a no-code, self-serve solution, so our users don't even need to worry about the technical details.
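The submit-button-to-job-queue pattern Jolene describes can be sketched in miniature as below. Every name here (the `Job` type, the workflow registry, the payload fields) is a hypothetical stand-in; this is not Intuit's service nor the Driverless AI client API, just the general enqueue-and-dispatch shape of the architecture.

```python
import queue
from dataclasses import dataclass

@dataclass
class Job:
    kind: str      # "ingest", "train", or "deploy"
    payload: dict  # dataset id, model config, etc.

job_queue: "queue.Queue[Job]" = queue.Queue()

def submit(kind: str, payload: dict) -> None:
    """What the UI's submit button does: enqueue a request."""
    job_queue.put(Job(kind, payload))

# Each job kind maps to a workflow; real workflows would run data
# validation, AutoML training, or deployment guardrails.
WORKFLOWS = {
    "ingest": lambda p: f"validated dataset {p['dataset']}",
    "train":  lambda p: f"trained model on {p['dataset']}",
    "deploy": lambda p: f"deployed model {p['model']}",
}

def worker_step() -> str:
    """Behind the scenes: fetch one job from the queue and run its workflow."""
    job = job_queue.get()
    return WORKFLOWS[job.kind](job.payload)

submit("ingest", {"dataset": "churn_v1"})
print(worker_step())
```

Decoupling the UI from the workflows through a queue like this is what lets each stage (ingestion, training, deployment, recurring scoring) be triggered by the same one-click mechanism.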
For future work, we'll be expanding the AI services capabilities into the unsupervised learning space, and we'll also be standardizing the data processing to shorten project durations even further. Thank you.