One of the main reasons we build AI/machine learning models is to use them in production to support expert decision making. Whether your business is deciding which creatives your customers receive by email or determining a product recommendation for a web page, models provide the relevance and context that drive your business. For healthcare applications, this could mean recommending that a patient consult a health advisor for preventive care and avoid hospitalization. For retail, it could mean triggering inventory decisions ahead of an approaching peak in demand. For financial applications, it may mean making a trading decision based on a forecast of some market index. The list goes on: almost every vertical has plenty of use cases where AI/ML can be put to effective use in production.
AI/ML in production works by 'scoring' models on data to make decisions, in one of two modes: real time or batch.
Real-time scoring is excellent when you need millisecond response times for decisions, for example when a retailer serves recommendations to users on a website dynamically. Real-time scoring is also instrumental in detecting and flagging fraud, or for security, while interactions are still in flight. You can even imagine real-time scoring in a healthcare environment, detecting and alerting when medical attention is required. In general, real-time scoring is used where your expert system must react immediately and trigger downstream processes to mitigate something urgent that cannot wait.
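As a minimal sketch of what such a real-time scoring service can look like, assuming a scikit-learn pipeline persisted as "model.joblib" (the route, payload shape, and field names below are hypothetical):

```python
# Minimal real-time scoring endpoint: a sketch, not a production service.
# Assumes a trained scikit-learn pipeline saved to "model.joblib".
import joblib
import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.joblib")  # load the artifact once at startup

@app.route("/score", methods=["POST"])
def score():
    # Expect a JSON object of feature name -> value for a single interaction.
    row = pd.DataFrame([request.get_json()])
    proba = model.predict_proba(row)[0, 1]  # e.g., probability of fraud
    # The caller can block or flag the in-flight transaction on this score.
    return jsonify({"score": float(proba)})

if __name__ == "__main__":
    app.run(port=8080)
```

The key property is that the model is loaded once and each request pays only the cost of transforming and scoring a single row, which is what makes millisecond latencies attainable.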
Batch scoring is useful for workloads like credit risk models, where data drift is minimal in the transactions arriving in your data lake or warehouse and scores can be considered stationary over a tolerable period. Typical downstream actions include sending an email or triggering a customer service call to promote, up-sell, inform, or solicit more information from your customers.
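A batch run can be as simple as the sketch below, assuming the same hypothetical "model.joblib" artifact and file names:

```python
# Batch scoring sketch: score a day's transactions in one pass and
# persist the results for downstream campaigns.
import joblib
import pandas as pd

model = joblib.load("model.joblib")              # same artifact training produced
batch = pd.read_parquet("transactions.parquet")  # data landed in the lake/warehouse

features = batch.drop(columns=["customer_id"])   # assume the rest are model features
batch["score"] = model.predict_proba(features)[:, 1]

# Scores are treated as stationary over the campaign window, so a nightly
# run suffices; e.g., email customers above a threshold.
batch.loc[batch["score"] > 0.8, ["customer_id", "score"]] \
     .to_parquet("scored_customers.parquet")
```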
Fundamentally, your operational SLAs drive the choice between the two. The trade-offs in the scoring environment are also determined by how complex the final model is: which algorithms were chosen for scoring, plus the feature engineering effort needed to transform incoming data before it is handed off to the algorithms in the pipeline.
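One way to see why scoring cost tracks pipeline complexity is that the deployed artifact carries the feature engineering, not just the algorithm. A sketch using a scikit-learn pipeline (column names are hypothetical):

```python
# Every transform in this pipeline runs again on each scoring request or batch,
# so a heavier feature step directly raises the latency floor you can meet.
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

pipeline = Pipeline([
    ("features", ColumnTransformer([
        ("scale", StandardScaler(), ["amount", "tenure_days"]),
        ("encode", OneHotEncoder(handle_unknown="ignore"), ["channel", "region"]),
    ])),
    ("model", GradientBoostingClassifier()),
])
# pipeline.fit(X_train, y_train); the fitted object, transforms and all,
# is what gets deployed to the scoring environment.
```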
While scoring can happen at the edge or in batch mode, there is no free lunch. Behind a good scoring environment is the effort to build highly accurate models and feature engineering that keep up with new data coming into the training environment. Sometimes you can retrain on all the data. The holy grail, however, is for models to learn continuously from new data as it arrives in the training environment, shortening the time to deploy to production, all without losing the fidelity of the model.
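A sketch of that continuous-learning idea, assuming an algorithm that supports incremental updates (here scikit-learn's SGDClassifier; the data feed is a stand-in):

```python
# Update the model in place as labeled data arrives, instead of
# retraining on all history each time.
import numpy as np
from sklearn.linear_model import SGDClassifier

def arriving_batches():
    # Stand-in for whatever lands in the training environment;
    # yields (features, labels) mini-batches.
    rng = np.random.default_rng(0)
    for _ in range(10):
        X = rng.normal(size=(100, 5))
        yield X, (X[:, 0] > 0).astype(int)

model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])

for X_new, y_new in arriving_batches():
    # Incremental update: each new batch refines the model,
    # shortening the path from new data to a deployable artifact.
    model.partial_fit(X_new, y_new, classes=classes)
```

Not every algorithm supports this; gradient boosted ensembles, for instance, typically need a full retrain, which is part of the trade-off between model complexity and time-to-deploy.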
If your data scientists are building great models, the main concern becomes how production-ready the code they have written is. If models change often, with new algorithms or ensembles built and new feature engineering discovered, how easy is it to move them to production and hand them off to dev-ops or a model management system? How portable are your scoring artifacts in environments that look nothing like the one where training happened?
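A simple portability check, sketched below with a joblib hand-off: train, serialize, and reload the full pipeline the way a dev-ops team or model management system would. Note that pickle-based artifacts tie you to matching Python and library versions, which is exactly the portability concern raised above.

```python
# Portability hand-off sketch: the scoring side gets one artifact, no training code.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 4)), rng.integers(0, 2, size=200)

# Training environment: persist the full pipeline (features + algorithm).
pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
joblib.dump(pipeline, "model.joblib")

# Scoring environment: reload and score without any of the training code.
restored = joblib.load("model.joblib")
print(restored.predict_proba(X[:1]))
```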
Figure: Driverless AI, model deployment for hard disk failure detection. Data © BackBlaze.com.