H2O.ai WIKI

Machine Learning Lifecycle

What Is the Machine Learning Life Cycle?

The machine learning (ML) life cycle is a recurring process used in data science projects. It is not a straightforward, linear process: data is improved, models are built and evaluated, and the cycle repeats continually. The ML life cycle helps organizations define the steps for deriving value and managing resources.

Phase 1: Data

Quality data is key to a quality model. Having data to train the model is the first phase of the ML life cycle. The initial steps in phase one are: 

Data Collection

Data collection requires a large quantity of raw data, which helps resolve problems that may arise with model performance. It is recommended to look for existing datasets that fit the specific project well. Data can be collected from files, mobile devices, databases, and the internet. The collected data will be sorted in a later step of phase one, so quality is less important during this particular step.

Data Preparation

In data preparation, the collected data is organized and randomized (shuffled). During this step, the data can also be explored to gain a better understanding of what would lead to a more efficient outcome.
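
As a minimal sketch of the randomizing step, the Python example below shuffles a small dataset with a fixed seed. Holding out part of the shuffled data for later testing is an assumption added for illustration, and the function name shuffle_and_split and its parameters are hypothetical.

```python
import random


def shuffle_and_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists.

    Hypothetical helper for illustration; the held-out split is an
    assumption, not something the life cycle description prescribes.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the input stays untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the shuffle is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy dataset of (input, target) pairs.
rows = [(x, 2 * x + 1) for x in range(10)]
train_rows, test_rows = shuffle_and_split(rows)
```

Fixing the seed keeps the shuffle reproducible, which makes later comparisons between models easier to interpret.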

Data Cleaning

In data cleaning, raw data goes through a variety of filtering techniques that address errors and convert it into a usable format. Errors could include missing values, duplicate data, or invalid data in the collected dataset. Data cleaning ensures that the quality of the data used in ML does not negatively impact the outcome.
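
The three kinds of errors named above can be illustrated with a short Python sketch. It assumes records arrive as dictionaries whose required fields should be numeric; the function name clean_records and the field names are hypothetical, and dropping bad rows is only one possible policy.

```python
def clean_records(records, required_fields):
    """Drop records with missing or non-numeric fields, then de-duplicate.

    Hypothetical helper for illustration. Dropping rows is an assumed
    policy; filling in missing values would be another option.
    """
    cleaned = []
    seen = set()
    for record in records:
        values = []
        valid = True
        for field in required_fields:
            raw = record.get(field)
            if raw is None or raw == "":       # missing value
                valid = False
                break
            try:
                values.append(float(raw))      # convert to a usable format
            except (TypeError, ValueError):    # invalid value
                valid = False
                break
        if not valid:
            continue
        key = tuple(values)
        if key in seen:                        # duplicate record
            continue
        seen.add(key)
        cleaned.append(dict(zip(required_fields, values)))
    return cleaned


raw_records = [
    {"height_cm": "170", "weight_kg": "65"},
    {"height_cm": "170", "weight_kg": "65"},   # duplicate
    {"height_cm": "", "weight_kg": "70"},      # missing value
    {"height_cm": "abc", "weight_kg": "80"},   # invalid value
    {"height_cm": 182, "weight_kg": 77.5},
]
cleaned_records = clean_records(raw_records, ["height_cm", "weight_kg"])
```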
 

Phase 2: Model

In phase two, the ML model is built using data that was prepared in phase one.

Model Development

Model development is the core of the ML life cycle, and it begins with the selection of a baseline architecture. There is no need to start with a complex model. Instead, start with a simple model that produces accurate results, and add complexity only if it is needed.
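
One very simple baseline, sketched here in Python as an illustration only, always predicts the mean of the training targets. The class name MeanBaseline is hypothetical, and a real project might choose a different baseline; the point is that any more complex model should at least outperform something this simple.

```python
class MeanBaseline:
    """Baseline regressor that predicts the training-target mean for every input.

    Hypothetical class for illustration, not part of any specific library.
    """

    def fit(self, xs, ys):
        if not ys:
            raise ValueError("ys must not be empty")
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        # The inputs are ignored on purpose: the baseline knows only the mean.
        return [self.mean_ for _ in xs]


baseline = MeanBaseline().fit([1, 2, 3], [2.0, 4.0, 6.0])
baseline_predictions = baseline.predict([10, 20])
```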

Training

Training uses the prepared datasets to improve the model's performance on the problem it is meant to solve. It is also how the model learns the patterns, rules, and features in the data.
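
To make the idea of training concrete, the Python sketch below fits a one-variable linear model with gradient descent on mean squared error. The model form, the learning rate, the number of epochs, and the function name train_linear are all assumptions chosen for this toy example.

```python
def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error.

    Hypothetical helper; the hyperparameters are assumptions that suit
    the small toy dataset below, not general recommendations.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so training should recover w = 2 and b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear(xs, ys)
```

Each pass over the data nudges the parameters in the direction that reduces the error, which is the sense in which training improves the model's performance.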
 

Phase 3: Evaluation

Before the model can be deployed, it must be tested. In the evaluation phase, the model is tested for accuracy and precision. An in-depth evaluation will find mistakes and determine why those mistakes were made. Evaluation metrics are used to compare different models and to determine whether the solution satisfies the goals set at the onset of the project.
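
For a binary classifier, accuracy and precision can be computed directly from the true and predicted labels. The Python sketch below assumes labels are encoded as 0 and 1 with 1 as the positive class; the function name accuracy_and_precision is hypothetical.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for two equal-length label lists.

    Hypothetical helper. Precision is reported as 0.0 when the model
    makes no positive predictions, which is an assumed convention.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if p == positive and t == positive)
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)
    accuracy = correct / len(pairs)
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision = accuracy_and_precision(y_true, y_pred)
# Six of eight labels match, and three of four positive predictions are correct.
```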
 

Phase 4: Production

Now that the model has been trained and has passed evaluation, it is time to move the model into production. Before the model is deployed, there are challenges that need to be addressed; once they are, the model is ready for deployment. After deployment, ML monitoring must occur often to ensure the model is running smoothly and operating as expected. Any new data that passes through the model must be evaluated to see if the model is performing correctly. The production phase is a continuous process of testing the model for edge cases and trends that could pose problems.
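
One simple form of monitoring, sketched below in Python, compares a feature's recent values against a reference sample from training and raises a flag when the mean has shifted noticeably. The check, the threshold of 3.0, and the function name mean_shift_alert are assumptions for illustration; production monitoring usually tracks many more signals.

```python
import math
import statistics


def mean_shift_alert(reference, recent, threshold=3.0):
    """Return True if the recent mean is far from the reference mean.

    Distance is measured in standard errors of the recent sample mean,
    using the reference standard deviation. Hypothetical helper; the
    threshold is an assumed value, not a recommended setting.
    """
    if len(reference) < 2 or not recent:
        raise ValueError("need at least 2 reference values and 1 recent value")
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    shift = abs(statistics.mean(recent) - ref_mean)
    if ref_std == 0:
        # A constant reference: any change at all counts as a shift.
        return shift > 0
    standard_error = ref_std / math.sqrt(len(recent))
    return shift / standard_error > threshold


reference = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
stable_alert = mean_shift_alert(reference, [10.1, 9.9, 10.3, 9.7])
shifted_alert = mean_shift_alert(reference, [14.0, 15.0, 13.5, 14.5])
```

A flag from a check like this does not say what went wrong; it only signals that the incoming data no longer resembles the training data and that the model should be re-evaluated.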
 

Advantages of the Machine Learning Life Cycle

Using the ML life cycle guides a team’s work and allows non-technical people to understand what is required to complete a project. Additionally, the ML life cycle can help increase performance and productivity by standardizing the process and vocabulary and by helping the team be thorough.