H2O.ai WIKI

Feature Engineering

What is Feature Engineering?

Feature engineering involves the selection, manipulation, and transformation of raw data into features used in supervised learning. The purpose of feature engineering and selection is to improve the performance of machine-learning algorithms and, as a consequence, model accuracy on unseen data.

Examples of Feature Engineering

Feature engineering is a machine learning technique that leverages the information in the training set to create new variables. As well as simplifying and speeding up data transformations, feature engineering can enhance model accuracy by producing new features for supervised and unsupervised learning. 

Here are some examples of feature engineering:

1. Continuous Data

The most common type of data is continuous data. This type of data can take any value from a given range. It can be, for example, the price of a product, the temperature of an industrial process, or the coordinates of some geographical feature.

The generation of features depends mainly on the domain of the data. If we subtract the warehouse price from the shelf price, we can calculate profit. Similarly, from the coordinates of two locations on a map, we can calculate the distance between them.

The possibilities for new features are limited only by the available features and known mathematical operations.
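As a minimal sketch of the two derived features mentioned above, the following Python computes a profit and a distance. The row layout, the field names, and the choice of the haversine (great-circle) formula are assumptions made for illustration; the article does not prescribe a particular distance measure.

```python
import math

def profit(shelf_price, warehouse_price):
    """Derived feature: shelf price minus warehouse price."""
    return shelf_price - warehouse_price

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Hypothetical raw rows: two prices plus store and warehouse coordinates.
rows = [
    {
        "shelf_price": 12.5,
        "warehouse_price": 8.0,
        "store": (52.52, 13.405),
        "warehouse": (48.137, 11.575),
    },
]

# Add the engineered features next to the raw ones.
for row in rows:
    row["profit"] = profit(row["shelf_price"], row["warehouse_price"])
    row["distance_km"] = haversine_km(*row["store"], *row["warehouse"])
```

Both new columns are plain arithmetic on existing columns, which is typical of features derived from continuous data.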

2. Categorical Features

The second most common type of data is categorical data, which refers to features that can take on values from a limited set. In many cases, the feature can only take a single value at a time.

A feature can also hold several values at once, but in that case it is usually separated into a set of features. As an example of a categorical feature, the ISO/IEC 5218 standard defines four codes for representing human sex: not known, male, female, and not applicable.
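One common way to separate a categorical feature into a set of features is one-hot encoding, where each category becomes its own 0/1 indicator. The sketch below applies it to the ISO/IEC 5218 codes (0, 1, 2, 9). The article does not name this technique, and the `sex_` column prefix is a hypothetical naming choice.

```python
# ISO/IEC 5218 numeric codes and their meanings.
ISO_5218 = {0: "not_known", 1: "male", 2: "female", 9: "not_applicable"}

def one_hot(code, categories=ISO_5218):
    """Return one 0/1 indicator feature per category for a single code."""
    if code not in categories:
        raise ValueError(f"unknown category code: {code}")
    return {f"sex_{name}": int(code == key) for key, name in categories.items()}

# Encode a small, made-up column of codes.
encoded = [one_hot(code) for code in [1, 2, 0, 9]]
```

Each encoded row contains exactly one indicator set to 1, so the model never treats the numeric codes as if they had an order or magnitude.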

3. Text Features

Feature engineering also involves converting text into a set of representative numerical values. Almost all automatic mining of social media data encodes the text as numbers. The simplest encoding is word counts: you take a snippet of text, count the occurrences of each word, and put the results in a table.
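The word-count encoding described above can be sketched with the Python standard library. The tokenizer (lowercasing, then keeping runs of letters, digits, and apostrophes) and the example snippets are assumptions for illustration; real pipelines usually apply more careful tokenization.

```python
import re
from collections import Counter

def word_counts(text):
    """Count occurrences of each lowercase word in one snippet."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

def count_table(snippets):
    """Build a shared vocabulary and one row of counts per snippet."""
    counts = [word_counts(s) for s in snippets]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words that do not occur in a snippet.
    table = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, table

snippets = ["The cat sat on the mat", "The dog sat"]
vocabulary, table = count_table(snippets)
# vocabulary -> ['cat', 'dog', 'mat', 'on', 'sat', 'the']
# table      -> [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each row of the table is a numeric feature vector for one snippet, with one column per word in the vocabulary.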

4. Image Features

Another common need for machine learning analysis is to encode images appropriately.

Why is Feature Engineering important?

Feature engineering encompasses various data engineering techniques, such as selecting relevant features, dealing with missing data, encoding data, and normalizing it. Preparing a model's inputs is one of the most crucial tasks, because a machine learning model cannot make accurate predictions if it is given the wrong inputs. The quality of the features will determine the success of the machine learning model.
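Two of the techniques listed above, dealing with missing data and normalizing, can be sketched in a few lines. Mean imputation and min-max scaling are only one possible choice for each step, and the temperature column is made up for illustration.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

temperatures = [20.0, None, 30.0, 40.0]
filled = impute_mean(temperatures)  # [20.0, 30.0, 30.0, 40.0]
scaled = min_max_scale(filled)      # [0.0, 0.5, 0.5, 1.0]
```

In practice the mean, minimum, and maximum would be computed on the training set only and then reused on unseen data, so that no information leaks from the evaluation data into the features.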

Good features are critical for accuracy and interpretability. By identifying the essential variables and removing redundant and irrelevant variables, feature selection improves the machine learning process and increases the predictive power of machine learning algorithms.

As with a correlation matrix, feature importance allows you to understand the relationship between the features and the target variable. In addition, it indicates which features are irrelevant to the model.
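As a rough illustration of relating features to the target, the sketch below ranks features by the absolute Pearson correlation with the target. This is a simple stand-in, not a model-based feature importance, and it only captures linear relationships. The feature names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical feature columns and target.
features = {
    "size": [1.0, 2.0, 3.0, 4.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
}
target = [2.0, 4.0, 6.0, 8.0]

scores = {name: pearson(column, target) for name, column in features.items()}
# Rank by strength of the linear relationship, ignoring its sign.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

Here `size` tracks the target exactly while `noise` and `constant` score zero, which marks them as candidates for removal under this particular measure.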

Feature Engineering vs. Other Technologies & Methodologies

Feature Engineering vs. Feature Selection

With feature engineering, more sophisticated models can be created than when working with raw data alone. It also allows for the construction of interpretable models from any amount of data. Feature selection, in turn, limits the number of features to a manageable number.

Feature Engineering vs. Feature Extraction

Feature engineering converts raw data into features/attributes that better reflect the underlying structure of the data. Feature extraction is the process of transforming raw data into the desired form.

Feature Engineering vs. Hyperparameter Tuning

Feature engineering uses domain knowledge of the data to create features that make machine learning algorithms work. Hyperparameter tuning, or hyperparameter optimization, refers to selecting the set of optimal hyperparameters for a learning algorithm. It aims to improve model performance by changing the settings of the algorithm rather than the data. Feature reduction, by contrast, is an example of feature engineering, because it changes the data.
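To make the contrast concrete, the following sketch tunes a single hyperparameter while leaving the data untouched. It assumes a tiny one-dimensional k-nearest-neighbors regressor, a made-up data set, and leave-one-out validation as the selection criterion; none of these choices come from the article.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def leave_one_out_error(data, k):
    """Mean squared error when each point is predicted from all the others."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        total += (knn_predict(rest, x, k) - y) ** 2
    return total / len(data)

# Hypothetical (feature, target) pairs; the features themselves never change.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 6.0)]

# Hyperparameter tuning: try each candidate k and keep the one with the lowest error.
candidates = [1, 2, 3]
errors = {k: leave_one_out_error(data, k) for k in candidates}
best_k = min(errors, key=errors.get)
```

The search loop only varies `k`, a setting of the algorithm. Feature engineering would instead change what is stored in `data`.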

In summary:

Features: Characteristics that describe your problem. Also called attributes.

Parameters: Variables your algorithm tries to tune so you can build an accurate model.