Feature selection is one of the main components of the feature engineering process: it reduces the number of input variables used to develop a predictive model.
Feature selection techniques reduce the number of input variables by eliminating redundant or irrelevant features, narrowing the set to those most relevant to the machine learning model. The objective of feature selection is to identify the most useful group of features for building good models of the phenomenon being studied.
Feature selection is an efficient preprocessing technique for various real-world applications, such as text categorization, remote sensing, image retrieval, microarray analysis, mass spectrum analysis, sequence analysis, etc.
Feature selection is important because it makes the machine learning process more accurate: it increases the predictive power of algorithms by keeping the most critical variables and eliminating the redundant and irrelevant ones.
Three key benefits of feature selection are:
- Reduced overfitting: less redundant data means fewer opportunities to fit noise.
- Improved accuracy: less misleading data means modeling accuracy improves.
- Reduced training time: fewer features mean algorithms train faster.
Feature selection techniques can be divided into two types: supervised and unsupervised. Supervised methods may be divided into three types: wrapper methods (forward, backward, and stepwise selection), filter methods (ANOVA, Pearson correlation, variance thresholding), and embedded methods (Lasso, Ridge, Decision Tree).
Wrapper methods train a model on a subset of features; based on the results of that model, we decide whether to add features to or remove features from the subset. The task essentially reduces to a search problem and usually has a high computational cost.
For example, standard wrapper methods are forward feature selection, backward feature elimination, recursive feature elimination, etc.
A popular wrapper implementation is the Boruta package, which assesses the importance of each feature by comparing it against randomly permuted "shadow" copies of the features.
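A minimal sketch of the wrapper idea, using scikit-learn's `SequentialFeatureSelector` for forward selection (the Boruta algorithm itself lives in a separate package such as BorutaPy; the estimator and parameter choices here are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Wrap an estimator and greedily add the feature that most improves
# cross-validated score, until n_features_to_select are chosen.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",  # "backward" gives backward feature elimination
    cv=3,
)
selector.fit(X, y)
print(selector.get_support())  # boolean mask over the 4 iris features
```

Because each candidate subset requires retraining and cross-validating the model, the cost grows quickly with the number of features, which is the computational drawback mentioned above.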
Filter methods are generally used as preprocessing steps, and their selection is independent of any machine learning algorithm. Instead, features are selected based on their scores in various statistical tests measuring their relationship with the outcome variable.
It is important to remember that filter methods do not remove multicollinearity, so you must account for multicollinearity among features before training models on your data.
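A brief sketch of two filter methods named earlier, variance thresholding and Pearson correlation; the synthetic data and the 0.1 threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.01 * rng.normal(size=100)   # near-constant feature
y = 2 * X[:, 0] + rng.normal(size=100)  # target depends only on feature 0

# Filter 1: drop features whose variance falls below a threshold.
vt = VarianceThreshold(threshold=0.1)
X_reduced = vt.fit_transform(X)  # the near-constant column is removed

# Filter 2: rank features by |Pearson correlation| with the target.
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
best = int(np.argmax(scores))  # feature 0 should score highest here
```

Note that both filters score each feature independently of any model and of the other features, which is exactly why they cannot detect multicollinearity.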
Embedded methods combine the best features of filtering and wrapping by implementing algorithms with built-in methods for selecting features.
For example, Ridge and Lasso regression both have built-in penalization functions that can reduce overfitting.
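A minimal sketch of embedded selection via Lasso's L1 penalty, combined with scikit-learn's `SelectFromModel`; the synthetic data and the alpha value are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
# Only the first two features actually drive the target.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=500)

# The L1 penalty drives the coefficients of uninformative features
# toward exactly zero, so selection happens as a side effect of training.
lasso = Lasso(alpha=0.5).fit(X, y)
selected = SelectFromModel(lasso, prefit=True).transform(X)
print(lasso.coef_.round(2))  # near-zero weights on the noise features
```

Unlike a wrapper, no search over subsets is needed: a single model fit both trains the predictor and reveals which features to keep.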
The key difference between feature selection and extraction is that feature selection keeps a subset of the original features while feature extraction algorithms transform the data onto a new feature space. Some supervised algorithms already have built-in feature selection, such as Regularized Regression and Random Forests.
Extraction: Getting useful features from existing data.
Selection: Choosing a subset of the original pool of features.
Feature selection simply keeps or excludes given features without modifying them in any way, whereas dimensionality reduction more generally reduces the number of dimensions. The set of features produced by feature selection must be a subset of the original set of features; the set produced by dimensionality reduction need not be (for example, PCA reduces dimensionality by constructing new synthetic features as linear combinations of the original features, then discarding the less important ones). In this sense, feature selection is a special case of dimensionality reduction.
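The contrast can be sketched side by side: `SelectKBest` (selection) returns a subset of the original columns, while PCA (extraction/dimensionality reduction) returns new synthetic axes. The choice of `k=2` components is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Selection: each output column is one of the original four features.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Extraction: each output column is a linear combination of all four
# original features, i.e. a new synthetic feature.
X_pca = PCA(n_components=2).fit_transform(X)

print(X_sel.shape, X_pca.shape)  # same shape, different meanings
```

Both arrays have two columns, but only the selected ones retain their original interpretation (e.g. "petal length in cm"), which matters when model interpretability is a goal.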