In machine learning (ML), a decision tree is an algorithm that routes each input through a series of if-else decisions until it reaches a leaf, which supplies the output. A regression tree is a decision tree that predicts a numerical output rather than a class label. It is built from a data set, drawn either from historical data or from an experiment, that contains input variables called predictors and an output variable that the user wants to predict. Once built, the tree is used to predict the outcome for future inputs.
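
As a rough illustration of the if-else structure described above, a tiny regression tree can be written by hand. The predictor names (`size_sqm`, `age_years`), the thresholds, and the leaf values are all invented for this sketch; a real tree would learn them from a data set.

```python
def predict_price(size_sqm: float, age_years: float) -> float:
    """Toy regression tree: every leaf returns a number, not a class label.

    All thresholds and leaf values are hypothetical, chosen only to show
    the shape of the decisions.
    """
    if size_sqm < 80.0:
        if age_years < 20.0:
            return 210.0  # small, newer
        return 170.0      # small, older
    if size_sqm < 150.0:
        return 320.0      # medium
    return 480.0          # large


small_new = predict_price(size_sqm=60.0, age_years=5.0)
large_any = predict_price(size_sqm=200.0, age_years=40.0)
```

Each call follows exactly one path from the root to a leaf, so the prediction depends only on which side of each threshold the input falls.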

Building a regression tree begins with careful analysis of the data set. The output is plotted against each predictor, much as a user would plot a regression. Then, for each predictor, the sum of squared residuals (also called the residual sum of squares, RSS) is calculated at a range of candidate thresholds. For a given threshold, the data points are divided into those below it and those above it, and the RSS adds up the squared distance of each data point from the average output on its own side. A higher RSS means the data points vary more widely around those two averages. For each predictor, the value that produces the minimum RSS is selected as the threshold for the relevant decision. The predictor whose best threshold has the lowest RSS of all becomes the root of the tree. The tree is then built downward by repeating the same procedure on the data that falls into each branch.
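
The threshold search can be sketched in a few lines of Python. This is a minimal illustration, assuming candidate thresholds are the midpoints between neighbouring predictor values; the function names and the `dose`/`noise` data are made up for the example.

```python
from math import inf
from typing import Dict, List, Tuple


def rss(values: List[float]) -> float:
    """Sum of squared distances of the values from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (threshold, rss) for the minimum-RSS split of one predictor."""
    pairs = sorted(zip(xs, ys))
    best_t, best_rss = pairs[0][0], inf
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best_rss:
            best_t, best_rss = (lo + hi) / 2, total
    return best_t, best_rss


def choose_root(predictors: Dict[str, List[float]],
                ys: List[float]) -> Tuple[str, float]:
    """Pick the predictor whose best split has the lowest RSS overall."""
    scored = {name: best_threshold(xs, ys) for name, xs in predictors.items()}
    name = min(scored, key=lambda n: scored[n][1])
    return name, scored[name][0]


# Hypothetical data: the output jumps when "dose" passes about 6.5,
# while "noise" is unrelated to the output.
predictors = {
    "dose": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    "noise": [3.0, 1.0, 2.0, 1.0, 3.0, 2.0],
}
ys = [5.0, 6.0, 5.0, 20.0, 21.0, 19.0]
root_name, root_threshold = choose_root(predictors, ys)
```

Here `dose` is chosen as the root with a threshold of 6.5, because splitting there leaves the points on each side close to their own average. Building the rest of the tree would mean calling `choose_root` again on the rows that fall on each side of that threshold.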

When first created, a new regression tree is prone to overfitting. Regression tree pruning is the process of optimizing a regression tree by removing or collapsing decision nodes, trading off the bias and variance of the output. Pruning produces several variations of the tree and selects the one that performs best on a validation data set. It is often performed in reverse order, meaning the last node generated is the first to be considered for elimination. Cost-complexity pruning can help guide the process. Other methods combine multiple trees to provide a more accurate and better supported prediction. These include bagging, boosting, and random forests.
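
The bottom-up order of pruning can be illustrated with a small sketch. Note that this is reduced-error pruning, a simpler stand-in used here for illustration rather than the cost-complexity algorithm itself: a node is collapsed into a leaf whenever doing so does not increase the RSS on a validation set. The `Node` class, the single-predictor tree, and the validation data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    prediction: float                  # mean training output at this node
    threshold: Optional[float] = None  # None marks a leaf
    left: Optional["Node"] = None      # branch taken when x < threshold
    right: Optional["Node"] = None     # branch taken when x >= threshold


def predict(node: Node, x: float) -> float:
    while node.threshold is not None:
        node = node.left if x < node.threshold else node.right
    return node.prediction


def validation_rss(root: Node, data: List[Tuple[float, float]]) -> float:
    return sum((y - predict(root, x)) ** 2 for x, y in data)


def prune(node: Node, root: Node, data: List[Tuple[float, float]]) -> None:
    """Visit the deepest nodes first, then try collapsing each one."""
    if node.threshold is None:
        return
    prune(node.left, root, data)
    prune(node.right, root, data)
    before = validation_rss(root, data)
    saved = (node.threshold, node.left, node.right)
    node.threshold, node.left, node.right = None, None, None
    if validation_rss(root, data) > before:
        # Collapsing made validation error worse, so restore the split.
        node.threshold, node.left, node.right = saved


def count_leaves(node: Node) -> int:
    if node.threshold is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


# Hypothetical overfit tree: the split at 11.5 only fits training noise.
tree = Node(12.67, 6.5,
            Node(5.33),
            Node(20.0, 11.5, Node(20.5), Node(19.0)))
validation = [(2.0, 5.0), (3.0, 6.0), (10.0, 19.5), (12.0, 20.5)]
prune(tree, tree, validation)
```

After pruning, the split at 11.5 is gone and the tree keeps only the root decision at 6.5, since removing the root as well would raise the validation error sharply. Cost-complexity pruning follows the same bottom-up idea but scores each candidate tree by its RSS plus a penalty proportional to its number of leaves.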