- Activation Function
- Confusion Matrix
- Convolutional Neural Networks
- Forward Propagation
- Generative Adversarial Network
- Gradient Descent
- Linear Regression
- Logistic Regression
- Machine Learning Algorithms
- Multilayer Perceptron
- Naive Bayes
- Neural Networking and Deep Learning
- RuleFit
- Stack Ensemble
- Word2Vec
- XGBoost

- Attention Mechanism
- BERT
- Binary Classification
- Classify Token ([CLS])
- Conversational Response Generation
- GLUE (General Language Understanding Evaluation)
- GPT (Generative Pre-Trained Transformers)
- Language Modeling
- Layer Normalization
- Mask Token ([MASK])
- Probability Distribution
- Probing Classifiers
- SQuAD (Stanford Question Answering Dataset)
- Self-attention
- Separate token ([SEP])
- Sequence-to-sequence Language Generation
- Sequential Text Spans
- Text Classification
- Text Generation
- Transformer Architecture
- WordPiece

- AUC-ROC
- Analytical Review
- Autoencoders
- Bias-Variance Tradeoff
- Decision Optimization
- Explanatory Variables
- Exponential Smoothing
- Level of Granularity
- Long Short-Term Memory
- Loss Function
- Model Management
- Precision and Recall
- Predictive Learning
- ROC Curve
- Recommendation system
- Stochastic Gradient Descent
- Target Leakage
- Target Variable
- Underwriting

Naive Bayes classifiers are a family of simple yet powerful classification algorithms based on Bayes' theorem. They are recommended as a first approach for classifying complicated datasets before more refined classifiers are tried.

Bayes' theorem gives the probability of an event A happening, given that another event B has occurred. In the equation provided below, B is the evidence and A is the hypothesis. Naive Bayes is a collection of algorithms that share a common principle: they apply Bayes' theorem under the assumption that predictors are independent of one another given the class. In other words, the presence or value of one predictor does not influence another.

P(A|B) = P(B|A)P(A) / P(B)
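As a rough illustration of the equation above, the following sketch computes P(A|B) for a made-up spam example. The function name `posterior` and all of the numbers are assumptions chosen for illustration. P(B) is expanded with the law of total probability.

```python
def posterior(prior_a: float, b_given_a: float, b_given_not_a: float) -> float:
    """Return P(A|B) = P(B|A) P(A) / P(B) for a binary hypothesis A."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = b_given_a * prior_a + b_given_not_a * (1.0 - prior_a)
    return b_given_a * prior_a / evidence


# Hypothetical numbers: 20% of emails are spam (A); the word "offer" (B)
# appears in 60% of spam emails and in 5% of non-spam emails.
p_spam_given_offer = posterior(prior_a=0.2, b_given_a=0.6, b_given_not_a=0.05)
print(round(p_spam_given_offer, 3))
```

With these assumed inputs, seeing the word raises the probability of spam from the 20% prior to about 75%.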

Naive Bayes classifiers assume that each feature makes an independent and equal contribution to the outcome. The first part of the assumption is that no pair of features is dependent. For example, high outside humidity does not imply that the outside temperature is also high. The second part of the assumption is that each predictor has equal importance. For example, when deciding whether to play golf, wind is as important as temperature.
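The independence assumption is what makes the classifier cheap to compute: the joint likelihood of all features factorizes into a product of per-feature terms. Written out, using y for the class and x_1, ..., x_n for the features (notation introduced here, not in the text above), the standard form is:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The denominator P(x_1, ..., x_n) is the same for every class, so it can be dropped when only the most probable class is needed.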

Common in Natural Language Processing (NLP), multinomial Naive Bayes classifiers infer the tag of a text: they calculate the probability of each tag for a given sample and output the tag with the greatest probability. Multinomial Naive Bayes classifiers use the frequency of words in a document as features (predictors). They are typically used in document classification.
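The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names, the toy documents, and the use of add-one (Laplace) smoothing via `alpha` are choices made here, and words never seen in training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class log word likelihoods."""
    vocab = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed denominator: total words in class plus alpha per vocab word
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict_multinomial_nb(model, doc):
    """Return the tag with the greatest (log) probability for the document."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Sum log probabilities; repeated words count once per occurrence
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc.split() if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical training data
train_docs = [
    "cheap offer buy now",
    "limited offer cheap",
    "meeting agenda attached",
    "project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

mnb_model = train_multinomial_nb(train_docs, train_labels)
print(predict_multinomial_nb(mnb_model, "cheap offer today"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.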

Because a multinomial Naive Bayes classifier only calculates probabilities, it is easy to implement. It also handles large datasets efficiently and scales well, which makes it useful in real-time applications.

Compared with other probabilistic algorithms, multinomial Naive Bayes classifiers often have lower prediction accuracy, and they are unsuitable for regression analysis. This technique is therefore inappropriate for estimating numerical values and is best reserved for classification tasks such as labeling text input.

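Unlike multinomial Naive Bayes classifiers, Bernoulli Naive Bayes classifiers use binary (boolean) variables such as yes or no, or true or false. In document classification, where the Bernoulli Naive Bayes classifier is also used, each feature typically records whether a word occurs in the document rather than how often it occurs.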
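A minimal sketch of the Bernoulli variant follows, again in plain Python with hypothetical names and toy data. Each feature is 0 or 1, and the absence of a feature contributes to the score as well as its presence. The smoothing scheme (adding `alpha` to both outcomes) is an assumption made for this example.

```python
import math


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """rows: equal-length lists of 0/1 features. Returns per-class parameters."""
    n_features = len(rows[0])
    model = {}
    for c in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        n = len(class_rows)
        # Smoothed estimate of P(feature_j = 1 | class)
        probs = [
            (sum(r[j] for r in class_rows) + alpha) / (n + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (math.log(n / len(rows)), probs)
    return model


def predict_bernoulli_nb(model, row):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, probs = model[c]
        # Use p when the feature is present, 1 - p when it is absent
        return log_prior + sum(
            math.log(p if x else 1.0 - p) for x, p in zip(row, probs)
        )
    return max(model, key=score)


# Hypothetical features: [contains "offer", contains "meeting", contains "free"]
bnb_rows = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
bnb_labels = ["spam", "spam", "ham", "ham"]

bnb_model = train_bernoulli_nb(bnb_rows, bnb_labels)
print(predict_bernoulli_nb(bnb_model, [1, 0, 1]))
```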

Compared to other Naive Bayes classifiers, Bernoulli Naive Bayes is a fast classification algorithm that works well with small datasets, delivers accurate results, and easily handles irrelevant features.

Gaussian Naive Bayes classifiers work with continuous data and assume that the values associated with each class follow a normal (or Gaussian) distribution. This classifier provides accuracy without excessive effort and is an efficient, user-friendly technique to implement.
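To make the Gaussian assumption concrete, the sketch below fits a mean and variance per feature and per class, then scores a new sample with the normal log density. The function names, the small variance floor `var_floor`, and the weather-style data are all hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, pvariance


def train_gaussian_nb(rows, labels, var_floor=1e-9):
    """Fit a per-class mean and variance for every continuous feature."""
    model = {}
    for c in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        columns = list(zip(*class_rows))
        # var_floor guards against zero variance (an assumption of this sketch)
        stats = [(mean(col), pvariance(col) + var_floor) for col in columns]
        model[c] = (math.log(len(class_rows) / len(rows)), stats)
    return model


def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            log_gaussian(x, mu, var) for x, (mu, var) in zip(row, stats)
        )
    return max(model, key=score)


# Hypothetical data: [temperature in degrees Celsius, humidity in percent]
gnb_rows = [
    [30.0, 85.0], [32.0, 90.0], [28.0, 80.0],
    [20.0, 60.0], [22.0, 65.0], [18.0, 55.0],
]
gnb_labels = ["no", "no", "no", "yes", "yes", "yes"]  # play golf?

gnb_model = train_gaussian_nb(gnb_rows, gnb_labels)
print(predict_gaussian_nb(gnb_model, [21.0, 62.0]))
```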

Naive Bayes algorithms are most commonly used for text classification. The variants differ in the kind of features they expect, but each is simple and efficient. Although every variant needs training data to estimate its parameters, Naive Bayes algorithms can deliver results more quickly than more sophisticated methods, which makes them valuable in real-world situations.