# Notes on Bayes' Theorem and Naive Bayes Classifier
## 1. Bayes' Theorem (Probability Theorem)
Bayes' theorem is a mathematical tool used to calculate conditional probabilities, enabling us to update beliefs based on new evidence.
### Formula:
\[
P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)}
\]
#### Definitions:
- **P(A | B)**: Posterior probability - probability of event A given event B.
- **P(B | A)**: Likelihood - probability of event B given event A.
- **P(A)**: Prior probability - initial probability of event A.
- **P(B)**: Marginal likelihood - overall probability of event B.
#### Why the Complexity?
Bayes' theorem lets us compute a conditional probability that is hard to measure directly, P(A | B), from quantities that are usually easier to estimate: P(B | A), P(A), and P(B). Reversing the direction of conditioning is what makes it useful when direct computation isn't feasible.
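As a quick numeric sketch in Python (the scenario and all numbers are invented for illustration): let A be "machine is defective" and B be "sensor raises an alarm".

```python
# Invented example values, purely illustrative
p_a = 0.01              # prior P(A): 1% of machines are defective
p_b_given_a = 0.90      # likelihood P(B | A): alarm fires for 90% of defective machines
p_b_given_not_a = 0.05  # alarm also fires for 5% of healthy machines

# Marginal likelihood P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A | B) from Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ~0.154: most alarms are still false positives
```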
---
## 2. Naive Bayes Classifier Intuition
The Naive Bayes classifier applies Bayes' theorem to classify data points into categories by calculating probabilities for each class and selecting the most probable one.
### Steps:
Consider a dataset of people, each described by features such as age and salary and labeled by whether they walk or drive.
1. Apply Bayes' theorem to calculate the probability that a person walks, given their features.
2. Apply Bayes' theorem to calculate the probability that a person drives, given their features.
### Formula for Posterior Probability:
\[
P(\text{Class} | \text{Features}) = \frac{P(\text{Features} | \text{Class}) \cdot P(\text{Class})}{P(\text{Features})}
\]
#### Components:
- **Prior Probability (P(Class))**: The fraction of all observations that belong to the class.
- **Likelihood (P(Features | Class))**: Probability of observing the features given the class.
- **Marginal Likelihood (P(Features))**: Probability of observing the features across all classes.
#### Example - Likelihood Calculation:
When calculating the likelihood for the walking class, count the walkers whose features are similar to the new observation:
\[
\text{Likelihood} = \frac{\text{Number of walkers with similar features}}{\text{Total number of walkers}}
\]
### Step 3: Compare Probabilities
Compare the computed probabilities for each class (e.g., walking vs. driving) and predict the class with the highest probability.
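A minimal numeric sketch of these steps in Python, using invented counts (what qualifies as a "similar" observation is an assumption here, e.g. points within some radius in feature space):

```python
# Invented toy counts, not taken from any real dataset:
# 30 people total: 10 walk, 20 drive.
# Among observations similar to the new data point: 3 walkers, 1 driver.
total, walkers, drivers = 30, 10, 20
similar_walkers, similar_drivers = 3, 1

# Marginal likelihood P(X): similar observations across all classes
p_x = (similar_walkers + similar_drivers) / total

# Posterior for each class: P(Class | X) = P(X | Class) * P(Class) / P(X)
p_walk = (similar_walkers / walkers) * (walkers / total) / p_x
p_drive = (similar_drivers / drivers) * (drivers / total) / p_x

print(p_walk, p_drive)  # ~0.75 vs ~0.25 -> predict "walks"
```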
---
## 3. Why is it Called "Naive"?
The "naive" assumption is that all features are **independent** of each other. For example:
- Bayes' theorem assumes that variables like age and salary are independent.
- In reality, features are often correlated, but the assumption simplifies computations while still yielding good performance.
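Concretely, if \(x_1, \ldots, x_n\) denote the individual feature values, conditional independence lets the joint likelihood factor into one term per feature:
\[
P(x_1, \ldots, x_n | \text{Class}) = \prod_{i=1}^{n} P(x_i | \text{Class})
\]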
---
## 4. Handling More than Two Classes
For multi-class classification, the Naive Bayes classifier computes probabilities for each class individually and selects the class with the highest posterior probability.
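Because the marginal likelihood P(Features) is the same for every class, it can be dropped when only the most probable class is needed; the prediction reduces to:
\[
\hat{y} = \arg\max_{c} \; P(c) \cdot P(\text{Features} | c)
\]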
---
## 5. Gaussian Naive Bayes
Gaussian Naive Bayes assumes that the features follow a normal (Gaussian) distribution, making it suitable for continuous data.
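For each class \(c\) and feature \(i\), the likelihood of a value \(x_i\) is modeled with a normal density whose mean \(\mu_{c,i}\) and variance \(\sigma_{c,i}^2\) are estimated from the training examples of class \(c\):
\[
P(x_i | c) = \frac{1}{\sqrt{2\pi\sigma_{c,i}^2}} \exp\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)
\]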
---
## 6. Python Implementation
Below is an example implementation of Gaussian Naive Bayes:
### Code:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Importing the dataset: all columns but the last are features, the last is the label
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(x)
print(y)

# Splitting the dataset into training and test sets (75% / 25%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=1)

# Feature scaling: fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Training the Gaussian Naive Bayes model on the training set
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# Predicting the test set results and printing predictions next to the true labels
y_pred = classifier.predict(x_test)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1))
```
### Key Steps in Code:
1. **Import Libraries**: Load necessary Python libraries.
2. **Dataset Preparation**: Load and split the dataset into training and test sets.
3. **Feature Scaling**: Standardize features for better performance.
4. **Model Training**: Train the Gaussian Naive Bayes model on the training set.
5. **Prediction**: Use the trained model to predict outcomes for the test set.
6. **Comparison**: Compare predicted results with actual outcomes to evaluate performance (see the evaluation sketch below).
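A minimal sketch of step 6 using scikit-learn's metrics module (this evaluation code is an addition, not part of the original notes):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))

# Fraction of test-set predictions that match the true labels
print(accuracy_score(y_test, y_pred))
```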
---
## Summary
Bayes' theorem and the Naive Bayes classifier offer powerful tools for probabilistic reasoning and classification tasks. While the "naive" assumption of independence may not always hold, this approach remains effective for many practical applications, especially with Gaussian distributions for continuous features.