# Notes on Bayes' Theorem and Naive Bayes Classifier
## 1. Bayes' Theorem (Probability Theorem)
Bayes' theorem is a mathematical tool used to calculate conditional probabilities, enabling us to update beliefs based on new evidence.
### Formula:
\[
P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)}
\]
#### Definitions:
- **P(A | B)**: Posterior probability - probability of event A given event B.
- **P(B | A)**: Likelihood - probability of event B given event A.
- **P(A)**: Prior probability - initial probability of event A.
- **P(B)**: Marginal likelihood - overall probability of event B.
#### Why the Complexity?
Bayes' theorem lets us compute a conditional probability that is hard to measure directly, P(A | B), from quantities that are usually easier to estimate: P(B | A), P(A), and P(B). Reversing the direction of conditioning is what makes it useful when direct computation isn't feasible.
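As a quick numeric sketch in Python (the scenario and all numbers are invented for illustration): let A be "machine is defective" and B be "sensor raises an alarm".

```python
# Invented example values, purely illustrative
p_a = 0.01              # prior P(A): 1% of machines are defective
p_b_given_a = 0.90      # likelihood P(B | A): alarm fires for 90% of defective machines
p_b_given_not_a = 0.05  # alarm also fires for 5% of healthy machines

# Marginal likelihood P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A | B) from Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ~0.154: most alarms are still false positives
```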
---
## 2. Naive Bayes Classifier Intuition
The Naive Bayes classifier applies Bayes' theorem to classify data points into categories by calculating probabilities for each class and selecting the most probable one.
### Steps:
Consider a dataset of people, each described by features such as age and salary and labeled by whether they walk or drive.
1. Apply Bayes' theorem to calculate the probability that a person walks, given their features.
2. Apply Bayes' theorem to calculate the probability that a person drives, given their features.
### Formula for Posterior Probability:
\[
P(\text{Class} | \text{Features}) = \frac{P(\text{Features} | \text{Class}) \cdot P(\text{Class})}{P(\text{Features})}
\]
#### Components:
- **Prior Probability (P(Class))**: The fraction of all observations that belong to the class.
- **Likelihood (P(Features | Class))**: Probability of observing the features given the class.
- **Marginal Likelihood (P(Features))**: Probability of observing the features across all classes.
#### Example - Likelihood Calculation:
When calculating the likelihood for the walking class, count the walkers whose features are similar to the new observation:
\[
\text{Likelihood} = \frac{\text{Number of walkers with similar features}}{\text{Total number of walkers}}
\]
### Step 3: Compare Probabilities
Compare the computed probabilities for each class (e.g., walking vs. driving) and predict the class with the highest probability.
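A minimal numeric sketch of these steps in Python, using invented counts (what qualifies as a "similar" observation is an assumption here, e.g. points within some radius in feature space):

```python
# Invented toy counts, not taken from any real dataset:
# 30 people total: 10 walk, 20 drive.
# Among observations similar to the new data point: 3 walkers, 1 driver.
total, walkers, drivers = 30, 10, 20
similar_walkers, similar_drivers = 3, 1

# Marginal likelihood P(X): similar observations across all classes
p_x = (similar_walkers + similar_drivers) / total

# Posterior for each class: P(Class | X) = P(X | Class) * P(Class) / P(X)
p_walk = (similar_walkers / walkers) * (walkers / total) / p_x
p_drive = (similar_drivers / drivers) * (drivers / total) / p_x

print(p_walk, p_drive)  # ~0.75 vs ~0.25 -> predict "walks"
```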
---
## 3. Why is it Called "Naive"?
The "naive" assumption is that all features are **independent** of each other. For example:
- Bayes' theorem assumes that variables like age and salary are independent.
- In reality, features are often correlated, but the assumption simplifies computations while still yielding good performance.
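Concretely, if \(x_1, \ldots, x_n\) denote the individual feature values, conditional independence lets the joint likelihood factor into one term per feature:
\[
P(x_1, \ldots, x_n | \text{Class}) = \prod_{i=1}^{n} P(x_i | \text{Class})
\]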
---
## 4. Handling More than Two Classes
For multi-class classification, the Naive Bayes classifier computes probabilities for each class individually and selects the class with the highest posterior probability.
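Because the marginal likelihood P(Features) is the same for every class, it can be dropped when only the most probable class is needed; the prediction reduces to:
\[
\hat{y} = \arg\max_{c} \; P(c) \cdot P(\text{Features} | c)
\]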
---
## 5. Gaussian Naive Bayes
Gaussian Naive Bayes assumes that the features follow a normal (Gaussian) distribution, making it suitable for continuous data.
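For each class \(c\) and feature \(i\), the likelihood of a value \(x_i\) is modeled with a normal density whose mean \(\mu_{c,i}\) and variance \(\sigma_{c,i}^2\) are estimated from the training examples of class \(c\):
\[
P(x_i | c) = \frac{1}{\sqrt{2\pi\sigma_{c,i}^2}} \exp\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)
\]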
---
## 6. Python Implementation
Below is an example implementation of Gaussian Naive Bayes:
### Code:
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Importing the dataset: all columns but the last are features, the last is the label
dataset = pd.read_csv('Social_Network_Ads.csv')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(x)
print(y)

# Splitting the dataset into training and test sets (75% / 25%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=1)

# Feature scaling: fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# Training the Gaussian Naive Bayes model on the training set
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# Predicting the test set results and printing predictions next to the true labels
y_pred = classifier.predict(x_test)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1))
```
### Key Steps in Code:
1. **Import Libraries**: Load necessary Python libraries.
2. **Dataset Preparation**: Load and split the dataset into training and test sets.
3. **Feature Scaling**: Standardize features for better performance.
4. **Model Training**: Train the Gaussian Naive Bayes model on the training set.
5. **Prediction**: Use the trained model to predict outcomes for the test set.
6. **Comparison**: Compare predicted results with actual outcomes to evaluate performance (see the evaluation sketch below).
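A minimal sketch of step 6 using scikit-learn's metrics module (this evaluation code is an addition, not part of the original notes):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))

# Fraction of test-set predictions that match the true labels
print(accuracy_score(y_test, y_pred))
```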
---
## Summary
Bayes' theorem and the Naive Bayes classifier offer powerful tools for probabilistic reasoning and classification tasks. While the "naive" assumption of independence may not always hold, this approach remains effective for many practical applications, especially with Gaussian distributions for continuous features.