# Thompson Sampling
## Overview
Thompson Sampling is a probabilistic algorithm for decision-making in multi-armed bandit problems. It balances exploration and exploitation by maintaining a probability distribution over the expected reward of each option; for the binary (click / no-click) rewards used here, that distribution is a Beta distribution.
Unlike Upper Confidence Bound (UCB), which is deterministic and requires an update after every round, Thompson Sampling is randomized and copes well with delayed feedback. This makes it practical for applications such as online advertising, where feedback can arrive in batches rather than in real time.
## Algorithm Steps
1. **Initialize**: Maintain two counters for each ad:
   - The number of times the ad received a reward of 1.
   - The number of times the ad received a reward of 0.
2. **Sampling**: At each round, for each ad, draw a random sample from a Beta(number of 1-rewards + 1, number of 0-rewards + 1) distribution; a short standalone sketch of this step follows the list.
3. **Selection**: Show the ad with the highest sampled value.
4. **Update**: Record the reward obtained from the chosen ad and increment the matching counter.
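As a toy illustration of the sampling step (the counts below are hypothetical, not taken from the dataset used later), the snippet compares one posterior draw for two ads. The ad with more observed successes usually wins the round, but not always, which is what keeps the algorithm exploring:
```python
import random

# Assumed counts for two hypothetical ads:
# ad A: 8 rewards of 1, 2 rewards of 0  ->  posterior Beta(9, 3)
# ad B: 3 rewards of 1, 7 rewards of 0  ->  posterior Beta(4, 8)
sample_a = random.betavariate(8 + 1, 2 + 1)
sample_b = random.betavariate(3 + 1, 7 + 1)

# The higher draw wins this round; ad A wins most rounds because its
# posterior concentrates on a higher click-through rate, yet ad B still
# wins occasionally, preserving exploration.
print(f'ad A: {sample_a:.3f}, ad B: {sample_b:.3f}')
```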
## Code Implementation
### Importing Libraries
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import random # For beta distribution sampling
```
### Loading Dataset
```python
dataset = pd.read_csv('Ads_CTR_Optimisation.csv')
```
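Each row of the CSV represents one simulated user and each column one ad, with a 1 where that user would click the ad. If the file is not at hand, a synthetic stand-in can be generated; the click-through rates below are assumed values chosen only so the rest of the script runs end to end:
```python
import numpy as np
import pandas as pd

# Assumed per-ad click-through rates (illustrative, not from the real dataset)
ctrs = [0.05, 0.13, 0.09, 0.16, 0.11, 0.04, 0.20, 0.08, 0.10, 0.07]
rng = np.random.default_rng(42)  # Fixed seed for reproducibility

# 500 users x 10 ads; entry (n, i) is 1 if user n would click ad i
dataset = pd.DataFrame(rng.binomial(1, ctrs, size=(500, 10)))
```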
### Thompson Sampling Implementation
```python
N = 500  # Number of rounds (users to whom an ad is shown)
d = 10   # Number of ads
ads_selected = []  # Ad chosen in each round

# Step 1 (Initialize): counters for observed rewards
numbers_of_reward_1 = [0] * d  # Times ad i received reward 1
numbers_of_reward_0 = [0] * d  # Times ad i received reward 0
total_reward = 0               # Total accumulated reward

for n in range(N):
    ad = 0          # Ad selected this round
    max_random = 0  # Highest Beta sample seen this round
    # Step 2 (Sampling): draw from each ad's Beta posterior
    for i in range(d):
        random_beta = random.betavariate(numbers_of_reward_1[i] + 1,
                                         numbers_of_reward_0[i] + 1)
        # Step 3 (Selection): keep the ad with the highest sample
        if random_beta > max_random:
            max_random = random_beta
            ad = i
    ads_selected.append(ad)         # Record the selected ad
    reward = dataset.values[n, ad]  # Observe the reward for the selected ad
    # Step 4 (Update): record the reward in the counters
    if reward == 1:
        numbers_of_reward_1[ad] += 1
    else:
        numbers_of_reward_0[ad] += 1
    total_reward += reward
```
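A quick sanity check after the loop (reusing the variables defined above):
```python
# With 500 rounds, the best ad should already dominate the selections
print(f'Total reward over {N} rounds: {total_reward}')
print(f'Most frequently selected ad: {max(set(ads_selected), key=ads_selected.count)}')
```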
### Visualizing the Results
```python
plt.hist(ads_selected, bins=np.arange(d+1)-0.5, edgecolor='black')
plt.title('Histogram of Ads Selections')
plt.xlabel('Ads')
plt.ylabel('Number of times each ad was selected')
plt.xticks(range(d))
plt.show()
```
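If the run has converged, the histogram should be dominated by a single bar: the ad with the highest underlying click-through rate collects most of the selections, while the other ads show only the small number of exploratory picks made in the early rounds.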
## Comparison: UCB vs. Thompson Sampling
| Feature | UCB (Upper Confidence Bound) | Thompson Sampling |
|----------------------|-----------------------------|-------------------|
| Nature | Deterministic | Probabilistic |
| Update Requirement | Every round | Can be updated in batches |
| Randomness | None | Uses random sampling |
| Practicality | Works well with immediate updates | Suitable for delayed feedback |
## Conclusion
Thompson Sampling is often more practical than UCB in real-world applications due to its ability to handle delayed feedback and batch updates. This makes it a preferred choice for problems like online advertising, where ads can be updated periodically rather than immediately.
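To make the batch-update point concrete, here is a minimal standalone sketch (reusing N, d, and dataset from above, with the counters re-initialized; the batch size of 50 is an assumed value). The Beta posteriors are frozen while a whole batch of users is served, and the counters are updated only once per batch, mimicking feedback that arrives with a delay:
```python
import random

batch_size = 50  # Assumed batch size for illustration
numbers_of_reward_1 = [0] * d  # Re-initialized counters for this standalone run
numbers_of_reward_0 = [0] * d

for start in range(0, N, batch_size):
    # Freeze the posteriors as they stand at the start of the batch
    frozen_1 = numbers_of_reward_1[:]
    frozen_0 = numbers_of_reward_0[:]
    batch_choices = []
    for n in range(start, min(start + batch_size, N)):
        samples = [random.betavariate(frozen_1[i] + 1, frozen_0[i] + 1) for i in range(d)]
        ad = samples.index(max(samples))
        batch_choices.append((n, ad))
    # Feedback for the whole batch arrives at once; update counters in bulk
    for n, ad in batch_choices:
        if dataset.values[n, ad] == 1:
            numbers_of_reward_1[ad] += 1
        else:
            numbers_of_reward_0[ad] += 1
```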