Scaling puts all features on a comparable range. Random Forest is tree-based and not especially sensitive to feature scale, but scaling keeps the preprocessing consistent with other classifiers and does no harm.
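As a minimal sketch of this step (the tiny `x_train`/`x_test` arrays here are hypothetical stand-ins for the tutorial's split data), scaling with `StandardScaler` looks like:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the tutorial's training/test features
x_train = np.array([[25.0, 20000.0], [35.0, 60000.0], [45.0, 100000.0]])
x_test = np.array([[30.0, 40000.0]])

sc = StandardScaler()
x_train = sc.fit_transform(x_train)  # fit statistics on training data only
x_test = sc.transform(x_test)        # reuse the training mean/std on test data

# After scaling, each training column has zero mean and unit variance
print(np.allclose(x_train.mean(axis=0), 0.0))  # True
```

Fitting the scaler on the training set only (and merely transforming the test set) avoids leaking test-set statistics into training.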
5. Training the Random Forest Model
```python
from sklearn.ensemble import RandomForestClassifier

# Random Forest classifier with 10 trees and entropy as the splitting criterion
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
```
- `n_estimators=10`: number of decision trees (reduced from the default of 100 because the dataset is small).
- `criterion='entropy'`: measures split quality by information gain (entropy reduction).
- `fit()`: trains the model on the training data.
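To make this step runnable on its own, here is a self-contained sketch that trains the same classifier on synthetic data (`make_classification` stands in for the tutorial's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the tutorial's training data
x_train, y_train = make_classification(n_samples=100, n_features=4, random_state=0)

classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)

# The fitted forest holds exactly n_estimators individual decision trees
print(len(classifier.estimators_))  # 10
```

Inspecting `classifier.estimators_` confirms that the forest really is a collection of independent decision trees, one per `n_estimators`.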
6. Predicting the Test Set Results
```python
import numpy as np  # needed for np.concatenate below

y_pred = classifier.predict(x_test)

# Compare predictions (left column) with actual test labels (right column)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1))
```
- `predict()`: makes predictions for the test data.

Error explanation:
- If `y_pred` (predicted values) and `y_test` (actual values) have different lengths, `np.concatenate()` raises a shape-mismatch error.
- In that case, make sure you predicted on `x_test` and not mistakenly on `x_train`.
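The comparison trick can be illustrated with small hand-made arrays (hypothetical values, not the tutorial's actual predictions), showing why matching lengths matter:

```python
import numpy as np

# Hypothetical predictions and actual labels of equal length
y_pred = np.array([0, 1, 1, 0])
y_test = np.array([0, 1, 0, 0])

# Reshape both to column vectors, then place them side by side
comparison = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1
)
print(comparison.shape)  # (4, 2): one row per sample, prediction vs. label
```

If the arrays had different lengths, the `axis=1` concatenation would fail, because the rows could not be aligned.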
7. Evaluating Model Performance
```python
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)  # rows: actual classes, columns: predicted
print("Confusion Matrix:\n", cm)

accuracy = accuracy_score(y_test, y_pred)  # fraction of correct predictions
print("Accuracy Score:", accuracy)
```
- Confusion matrix: summarizes performance by counting true positives, true negatives, false positives, and false negatives.
- Accuracy: measures how often the model makes correct predictions.
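To make the connection between the two metrics concrete, here is a small worked example (with hypothetical labels) that recovers accuracy directly from the confusion-matrix cells:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical actual and predicted labels for a binary problem
y_test = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_test, y_pred)
# For binary labels, ravel() unpacks the four cells in this order:
tn, fp, fn, tp = cm.ravel()

accuracy = accuracy_score(y_test, y_pred)
# Accuracy is just (correct predictions) / (all predictions)
print(abs(accuracy - (tn + tp) / len(y_test)) < 1e-12)  # True
```

So accuracy is the diagonal of the confusion matrix divided by the total count; the off-diagonal cells are the two kinds of mistakes.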
Summary
Random Forest builds multiple decision trees on random subsets of the data and features, then combines their votes into a single prediction.

Key parameters:
- `n_estimators`: number of trees in the forest.
- `criterion`: splitting method (e.g., `'gini'` or `'entropy'`).

Steps:
1. Train the model on the training set.
2. Predict the test set results.
3. Evaluate using metrics such as the confusion matrix and accuracy.
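The steps above can be sketched end-to-end in one self-contained script; a synthetic dataset from `make_classification` stands in for the tutorial's data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic stand-in for the tutorial's dataset
x, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# 2. Scale features (fit on training data only)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# 3. Train the forest
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)

# 4. Predict and 5. evaluate
y_pred = classifier.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
```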