Scaling puts all features on a comparable range. Random Forest is tree-based and not especially sensitive to feature scale, but scaling keeps the preprocessing consistent with other classifiers and does no harm.
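As a minimal sketch of this step (the tiny `x_train`/`x_test` arrays here are hypothetical stand-ins for the tutorial's split data), scaling with `StandardScaler` looks like:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the tutorial's training/test features
x_train = np.array([[25.0, 20000.0], [35.0, 60000.0], [45.0, 100000.0]])
x_test = np.array([[30.0, 40000.0]])

sc = StandardScaler()
x_train = sc.fit_transform(x_train)  # fit statistics on training data only
x_test = sc.transform(x_test)        # reuse the training mean/std on test data

# After scaling, each training column has zero mean and unit variance
print(np.allclose(x_train.mean(axis=0), 0.0))  # True
```

Fitting the scaler on the training set only (and merely transforming the test set) avoids leaking test-set statistics into training.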
5. Training the Random Forest Model
```python
from sklearn.ensemble import RandomForestClassifier

# Random Forest classifier with 10 trees and entropy as the splitting criterion
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
```
- `n_estimators=10`: number of decision trees (reduced from the default of 100 because the dataset is small).
- `criterion='entropy'`: measures split quality by information gain (entropy reduction).
- `fit()`: trains the model on the training data.
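To make this step runnable on its own, here is a self-contained sketch that trains the same classifier on synthetic data (`make_classification` stands in for the tutorial's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the tutorial's training data
x_train, y_train = make_classification(n_samples=100, n_features=4, random_state=0)

classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)

# The fitted forest holds exactly n_estimators individual decision trees
print(len(classifier.estimators_))  # 10
```

Inspecting `classifier.estimators_` confirms that the forest really is a collection of independent decision trees, one per `n_estimators`.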
6. Predicting the Test Set Results
```python
import numpy as np  # needed for np.concatenate below

y_pred = classifier.predict(x_test)

# Compare predictions (left column) with actual test labels (right column)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1))
```
- `predict()`: makes predictions for the test data.

Error explanation:
- If `y_pred` (predicted values) and `y_test` (actual values) have different lengths, `np.concatenate()` raises a shape-mismatch error.
- In that case, make sure you predicted on `x_test` and not mistakenly on `x_train`.
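The comparison trick can be illustrated with small hand-made arrays (hypothetical values, not the tutorial's actual predictions), showing why matching lengths matter:

```python
import numpy as np

# Hypothetical predictions and actual labels of equal length
y_pred = np.array([0, 1, 1, 0])
y_test = np.array([0, 1, 0, 0])

# Reshape both to column vectors, then place them side by side
comparison = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1
)
print(comparison.shape)  # (4, 2): one row per sample, prediction vs. label
```

If the arrays had different lengths, the `axis=1` concatenation would fail, because the rows could not be aligned.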
7. Evaluating Model Performance
```python
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)  # rows: actual classes, columns: predicted
print("Confusion Matrix:\n", cm)

accuracy = accuracy_score(y_test, y_pred)  # fraction of correct predictions
print("Accuracy Score:", accuracy)
```
- Confusion matrix: summarizes performance by counting true positives, true negatives, false positives, and false negatives.
- Accuracy: measures how often the model makes correct predictions.
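To make the connection between the two metrics concrete, here is a small worked example (with hypothetical labels) that recovers accuracy directly from the confusion-matrix cells:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical actual and predicted labels for a binary problem
y_test = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_test, y_pred)
# For binary labels, ravel() unpacks the four cells in this order:
tn, fp, fn, tp = cm.ravel()

accuracy = accuracy_score(y_test, y_pred)
# Accuracy is just (correct predictions) / (all predictions)
print(abs(accuracy - (tn + tp) / len(y_test)) < 1e-12)  # True
```

So accuracy is the diagonal of the confusion matrix divided by the total count; the off-diagonal cells are the two kinds of mistakes.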
Summary
Random Forest builds multiple decision trees on random subsets of the data and features, then combines their votes into a single prediction.

Key parameters:
- `n_estimators`: number of trees in the forest.
- `criterion`: splitting method (e.g., `'gini'` or `'entropy'`).

Steps:
1. Train the model on the training set.
2. Predict the test set results.
3. Evaluate using metrics such as the confusion matrix and accuracy.
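The steps above can be sketched end-to-end in one self-contained script; a synthetic dataset from `make_classification` stands in for the tutorial's data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic stand-in for the tutorial's dataset
x, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 1. Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# 2. Scale features (fit on training data only)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

# 3. Train the forest
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)

# 4. Predict and 5. evaluate
y_pred = classifier.predict(x_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
```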