Random Forest Classifier for Employee Attrition Prediction

    This code implements a Random Forest Classifier to predict employee attrition in an HR dataset. The goal is to assess and understand the factors that contribute to employee turnover.

Importing Necessary Libraries:


  • pandas: Used for data manipulation and analysis.
  • matplotlib.pyplot: For data visualization.
  • train_test_split: Splits the dataset into training and testing sets.
  • LabelEncoder: Encodes categorical labels into numerical values.
  • RandomForestClassifier: Implements the Random Forest classification algorithm.
  • confusion_matrix: Tabulates correct and incorrect predictions for each class, showing where the model makes its errors.
  • accuracy_score: Computes the accuracy of the model.
  • SMOTE: Synthetic Minority Over-sampling Technique for dealing with imbalanced data.
Data Loading:


  • df1: Loads the HR dataset from a CSV file.
Data Preprocessing:



  • df: Creates a copy of the dataset for manipulation.
  • LabelEncoder: Encodes the 'Attrition' column from text labels to numerical values (0 for 'No', 1 for 'Yes').
  • pd.get_dummies(): Converts categorical columns into binary (dummy) variables.
  • Dropping columns: Removes the 'EmployeeCount' and 'EmployeeNumber' columns, as they are not relevant for prediction.
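
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: the tiny DataFrame is a made-up stand-in for the HR CSV, and only the column names 'Attrition', 'EmployeeCount' and 'EmployeeNumber' come from the post ('Age' and 'Department' are assumed for the example).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the loaded HR dataset (df1 in the post).
df1 = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "Department": ["Sales", "Research", "Research", "Sales"],
    "EmployeeCount": [1, 1, 1, 1],
    "EmployeeNumber": [1, 2, 4, 5],
    "Attrition": ["Yes", "No", "Yes", "No"],
})

# Work on a copy so the loaded data stays untouched.
df = df1.copy()

# LabelEncoder sorts the classes alphabetically, so 'No' -> 0 and 'Yes' -> 1.
df["Attrition"] = LabelEncoder().fit_transform(df["Attrition"])

# One-hot encode the remaining text columns, then drop the two unused columns.
df = pd.get_dummies(df)
df = df.drop(columns=["EmployeeCount", "EmployeeNumber"])

print(df.columns.tolist())
```

Encoding the target before calling pd.get_dummies() matters here: once 'Attrition' is numeric, get_dummies leaves it as a single column instead of splitting it into two dummy columns.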
Data Splitting and Balancing:


  • X: Features (independent variables).
  • y: Target variable (attrition).
  • SMOTE: Oversamples the minority class with synthetic examples to balance the dataset.
  • train_test_split: Splits the dataset into training and testing sets.
Model Training:


  • RandomForestClassifier: Initializes a Random Forest classifier with 100 decision trees and fits it to the training data.
Model Evaluation:


  • y_pred: Predicts attrition for the test set.
  • accuracy_score: Calculates the accuracy of the model and prints the result.
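
The training and evaluation steps can be sketched as below. Again, this is only an illustration on synthetic data: n_estimators=100 matches the post, while the dataset, split ratio and random_state are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed HR features and attrition labels.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 100 trees, as in the post; random_state is fixed only for repeatability.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict on the held-out rows and score the predictions.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

print(f"Accuracy: {acc:.2%}")
print(cm)
```

The diagonal of the confusion matrix holds the correct predictions, so accuracy is simply the diagonal sum divided by the total number of test rows.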

Accuracy Score:


  • Our model achieves an accuracy score of 92.71% on the test set.

Summary:
This code performs data preprocessing, model training, and evaluation using a Random Forest Classifier to predict employee attrition. The accuracy score at the end measures the model's performance in predicting attrition based on the given dataset.
