How to Build and Evaluate an ANN Model for Customer Churn Prediction Using Python
Share
Condition for Building and Evaluating an ANN Model for Customer Churn Prediction Using Python
Description:
This code demonstrates building and evaluating an Artificial Neural Network
(ANN) model to predict customer churn using the Churn_Modelling dataset.
It includes data preprocessing steps like encoding categorical variables,
scaling features, addressing class imbalance, and splitting data into training
and testing sets. The model is trained with binary cross-entropy loss, and performance
metrics such as accuracy, precision, recall, and a confusion matrix are calculated
and visualized.
Step-by-Step Process
Import Libraries:
Import the necessary libraries for data manipulation, visualization, and model building.
Examine the Dataset:
Check for missing or NaN values in the dataset.
Preprocess the Data:
Convert categorical features into numeric using LabelEncoder, scale the data using StandardScaler.
Visualize the Data:
Visualize class distribution and any class imbalance using Seaborn.
Build and Train the ANN Model:
Create a neural network with hidden layers and ReLU activation, using sigmoid for output.
Evaluate the Model:
Calculate accuracy, precision, recall, F1-score, and plot the confusion matrix.
Sample Source Code
# Code for ANN Model
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score,
f1_score, recall_score, precision_score)
df = pd.read_csv("/path/to/Churn_Modelling.csv")
print("Columns")
print(df.columns)
# Check for missing values
print("Checking Missing Values")
print(df.isnull().sum())
# Convert categorical features into numeric
label = LabelEncoder()
for i in df.columns:
if df[i].dtypes == object:
df[i] = label.fit_transform(df[i])
# Split dataset
x = df.drop('Exited', axis=1)
y = df['Exited']
# Normalize data
scaler = StandardScaler()
x = scaler.fit_transform(x)
# Plot class distribution
sns.countplot(x=y)
# Split train and test data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)