How to Predict Breast Cancer Using a Multi-Layer Perceptron with Scikit-Learn in Python?
Condition for Predicting Breast Cancer Using a Multi-Layer Perceptron with Scikit-Learn in Python
Description: The code below trains a machine learning model on the Breast Cancer dataset to predict the diagnosis (malignant or benign). It preprocesses the data by checking for null values, scaling the features, and encoding the target variable. The model is a Multi-Layer Perceptron (MLP) classifier, a feed-forward neural network, trained to classify each sample's diagnosis.
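To make the target encoding concrete, here is a small sketch of how LabelEncoder turns text labels into integers. The labels "M" and "B" are an assumption about how the CSV stores the diagnosis; the raw values are not shown in this article. LabelEncoder assigns codes in sorted label order, which matters later because f1_score, recall_score and precision_score treat label 1 as the positive class by default.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw labels; the real CSV may use different strings.
raw_labels = ["M", "B", "B", "M", "B"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(raw_labels)

# classes_ lists the original labels in the order of their integer codes.
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}
print(mapping)           # {'B': 0, 'M': 1}
print(encoded.tolist())  # [1, 0, 0, 1, 0]
```

Under that assumption, the reported recall and precision describe the malignant class.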
Step-by-Step Process
Step 1: The Breast Cancer dataset is loaded using pd.read_csv().
Step 2: The code checks for missing values in each column using df.isnull().sum().
Step 3: It calculates the pairwise correlation between the features using df.corr() and visualizes it as a heatmap with seaborn.
Step 4: The distribution of the target variable (diagnosis) is visualized as a bar plot to check for class imbalance.
Step 5: The target variable (diagnosis) is converted from categorical to numeric values using LabelEncoder().
Step 6: The features are standardized using StandardScaler() (zero mean, unit variance), which helps the neural network train reliably.
Step 7: The dataset is split into training and testing sets using train_test_split(), with 80% training data and 20% testing data.
Step 8: A Multi-Layer Perceptron (MLP) classifier is defined with three hidden layers of 128, 64, and 32 units using MLPClassifier() from sklearn.
Step 9: The model is trained on the training data with mlp.fit().
Step 10: The model is evaluated on the test data, and several performance metrics (accuracy, F1 score, recall, precision) are computed. A confusion matrix is visualized using seaborn.heatmap() to assess the model's predictions.
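The scaling, splitting, model definition and training steps can also be arranged so that the scaler only ever sees training data. The sketch below is one possible arrangement, not the article's own code. It assumes scikit-learn's bundled Wisconsin breast cancer dataset as a stand-in for the CSV file, and max_iter=300 and stratify=y are illustrative choices. Note that the bundled dataset encodes malignant as 0 and benign as 1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset stands in for the CSV file.
X, y = load_breast_cancer(return_X_y=True)

# Split first so the test set stays unseen during preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data only and reuses
# those statistics to transform the test data at prediction time.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=300, random_state=42),
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping both steps in a pipeline keeps test-set statistics out of the preprocessing, so the test score is a fairer estimate of performance on new data.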
Sample Code
# Import necessary libraries
import warnings

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score,
                             f1_score, recall_score, precision_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Suppress library warnings to keep the output readable
warnings.filterwarnings("ignore")

# Load the dataset (adjust the path to where the CSV is stored)
df = pd.read_csv("/home/soft12/Downloads/sample_dataset/Website/Dataset/breast-cancer.csv")
# Check for null values in each column
print(df.isnull().sum())
# Compute the correlation matrix over the numeric columns
# (the 'diagnosis' column is still text at this point)
correlation_matrix = df.corr(numeric_only=True)
# Display the correlation matrix
print(correlation_matrix)
# Plot the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()
# Separate the features from the target column
x = df.drop('diagnosis', axis=1)
y = df['diagnosis']
# Count the number of samples per class
class_counts = y.value_counts()
# Plot the class distribution
plt.figure(figsize=(8, 6))
sns.barplot(x=class_counts.index, y=class_counts.values, palette="viridis")
plt.title('Class Balance Check', fontsize=16)
plt.xlabel('Class', fontsize=14)
plt.ylabel('Count', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
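# Encode the target labels as integers
label = LabelEncoder()
y = label.fit_transform(y)
# Scale the input features to zero mean and unit variance
# Note: fitting the scaler on the full dataset lets test-set statistics
# influence preprocessing; fit it on the training split only for a stricter evaluation
scaler = StandardScaler()
x = scaler.fit_transform(x)
# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)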
# Define the MLP classifier with three hidden layers
mlp = MLPClassifier(hidden_layer_sizes=(128, 64, 32),
                    activation='relu',
                    solver='adam',
                    max_iter=10,  # at most 10 epochs; training may stop before converging
                    batch_size=2,
                    random_state=42,
                    verbose=True)
# Train the model
mlp.fit(X_train, y_train)
# Make predictions
y_pred = mlp.predict(X_test)
print("___Performance_Metrics___\n")
print('Classification_Report:\n', classification_report(y_test, y_pred))
print('Confusion_Matrix:\n', confusion_matrix(y_test, y_pred))
print('Accuracy_Score: ', accuracy_score(y_test, y_pred))
print('F1_Score: ', f1_score(y_test, y_pred))
print('Recall_Score: ', recall_score(y_test, y_pred))
print('Precision_Score: ', precision_score(y_test, y_pred))
# Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Plot the confusion matrix as a seaborn heatmap
plt.figure(figsize=(6, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['0', '1'], yticklabels=['0', '1'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
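Because the sample code sets max_iter=10 and silences all warnings, it does not show whether training actually converged. The sketch below is one way to check. It assumes scikit-learn's bundled breast cancer dataset in place of the CSV file and fits on the full scaled data purely to inspect training behaviour, not to evaluate the model. It reads the fitted model's n_iter_ and loss_curve_ attributes and records any ConvergenceWarning instead of hiding it.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset stands in for the CSV file.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

mlp = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=10,
                    batch_size=2, random_state=42)

# Record warnings raised during fitting instead of suppressing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mlp.fit(X, y)

hit_iteration_cap = any(issubclass(w.category, ConvergenceWarning) for w in caught)

print("Epochs run:", mlp.n_iter_)
print("Loss per epoch:", [round(float(loss), 4) for loss in mlp.loss_curve_])
print("Stopped at max_iter without converging:", hit_iteration_cap)
```

If the loss is still falling in the last epochs, raising max_iter or setting early_stopping=True on MLPClassifier are two options worth trying.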