How to Build and Evaluate an LSTM Model for Predicting Adult Income from a Dataset


  • Description:
    The code preprocesses the Adult Income dataset by encoding categorical variables, normalizing features, and splitting the data into training and testing sets. It builds a Long Short-Term Memory (LSTM) model for binary classification to predict income categories (<=50k or >50k). The model's performance is evaluated using metrics such as accuracy, precision, recall, F1 score, and a confusion matrix.
Step-by-Step Process
  • Import Libraries:
    Import essential libraries such as pandas, scikit-learn, matplotlib, seaborn, and TensorFlow for data processing, model building, and evaluation.
  • Load and Inspect Data:
    Load the Adult Income dataset, check for missing or null values, and confirm the data types.
  • Preprocess Data:
    Encode categorical columns, compute a correlation matrix, and check the distribution of the target variable (an optional class-weight sketch for the imbalanced target follows this list).
  • Scale Data:
    Normalize the feature data to ensure better convergence during model training.
  • Build and Train LSTM Model:
    Create an LSTM model with two LSTM layers and one dense output layer for binary classification. Train the model with training data.
  • Evaluate and Visualize:
    Evaluate the model's performance using accuracy, precision, recall, F1 score, and plot a confusion matrix.
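  • Optional: Handle Class Imbalance:
    The two income classes are not evenly represented, so per-class weights can be supplied during training. The snippet below is a minimal sketch, assuming scikit-learn's compute_class_weight utility; the variable names (y_train, X_train_lstm, X_test_lstm, model) follow the sample source code in the next section.
    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # Derive 'balanced' per-class weights from the training labels
    classes = np.unique(y_train)
    weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
    class_weight = {int(c): w for c, w in zip(classes, weights)}

    # Pass the weights to fit so the minority (>50k) class contributes more to the loss
    model.fit(X_train_lstm, y_train, batch_size=2, epochs=10,
              validation_data=(X_test_lstm, y_test), class_weight=class_weight)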
Sample Source Code
  • # Import Necessary Libraries
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.layers import Dense, Input, LSTM
    from tensorflow.keras.models import Model
    from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score,
    f1_score, recall_score, precision_score)

    import warnings
    warnings.filterwarnings("ignore")

    df = pd.read_csv("/home/soft12/Downloads/sample_dataset/Website/Dataset/adult.csv")

    # Check NaN values
    print("Check NaN values\n")
    print(df.isna().sum())

    # Check Null Values
    print("Check Null Values\n")
    print(df.isnull().sum())

    # Check dtypes of features
    print(df.dtypes)

    # Convert object dtypes to numeric
    label = LabelEncoder()

    for i in df.columns:
        if df[i].dtypes == 'object':
            df[i] = label.fit_transform(df[i])

    # Compute the correlation matrix
    correlation_matrix = df.corr()

    # Display the correlation matrix
    print(correlation_matrix)

    # Plot the heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
    plt.title('Correlation Heatmap')
    plt.show()

    x = df.drop('income',axis=1)
    y = df['income']

    # Count the number of samples per class
    class_counts = y.value_counts()

    # Plot the class distribution
    plt.figure(figsize=(8, 6))
    sns.barplot(x=class_counts.index, y=class_counts.values, palette="viridis")
    plt.title('Class Balance Check', fontsize=16)
    plt.xlabel('Class', fontsize=14)
    plt.ylabel('Count', fontsize=14)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

    # Scaling the input data
    scaler = StandardScaler()
    x = scaler.fit_transform(x)

    # Split the train_test_data
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=42)

    def LSTM_model(input_shape):
        # Input layer
        inputs = Input(shape=(input_shape[1], input_shape[2]))

        # LSTM layers
        lstm_layer1 = LSTM(64, return_sequences=True)(inputs)
        lstm_layer2 = LSTM(32, return_sequences=False)(lstm_layer1)

        # Output layer
        output_layer = Dense(1, activation='sigmoid')(lstm_layer2)

        # Build the model
        lstm_model = Model(inputs=inputs, outputs=output_layer)

        # Compile the model with Adam optimizer and binary crossentropy loss function
        lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

        return lstm_model

    # Reshape input data to 3D (samples, timesteps, features)
    X_train_lstm = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
    X_test_lstm = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))

    # Instantiate and train the LSTM model
    model = LSTM_model(X_train_lstm.shape)
    model.summary()

    model.fit(X_train_lstm, y_train, batch_size=2, epochs=10, validation_data=(X_test_lstm, y_test))
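    # Note: model.fit returns a History object; assigning it (e.g. history = model.fit(...))
    # would make the per-epoch training/validation accuracy and loss available for plotting.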

    y_pred = model.predict(X_test_lstm)
    # Convert sigmoid probabilities to binary class labels using a 0.5 threshold
    y_pred = (y_pred > 0.5).astype(int).ravel()

    # Calculate confusion matrix
    cm = confusion_matrix(y_test, y_pred)

    class_labels = ['<=50k', '>50k']

    # Plot the heatmap with correct labels
    plt.figure(figsize=(6, 5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_labels, yticklabels=class_labels)
    plt.title('Confusion Matrix Heatmap')
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.show()

    print("___Performance_Metrics___\n")
    print('Classification_Report:\n', classification_report(y_test, y_pred))
    print('Confusion_Matrix:\n', confusion_matrix(y_test, y_pred))
    print('Accuracy_Score: ', accuracy_score(y_test, y_pred))
    print('F1_Score: ', f1_score(y_test, y_pred))
    print('Recall_Score: ', recall_score(y_test, y_pred))
    print('Precision_Score: ', precision_score(y_test, y_pred))
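
    # Optional extension (not part of the original sample): the raw sigmoid
    # probabilities can also be scored with ROC-AUC before thresholding.
    # The y_prob name below is introduced here purely for illustration.
    from sklearn.metrics import roc_auc_score
    y_prob = model.predict(X_test_lstm).ravel()
    print('ROC_AUC_Score: ', roc_auc_score(y_test, y_prob))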
Screenshots
  • LSTM Model Output Screenshot