How to Build and Evaluate an LSTM Model for Predicting Adult Income from a Dataset


  • Description:
    The code preprocesses the Adult Income dataset by encoding categorical variables, normalizing features, and splitting the data into training and testing sets. It builds a Long Short-Term Memory (LSTM) model for binary classification to predict income categories (<=50k or >50k). The model's performance is evaluated using metrics such as accuracy, precision, recall, F1 score, and a confusion matrix.
Step-by-Step Process
  • Import Libraries:
    Import essential libraries such as pandas, scikit-learn, matplotlib, seaborn, and TensorFlow for data processing, model building, and evaluation.
  • Load and Inspect Data:
    Load the Adult Income dataset, check for missing or null values, and confirm the data types.
  • Preprocess Data:
    Encode categorical columns, compute a correlation matrix, and check the distribution of the target variable (an optional class-weight sketch for the imbalanced target follows this list).
  • Scale Data:
    Normalize the feature data to ensure better convergence during model training.
  • Build and Train LSTM Model:
    Create an LSTM model with two LSTM layers and one dense output layer for binary classification. Train the model with training data.
  • Evaluate and Visualize:
    Evaluate the model's performance using accuracy, precision, recall, F1 score, and plot a confusion matrix.
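  • Optional: Handle Class Imbalance:
    The two income classes are not evenly represented, so per-class weights can be supplied during training. The snippet below is a minimal sketch, assuming scikit-learn's compute_class_weight utility; the variable names (y_train, X_train_lstm, X_test_lstm, model) follow the sample source code in the next section.
    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # Derive 'balanced' per-class weights from the training labels
    classes = np.unique(y_train)
    weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
    class_weight = {int(c): w for c, w in zip(classes, weights)}

    # Pass the weights to fit so the minority (>50k) class contributes more to the loss
    model.fit(X_train_lstm, y_train, batch_size=2, epochs=10,
              validation_data=(X_test_lstm, y_test), class_weight=class_weight)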
Sample Source Code
  • # Import Necessary Libraries
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.layers import Dense, Input, LSTM
    from tensorflow.keras.models import Model
    from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score,
    f1_score, recall_score, precision_score)

    import warnings
    warnings.filterwarnings("ignore")

    df = pd.read_csv("/home/soft12/Downloads/sample_dataset/Website/Dataset/adult.csv")

    # Check NaN values
    print("Check NaN values\n")
    print(df.isna().sum())

    # Check Null Values
    print("Check Null Values\n")
    print(df.isnull().sum())

    # Check dtypes of features
    print(df.dtypes)

    # Convert object dtypes to numeric
    label = LabelEncoder()

    for i in df.columns:
        if df[i].dtypes == 'object':
            df[i] = label.fit_transform(df[i])

    # Compute the correlation matrix
    correlation_matrix = df.corr()

    # Display the correlation matrix
    print(correlation_matrix)

    # Plot the heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
    plt.title('Correlation Heatmap')
    plt.show()

    x = df.drop('income',axis=1)
    y = df['income']

    # Count the number of samples per class
    class_counts = y.value_counts()

    # Plot the class distribution
    plt.figure(figsize=(8, 6))
    sns.barplot(x=class_counts.index, y=class_counts.values, palette="viridis")
    plt.title('Class Balance Check', fontsize=16)
    plt.xlabel('Class', fontsize=14)
    plt.ylabel('Count', fontsize=14)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()

    # Scaling the input data
    scaler = StandardScaler()
    x = scaler.fit_transform(x)

    # Split the train_test_data
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=42)

    def LSTM_model(input_shape):
        # Input layer
        inputs = Input(shape=(input_shape[1], input_shape[2]))

        # LSTM layers
        lstm_layer1 = LSTM(64, return_sequences=True)(inputs)
        lstm_layer2 = LSTM(32, return_sequences=False)(lstm_layer1)

        # Output layer
        output_layer = Dense(1, activation='sigmoid')(lstm_layer2)

        # Build the model
        lstm_model = Model(inputs=inputs, outputs=output_layer)

        # Compile the model with Adam optimizer and binary crossentropy loss function
        lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

        return lstm_model

    # Reshape input data to 3D (samples, timesteps, features)
    X_train_lstm = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
    X_test_lstm = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))

    # Instantiate and train the LSTM model
    model = LSTM_model(X_train_lstm.shape)
    model.summary()

    model.fit(X_train_lstm, y_train, batch_size=2, epochs=10, validation_data=(X_test_lstm, y_test))
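    # Note: model.fit returns a History object; assigning it (e.g. history = model.fit(...))
    # would make the per-epoch training/validation accuracy and loss available for plotting.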

    y_pred = model.predict(X_test_lstm)
    # Convert sigmoid probabilities to binary class labels using a 0.5 threshold
    y_pred = (y_pred > 0.5).astype(int).ravel()

    # Calculate confusion matrix
    cm = confusion_matrix(y_test, y_pred)

    class_labels = ['<=50k', '>50k']

    # Plot the heatmap with correct labels
    plt.figure(figsize=(6, 5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_labels, yticklabels=class_labels)
    plt.title('Confusion Matrix Heatmap')
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.show()

    print("___Performance_Metrics___\n")
    print('Classification_Report:\n', classification_report(y_test, y_pred))
    print('Confusion_Matrix:\n', confusion_matrix(y_test, y_pred))
    print('Accuracy_Score: ', accuracy_score(y_test, y_pred))
    print('F1_Score: ', f1_score(y_test, y_pred))
    print('Recall_Score: ', recall_score(y_test, y_pred))
    print('Precision_Score: ', precision_score(y_test, y_pred))
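
    # Optional extension (not part of the original sample): the raw sigmoid
    # probabilities can also be scored with ROC-AUC before thresholding.
    # The y_prob name below is introduced here purely for illustration.
    from sklearn.metrics import roc_auc_score
    y_prob = model.predict(X_test_lstm).ravel()
    print('ROC_AUC_Score: ', roc_auc_score(y_test, y_prob))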
Screenshots
  • LSTM Model Output Screenshot