How to Build and Evaluate a Deep Neural Network for Air Quality Classification Using TensorFlow and Scikit-Learn

Condition for Building and Evaluating a Deep Neural Network Model for Air Quality Classification

  • Description:
    Building and evaluating a deep neural network for air quality classification involves preprocessing the dataset, checking for missing values, and scaling the features. A model with hidden layers is constructed and trained using TensorFlow, then used to predict air quality levels on a held-out test set. Performance is assessed with classification metrics: accuracy, F1 score, and the confusion matrix.
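To make the evaluation metrics concrete, the following is a minimal sketch on made-up labels. The values of `y_true` and `y_pred` are invented for illustration and do not come from the pollution dataset; only the scikit-learn calls match those used later in the sample code.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Made-up labels for four classes (0..3); not taken from the pollution dataset
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

# Accuracy is the fraction of samples on the diagonal of the confusion matrix
accuracy = accuracy_score(y_true, y_pred)

# Macro F1 averages the per-class F1 scores, weighting every class equally
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(cm)
print("accuracy:", accuracy)
print("macro F1:", round(macro_f1, 4))
```

Six of the eight toy samples sit on the diagonal, so the accuracy is 0.75. The macro F1 differs from the accuracy because each class contributes equally regardless of how its errors are distributed.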
Process
  • Necessary libraries:
    pandas, NumPy, scikit-learn, and TensorFlow are imported for data manipulation, model building, and evaluation. The visualization libraries Matplotlib and Seaborn are also imported for plotting.
  • Load the dataset:
    The dataset is loaded into a Pandas DataFrame using pd.read_csv() from the specified file path. This dataset contains pollution-related features and air quality labels.
  • Check for missing values:
    The dataset is checked for missing values using isnull().sum(), which reports the number of nulls in each column. The sample code only performs this check; it does not impute or drop any values.
  • Correlation matrix:
    The correlation matrix of the dataset is computed to understand the relationships between features. A heatmap is plotted to visualize these correlations.
  • Label encoding:
    The categorical 'Air Quality' column is converted into numeric values using LabelEncoder(), which assigns integer codes to the class names in sorted (alphabetical) order. This transformation prepares the target variable for model training.
  • Independent and dependent variables:
    Independent variables (X) are selected by dropping the 'Air Quality' column, while the dependent variable (y) is the 'Air Quality' column itself.
  • Scaling the data:
    The feature data is scaled using StandardScaler() to ensure that all features are on the same scale, improving the model's performance and convergence.
  • Train-test split:
    The dataset is split into training (80%) and testing (20%) sets using train_test_split() with a fixed random_state for reproducibility. This division allows for training the model on one subset and evaluating it on another.
  • Define the Deep Neural Network (DNN):
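    A Deep Neural Network (DNN) model is defined with TensorFlow's Keras API: an input layer, two hidden layers with 32 and 16 units using ReLU activations, and a 4-unit softmax output layer for multi-class classification. It is compiled with the Adam optimizer and sparse categorical cross-entropy loss, which accepts the integer-encoded labels directly.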
  • Model training and evaluation:
    The model is trained on the training data for 50 epochs with a batch size of 2, using the test set as validation data, and predictions are then made on the test set. Performance is evaluated with accuracy, macro-averaged precision, recall, and F1 score, and the confusion matrix, which is also visualized as a heatmap.
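The label-encoding step above can be sketched on a toy column. The list `labels` below is a hypothetical stand-in for the 'Air Quality' column; the point is that `LabelEncoder` numbers the classes in sorted order, so any tick labels for a confusion matrix should follow `classes_` rather than the order in which the names appear in the data.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the 'Air Quality' column
labels = ["Moderate", "Good", "Hazardous", "Poor", "Good"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ holds the class names in the order of their integer codes
class_names = list(encoder.classes_)

# inverse_transform maps integer codes back to the original names
decoded = list(encoder.inverse_transform([2, 0]))

print(class_names)
print(codes.tolist())
print(decoded)
```

Here 'Good' receives code 0 and 'Moderate' receives code 2 even though 'Moderate' appears first in the data.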
Sample Source Code
  • # Import Necessary Libraries
    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Model
    from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score, f1_score, recall_score, precision_score)

    import warnings
    warnings.filterwarnings("ignore")

    # Load the dataset
    df = pd.read_csv("/home/soft12/Downloads/sample_dataset/Website/Dataset/updated_pollution_dataset.csv")

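    # Check null values
    print("Check null values")
    print(df.isnull().sum())

    # Find correlation between the numeric features
    # (the 'Air Quality' target is still text at this point, so it is excluded)
    correlation_matrix = df.select_dtypes(include='number').corr()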

    # Display the correlation matrix
    print(correlation_matrix)

    # Plot the heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
    plt.title('Correlation Heatmap')
    plt.show()

    # Check types of columns
    print("Check types of columns")
    print(df.dtypes)

    # Check the class distribution of the target column
    print(df['Air Quality'].value_counts())

    # Convert object target column into numeric
    label = LabelEncoder()
    df['Air Quality'] = label.fit_transform(df['Air Quality'])

    # Define Dependent and Independent variables
    x = df.drop('Air Quality', axis=1)
    y = df['Air Quality']

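    # Split the train-test data
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=42)

    # Scaling the Data: fit the scaler on the training set only,
    # then apply the same transformation to the test set
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)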

    # Define the DNN model
    def DNN_model(input_shape):
        # Input layer
        inputs = Input(shape=(input_shape,))

        # Hidden layers
        layer1 = Dense(32, activation='relu')(inputs)
        layer2 = Dense(16, activation='relu')(layer1)

        # Output layer: one unit per air quality class
        output_layer = Dense(4, activation='softmax')(layer2)

        # Build the model
        ann_model = Model(inputs=inputs, outputs=output_layer)

        # Compile the model with Adam optimizer and sparse categorical crossentropy loss function
        ann_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

        return ann_model

    model = DNN_model(X_train.shape[1])

    # model Summary
    model.summary()

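    # Train the model (the test set is also used as validation data)
    model.fit(X_train, y_train, batch_size=2, epochs=50, validation_data=(X_test, y_test))

    # Predict class probabilities and take the most probable class for each sample
    y_pred = model.predict(X_test)
    y_pred = np.argmax(y_pred, axis=1)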

    # Calculate confusion matrix
    cm = confusion_matrix(y_test, y_pred)

    # LabelEncoder assigns codes in sorted order, so take the names from the fitted encoder
    class_labels = label.classes_

    # Plot the heatmap with correct labels
    plt.figure(figsize=(6, 5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_labels, yticklabels=class_labels)
    plt.title('Confusion Matrix Heatmap')
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.show()

    print("___Performance Metrics___\n")
    print('Classification Report:\n', classification_report(y_test, y_pred))
    print('Confusion Matrix:\n', confusion_matrix(y_test, y_pred))
    print('\n')
    print('Accuracy Score: ', accuracy_score(y_test, y_pred))
    print('F1 Score (macro): ', f1_score(y_test, y_pred, average='macro'))
    print('Recall Score (macro): ', recall_score(y_test, y_pred, average='macro'))
    print('Precision Score (macro): ', precision_score(y_test, y_pred, average='macro'))
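The prediction step in the code above turns softmax outputs into class indices, and the loss compares those outputs with integer labels. The following NumPy sketch illustrates both ideas on invented numbers. The helper names `softmax` and `sparse_categorical_crossentropy` are defined here for illustration only; they follow the standard textbook definitions and are not the Keras implementations.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    # Mean negative log-probability assigned to the true (integer) class
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())

# Invented raw scores for three samples and four classes
logits = np.array([
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 0.0, 3.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
])
labels = np.array([0, 2, 3])  # integer-encoded true classes

probs = softmax(logits)       # each row sums to 1
pred = probs.argmax(axis=1)   # most probable class per sample
loss = sparse_categorical_crossentropy(probs, labels)

print(probs.round(3))
print(pred)
print(loss)
```

The third sample has equal scores for every class, so each probability is 0.25 and argmax falls back to the first index. Because the labels are plain integers rather than one-hot vectors, the sparse form of the loss is the natural fit for the output of LabelEncoder.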
Screenshots
  • DNN Output Screenshot