How to Build a Regression Model Using Keras to Predict Students' Weights in Python?
Building a Regression Model Using Keras to Predict Students' Weights in Python
Description: The given code uses the NHANES dataset to predict the weight of individuals based on various features. It preprocesses the data by scaling the features, splits it into training and testing sets, and builds a regression model using an artificial neural network (ANN).
Step-by-Step Process
Step 1: Import the necessary libraries for data manipulation, model building, and evaluation, including pandas, scikit-learn, and TensorFlow/Keras.
Step 2: Load the NHANES dataset with pandas and check for missing or NaN values.
Step 3: Separate the dependent variable (weight) from the independent variables, then scale the features with StandardScaler.
Step 4: Split the dataset into training and testing sets using train_test_split.
Step 5: Build a neural network model with several dense hidden layers using ReLU activations and a single output layer with linear activation.
Step 6: Compile the model with the Adam optimizer, Mean Squared Error (MSE) as the loss function, and mean_squared_error as a metric.
Step 7: Train the model on the training data with a batch size of 2 for 10 epochs, validating on the test set.
Step 8: Use the trained model to predict the weights of the test set.
Step 9: Calculate performance metrics such as MSE, RMSE, MAE, and R-squared to evaluate the model's accuracy.
Step 10: Plot a histogram of the residuals (differences between actual and predicted values) to inspect the model's error distribution.
Sample Code
#Import Necessary Libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt
df = pd.read_csv("/home/soft12/Downloads/sample_dataset/Website/Dataset/NHANES Weight and Height.csv")
#Checking Missing Values
print("Checking Missing Values\n")
print(df.isnull().sum())
#Checking NaN Values
print('\n')
print("Checking NaN Values\n")
print(df.isna().sum())
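# If the checks above report missing values, one simple option (an assumption,
# not part of the original script) is to drop the affected rows before modelling:
# df = df.dropna()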
#Split Dependent and Independent Variables
x = df.drop('Weight (kg)', axis=1)
y = df['Weight (kg)']
#Scaling the data
scaler = StandardScaler()
x = scaler.fit_transform(x)
#Split the data for training and testing
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=42)
def ANN_model_regression(input_shape):
    # Input layer
    inputs = Input(shape=(input_shape,))
    # Hidden layers with ReLU activations
    layer1 = Dense(128, activation='relu')(inputs)
    layer2 = Dense(64, activation='relu')(layer1)
    layer3 = Dense(32, activation='relu')(layer2)
    layer4 = Dense(16, activation='relu')(layer3)
    # Output layer: a single unit with linear activation for regression
    output_layer = Dense(1, activation='linear')(layer4)
    # Build the model
    ann_model = Model(inputs=inputs, outputs=output_layer)
    # Compile the model with the Adam optimizer and MSE loss for regression
    ann_model.compile(optimizer='adam', loss='mse', metrics=['mean_squared_error'])
    return ann_model
model = ANN_model_regression(X_train.shape[1])
#Summary of Model
model.summary()
model.fit(X_train, y_train, batch_size=2, epochs=10, validation_data=(X_test, y_test))
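# Optional (an assumption, not part of the original script): early stopping can
# halt training once the validation loss stops improving, instead of always
# running the full 10 epochs. A minimal sketch:
# from tensorflow.keras.callbacks import EarlyStopping
# early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
# model.fit(X_train, y_train, batch_size=2, epochs=10,
#           validation_data=(X_test, y_test), callbacks=[early_stop])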
y_pred = model.predict(X_test)
y_pred = y_pred.ravel()
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Assuming y_test and y_pred are the actual and predicted values
print("___Performance_Metrics___\n")
print('Mean Squared Error (MSE): ', mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error (RMSE):', mean_squared_error(y_test, y_pred) ** 0.5)  # RMSE = sqrt(MSE)
print('Mean Absolute Error (MAE): ', mean_absolute_error(y_test, y_pred))
print('R-squared (R2 Score): ', r2_score(y_test, y_pred))
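# Optional sanity check (an assumption, not part of the original script):
# compare a few actual and predicted weights side by side.
comparison = pd.DataFrame({'Actual (kg)': y_test.values[:10], 'Predicted (kg)': y_pred[:10]})
print(comparison)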
#Inspect Residuals (difference between actual and predicted values)
residuals = y_test - y_pred
plt.hist(residuals, bins=30)
plt.title('Residuals Histogram')
plt.xlabel('Residual (kg)')
plt.ylabel('Frequency')
plt.show()
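As an additional diagnostic (a sketch that is not part of the original script, reusing y_test and y_pred from above), a scatter plot of predicted versus actual weights makes systematic over- or under-prediction easy to spot; points on the red reference line correspond to perfect predictions.
#Optional: Predicted vs Actual scatter plot
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red')  # perfect-prediction line
plt.xlabel('Actual Weight (kg)')
plt.ylabel('Predicted Weight (kg)')
plt.title('Predicted vs Actual Weight')
plt.show()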