How to Predict Houses Sold Above the Median Price Using Keras and Deep Learning?
Condition for Predicting Houses Sold Above the Median Price Using Keras and Deep Learning
Description: This code predicts whether a house sold at or above the median price using a deep learning model built with Keras. It encodes categorical features as integers, standardizes the features, and then trains an artificial neural network (ANN) to classify each house from its features. The same pattern applies to other binary classification tasks on tabular data, such as flagging high-value houses in real estate datasets.
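To make the labelling rule concrete, here is a minimal sketch on a handful of made-up prices. The values and the helper name `label_above_median` are illustrative assumptions, not taken from the dataset; the sketch uses only the standard library and mirrors the at-or-above-median rule used for the Target column.

```python
from statistics import median

def label_above_median(prices):
    """Return 1 for prices at or above the median, 0 otherwise."""
    cutoff = median(prices)
    return [1 if p >= cutoff else 0 for p in prices]

# Hypothetical prices, for illustration only
toy_prices = [3_000_000, 4_200_000, 4_340_000, 5_100_000, 9_800_000]
toy_labels = label_above_median(toy_prices)
print(toy_labels)
```

Because a price equal to the median counts as positive, the two classes need not be exactly balanced.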
Step-by-Step Process
Step1: Load necessary libraries for data manipulation, preprocessing, and building the neural network with Keras.
Step2: Read the housing dataset containing features and house prices.
Step3: Create a binary target column Target, where 1 indicates prices above or equal to the median, and 0 indicates lower prices.
Step4: Use LabelEncoder to transform categorical features into numerical format.
Step5: Apply StandardScaler to standardize the features (zero mean, unit variance), which generally helps neural network training.
Step6: Divide the dataset into training and testing sets to evaluate model performance.
Step7: Define a neural network with two hidden layers, dropout for regularization, and a sigmoid activation in the output layer for binary classification.
Step8: Compile the model using the Adam optimizer and binary cross-entropy loss, and train it on the training set with validation on the test set.
Step9: Use the trained model to predict probabilities on the test set and convert them into binary classes (1 or 0).
Step10: Calculate and display metrics like accuracy, F1 score, precision, recall, and the confusion matrix to assess the model's performance.
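Steps 9 and 10 can be sketched without any library to show what the reported numbers mean. The snippet below is only an illustration under stated assumptions: the labels and probabilities are made up, and `binarize` and `binary_metrics` are hypothetical helper names rather than scikit-learn functions.

```python
def binarize(probabilities, threshold=0.5):
    """Map predicted probabilities to class labels (1 if above the threshold)."""
    return [1 if p > threshold else 0 for p in probabilities]

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}

# Made-up ground truth and predicted probabilities, for illustration only
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.4, 0.3, 0.8, 0.6, 0.1]
y_hat = binarize(probs)
metrics = binary_metrics(y_true, y_hat)
print(y_hat)
print(metrics)
```

Precision asks how many houses predicted as above-median really were, while recall asks how many truly above-median houses the model found; F1 is the harmonic mean of the two.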
Sample Code
#Import Necessary Libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder,StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense,Dropout,Input
from tensorflow.keras.models import Model
from sklearn.metrics import (classification_report,confusion_matrix,accuracy_score,
                             f1_score,recall_score,precision_score)
df = pd.read_csv("/home/soft12/Downloads/sample_dataset/Website/Dataset/Housing.csv")
# Compute the median price once and reuse it as the cut-off
median_price = df['price'].median()
# Create the label column: 1 if the price is at or above the median, else 0
df['Target'] = (df['price'] >= median_price).astype(int)
label = LabelEncoder()
for i in df.columns:
    if df[i].dtype == 'object':
        df[i] = label.fit_transform(df[i])
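scaler = StandardScaler()
# Drop 'price' along with 'Target': the label is derived from price,
# so keeping it as a feature would leak the answer to the model
x = df.drop(['Target','price'],axis=1)
x = scaler.fit_transform(x)
y = df['Target']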
X_train,X_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=42)
def ANN_model(input_shape):
    # Input layer
    inputs = Input(shape=(input_shape,))
    # Hidden layers
    layer1 = Dense(32, activation='relu')(inputs)
    dropout1 = Dropout(0.2)(layer1)
    layer2 = Dense(16, activation='relu')(dropout1)
    dropout2 = Dropout(0.2)(layer2)
    # Output layer
    output_layer = Dense(1, activation='sigmoid')(dropout2)
    # Build the model
    ann_model = Model(inputs=inputs, outputs=output_layer)
    # Compile the model with Adam optimizer and binary crossentropy loss function
    ann_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return ann_model
model = ANN_model(X_train.shape[1])
model.fit(X_train,y_train,batch_size=2,epochs=10,validation_data=(X_test,y_test))
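# predict() returns probabilities with shape (n_samples, 1);
# threshold at 0.5 and flatten to a 1-D array of 0/1 labels
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()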
print("___Performance_Metrics___\n")
print('Classification_Report:\n',classification_report(y_test, y_pred))
print('Confusion_Matrix:\n',confusion_matrix(y_test, y_pred))
print('Accuracy_Score: ',accuracy_score(y_test, y_pred))
print('F1_Score: ',f1_score(y_test, y_pred))
print('Recall_Score: ',recall_score(y_test, y_pred))
print('Precision_Score: ',precision_score(y_test, y_pred))
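One possible refinement, offered as a suggestion rather than as part of the original recipe: the sample code fits the scaler on the full dataset before splitting, so test-set statistics influence the scaling. The sketch below shows the alternative of computing the mean and standard deviation from the training values only and reusing them for the test values. It is written in plain Python with made-up numbers, and `fit_standardizer` and `standardize` are hypothetical helper names.

```python
from statistics import mean, pstdev

def fit_standardizer(column):
    """Return the mean and population standard deviation of the training values."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # guard against constant columns
    return mu, sigma

def standardize(column, mu, sigma):
    """Scale values using statistics learned elsewhere (e.g. from the training set)."""
    return [(v - mu) / sigma for v in column]

# Made-up 'area' values, for illustration only
train_area = [2000, 4000, 6000]
test_area = [8000]

mu, sigma = fit_standardizer(train_area)       # statistics from training data only
train_scaled = standardize(train_area, mu, sigma)
test_scaled = standardize(test_area, mu, sigma)  # reuse the training statistics
print(train_scaled, test_scaled)
```

With scikit-learn, the equivalent pattern is to split first, call `scaler.fit_transform` on the training features, and call `scaler.transform` on the test features.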