How to Build a Text Classification Model Using GRU for Sentiment Analysis with IMDB Dataset
Description:
The code loads and preprocesses the IMDB dataset by cleaning the text, removing stopwords, and applying TF-IDF vectorization. It then builds a recurrent neural network with a GRU layer for binary sentiment classification and trains it. Finally, it evaluates the model's performance using classification metrics and visualizes the results with a confusion matrix.
Step-by-Step Process
Load Libraries:
Import essential libraries like pandas, sklearn, nltk, and tensorflow for data preprocessing and model building.
Preprocess Text Data:
Clean the text by removing HTML tags, special characters, and stopwords. Tokenize and lemmatize the text.
Encode and Vectorize Data:
Encode sentiment labels and transform the text into numerical features using TF-IDF.
Build and Train Model:
Use a GRU layer as the core of the recurrent network, compile it, and train it with the prepared data (a minimal sketch follows this list).
Evaluate and Visualize:
Evaluate the model on the test dataset, generate a classification report, and display a confusion matrix.
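Before the full listing, here is a minimal sketch of how the GRU classifier can be wired up with the Keras functional API. Because TF-IDF produces one fixed-length vector per review rather than a token sequence, each vector is treated as a sequence of length 1 before it reaches the GRU layer. The function name build_gru_model, the 5,000-feature input size, and the 64 GRU units are illustrative assumptions, not values taken from the original code.
# Minimal sketch of the GRU classifier (assumed sizes: 5000 TF-IDF features, 64 GRU units)
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, GRU

def build_gru_model(num_features=5000):
    # Each review arrives as a single TF-IDF vector, shaped (1, num_features)
    # so the GRU layer can consume it as a one-step sequence.
    inputs = Input(shape=(1, num_features))
    x = GRU(64)(inputs)                              # recurrent core of the network
    outputs = Dense(1, activation='sigmoid')(x)      # binary sentiment probability
    model = Model(inputs, outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Usage: reshape the TF-IDF matrix from (n_samples, n_features) to
# (n_samples, 1, n_features) before calling model.fit(...).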
Sample Source Code
# Import Necessary Libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
# Download the NLTK resources needed for tokenization, stop-word removal, and lemmatization
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, GRU
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
# Load and preprocess IMDB dataset
df = pd.read_csv("IMDB_Dataset.csv")
# Preprocess text
def preprocess_text(text):
    # Strip HTML tags and any non-alphabetic characters
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    # Lowercase and tokenize the review
    tokens = word_tokenize(text.lower())
    # Remove stop words and lemmatize the remaining tokens
    stop_words = set(stopwords.words('english'))
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    return ' '.join(tokens)
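# --- Possible continuation (sketch) ---
# The lines below illustrate how the cleaned text could be encoded, vectorized,
# and split, matching the "Encode and Vectorize Data" step above. The column
# names 'review' and 'sentiment', test_size=0.2, and max_features=5000 are
# assumptions, not values confirmed by the original listing.
df['review'] = df['review'].apply(preprocess_text)

# Encode sentiment labels (e.g. negative/positive -> 0/1)
labels = LabelEncoder().fit_transform(df['sentiment'])

# Convert the cleaned reviews into TF-IDF feature vectors
vectorizer = TfidfVectorizer(max_features=5000)
features = vectorizer.fit_transform(df['review']).toarray()

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

# Reshape to (samples, 1, features) so a GRU layer receives a one-step sequence
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])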