Research Breakthrough Possible @S-Logix pro@slogix.in


How to Perform Sentiment Analysis on Amazon Product Reviews Using Keras and Deep Learning?

Sentiment Analysis using Keras with Deep Learning

Procedure for Performing Sentiment Analysis on Amazon Product Reviews Using Keras and Deep Learning

  • Description:
    This tutorial preprocesses Amazon product review data by cleaning and tokenizing the text, then vectorizes it with TF-IDF. It splits the dataset into training and testing sets and trains a simple artificial neural network (ANN) to classify each review into one of two classes (binary classification).
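TF-IDF is the step that turns each cleaned review into numbers. The sketch below is a pure-Python re-implementation, assuming scikit-learn's `TfidfVectorizer` defaults (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2-normalised rows, alphabetical column order); it leaves out the `max_features=250` cap used in the sample code. The helper names `fit_idf` and `transform` and the toy reviews are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Same default token pattern as scikit-learn's TfidfVectorizer: tokens of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    return TOKEN_RE.findall(doc.lower())


def fit_idf(train_docs):
    """Learn the vocabulary and smoothed IDF weights from the training documents only."""
    n = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(tokenize(doc)))  # count each term at most once per document
    vocab = sorted(doc_freq)  # alphabetical column order
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf


def transform(docs, vocab, idf):
    """Map documents to L2-normalised TF-IDF rows; words outside the vocabulary are ignored."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm else row)
    return rows


train = ["great product works great", "bad product broke fast"]  # hypothetical reviews
test = ["great battery"]
vocab, idf = fit_idf(train)
X_train = transform(train, vocab, idf)
X_test = transform(test, vocab, idf)
print(vocab)  # → ['bad', 'broke', 'fast', 'great', 'product', 'works']
print([round(v, 3) for v in X_test[0]])  # → [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Because the IDF weights come from the training reviews only, the unseen word "battery" in the test review contributes nothing. The sample code instead fits `TfidfVectorizer` on the full dataset before splitting, so test-set document frequencies influence the features; calling `fit_transform` on the training text and `transform` on the test text avoids that leakage.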
Step-by-Step Process
  • Step 1: Import the necessary libraries for data manipulation (pandas), text processing (nltk, re, string), machine learning (sklearn), and deep learning (tensorflow).
  • Step 2: Download the NLTK resources: stopwords, the punkt tokenizer models, and the WordNet data used by the lemmatizer.
  • Step 3: Load the Amazon product review dataset from a CSV file and display the first few rows for inspection.
  • Step 4: Define and apply a preprocessing function that cleans and tokenizes the review text: convert it to lowercase; remove HTML tags, URLs, non-ASCII characters, punctuation other than periods, isolated numbers, and stopwords; and lemmatize the remaining tokens.
  • Step 5: Use TfidfVectorizer (limited to 250 features) to convert the preprocessed text into a sparse matrix of TF-IDF features that represents the text numerically.
  • Step 6: Split the dataset into training (80%) and testing (20%) sets using train_test_split.
  • Step 7: Define a simple artificial neural network (ANN) with two hidden layers (64 and 32 ReLU units), a dropout layer (rate 0.2) after each for regularization, and a single-unit sigmoid output layer for binary classification.
  • Step 8: Compile the model with the Adam optimizer, the binary cross-entropy loss function, and the accuracy metric.
  • Step 9: Train the model on the training set for 10 epochs (batch size 2), using the test set as validation data.
  • Step 10: Make predictions on the test data, classify them as 0 or 1 using a threshold of 0.5, and evaluate the model with accuracy, F1 score, recall, precision, and a confusion matrix.
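Step 10 reduces to counting four outcomes once the sigmoid outputs are thresholded. The stdlib-only sketch below assumes class 1 is the positive class (scikit-learn's default `pos_label=1`) and uses made-up probabilities and labels rather than real model output; `threshold` and `binary_metrics` are hypothetical helper names.

```python
def threshold(probs, cutoff=0.5):
    """Turn sigmoid outputs into hard 0/1 labels (strictly greater than the cutoff, as in the tutorial)."""
    return [1 if p > cutoff else 0 for p in probs]


def binary_metrics(y_true, y_pred):
    """Confusion matrix and scores for positive class 1; undefined ratios are reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows = true class, columns = predicted class
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


probs = [0.91, 0.40, 0.62, 0.08, 0.55, 0.30]  # hypothetical sigmoid outputs
y_true = [1, 1, 0, 0, 1, 0]                   # hypothetical true labels
y_pred = threshold(probs)
print(y_pred)  # → [1, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Here there are two true positives, two true negatives, one false positive, and one false negative, so accuracy, precision, recall, and F1 all come out to 2/3. The confusion-matrix layout matches what `confusion_matrix` prints for labels 0 and 1.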
Sample Code
  • #Import Necessary Libraries
    import pandas as pd
    import re
    import string
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    from nltk.stem import WordNetLemmatizer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.layers import Dense, Dropout, Input
    from tensorflow.keras.models import Model
    from sklearn.metrics import (classification_report, confusion_matrix, accuracy_score,
                                 f1_score, recall_score, precision_score)
    import warnings
    warnings.filterwarnings("ignore")
    nltk.download('stopwords')
    nltk.download('punkt')
    nltk.download('punkt_tab')  # tokenizer tables required by word_tokenize in newer NLTK releases
    nltk.download('wordnet')
    df = pd.read_csv("/home/soft12/Downloads/sample_dataset/Website/Dataset/amazon.csv")
    # Display initial rows of the dataset
    print("Initial data preview:")
    print(df.head())
    # Initialize lemmatizer
    lemmatizer = WordNetLemmatizer()
    # Build the English stopword set once instead of on every call
    stopwords_set = set(stopwords.words('english'))
    # Define the preprocessing functions
    def preprocess_text(text):
        text = text.lower()
        text = clean_text(text)
        tokens = word_tokenize(text)
        # Drop stopwords and reduce the remaining tokens to their lemmas
        tokens = [lemmatizer.lemmatize(word) for word in tokens
                  if word not in stopwords_set]
        return ' '.join(tokens)
    def clean_text(text):
        # Remove HTML tags using regex
        text = re.sub(r'<.*?>', '', text)
        # Remove URLs
        text = re.sub(r'http\S+', '', text)
        # Replace non-ASCII characters with spaces
        text = re.sub(r'[^\x00-\x7F.]', ' ', text)
        # Remove punctuation except periods
        text = re.sub(f'[{re.escape(string.punctuation.replace(".", ""))}]', '', text)
        # Remove isolated numbers
        text = re.sub(r'\b\d+\b', '', text)
        # Replace runs of two or more periods with a single space
        text = re.sub(r'\.{2,}', ' ', text)
        # Collapse whitespace after a period into one space, then trim the ends
        text = re.sub(r'(?<=\.)\s+', ' ', text).strip()
        return text
    # Apply preprocessing to the text column (missing reviews become empty strings)
    text_data = df['Text'].fillna('').astype(str).apply(preprocess_text)
    tfidf_vectorizer = TfidfVectorizer(max_features=250)
    tfidf_features = tfidf_vectorizer.fit_transform(text_data)
    text = tfidf_features.toarray()
    # Split into training (80%) and testing (20%) sets
    X_train, X_test, y_train, y_test = train_test_split(
        text, df['label'], test_size=0.2, random_state=42)
    def ANN_model(input_shape):
        # Input layer
        inputs = Input(shape=(input_shape,))
        # Hidden layers
        layer1 = Dense(64, activation='relu')(inputs)
        Dropout1 = Dropout(0.2)(layer1)
        layer2 = Dense(32, activation='relu')(Dropout1)
        Dropout2 = Dropout(0.2)(layer2)
        # Output layer
        output_layer = Dense(1, activation='sigmoid')(Dropout2)
        # Build the model
        ann_model = Model(inputs=inputs, outputs=output_layer)
        # Compile the model with Adam optimizer and binary crossentropy loss function
        ann_model.compile(optimizer='adam', loss='binary_crossentropy',
                          metrics=['accuracy'])
        return ann_model
    model = ANN_model(X_train.shape[1])
    model.fit(X_train, y_train, batch_size=2, epochs=10, validation_data=(X_test, y_test))
    # Threshold the sigmoid probabilities at 0.5 to get hard 0/1 labels
    y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
    print("___Performance_Metrics___\n")
    print('Classification_Report:\n',classification_report(y_test, y_pred))
    print('Confusion_Matrix:\n',confusion_matrix(y_test, y_pred))
    print('Accuracy_Score: ',accuracy_score(y_test, y_pred))
    print('F1_Score: ',f1_score(y_test, y_pred))
    print('Recall_Score: ',recall_score(y_test, y_pred))
    print('Precision_Score: ',precision_score(y_test, y_pred))
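The cleaning step can be checked in isolation, without NLTK downloads or the dataset. The sketch below reproduces the regex sequence of `clean_text` from the sample code and runs it on two invented, already-lowercased review strings (`preprocess_text` lowercases before calling it). The second input shows a side effect worth knowing: accented letters are replaced by spaces, so "café" becomes "caf".

```python
import re
import string


def clean_text(text):
    # Same regex sequence as clean_text in the sample code
    text = re.sub(r'<.*?>', '', text)            # HTML tags
    text = re.sub(r'http\S+', '', text)          # URLs
    text = re.sub(r'[^\x00-\x7F.]', ' ', text)   # non-ASCII characters -> space
    text = re.sub(f'[{re.escape(string.punctuation.replace(".", ""))}]', '', text)  # punctuation except periods
    text = re.sub(r'\b\d+\b', '', text)          # isolated numbers
    text = re.sub(r'\.{2,}', ' ', text)          # runs of periods -> space
    text = re.sub(r'(?<=\.)\s+', ' ', text).strip()  # whitespace after a period -> one space, trim ends
    return text


samples = [  # hypothetical, already-lowercased reviews
    "<b>love it</b>!! battery lasts 2 days... see http://x.co/r",
    "café crème, 10/10",
]
for s in samples:
    print(repr(clean_text(s)))
```

This prints `'love it battery lasts  days  see'` and `'caf  cr me'`. The leftover double spaces are harmless because `word_tokenize` splits on any run of whitespace; the loss of accented characters, and of ratings such as "10/10", is a property of these regexes that may matter for reviews outside plain ASCII English.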
Screenshots
  • Sentiment analysis using keras