How to Build a Simple Linear Regression Model using Python?

  • Description:
    A Simple Linear Regression model is one of the most basic and widely used algorithms in machine learning. It assumes a linear relationship between a single independent variable (the input feature) and a dependent variable (the output), and it predicts a continuous target from that one feature.
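In equation form, the fitted model is just a straight line, target = intercept + slope * feature. The minimal sketch below, using a tiny made-up dataset (not the salary data used later), shows how scikit-learn's LinearRegression recovers these two parameters from a single feature:
    # A minimal sketch with made-up numbers, only to illustrate the model form
    import numpy as np
    from sklearn.linear_model import LinearRegression

    x = np.array([[1], [2], [3], [4], [5]]) # single feature, e.g. years of experience
    y = np.array([30, 35, 41, 44, 50]) # continuous target, e.g. salary in thousands

    model = LinearRegression().fit(x, y)
    print(model.intercept_, model.coef_[0]) # the fitted line: y = intercept + slope * x
    print(model.predict([[6]])) # prediction for a new input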
Why Should We Use Simple Linear Regression?
  • Simple Linear Regression is a go-to method for predictive analysis when:
    You want to understand the relationship between two continuous variables.
    There is a clear linear relationship between the target and the feature.
    You are working with small to medium datasets and want interpretable results.
  • It is widely used across industries, for example in finance (predicting stock prices), healthcare (predicting patient outcomes), and marketing (predicting sales from advertising spend).
Step-by-Step Process
  • Define the Problem: Identify the dependent (target) variable and independent (feature) variable.
  • Collect the Data: Gather relevant data that can be used to make predictions.
  • Preprocess the Data: Handle missing values, remove outliers, and scale the data if necessary.
  • Visualize the Data: Plot the data to check for any obvious trends or relationships.
  • Train-Test Split: Divide the data into training and testing sets (commonly 80% for training and 20% for testing).
  • Build the Model: Use linear regression from libraries like scikit-learn to fit the model on the training data.
  • Evaluate the Model: Use metrics like Mean Squared Error (MSE) and R-squared to evaluate the model's performance (a short sketch of these metrics follows this list).
  • Interpret Results: Analyze the coefficients (intercept and slope) to understand the relationship between the variables.
  • Make Predictions: Use the trained model to make predictions on new data.
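To make the evaluation step above concrete, the sketch below computes MSE and R-squared by hand and checks the results against scikit-learn's functions; the y_true and y_pred arrays are made-up values used only for illustration.
    # Hedged illustration of the evaluation metrics; the arrays are invented values
    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    y_true = np.array([3.0, 5.0, 7.5, 9.0]) # actual target values
    y_pred = np.array([2.8, 5.3, 7.1, 9.4]) # model predictions

    # Mean Squared Error: average of the squared residuals
    mse_manual = np.mean((y_true - y_pred) ** 2)
    # R-squared: 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2_manual = 1 - ss_res / ss_tot

    print(mse_manual, mean_squared_error(y_true, y_pred)) # the two values should match
    print(r2_manual, r2_score(y_true, y_pred)) # the two values should match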
Sample Code
  • # Import the required libraries
    import pandas as pd # Import pandas to load and manipulate the dataset
    from sklearn.preprocessing import LabelEncoder # For encoding categorical data into numeric format
    from sklearn.preprocessing import StandardScaler # To scale data to a standard normal distribution
    import matplotlib.pyplot as plt # Used to create visualizations, such as graphs
    import seaborn as sns # For advanced visualizations like heat maps
    from sklearn.model_selection import train_test_split # To split data into training and testing sets
    from sklearn.linear_model import LinearRegression # For building the linear regression model
    from sklearn.metrics import r2_score # To evaluate model performance (R-squared score)
    # To calculate the error between predicted and actual values
    from sklearn.metrics import mean_squared_error, mean_absolute_error
    # Load the Salary dataset
    data = pd.read_csv("Salary Data.csv")
    df = pd.DataFrame(data) # Convert the data into a DataFrame
    # Rename columns to make them more user-friendly
    df = df.rename(columns={'Education Level':'Education_Level','Job Title':'Job_Title', 'Years of Experience':'Experience'})
    # Remove columns with low correlation to salary prediction (Job Title and Gender)
    df = df.drop(['Job_Title', 'Gender'], axis=1)
    # Optionally, fill missing values with the mean of the respective columns (kept commented out):
    # age_mean = df['Age'].mean()
    # experience_mean = df['Experience'].mean()
    # salary_mean = df['Salary'].mean()
    # df['Age'].fillna(age_mean, inplace=True)
    # df['Experience'].fillna(experience_mean, inplace=True)
    # df['Salary'].fillna(salary_mean, inplace=True)
    # Remove rows with missing values
    df = df.dropna(axis=0)
    # Encode categorical data (Education Level) to numerical format
    l_encoder = LabelEncoder()
    df['Education_Level'] = l_encoder.fit_transform(df['Education_Level'])
    # Calculate the correlation matrix to analyze relationships between variables
    correlation_matrix = df.corr()
    # Visualize the correlation matrix using a heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap='coolwarm', square=True)
    plt.title('Correlation Heatmap for Salary Prediction')
    plt.show()
    # Prepare the features (X) and target variable (y)
    x = df.drop(['Salary'], axis=1) # Features (everything except 'Salary')
    y = df['Salary'] # Target variable (Salary)
    # Scale the features to standardize the data (z-scores)
    s_scalar = StandardScaler()
    x = s_scalar.fit_transform(x)
    # Split the data into training (90%) and testing (10%) sets
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)
    # Train a linear regression model using the training data
    model = LinearRegression()
    model.fit(x_train, y_train)
    # Predict the salary using the test data
    y_predict = model.predict(x_test)
    # Calculate the R-squared score to measure the accuracy of the model
    r2_score_value = r2_score(y_test, y_predict)
    print(f"Prediction accuracy (R-squared) between predicted and test data: {r2_score_value * 100}")
    # Calculate the Mean Squared Error (MSE) to assess prediction error
    mean_square = mean_squared_error(y_test, y_predict)
    print(f"Prediction error (Mean Squared Error) between predicted and test data: {mean_square}")
    # Calculate the Mean Absolute Error (MAE) to assess prediction error
    mean_absolute = mean_absolute_error(y_test, y_predict)
    print(f"Prediction error (Mean Absolute Error) between predicted and test data: {mean_absolute}")
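The sample code above stops at evaluation. The sketch below is a hedged continuation that reuses the model, df, and s_scalar objects from the code above to cover the "Interpret Results" and "Make Predictions" steps; the new input values (and the meaning of the encoded education level) are invented for illustration and depend on the actual dataset.
    # Hedged continuation of the sample above; the input values are made up
    # Interpret the fitted line: one coefficient per (scaled) feature plus an intercept
    print("Intercept:", model.intercept_)
    print("Coefficients (one per scaled feature):", model.coef_)
    # Build a new observation with the same columns used for training
    new_person = pd.DataFrame({'Age': [30], 'Education_Level': [1], 'Experience': [5]})
    new_person = new_person[df.drop(['Salary'], axis=1).columns] # match the training column order
    # Scale it with the scaler fitted earlier, then predict
    new_person_scaled = s_scalar.transform(new_person)
    print("Predicted salary:", model.predict(new_person_scaled)[0])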
Screenshots
  • Simple Linear Regression