How to Perform Predictions in Simple Linear Regression with Python
Condition for Simple Linear Regression Prediction
Description:
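Simple linear regression models the relationship between one independent variable (x) and one dependent variable (y). The goal is to find the best-fitting line, y = mx + b, where m is the slope and b is the intercept, and to use that line to predict the dependent variable.
The process includes data collection, preprocessing, model training, prediction, and evaluation. Key performance metrics include R-squared (R²), the share of the variance in y that the model explains, and Mean Squared Error (MSE), the average squared difference between predicted and actual values.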
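To make the line and the two metrics concrete, the sketch below fits a line with the closed-form least-squares formulas and computes MSE and R² by hand. It uses plain Python so each term is visible. The data values and variable names are made up for illustration and do not come from the salary dataset.

```python
# Hypothetical toy data: years of experience (x) and salary in thousands (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [35.0, 42.0, 50.0, 55.0, 63.0]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form ordinary least squares for the line y = m*x + b.
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
m = sxy / sxx            # slope
b = y_mean - m * x_mean  # intercept

# Predictions from the fitted line.
y_pred = [m * xi + b for xi in x]

# MSE: average squared residual.
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
mse = ss_res / n

# R²: 1 minus residual sum of squares over total sum of squares.
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r2 = 1.0 - ss_res / ss_tot

print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
print(f"MSE = {mse:.3f}, R2 = {r2:.4f}")
```

A lower MSE and an R² closer to 1 both indicate a closer fit. Here the metrics are computed on the same points used for fitting, which is only appropriate for illustrating the formulas.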
Step-by-Step Process
Data Collection: Collect data with pairs of independent (x) and dependent (y) variables.
Data Preprocessing: Handle missing values, outliers, and scale features if necessary.
Splitting the Data: Split the data into training and testing sets (e.g., 80% for training and 20% for testing).
Model Building: Train a linear regression model using the training data.
Model Evaluation: Use R² and MSE to evaluate the model’s performance on test data.
Model Interpretation: Interpret the slope (m) and intercept (b) to understand the relationship between x and y.
Making Predictions: Predict new values for unseen data.
Model Improvement (Optional): Consider multiple linear regression if more than one predictor matters, or polynomial features if the relationship between x and y is not linear.
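The steps above can be sketched end to end with scikit-learn. This is a minimal illustration on synthetic data, not the salary example itself. The generated data (y = 3x + 5 plus noise), the random seeds, and the new input value 12.0 are all assumptions chosen for the sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Data collection (synthetic, for illustration): y = 3x + 5 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))  # scikit-learn expects a 2-D feature array
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0.0, 1.0, size=100)

# Splitting the data: 80% for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model building: fit the line on the training data only.
model = LinearRegression()
model.fit(X_train, y_train)

# Model evaluation: score predictions on the held-out test data.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Model interpretation: slope (m) and intercept (b) of the fitted line.
slope = model.coef_[0]
intercept = model.intercept_

# Making predictions: predict y for a new, unseen x value.
new_x = np.array([[12.0]])
new_pred = model.predict(new_x)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"test MSE = {mse:.3f}, test R2 = {r2:.3f}")
print(f"prediction for x = 12.0: {new_pred[0]:.3f}")
```

Evaluating on the test split rather than the training split gives a fairer estimate of how the model behaves on unseen data.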
Sample Source Code
# Salary prediction using Linear Regression in Machine Learning
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
# Load the dataset
data = pd.read_csv("Salary Data.csv")
df = pd.DataFrame(data)  # read_csv already returns a DataFrame, so this wrap is redundant but harmless