How to Split a Dataset into Training and Test Sets for Linear Regression
Description of the Train-Test Split in Linear Regression
In machine learning, when you're training a model, it’s important to split your dataset
into two parts:
Training Set: Used to train the model (to learn patterns).
Test Set: Used to evaluate the model's performance on unseen data.
For a Simple Linear Regression problem (predicting a dependent variable y using one
independent variable x), the dataset is split so that the training set helps the model
learn the relationship between x and y, and the test set checks how well the model
generalizes to new data.
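As a minimal sketch of what the split does, the snippet below divides a small, made-up dataset into training and test sets. The toy data (y = 2x + 1), the 80/20 ratio, and the random_state value are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, one feature x and a target y = 2x + 1.
x = np.arange(10).reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = 2 * x.ravel() + 1

# Hold out 20% of the rows for testing.
# random_state fixes the shuffle so the split is repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

print(x_train.shape, x_test.shape)  # (8, 1) (2, 1)
```

Rows are shuffled before splitting by default, so the test set is a random sample of the data rather than simply its last rows, and no row appears in both sets.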
Step-by-Step Process
1. Import Required Libraries:
   pandas for data manipulation.
   train_test_split from scikit-learn to split the data.
   LinearRegression from scikit-learn to train the regression model.
2. Load or Create Dataset:
   You can either load a dataset (e.g., from a CSV file) or create a sample dataset
   manually.
3. Split the Dataset:
   Use train_test_split() to divide the data into training and test sets.
4. Train the Model:
   Fit a simple linear regression model to the training data.
5. Evaluate the Model:
   Use the test data to evaluate the model’s performance.
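The steps above can be sketched end to end as follows. This is an illustrative outline, not a definitive implementation: the synthetic dataset (y roughly equal to 3x + 5 plus noise), the column names x and y, the 80/20 split, and the choice of mean squared error and R² as evaluation metrics are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Create a hypothetical dataset: y is roughly 3x + 5 with Gaussian noise.
rng = np.random.default_rng(0)
x_values = np.linspace(0, 10, 50)
df = pd.DataFrame({
    "x": x_values,
    "y": 3 * x_values + 5 + rng.normal(scale=1.0, size=x_values.size),
})

X = df[["x"]]  # double brackets keep X two-dimensional, as scikit-learn expects
y = df["y"]

# Split: 80% of the rows for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train: fit the regression line on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate: compare predictions with the held-out targets.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"test MSE={mse:.2f}, test R^2={r2:.2f}")
```

Because the model never sees the test rows during fitting, the reported error and R² estimate how well it handles new data.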
Sample Source Code
# Code for train and test split
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression