How to Split Dataset into Train Test in Linear Regression

Description of Train Test Split in Linear Regression

  • Description:
    In machine learning, a dataset is normally split into two parts before a model is trained:
    • Training Set: Used to fit the model, so that it learns the patterns in the data.
    • Test Set: Held back during training and used to evaluate the model's performance on unseen data.
    For simple linear regression (predicting a dependent variable y from one independent variable x), the model learns the relationship between x and y from the training set, and the test set checks how well that relationship generalizes to new data.
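As a rough illustration of the split described above, the sketch below divides ten samples into a training part and a test part and confirms that the two do not overlap. The toy arrays x and y and the linear relationship between them are made up for this example; only train_test_split comes from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with one feature x and a target y.
x = np.arange(1, 11).reshape(-1, 1)   # shape (10, 1)
y = 3.0 * x.ravel() + 2.0             # made-up linear relationship

# Hold back 20% of the rows as the test set.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(x_train.shape, x_test.shape)    # (8, 1) (2, 1)

# No sample appears in both parts.
overlap = set(x_train.ravel()) & set(x_test.ravel())
print("Shared samples:", len(overlap))  # Shared samples: 0
```

With test_size=0.2 and ten rows, two rows go to the test set and eight to the training set. Fixing random_state makes the shuffle reproducible from run to run.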
Step-by-Step Process
  • Import Required Libraries:
    pandas for data manipulation.
    train_test_split from scikit-learn to split the data.
    LinearRegression from scikit-learn to train the regression model.
  • Load or Create Dataset:
    You can either load a dataset (e.g., from a CSV file) or create a sample dataset manually.
  • Split the Dataset:
    Use train_test_split() to divide the data into training and test sets.
  • Train the Model:
    Fit a simple linear regression model to the training data.
  • Evaluate the Model:
    Predict on the test set and compare the predictions with the actual values to measure the model's performance.
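The "Load or Create Dataset" step mentions reading from a CSV file. A minimal sketch of that variant follows, assuming a file with Bedrooms and Price columns. To keep the example self-contained, the CSV text is held in a string (csv_text, a stand-in for a real file) and read through io.StringIO; with an actual file, its path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for a real CSV file (hypothetical contents for illustration).
csv_text = """Bedrooms,Price
1,150000
2,180000
3,220000
4,260000
5,300000
6,330000
7,360000
8,400000
9,430000
10,450000
"""

# Load the data; with a real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

# Separate the input column (kept as a DataFrame) from the target column.
X = df[["Bedrooms"]]
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), "training rows,", len(X_test), "test rows")
# 8 training rows, 2 test rows
```

Selecting the feature with double brackets keeps X two-dimensional, which is the shape scikit-learn estimators expect for their inputs.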
Sample Source Code
  • # Code for train and test split

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

    data = {
        'Bedrooms': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Price': [150000, 180000, 220000, 260000, 300000, 330000, 360000, 400000, 430000, 450000]
    }
    df = pd.DataFrame(data)

    # Split the dataset into input (X) and output (y)
    X = df[['Bedrooms']]
    y = df['Price']

    # Split the dataset into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train the Simple Linear Regression model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predict prices for the held-out test set
    y_pred = model.predict(X_test)
    print("Predicted test prices:", y_pred)

    print("Slope (coefficient):", model.coef_)
    print("Intercept:", model.intercept_)

    print("\nX train data:\n", X_train)
    print("\nY train data:\n", y_train)

    print("\nX test data:\n", X_test)
    print("\nY test data:\n", y_test)

    print("\nOriginal dataset is:\n", df)
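The "Evaluate the Model" step can be made concrete by scoring the test-set predictions. The sketch below is one possible way to do this, reusing the same sample data and split settings; the choice of metrics (mean absolute error, root mean squared error, and R²) is an assumption made for illustration, not something the steps above prescribe.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same sample dataset as in the listing above.
df = pd.DataFrame({
    "Bedrooms": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Price": [150000, 180000, 220000, 260000, 300000,
              330000, 360000, 400000, 430000, 450000],
})
X = df[["Bedrooms"]]
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows only, then predict the held-out rows.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compare predictions with the actual test prices.
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5
r2 = r2_score(y_test, y_pred)

print("MAE :", mae)
print("RMSE:", rmse)
print("R^2 :", r2)
```

With only two test rows, these scores are very noisy and mainly show the mechanics. On a real dataset the test set would be larger, and the same calls apply unchanged.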
Screenshots
  • Train Test Split Result