Amazing technological breakthrough possible @S-Logix

Office Address

  • #5, First Floor, 4th Street Dr. Subbarayan Nagar Kodambakkam, Chennai-600 024 Landmark : Samiyar Madam
  • +91- 81240 01111

Social List

How to implement linear regression using Spark with Python


To implement linear regression using Spark with python


  Set up Spark Context and Spark session

  Load the Data set

  Covert the data to dense vectors(Features and Label)

  Transform the dataset to dataframe

  Identify categorical features, and index them

  Split the data into train and test set

  Fit the linear regression model

  Predict using the test set

  Take the summary of the model and evaluate metrics

Sample Code

from pyspark.sql import SparkSession
from import Vectors
from import LinearRegression
from import Pipeline
from import VectorIndexer
from import RegressionEvaluator
import sklearn.metrics
#Set up SparkContext and SparkSession
spark=SparkSession \
.builder \
.appName(“Python spark regression example”)\
#Load the data set‘com.databricks.spark.csv’).options(header=’True’,inferschema=’True’).load(“/home/…../weight-height.csv”)
#Convert the dataset to dense vector
def transData(data):
return r: [Vectors.dense(r[-2]),r[-1]]).toDF([‘features’,’label’])
transformed= transData(df)
featureIndexer = VectorIndexer(inputCol=”features”, \
# Split the data into training and test sets (20% held out for testing)
(trainingData, testData) = transformed.randomSplit([0.8, 0.2])
# Define LinearRegression algorithm
lr = LinearRegression()
pipeline = Pipeline(stages=[featureIndexer,lr])
model =
lrm = model.stages[-1]
#Print the coefficient and intercept(Summary)
print(“Coefficient : “+str(lrm.coefficients))
print(“Intercept : “+str(lrm.intercept))
#To predict the data
#Evaluate the RMSE and required the metrics
print(“The RMSE value is : “,rmse)
y_true =“label”).toPandas()
y_pred =“prediction”).toPandas()
r2_score = sklearn.metrics.r2_score(y_true, y_pred)
print(“R^2 score: {0}”.format(r2_score))

implement linear regression using Spark with Python