Research breakthrough possible @S-Logix

How to check whether residuals are normally distributed using Python?


To check whether the residuals of a fitted linear regression model follow a normal distribution, using Python.


  Import the libraries.

  Read the sample data.

  Fit a multiple linear regression model and take the model summary.

  Check whether the residuals follow a normal distribution.

  Plot the residuals.

Sample Code

#import libraries
import scipy.stats as stats
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

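#read the data set (the file name is a placeholder: point it at your own CSV
#with 'rating', 'bonus' and 'salary' columns)
data = pd.read_csv('salary_data.csv')

#creating data frame
df = pd.DataFrame(data)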

#assigning the independent variables
X = df[['rating', 'bonus']]

#assigning the dependent variable
Y = df['salary']

#add an intercept (constant) column for the multiple linear regression
X = sm.add_constant(X)

#fit the variables into the linear model
model = sm.OLS(Y, X, hasconst=True).fit()

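#print the intercept, the regression coefficients and the residual
#normality statistics (Omnibus, Jarque-Bera) included in the summary
print_model = model.summary()
print(print_model)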

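#residuals visualization: histogram with a fitted normal curve on top
mu = np.mean(model.resid)
sigma = np.std(model.resid)
pdf = stats.norm.pdf(sorted(model.resid), mu, sigma)
plt.hist(model.resid, bins=50, density=True)  #density=True replaces the removed 'normed' argument
plt.plot(sorted(model.resid), pdf, color='r', linewidth=2)
plt.title('Residuals with fitted normal curve')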

#qq plot against a t distribution and a normal distribution
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(10, 3))
sm.qqplot(model.resid, stats.t, fit=True, line='45', ax=ax1)
ax1.set_title('t distribution')
sm.qqplot(model.resid, stats.norm, fit=True, line='45', ax=ax2)
ax2.set_title('normal distribution')
plt.show()
