How to Check Normality of Data Using Shapiro-Wilk Test in Python
Condition for Checking Normality of Data Using Shapiro-Wilk Test in Python
Description:
The Shapiro-Wilk test is a statistical test of whether a sample comes from a normally distributed population. It is commonly used to check the normality assumption behind parametric methods such as t-tests and ANOVA. The null hypothesis (H0) is that the data are normally distributed; the alternative hypothesis (H1) is that they are not. The test statistic W lies between 0 and 1, and values close to 1 indicate that the data are consistent with a normal distribution.
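To see the two hypotheses in action, the sketch below runs the test on two synthetic samples (generated with NumPy for illustration only, not taken from the salary example): one drawn from a normal distribution and one from a strongly skewed exponential distribution. The exponential sample should produce a p-value far below 0.05. The normal sample will usually, but not always, give a p-value above 0.05, because at α = 0.05 about 5% of truly normal samples are rejected by chance.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only
rng = np.random.default_rng(seed=42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)   # consistent with H0
skewed_sample = rng.exponential(scale=1.0, size=200)       # clearly violates H0

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    w, p = stats.shapiro(sample)
    print(f"{name:>11}: W = {w:.3f}, p = {p:.3g}")
```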
Step-by-Step Process
Install Required Libraries:
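You need `scipy`, which provides the Shapiro-Wilk test, and `pandas` to hold the data in a DataFrame. Both can be installed with `pip install scipy pandas`.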
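A quick way to confirm the installation is to import each package and print its version. This is a minimal sketch; `installed_version` is a hypothetical helper written for this example, not part of either library.

```python
import importlib

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")

for name in ("scipy", "pandas"):
    version = installed_version(name)
    if version is None:
        print(f"{name} is not installed; run: pip install {name}")
    else:
        print(f"{name} {version} is installed")
```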
Prepare the Data:
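Use any one-dimensional numerical dataset, such as the Salary column of a DataFrame in the example. The test needs at least three observations, and missing values should be removed first.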
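Real columns often contain missing or malformed entries. The sketch below uses a hypothetical helper, `prepare_sample`, to coerce values to numbers, drop anything that cannot be converted, and check the three-observation minimum. The salary values are made up for illustration.

```python
import pandas as pd

def prepare_sample(values):
    """Convert values to numbers, drop missing/non-numeric entries,
    and check that enough observations remain for the Shapiro-Wilk test."""
    numeric = pd.to_numeric(pd.Series(values), errors="coerce")  # non-numeric -> NaN
    cleaned = numeric.dropna()
    if len(cleaned) < 3:
        # scipy.stats.shapiro needs at least 3 observations
        raise ValueError("Need at least 3 numeric observations")
    return cleaned

# Hypothetical salary values, including one missing and one malformed entry
raw = ["50000", "62000", None, "n/a", "58000", 71000]
sample = prepare_sample(raw)
print(sample.tolist())  # [50000.0, 62000.0, 58000.0, 71000.0]
```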
Use `scipy.stats.shapiro()`:
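This function returns the test statistic W and the corresponding p-value. For samples larger than 5000 observations, SciPy notes that W is still accurate but the p-value may not be.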
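The call itself is a single line. In this sketch the input is a synthetic normal sample (an assumption for illustration), and the result is unpacked as a pair; recent SciPy releases also expose the two values as `.statistic` and `.pvalue` on the returned object.

```python
import numpy as np
from scipy import stats

# Synthetic sample for illustration only: 100 draws from a normal distribution
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=50_000, scale=8_000, size=100)

# shapiro() returns the W statistic and the p-value; tuple unpacking works
w_statistic, p_value = stats.shapiro(sample)

# W lies between 0 and 1; values close to 1 indicate agreement with normality
print(f"W = {w_statistic:.4f}, p-value = {p_value:.4f}")
```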
Interpret the Results:
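Compare the p-value with a chosen significance level α (commonly 0.05). If the p-value is less than or equal to α, reject H0 and conclude that the data are unlikely to be normally distributed. If it is greater than α, fail to reject H0; this means there is no significant evidence against normality, not that normality has been proven.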
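The decision rule can be captured in a small function so it is applied consistently. `shapiro_decision` is a hypothetical name chosen for this sketch; it uses the same convention as the example code (p-value > α means fail to reject).

```python
def shapiro_decision(p_value, alpha=0.05):
    """Return the Shapiro-Wilk decision for a given p-value and significance level."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0: data are unlikely to be normally distributed"
    return "fail to reject H0: no significant evidence against normality"

print(shapiro_decision(0.003))             # reject H0
print(shapiro_decision(0.27))              # fail to reject H0
print(shapiro_decision(0.03, alpha=0.01))  # fail to reject H0 at the stricter 1% level
```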
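# Import the required libraries
import pandas as pd  # used to load df (not shown)
from scipy import stats

# df is assumed to be a DataFrame, loaded earlier, with a numeric 'Salary' column
# Extract the Salary column, dropping missing values so they do not distort the test
salary_data = df['Salary'].dropna()
# Perform the Shapiro-Wilk test for normality
statistic, p_value = stats.shapiro(salary_data)
print(df)
print(f"Shapiro-Wilk Test Statistic: {statistic}")
print(f"P-Value: {p_value}")
# Interpret the result at the 5% significance level
alpha = 0.05
if p_value > alpha:
    print(f"Fail to reject H0 (p-value > {alpha}): no significant evidence that the salary data deviate from normality.")
else:
    print(f"Reject H0 (p-value <= {alpha}): the salary data are likely not normally distributed.")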
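When a DataFrame has several numeric columns, the same steps can be repeated for each one. The sketch below uses a hypothetical helper, `normality_report`, and a synthetic DataFrame (the `Bonus` and `Dept` columns are invented for illustration) to test every numeric column and collect W, the p-value, and the decision at α in one table.

```python
import numpy as np
import pandas as pd
from scipy import stats

def normality_report(df, alpha=0.05):
    """Run the Shapiro-Wilk test on every numeric column of df."""
    rows = []
    for column in df.select_dtypes(include="number").columns:
        values = df[column].dropna()
        if len(values) < 3:
            continue  # too few observations for the test
        w, p = stats.shapiro(values)
        rows.append({"column": column, "W": w, "p_value": p, "reject_H0": p <= alpha})
    return pd.DataFrame(rows)

# Synthetic data for illustration only
rng = np.random.default_rng(seed=1)
demo = pd.DataFrame({
    "Salary": rng.normal(60_000, 10_000, size=150),
    "Bonus": rng.exponential(5_000, size=150),
    "Dept": ["A", "B", "C"] * 50,  # non-numeric, so it is skipped
})
print(normality_report(demo))
```

Non-numeric columns are skipped by `select_dtypes`, and each column drops its own missing values, so one sparse column does not shrink the sample used for the others.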