Research Breakthrough Possible @S-Logix pro@slogix.in

How to Check Normality of Data Using Shapiro-Wilk Test in Python

Shapiro-Wilk Test for Normality in Python

Condition for Checking Normality of Data Using Shapiro-Wilk Test in Python

  • Description:
    The Shapiro-Wilk test is a statistical test of whether a sample of numerical data comes from a normally distributed population. Its null hypothesis (H0) is that the data are normally distributed; its alternative hypothesis (H1) is that they are not. A small p-value is therefore evidence against normality, while a large p-value only means the test found no evidence against it.
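To make H0 and H1 concrete, the sketch below runs `scipy.stats.shapiro` on two deterministic samples. As an assumption made purely so the example involves no randomness, each sample is built from the quantiles of a distribution: one is shaped like a normal distribution, the other like a right-skewed exponential distribution. The normal-shaped sample should give a large p-value (no evidence against H0), and the skewed one a very small p-value (H0 rejected).

```python
import numpy as np
from scipy import stats

n = 100
# Evenly spaced probabilities strictly inside (0, 1)
probs = (np.arange(1, n + 1) - 0.5) / n

# Deterministic sample shaped like a normal distribution (its quantiles)
normal_like = stats.norm.ppf(probs)
# Deterministic sample shaped like a right-skewed exponential distribution
skewed = stats.expon.ppf(probs)

for name, sample in [("normal-like", normal_like), ("exponential", skewed)]:
    statistic, p_value = stats.shapiro(sample)
    print(f"{name}: W = {statistic:.4f}, p = {p_value:.4g}")
```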
Step-by-Step Process
  • Install Required Libraries:
    The example uses `scipy` for the Shapiro-Wilk test and `pandas` to hold the data; both can be installed with `pip install scipy pandas`.
  • Prepare the Data:
    Any one-dimensional numerical sample works; the example uses the salaries of ten employees. The test requires at least three observations.
  • Use `scipy.stats.shapiro()`:
    This function returns the test statistic W (between 0 and 1, with values close to 1 indicating data closer to normal) and the corresponding p-value.
  • Interpret the Results:
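    Compare the p-value with a significance level α chosen in advance (commonly 0.05). If p ≤ α, reject H0 and conclude that the data are not normally distributed; if p > α, fail to reject H0, meaning the data are consistent with a normal distribution.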
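The four steps above can be collected into one small function. This is a minimal sketch, not part of the original tutorial: the name `check_normality`, its return tuple, and the choice to drop missing values are all assumptions made for illustration. It uses the decision rule "reject H0 when p ≤ α".

```python
import numpy as np
from scipy import stats


def check_normality(values, alpha=0.05):
    """Hypothetical helper: run Shapiro-Wilk and apply the p-value rule.

    Returns a tuple (statistic, p_value, reject_h0).
    """
    # Step 2: prepare the data as a 1-D float array, dropping missing values
    sample = np.asarray(values, dtype=float).ravel()
    sample = sample[~np.isnan(sample)]
    if sample.size < 3:
        # The Shapiro-Wilk test needs at least 3 observations
        raise ValueError("Shapiro-Wilk needs at least 3 observations")

    # Step 3: compute the W statistic and its p-value
    statistic, p_value = stats.shapiro(sample)

    # Step 4: reject H0 (normality) when the p-value is at or below alpha
    reject_h0 = p_value <= alpha
    return float(statistic), float(p_value), bool(reject_h0)


salaries = [55000, 58000, 60000, 65000, 62000, 67000, 71000, 69000, 73000, 75000]
w, p, reject = check_normality(salaries)
print(f"W = {w:.4f}, p = {p:.4f}, reject H0: {reject}")
```

Returning the decision as a boolean alongside the raw numbers keeps the choice of α explicit at the call site, so the same function can be reused with a stricter level such as `alpha=0.01`.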
Sample Source Code
    import pandas as pd
    from scipy import stats

    # Sample data: employee IDs and their salaries
    data = {
        'Employee ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Salary': [55000, 58000, 60000, 65000, 62000, 67000, 71000, 69000, 73000, 75000]
    }

    df = pd.DataFrame(data)

    # Extract the Salary column
    salary_data = df['Salary']

    # Perform the Shapiro-Wilk test for normality
    statistic, p_value = stats.shapiro(salary_data)

    print(df)
    print(f"Shapiro-Wilk Test Statistic: {statistic}")
    print(f"P-Value: {p_value}")

    # Interpret the result at the 5% significance level
    alpha = 0.05
    if p_value > alpha:
        print(f"The salary data is consistent with a normal distribution (fail to reject H0), because p-value > {alpha}.")
    else:
        print(f"The salary data is likely not normally distributed (reject H0), because p-value <= {alpha}.")
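The same test can be applied to several numeric columns of a DataFrame. The sketch below uses a hypothetical helper, `shapiro_by_column`, that tests each requested column separately and gathers the results into a summary table. The caller chooses the columns explicitly, since an identifier column such as `Employee ID` is not a meaningful quantity to test for normality. The demo DataFrame is built from distribution quantiles (an assumption, so the output is deterministic).

```python
import numpy as np
import pandas as pd
from scipy import stats


def shapiro_by_column(df, columns, alpha=0.05):
    """Hypothetical helper: apply Shapiro-Wilk to each listed column of df."""
    rows = []
    for col in columns:
        # Drop missing values so they do not propagate into the test
        values = df[col].dropna().to_numpy(dtype=float)
        statistic, p_value = stats.shapiro(values)
        rows.append({
            "column": col,
            "W": float(statistic),
            "p_value": float(p_value),
            "reject_H0": bool(p_value <= alpha),
        })
    return pd.DataFrame(rows).set_index("column")


probs = (np.arange(1, 51) - 0.5) / 50
demo = pd.DataFrame({
    "Employee ID": np.arange(1, 51),
    "normal_like": stats.norm.ppf(probs),  # normal-shaped column
    "skewed": stats.expon.ppf(probs),      # right-skewed column
})
print(shapiro_by_column(demo, ["normal_like", "skewed"]))
```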
Screenshots
  • Shapiro-Wilk Test Output