Research Breakthrough Possible @S-Logix pro@slogix.in


How to Do Descriptive Statistics Using Pandas GroupBy

Overview of Descriptive Statistics Using Pandas GroupBy

  • Description:
    In Pandas, the groupby() method is the standard tool for computing descriptive statistics (mean, median, count, and so on) for subsets of a DataFrame. It splits the rows into groups based on the values of one or more key columns (usually categorical), applies an operation to each group (aggregation, transformation, or filtration), and combines the per-group results into a new Series or DataFrame.
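The three kinds of group operation named above can be sketched as follows. This is a minimal example on a small hypothetical dataset (not the tutorial's sample data): mean() aggregates each group to a single row, transform("mean") broadcasts each group's mean back onto every original row, and filter() keeps only the rows whose group passes a test. The 60000 threshold is an arbitrary illustrative value.

```python
import pandas as pd

# Small hypothetical dataset used only for this illustration
df = pd.DataFrame({
    "Department": ["HR", "Finance", "HR", "IT", "Finance", "IT"],
    "Salary": [50000, 60000, 55000, 70000, 65000, 72000],
})

grouped = df.groupby("Department")

# Aggregation: one value per group, indexed by the group key
mean_salary = grouped["Salary"].mean()

# Transformation: same length as df, each row gets its group's mean
dept_mean = grouped["Salary"].transform("mean")

# Filtration: keep the original rows of groups whose mean salary exceeds 60000
# (the threshold is an arbitrary example value)
high_paying = grouped.filter(lambda g: g["Salary"].mean() > 60000)

print(mean_salary)
print(dept_mean)
print(high_paying)
```

Aggregation shrinks each group to one row, transformation preserves the original shape, and filtration returns a subset of the original rows with their original index.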
Step-by-Step Process
  • Import Pandas:
    Import the Pandas library for data manipulation.
  • Prepare Data:
    Create or load a dataset with columns for grouping and numeric values for calculations.
  • Group the Data:
    Use groupby() to group the DataFrame by one or more columns.
  • Apply Aggregation Functions:
    Apply functions such as mean(), sum(), count(), and std() to the grouped data, or pass several at once to agg().
  • Customize or Reset the Index:
    The group keys become the index of the result; call reset_index() if you need them back as ordinary columns in a standard DataFrame.
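The steps above can be sketched end to end as follows. This assumes a hypothetical second grouping column, Level, added only to illustrate grouping by more than one column; with two keys the aggregated result carries a MultiIndex until reset_index() turns the keys back into columns.

```python
import pandas as pd

# Step 2: prepare data ("Level" is a hypothetical column for illustration)
df = pd.DataFrame({
    "Department": ["HR", "Finance", "HR", "IT", "Finance", "IT", "HR"],
    "Level": ["Junior", "Senior", "Senior", "Senior", "Senior", "Senior", "Junior"],
    "Salary": [50000, 60000, 55000, 70000, 65000, 72000, 52000],
})

# Step 3: group by two columns; the result will be indexed by (Department, Level)
grouped = df.groupby(["Department", "Level"])

# Step 4: apply several aggregation functions to the Salary column
stats = grouped["Salary"].agg(["mean", "sum", "count"])

# Step 5: move the group keys from the MultiIndex into ordinary columns
flat = stats.reset_index()
print(flat)
```

Passing as_index=False to groupby() is an alternative that keeps the keys as columns from the start.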
Sample Source Code
```python
import pandas as pd

# Sample employee data: a categorical column to group by and numeric columns to summarize
data = {
    "Department": ["HR", "Finance", "HR", "IT", "Finance", "IT", "HR"],
    "Employee": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace"],
    "Salary": [50000, 60000, 55000, 70000, 65000, 72000, 52000],
    "Age": [25, 30, 28, 35, 40, 32, 29],
}
df = pd.DataFrame(data)

# Group the rows by 'Department' (this only defines the groups; nothing is computed yet)
grouped = df.groupby("Department")

# Compute several descriptive statistics of Salary for each department
summary = grouped["Salary"].agg(["mean", "median", "min", "max", "std"])

print("Group by 'Department' and calculate descriptive statistics")
print(summary)
print()

# Reuse the same groups and add the number of employees in each department
print("Group by 'Department' and include count of employees")
summary_with_count = grouped["Salary"].agg(["mean", "median", "count"])
print(summary_with_count)
```
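For a broader summary in fewer lines, the same sample data can also be passed through describe() and named aggregation. This sketch reuses the tutorial's dataset; the output column names (avg_salary, max_salary, avg_age, headcount) are illustrative choices, not names pandas requires.

```python
import pandas as pd

data = {
    "Department": ["HR", "Finance", "HR", "IT", "Finance", "IT", "HR"],
    "Employee": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace"],
    "Salary": [50000, 60000, 55000, 70000, 65000, 72000, 52000],
    "Age": [25, 30, 28, 35, 40, 32, 29],
}
df = pd.DataFrame(data)

# describe() on a grouped column returns count, mean, std, min,
# 25%, 50%, 75% and max for every group in a single call
salary_summary = df.groupby("Department")["Salary"].describe()

# Named aggregation: pick the output column names and summarize
# several input columns at once (names here are illustrative)
named = df.groupby("Department").agg(
    avg_salary=("Salary", "mean"),
    max_salary=("Salary", "max"),
    avg_age=("Age", "mean"),
    headcount=("Employee", "count"),
)

print(salary_summary)
print(named)
```

describe() is convenient for quick exploration, while agg() with named tuples gives explicit control over which statistics appear and how the resulting columns are labeled.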
Screenshots
  • Descriptive Statistics Output