ENH: Add smart_groupby() method for automatic grouping by categorical columns and aggregating numerics #61420

Open
@rit4rosa

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, pandas.DataFrame.groupby() requires users to explicitly specify both the grouping columns and the aggregation functions. This is repetitive during exploratory data analysis, especially on large DataFrames with many columns. A common use case such as “group by all categorical columns and compute the mean of the numeric columns” requires the user to select both sets of columns by hand first.

Feature Description

Add a new method to DataFrame called smart_groupby(), which intelligently infers grouping and aggregation behavior based on the column types of the DataFrame.

Proposed behavior:

  • If no parameters are passed:
    • Group by all columns of type object, category, or bool
    • Aggregate all remaining numeric columns using the mean
  • Optional keyword parameters:
    • by: specify grouping columns explicitly
    • agg: specify aggregation function(s) (default is "mean")
    • exclude: exclude specific columns from grouping or aggregation
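The proposed behavior can be approximated today with a standalone function. The following is a minimal sketch, not an existing pandas API: the function name and the by, agg, and exclude parameters come from the proposal above, while treating exclude as applying to both grouping and value columns, dropping non-numeric leftover columns, and passing observed=True are assumptions made for illustration.

```python
import pandas as pd


def smart_groupby(df, by=None, agg="mean", exclude=None):
    """Hypothetical helper (not part of pandas) sketching the proposed defaults."""
    excluded = set(exclude or [])
    if by is None:
        # Default grouping keys: object, category, and bool columns.
        by = df.select_dtypes(include=["object", "category", "bool"]).columns
    elif isinstance(by, str):
        by = [by]
    by = [col for col in by if col not in excluded]
    # Aggregate the numeric columns that are neither keys nor excluded.
    numeric = df.select_dtypes(include="number").columns
    values = [col for col in numeric if col not in by and col not in excluded]
    # Assumption: observed=True keeps only category combinations present in the data.
    return df.groupby(by, observed=True)[values].agg(agg)


df = pd.DataFrame(
    {
        "city": pd.Categorical(["Lisbon", "Lisbon", "Porto"]),
        "vip": [True, True, False],
        "sales": [10.0, 20.0, 5.0],
        "units": [1, 3, 2],
    }
)

# No arguments: group by city and vip, average sales and units.
result = smart_groupby(df)

# Explicit key and aggregation; the bool column vip is neither key nor numeric, so it is dropped.
totals = smart_groupby(df, by="city", agg="sum")
```

Note that this sketch raises an error when the DataFrame has no object, category, or bool columns, because groupby() needs at least one key; a real implementation would have to decide how to handle that case.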

Alternative Solutions

Currently, users must write verbose code to achieve the same result:

import pandas as pd
# Group by every object, category, or bool column and average the numeric columns.
group_cols = df.select_dtypes(include=["object", "category", "bool"]).columns.tolist()
agg_cols = df.select_dtypes(include="number").columns.tolist()
df.groupby(group_cols)[agg_cols].mean()

Additional Context

No response
