Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently, pandas.DataFrame.groupby() requires users to specify both the grouping columns and the aggregation functions explicitly. This is repetitive during exploratory data analysis on wide DataFrames with many columns. A common use case like “group by all categorical columns and compute the mean of the numeric columns” takes several lines of manual column selection.
Feature Description
Add a new DataFrame method, smart_groupby(), that infers the grouping columns and the aggregation from the column dtypes of the DataFrame.
Proposed behavior:
- If no parameters are passed:
  - Group by all columns of type object, category, or bool
  - Aggregate all remaining numeric columns using the mean
- Optional keyword parameters:
  - by: specify grouping columns explicitly
  - agg: specify aggregation function(s) (default is "mean")
  - exclude: exclude specific columns from grouping or aggregation
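The proposed behavior could be sketched roughly as the standalone function below. This is only an illustration of the idea, not an existing pandas API: the function name, its signature, the use of observed=True, and the error raised when no grouping column is found are all assumptions made for the sketch.

```python
import pandas as pd


def smart_groupby(df, by=None, agg="mean", exclude=None):
    """Hypothetical sketch of the proposed DataFrame.smart_groupby()."""
    # Columns listed in `exclude` take no part in dtype inference.
    excluded = set(exclude or [])
    candidates = df[[c for c in df.columns if c not in excluded]]

    if by is None:
        # Infer grouping columns: object, category, or bool dtypes.
        by = candidates.select_dtypes(
            include=["object", "category", "bool"]
        ).columns.tolist()
    elif isinstance(by, str):
        by = [by]
    else:
        by = list(by)

    if not by:
        # Assumption: the proposal does not say what happens in this case.
        raise ValueError("no grouping columns found; pass `by` explicitly")

    # Aggregate the numeric columns that are not grouping keys.
    # include="number" leaves bool columns out.
    numeric = candidates.select_dtypes(include="number").columns
    value_cols = [c for c in numeric if c not in by]

    # Assumption: observed=True drops unobserved category combinations.
    return df.groupby(by, observed=True)[value_cols].agg(agg)


# Example data (made up for illustration).
df = pd.DataFrame(
    {
        "region": pd.Categorical(["east", "east", "west"]),
        "active": [True, False, True],
        "sales": [10.0, 20.0, 30.0],
        "units": [1, 2, 3],
    }
)

inferred = smart_groupby(df)                       # groups by region and active
by_region = smart_groupby(df, by="region", agg="sum")
no_flag = smart_groupby(df, exclude=["active"])    # groups by region only
```

With no arguments the sketch groups by region and active and averages sales and units; passing by, agg, or exclude overrides the corresponding inferred piece.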
Alternative Solutions
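Currently, users must write verbose code to accomplish the same thing (df is an existing DataFrame):
import pandas as pd
# Group by every object, category, or bool column
group_cols = df.select_dtypes(include=["object", "category", "bool"]).columns.tolist()
# Aggregate the numeric columns; "number" skips bool, which is_numeric_dtype would count as numeric
agg_cols = df.select_dtypes(include="number").columns.tolist()
df.groupby(group_cols)[agg_cols].mean()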
Additional Context
No response