ENH: Add support for multi-column quantiles of DataFrame

Is your feature request related to a problem?

For dataframes, Pandas currently only supports per-column quantiles; that is, given df[['c', 'a']].quantile(...), Pandas will compute the individual quantiles for columns c and a:

>>> df = pd.DataFrame({'a': [1, 0, 11, 12, 2], 'b': [1, 2, 3, 4, 5], 'c': [0, 1, 5, 2, 3]})
>>> df[['c', 'a']].quantile([0, 0.5, 1])
       c     a
0.0  0.0   0.0
0.5  2.0   2.0
1.0  5.0  12.0

It would be nice if Pandas also supported multi-column quantiles; that is, given df[['c', 'a']].quantiles(...), Pandas would compute the quantiles for the dataframe sorted by all columns. This is currently implemented by cuDF's dataframe:

In [11]: gdf = cudf.DataFrame({"a": [1, 1, 1, 1, 1], "b": [5, 4, 3, 2, 1]})
In [12]: gdf[["a", "b"]].quantiles([0, 0.5, 1])
Out[12]: 
     a  b
0.0  1  1
0.5  1  3
1.0  1  5

Describe the solution you'd like

I imagine the addition of multi-column quantiles support could happen in two ways:

the addition of a default kwarg to DataFrame.quantile to specify whether or not we want multi-column quantiles
the addition of a new method to compute multi-column quantiles independent from the logic of quantile

In either case, my preference here would be to have this functionality accessible via DataFrame.quantiles, to maintain consistency with cuDF.

API breaking implications

I can't think of any breakages this would cause, as long as any direct changes to quantile ensure that the original behavior is maintained by default.

Describe alternatives you've considered

This could be accomplished by sorting the dataframe by all columns and then indexing based on manually computed quantiles, but I imagine there's a more performant way to do this.

Additional context

If this functionality were added, along with a multi-columnar searchsorted, it would enable Dask dataframes to compute sort_values with multiple sort-by columns, using an algorithm roughly similar to that of dask-cudf.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Add support for multi-column quantiles of DataFrame #43881

Is your feature request related to a problem?

Describe the solution you'd like

API breaking implications

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ENH: Add support for multi-column quantiles of DataFrame #43881

Description

Is your feature request related to a problem?

Describe the solution you'd like

API breaking implications

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions