BUG: SeriesGroupBy.value_counts sorts when sort=False #50482

Closed
@rhshadrach

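In the second example below (SeriesGroupBy.value_counts), the gender level of the result is sorted in reverse lexicographic order (male, then female), even though sort=False is passed to both groupby and value_counts. The first example (DataFrameGroupBy.value_counts) is shown for comparison.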

from pandas import DataFrame

df = DataFrame(
    {
        "gender": ["male", "male", "female", "male", "female", "male"],
        "education": ["low", "medium", "high", "low", "high", "low"],
        "country": ["US", "FR", "US", "FR", "FR", "FR"],
    }
)
gb = df.groupby(["country", "gender"], as_index=True, sort=False)
result = gb.value_counts(sort=False)
print(result)
# country  gender  education
# US       male    low          1
# FR       male    medium       1
# US       female  high         1
# FR       male    low          2
#          female  high         1
# dtype: int64

gb2 = df.groupby(["country", "gender"], as_index=True, sort=False)["education"]
result2 = gb2.value_counts(sort=False)
print(result2)
# country  gender  education
# US       male    low          1
# FR       male    low          2
#                  medium       1
# US       female  high         1
# FR       female  high         1
# Name: education, dtype: int64
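
For reference, here is a minimal pure-Python sketch of counting in order of first appearance, which is the ordering the first output above follows. It is an illustration only, not pandas' implementation; the helper name `first_appearance_counts` and the tuple layout of `rows` are hypothetical.

```python
from collections import Counter

# Same rows as the DataFrame above, as (gender, education, country) tuples.
rows = [
    ("male", "low", "US"),
    ("male", "medium", "FR"),
    ("female", "high", "US"),
    ("male", "low", "FR"),
    ("female", "high", "FR"),
    ("male", "low", "FR"),
]


def first_appearance_counts(rows):
    """Count (country, gender, education) keys without sorting them.

    Counter is a dict subclass, so keys stay in insertion order
    (Python 3.7+), i.e. the order in which each key is first seen.
    """
    counts = Counter()
    for gender, education, country in rows:
        counts[(country, gender, education)] += 1
    return counts


counts = first_appearance_counts(rows)
for key, n in counts.items():
    print(key, n)
```

The keys come out in the same order as the rows of the first output (US male low, FR male medium, US female high, FR male low, FR female high), whereas the second output groups all male rows ahead of the female rows.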

Labels: Algos, Bug, Groupby