
BUG: MultiIndex.unique losing ea dtype #48335


Merged: phofl merged 5 commits into pandas-dev:main from the perf_mi_unique branch on Sep 1, 2022


phofl
Member

@phofl phofl commented Aug 31, 2022

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

The MultiIndex.unique code path casts the values to tuples and runs them through unique1d. This is really slow.
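The slow path described above builds one Python tuple per row before deduplicating. A common way to avoid that is to deduplicate on each level's integer codes instead. The sketch below illustrates that idea under stated assumptions and is not the code from this PR: `unique_rows_from_codes` is a hypothetical helper, missing values are coded as -1 (as MultiIndex codes are), and the product of `(size + 1)` over all levels is assumed to fit in int64.

```python
import numpy as np


def unique_rows_from_codes(codes_per_level, level_sizes):
    """Hypothetical helper: positions of the first occurrence of each distinct row.

    codes_per_level: one integer array per level, values in [-1, size), where -1
    marks a missing value. level_sizes: number of distinct values per level.
    Assumes the product of (size + 1) over all levels fits in int64.
    """
    n_rows = len(codes_per_level[0])
    key = np.zeros(n_rows, dtype=np.int64)
    for codes, size in zip(codes_per_level, level_sizes):
        # Mixed-radix encoding: shift codes by 1 so that -1 (missing) maps to 0,
        # then fold this level into the running per-row key.
        key = key * (size + 1) + (np.asarray(codes, dtype=np.int64) + 1)
    # np.unique sorts the keys; return_index gives the first position of each key.
    _, first_positions = np.unique(key, return_index=True)
    # Re-sort the positions so rows come back in order of first appearance,
    # which is the order Index.unique preserves.
    return np.sort(first_positions)


# Example: level "a" has two values plus a missing entry, level "b" has two values.
codes_a = np.array([0, 1, -1, 0])
codes_b = np.array([0, 0, 1, 0])
positions = unique_rows_from_codes([codes_a, codes_b], [2, 2])
print(positions)  # [0 1 2]: row 3 repeats row 0
```

Because only integer codes are compared, no per-row Python objects are created, and taking the resulting positions from the original level arrays keeps their dtype (for example, `Int64`).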

       before           after         ratio
     [e7414aa1]       [d17eb141]
     <48184^2>        <perf_mi_unique>
-         422±2ms       30.3±0.5ms     0.07  multiindex_object.Unique.time_unique_dups(('Int64', <NA>))
-        496±70ms         28.4±1ms     0.06  multiindex_object.Unique.time_unique_dups(('int64', 0))
-         811±4ms       32.9±0.2ms     0.04  multiindex_object.Unique.time_unique(('int64', 0))
-        812±20ms       32.8±0.9ms     0.04  multiindex_object.Unique.time_unique(('Int64', <NA>))
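The ratios above come from asv. For a quick local check outside asv, one can time MultiIndex.unique directly. This is a rough sketch: `build_midx` and `time_unique` are hypothetical helpers, and the row count and data layout are assumptions rather than the parameters of the `multiindex_object.Unique` benchmarks. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def build_midx(dtype, n=100_000, with_dups=True):
    """Build a two-level MultiIndex with n distinct rows (hypothetical data layout)."""
    values = np.arange(n)
    if with_dups:
        # Repeat every row once so unique() has duplicates to drop.
        values = np.concatenate([values, values])
    first_level = pd.Series(values, dtype=dtype)
    # The first level alone already determines the row, so there are n distinct rows.
    second_level = pd.Series(values % 7, dtype=dtype)
    return pd.MultiIndex.from_arrays([first_level, second_level], names=["a", "b"])


def time_unique(dtype, n=100_000, repeat=5):
    """Best-of-`repeat` wall time, in seconds, of a single MultiIndex.unique call."""
    midx = build_midx(dtype, n=n)
    return min(timeit.repeat(midx.unique, number=1, repeat=repeat))


for dtype in ["int64", "Int64"]:
    print(dtype, f"{time_unique(dtype):.4f}s")
```

Taking the minimum over several repeats reduces noise from other processes, which is also why asv reports a spread (the `±` values) alongside each timing.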
@phofl phofl changed the title BUG: MultiIndex.unique losing ea type Aug 31, 2022
midx = MultiIndex.from_arrays([a, b], names=["a", "b"])
result = midx.unique()

a = Series([1, 2, NA], dtype="Int64")
Member


nitpick, can you avoid 1-letter variable names

Member Author


Yep, thx. Stupid copy paste miss
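A self-contained version of the kind of regression check excerpted in the diff above, using descriptive names as the reviewer asked. The data values and names here are illustrative assumptions, not the test added in this PR. On a pandas version that includes this fix (the PR targets the 2.0 milestone), each level should keep its nullable `Int64` dtype after `unique()`.

```python
import pandas as pd

# Illustrative data: two nullable Int64 levels, one missing value, one duplicated row.
first_level = pd.Series([1, 2, pd.NA, 1], dtype="Int64")
second_level = pd.Series([1, 1, 2, 1], dtype="Int64")

midx = pd.MultiIndex.from_arrays([first_level, second_level], names=["a", "b"])
result = midx.unique()

# The duplicated (1, 1) row is dropped, so three rows remain.
assert len(result) == 3
# Each level should keep its extension-array dtype instead of falling back to int64/object.
assert result.get_level_values("a").dtype == "Int64"
assert result.get_level_values("b").dtype == "Int64"
```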

@phofl
Member Author

phofl commented Sep 1, 2022

cc @mroeschke for a second set of eyes

@mroeschke mroeschke added this to the 1.6 milestone Sep 1, 2022
Member

@mroeschke mroeschke left a comment


LGTM

@phofl phofl merged commit ee1c4c1 into pandas-dev:main Sep 1, 2022
@phofl phofl deleted the perf_mi_unique branch September 1, 2022 20:35
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
3 participants