ENH: Use MaskedEngine for numeric pyarrow dtypes #51316

Merged (9 commits) on Feb 16, 2023

@phofl (Member) commented Feb 10, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Benchmark:

import pandas as pd

idx = pd.Index(list(range(1_000_000)), dtype="int64[pyarrow]")
idxer = list(range(1, 500_000, 10))

This PR:
%timeit idx.get_indexer(idxer)
4.85 ms ± 24.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

main:
%timeit idx.get_indexer(idxer)
16.9 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
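
For context, a masked engine works on a nullable array represented as a plain values buffer plus a boolean mask, so lookups can hash primitive values instead of going through object dtype. Below is a minimal pure-Python sketch of a masked `get_indexer` under that assumption. `masked_get_indexer` is a hypothetical helper, not pandas' actual `MaskedEngine` implementation, and it assumes a unique index, as `Index.get_indexer` requires.

```python
from typing import Dict, List, Optional, Sequence


def masked_get_indexer(
    values: Sequence[int],
    mask: Sequence[bool],
    targets: Sequence[Optional[int]],
) -> List[int]:
    """Hypothetical sketch: position of each target in a masked index, or -1.

    values[i] is ignored (a placeholder) wherever mask[i] is True (missing).
    None in `targets` stands for NA and matches the first masked position.
    """
    # Hash table over the non-missing values only.
    positions: Dict[int, int] = {}
    first_na = -1
    for i, (val, is_na) in enumerate(zip(values, mask)):
        if is_na:
            if first_na == -1:
                first_na = i
        elif val not in positions:
            positions[val] = i

    result: List[int] = []
    for t in targets:
        if t is None:
            # NA is looked up via the mask, never via the placeholder value.
            result.append(first_na)
        else:
            result.append(positions.get(t, -1))
    return result


# Position 1 is NA; its placeholder 0 must not match a real target of 0.
print(masked_get_indexer([10, 0, 30, 40], [False, True, False, False], [30, None, 0, 99]))
```

The key design point is that the placeholder stored under a masked position never participates in value matching, so a real `0` in the targets is not confused with a missing entry.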

cc @mroeschke Is this roughly what you had in mind?

@phofl marked this pull request as draft on February 10, 2023 23:23
@mroeschke (Member) commented:

Nice! Yeah this is what I had in mind.

@phofl marked this pull request as ready for review on February 13, 2023 20:00
@phofl (Member, Author) commented Feb 13, 2023

Added a whatsnew.

We should probably figure out a better way of converting to NumPy when na_value is specified and compatible with the associated NumPy dtype.
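
One reading of this note: when the requested na_value can be stored in the target NumPy dtype (for example NaN for a float dtype, or -1 for a signed integer dtype), missing positions could be filled and the data converted directly, without an object-dtype detour. The sketch below shows that compatibility check in pure Python, using NumPy dtype kind codes ("i", "u", "f", "b") as plain strings. Both helpers are hypothetical and are not pandas' actual conversion logic; bit-width and overflow checks are omitted.

```python
from typing import List, Sequence


def na_value_compatible(kind: str, na_value: object) -> bool:
    """Hypothetical check: can na_value live in a NumPy dtype of this kind?

    Bit-width / overflow checks are deliberately omitted in this sketch.
    """
    if isinstance(na_value, bool):
        # bool subclasses int in Python; only accept it for boolean dtypes.
        return kind == "b"
    if kind == "f":
        # Float dtypes can hold NaN as well as integer-valued fill values.
        return isinstance(na_value, (int, float))
    if kind == "i":
        return isinstance(na_value, int)
    if kind == "u":
        return isinstance(na_value, int) and na_value >= 0
    return False


def fill_masked(values: Sequence, mask: Sequence[bool], na_value: object) -> List:
    """Hypothetical helper: replace masked (missing) positions with na_value."""
    return [na_value if is_na else v for v, is_na in zip(values, mask)]


if na_value_compatible("i", -1):
    print(fill_masked([1, 0, 3], [False, True, False], -1))
```

When the check fails (for example NaN requested for an integer dtype), a direct conversion is not possible and the result would have to be upcast, e.g. to a float or object dtype.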

@phofl (Member, Author) commented Feb 16, 2023

@mroeschke do we want to get this in before the rc?

@mroeschke added the labels Indexing (related to indexing on series/frames, not to indexes themselves), Performance (memory or execution speed performance), and Arrow (pyarrow functionality) on Feb 16, 2023
@mroeschke added this to the 2.0 milestone on Feb 16, 2023
@phofl merged commit a131266 into pandas-dev:main on Feb 16, 2023
@phofl deleted the pyarrow_engine branch on February 16, 2023 23:59