-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: avoid copy in _obj_with_exclusions, _selected_obj #51090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm; I think we can remove a case the logic in _selected_obj, but might be wrong.
if self._selection is not None: | ||
if is_hashable(self._selection): | ||
# i.e. a single key, so selecting it will return a Series. | ||
# In this case, _obj_with_exclusions would wrap the key | ||
# in a list and return a single-column DataFrame. | ||
return self.obj[self._selection] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think any time self.obj[self._selection]
results in a Series will be when self
is a SeriesGroupBy and self.obj
is a Series, which is handled above.
If the frame has duplicate columns, then we can windup in the case where self._selection
is hashable and self
is a DataFrameGroupBy, but in this case I think self.obj[self_selection]
will then agree with self._obj_with_exclusions
(both frames).
So I think we can still return obj_with_exclusions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That logic works for SeriesGroupBy/DataFrameGroupBy, but this code path is shared by Resampler, which defines _gotitem kludgily. Until we change that, test_resample_groupby_agg_listlike would fail if we return obj_with_exclusions here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we change that
(this is part of why i pinged you about the non-unique columns in agg_dict_like. If we can make sure to always pass subset
to _gotitem
, i think that makes it a lot easier to de-kludge Resample._gotitem)
Do you think the memory perf improvement is worth a note in the whatsnew? |
added |
Thanks @jbrockmendel |
i think the describe PR will need rebasing (might want to wait until after #51113 is merged) and it should be good to go |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.cc @rhshadrach