BUG: Series(nullable).tolist() -> numpy scalars #49890

lukemanley · 2022-11-24T14:18:47Z

Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v2.0.0.rst file if fixing a bug or adding a new feature.

Series.tolist documentation states:

These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period)

Nullable dtype behavior seems to be inconsistent with the docstring and other dtypes:

main:

import pandas as pd

type(pd.Series([1], dtype="Int64").tolist()[0])             # numpy.int64
type(pd.Series([1], dtype="int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0])    # int

PR:

import pandas as pd

type(pd.Series([1], dtype="Int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0])    # int
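
The comparison above checks one element per dtype by hand. A small helper can report every element type found in a `tolist()` result, which makes the before/after difference easy to see across dtypes. This is only a sketch: `element_types` and `tolist_types_by_dtype` are hypothetical names, not pandas API, and the pyarrow dtype is left out so the example does not assume pyarrow is installed.

```python
def element_types(values):
    """Return the sorted, fully qualified type names found in ``values``."""
    return sorted({f"{type(v).__module__}.{type(v).__qualname__}" for v in values})


def tolist_types_by_dtype(data, dtypes=("Int64", "int64")):
    """Map each dtype name to the element types of ``Series(data).tolist()``.

    pandas is imported lazily so that ``element_types`` stays usable without it.
    """
    import pandas as pd

    return {dt: element_types(pd.Series(data, dtype=dt).tolist()) for dt in dtypes}
```

With this change applied, `tolist_types_by_dtype([1])` should report `builtins.int` for both dtypes; on main before the change, the `Int64` entry would be expected to show `numpy.int64` instead.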

The fix also provides an improvement in performance:

import pandas as pd
import numpy as np

ser = pd.Series(np.arange(10**6), dtype="Int64")

%timeit ser.tolist()

# 45.5 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- main
# 28.5 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- PR

phofl

Could you profile this for pyarrow as well and with NA?

jreback · 2022-11-24T14:36:07Z

hmm i seem to recall that we very specifically have python scalars in all of these list conversion methods (eg iter as well)

i would be surprised if this doesn't break many things

edit: i think i misread your post
you are making things consistent ok!

lukemanley · 2022-11-24T14:48:49Z

> Could you profile this for pyarrow as well and with NA?

pyarrow doesn't hit this method. It hits ExtensionArray.tolist. I have a separate PR in the works to make that faster.

Here is this PR profiled with NA:

import pandas as pd
import numpy as np

ser = pd.Series(np.arange(10**6), dtype="Int64")
ser.iloc[::2] = pd.NA

%timeit ser.tolist()

# 92.9 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- main
# 59.5 ms ± 4.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- PR
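
The timings in this thread use the IPython `%timeit` magic. To repeat the measurement from a plain script, something like the following could be used. It is a rough sketch: `best_ms` and `time_tolist` are hypothetical helpers, and the sketch reports the best run rather than the mean ± std. dev. that `%timeit` prints, so the numbers are not directly comparable to the ones quoted above.

```python
import timeit


def best_ms(func, repeat=5, number=3):
    """Best per-call time of ``func`` in milliseconds over ``repeat`` rounds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number * 1000.0


def time_tolist(n=10**6, with_na=False):
    """Time ``Series.tolist()`` on a nullable Int64 Series of length ``n``."""
    # Imported lazily so best_ms can be reused without numpy/pandas.
    import numpy as np
    import pandas as pd

    ser = pd.Series(np.arange(n), dtype="Int64")
    if with_na:
        ser.iloc[::2] = pd.NA  # same NA pattern as in the benchmark above
    return best_ms(ser.tolist)
```

Calling `time_tolist()` and `time_tolist(with_na=True)` on main and on the PR branch gives the two comparisons discussed here.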

lukemanley · 2022-11-24T14:50:17Z

> edit: i think i misread your post
> you are making things consistent ok!

Yes, sorry, my wording should have been clearer. This makes nullable dtypes return Python scalars, which is consistent with other dtypes.

phofl · 2022-11-24T14:55:55Z

Thx, missed the file name. good that casting to object is still faster
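
The remark about casting to object presumably refers to how a masked array with missing values is turned into a list. A rough model of that idea, using a plain NumPy data array plus a boolean mask, is sketched below. It is an illustration under stated assumptions, not the code from this PR: `masked_tolist` is a hypothetical function, and `None` stands in for `pd.NA` so the sketch only needs NumPy.

```python
import numpy as np


def masked_tolist(data, mask, na_value=None):
    """Convert ``data`` to a list of Python scalars, with ``na_value`` where ``mask`` is True."""
    if not mask.any():
        # No missing values: ndarray.tolist() already yields Python scalars.
        return data.tolist()
    # Missing values present: cast once to object, fill the masked slots,
    # then convert the whole array in a single call.
    out = data.astype(object)
    out[mask] = na_value
    return out.tolist()
```

Both branches convert the whole array in one call rather than element by element, which is one plausible reason the list ends up holding Python `int` objects instead of `numpy.int64`.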

phofl · 2022-11-24T14:56:28Z

Could you merge main? We got some deprecation warnings from numpy that are fixed on main

phofl · 2022-11-25T16:25:08Z

pandas/tests/series/methods/test_tolist.py

+            [1],
+            "int64[pyarrow]",
+            int,
+            marks=td.skip_if_no("pyarrow", min_version="1.0.0"),


I think the min version is 7 or something similar, please adjust accordingly

lukemanley · 2022-11-26

Updated to version 6, which is the min version tested, although this test should pass with any version. It looks like a handful of other tests specify a min_version below 6.0.

phofl · 2022-11-28

I think you can get rid of the min version then

lukemanley · 2022-11-28

ah, thanks. updated

mroeschke

LGTM merge when ready @phofl

phofl · 2022-11-29T20:51:48Z

thx @lukemanley

nullable tolist to return python scalars

0cc09c1

lukemanley added the Bug and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Nov 24, 2022

gh refs

5df5815

phofl reviewed Nov 24, 2022

lukemanley added 3 commits November 24, 2022 09:59

Merge remote-tracking branch 'upstream/main' into nullable-tolist

8692e2e

fix test

1fd54e7

fix test

2e99f2e

phofl reviewed Nov 25, 2022

lukemanley added 4 commits November 25, 2022 21:07

update min version

d0e4f54

Merge remote-tracking branch 'upstream/main' into nullable-tolist

e2ef416

remove min_version

e3105e2

formatting

af9095a

mroeschke approved these changes Nov 29, 2022

phofl approved these changes Nov 29, 2022

phofl added this to the 2.0 milestone Nov 29, 2022

phofl merged commit c872a8b into pandas-dev:main Nov 29, 2022

lukemanley deleted the nullable-tolist branch December 20, 2022 00:46
