Series[dt64].astype(int64) vs Series[Sparse[dt64]].astype(int64) #49631

Open
@jbrockmendel

Description

dti = pd.date_range("2016-01-01", periods=3)
ser = pd.Series(dti)
ser[0] = pd.NaT

dense = ser._values
sparse = pd.core.arrays.SparseArray(ser.values)

>>> dense.astype("int64")
array([-9223372036854775808,  1451692800000000000,  1451779200000000000])

>>> sparse.astype("int64")
[...]
ValueError: Cannot convert NaT values to integer

>>> sparse.astype("Sparse[int64]")
[0, 1451692800000000000, 1451779200000000000]
Fill: 0
IntIndex
Indices: array([1, 2], dtype=int32)

The dense version goes through DatetimeArray.astype, where casting to int64 is essentially a view (xref #45034), so NaT comes out as the minimum int64. The sparse version goes through astype_nansafe, which explicitly checks for NaTs when casting dt64 -> int64 and raises if it finds any. I expected the sparse behavior to match the dense behavior.
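
To make the two code paths concrete, here is a minimal NumPy-only sketch of the difference between a view-style cast and a NaT-checking cast. It does not call pandas internals; `checked_to_int64` is a hypothetical helper written for illustration and only approximates what a NaT-checking cast does.

```python
import numpy as np

# Same values as the example above: NaT, 2016-01-02, 2016-01-03.
values = np.array(["NaT", "2016-01-02", "2016-01-03"], dtype="datetime64[ns]")

# View-style cast: reinterpret the underlying buffer as int64.
# This never raises; NaT is stored as the minimum int64 value.
as_view = values.view("int64")


def checked_to_int64(arr):
    """Hypothetical helper (not a pandas function): refuse to cast if NaT is present."""
    if np.isnat(arr).any():
        raise ValueError("Cannot convert NaT values to integer")
    return arr.view("int64")


try:
    checked_to_int64(values)
    check_raised = False
except ValueError:
    check_raised = True
```

Under these assumptions, `as_view[0]` equals `np.iinfo(np.int64).min`, which is the sentinel seen in the dense output, while the checked variant raises on the same input.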

When converting to Sparse[int64], astype_nansafe is only called on the non-NaT elements, so it doesn't raise. The fill_value, however, starts out as NaT and somehow ends up as 0 after conversion, whereas I expected that conversion to raise.
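
The split described above can be sketched with plain NumPy: the stored (non-NaT) values are cast on their own, and the fill value is converted in a separate step. This is an illustration under assumptions, not the SparseArray implementation; `convert_fill_value_strict` is a hypothetical helper showing the behavior I expected (raise on a NaT fill value).

```python
import numpy as np

values = np.array(["NaT", "2016-01-02", "2016-01-03"], dtype="datetime64[ns]")

# Positions and values that a sparse layout with fill value NaT would store.
mask = ~np.isnat(values)
sp_index = np.flatnonzero(mask).astype("int32")
sp_values = values[mask].view("int64")  # no NaT left here, so a checked cast would pass too


def convert_fill_value_strict(fill_value):
    """Hypothetical helper: convert a datetime64 fill value, raising on NaT."""
    if np.isnat(fill_value):
        raise ValueError("Cannot convert NaT fill_value to integer")
    return int(fill_value.astype("int64"))


try:
    convert_fill_value_strict(np.datetime64("NaT", "ns"))
    fill_raised = False
except ValueError:
    fill_raised = True
```

Here `sp_index` holds positions 1 and 2, matching the `Indices` line in the output above, and the strict fill-value conversion raises instead of silently producing 0.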

Side notes:
Both ser.astype(pd.SparseDtype(ser.dtype)) and dense.astype("Sparse[int64]") raise.
