
BUG: dataframe.groupby('some_column').timedelta.sum() returns a wrong result when the timedelta column contains NaT (pandas 1.3.0) #42659

Closed
@minientertainment

Description

  • [y] I have checked that this issue has not already been reported.

  • [y] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

This is a minimal example:

import numpy as np
import pandas as pd

tmp = pd.DataFrame({'id': ['1624477460271-3908654213', '1624477460271-3908654213'], 'dt': [np.nan, pd.to_timedelta('0:0:2')]})
tmp
Out[6]: 
                         id              dt
0  1624477460271-3908654213             NaT
1  1624477460271-3908654213 0 days 00:00:02

Problem description

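Calling dataframe.groupby('some_column').timedelta.sum() returns a wrong result when the timedelta column contains NaT. The NaT should be skipped, but the grouped sum comes out as a large negative value: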

tmp.groupby('id').dt.sum()
Out[7]: 
id
1624477460271-3908654213   -106752 days +00:12:45.145224192
Name: dt, dtype: timedelta64[ns]
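
The reported value looks arbitrary, but it matches a simple calculation. pandas represents NaT internally as the smallest int64 value, counted in nanoseconds. The sketch below adds 2 seconds to that sentinel and splits the total the way the timedelta repr does (days floored, remainder positive). The helper name split_timedelta_ns is hypothetical and used only for illustration. The match suggests, but does not prove, that the grouped sum included the NaT sentinel instead of skipping it.

```python
# pandas stores NaT as the minimum int64 value (in nanoseconds).
NAT_SENTINEL_NS = -(2**63)
TWO_SECONDS_NS = 2 * 10**9
NS_PER_DAY = 86_400 * 10**9


def split_timedelta_ns(total_ns):
    """Split a nanosecond count into (days, hours, minutes, seconds, nanoseconds).

    Days are floored, so the remaining fields are always non-negative,
    which mirrors how a negative timedelta is displayed.
    """
    days, rem = divmod(total_ns, NS_PER_DAY)
    hours, rem = divmod(rem, 3_600 * 10**9)
    minutes, rem = divmod(rem, 60 * 10**9)
    seconds, nanos = divmod(rem, 10**9)
    return days, hours, minutes, seconds, nanos


# Sum the NaT sentinel and the 2-second value as plain integers.
naive_sum = NAT_SENTINEL_NS + TWO_SECONDS_NS
print(split_timedelta_ns(naive_sum))
# (-106752, 0, 12, 45, 145224192), i.e. -106752 days +00:12:45.145224192
```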

Expected Output

The grouped sum should skip NaT and give the same 2 seconds as the ungrouped sum:

tmp.dt.sum()
Out[8]: Timedelta('0 days 00:00:02')
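
Until a fixed pandas version is installed, one possible workaround is to replace NaT with a zero duration before grouping, so that no missing value reaches the grouped sum. This is a sketch, not something proposed in the issue. It assumes that treating NaT as zero is acceptable for a sum, and the names filled and result are illustrative.

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame({
    "id": ["1624477460271-3908654213", "1624477460271-3908654213"],
    "dt": [np.nan, pd.to_timedelta("0:0:2")],
})

# Replace NaT with a zero-length Timedelta; for a sum this is equivalent
# to skipping the missing value.
filled = tmp["dt"].fillna(pd.Timedelta(0))

# Group the filled column by the id column and sum per group.
result = filled.groupby(tmp["id"]).sum()
print(result)
```

Filling with zero only makes sense for sums. For reductions such as mean or min it would change the result, so the missing rows would have to be dropped instead.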

Output of pd.show_versions()

[pandas 1.3.0]


Labels

  • Bug
  • Groupby
  • Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
  • Reduction Operations (sum, mean, min, max, etc.)
  • Regression (functionality that used to work in a prior pandas version)
  • Timedelta (Timedelta data type)
