Skip to content

API: any/all logical operation for datetime-like dtypes #34479

Closed
@jorisvandenbossche

Description

@jorisvandenbossche

In my attempt to enable DataFrame reductions for ExtensionArrays (and at the same time cleaning up DataFrame._reduce a bit) at #32867, I am running into the following inconsistency.

Consider a Timedelta array, and how it behaves for the any and all logical functions (on current master, bit similar for released pandas):

>>> td = pd.timedelta_range(0, periods=3) 

# Index
>>> td.any() 
...
TypeError: cannot perform any with this index type: TimedeltaIndex

# Series
>>> pd.Series(td).any()  
...
TypeError: cannot perform any with type timedelta64[ns]

# DataFrame
>>> pd.DataFrame({'a': td}).any()
a    True
dtype: bool

(and exactly the same pattern for datetime64[ns])

For DataFrame, it works because there we call pandas.core.nanops.nanany(), and our internal version of nan-aware any/all work fine with timedelta64/datetime64 dtypes.
The reason it fails for Index and Series is because we simply up front say: this operation is not possible with this dtype (we never get to call nanany() in those cases). But in the DataFrame reduction, we simply try it on df.values, and it works.

This is clearly an inconsistency that we should solve. And especially for my PR #32867 this gives problems, as I want to rely on the array (column-wise) ops for DataFrame reductions, but thus those give a differen result leading to failing tests.


The question is thus: do we think that the any() and all() operations make sense for datetime-like data?
(and if yes, we can simply enable them for the array/index/series case)

  • For timedelta, I think you can say there is a concept of "zero" and "non-zero" (which is what any/all are testing in the end, for non-boolean, numeric data)
  • But for datetimes this might make less sense? The fact that 1970-01-01 00:00:00 would be "False" is just an implementation detail of the numbering starting at that point in time.

As reference, this operation fails in numpy for both timedelta and datetime:

>>> np.array([0, 1], dtype="datetime64[ns]").any()
...
TypeError: No loop matching the specified signature and casting was found for ufunc logical_or

>>> np.array([0, 1], dtype="timedelta64[ns]").any() 
...
TypeError: No loop matching the specified signature and casting was found for ufunc logical_or

But for timedeltas, that might rather be an issue of "never got implemented" rather than "deliberately not implemented" ? (there was a similar conversation about "mean" for timedelta) cc @shoyer

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions