ENH/API: resolution inference in vectorized datetime parsing #55564

Closed
@jbrockmendel

I'm in the process of implementing resolution inference for vectorized datetime parsing in array_to_datetime. This issue is to track and discuss design issues.

  1. Should we implement this as a breaking change in 3.0 or bugfix in 2.x? We changed the scalar behavior in 2.0.
  2. What to do when mixed resolutions are detected? e.g. pd.to_datetime(["2016-01-01", "2016-01-01T02:03:04.050607"])? ATM in the branch I have going I get the resolution from the first non-NaT entry and apply that everywhere.
    i) Are we OK with that value-dependent behavior?
    ii) What if we see a np.datetime64("nat", "s"), i.e. a NaT with a reso attached? Should we infer "s" from that?
  3. What resolution to infer for "today" or "now"? ATM the Timestamp constructor gives "ns", but bc it goes through the stdlib datetime.now, I'm thinking "us" might be more appropriate. (I also don't care that much.)
  4. ATM I only handle array_to_datetime; array_strptime also needs handling.
  5. Need to do the same for the timedelta64 paths.
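
To make the value-dependence in question 2 concrete, here is a rough pure-Python sketch. It is not the code in array_to_datetime. The helper names (infer_unit, unit_from_first, unit_from_finest) are hypothetical, and it assumes ISO 8601 strings whose unit is decided only by the number of fractional-second digits as written. It compares the "first non-NaT entry wins" policy described above with an alternative "finest resolution wins" policy, and notes the stdlib resolution behind question 3.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Units ordered from coarsest to finest (the sub-minute datetime64 units).
UNITS = ("s", "ms", "us", "ns")

# Fractional-seconds part of an ISO 8601 string, e.g. ".050607".
_FRACTION = re.compile(r"\.(\d+)")


def infer_unit(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: coarsest unit that keeps every written digit.

    Returns None for missing values (None, "", "NaT"), which carry no
    resolution information in this sketch.
    """
    if value is None or value.strip().lower() in ("", "nat"):
        return None
    match = _FRACTION.search(value)
    ndigits = len(match.group(1)) if match else 0
    # 0 digits -> "s", 1-3 -> "ms", 4-6 -> "us", 7 or more -> "ns".
    return UNITS[min(len(UNITS) - 1, (ndigits + 2) // 3)]


def unit_from_first(values: Sequence[Optional[str]]) -> Optional[str]:
    """Policy described in the issue: take the unit of the first non-NaT entry."""
    for value in values:
        unit = infer_unit(value)
        if unit is not None:
            return unit
    return None


def unit_from_finest(values: Sequence[Optional[str]]) -> Optional[str]:
    """Alternative policy (an assumption, not from the issue): finest unit seen."""
    units = [u for u in map(infer_unit, values) if u is not None]
    return max(units, key=UNITS.index) if units else None


mixed = ["2016-01-01", "2016-01-01T02:03:04.050607"]

# The first-entry policy depends on element order; the finest-unit policy does not.
first_forward = unit_from_first(mixed)
first_reversed = unit_from_first(mixed[::-1])
finest_forward = unit_from_finest(mixed)
finest_reversed = unit_from_finest(mixed[::-1])

# Question 3: stdlib datetime objects cannot hold anything finer than microseconds.
stdlib_is_microsecond = datetime.resolution == timedelta(microseconds=1)
```

With the first-entry policy, reordering the same input changes the inferred unit, and in this sketch the "s" result would not be able to hold the microseconds in the second entry. The finest-unit policy is order-independent but needs a full pass over the data before the output unit is known.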

ATM 1455 tests are failing locally. Hopefully these are mostly bc they hard-code ns in "expected".

cc @MarcoGorelli

Issues that I think implementing this will address

    Labels

    Constructors (Series/DataFrame/Index/pd.array constructors), Datetime (datetime data dtype), Enhancement, Non-Nano (datetime64/timedelta64 with non-nanosecond resolution)
