Closed
Description
I'm in the process of implementing resolution inference for vectorized datetime parsing in array_to_datetime. This issue is to track and discuss design issues.
- Should we implement this as a breaking change in 3.0 or bugfix in 2.x? We changed the scalar behavior in 2.0.
- What to do when mixed resolutions are detected, e.g.
pd.to_datetime(["2016-01-01", "2016-01-01T02:03:04.050607"])?
ATM in the branch I have going I get the resolution from the first non-NaT entry and apply that everywhere.
i) Are we OK with that value-dependent behavior?
ii) What if we see a np.datetime64("nat", "s"), i.e. it has a reso attached? Should we infer "s" from that?
- What resolution to infer for "today" or "now"? ATM the Timestamp constructor gives "ns", but bc it goes through the stdlib datetime.now, I'm thinking "us" might be more appropriate. (I also don't care that much.)
- ATM I only handle array_to_datetime; I also need to handle array_strptime.
- Need to do the same for the timedelta64 paths.
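To make the mixed-resolution question above concrete, here is a rough pure-Python sketch of the "first non-NaT entry" policy next to a "finest resolution seen" alternative. This is not the code in the branch: infer_reso, reso_first_non_nat and reso_finest are hypothetical helpers, None stands in for NaT, and the digit-count rule (no fraction gives "s", up to 3 digits "ms", up to 6 "us", otherwise "ns") is an assumption for illustration.

```python
import re

# Hypothetical helpers for illustration only; none of this is pandas API.
_FRACTION = re.compile(r"\.(\d+)")
_ORDER = ["s", "ms", "us", "ns"]  # coarsest to finest


def infer_reso(value):
    """Guess a unit for one ISO 8601 string from its fractional digits."""
    if value is None:  # stand-in for NaT: carries no resolution here
        return None
    match = _FRACTION.search(value)
    ndigits = len(match.group(1)) if match else 0
    if ndigits == 0:
        return "s"
    if ndigits <= 3:
        return "ms"
    if ndigits <= 6:
        return "us"
    return "ns"


def reso_first_non_nat(values):
    """Policy described in the issue: take the first non-NaT entry's unit."""
    for value in values:
        reso = infer_reso(value)
        if reso is not None:
            return reso
    return None  # all-NaT input: left undecided in this sketch


def reso_finest(values):
    """Alternative policy: take the finest unit seen anywhere in the input."""
    resos = [r for r in map(infer_reso, values) if r is not None]
    return max(resos, key=_ORDER.index) if resos else None


values = ["2016-01-01", "2016-01-01T02:03:04.050607"]
print(reso_first_non_nat(values))        # s
print(reso_first_non_nat(values[::-1]))  # us
print(reso_finest(values))               # us
```

With the first-entry policy the result depends on element order (and would drop the fractional seconds of the second entry in the first case), which is the value-dependent behavior asked about in i). The finest-unit policy is order-independent but needs a full pass over the data before the output dtype is known.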
ATM 1455 tests are failing locally. Hopefully these are mostly bc they hard-code ns in "expected".
Issues that I think implementing this will address
- BUG: Assignment of Timestamp Scalar uses micrsosecond precision, Series uses nano #55487
- ENH: Support non-nano inference in to_datetime #55096
- BUG: Inconsistent datetime conversion behavior when constructing a DataFrame with Python datetimes. #55014
- BUG: Out of bound dates can be saved to feather but not loaded #47832
- BUG: Out of bounds nanosecond timestamp when s was specified #57349
- BUG: Inconsistent resolution from constructing with and assigning datetime values #57080