Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# next line OK
date = pd.to_datetime("Wed, 1 Dec 2021 08:00:00 -0600 (CST)", errors='coerce', utc=True, format='mixed')
# next line raises Exception -> AttributeError: 'NoneType' object has no attribute 'total_seconds'
date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format='mixed')
Issue Description
I have many dates to parse. Some carry a time zone abbreviation such as "(CET)" or "(CST)", among many others, and some do not. The format is not predictable, so I cannot pass a predefined format string. The examples shown here look similar, but real-world data varies much more. After some hours of analysis I finally found one specific date that raises an exception:
Sun, 14 Apr 2024 20:00:00 +0200 (CET)
Expected Behavior
First, I would expect that with errors='coerce' no error is raised even if the format is completely wrong; it should instead return NaT, as the documentation suggests.
Second, to me there is no "big" difference between the working date string Wed, 1 Dec 2021 08:00:00 -0600 (CST) and the one that raises an error, Sun, 14 Apr 2024 20:00:00 +0200 (CET). Both have the same format; the main difference is the time zone abbreviation, which is present in both cases but differs. In fact, if I omit (CET), the string is parsed correctly.
As a workaround I could manually check whether a time zone abbreviation is present and remove it before calling to_datetime. Especially when the UTC offset is present as well, the abbreviation is redundant, so it should not even matter to to_datetime. But this workaround should not be necessary in my opinion: I think this is a bug and should be resolved in to_datetime.
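For illustration, the workaround described above could look roughly like the sketch below. The helper name `strip_tz_abbreviation` and the regular expression are my own assumptions, not part of pandas; the pattern only covers a trailing parenthesized abbreviation of one to six letters, which matches the examples in this report but may not cover every real-world variant.

```python
import re

# Hypothetical helper, not a pandas API: matches a trailing parenthesized
# time zone abbreviation such as "(CET)" or "(CST)", plus surrounding spaces.
_TZ_ABBREV_SUFFIX = re.compile(r"\s*\([A-Za-z]{1,6}\)\s*$")


def strip_tz_abbreviation(value: str) -> str:
    """Return value without a trailing "(ABC)"-style abbreviation."""
    return _TZ_ABBREV_SUFFIX.sub("", value)


cleaned = strip_tz_abbreviation("Sun, 14 Apr 2024 20:00:00 +0200 (CET)")
print(cleaned)  # Sun, 14 Apr 2024 20:00:00 +0200
```

The cleaned string can then be passed to pd.to_datetime with the same arguments as in the reproducible example; as noted above, the string parses correctly once (CET) is omitted. Strings without such a suffix are returned unchanged.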
See also the similar issue #54479, although I cannot reproduce that one in my environment.
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.12.5.final.0
python-bits : 64
OS : Linux
OS-release : 6.10.7-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Thu, 29 Aug 2024 16:48:57 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 2.1.1
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : None
pip : 24.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None