Description
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
data = {
"A": [15, np.nan, 5, 4],
"B": [15, 5, np.nan, 4],
}
df = pd.DataFrame(data)
df = df.astype({"A": "Int64", "B": "float64"})
df["C"] = df["A"] / df["B"]
print(df)
print(df.dtypes)
print(df.dropna(subset=["C"]))
This example outputs:
A B C
0 15 15.0 1.0
1 <NA> 5.0 <NA>
2 5 NaN NaN
3 4 4.0 1.0
A Int64
B float64
C Float64
dtype: object
A B C
0 15 15.0 1.0
2 5 NaN NaN
3 4 4.0 1.0
Problem description
Dividing an Int64
column by a float64
column results in a Float64
column that can contain both <NA>
and NaN
values. However, only <NA>
values will be removed by the dropna
function. I would expect only <NA>
values in the new column and I would also expect the dropna
function to remove all NA variants (both <NA>
and NaN
).
Expected Output
The row with index 2 should be dropped.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f
python : 3.9.4.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.3.0
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 49.2.1
Cython : 0.29.23
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.23.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.06.0
fastparquet : 0.6.3
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.17
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None