BUG: No error raised when writing dta files with to_stata() if a column contains -np.inf #45350

Closed
@flcong

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

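import pandas as pd
import numpy as np
# Raises ValueError: the column maximum is infinity
pd.DataFrame({'x': [1, np.inf]}).to_stata('test.dta')
# Raises no error, although the column contains -infinity
pd.DataFrame({'x': [1, -np.inf]}).to_stata('test.dta')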

Issue Description

If any column contains np.inf, an error is raised when trying to write the DataFrame to a dta file:

ValueError: Column x has a maximum value of infinity which is outside the range supported by Stata.

However, if the column contains -np.inf instead, no error is raised, but when opening the dta file (test.dta), -np.inf is displayed as -1.#INF in Stata:

     +-----------------+
     | index         x |
     |-----------------|
  1. |     0         1 |
  2. |     1   -1.#INF |
     +-----------------+
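
Until both signs are rejected, a caller can look for infinite values before calling to_stata(). The following is a minimal sketch of such a pre-check, not part of pandas; the helper name `columns_with_inf` is hypothetical, and it assumes unique column labels.

```python
import numpy as np
import pandas as pd


def columns_with_inf(df: pd.DataFrame) -> list:
    """Return labels of numeric columns holding +inf or -inf (hypothetical helper)."""
    bad = []
    for col in df.select_dtypes(include="number").columns:
        # Convert to float64 so missing values become NaN, which isinf ignores.
        values = df[col].to_numpy(dtype="float64", na_value=np.nan)
        if np.isinf(values).any():
            bad.append(col)
    return bad


df = pd.DataFrame({"x": [1, -np.inf], "y": [1.0, 2.0]})
offending = columns_with_inf(df)
if offending:
    print(f"Columns with infinite values: {offending}")
```

Because np.isinf is true for both np.inf and -np.inf, this check does not depend on whether the infinite value is the column maximum or minimum.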

Expected Behavior

I expect the same error to be raised if any column contains either np.inf or -np.inf.

It seems to be related to the following lines:

pandas/pandas/io/stata.py

Lines 607 to 612 in 66e3805

value = data[col].max()
if np.isinf(value):
    raise ValueError(
        f"Column {col} has a maximum value of infinity which is outside "
        "the range supported by Stata."
    )

When checking for infinity, line 607 picks up the maximum value of the column, which will be np.inf if it is present. However, it will not pick up -np.inf, which would be the minimum value of the column. Hence, an easy fix would be to replace the quoted lines with

maxvalue = data[col].max() 
minvalue = data[col].min()
if np.isinf(maxvalue) or np.isinf(minvalue): 
    raise ValueError( 
        f"Column {col} has a maximum value of infinity (or a minimum value of -infinity) which is outside " 
        "the range supported by Stata." 
    ) 
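
To see the effect of the proposed check outside of pandas, it can be wrapped in a small standalone function and run against the two DataFrames from the reproducible example. This is only a sketch: the function name `check_no_infinity` and the loop over numeric columns are assumptions made for illustration, not the actual structure of pandas/io/stata.py.

```python
import numpy as np
import pandas as pd


def check_no_infinity(data: pd.DataFrame) -> None:
    """Raise ValueError if a numeric column holds +inf or -inf (hypothetical helper)."""
    for col in data.select_dtypes(include="number").columns:
        maxvalue = data[col].max()
        minvalue = data[col].min()
        # +inf can only appear as the maximum, -inf only as the minimum.
        if np.isinf(maxvalue) or np.isinf(minvalue):
            raise ValueError(
                f"Column {col} has a maximum value of infinity (or a minimum "
                "value of -infinity) which is outside the range supported by Stata."
            )


# Both DataFrames from the reproducible example are now rejected.
for bad_value in (np.inf, -np.inf):
    try:
        check_no_infinity(pd.DataFrame({"x": [1, bad_value]}))
    except ValueError as err:
        print(err)
```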

Installed Versions

INSTALLED VERSIONS

commit : 7c48ff4
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.utf8

pandas : 1.2.5
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 3.0.1
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.06.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.22
tables : None
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : None
numba : 0.53.1
