Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently, pandas has separate IO methods for each file format (to_csv, read_parquet, etc.). This requires users to:
- Remember multiple method names
- Change code when switching formats
Feature Description
A unified save/read API would simplify common IO operations while maintaining explicit control when needed:
- File type is inferred from the filepath extension, but a format arg can be passed to be explicit, raising an error in some cases where the inferred file type disagrees with the passed file type.
- Both methods accept **kwargs and pass them along to the underlying file-type-specific pandas IO methods.
- Optionally, support some basic translation across discrepancies in arg names in existing IO methods (e.g. "usecols" in read_csv vs "columns" in read_parquet).
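The optional argument-name translation mentioned above could be sketched as a small lookup table applied before the kwargs are forwarded. This is only an illustration: ARG_ALIASES and translate_kwargs are hypothetical names, not existing pandas API, and the table holds just the usecols/columns pair named in this issue.

```python
# Hypothetical alias table: shared keyword -> the name each reader expects.
# Only the pair mentioned in this issue is listed.
ARG_ALIASES = {
    "columns": {"csv": "usecols", "parquet": "columns"},
}


def translate_kwargs(fmt, kwargs):
    """Rename shared keywords to the names used by the reader for `fmt`.

    Keywords without an alias are passed through unchanged.
    """
    translated = {}
    for name, value in kwargs.items():
        target = ARG_ALIASES.get(name, {}).get(fmt, name)
        if target in translated:
            # e.g. both columns= and usecols= were given for a CSV file
            raise TypeError(f"{target!r} was specified more than once")
        translated[target] = value
    return translated
```

With this table, translate_kwargs("csv", {"columns": ["a"]}) yields a usecols keyword suitable for read_csv, while the same call with "parquet" leaves columns untouched.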
# Simplest happy path:
df.save('data.csv') # Uses to_csv
df = pd.read('data.parquet') # Uses read_parquet
# Optionally, be explicit about expected file type
df.save('data.csv', format="csv") # Uses to_csv
df = pd.read('data.parquet', format="parquet") # Uses read_parquet
# Raises ValueError for conflicting format info:
df.save('data.csv', format='parquet') # Conflicting types
df.save('data.txt', format='csv') # .txt implies text format
# Reading allows overrides for misnamed files (or should we require users to rename their files properly first?)
df = pd.read('mislabeled.txt', format='parquet')
# Not sure if we should allow save when inferred file type is not a standard type:
df.save('data', format='csv') # No extension, needs type
df.save('mydata.unknown', format='csv') # Unclear extension
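To make the rules in the examples above concrete, here is a minimal sketch of how the format could be resolved and dispatched. Everything in it is an assumption for discussion: resolve_format, read, save, the allow_override flag, and the EXTENSION_TO_FORMAT table are hypothetical, and save is written as a free function rather than a DataFrame method. It also assumes that an explicit format is accepted when the extension is missing or unrecognised, which this issue leaves open.

```python
from pathlib import Path

# Hypothetical extension table; a real implementation would cover more formats.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
    ".txt": "text",
}


def resolve_format(path, format=None, allow_override=False):
    """Pick the format for `path`, or raise ValueError on conflicting info."""
    inferred = EXTENSION_TO_FORMAT.get(Path(path).suffix.lower())
    if format is None:
        if inferred is None:
            raise ValueError(f"Cannot infer format from {path!r}; pass format=")
        return inferred
    if inferred is not None and inferred != format and not allow_override:
        raise ValueError(
            f"Extension of {path!r} implies {inferred!r}, but format={format!r}"
        )
    # Assumption: an explicit format wins when nothing usable was inferred.
    return format


def read(path, format=None, **kwargs):
    import pandas as pd  # imported lazily; resolve_format itself needs no pandas

    # Reading tolerates misnamed files, as in the 'mislabeled.txt' example.
    fmt = resolve_format(path, format, allow_override=True)
    readers = {"csv": pd.read_csv, "parquet": pd.read_parquet, "json": pd.read_json}
    if fmt not in readers:
        raise ValueError(f"No reader registered for format {fmt!r}")
    return readers[fmt](path, **kwargs)


def save(df, path, format=None, **kwargs):
    fmt = resolve_format(path, format)
    writers = {"csv": df.to_csv, "parquet": df.to_parquet, "json": df.to_json}
    if fmt not in writers:
        raise ValueError(f"No writer registered for format {fmt!r}")
    return writers[fmt](path, **kwargs)
```

Under these assumptions, save(df, 'data.csv', format='parquet') and save(df, 'data.txt', format='csv') both raise ValueError, while read('mislabeled.txt', format='parquet') is dispatched to read_parquet. Whether saving to 'data' or 'mydata.unknown' with an explicit format should be allowed is a one-line change in resolve_format.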
Alternative Solutions
The existing format-specific methods work fine; they are just not the simplest to use.