ENH: support defaultdict in read_csv dtype parameter #41574

Closed
@jtbr

Description

I have a large csv file with 15k columns with non-obvious names, of which 14990 are floating point numbers. I'd like to load them as floats without read_csv having to spend the time divining their types.

`dtype` accepts a dict, but building one that lists every column name is tedious and not always possible. The obvious solution is to pass a `defaultdict` whose default is `np.float32`, with explicit entries for the other column types. Unfortunately, `read_csv` currently ignores the default silently. Presumably it does not look each column up directly (`dtype[col]`) but first checks whether the column is present (`col in dtype`), and a membership test never triggers a `defaultdict`'s default.
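The suspected cause can be reproduced in plain Python: `in` on a `defaultdict` does not call the default factory, so a reader that checks membership first will see unlisted columns as having no dtype. Until the default is honoured, one workaround is to expand the `defaultdict` into an ordinary dict that names every column. The sketch below assumes the file has a single header row readable with the standard `csv` module. It uses the string `"float32"` in place of `np.float32` to stay dependency-free. The helpers `expand_dtype` and `header_columns` are hypothetical, not part of pandas.

```python
import csv
import io
from collections import defaultdict

# Membership tests do not invoke the default factory; item access does.
demo = defaultdict(lambda: "float32", {"id": "int64"})
assert "x1" not in demo          # `in` ignores the default
assert demo["x1"] == "float32"   # `[]` applies the default...
assert "x1" in demo              # ...and inserts the key as a side effect


def expand_dtype(columns, dtype_map):
    """Hypothetical helper: resolve every column through dtype_map[col],
    so a defaultdict's default is applied, and return a plain dict."""
    return {col: dtype_map[col] for col in columns}


def header_columns(fileobj):
    """Hypothetical helper: read only the header row (assumes one header line)."""
    return next(csv.reader(fileobj))


sample = io.StringIO("id,label,x1,x2\n1,a,0.5,1.5\n")
dtype = defaultdict(lambda: "float32", {"id": "int64", "label": "str"})
explicit = expand_dtype(header_columns(sample), dtype)
print(explicit)  # → {'id': 'int64', 'label': 'str', 'x1': 'float32', 'x2': 'float32'}
```

The resulting plain dict can then be passed as `pd.read_csv(path, dtype=explicit)`. With pandas available, `pd.read_csv(path, nrows=0).columns` is another way to get the header without reading any data rows.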

If supporting this is not possible, it would help to emit a warning, or at least to mention in the documentation that `defaultdict` is not supported. It took me a long time to figure out why my float columns weren't being loaded as float32 and why `read_csv` was still inferring their types.
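If honouring the default turns out not to be feasible, the warning requested here could be as simple as the following sketch. `check_dtype_arg` is a hypothetical validation helper, not an existing pandas function, and the message text is illustrative only.

```python
import warnings
from collections import defaultdict


def check_dtype_arg(dtype):
    """Hypothetical check: warn when dtype is a defaultdict with a default
    factory, since a reader that only uses membership tests ignores it."""
    if isinstance(dtype, defaultdict) and dtype.default_factory is not None:
        warnings.warn(
            "dtype is a defaultdict; its default_factory is not applied "
            "to columns missing from the mapping",
            UserWarning,
            stacklevel=2,
        )
    return dtype


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_dtype_arg(defaultdict(lambda: "float32", {"id": "int64"}))  # warns
    check_dtype_arg({"id": "int64"})                                  # plain dict: silent

print(len(caught))  # → 1
```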
