Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
At my company we use pandas extensively, including the pandas.testing module in our unit tests, and assert_frame_equal in particular.
However, for the kind of data frames we work with (long datetime-indexed frames), the output of a failing assertion is practically unreadable.
Feature Description
Here is a minimal working example (MWE):
In [1]: import pandas as pd
...: from pandas.testing import assert_frame_equal
...: from datetime import datetime
...: import numpy as np
...: df1 = pd.DataFrame(np.ones((365,3)),index=pd.date_range(datetime(2022,1,1),datetime(2022,12,31)),columns=['a','b','c'])
...: df2 = df1.copy(deep=True)
...: df2.iloc[-1,0] = 0
...: assert_frame_equal(df1,df2)
AssertionError: DataFrame.iloc[:, 0] (column name="a") are different
DataFrame.iloc[:, 0] (column name="a") values are different (0.27397 %)
[index]: [2022-01-01T00:00:00.000000000, 2022-01-02T00:00:00.000000000, 2022-01-03T00:00:00.000000000, 2022-01-04T00:00:00.000000000, 2022-01-05T00:00:00.000000000, 2022-01-06T00:00:00.000000000, 2022-01-07T00:00:00.000000000, 2022-01-08T00:00:00.000000000, 2022-01-09T00:00:00.000000000, 2022-01-10T00:00:00.000000000, 2022-01-11T00:00:00.000000000, 2022-01-12T00:00:00.000000000, 2022-01-13T00:00:00.000000000, 2022-01-14T00:00:00.000000000, 2022-01-15T00:00:00.000000000, 2022-01-16T00:00:00.000000000, 2022-01-17T00:00:00.000000000, 2022-01-18T00:00:00.000000000, 2022-01-19T00:00:00.000000000, 2022-01-20T00:00:00.000000000, 2022-01-21T00:00:00.000000000, 2022-01-22T00:00:00.000000000, 2022-01-23T00:00:00.000000000, 2022-01-24T00:00:00.000000000, 2022-01-25T00:00:00.000000000, 2022-01-26T00:00:00.000000000, 2022-01-27T00:00:00.000000000, 2022-01-28T00:00:00.000000000, 2022-01-29T00:00:00.000000000, 2022-01-30T00:00:00.000000000, 2022-01-31T00:00:00.000000000, 2022-02-01T00:00:00.000000000, 2022-02-02T00:00:00.000000000, 2022-02-03T00:00:00.000000000, 2022-02-04T00:00:00.000000000, 2022-02-05T00:00:00.000000000, 2022-02-06T00:00:00.000000000, 2022-02-07T00:00:00.000000000, 2022-02-08T00:00:00.000000000, 2022-02-09T00:00:00.000000000, 2022-02-10T00:00:00.000000000, 2022-02-11T00:00:00.000000000, 2022-02-12T00:00:00.000000000, 2022-02-13T00:00:00.000000000, 2022-02-14T00:00:00.000000000, 2022-02-15T00:00:00.000000000, 2022-02-16T00:00:00.000000000, 2022-02-17T00:00:00.000000000, 2022-02-18T00:00:00.000000000, 2022-02-19T00:00:00.000000000, 2022-02-20T00:00:00.000000000, 2022-02-21T00:00:00.000000000, 2022-02-22T00:00:00.000000000, 2022-02-23T00:00:00.000000000, 2022-02-24T00:00:00.000000000, 2022-02-25T00:00:00.000000000, 2022-02-26T00:00:00.000000000, 2022-02-27T00:00:00.000000000, 2022-02-28T00:00:00.000000000, 2022-03-01T00:00:00.000000000, 2022-03-02T00:00:00.000000000, 2022-03-03T00:00:00.000000000, 2022-03-04T00:00:00.000000000, 2022-03-05T00:00:00.000000000, 
2022-03-06T00:00:00.000000000, 2022-03-07T00:00:00.000000000, 2022-03-08T00:00:00.000000000, 2022-03-09T00:00:00.000000000, 2022-03-10T00:00:00.000000000, 2022-03-11T00:00:00.000000000, 2022-03-12T00:00:00.000000000, 2022-03-13T00:00:00.000000000, 2022-03-14T00:00:00.000000000, 2022-03-15T00:00:00.000000000, 2022-03-16T00:00:00.000000000, 2022-03-17T00:00:00.000000000, 2022-03-18T00:00:00.000000000, 2022-03-19T00:00:00.000000000, 2022-03-20T00:00:00.000000000, 2022-03-21T00:00:00.000000000, 2022-03-22T00:00:00.000000000, 2022-03-23T00:00:00.000000000, 2022-03-24T00:00:00.000000000, 2022-03-25T00:00:00.000000000, 2022-03-26T00:00:00.000000000, 2022-03-27T00:00:00.000000000, 2022-03-28T00:00:00.000000000, 2022-03-29T00:00:00.000000000, 2022-03-30T00:00:00.000000000, 2022-03-31T00:00:00.000000000, 2022-04-01T00:00:00.000000000, 2022-04-02T00:00:00.000000000, 2022-04-03T00:00:00.000000000, 2022-04-04T00:00:00.000000000, 2022-04-05T00:00:00.000000000, 2022-04-06T00:00:00.000000000, 2022-04-07T00:00:00.000000000, 2022-04-08T00:00:00.000000000, 2022-04-09T00:00:00.000000000, 2022-04-10T00:00:00.000000000, ...]
[left]: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]
[right]: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]
I can think of three things that would make this problem more manageable, in order of preference:
1. Add a parameter to display only the differences:
assert_frame_equal(df1,df2,display_diff_only=True)
index_diff: ['2022-12-31T00:00:00.000000000']
left: [1.0]
right: [0.0]
I was unable to find where the code that prints this message lives in pandas, so I can't provide an implementation example. It should not involve much more than passing only the differing rows of the series/dataframes to the function that formats the assertion message, instead of the full objects.
2. Add a parameter that prints the output in columns rather than rows:
assert_frame_equal(df1,df2,display_columnar=True)
index a b
2022-01-01T00:00:00.000000000 1.0 1.0
2022-01-02T00:00:00.000000000 1.0 1.0
...
2022-12-31T00:00:00.000000000 0.0 1.0
3. Provide more options for formatting the datetimes:
assert_frame_equal(df1,df2,strftime="%Y-%m-%d")
index_diff: ['2022-01-01','2022-01-02',...]
left: [1.0,1.0,...]
right: [1.0,1.0,...]
As said, I couldn't find where this lives in the code base, so I can't provide implementation examples. The suggestions are simple and similar enough to existing functionality that I hope the usage examples make them clear.
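To make the first and third suggestions more concrete, here is a rough sketch of the message I have in mind, built outside pandas. The helper name format_diff_only and its date_format argument are made up for illustration and are not part of pandas; the sketch also assumes both frames share the same index.

```python
import numpy as np
import pandas as pd


def format_diff_only(left, right, column, date_format=None):
    """Hypothetical helper: report only the rows where `column` differs.

    Assumes `left` and `right` have identical indexes.
    """
    lcol, rcol = left[column], right[column]
    # Rows are equal if the values match or both are missing.
    mismatch = ~((lcol == rcol) | (lcol.isna() & rcol.isna()))
    index = left.index[mismatch.to_numpy()]
    if date_format is not None and isinstance(index, pd.DatetimeIndex):
        # Optional shorter datetime labels (third suggestion).
        labels = index.strftime(date_format).tolist()
    else:
        labels = index.astype(str).tolist()
    return (
        f"index_diff: {labels}\n"
        f"left: {lcol[mismatch].tolist()}\n"
        f"right: {rcol[mismatch].tolist()}"
    )


df1 = pd.DataFrame(
    np.ones((365, 3)),
    index=pd.date_range("2022-01-01", "2022-12-31"),
    columns=["a", "b", "c"],
)
df2 = df1.copy(deep=True)
df2.iloc[-1, 0] = 0

report = format_diff_only(df1, df2, "a", date_format="%Y-%m-%d")
print(report)
# index_diff: ['2022-12-31']
# left: [1.0]
# right: [0.0]
```

Something along these lines inside the assertion message would reduce the 365-entry dump above to three short lines.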
Alternative Solutions
The only other solution I'm aware of is something akin to this answer: https://stackoverflow.com/a/72452894, which is to write manual loops that do the comparison yourself so you can output the diff.
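A somewhat shorter workaround than manual loops, as far as I can tell, is to wrap assert_frame_equal and append the result of DataFrame.compare to the error message. The wrapper name assert_frame_equal_with_diff is hypothetical. Note that compare does an exact comparison and requires identically labelled frames, so this sketch ignores any tolerance passed to assert_frame_equal and simply re-raises the original error when the labels differ.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_frame_equal_with_diff(left, right, **kwargs):
    """Hypothetical wrapper: on failure, show only the differing cells."""
    try:
        assert_frame_equal(left, right, **kwargs)
    except AssertionError as err:
        same_labels = left.index.equals(right.index) and left.columns.equals(
            right.columns
        )
        if not same_labels:
            # DataFrame.compare cannot align differently labelled frames.
            raise
        # Keep only the summary line of the original message.
        summary = str(err).split("\n", 1)[0]
        # compare() keeps only rows/columns with at least one difference.
        diff = left.compare(right)
        raise AssertionError(f"{summary}\n{diff.to_string()}") from None


df1 = pd.DataFrame(
    np.ones((365, 3)),
    index=pd.date_range("2022-01-01", "2022-12-31"),
    columns=["a", "b", "c"],
)
df2 = df1.copy(deep=True)
df2.iloc[-1, 0] = 0

message = ""
try:
    assert_frame_equal_with_diff(df1, df2)
except AssertionError as err:
    message = str(err)
    print(message)
```

This works for our tests, but it has to be copied into every project, which is why a built-in option would be preferable.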
Additional Context
No response