BUG/COMPAT: assert_* functions failing with nested arrays and latest numpy #50360

Open
@jorisvandenbossche

Description

When using the latest nightly numpy (1.25.0.dev0+..), we get errors in the pyarrow test suite that originate in pandas functions (assert_series_equal -> array_equivalent). These errors don't happen with numpy 1.24.

The examples we encountered in the pyarrow tests all seem to boil down to the same situation: there are multiple levels of nesting (an array of arrays of arrays), and one side is a numpy array of nested numpy arrays while the other is a numpy array of nested lists.
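As a stopgap in tests, such values could be compared by first normalizing both sides to plain Python containers. This is a minimal sketch, assuming only ndarrays, lists and dicts appear in the nesting; `to_nested_list`, `nested_equal` and `make_object_array` are hypothetical helpers, not part of pandas or numpy, and this is not how `array_equivalent` works internally.

```python
import numpy as np


def to_nested_list(obj):
    """Hypothetical helper: recursively replace numpy arrays with plain lists.

    Lists are rebuilt element by element, dicts keep their keys with
    normalized values, and anything else (ints, strings, None, numpy
    scalars) is returned unchanged.
    """
    if isinstance(obj, np.ndarray):
        if obj.ndim == 0:
            # 0-d arrays cannot be iterated; unwrap the single element instead
            return to_nested_list(obj.item())
        return [to_nested_list(x) for x in obj]
    if isinstance(obj, list):
        return [to_nested_list(x) for x in obj]
    if isinstance(obj, dict):
        return {key: to_nested_list(value) for key, value in obj.items()}
    return obj


def nested_equal(left, right):
    """Hypothetical helper: compare two structures after normalizing arrays to lists."""
    return to_nested_list(left) == to_nested_list(right)


def make_object_array(items):
    """Build a 1-D object array without letting numpy add dimensions."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item  # each item is stored as a single object
    return arr


# The ">1 level of nesting" values from the first example in this issue
left = make_object_array([
    make_object_array([[1, 2, 3], [4, 5]]),
    make_object_array([[6], [7, 8]]),
])
right = [[[1, 2, 3], [4, 5]], [[6], [7, 8]]]
print(nested_equal(left, right))  # True
```

For the Series in this issue it could be applied element-wise, e.g. `all(nested_equal(a, b) for a, b in zip(s1, s2))`. The sketch deliberately erases the array/list distinction and dtype information (a numpy integer compares equal to a Python int), which is what these tests seem to want to ignore, but it would be too lenient as a general replacement for `assert_series_equal`.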

import numpy as np
import pandas as pd
import pandas._testing as tm

# one level of nesting -> ok
s1 = pd.Series([np.array([1, 2, 3]), np.array([4, 5])], dtype=object)
s2 = pd.Series([[1, 2, 3], [4, 5]], dtype=object)

>>> tm.assert_series_equal(s1, s2)

# >1 level of nesting -> fails
s1 = pd.Series([np.array([[1, 2, 3], [4, 5]], dtype=object), np.array([[6], [7, 8]], dtype=object)], dtype=object)
s2 = pd.Series([[[1, 2, 3], [4, 5]], [[6], [7, 8]]], dtype=object)

>>> tm.assert_series_equal(s1, s2)
...
File ~/scipy/repos/pandas-build-arrow/pandas/core/dtypes/missing.py:581, in _array_equivalent_object(left, right, strict_nan)
    579 else:
    580     try:
--> 581         if np.any(np.asarray(left_value != right_value)):
    582             return False
    583     except TypeError as err:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Another example, essentially the same situation but giving a different error message (from numpy):

# construct the inner array with an explicit shape (otherwise numpy would infer it as 2D)
arr = np.empty(2, dtype=object)
arr[:] = [np.array([None, 'b'], dtype=object), np.array(['c', 'd'], dtype=object)]
s1 = pd.Series(np.array([arr, None], dtype=object))
s2 = pd.Series(np.array([list([[None, 'b'], ['c', 'd']]), None], dtype=object))

>>> tm.assert_series_equal(s1, s2)
...
File ~/scipy/repos/pandas-build-arrow/pandas/core/dtypes/missing.py:581, in _array_equivalent_object(left, right, strict_nan)
    579 else:
    580     try:
--> 581         if np.any(np.asarray(left_value != right_value)):
    582             return False
    583     except TypeError as err:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
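The comment about numpy inferring a 2D shape for the inner array can be checked directly. A small numpy-only sketch (the variable names are illustrative): equal-length inner arrays passed to `np.array(..., dtype=object)` are stacked into a 2-D array, while pre-allocating with `np.empty(2, dtype=object)` and filling it keeps a 1-D array whose elements are the inner arrays.

```python
import numpy as np

inner = [np.array([None, 'b'], dtype=object), np.array(['c', 'd'], dtype=object)]

# Handing the equal-length inner arrays straight to np.array lets numpy
# stack them into one 2-D object array.
stacked = np.array(inner, dtype=object)
print(stacked.shape)  # (2, 2)

# Pre-allocating a 1-D object array and filling it keeps every inner array
# as a single element, so the outer array stays 1-D.
arr = np.empty(2, dtype=object)
arr[:] = inner
print(arr.shape)  # (2,)
print(type(arr[0]))  # <class 'numpy.ndarray'>
```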

Another case, with nested dictionaries (same error traceback):

s1 = pd.Series([{'f1': 1, 'f2': np.array(['a', 'b'], dtype=object)}], dtype=object)
s2 = pd.Series([{'f1': 1, 'f2': ['a', 'b']}], dtype=object)

>>> tm.assert_series_equal(s1, s2)
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
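In the dictionary case the message itself can be reproduced without pandas. A sketch, assuming (as the traceback suggests) that `left_value != right_value` in `_array_equivalent_object` receives the two dicts: Python's dict comparison calls `bool()` on the element-wise result of `ndarray == list`, and a 2-element boolean array has no single truth value. This only illustrates where the message comes from; it does not explain why numpy 1.24 behaved differently.

```python
import numpy as np

left = {'f1': 1, 'f2': np.array(['a', 'b'], dtype=object)}
right = {'f1': 1, 'f2': ['a', 'b']}

# The element-wise comparison on its own is fine: it returns a 2-element array.
elementwise = left['f2'] == right['f2']
print(elementwise.tolist())  # [True, True]

# Comparing the dicts makes Python call bool() on that array, which raises.
try:
    left != right
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```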

Metadata

Assignees

No one assigned

    Labels

    Compat: pandas objects compatibility with Numpy or Python functions
    Testing: pandas testing functions or related to the test suite