Closed
Description
Probably caused by #50048, and probably related to #52437 (another change in behaviour that might have been caused by the same PR). In essence the change is from using DataFrame.from_records
to processing the records into a list of column arrays and then using DataFrame(dict(zip(columns, arrays)))
. That has slightly different behaviour.
Dummy reproducer for the read_sql
change:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.1, 0.2, 0.3]})
from sqlalchemy import create_engine
eng = create_engine("sqlite://")
df.to_sql("test_table", eng, index=False)
pd.read_sql("SELECT a, b, a +1 as a FROM test_table;", eng)
With pandas 1.5 this returns
a b a
0 1 0.1 2
1 2 0.2 3
2 3 0.3 4
with pandas 2.0 this returns
a b
0 2 0.1
1 3 0.2
2 4 0.3
I don't know how much we want to support duplicate column names in read_sql, but it is a change in behaviour, and the new behaviour of just silently ignoring it / dropping some data also isn't ideal IMO.
cc @phofl