API/ENH: df.stack(level=[0,1], dropna=False) behavior #8851

Closed
@seth-p

df.stack(level=[0,1], dropna=False) appears to be equivalent to df.stack(level=0, dropna=False).stack(level=0, dropna=False). While this makes a certain amount of sense, it produces additional NaN rows for combinations of level values that do not occur among the original columns, which in many cases are probably not expected or desired. In particular, df.stack(level=[0,1], dropna=False) is not equivalent to df.stack(level=[0,1], dropna=True) even when df contains no missing values, which seems counterintuitive.

I think that when stacking multiple levels, one may want them stacked in one go, rather than sequentially -- so that when there are no missing values, df.stack(level=[0,1], dropna=False) would produce the same result as df.stack(level=[0,1], dropna=True).
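A minimal sketch of what stacking "in one go" could mean, written without calling stack at all so that it does not depend on how any particular pandas version treats dropna. The helper name stack_all_levels is hypothetical (it is not a pandas API), and the sketch assumes a single-level row index and that every column level is being stacked.

```python
import numpy as np
import pandas as pd


def stack_all_levels(df):
    """Hypothetical helper: stack every column level at once.

    Only column combinations that actually exist in df.columns are emitted,
    so no new NaN rows are introduced. NaN values already present in df are
    kept. Assumes df.index is a single-level index and df.columns is a
    MultiIndex.
    """
    # One output entry per (row label, existing column tuple), in row-major order.
    index = pd.MultiIndex.from_tuples(
        [(row,) + col for row in df.index for col in df.columns],
        names=[df.index.name] + list(df.columns.names),
    )
    # ravel() on the 2-D values is row-major too, so values line up with the index.
    return pd.Series(df.to_numpy().ravel(), index=index)


columns = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("B", "z")], names=["Upper", "Lower"]
)
df = pd.DataFrame(np.zeros((2, 3)), columns=columns)

result = stack_all_levels(df)
print(result)
```

With the df from this report, the result has one entry per existing column per row (six in total) and contains no NaN, which is the outcome requested for dropna=False when df has no missing values.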

Here is an example of current behavior. Since df has no missing values, I would want [6] to produce the same results as [5], not [7].

Python 3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:16:31) [MSC v.1600 64 bit (AMD64)]

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.zeros((2,3)), columns=pd.MultiIndex.from_tuples([('A','x'), ('A','y'), ('B','z')], names=['Upper', 'Lower']))

In [4]: df
Out[4]:
Upper  A     B
Lower  x  y  z
0      0  0  0
1      0  0  0

In [5]: df.stack(level=[0,1], dropna=True)
Out[5]:
   Upper  Lower
0  A      x        0
          y        0
   B      z        0
1  A      x        0
          y        0
   B      z        0
dtype: float64

In [6]: df.stack(level=[0,1], dropna=False)
Out[6]:
   Upper  Lower
0  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
1  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
dtype: float64

In [7]: df.stack(level=0, dropna=False).stack(level=0, dropna=False)
Out[7]:
   Upper  Lower
0  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
1  A      x         0
          y         0
          z       NaN
   B      x       NaN
          y       NaN
          z         0
dtype: float64
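The extra rows in Out[6] and Out[7] can be accounted for by comparing the columns that exist with the full Cartesian product of the level values. The following is only an illustrative sketch of that arithmetic; the variable names are made up for the example, and it does not claim to reproduce how stack is implemented internally.

```python
import pandas as pd

# The three columns that actually exist in df.
cols = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("B", "z")], names=["Upper", "Lower"]
)

# Every combination of Upper values with Lower values (2 * 3 = 6).
full = pd.MultiIndex.from_product(cols.levels, names=cols.names)

# Combinations that never occur as a column; these match the NaN entries
# that appear for each row in Out[6] and Out[7].
extra = full.difference(cols)

print(len(cols), len(full), len(extra))
print(list(extra))
```

Here three of the six combinations, (A, z), (B, x) and (B, y), do not occur among the columns, which matches the three NaN entries per row shown above.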

In [13]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Windows
OS-release: 8
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.15.1
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.0
IPython: 2.3.1
sphinx: None
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.9
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.6.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: None

Metadata

Labels: API Design, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
