PERF: MultiIndex.size #48723

Merged
merged 2 commits into from
Sep 23, 2022

lukemanley
Member

The improvement comes from avoiding MultiIndex._values when computing size.

```
import pandas as pd
import numpy as np

mi = pd.MultiIndex.from_product(
    [
        pd.date_range('1970-01-01', periods=1000),
        np.arange(1000),
    ]
)

%timeit mi.copy().size
111 ms ± 9.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)       <- main
17.7 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)  <- PR
```
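
To illustrate why skipping the materialized values helps, here is a minimal pure-Python sketch. `ToyMultiIndex`, `size_via_values`, `size_via_codes`, and `make_toy` are hypothetical names made up for this illustration and are not pandas code; the sketch only assumes that a multi-level index stores per-level labels plus integer codes, and that building the full array of row tuples is the expensive step.

```python
from functools import cached_property


class ToyMultiIndex:
    """Hypothetical stand-in for a multi-level index; not pandas code."""

    def __init__(self, levels, codes):
        self.levels = levels        # unique labels for each level
        self.codes = codes          # per-level integer positions into `levels`
        self.materialized = False   # records whether tuples were ever built

    @cached_property
    def _values(self):
        # Expensive path: build one tuple per row, then cache the result.
        self.materialized = True
        return [
            tuple(level[c] for level, c in zip(self.levels, row))
            for row in zip(*self.codes)
        ]

    @property
    def size_via_values(self):
        # Forces the tuples to be built just to count them.
        return len(self._values)

    @property
    def size_via_codes(self):
        # Counts rows directly from the codes; no tuples are built.
        return len(self.codes[0])


def make_toy():
    levels = [["a", "b"], [10, 20, 30]]
    codes = [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]
    return ToyMultiIndex(levels, codes)


cheap = make_toy()
cheap_size = cheap.size_via_codes

costly = make_toy()
costly_size = costly.size_via_values
```

Both properties report the same number of rows, but only the second one builds the row tuples. The benchmark above calls `mi.copy()` before `.size`, presumably so that each timed run starts from an index with nothing cached.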

At least one existing asv benchmark shows an improvement (there may be others):

```
       before           after         ratio
     [cda0f6be]       [8395c77a]
     <main>           <multiindex-size>
-      25.1±0.4ms         18.3±1ms     0.73  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'symmetric_difference')
-      24.8±0.4ms         16.7±2ms     0.68  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'symmetric_difference')
-      24.4±0.8ms       15.8±0.3ms     0.65  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'symmetric_difference')
-      25.3±0.4ms         16.4±1ms     0.65  multiindex_object.SetOperations.time_operation('non_monotonic', 'ea_int', 'symmetric_difference')
-      23.9±0.4ms       13.5±0.2ms     0.57  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'symmetric_difference')
-      23.0±0.6ms         12.1±2ms     0.53  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'symmetric_difference')
-        25.1±1ms       13.0±0.6ms     0.52  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'symmetric_difference')
-      23.0±0.4ms       11.2±0.6ms     0.49  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'symmetric_difference')
```
lukemanley added the Performance (memory or execution speed performance) and MultiIndex labels on Sep 23, 2022
mroeschke added this to the 1.6 milestone on Sep 23, 2022
mroeschke merged commit 2fbdd1e into pandas-dev:main on Sep 23, 2022
mroeschke
Member

Thanks @lukemanley

lukemanley deleted the multiindex-size branch on September 24, 2022, 00:48
mroeschke modified the milestones: 1.6, 2.0 on Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* add MultiIndex.size

* whatsnew
Labels
MultiIndex, Performance (memory or execution speed performance)
2 participants