Remove codepath asymmetry in dataframe count() #9136

qwhelan · 2014-12-22T23:05:39Z

@jreback I noticed a codepath asymmetry in core.frame.count that leads to a substantial difference in dropna() performance depending on the axis. Routing both axes through the path that df.dropna(axis=0) takes yields a 2.5-5x improvement for axis=1.

$ python vb_suite/test_perf.py -b upstream/master -t HEAD -r "dropna" -S -n 30
Invoked with :
--ncalls: 3
--repeats: 30


-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_dropna_axis1_any                       |  18.5493 | 101.6117 |   0.1826 |
frame_dropna_axis1_all                       |  48.0193 | 128.8197 |   0.3728 |
frame_dropna_axis0_any                       |  17.0240 |  17.3127 |   0.9833 |
frame_dropna_axis0_all                       |  43.1127 |  43.3304 |   0.9950 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [84ad341] : Remove codepath asymmetry in dataframe count()
Base   [099a02c] : Merge pull request #9061 from behzadnouri/nan-pivot

pivot & unstack with nan in the index


                        count       mean        std       min        25%        50%        75%         max
frame_dropna_axis1_any      3  40.114509  54.044075  0.182551   9.365917  18.549283  60.080489  101.611694
frame_dropna_axis1_all      3  59.070599  64.932673  0.372764  24.196047  48.019330  88.419517  128.819704
frame_dropna_axis0_any      3  11.773351   9.345549  0.983328   9.003684  17.024040  17.168363   17.312686
frame_dropna_axis0_all      3  29.146001  24.379745  0.994976  22.053826  43.112675  43.221513   43.330352
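For reference, the ratio column in the first table is head time divided by base time, so the speedup factor is its reciprocal. A minimal sketch of that arithmetic, using the four rows copied from the table above (the `timings` dict and `speedup` helper are illustrative names, not part of vbench):

```python
# (head_ms, base_ms) pairs copied from the vbench table above.
timings = {
    "frame_dropna_axis1_any": (18.5493, 101.6117),
    "frame_dropna_axis1_all": (48.0193, 128.8197),
    "frame_dropna_axis0_any": (17.0240, 17.3127),
    "frame_dropna_axis0_all": (43.1127, 43.3304),
}


def speedup(head_ms, base_ms):
    """Return how many times faster head is than base (reciprocal of the ratio)."""
    return base_ms / head_ms


speedups = {name: speedup(head, base) for name, (head, base) in timings.items()}

for name, factor in speedups.items():
    print(f"{name:<26} {factor:.2f}x")
```

The axis=1 cases come out well above 1x, while the axis=0 cases stay close to 1x, which matches the claim that only the axis=1 path changed.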

jreback · 2014-12-22T23:10:37Z

on a mixed-type frame this will cause a big drop in perf, since .values converts everything to object (and may not actually work correctly) - so I believe some tests should fail, and we need a more comprehensive perf metric

notnull(frame), by contrast, does a block-by-block comparison
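To illustrate the concern, here is a small sketch on a hypothetical two-column frame (the column names and data are made up for illustration and are not the PR's benchmark). It shows `.values` upcasting a mixed-type frame to object, and non-null counts computed from `notnull(frame)` without going through `.values`:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: one float column and one string column.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "s": ["a", None, "c"],
})

# .values has to fit every column into a single ndarray,
# so a float column next to a string column is upcast to object.
values = df.values
print(values.dtype)

# notnull on the frame returns a boolean frame of the same shape.
mask = df.notnull()

# Non-null counts per column (axis=0) and per row (axis=1).
count_axis0 = mask.sum(axis=0)
count_axis1 = mask.sum(axis=1)
print(count_axis0.tolist(), count_axis1.tolist())
```

A perf test on a frame like this, rather than an all-float one, would be one way to check whether the object conversion actually shows up in the timings.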

qwhelan · 2014-12-22T23:30:48Z

@jreback Test suite passes for me and here's a full vbench run:

Invoked with :
--ncalls: 3
--repeats: 3


-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_dropna_axis1_any                       |  18.6021 | 107.5943 |   0.1729 |
frame_dropna_axis1_all                       |  50.8420 | 135.4791 |   0.3753 |
stats_rolling_mean                           |   0.5297 |   0.7520 |   0.7044 |
timeseries_1min_5min_ohlc                    |   0.6154 |   0.8706 |   0.7068 |
groupby_ngroups_100_std                      |   0.3447 |   0.4273 |   0.8066 |
frame_mask_bools                             |   4.5987 |   5.3089 |   0.8662 |
groupby_first_float32                        |   2.2663 |   2.5516 |   0.8882 |
join_dataframe_index_single_key_bigger       |   8.7790 |   9.8763 |   0.8889 |
dtype_infer_float64                          |   0.5950 |   0.6610 |   0.9002 |
datetime_index_intersection                  |   0.2736 |   0.3020 |   0.9061 |
groupby_ngroups_100_cumcount                 |   0.5593 |   0.6150 |   0.9094 |
stats_rank_pct_average                       |  23.4833 |  25.7940 |   0.9104 |
timeseries_year_incr                         |   0.0140 |   0.0153 |   0.9119 |
groupby_last_float32                         |   2.5160 |   2.7587 |   0.9120 |
frame_ctor_nested_dict_int64                 |  60.6680 |  66.2240 |   0.9161 |
groupby_last_object                          |  12.6603 |  13.6907 |   0.9247 |
join_non_unique_equal                        |   0.6026 |   0.6470 |   0.9315 |
read_parse_dates_iso8601                     |   1.0910 |   1.1697 |   0.9327 |
frame_fancy_lookup_all                       |  12.2587 |  13.1373 |   0.9331 |
dtype_infer_float32                          |   0.5314 |   0.5690 |   0.9338 |
frame_ctor_list_of_dict                      |  53.3586 |  56.9321 |   0.9372 |
reindex_fillna_backfill                      |   0.4384 |   0.4677 |   0.9373 |
frame_ctor_dtindex_BQuarterEndx1             |   1.1443 |   1.2163 |   0.9408 |
dataframe_resample_min_numpy                 |   1.2994 |   1.3796 |   0.9418 |
frame_apply_ref_by_name                      |  11.4590 |  12.1663 |   0.9419 |
groupby_last_datetimes                       |   9.0296 |   9.5770 |   0.9428 |
strings_match                                |   4.5849 |   4.8576 |   0.9439 |
groupby_ngroups_10000_sum                    |   1.8410 |   1.9477 |   0.9452 |
read_store_table_wide                        |  14.8213 |  15.6531 |   0.9469 |
reindex_fillna_pad                           |   0.2156 |   0.2276 |   0.9473 |
frame_dtypes                                 |   0.0787 |   0.0830 |   0.9483 |
frame_xs_row                                 |   0.0307 |   0.0323 |   0.9484 |
frame_reindex_both_axes                      |  26.8114 |  28.2600 |   0.9487 |
frame_iteritems_cached                       |   0.4303 |   0.4531 |   0.9498 |
stat_ops_frame_mean_int_axis_0               |   3.2063 |   3.3704 |   0.9513 |
replace_replacena                            |   0.4367 |   0.4573 |   0.9550 |
append_frame_single_mixed                    |   1.3397 |   1.4027 |   0.9551 |
timeseries_year_apply                        |   0.0137 |   0.0143 |   0.9556 |
groupby_transform_multi_key1                 |  49.4933 |  51.7907 |   0.9556 |
frame_mult                                   |   3.0183 |   3.1580 |   0.9558 |
sql_read_query_sqlalchemy                    |  38.2897 |  39.9540 |   0.9583 |
timeseries_timestamp_downsample_mean         |   3.2314 |   3.3687 |   0.9592 |
frame_iteritems                              |  22.4950 |  23.4103 |   0.9609 |
reindex_multiindex                           |   1.0570 |   1.0997 |   0.9612 |
frame_multi_and                              |  18.1444 |  18.8723 |   0.9614 |
frame_getitem_single_column                  |  16.0767 |  16.7073 |   0.9623 |
frame_interpolate_some_good                  |   1.0077 |   1.0459 |   0.9635 |
strings_findall                              |   6.9053 |   7.1664 |   0.9636 |
frame_isnull                                 |   0.3897 |   0.4043 |   0.9638 |
read_csv_infer_datetime_format_ymd           |   1.7837 |   1.8493 |   0.9645 |
groupby_ngroups_10000_prod                   |   1.8241 |   1.8903 |   0.9649 |
groupby_ngroups_10000_std                    |   1.8506 |   1.9177 |   0.9650 |
groupby_transform_multi_key3                 | 545.7989 | 565.5087 |   0.9651 |
datetimeindex_unique                         |   0.0757 |   0.0784 |   0.9655 |
frame_reindex_axis1                          | 137.0380 | 141.8733 |   0.9659 |
groupby_transform_multi_key2                 |  34.9090 |  36.1343 |   0.9661 |
frame_apply_pass_thru                        |   3.4084 |   3.5246 |   0.9670 |
eval_frame_mult_python_one_thread            |  12.7517 |  13.1846 |   0.9672 |
replace_fillna                               |   0.9201 |   0.9513 |   0.9672 |
frame_apply_lambda_mean                      |   4.1850 |   4.3253 |   0.9676 |
frame_ctor_dtindex_MonthEndx2                |   1.0470 |   1.0820 |   0.9676 |
groupby_last_float64                         |   2.6507 |   2.7383 |   0.9680 |
read_csv_skiprows                            |  10.9327 |  11.2924 |   0.9681 |
frame_ctor_nested_dict                       |  55.3357 |  57.1256 |   0.9687 |
frame_ctor_dtindex_CDayx1                    |   0.9580 |   0.9886 |   0.9691 |
timeseries_custom_bday_cal_decr              |   0.0203 |   0.0210 |   0.9697 |
frame_get_dtype_counts                       |   0.0776 |   0.0800 |   0.9702 |
groupby_multi_python                         |  64.9753 |  66.9520 |   0.9705 |
groupby_ngroups_100_var                      |   0.3077 |   0.3170 |   0.9707 |
frame_ctor_dtindex_CBMonthBeginx1            |   2.4143 |   2.4866 |   0.9709 |
multiindex_with_datetime_level_sliced        |   0.1463 |   0.1507 |   0.9710 |
groupby_nth_float32_none                     |  68.0116 |  70.0103 |   0.9715 |
stat_ops_frame_mean_int_axis_1               |   3.3977 |   3.4957 |   0.9720 |
timeseries_custom_bmonthbegin_decr_n         |   0.1976 |   0.2033 |   0.9722 |
groupby_ngroups_100_first                    |   0.3660 |   0.3763 |   0.9725 |
groupby_multi_different_functions            |   8.7744 |   9.0214 |   0.9726 |
series_constructor_ndarray                   |   0.0143 |   0.0147 |   0.9730 |
frame_to_html_mixed                          | 155.8827 | 160.1920 |   0.9731 |
frame_shift_axis0                            |   9.5163 |   9.7777 |   0.9733 |
groupby_ngroups_100_tail                     |   0.6057 |   0.6223 |   0.9733 |
groupby_frame_apply                          |  27.2137 |  27.9597 |   0.9733 |
frame_html_repr_trunc_mi                     |  27.5931 |  28.3493 |   0.9733 |
sql_string_write_sqlalchemy                  |  78.5370 |  80.6839 |   0.9734 |
frame_boolean_row_select                     |   0.1924 |   0.1976 |   0.9735 |
timeseries_custom_bday_apply                 |   0.0117 |   0.0120 |   0.9735 |
groupby_frame_apply_overhead                 |   6.3247 |   6.4950 |   0.9738 |
frame_shift_axis_1                           |  13.6857 |  14.0533 |   0.9738 |
frame_ctor_dtindex_YearBeginx1               |   0.9534 |   0.9789 |   0.9739 |
groupby_ngroups_100_cummin                   |  10.7513 |  11.0384 |   0.9740 |
frame_object_equal                           |   6.7577 |   6.9373 |   0.9741 |
frame_apply_np_mean                          |   4.4157 |   4.5323 |   0.9743 |
groupby_transform_series                     |  16.4993 |  16.9323 |   0.9744 |
groupby_ngroups_10000_size                   |   3.4057 |   3.4947 |   0.9745 |
frame_to_csv_date_formatting                 |  10.0207 |  10.2787 |   0.9749 |
groupby_ngroups_10000_var                    |   1.8206 |   1.8671 |   0.9751 |
groupby_ngroups_100_cumsum                   |  10.9870 |  11.2667 |   0.9752 |
frame_from_records_generator_nrows           |   0.6963 |   0.7140 |   0.9753 |
frame_reindex_axis0                          | 126.9610 | 130.1666 |   0.9754 |
frame_html_repr_trunc_si                     |  21.0907 |  21.6153 |   0.9757 |
frame_ctor_dtindex_Nanox1                    |   0.8257 |   0.8460 |   0.9760 |
timeseries_custom_bday_apply_dt64            |   0.0130 |   0.0133 |   0.9760 |
packers_read_sql                             | 475.9804 | 487.6467 |   0.9761 |
frame_mult_st                                |   4.7491 |   4.8637 |   0.9764 |
frame_ctor_dtindex_CBMonthBeginx2            |   2.1600 |   2.2117 |   0.9766 |
frame_drop_dup_inplace                       |   2.0154 |   2.0630 |   0.9769 |
groupby_ngroups_10000_max                    |   1.8667 |   1.9093 |   0.9776 |
groupby_transform                            | 111.2770 | 113.7640 |   0.9781 |
groupby_int_count                            |   3.1677 |   3.2380 |   0.9783 |
frame_apply_axis_1                           |  67.9550 |  69.4570 |   0.9784 |
groupby_transform_multi_key4                 | 103.6166 | 105.8946 |   0.9785 |
timeseries_large_lookup_value                |   0.0147 |   0.0150 |   0.9788 |
stat_ops_frame_sum_float_axis_0              |   3.2840 |   3.3543 |   0.9790 |
frame_insert_500_columns_end                 |  72.5020 |  74.0326 |   0.9793 |
series_align_left_monotonic                  |  13.2531 |  13.5280 |   0.9797 |
read_store                                   |   1.4670 |   1.4973 |   0.9797 |
frame_ctor_dtindex_MonthEndx1                |   1.0383 |   1.0596 |   0.9799 |
frame_getitem_single_column2                 |  16.4550 |  16.7880 |   0.9802 |
frame_mult_no_ne                             |   4.4196 |   4.5087 |   0.9803 |
stat_ops_series_std                          |   0.4000 |   0.4079 |   0.9805 |
groupby_nth_object_any                       | 826.3570 | 842.6287 |   0.9807 |
timedelta_convert_string                     |  79.3490 |  80.9013 |   0.9808 |
query_datetime_index                         |  13.2960 |  13.5486 |   0.9814 |
sql_datetime_read_as_native_sqlalchemy       |  20.8353 |  21.2290 |   0.9815 |
groupby_ngroups_10000_cumsum                 | 997.5189 | 1016.2710 |   0.9815 |
read_csv_comment2                            |  20.3653 |  20.7390 |   0.9820 |
frame_insert_100_columns_begin               |  26.0140 |  26.4893 |   0.9821 |
groupby_ngroups_100_diff                     |  10.1566 |  10.3407 |   0.9822 |
timestamp_series_compare                     |   7.6750 |   7.8133 |   0.9823 |
series_drop_duplicates_string                |   0.3377 |   0.3436 |   0.9827 |
timeseries_asof_nan                          |   2.3669 |   2.4083 |   0.9828 |
groupby_multi_count                          |   5.7224 |   5.8214 |   0.9830 |
stat_ops_frame_mean_float_axis_0             |   3.2860 |   3.3417 |   0.9834 |
frame_ctor_dtindex_CustomBusinessDayx1       |   0.9670 |   0.9833 |   0.9834 |
datetimeindex_infer_dst                      |   2.2013 |   2.2384 |   0.9835 |
timeseries_custom_bmonthend_decr_n           |   0.2236 |   0.2274 |   0.9836 |
frame_ctor_dtindex_CustomBusinessDayx2       |   0.9584 |   0.9743 |   0.9836 |
groupby_ngroups_10000_sem                    |   2.7193 |   2.7643 |   0.9837 |
groupby_nth_datetimes_any                    | 1038.0707 | 1055.2243 |   0.9837 |
timeseries_add_irregular                     |  10.5687 |  10.7396 |   0.9841 |
frame_ctor_dtindex_CBMonthEndx1              |   3.0603 |   3.1094 |   0.9842 |
groupby_ngroups_10000_min                    |   1.8713 |   1.9013 |   0.9842 |
frame_to_csv                                 | 106.9413 | 108.6443 |   0.9843 |
frame_drop_duplicates_na                     |  13.0047 |  13.2117 |   0.9843 |
sql_float_read_table_sqlalchemy              |  13.0567 |  13.2636 |   0.9844 |
left_outer_join_index                        | 1907.0257 | 1936.9203 |   0.9846 |
timeseries_custom_bday_cal_incr_neg_n        |   0.0203 |   0.0207 |   0.9846 |
reindex_fillna_pad_float32                   |   0.1799 |   0.1827 |   0.9848 |
write_store_mixed                            |  12.2370 |  12.4257 |   0.9848 |
strings_replace                              |   9.6880 |   9.8370 |   0.9849 |
series_string_vector_slice                   | 177.1453 | 179.8406 |   0.9850 |
groupby_ngroups_100_describe                 | 123.9127 | 125.7833 |   0.9851 |
frame_ctor_dtindex_CDayx2                    |   0.9597 |   0.9737 |   0.9856 |
frame_repr_wide                              |   9.9707 |  10.1153 |   0.9857 |
groupby_ngroups_10000_cumprod                | 998.0867 | 1012.5403 |   0.9857 |
indexing_dataframe_boolean                   |  23.9023 |  24.2480 |   0.9857 |
series_ctor_from_dict                        |   2.1916 |   2.2227 |   0.9860 |
frame_reindex_both_axes_ix                   |  27.9637 |  28.3584 |   0.9861 |
timedelta_convert_string_seconds             |  86.0843 |  87.2510 |   0.9866 |
timeseries_with_format_no_exact              | 563.7447 | 571.1673 |   0.9870 |
packers_read_json                            | 148.1733 | 150.1200 |   0.9870 |
frame_add                                    |   2.8750 |   2.9120 |   0.9873 |
frame_ctor_dtindex_YearEndx2                 |   0.9830 |   0.9956 |   0.9873 |
reindex_fillna_backfill_float32              |   0.1810 |   0.1833 |   0.9874 |
frame_repr_tall                              |  15.4930 |  15.6867 |   0.9877 |
groupby_ngroups_10000_cummax                 | 985.2877 | 997.4423 |   0.9878 |
groupby_ngroups_100_unique                   |   5.4403 |   5.5074 |   0.9878 |
read_csv_roundtrip_converter                 |   2.2157 |   2.2427 |   0.9880 |
sql_float_write_fallback                     |  43.3389 |  43.8657 |   0.9880 |
groupby_transform_ufunc                      |  85.5910 |  86.6243 |   0.9881 |
frame_ctor_dtindex_YearBeginx2               |   0.9499 |   0.9614 |   0.9881 |
groupby_ngroups_100_value_counts             |  39.1530 |  39.6246 |   0.9881 |
groupby_ngroups_100_count                    |   0.4110 |   0.4157 |   0.9885 |
frame_ctor_dtindex_YearEndx1                 |   0.9763 |   0.9876 |   0.9886 |
frame_loc_dups                               |   0.7643 |   0.7730 |   0.9887 |
eval_frame_and_all_threads                   |  25.3456 |  25.6294 |   0.9889 |
frame_fillna_inplace                         |   6.4066 |   6.4777 |   0.9890 |
groupby_frame_nth_none                       |   2.0367 |   2.0579 |   0.9897 |
frame_to_string_floats                       |  20.9633 |  21.1804 |   0.9897 |
frame_nonunique_equal                        |   6.8734 |   6.9423 |   0.9901 |
series_align_irregular_string                |  45.1504 |  45.5993 |   0.9902 |
sql_write_fallback                           |  70.3687 |  71.0684 |   0.9902 |
groupby_ngroups_10000_unique                 | 462.8704 | 467.4703 |   0.9902 |
eval_frame_chained_cmp_all_threads           |  20.1324 |  20.3323 |   0.9902 |
groupby_ngroups_10000_skew                   | 935.9633 | 945.1787 |   0.9903 |
datetimeindex_normalize                      |   2.2387 |   2.2604 |   0.9904 |
reshape_unstack_simple                       |   2.1416 |   2.1623 |   0.9904 |
strings_extract                              |  30.8023 |  31.0953 |   0.9906 |
packers_write_json                           |  80.5470 |  81.3080 |   0.9906 |
concat_series_axis1                          |  64.6434 |  65.2483 |   0.9907 |
stat_ops_frame_sum_int_axis_1                |   3.1667 |   3.1960 |   0.9908 |
stat_ops_level_frame_sum                     |   2.0533 |   2.0723 |   0.9908 |
lib_fast_zip                                 |   5.8857 |   5.9383 |   0.9911 |
series_getitem_label_slice                   |   0.0453 |   0.0457 |   0.9913 |
frame_interpolate                            |  65.2444 |  65.8100 |   0.9914 |
series_value_counts_strings                  |   3.5717 |   3.6023 |   0.9915 |
groupby_sum_booleans                         |   0.9027 |   0.9104 |   0.9915 |
frame_from_series                            |   0.0866 |   0.0873 |   0.9918 |
strings_title                                |   6.6930 |   6.7480 |   0.9919 |
stat_ops_frame_sum_int_axis_0                |   3.2443 |   3.2707 |   0.9919 |
sql_datetime_write_sqlalchemy                | 119.3853 | 120.3063 |   0.9923 |
dataframe_resample_min_string                |   1.2780 |   1.2877 |   0.9925 |
sql_float_read_query_sqlalchemy              |  11.5434 |  11.6300 |   0.9926 |
groupby_ngroups_100_cumprod                  |  10.9897 |  11.0710 |   0.9926 |
strings_contains_few_noregex                 |   1.8903 |   1.9036 |   0.9930 |
frame_ctor_dtindex_BDayx1                    |   0.9256 |   0.9321 |   0.9931 |
eval_frame_add_python                        |  15.9137 |  16.0240 |   0.9931 |
strings_upper                                |   4.1416 |   4.1700 |   0.9932 |
frame_ctor_dtindex_BQuarterEndx2             |   1.1430 |   1.1507 |   0.9933 |
frame_ctor_dtindex_CBMonthEndx2              |   3.0470 |   3.0673 |   0.9934 |
index_float64_boolean_indexer                |   3.3293 |   3.3513 |   0.9934 |
datetimeindex_converter                      |   0.5593 |   0.5630 |   0.9935 |
sql_read_table_sqlalchemy                    |  37.1761 |  37.4183 |   0.9935 |
groupby_multi_cython                         |  10.9857 |  11.0527 |   0.9939 |
reindex_frame_level_align                    |   0.6000 |   0.6037 |   0.9939 |
groupby_ngroups_100_min                      |   0.3813 |   0.3836 |   0.9940 |
groupby_agg_builtins2                        |  30.1764 |  30.3554 |   0.9941 |
series_timestamp_compare                     |   8.0540 |   8.1016 |   0.9941 |
frame_ctor_dtindex_BusinessDayx2             |   0.9547 |   0.9604 |   0.9941 |
frame_ctor_dtindex_QuarterBeginx2            |   1.0257 |   1.0317 |   0.9941 |
frame_drop_duplicates                        |  13.7166 |  13.7966 |   0.9942 |
series_align_int64_index                     |  27.2223 |  27.3810 |   0.9942 |
frame_ctor_dtindex_BQuarterBeginx2           |   1.2170 |   1.2240 |   0.9942 |
packers_read_pack                            |  64.6993 |  65.0670 |   0.9943 |
frame_constructor_ndarray                    |   0.0703 |   0.0707 |   0.9944 |
frame_ctor_dtindex_BDayx2                    |   0.9560 |   0.9613 |   0.9945 |
index_float64_boolean_series_indexer         |   3.3430 |   3.3614 |   0.9945 |
append_frame_single_homogenous               |   0.9284 |   0.9333 |   0.9947 |
frame_multi_and_st                           |  32.1097 |  32.2754 |   0.9949 |
join_dataframe_index_multi                   |  13.8043 |  13.8727 |   0.9951 |
dtype_infer_timedelta64_2                    |   9.1603 |   9.2057 |   0.9951 |
strings_repeat                               |   3.5223 |   3.5396 |   0.9951 |
groupby_nth_object_none                      | 415.0560 | 417.0973 |   0.9951 |
groupby_nth_float64_none                     |  54.4900 |  54.7580 |   0.9951 |
sort_level_zero                              |   9.2470 |   9.2901 |   0.9954 |
frame_ctor_dtindex_DateOffsetx2              |   0.8554 |   0.8593 |   0.9954 |
packers_write_stata                          |  21.3070 |  21.4037 |   0.9955 |
panel_shift                                  |   0.0720 |   0.0723 |   0.9956 |
packers_read_hdf_table                       |  26.6783 |  26.7960 |   0.9956 |
multiindex_from_product                      |   8.5984 |   8.6340 |   0.9959 |
frame_from_records_generator                 |  50.0429 |  50.2487 |   0.9959 |
strings_startswith                           |   3.1630 |   3.1757 |   0.9960 |
indexing_dataframe_boolean_rows              |   0.2394 |   0.2403 |   0.9960 |
strings_cat                                  |   0.5023 |   0.5043 |   0.9961 |
plot_timeseries_period                       |  93.7583 |  94.1243 |   0.9961 |
groupby_ngroups_100_last                     |   0.3710 |   0.3723 |   0.9964 |
strings_count                                |   4.8520 |   4.8693 |   0.9964 |
series_value_counts_int64                    |   1.5814 |   1.5867 |   0.9966 |
groupby_apply_dict_return                    |  25.1904 |  25.2734 |   0.9967 |
frame_ctor_dtindex_BYearBeginx1              |   1.2080 |   1.2120 |   0.9967 |
strings_get_dummies                          |  57.1720 |  57.3383 |   0.9971 |
match_strings                                |   0.3293 |   0.3303 |   0.9971 |
merge_2intkey_nosort                         |  11.1633 |  11.1943 |   0.9972 |
strings_rstrip                               |   2.8996 |   2.9077 |   0.9972 |
query_store_table                            |   3.6613 |   3.6707 |   0.9974 |
strings_pad                                  |   3.2317 |   3.2399 |   0.9974 |
eval_frame_and_python                        |  37.8340 |  37.9264 |   0.9976 |
stat_ops_level_series_sum_multiple           |   3.8520 |   3.8610 |   0.9977 |
strings_len                                  |   1.3874 |   1.3904 |   0.9978 |
reshape_stack_simple                         |   1.6863 |   1.6900 |   0.9978 |
period_setitem                               |  11.7803 |  11.8054 |   0.9979 |
timeseries_iter_datetimeindex                | 520.1587 | 521.2603 |   0.9979 |
sql_write_sqlalchemy                         | 286.8440 | 287.4134 |   0.9980 |
sql_string_write_fallback                    |  41.7540 |  41.8253 |   0.9983 |
strings_lstrip                               |   2.9674 |   2.9724 |   0.9983 |
panel_from_dict_all_different_indexes        |  82.1180 |  82.2493 |   0.9984 |
groupby_ngroups_10000_cumcount               |  51.9000 |  51.9823 |   0.9984 |
groupby_nth_datetimes_none                   | 407.5426 | 408.1493 |   0.9985 |
stat_ops_level_frame_sum_multiple            |   5.4066 |   5.4146 |   0.9985 |
write_store_table                            |  22.2797 |  22.3097 |   0.9987 |
panel_pct_change_major                       | 4664.3544 | 4670.1253 |   0.9988 |
groupby_multi_size                           |  17.9507 |  17.9723 |   0.9988 |
frame_ctor_dtindex_DateOffsetx1              |   0.8660 |   0.8670 |   0.9989 |
frame_ctor_dtindex_BYearEndx1                |   1.2314 |   1.2327 |   0.9989 |
timeseries_iter_datetimeindex_preexit        |   9.5157 |   9.5260 |   0.9989 |
frame_multi_and_no_ne                        |  21.6930 |  21.7103 |   0.9992 |
panel_pct_change_minor                       | 4684.6507 | 4687.6900 |   0.9994 |
timeseries_iter_periodindex_preexit          |   8.8243 |   8.8301 |   0.9994 |
frame_ctor_dtindex_BMonthEndx1               |   1.0529 |   1.0533 |   0.9996 |
timeseries_asof                              |   2.4884 |   2.4890 |   0.9997 |
panel_from_dict_two_different_indexes        |  55.9740 |  55.9836 |   0.9998 |
packers_write_sql                            | 2071.1230 | 2071.1553 |   1.0000 |
timeseries_asof_single                       |   0.0190 |   0.0190 |   1.0000 |
timedelta_convert_int                        |   0.1136 |   0.1136 |   1.0000 |
timestamp_ops_diff2                          |  18.5594 |  18.5564 |   1.0002 |
eval_frame_chained_cmp_python                |  43.9810 |  43.9737 |   1.0002 |
write_store_table_wide                       |  82.8057 |  82.7880 |   1.0002 |
index_int64_union                            |  56.0040 |  55.9796 |   1.0004 |
groupby_ngroups_10000_nunique                | 3166.3330 | 3164.4727 |   1.0006 |
strings_lower                                |   4.5850 |   4.5823 |   1.0006 |
frame_ctor_dtindex_BusinessDayx1             |   0.9206 |   0.9201 |   1.0006 |
stat_ops_frame_mean_float_axis_1             |   3.5447 |   3.5423 |   1.0007 |
frame_fancy_lookup                           |   2.6730 |   2.6710 |   1.0007 |
frame_ctor_dtindex_QuarterEndx2              |   1.1343 |   1.1334 |   1.0008 |
stat_ops_frame_sum_float_axis_1              |   3.3437 |   3.3406 |   1.0009 |
write_store_table_mixed                      |  27.6550 |  27.6290 |   1.0009 |
timeseries_with_format_replace               | 746.1514 | 745.3683 |   1.0011 |
frame_ctor_dtindex_Microx2                   |   0.6740 |   0.6733 |   1.0011 |
groupby_ngroups_100_all                      |   6.9750 |   6.9674 |   1.0011 |
write_store                                  |   4.7197 |   4.7130 |   1.0014 |
strings_endswith                             |   3.1797 |   3.1743 |   1.0017 |
groupby_ngroups_10000_rank                   | 985.7924 | 984.0270 |   1.0018 |
sql_read_query_fallback                      |  32.2456 |  32.1850 |   1.0019 |
groupby_series_nth_any                       |   3.1114 |   3.1054 |   1.0019 |
groupby_ngroups_100_median                   |   0.3227 |   0.3220 |   1.0020 |
eval_frame_add_python_one_thread             |  15.9946 |  15.9620 |   1.0020 |
index_str_boolean_series_indexer             |   7.7050 |   7.6880 |   1.0022 |
timeseries_1min_5min_mean                    |   0.5416 |   0.5403 |   1.0024 |
frame_ctor_dtindex_Hourx1                    |   0.6730 |   0.6713 |   1.0025 |
panel_from_dict_equiv_indexes                |  34.8890 |  34.8023 |   1.0025 |
sql_datetime_read_and_parse_sqlalchemy       |  17.3173 |  17.2730 |   1.0026 |
timeseries_to_datetime_YYYYMMDD              |   5.9891 |   5.9737 |   1.0026 |
groupby_multi_different_numpy_functions      |   8.7274 |   8.7043 |   1.0026 |
strings_strip                                |   3.1287 |   3.1203 |   1.0027 |
strings_contains_many                        |   4.3583 |   4.3464 |   1.0027 |
strings_center                               |   3.2463 |   3.2373 |   1.0028 |
stats_corr_spearman                          |  59.0444 |  58.8796 |   1.0028 |
strings_encode_decode                        |   0.2133 |   0.2127 |   1.0030 |
groupby_ngroups_10000_last                   |   1.9473 |   1.9410 |   1.0032 |
groupby_ngroups_10000_value_counts           | 3824.3503 | 3811.9609 |   1.0033 |
frame_ctor_dtindex_BMonthBeginx2             |   1.2104 |   1.2063 |   1.0034 |
eval_frame_mult_python                       |  11.4170 |  11.3787 |   1.0034 |
frame_ctor_dtindex_Minutex2                  |   0.6690 |   0.6667 |   1.0035 |
groupby_ngroups_10000_mad                    | 3611.5239 | 3599.0787 |   1.0035 |
packers_write_pack                           |  22.7517 |  22.6727 |   1.0035 |
frame_interpolate_some_good_infer            |   1.9139 |   1.9073 |   1.0035 |
groupby_ngroups_10000_tail                   |  53.7267 |  53.5363 |   1.0036 |
groupby_ngroups_10000_cummin                 | 990.6786 | 987.0000 |   1.0037 |
index_str_boolean_indexer                    |   8.3167 |   8.2856 |   1.0038 |
frame_reindex_columns                        |   0.2507 |   0.2497 |   1.0038 |
stats_rank_pct_average_old                   |  24.4993 |  24.4021 |   1.0040 |
frame_ctor_dtindex_BQuarterBeginx1           |   1.2236 |   1.2187 |   1.0040 |
join_dataframe_index_single_key_small        |   8.2960 |   8.2617 |   1.0042 |
groupby_ngroups_100_rank                     |  10.9740 |  10.9283 |   1.0042 |
timeseries_period_downsample_mean            |   9.2010 |   9.1614 |   1.0043 |
timeseries_custom_bday_cal_incr              |   0.0174 |   0.0173 |   1.0046 |
dtype_infer_int32                            |   0.5891 |   0.5864 |   1.0046 |
strings_contains_many_noregex                |   1.9867 |   1.9774 |   1.0047 |
groupby_ngroups_10000_count                  |   1.8460 |   1.8370 |   1.0049 |
index_float64_div                            |   2.1810 |   2.1703 |   1.0049 |
groupby_ngroups_100_nunique                  |  32.6370 |  32.4736 |   1.0050 |
eval_frame_mult_all_threads                  |   7.0233 |   6.9867 |   1.0052 |
concat_empty_frames2                         |   0.7420 |   0.7380 |   1.0054 |
frame_ctor_dtindex_Hourx2                    |   0.6786 |   0.6750 |   1.0054 |
series_xs_mi_ix                              |   2.9754 |   2.9587 |   1.0056 |
indexing_dataframe_boolean_rows_object       |   0.4157 |   0.4133 |   1.0058 |
frame_ctor_dtindex_Minutex1                  |   0.6716 |   0.6677 |   1.0060 |
dti_reset_index                              |   0.2813 |   0.2797 |   1.0060 |
groupby_indices                              |   4.2757 |   4.2497 |   1.0061 |
frame_ctor_dtindex_Easterx1                  |   1.0210 |   1.0146 |   1.0063 |
groupby_ngroups_10000_mean                   |   1.7073 |   1.6967 |   1.0063 |
timeseries_sort_index                        |   7.1046 |   7.0600 |   1.0063 |
frame_ctor_dtindex_Easterx2                  |   1.0147 |   1.0083 |   1.0064 |
frame_ctor_dtindex_Millix2                   |   0.6734 |   0.6687 |   1.0070 |
eval_frame_add_all_threads                   |   6.9977 |   6.9483 |   1.0071 |
index_datetime_intersection                  |   8.7484 |   8.6863 |   1.0071 |
read_store_table_panel                       |  10.2617 |  10.1870 |   1.0073 |
query_datetime_series                        |  15.9407 |  15.8237 |   1.0074 |
groupby_ngroups_100_size                     |   0.4087 |   0.4056 |   1.0076 |
query_with_boolean_selection                 |  16.6547 |  16.5263 |   1.0078 |
dataframe_resample_max_numpy                 |   1.2826 |   1.2727 |   1.0078 |
frame_ctor_dtindex_QuarterEndx1              |   1.1436 |   1.1347 |   1.0078 |
groupby_ngroups_100_mad                      |  37.4546 |  37.1620 |   1.0079 |
dataframe_resample_mean_string               |   2.2933 |   2.2753 |   1.0079 |
frame_ctor_dtindex_Weekx2                    |   0.8450 |   0.8384 |   1.0080 |
groupby_ngroups_10000_median                 |   2.2680 |   2.2493 |   1.0083 |
sql_float_write_sqlalchemy                   |  80.8311 |  80.1633 |   1.0083 |
frame_ctor_dtindex_BMonthBeginx1             |   1.2256 |   1.2153 |   1.0085 |
panel_from_dict_same_index                   |  33.1450 |  32.8560 |   1.0088 |
concat_empty_frames1                         |   0.7490 |   0.7424 |   1.0089 |
timeseries_iter_periodindex                  | 880.1933 | 872.3943 |   1.0089 |
concat_small_frames                          |  37.3684 |  37.0350 |   1.0090 |
frame_ctor_dtindex_Secondx2                  |   0.6737 |   0.6677 |   1.0090 |
strings_get                                  |   2.2707 |   2.2503 |   1.0091 |
timeseries_custom_bmonthbegin_incr_n         |   0.1850 |   0.1833 |   1.0091 |
groupby_ngroups_10000_pct_change             | 3069.1967 | 3041.4417 |   1.0091 |
groupby_agg_builtins1                        |   7.5793 |   7.5103 |   1.0092 |
frame_ctor_dtindex_Millix1                   |   0.6843 |   0.6780 |   1.0094 |
groupby_ngroups_100_max                      |   0.3740 |   0.3703 |   1.0099 |
stat_ops_level_series_sum                    |   1.4323 |   1.4183 |   1.0099 |
groupby_ngroups_10000_describe               | 12278.1421 | 12156.8247 |   1.0100 |
frame_apply_user_func                        |  67.4516 |  66.7843 |   1.0100 |
groupby_ngroups_100_mean                     |   0.3033 |   0.3003 |   1.0101 |
sql_float_read_query_fallback                |   8.0307 |   7.9500 |   1.0101 |
frame_ctor_dtindex_Weekx1                    |   0.8466 |   0.8380 |   1.0102 |
frame_xs_mi_ix                               |   3.0550 |   3.0240 |   1.0102 |
frame_mask_floats                            |   3.1620 |   3.1293 |   1.0104 |
frame_ctor_dtindex_Dayx2                     |   0.6743 |   0.6673 |   1.0105 |
reshape_pivot_time_series                    | 132.8297 | 131.4223 |   1.0107 |
strings_contains_few                         |   4.2303 |   4.1854 |   1.0107 |
frame_ctor_dtindex_Microx1                   |   0.6830 |   0.6757 |   1.0108 |
groupby_ngroups_100_skew                     |  10.1163 |  10.0080 |   1.0108 |
frame_ctor_dtindex_QuarterBeginx1            |   1.0430 |   1.0316 |   1.0110 |
groupby_multi_series_op                      |  10.1987 |  10.0866 |   1.0111 |
groupby_ngroups_100_sem                      |   0.6766 |   0.6689 |   1.0115 |
timeseries_infer_freq                        |   7.1473 |   7.0653 |   1.0116 |
frame_drop_dup_na_inplace                    |   1.8710 |   1.8493 |   1.0117 |
panel_pct_change_items                       | 5636.5933 | 5570.4817 |   1.0119 |
stats_rank2d_axis0_average                   |  17.0420 |  16.8406 |   1.0120 |
series_drop_duplicates_int                   |   0.5963 |   0.5890 |   1.0124 |
eval_frame_add_one_thread                    |   9.2280 |   9.1143 |   1.0125 |
stats_rank2d_axis1_average                   |   9.2716 |   9.1566 |   1.0126 |
stats_rank_average_int                       |  16.4713 |  16.2663 |   1.0126 |
read_csv_infer_datetime_format_custom        |   6.6024 |   6.5193 |   1.0127 |
groupby_ngroups_10000_first                  |   1.9273 |   1.9030 |   1.0128 |
frame_ctor_dtindex_BMonthEndx2               |   1.0524 |   1.0387 |   1.0132 |
frame_to_csv2                                | 107.0263 | 105.6330 |   1.0132 |
groupby_ngroups_10000_all                    | 641.7347 | 633.2950 |   1.0133 |
groupby_frame_nth_any                        |   4.8727 |   4.8079 |   1.0135 |
join_dataframe_integer_2key                  |   3.5030 |   3.4557 |   1.0137 |
frame_ctor_dtindex_BYearBeginx2              |   1.2236 |   1.2070 |   1.0138 |
write_csv_standard                           |  36.0817 |  35.5883 |   1.0139 |
dataframe_resample_mean_numpy                |   2.3357 |   2.3030 |   1.0142 |
unstack_sparse_keyspace                      |   1.1070 |   1.0913 |   1.0143 |
packers_write_csv                            | 957.6867 | 944.0500 |   1.0144 |
frame_ctor_dtindex_Secondx1                  |   0.6804 |   0.6707 |   1.0145 |
packers_read_csv                             | 143.2724 | 141.2063 |   1.0146 |
index_datetime_union                         |   8.7420 |   8.6147 |   1.0148 |
frame_ctor_dtindex_MonthBeginx2              |   1.0520 |   1.0366 |   1.0148 |
groupby_first_object                         |  14.0340 |  13.8283 |   1.0149 |
groupby_ngroups_100_cummax                   |  11.0037 |  10.8411 |   1.0150 |
eval_frame_and_python_one_thread             |  39.7060 |  39.1093 |   1.0153 |
groupby_first_datetimes                      |   8.6220 |   8.4914 |   1.0154 |
groupby_series_simple_cython                 | 163.5257 | 161.0154 |   1.0156 |
groupby_ngroups_100_any                      |   7.0360 |   6.9261 |   1.0159 |
groupby_frame_cython_many_columns            |   2.1270 |   2.0934 |   1.0161 |
groupby_frame_singlekey_integer              |   1.4733 |   1.4497 |   1.0163 |
sparse_frame_constructor                     |   4.5350 |   4.4603 |   1.0167 |
groupby_ngroups_10000_diff                   | 896.7590 | 881.9984 |   1.0167 |
eval_frame_mult_one_thread                   |   9.3827 |   9.2100 |   1.0188 |
join_dataframe_integer_key                   |   1.2490 |   1.2257 |   1.0190 |
groupby_ngroups_100_prod                     |   0.3770 |   0.3699 |   1.0191 |
groupby_ngroups_100_sum                      |   0.3773 |   0.3700 |   1.0198 |
groupby_ngroups_10000_any                    | 647.5333 | 634.9807 |   1.0198 |
groupby_ngroups_100_pct_change               |  32.0113 |  31.3670 |   1.0205 |
write_store_table_dc                         | 102.1880 | 100.1193 |   1.0207 |
timeseries_custom_bmonthend_incr             |   0.1470 |   0.1440 |   1.0210 |
packers_read_json_date_index                 | 139.0120 | 136.0533 |   1.0217 |
frame_to_csv_mixed                           | 495.9540 | 485.2450 |   1.0221 |
read_csv_standard                            |   9.2714 |   9.0706 |   1.0221 |
index_float64_mul                            |   1.7173 |   1.6800 |   1.0222 |
timeseries_custom_bday_cal_incr_n            |   0.0180 |   0.0176 |   1.0225 |
frame_iloc_dups                              |   0.1946 |   0.1903 |   1.0225 |
ctor_index_array_string                      |   0.0143 |   0.0140 |   1.0227 |
read_csv_infer_datetime_format_iso8601       |   1.4380 |   1.4053 |   1.0232 |
frame_ctor_dtindex_Nanox2                    |   0.8370 |   0.8173 |   1.0241 |
frame_dropna_axis0_any                       |  18.2440 |  17.8137 |   1.0242 |
query_store_table_wide                       |   7.3420 |   7.1683 |   1.0242 |
read_store_table                             |   1.9260 |   1.8797 |   1.0246 |
melt_dataframe                               |   1.5437 |   1.5063 |   1.0248 |
read_table_multiple_date_baseline            |  49.6910 |  48.4769 |   1.0250 |
read_csv_default_converter                   |   1.3626 |   1.3290 |   1.0253 |
strings_slice                                |   2.7270 |   2.6596 |   1.0253 |
timeseries_slice_minutely                    |   0.0413 |   0.0403 |   1.0256 |
timeseries_custom_bday_incr                  |   0.0126 |   0.0123 |   1.0258 |
groupby_pivot_table                          |  13.1516 |  12.8187 |   1.0260 |
packers_write_json_date_index                |  97.4703 |  94.9010 |   1.0271 |
sort_level_one                               |  10.4384 |  10.1577 |   1.0276 |
packers_write_hdf_store                      |  77.3354 |  75.2500 |   1.0277 |
packers_read_hdf_store                       |  23.9290 |  23.2810 |   1.0278 |
join_dataframe_index_single_key_bigger_sort  |   9.5987 |   9.3353 |   1.0282 |
frame_ctor_dtindex_Dayx1                     |   0.6866 |   0.6677 |   1.0284 |
frame_dropna_axis0_all                       |  50.6383 |  49.2280 |   1.0286 |
groupby_series_nth_none                      |   1.0450 |   1.0157 |   1.0289 |
groupby_simple_compress_timing               |  23.2643 |  22.5770 |   1.0304 |
strings_join_split                           |  30.0337 |  29.1383 |   1.0307 |
read_csv_vb                                  |  17.7174 |  17.1843 |   1.0310 |
frame_ctor_dtindex_MonthBeginx1              |   1.0637 |   1.0310 |   1.0317 |
read_store_table_mixed                       |   5.0686 |   4.9113 |   1.0320 |
groupby_transform_series2                    | 107.8140 | 104.4597 |   1.0321 |
read_store_mixed                             |   3.8270 |   3.7070 |   1.0324 |
indexing_panel_subset                        |   0.6994 |   0.6770 |   1.0330 |
index_int64_intersection                     |  11.6087 |  11.2347 |   1.0333 |
groupby_ngroups_10000_head                   |  62.0047 |  59.9450 |   1.0344 |
packers_read_pickle                          | 114.0067 | 110.1387 |   1.0351 |
indexing_dataframe_boolean_st                |  27.9946 |  27.0204 |   1.0361 |
datetimeindex_add_offset                     |   0.2019 |   0.1947 |   1.0371 |
read_csv_precise_converter                   |   1.3170 |   1.2689 |   1.0379 |
replace_large_dict                           | 9887.6210 | 9513.5167 |   1.0393 |
panel_shift_minor                            |   0.0714 |   0.0687 |   1.0394 |
groupby_dt_size                              |  20.5223 |  19.7107 |   1.0412 |
frame_float_equal                            |   1.4460 |   1.3880 |   1.0418 |
read_table_multiple_date                     | 111.1593 | 106.6784 |   1.0420 |
frame_ctor_dtindex_BYearEndx2                |   1.2523 |   1.2004 |   1.0433 |
groupby_dt_timegrouper_size                  |  16.5887 |  15.8983 |   1.0434 |
write_store_table_panel                      |  33.3457 |  31.8854 |   1.0458 |
multiindex_with_datetime_level_full          |   9.4700 |   9.0510 |   1.0463 |
dataframe_reindex                            |   0.2774 |   0.2650 |   1.0465 |
dtype_infer_datetime64                       |   6.9840 |   6.6653 |   1.0478 |
frame_iloc_big                               |   0.1233 |   0.1177 |   1.0479 |
packers_write_hdf_table                      |  53.4286 |  50.8950 |   1.0498 |
frame_reindex_upcast                         |   6.2157 |   5.9077 |   1.0521 |
eval_frame_chained_cmp_python_one_thread     |  45.9140 |  43.6314 |   1.0523 |
frame_fillna_many_columns_pad                |   3.4017 |   3.2300 |   1.0531 |
frame_add_no_ne                              |   4.7170 |   4.4734 |   1.0545 |
packers_write_pickle                         | 117.2147 | 111.0173 |   1.0558 |
frame_add_st                                 |   5.1017 |   4.8254 |   1.0573 |
dti_reset_index_tz                           |   4.9984 |   4.7223 |   1.0585 |
packers_write_stata_with_validation          |  38.0213 |  35.8077 |   1.0618 |
read_csv_thou_vb                             |  16.3200 |  15.3640 |   1.0622 |
sparse_series_to_frame                       | 102.7683 |  96.6427 |   1.0634 |
groupby_ngroups_100_head                     |   0.6347 |   0.5964 |   1.0642 |
datetime_index_union                         |   0.0443 |   0.0416 |   1.0649 |
packers_read_stata_with_validation           |  61.0547 |  57.2973 |   1.0656 |
timestamp_ops_diff1                          |  13.4957 |  12.6610 |   1.0659 |
indexing_dataframe_boolean_no_ne             |  75.4363 |  70.6543 |   1.0677 |
groupby_first_float64                        |   2.4737 |   2.3134 |   1.0693 |
reindex_frame_level_reindex                  |   0.6247 |   0.5829 |   1.0717 |
packers_read_stata                           |  51.3960 |  47.5416 |   1.0811 |
dtype_infer_uint32                           |   0.3657 |   0.3360 |   1.0882 |
timeseries_custom_bmonthend_incr_n           |   0.2033 |   0.1867 |   1.0890 |
groupby_frame_median                         |   5.4690 |   5.0160 |   1.0903 |
merge_2intkey_sort                           |  28.0894 |  25.7593 |   1.0905 |
stats_rank_average                           |  25.4117 |  23.3026 |   1.0905 |
lib_fast_zip_fillna                          |   9.8244 |   8.9866 |   1.0932 |
dataframe_resample_max_string                |   1.3886 |   1.2700 |   1.0934 |
frame_sort_index_by_columns                  |  29.9646 |  27.1937 |   1.1019 |
dtype_infer_int64                            |   0.6380 |   0.5790 |   1.1020 |
timeseries_custom_bday_decr                  |   0.0214 |   0.0193 |   1.1070 |
index_from_series_ctor                       |   0.0180 |   0.0157 |   1.1414 |
timeseries_to_datetime_iso8601               |   3.7277 |   3.2137 |   1.1599 |
timeseries_is_month_start                    |   3.0553 |   2.6130 |   1.1693 |
dtype_infer_timedelta64_1                    | 108.6977 |  92.8040 |   1.1713 |
frame_assign_timeseries_index                |   0.7353 |   0.6060 |   1.2134 |
series_getitem_pos_slice                     |   0.0480 |   0.0373 |   1.2878 |
reindex_daterange_backfill                   |   0.8203 |   0.5903 |   1.3896 |
reindex_daterange_pad                        |   0.8033 |   0.5674 |   1.4159 |
frame_get_numeric_data                       |   0.1733 |   0.0850 |   2.0402 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [6d15282] : Test passing values to notnull
Base   [0fe43a6] : Merge pull request #9120 from minrk/nbviewer-link

fix an nbviewer link in tutorials


                                        count       mean        std       min        25%        50%         75%         max
frame_dropna_axis1_any                      3  42.123092  57.443632  0.172891   9.387472  18.602053   63.098192  107.594331
frame_dropna_axis1_all                      3  62.232099  68.268287  0.375275  25.608621  50.841967   93.160510  135.479053
stats_rolling_mean                          3   0.662018   0.117045  0.529687   0.617042   0.704397    0.728184    0.751972
timeseries_1min_5min_ohlc                   3   0.730928   0.129332  0.615358   0.661079   0.706801    0.788713    0.870625
groupby_ngroups_100_std                     3   0.526194   0.246316  0.344674   0.386000   0.427326    0.616955    0.806584
frame_mask_bools                            3   3.591287   2.386549  0.866217   2.732457   4.598697    4.953822    5.308946
groupby_first_float32                       3   1.902050   0.889544  0.888186   1.577257   2.266328    2.408981    2.551635
join_dataframe_index_single_key_bigger      3   6.514730   4.902918  0.888890   4.833930   8.778969    9.327650    9.876331
dtype_infer_float64                         3   0.718731   0.160584  0.595013   0.627995   0.660976    0.780590    0.900204
datetime_index_intersection                 3   0.493892   0.357224  0.273625   0.287811   0.301997    0.604025    0.906053
groupby_ngroups_100_cumcount                3   0.694597   0.188116  0.559330   0.587185   0.615040    0.762230    0.909420
stats_rank_pct_average                      3  16.729240  13.748139  0.910415  12.196846  23.483276   24.638653   25.794029
timeseries_year_incr                        3   0.313748   0.518030  0.013987   0.014663   0.015338    0.463628    0.911917
groupby_last_float32                        3   2.062247   1.003462  0.912048   1.714039   2.516031    2.637347    2.758662
frame_ctor_nested_dict_int64                3  42.602704  36.208382  0.916103  30.792047  60.667992   63.446005   66.224019
groupby_last_object                         3   9.091931   7.091733  0.924740   6.792542  12.660344   13.175527   13.690710
join_non_unique_equal                       3   0.727030   0.178423  0.602643   0.624816   0.646989    0.789223    0.931458
read_parse_dates_iso8601                    3   1.064473   0.120680  0.932735   1.011869   1.091003    1.130342    1.169682
frame_fancy_lookup_all                      3   8.776382   6.806659  0.933118   6.595903  12.258689   12.698015   13.137341
dtype_infer_float32                         3   0.678060   0.222276  0.531356   0.550191   0.569026    0.751412    0.933799
frame_ctor_list_of_dict                     3  37.075973  31.348026  0.937234  27.147934  53.358634   55.145343   56.932052
reindex_fillna_backfill                     3   0.614456   0.279974  0.438372   0.453035   0.467698    0.702498    0.937298
frame_ctor_dtindex_BQuarterEndx1            3   1.100488   0.142900  0.940804   1.042567   1.144330    1.180331    1.216332
dataframe_resample_min_numpy                3   1.206950   0.233090  0.941820   1.120601   1.299381    1.339515    1.379649
frame_apply_ref_by_name                     3   8.189079   6.286229  0.941864   6.200448  11.459033   11.812687   12.166341
groupby_last_datetimes                      3   6.516501   4.834685  0.942841   4.986234   9.029627    9.303331    9.577036
strings_match                               3   3.462145   2.185150  0.943867   2.764408   4.584948    4.721284    4.857620
groupby_ngroups_10000_sum                   3   1.577970   0.550586  0.945202   1.393095   1.840989    1.894355    1.947721
read_store_table_wide                       3  10.473736   8.260989  0.946863   7.884077  14.821291   15.237172   15.653054
reindex_fillna_pad                          3   0.463499   0.419007  0.215610   0.221610   0.227610    0.587443    0.947277
...                                       ...        ...        ...       ...        ...        ...         ...         ...
packers_write_stata_with_validation         3  24.963612  20.729129  1.061820  18.434755  35.807689   36.914508   38.021326
read_csv_thou_vb                            3  10.915408   8.546486  1.062222   8.213117  15.364011   15.842001   16.319990
sparse_series_to_frame                      3  66.824793  57.033351  1.063385  48.853019  96.642653   99.705497  102.768342
groupby_ngroups_100_head                    3   0.765089   0.259773  0.596364   0.615517   0.634670    0.849451    1.064232
datetime_index_union                        3   0.383625   0.589990  0.041644   0.042995   0.044346    0.554616    1.064885
packers_read_stata_with_validation          3  39.805864  33.602632  1.065577  29.181443  57.297309   59.176008   61.054707
timestamp_ops_diff1                         3   9.074197   6.947911  1.065927   6.863454  12.660980   13.078332   13.495684
indexing_dataframe_boolean_no_ne            3  49.052756  41.625021  1.067681  35.860997  70.654313   73.045293   75.436274
groupby_first_float64                       3   1.952113   0.768736  1.069291   1.691333   2.313375    2.393524    2.473672
reindex_frame_level_reindex                 3   0.759794   0.270936  0.582933   0.603835   0.624736    0.848224    1.071711
packers_read_stata                          3  33.339555  28.003058  1.081073  24.311346  47.541618   49.468795   51.395973
dtype_infer_uint32                          3   0.596629   0.425989  0.336011   0.350833   0.365655    0.726938    1.088221
timeseries_custom_bmonthend_incr_n          3   0.492982   0.516211  0.186682   0.194987   0.203292    0.646133    1.088974
groupby_frame_median                        3   3.858441   2.407948  1.090310   3.053159   5.016009    5.242507    5.469004
merge_2intkey_sort                          3  18.313040  14.960627  1.090455  13.424877  25.759300   26.924332   28.089364
stats_rank_average                          3  16.601609  13.474336  1.090507  12.196571  23.302635   24.357160   25.411685
lib_fast_zip_fillna                         3   6.634735   4.817338  1.093219   5.039925   8.986632    9.405494    9.824355
dataframe_resample_max_string               3   1.250678   0.148543  1.093429   1.181703   1.269976    1.329303    1.388629
frame_sort_index_by_columns                 3  19.420069  15.924387  1.101895  14.147800  27.193705   28.579156   29.964606
dtype_infer_int64                           3   0.772986   0.286452  0.578960   0.608484   0.638008    0.869999    1.101990
timeseries_custom_bday_decr                 3   0.382562   0.627379  0.019312   0.020345   0.021378    0.564187    1.106996
index_from_series_ctor                      3   0.391704   0.649269  0.015736   0.016848   0.017961    0.579688    1.141414
timeseries_to_datetime_iso8601              3   2.700441   1.358650  1.159924   2.186824   3.213724    3.470699    3.727674
timeseries_is_month_start                   3   2.279203   0.986333  1.169287   1.891138   2.612988    2.834161    3.055334
dtype_infer_timedelta64_1                   3  67.557649  58.038920  1.171260  46.987647  92.804035  100.750844  108.697653
frame_assign_timeseries_index               3   0.851547   0.319954  0.605981   0.670632   0.735283    0.974330    1.213377
series_getitem_pos_slice                    3   0.457707   0.718942  0.037273   0.042637   0.048002    0.667924    1.287846
reindex_daterange_backfill                  3   0.933417   0.411469  0.590324   0.705322   0.820319    1.104963    1.389607
reindex_daterange_pad                       3   0.928851   0.437973  0.567357   0.685334   0.803312    1.109598    1.415885
frame_get_numeric_data                      3   0.766170   1.104248  0.084956   0.129143   0.173330    1.106777    2.040225

[527 rows x 8 columns]

jreback · 2014-12-22T23:55:18Z

the isnull determination is prob ok, but the vbenches are all very misleading. try this. In fact should add a mixed-type vbench.

In [1]: data = np.random.randn(10000, 1000)

In [2]: df = DataFrame(data)

In [3]: df.ix[50:1000,20:50] = np.nan

In [4]: df.ix[2000:3000] = np.nan

In [5]: df.ix[:,60:70] = np.nan

In [6]: df2 = df.copy()

In [7]: df2['foo'] = 'bar'

In [8]: %timeit df.dropna(how="any",axis=1)
10 loops, best of 3: 118 ms per loop

In [9]: %timeit df2.dropna(how="any",axis=1)
10 loops, best of 3: 159 ms per loop

In [10]: %timeit df2.dropna(how="any",axis=0)
1 loops, best of 3: 1.69 s per loop

In [11]: %timeit df2.dropna(how="any",axis=1)
10 loops, best of 3: 159 ms per loop

In [12]: %timeit DataFrame(df2.values).dropna(how="any",axis=1)
1 loops, best of 3: 671 ms per loop

In [13]: %timeit DataFrame(df2.values).dropna(how="any",axis=0)
1 loops, best of 3: 1.59 s per loop
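
For anyone rerunning this on a current pandas, where `.ix` has been removed, the session above can be approximated as below. This is only a sketch: the helper names `make_frames` and `best_ms`, the smaller frame size, and the switch to `.loc` are assumptions, not part of the original session, and the timings will not match the numbers quoted here.

```python
import timeit

import numpy as np
import pandas as pd


def make_frames(n_rows=1000, n_cols=100, seed=1234):
    """Build a float-only frame and a mixed-dtype copy (sizes are assumptions)."""
    rng = np.random.RandomState(seed)
    df = pd.DataFrame(rng.randn(n_rows, n_cols))
    # .loc slices are label-based and inclusive on this integer index
    df.loc[5:100, 20:50] = np.nan
    df.loc[200:300] = np.nan
    df.loc[:, 60:70] = np.nan
    df2 = df.copy()
    df2["foo"] = "bar"  # one string column makes the frame mixed-dtype
    return df, df2


def best_ms(func, number=3, repeat=3):
    """Best per-call time in milliseconds over a few repeats."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e3


df, df2 = make_frames()
for name, frame in [("float-only", df), ("mixed", df2)]:
    for axis in (0, 1):
        ms = best_ms(lambda: frame.dropna(how="any", axis=axis))
        print(f"{name:10s} axis={axis}: {ms:.2f} ms")
```

Comparing the `float-only` and `mixed` rows for each axis is the point of the exercise; the absolute values depend on the machine and pandas version.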

jreback · 2014-12-22T23:58:39Z

@qwhelan so I think you can use this perf fix, but only if not self._is_mixed_type, otherwise need to leave it alone on axis=1.
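
The branching idea in this comment can be sketched roughly as follows. `count_non_null` is a hypothetical helper, not the code that was merged, and `frame.dtypes.nunique() > 1` is used as an assumed stand-in for the private `_is_mixed_type` flag.

```python
import numpy as np
import pandas as pd


def count_non_null(frame, axis=0):
    """Hypothetical helper: count non-null cells along `axis`.

    A sketch of the branching idea only, not the code merged in this PR.
    """
    # Assumed stand-in for the private `_is_mixed_type` flag.
    mixed = frame.dtypes.nunique() > 1
    if mixed:
        # Ask pandas for the null mask directly, so the whole frame is not
        # first converted into a single object-dtype 2-D array.
        return frame.notna().sum(axis=axis)
    # Single dtype: checking the underlying 2-D array in one go is cheap.
    mask = pd.notna(frame.to_numpy())
    labels = frame.columns if axis == 0 else frame.index
    return pd.Series(mask.sum(axis=axis), index=labels)


floats = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 1.0]})
mixed = floats.assign(c=["x", None, "z"])
print(count_non_null(floats, axis=0).tolist())
print(count_non_null(mixed, axis=1).tolist())
```

Both branches should agree with `DataFrame.count` on the same axis; only the route taken to the null mask differs.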

qwhelan · 2014-12-23T00:02:45Z

@jreback Thanks for the pointer. I'll add a mixed-type case to the vb_suite and investigate this further in the next few days (getting on a plane in a few hours).

jreback · 2014-12-23T00:04:59Z

@qwhelan thanks! certainly can be perf improved.

qwhelan · 2014-12-23T19:13:13Z

Still have some cleanup to do, but here's a new vbench with the mixed-dtype case:

Invoked with :
--ncalls: 3
--repeats: 10


-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_dropna_axis0_any_mixed_dtypes          | 178.4977 | 1685.6616 |   0.1059 |
frame_dropna_axis0_all_mixed_dtypes          | 203.9243 | 1664.1789 |   0.1225 |
frame_dropna_axis1_any                       |  19.9773 | 104.9783 |   0.1903 |
frame_dropna_axis1_all                       |  52.2277 | 129.7480 |   0.4025 |
frame_dropna_axis0_all                       |  44.6427 |  45.8930 |   0.9728 |
frame_dropna_axis0_any                       |  17.4924 |  17.8483 |   0.9801 |
frame_dropna_axis1_all_mixed_dtypes          | 195.1770 | 198.6097 |   0.9827 |
frame_dropna_axis1_any_mixed_dtypes          | 170.0930 | 166.3067 |   1.0228 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [239000e] : mixed
Base   [099a02c] : Merge pull request #9061 from behzadnouri/nan-pivot

pivot & unstack with nan in the index

qwhelan · 2014-12-24T01:41:51Z

@jreback I think this patch is ready - let me know if there's anything I should address.

The axis information doesn't seem to be relevant as to which branch to take here, as notnull() is being done on the same axis in both cases. The axis argument is actually flipped here anyway: df.dropna(axis=0) leads to the axis == 1 path, which leads to the .values performance issue you noted.
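
To make the axis flip concrete: dropping rows needs a per-row count, which is a count along the other axis. The function below is a hypothetical re-expression of `dropna` in terms of `count`, written for illustration under that assumption; it is not the pandas implementation.

```python
import numpy as np
import pandas as pd


def dropna_via_count(frame, axis=0, how="any"):
    """Hypothetical sketch: express dropna through count on the opposite axis."""
    agg_axis = 1 - axis  # dropping rows (axis=0) counts across columns (axis=1)
    counts = frame.count(axis=agg_axis)
    if how == "any":
        # keep labels where every cell is non-null
        keep = counts == frame.shape[agg_axis]
    else:
        # how="all": keep labels with at least one non-null cell
        keep = counts > 0
    return frame.loc[keep] if axis == 0 else frame.loc[:, keep]


frame = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0], "c": [np.nan, np.nan, np.nan]}
)
print(list(dropna_via_count(frame, axis=1, how="any").columns))
print(list(dropna_via_count(frame, axis=0, how="all").index))
```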

jreback · 2014-12-24T01:53:34Z

@qwhelan ok this looks gr8! ping when green!

and feel free to look for more of these types of things. Any time .values is called, it is usually a red flag (as this is only applicable on a single-dtyped frame), except in a small number of cases.
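
A small illustration of why `.values` deserves a second look on mixed frames: with a single dtype the array keeps that dtype, but adding one string column forces the whole 2-D array to object. The frame names here are made up for the example.

```python
import numpy as np
import pandas as pd

single = pd.DataFrame(np.zeros((3, 2)))
mixed = single.copy()
mixed["foo"] = "bar"  # one string column alongside the floats

print(single.values.dtype)  # float64
print(mixed.values.dtype)   # object
```

Operations on the object array then run element by element in Python rather than on a contiguous float buffer, which is consistent with the slow mixed-dtype timings reported earlier in this thread.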

And of course not everything is profiled (though hopefully tested)

qwhelan · 2014-12-24T06:38:55Z

@jreback Travis is giving this a green - looks good to go.

I'm seeing some good candidates for investigation, so I'll try and find some time to dig in over the next few weeks.

Remove codepath asymmetry in dataframe count()

jreback · 2014-12-24T15:55:25Z

@qwhelan thank you sir!

jreback added the Performance (Memory or execution speed performance) label Dec 22, 2014

jreback added this to the 0.16.0 milestone Dec 22, 2014

qwhelan force-pushed the master branch from 84ad341 to 1f46547 Compare December 24, 2014 01:29

PERF: Utilize mixed dtype information in df.count()

49e27cc

qwhelan force-pushed the master branch from 1f46547 to 49e27cc Compare December 24, 2014 01:30

jreback added a commit that referenced this pull request Dec 24, 2014

Merge pull request #9136 from qwhelan/master

92d7cf7

Remove codepath asymmetry in dataframe count()

jreback merged commit 92d7cf7 into pandas-dev:master Dec 24, 2014

qwhelan mentioned this pull request Dec 28, 2014

PERF: Utilize mixed dtypes in df.count() with MultiIndexes #9163

Closed
