Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
cnt = 10000000
np.random.seed(1234)
d_orig = pd.DataFrame({'timestamp': np.random.randint(0, np.iinfo(np.int64).max, cnt),
                       'user_id': np.random.randint(0, np.iinfo(np.int64).max, cnt)})
d = d_orig.copy()
%timeit d.sort_values(by=['user_id', 'timestamp'], inplace=True)
def sort_one_by_one(d, col1, col2):
    """
    Equivalent to d.sort_values(by=[col1, col2], inplace=True), but faster.
    """
    d.sort_values(by=[col2], inplace=True)
    # The stable sort (mergesort) on col1 keeps the already-sorted order of col2
    # within each group of equal col1 values.
    d.sort_values(by=[col1], kind='mergesort', inplace=True)
d = d_orig.copy()
%timeit sort_one_by_one(d, 'user_id', 'timestamp')
Problem description
I have a timestamped dataset with user ids and other information. I need to process the dataset sequentially (with numba), and for that I need it sorted by user_id and then by timestamp within each user_id.
The first and obvious approach is:
data.sort_values(by=['user_id', 'timestamp'], inplace=True)
I'm using inplace=True because the dataset is huge (it fits into RAM, but occupies roughly a third of the machine's RAM), and I hope this keeps memory usage from blowing up during processing.
The thing is, this direct approach is slow. I noticed that sorting first by one column and then by the other with a stable sort (mergesort) is much faster. Depending on the data I have seen up to a 4x reduction in processing time; with the seed-1234 data above it is about 3x.
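To illustrate why the stable second pass gives the same ordering as the direct two-key sort, here is a minimal sketch on a tiny made-up frame (the values are purely illustrative, not from the dataset above):
import pandas as pd

small = pd.DataFrame({'user_id':   [2, 1, 2, 1, 3],
                      'timestamp': [5, 9, 1, 3, 7]})

direct = small.sort_values(by=['user_id', 'timestamp'])

two_pass = small.sort_values(by=['timestamp'])
two_pass = two_pass.sort_values(by=['user_id'], kind='mergesort')  # stable sort

assert direct.equals(two_pass)  # same row order either way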
I think my solution works (I verified that the result is non-decreasing in user_id and non-decreasing in timestamp within each user_id).
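For reference, a sketch of that check, reusing d_orig and sort_one_by_one from the code sample above (the groupby pass is itself slow on the full dataset; it is only a correctness check):
d = d_orig.copy()
sort_one_by_one(d, 'user_id', 'timestamp')

# user_id must be globally non-decreasing
assert d['user_id'].is_monotonic_increasing
# timestamp must be non-decreasing within each user_id
assert d.groupby('user_id')['timestamp'].apply(lambda s: s.is_monotonic_increasing).all()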
Am I missing something? Could this method behave worse somewhere, or in some situation? It behaves similarly on both small and large data (raise or lower the cnt variable).
Would you consider this an enhancement and performance speedup? (It is very easy to implement ;)
Output
1 loop, best of 3: 14.9 s per loop
1 loop, best of 3: 4.96 s per loop
Output of pd.show_versions()
pandas: 0.20.3
pytest: 3.1.3
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None