Closed
Description
.head() and .tail() are great tools for quick data interrogation, but when the data are sorted they are often far from representative. It would be great to have a simple command that pulls an arbitrary number of random rows and displays them, giving a more representative way to spot-check data.
It would behave something like:
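import numpy as np
import pandas as pd

def rand_rows(df, num_rows=5):
    # Draw distinct row positions (replace=False) so no row is shown twice;
    # cap at len(df) so asking for more rows than exist does not raise.
    positions = np.random.choice(len(df), size=min(num_rows, len(df)), replace=False)
    # Select by position, which also works when index labels are duplicated
    return df.iloc[positions]

a_data_frame = pd.DataFrame({'col1': range(10, 20), 'col2': range(20, 30)})
rand_rows(a_data_frame)
rand_rows(a_data_frame, 6)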
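For comparison, pandas 0.16.1 and later ships a built-in `DataFrame.sample` method that covers this spot-check use case. The sketch below reuses the frame from the example above. `sample` draws without replacement by default, and passing `random_state` makes a draw reproducible. The variable names are illustrative only.

```python
import pandas as pd

a_data_frame = pd.DataFrame({'col1': range(10, 20), 'col2': range(20, 30)})

# Five distinct random rows (sampling is without replacement by default)
five = a_data_frame.sample(n=5, random_state=0)

# Six rows; a fixed random_state makes the same draw repeat across runs
six = a_data_frame.sample(n=6, random_state=0)

# Alternatively, request a fraction of the frame instead of a row count
half = a_data_frame.sample(frac=0.5, random_state=0)

print(five)
```

Omitting `random_state` gives a fresh random draw on each call. That is usually what you want for interactive spot-checking, while a fixed seed is handy when the same rows must be shown again.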