Commit 091df3e

Jean-Mathieu Deschenes authored and committed
PERF: Released the GIL from parts of the TextReader class
The GIL was released around the tokenizer functions and the conversion function (_string_convert excluded).
1 parent eb66bcc commit 091df3e

3 files changed, +239 -84 lines changed

asv_bench/benchmarks/gil.py

Lines changed: 54 additions & 1 deletion
@@ -1,15 +1,24 @@
 from .pandas_vb_common import *
 from pandas.core import common as com
+
+try:
+    from cStringIO import StringIO
+except ImportError:
+    from io import StringIO
+
 try:
     from pandas.util.testing import test_parallel
+
     have_real_test_parallel = True
 except ImportError:
     have_real_test_parallel = False
 
+
     def test_parallel(num_threads=1):
 
         def wrapper(fname):
             return fname
+
         return wrapper
 
 
@@ -321,6 +330,7 @@ def run(arr):
             algos.kth_smallest(arr, self.k)
         run()
 
+
 class nogil_datetime_fields(object):
     goal_time = 0.2
 
@@ -435,4 +445,47 @@ def time_nogil_rolling_std(self):
         @test_parallel(num_threads=2)
         def run(arr, win):
             rolling_std(arr, win)
-        run(self.arr, self.win)
+        run(self.arr, self.win)
+
+
+class nogil_read_csv(object):
+    number = 1
+    repeat = 5
+
+    def setup(self):
+        if (not have_real_test_parallel):
+            raise NotImplementedError
+        # Using the values
+        self.df = DataFrame(np.random.randn(10000, 50))
+        self.df.to_csv('__test__.csv')
+
+        self.rng = date_range('1/1/2000', periods=10000)
+        self.df_date_time = DataFrame(np.random.randn(10000, 50), index=self.rng)
+        self.df_date_time.to_csv('__test_datetime__.csv')
+
+        self.df_object = DataFrame('foo', index=self.df.index, columns=self.create_cols('object'))
+        self.df_object.to_csv('__test_object__.csv')
+
+    def create_cols(self, name):
+        return [('%s%03d' % (name, i)) for i in range(5)]
+
+    @test_parallel(num_threads=2)
+    def pg_read_csv(self):
+        read_csv('__test__.csv', sep=',', header=None, float_precision=None)
+
+    def time_nogil_read_csv(self):
+        self.pg_read_csv()
+
+    @test_parallel(num_threads=2)
+    def pg_read_csv_object(self):
+        read_csv('__test_object__.csv', sep=',')
+
+    def time_nogil_read_csv_object(self):
+        self.pg_read_csv_object()
+
+    @test_parallel(num_threads=2)
+    def pg_read_csv_datetime(self):
+        read_csv('__test_datetime__.csv', sep=',', header=None)
+
+    def time_nogil_read_csv_datetime(self):
+        self.pg_read_csv_datetime()
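
The benchmarks above depend on `test_parallel(num_threads=2)`, whose real implementation lives in `pandas.util.testing` and is not part of this diff. As a rough illustration only, a decorator of that kind could be sketched with the standard library as below. The name `run_in_threads` and its behaviour (call the function once per thread, then join) are assumptions made for this sketch, not the pandas helper itself.

```python
import threading
from functools import wraps


def run_in_threads(num_threads=2):
    """Hypothetical stand-in for a test_parallel-style decorator.

    The decorated function is called once in each of `num_threads`
    threads with the same arguments, and the wrapper returns only
    after every thread has finished.
    """
    def decorator(func):
        @wraps(func)
        def inner(*args, **kwargs):
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
        return inner
    return decorator


calls = []


@run_in_threads(num_threads=2)
def record(tag):
    # list.append is atomic in CPython, so no lock is needed here.
    calls.append(tag)


record("parsed")
```

With a helper like this, a benchmark body such as `read_csv('__test__.csv')` runs concurrently in every thread, so its wall-clock time only improves if the parser actually releases the GIL.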

doc/source/whatsnew/v0.17.1.txt

Lines changed: 1 addition & 0 deletions
@@ -60,6 +60,7 @@ Performance Improvements
 
 - Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
 - Release the GIL on some srolling algos (``rolling_median``, ``rolling_mean``, ``rolling_max``, ``rolling_min``, ``rolling_var``, ``rolling_kurt``, `rolling_skew`` (:issue:`11450`)
+- Release the GIL when reading and parsing text files in ``read_csv``, ``read_table`` (:issue:`11272`)
 - Improved performance of ``rolling_median`` (:issue:`11450`)
 
 - Improved performance to ``to_excel`` (:issue:`11352`)
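
One way to check whether a call benefits from a GIL release like the ``read_csv`` entry above is to compare sequential and threaded wall-clock time for the same workload. The sketch below uses only the standard library. `time_sequential`, `time_threaded`, and `fake_parse` are hypothetical names, and `time.sleep` (which releases the GIL) merely stands in for a parser section that runs without the GIL.

```python
import threading
import time


def time_sequential(func, n):
    """Call func n times in a row and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        func()
    return time.perf_counter() - start


def time_threaded(func, n):
    """Call func once in each of n threads and return the elapsed seconds."""
    threads = [threading.Thread(target=func) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def fake_parse():
    # Assumption for illustration: sleeping releases the GIL, much as a
    # nogil section of a parser would while it works on C-level buffers.
    time.sleep(0.05)


sequential = time_sequential(fake_parse, 4)
threaded = time_threaded(fake_parse, 4)
```

To try this against pandas itself, `fake_parse` could be replaced by a function that calls `read_csv` on a prepared file. If the threaded time is close to the sequential time, the workload is probably still holding the GIL.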
