Pandas creates a large number of unnecessary threads #9394

Closed
@thomasj02

Simply importing pandas creates a large number of threads. In the session below, 23 threads are started and 16 are still alive when the prompt returns:

$ gdb python
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/lib/debug//usr/bin/python2.7...done.
done.
(gdb) run
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
[New Thread 0x7ffff42b6700 (LWP 19093)]
[New Thread 0x7ffff3ab5700 (LWP 19094)]
[New Thread 0x7ffff12b4700 (LWP 19095)]
[New Thread 0x7fffeeab3700 (LWP 19096)]
[New Thread 0x7fffec2b2700 (LWP 19097)]
[New Thread 0x7fffe9ab1700 (LWP 19098)]
[New Thread 0x7fffe72b0700 (LWP 19099)]
[Thread 0x7ffff42b6700 (LWP 19093) exited]
[Thread 0x7fffe72b0700 (LWP 19099) exited]
[Thread 0x7ffff3ab5700 (LWP 19094) exited]
[Thread 0x7fffec2b2700 (LWP 19097) exited]
[Thread 0x7ffff12b4700 (LWP 19095) exited]
[Thread 0x7fffe9ab1700 (LWP 19098) exited]
[Thread 0x7fffeeab3700 (LWP 19096) exited]
[New Thread 0x7fffe72b0700 (LWP 19103)]
[New Thread 0x7fffe9ab1700 (LWP 19104)]
[New Thread 0x7fffec2b2700 (LWP 19105)]
[New Thread 0x7fffeeab3700 (LWP 19106)]
[New Thread 0x7ffff3cd1700 (LWP 19107)]
[New Thread 0x7ffff12b4700 (LWP 19108)]
[New Thread 0x7fffde6e8700 (LWP 19109)]
[New Thread 0x7fffddee7700 (LWP 19110)]
[New Thread 0x7fffdac85700 (LWP 19111)]
[New Thread 0x7fffda484700 (LWP 19112)]
[New Thread 0x7fffd9c83700 (LWP 19113)]
[New Thread 0x7fffd9482700 (LWP 19114)]
[New Thread 0x7fffd8c81700 (LWP 19115)]
[New Thread 0x7fffd8480700 (LWP 19116)]
[New Thread 0x7fffd7c7f700 (LWP 19117)]
[New Thread 0x7fffd747e700 (LWP 19118)]
>>> 

Setting OMP_NUM_THREADS=1 and NUMEXPR_NUM_THREADS=1 reduces the number of threads created, but several threads, apparently from blosc, are still started.
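The same limits can also be applied from inside Python, as long as that happens before pandas is first imported. The following is only a minimal sketch for measuring the effect without gdb: `count_native_threads`, `limit_thread_pools` and `threads_added_by_import` are hypothetical helper names, not part of pandas, and the thread count relies on the Linux `/proc/self/task` directory (with a fallback that only sees Python-level threads).

```python
import importlib
import os
import threading


def count_native_threads():
    """Return the number of OS-level threads in this process.

    On Linux every thread has an entry under /proc/self/task. Elsewhere we
    fall back to threading.active_count(), which only sees Python threads.
    """
    task_dir = "/proc/self/task"
    if os.path.isdir(task_dir):
        return len(os.listdir(task_dir))
    return threading.active_count()


def limit_thread_pools(n=1):
    """Set the two variables from the report; call before importing pandas."""
    for name in ("OMP_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
        os.environ[name] = str(n)


def threads_added_by_import(module_name):
    """Import a module and return how many threads are alive afterwards
    compared with before (threads that start and exit are not counted)."""
    before = count_native_threads()
    importlib.import_module(module_name)
    return count_native_threads() - before
```

Calling `limit_thread_pools(1)` followed by `threads_added_by_import("pandas")` in a fresh interpreter gives a rough count of the threads that survive the import. A module that is already imported adds nothing, so the measurement has to be the first import in the process.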

More seriously, importing even a single small pandas component still creates eight threads with both variables set:

$ export OMP_NUM_THREADS=1
$ export NUMEXPR_NUM_THREADS=1
$ gdb python
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/lib/debug//usr/bin/python2.7...done.
done.
(gdb) run
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas.tslib import Timestamp
[New Thread 0x7fffecca9700 (LWP 19238)]
[New Thread 0x7fffec4a8700 (LWP 19239)]
[New Thread 0x7fffebca7700 (LWP 19240)]
[New Thread 0x7fffeb4a6700 (LWP 19241)]
[New Thread 0x7fffeaca5700 (LWP 19242)]
[New Thread 0x7fffea4a4700 (LWP 19243)]
[New Thread 0x7fffe9ca3700 (LWP 19244)]
[New Thread 0x7fffe94a2700 (LWP 19245)]

This is wasteful, and it makes it hard to optimize thread handling on multicore systems with thread pinning or other scheduling techniques.
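To illustrate the pinning problem: on Linux, `os.sched_setaffinity(0, ...)` only affects the calling thread and any threads it creates later, so threads that a library already started at import time keep their old CPU mask. A rough workaround sketch, assuming Linux and `/proc/self/task` (the helper name `pin_all_threads` is made up for this example), is to walk every existing thread and pin each one:

```python
import os


def pin_all_threads(cpus):
    """Pin every existing thread of this process to `cpus` (Linux only).

    Returns the number of threads whose affinity was set.
    """
    pinned = 0
    for tid in os.listdir("/proc/self/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
            pinned += 1
        except ProcessLookupError:
            # The thread exited between listing and pinning; skip it.
            pass
    return pinned
```

For example, `pin_all_threads({0})` after `import pandas` would move all surviving threads onto CPU 0. Any thread a library starts afterwards from an already pinned thread inherits that mask.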
