ResourceMultiProc plugin and runtime profiler #1372
Merged
94 commits
b5a6024
Removed S3 datasink stuff
pintohutch 70ca457
Started adding in logic for num_threads and changed names of real mem…
pintohutch 36e1446
Added cmd-level threads and memory profiling
pintohutch 61b8d0c
Resolved conflicts from re-basing and pulling in nipype master
pintohutch a3c9be7
Merge branch 'nipy-master' into resource_multiproc
pintohutch 43c0d56
remove MultiProc, MultiprocPlugin is default
carolFrohlich 0bb6d79
change old namespaces
carolFrohlich a68e0e6
Added initial num_threads monitoring code
pintohutch 5dac574
Merged Carol's changes
pintohutch 97e7333
Manual merge of s3_datasink and resource_multiproc branch for cpac run
pintohutch 08a485d
Manual merge of s3_datasink and resource_multiproc branch for cpac run
pintohutch e5945e9
Changed resources fetching to its function and try-blocked it in case…
pintohutch 9cb7a68
Fixed pickling bug of instance method by passing profiling flag inste…
pintohutch 2de8786
Merge pull request #6 from nipy/master
pintohutch 6e3a7b5
Merged resource_multiproc into s3_multiproc
pintohutch fe0a352
Merged resource_multiproc into s3_multiproc
pintohutch 7cc0731
Merge branch 's3_multiproc' into resource_multiproc
pintohutch 7b3d19f
Re-pulled in changes from github
pintohutch 5733af9
Fixed crash related to try blocking
pintohutch 544dddf
Removed forcing of runtime_profile to be off:
pintohutch c074299
Made when result is None that the end stats are N/A
pintohutch a4e3ae6
Added try-blocks around the runtime profile stats in callback logger
pintohutch e25ac8c
Cleaned up some code and removed recursion from get_num_threads
pintohutch d714a03
Added check for runtime having 'get' attribute
pintohutch 27ee192
Removed print statements
pintohutch c99f834
Removed more print statements and touched up some code to be more lik…
pintohutch 07461cf
Added a fix for the recursive symlink bug (was happening because whil…
pintohutch 116a6a1
Removed node.run level profiling
pintohutch c1376c4
Updated keyword in result dictionary to runtime instead of cmd-level
pintohutch e3f54c1
Removed afni centrality interface (will do that in another branch)
pintohutch cbd08e0
Added back in the automaskinputspec
pintohutch 29bcd80
Added unit tests for runtime_profiler
pintohutch a170644
Added import and reduced resources used
pintohutch 250b6d3
Added runtime_profile to run by default unless the necessary packages…
pintohutch f4b0b73
Cleaned up some of the code to PEP8 and checked for errors
pintohutch 9d19e14
Changed memory parameters to be memory_gb to be more explicit, used r…
pintohutch 0388305
Added checks for python deps and added method using builtin std libra…
pintohutch 1e4ce5b
Fixed exception formatting and import error
pintohutch ace7368
Removed 'Error' from logger info message when memory_profiler or psut…
pintohutch a515c77
Added more code for debugging runtime profiler
pintohutch e1d19cb
improve thread draw algorithm
carolFrohlich 0fdc671
Wrote my own get memory function - seems to work much better
pintohutch 9ad5d24
Restructured unittests and num_threads logic
pintohutch a52395a
Ignored sleeping
pintohutch 2b0a6e2
Remove proc terminology from variable names
pintohutch 97d33bb
Updated dictionary names for results dict
pintohutch 36ded7b
Added recording timestamps
pintohutch 340a7b7
minor bugs
carolFrohlich 7062ec8
Just passed all unittests
pintohutch cf16091
Fixed a small bug in multiproc and added 90% of the user documentatio…
pintohutch 126fa5d
Merge pull request #9 from FCP-INDI/resource_multiproc
pintohutch 7c90d5a
Added some details about gantt chart
pintohutch 8fce738
partial commit to gantt chart
carolFrohlich 950bedb
Merge pull request #11 from FCP-INDI/resource_multiproc
pintohutch 13fddc8
Fixed up gantt chart to plot real time memory
pintohutch 54a2c63
Finished working prototype of gantt chart generator
pintohutch 8ca97c8
remove white space, add labels
carolFrohlich 6ac0bc3
Changed thread count logic
pintohutch 5c2f2c1
Merge branch 'debug_runtime_prof' of https://github.com/fcp-indi/nipy…
pintohutch 7a8383b
Experimented with process STATUS
pintohutch be1ec62
Added global watcher
pintohutch 6fe8391
Debug code
pintohutch fcaec79
Cleaned up debug code
pintohutch de6a7f1
Removed print debug statement
pintohutch 47003a4
Merge pull request #16 from nipy/master
pintohutch e937bdc
Finished documentation with gantt chart image
pintohutch 2683de8
Merge branch 'debug_runtime_prof' of https://github.com/fcp-indi/nipy…
pintohutch 7513f40
Updated docs with dependencies
pintohutch a0920b7
Merge pull request #17 from FCP-INDI/debug_runtime_prof
pintohutch 3814ae9
Moved gantt chart to proper images folder
pintohutch bdcffca
Merge pull request #18 from FCP-INDI/debug_runtime_prof
pintohutch 0189dd8
Fixed some failing unit tests
pintohutch de3b298
Modified thread-monitoring logic and ensured unit tests pass
pintohutch 6f4c2e7
Lowered memory usage for unit tests
pintohutch 0cca692
Changed runtime unit test tolerance to GB instead of %
pintohutch 3f7649c
Merge pull request #19 from nipy/master
pintohutch 8f1b104
Added AttributeError to exception for more specific error handling
pintohutch ffe30da
Removed unexpected indentation from resource scheduler and profiler u…
pintohutch cf08147
Updated docs to reflect png
pintohutch 35bdb2d
Fixed some errors
pintohutch d83d849
Reversed logic for afni OMP_NUM_THREADS
pintohutch b21bca6
Added traceback crash reporting
pintohutch 79d1988
More specific exception handling addressed
pintohutch fd788f8
Added safeguard for not taking 100% of system memory when plugin_arg …
pintohutch d6cafa2
Fix attribute error related to terminal_output attribute
pintohutch 9b46c49
Trying to disable some unittests for TravisCI debugging
pintohutch 46f3275
Commented out more unittests for TravisCI debugging
pintohutch 41c6928
Enabled runtime profiler testing option
pintohutch f0a3889
Added one test back in
pintohutch 0305930
Added threads unit test back in
pintohutch b566b22
Changed max memory to 1 GB
pintohutch e7eac16
Fixed some typos
pintohutch e600d57
Added missing mark
pintohutch 9ee881f
Merge branch 'resource_multiproc' of https://github.com/fcp-indi/nipy…
pintohutch
.. _resource_sched_profiler:

=============================================
Resource Scheduling and Profiling with Nipype
=============================================

The latest version of Nipype supports system resource scheduling and
profiling. These features allow users to ensure high throughput of their data
processing while also controlling the amount of computing resources a given
workflow will use.


Specifying Resources in the Node Interface
==========================================

Each ``Node`` instance interface has two parameters that specify its expected
thread and memory usage: ``num_threads`` and ``estimated_memory_gb``. If a
particular node is expected to use 8 threads and 2 GB of memory:

::

    import nipype.pipeline.engine as pe
    node = pe.Node()
    node.interface.num_threads = 8
    node.interface.estimated_memory_gb = 2

If these resource parameters are never set, they default to 1 thread and 1 GB
of RAM.


Resource Scheduler
==================

The ``MultiProc`` workflow plugin schedules node execution based on the
resources used by the currently running nodes and the total resources
available to the workflow. The plugin uses the plugin arguments ``n_procs``
and ``memory_gb`` to set the maximum resources a workflow may consume. To
limit a workflow to 8 cores and 10 GB of RAM:

::

    args_dict = {'n_procs': 8, 'memory_gb': 10}
    workflow.run(plugin='MultiProc', plugin_args=args_dict)

If these values are not explicitly set, the plugin assumes it can use all of
the processors and memory on the system. For example, if the machine has 16
cores and 12 GB of RAM, the workflow will internally assume those values for
``n_procs`` and ``memory_gb``, respectively.
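When these defaults apply, the totals the plugin falls back to can be queried
as sketched below. This is only an illustration using the standard library
(the memory query is POSIX-only); the plugin's actual detection code may use a
different mechanism such as psutil:

```python
import os

# Fall-back totals when 'n_procs' / 'memory_gb' are not passed in
# plugin_args. Illustrative sketch only -- not the plugin's actual code.
n_procs = os.cpu_count()

# Total physical memory in GB (POSIX-only sysconf query).
memory_gb = (os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
             / (1024.0 ** 3))

print('Defaulting to %d procs and %0.1f GB of RAM' % (n_procs, memory_gb))
```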
The plugin then queues eligible nodes for execution based on their expected
usage via the ``num_threads`` and ``estimated_memory_gb`` interface
parameters. If the plugin sees that only 3 of its 8 processors and 4 GB of its
10 GB of RAM are in use by running nodes, it will attempt to execute the next
available node as long as its ``num_threads <= 5`` and
``estimated_memory_gb <= 6``. If this is not the case, it will continue to
check every available node in the queue until it finds one that meets these
conditions, or it will wait for an executing node to finish and free up the
necessary resources. The queue gives highest priority to nodes with the
largest ``estimated_memory_gb``, followed by nodes with the most expected
``num_threads``.
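The eligibility check described above can be sketched in plain Python. This is
a simplified illustration, not the plugin's actual implementation; the
``pick_next_node`` helper and the dictionary layout of the queued nodes are
hypothetical:

```python
# Simplified sketch of the MultiProc eligibility check (illustrative only).
def pick_next_node(queue, free_procs, free_memory_gb):
    """Return the first queued node whose expected usage fits within the
    free resources, after prioritizing by estimated memory then threads."""
    queue.sort(key=lambda n: (n['estimated_memory_gb'], n['num_threads']),
               reverse=True)
    for node in queue:
        if (node['num_threads'] <= free_procs
                and node['estimated_memory_gb'] <= free_memory_gb):
            return node
    return None  # nothing fits; wait for a running node to finish


queue = [{'name': 'big', 'num_threads': 8, 'estimated_memory_gb': 8},
         {'name': 'small', 'num_threads': 2, 'estimated_memory_gb': 1}]

# With 5 free procs and 6 GB free, 'big' does not fit but 'small' does.
print(pick_next_node(queue, free_procs=5, free_memory_gb=6)['name'])
```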

Runtime Profiler and Using the Callback Log
===========================================

It is not always easy to estimate the amount of resources a particular
function or command uses. To help with this, Nipype provides feedback about
the system resources used by every node during workflow execution via the
built-in runtime profiler. The runtime profiler is automatically enabled if
the psutil_ Python package is installed and found on the system.

.. _psutil: https://pythonhosted.org/psutil/

If the package is not found, the workflow will run normally without the
runtime profiler.

The runtime profiler records the number of threads and the amount of memory
(GB) used as ``runtime_threads`` and ``runtime_memory_gb`` in the node's
``result.runtime`` attribute. Since the node object is pickled and written to
disk in its working directory, these values are available for analysis after
node or workflow execution by manually parsing the pickle file contents.
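The parsing pattern can be sketched as follows. Nipype typically stores node
results as gzipped pickle files in the node's working directory; the file name,
the temporary path, and the dictionary stand-in for the result object used
here are all assumptions for illustration:

```python
import gzip
import os
import pickle
import tempfile

# Stand-in for a node result: real result objects carry a ``runtime``
# attribute, but this plain-dict layout is only illustrative.
mock_result = {'runtime': {'runtime_threads': 3,
                           'runtime_memory_gb': 1.118469238281}}

# Write the mock result the way a gzipped result pickle might appear in a
# node working directory (path and filename are hypothetical).
pklz_path = os.path.join(tempfile.mkdtemp(), 'result_resample_node.pklz')
with gzip.open(pklz_path, 'wb') as fh:
    pickle.dump(mock_result, fh)

# Read the file back and pull out the recorded runtime statistics.
with gzip.open(pklz_path, 'rb') as fh:
    result = pickle.load(fh)

print(result['runtime']['runtime_threads'])    # threads observed at runtime
print(result['runtime']['runtime_memory_gb'])  # memory observed, in GB
```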
Nipype also provides a logging mechanism for saving node runtime statistics to
a JSON-style log file via the ``log_nodes_cb`` logger function. This is
enabled by setting the ``status_callback`` parameter to point to this function
in the ``plugin_args`` when using the ``MultiProc`` plugin:

::

    from nipype.pipeline.plugins.callback_log import log_nodes_cb
    args_dict = {'n_procs': 8, 'memory_gb': 10,
                 'status_callback': log_nodes_cb}

To set the filepath for the callback log, the ``'callback'`` logger must be
configured:

::

    # Set path to log file
    import logging
    callback_log_path = '/home/user/run_stats.log'
    logger = logging.getLogger('callback')
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(callback_log_path)
    logger.addHandler(handler)

Finally, the workflow can be run:

::

    workflow.run(plugin='MultiProc', plugin_args=args_dict)

After the workflow finishes executing, the log file at
``/home/user/run_stats.log`` can be parsed for the runtime statistics. Here is
an example of what the contents would look like:

::

    {"name":"resample_node","id":"resample_node",
    "start":"2016-03-11 21:43:41.682258",
    "estimated_memory_gb":2,"num_threads":1}
    {"name":"resample_node","id":"resample_node",
    "finish":"2016-03-11 21:44:28.357519",
    "estimated_memory_gb":"2","num_threads":"1",
    "runtime_threads":"3","runtime_memory_gb":"1.118469238281"}

Here it can be seen that the number of threads was underestimated while the
amount of memory needed was overestimated. The next time this workflow is run,
the user can change the node interface ``num_threads`` and
``estimated_memory_gb`` parameters to reflect this for higher pipeline
throughput. Note that the ``runtime_threads`` value is sometimes higher than
expected, particularly for multi-threaded applications. Tools can implement
multi-threading in different ways under the hood; the profiler merely
traverses the process tree to return all running threads associated with that
process, some of which may include active thread-monitoring daemons or
transient processes.
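Records like the ones above can be compared against their estimates with a
short script. This is a sketch: in practice the entries would be read from the
callback log file rather than the inline strings used here, and each record
would usually occupy a single line:

```python
import json

# Two records in the format shown above, one JSON object per entry.
# These inline strings stand in for lines read from the callback log.
log_entries = [
    '{"name":"resample_node","id":"resample_node",'
    '"start":"2016-03-11 21:43:41.682258",'
    '"estimated_memory_gb":2,"num_threads":1}',
    '{"name":"resample_node","id":"resample_node",'
    '"finish":"2016-03-11 21:44:28.357519",'
    '"estimated_memory_gb":"2","num_threads":"1",'
    '"runtime_threads":"3","runtime_memory_gb":"1.118469238281"}',
]

# Compare estimated resources against observed usage for finished nodes.
for entry in log_entries:
    record = json.loads(entry)
    if 'finish' not in record:
        continue  # only the 'finish' record carries runtime statistics
    est_mem = float(record['estimated_memory_gb'])
    run_mem = float(record['runtime_memory_gb'])
    est_thr = int(record['num_threads'])
    run_thr = int(record['runtime_threads'])
    print('%s: memory %0.2f GB estimated vs %0.2f GB used; '
          'threads %d estimated vs %d used'
          % (record['name'], est_mem, run_mem, est_thr, run_thr))
```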

Visualizing Pipeline Resources
==============================

Nipype provides the ability to visualize workflow execution based on the
runtime and system resources of each node. It does this using the log file
generated by the callback logger after workflow execution, as shown above.
The pandas_ Python package is required to use this feature.

.. _pandas: http://pandas.pydata.org/

::

    from nipype.pipeline.plugins.callback_log import log_nodes_cb
    args_dict = {'n_procs': 8, 'memory_gb': 10,
                 'status_callback': log_nodes_cb}
    workflow.run(plugin='MultiProc', plugin_args=args_dict)

    # ...workflow finishes and writes callback log to
    # '/home/user/run_stats.log'

    from nipype.utils.draw_gantt_chart import generate_gantt_chart
    generate_gantt_chart('/home/user/run_stats.log', cores=8)
    # ...creates gantt chart in '/home/user/run_stats.log.html'

The ``generate_gantt_chart`` function will create an HTML file that can be
viewed in a browser. Below is an example of the Gantt chart displayed in a
web browser. Note that when the cursor hovers over any particular node bubble
or resource bubble, additional information is shown in a pop-up.

.. image:: images/gantt_chart.png
   :width: 100 %
@@ -157,6 +157,11 @@ def __init__(self, **inputs):
         else:
             self._output_update()

+        # Update num threads estimate from OMP_NUM_THREADS env var
+        # Default to 1 if not set
+        import os
+        self.num_threads = int(os.getenv('OMP_NUM_THREADS', 1))
+
     def _output_update(self):
         """ i think? updates class private attribute based on instance input
         in fsl also updates ENVIRON variable....not valid in afni

Review comment: Shouldn't we do this the other way around? If the user sets num_threads to, say, 4, we would set the OMP_NUM_THREADS variable to 4? I think this is how it's done for ants: https://github.com/nipy/nipype/blob/master/nipype/interfaces/ants/base.py#L52
Review comment: Missing ` ?