Hello,
I am trying to implement model parallelism with PyTorch in my HPC environment, which has 4 GPUs available. My goal is to split a neural network model across these GPUs to improve training efficiency.
Here's what I've tried so far:
- Followed the PyTorch documentation on model parallelism
- Implemented a basic split of the model across GPUs
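For reference, my split looks roughly like the sketch below (the class, layer sizes, and device choices are placeholders, not my actual model): each half of the network is pinned to a different device and activations are moved between devices explicitly in `forward`.

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    """Toy two-stage model split across two GPUs (placeholder layer sizes)."""
    def __init__(self):
        super().__init__()
        # First stage lives on GPU 0
        self.part1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to('cuda:0')
        # Second stage lives on GPU 1
        self.part2 = nn.Sequential(nn.Linear(1024, 10)).to('cuda:1')

    def forward(self, x):
        # Activations must be copied to the device that owns the next stage
        x = self.part1(x.to('cuda:0'))
        x = self.part2(x.to('cuda:1'))
        return x

model = SplitModel()
out = model(torch.randn(64, 1024))  # output tensor ends up on cuda:1
loss = out.sum()
loss.backward()                     # autograd handles the cross-device graph
```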
However, I am running into performance bottlenecks and the GPUs are underutilized. Can someone guide me on how to implement this efficiently in my HPC setup?
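One direction I have been looking at is pipelining, i.e. splitting each batch into micro-batches so that the stages can work on different micro-batches at the same time instead of sitting idle. Below is a minimal sketch of that idea, assuming the hypothetical two-stage `SplitModel` above (with `part1` on `cuda:0` and `part2` on `cuda:1`); I am not sure whether this is the right way to scale it to all 4 GPUs.

```python
import torch

def pipelined_forward(model, x, split_size=16):
    """Stream micro-batches through the two stages so cuda:0 and cuda:1 overlap."""
    splits = iter(x.split(split_size, dim=0))
    s_next = next(splits)
    # Run stage 1 on the first micro-batch and move its output to cuda:1
    s_prev = model.part1(s_next.to('cuda:0')).to('cuda:1')
    outputs = []
    for s_next in splits:
        # Stage 2 (cuda:1) processes the previous micro-batch while
        # stage 1 (cuda:0) starts the next one; CUDA kernel launches are
        # asynchronous, so the two devices can run concurrently
        outputs.append(model.part2(s_prev))
        s_prev = model.part1(s_next.to('cuda:0')).to('cuda:1')
    # Drain the last micro-batch through stage 2
    outputs.append(model.part2(s_prev))
    return torch.cat(outputs, dim=0)
```

Even with this, I am unsure how to keep all 4 GPUs busy rather than just two, or whether the communication between devices is what is limiting me.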
Any advice or pointers to resources would be greatly appreciated!