🐛 Bug
I have a single machine and use 4 GPUs.
nn.parallel.DistributedDataParallel fails on the nightly build.
The same code runs correctly with PyTorch 1.4.
Traceback (most recent call last):
  File "lstm_toy.py", line 72, in <module>
    model = nn.parallel.DistributedDataParallel(model, find_unused_parameters=True, check_reduction=True)
  File "/usr/local/anaconda3/envs/torch1.5/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 305, in __init__
    self._ddp_init_helper()
  File "/usr/local/anaconda3/envs/torch1.5/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 382, in _ddp_init_helper
    expect_sparse_gradient)
RuntimeError: Model replicas must have an equal number of parameters.
To Reproduce
Run the code below from the command line:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import time
import os
import sys
import traceback

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'

import faulthandler
faulthandler.enable()

# Optional third-party GPU memory tracker; not required to reproduce the bug.
from gpu_mem_track import MemTracker
import inspect
frame = inspect.currentframe()
gpu_tracker = MemTracker(frame)

torch.backends.cudnn.benchmark = True

BATCH_SIZE = 4
INPUT_DIM = 2048
OUTPUT_DIM = 5000
EPOCHS = 10000
HIDDEN_DIM = 2048
N_LAYERS = 5
SEQ_LEN = 2000

class Net(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, hidden_layers):
        super(Net, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.hidden_layers = hidden_layers
        self.lstm = nn.LSTM(input_dim, hidden_dim, hidden_layers, batch_first=True)
        self.h2o = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, y):
        # Compact the LSTM weights into a single contiguous chunk;
        # the model returns the loss directly.
        self.lstm.flatten_parameters()
        h_t, _ = self.lstm(x)
        output = self.h2o(h_t)
        loss = F.mse_loss(output, y)
        return loss

X_data = torch.randn((BATCH_SIZE, SEQ_LEN, INPUT_DIM)).cuda()
Y_data = torch.rand((BATCH_SIZE, SEQ_LEN, OUTPUT_DIM)).cuda()

model = Net(INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM, N_LAYERS)

if 0:  # DataParallel path (works)
    model = nn.DataParallel(model)
    model.cuda()
else:  # single-process DistributedDataParallel path (fails on the nightly build)
    torch.distributed.init_process_group(backend='nccl',
                                         init_method='tcp://localhost:' + str(np.random.randint(100, 60000)),
                                         rank=0, world_size=1)
    model.cuda()
    model = nn.parallel.DistributedDataParallel(model, find_unused_parameters=True, check_reduction=True)
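For comparison, here is a minimal sketch (not from the original report) of the usual one-process-per-GPU setup, which binds DDP to a single device and so never replicates the model inside one process. It assumes the script is launched with python -m torch.distributed.launch --nproc_per_node=4 lstm_toy.py, that the launcher passes a --local_rank argument, and that Net and the constants are reused from the repro above; the deprecated check_reduction flag is dropped.

# Hypothetical workaround sketch: one process per GPU, single device per DDP instance.
import argparse
import torch
import torch.nn as nn
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # supplied by torch.distributed.launch
args = parser.parse_args()

# The launcher sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE in the environment.
dist.init_process_group(backend='nccl', init_method='env://')
torch.cuda.set_device(args.local_rank)

model = Net(INPUT_DIM, HIDDEN_DIM, OUTPUT_DIM, N_LAYERS).cuda(args.local_rank)
model = nn.parallel.DistributedDataParallel(
    model,
    device_ids=[args.local_rank],   # exactly one device per process: no in-process replicas
    output_device=args.local_rank,
    find_unused_parameters=True)

Whether this sidesteps the "Model replicas must have an equal number of parameters" error in the nightly build is untested here; it is offered only as the configuration the DDP documentation recommends.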
Environment
PyTorch version: 1.6.0.dev20200408+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 6.5.0-2ubuntu1~16.04) 6.5.0 20181026
CMake version: version 3.5.1
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.105
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti
Nvidia driver version: 435.21
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] numpy==1.13.3
[conda] torch 1.6.0.dev20200408+cu101 pypi_0 pypi
[conda] torchvision 0.6.0.dev20200408+cu101 pypi_0 pypi
[conda] warprnnt-pytorch 0.1 pypi_0 pypi
Additional context
Thanks
Meixitu
cc @ezyang @gchanan @zou3519 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar