Description
Add Link
https://pytorch.org/tutorials/intermediate/ensembling.html
https://pytorch.org/docs/stable/notes/extending.func.html#defining-the-vmap-staticmethod
🐛 Describe the bug
I want to use vmap to vectorize an ensemble of models built on torch.autograd.Function, where the Function's forward/backward call into custom CUDA functions.
First, I set `generate_vmap_rule=True`, which asks PyTorch to generate the vmap rule automatically.
error: `RuntimeError: Cannot access data pointer of Tensor that doesn't have storage`
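For reference, `generate_vmap_rule=True` only works when `forward` is composed entirely of PyTorch operations; a minimal sketch where it does work, using a made-up `Square` Function (not my real model), assuming PyTorch >= 2.0:

```python
import torch
from torch.func import vmap

class Square(torch.autograd.Function):
    # ask PyTorch to derive the vmap rule from forward's PyTorch ops
    generate_vmap_rule = True

    @staticmethod
    def forward(x):
        return x ** 2

    @staticmethod
    def setup_context(ctx, inputs, output):
        x, = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad):
        x, = ctx.saved_tensors
        return 2 * x * grad

x = torch.randn(5, 3)
out = vmap(Square.apply)(x)  # vmaps over dim 0
```

Under vmap the inputs are batched-tensor wrappers without real storage, so a forward that takes a raw data pointer to pass to a custom CUDA kernel fails with exactly the "Cannot access data pointer" error above.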
Because the model calls into custom CUDA code, I need to write my own vmap staticmethod:
```python
@staticmethod
def vmap(info, in_dims, input):
    if in_dims[0] is not None:
        input_B = input.shape[0]
        input = einops.rearrange(input, 'B N C -> (B N) C')
    outputs, _, _ = model.apply(input)
    if in_dims[0] is not None:
        outputs = einops.rearrange(outputs, '(B N) C -> B N C', B=input_B)
    return outputs, 0
```
error: `RuntimeError: CUDA error: an illegal memory access was encountered. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.`
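For comparison, a minimal vmap staticmethod following the signature from the linked docs (`vmap(info, in_dims, *args)` returning `(output, out_dims)`), using a made-up pure-PyTorch `ScaleBy2` op rather than the real CUDA model:

```python
import torch
from torch.func import vmap

class ScaleBy2(torch.autograd.Function):
    @staticmethod
    def forward(x):
        return x * 2

    @staticmethod
    def setup_context(ctx, inputs, output):
        pass  # backward needs nothing saved

    @staticmethod
    def backward(ctx, grad):
        return grad * 2

    @staticmethod
    def vmap(info, in_dims, x):
        # in_dims has one entry per forward argument
        x_bdim, = in_dims
        if x_bdim is None:
            return ScaleBy2.apply(x), None
        # move the batch dim to the front and run the op on the whole batch
        out = ScaleBy2.apply(x.movedim(x_bdim, 0))
        return out, 0  # output is batched along dim 0

x = torch.randn(4, 3)
out = vmap(ScaleBy2.apply)(x)
```

Whether this flatten-and-reapply pattern is safe for a real kernel depends on the kernel's assumptions about shapes and memory layout.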
How should I write the vmap staticmethod so that multiple models each process their own batch of data while calling into CUDA?
The code follows; I have simplified the model class:
```python
class model(torch.autograd.Function):
    @staticmethod
    def forward(...):
        ...  # calls into the custom CUDA forward

    @staticmethod
    def setup_context(ctx, inputs, output):
        ...

    @staticmethod
    def backward(ctx, grad):
        ...  # calls into the custom CUDA backward

    @staticmethod
    def vmap(info, in_dims, input):
        ...  # as shown above
```
```python
import copy
import torch
from torch.func import stack_module_state, functional_call, vmap

b_p = torch.randn([10, 100, 3]).cuda()
objs = [model() for i in range(10)]
pe_models = [obj.pe for obj in objs]  # pe: the nn.Module held by each object
pe_param, pe_buffer = stack_module_state(pe_models)
base_model = copy.deepcopy(pe_models[0])

def fmodel(params, buffers, x):
    return functional_call(base_model, (params, buffers), (x,))

out = vmap(fmodel)(pe_param, pe_buffer, b_p)
```
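For reference, here is a CPU-only sketch of the stacking pattern from the linked ensembling tutorial, using plain `nn.Linear` models in place of the real `pe` modules (an assumption for illustration):

```python
import copy
import torch
import torch.nn as nn
from torch.func import stack_module_state, functional_call, vmap

models = [nn.Linear(3, 2) for _ in range(10)]
# stacks each parameter/buffer along a new leading dim of size 10
params, buffers = stack_module_state(models)
# a stateless "skeleton" on the meta device; real weights come from params/buffers
base = copy.deepcopy(models[0]).to('meta')

def fmodel(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(10, 100, 3)  # one batch of 100 points per model
out = vmap(fmodel)(params, buffers, x)
```

vmap here never enters a custom autograd.Function, so it works out of the box; once a forward calls a custom Function, that Function needs a valid vmap rule.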
Describe your environment
Versions:
- PyTorch 2.0
- CUDA 11.7
- Python 3.8
- Ubuntu 20.04
collect_env.py errors out; I will update later.
cc @albanD