
RuntimeError: Number of dpcpp devices should be greater than zero! #287

Open
@axel588


Hello,
I used the GPU configuration, and oneAPI is installed correctly.
I am in a Python virtual environment named ai_tr.
I have this issue with PyTorch. The two imports are at the top of the file:
import torch
import intel_extension_for_pytorch as ipex

I ran: source ${ONEAPI_ROOT}/setvars.sh
with this output:

(ai_tr) axel@Artishima:~/ai_tr/cod$ source ${ONEAPI_ROOT}/setvars.sh

:: WARNING: setvars.sh has already been run. Skipping re-execution.
   To force a re-execution of setvars.sh, use the '--force' option.
   Using '--force' can result in excessive use of your environment variables.

usage: source setvars.sh [--force] [--config=file] [--help] [...]
  --force        Force setvars.sh to re-run, doing so may overload environment.
  --config=file  Customize env vars using a setvars.sh configuration file.
  --help         Display this help message and exit.
  ...            Additional args are passed to individual env/vars.sh scripts
                 and should follow this script's arguments.

  Some POSIX shells do not accept command-line options. In that case, you can pass
  command-line options via the SETVARS_ARGS environment variable. For example:

  $ SETVARS_ARGS="ia32 --config=config.txt" ; export SETVARS_ARGS
  $ . path/to/setvars.sh

  The SETVARS_ARGS environment variable is cleared on exiting setvars.sh.

With --force:

(ai_tr) axel@Artishima:~/ai_tr/cod$ source ${ONEAPI_ROOT}/setvars.sh --force

:: initializing oneAPI environment ...
   -bash: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for setvars.sh arguments: --force
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vpl -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
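Since the script has to be launched from the same shell that sourced setvars.sh, it may help to confirm what the Python process actually sees. The following is only a minimal sketch: `oneapi_env_report` is a hypothetical helper name, and the check for "oneapi" in `LD_LIBRARY_PATH` assumes a default-style install path that contains that word. `SETVARS_COMPLETED` is the variable setvars.sh uses to detect that it has already been run.

```python
import os
from typing import Mapping


def oneapi_env_report(env: Mapping[str, str]) -> dict:
    """Summarize oneAPI-related variables found in an environment mapping.

    Hypothetical helper for illustration; it only reads the mapping it is given.
    """
    return {
        # Root directory that was used to locate setvars.sh.
        "ONEAPI_ROOT": env.get("ONEAPI_ROOT"),
        # Marker that setvars.sh exports once it has completed.
        "SETVARS_COMPLETED": env.get("SETVARS_COMPLETED"),
        # Assumption: the install path contains "oneapi" (e.g. /opt/intel/oneapi).
        "has_oneapi_lib_path": "oneapi" in env.get("LD_LIBRARY_PATH", "").lower(),
    }


if __name__ == "__main__":
    for key, value in oneapi_env_report(os.environ).items():
        print(f"{key}: {value}")
```

If any of these come back empty inside the virtual environment, the Python process was probably started from a shell where setvars.sh had not been sourced.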

But this error keeps appearing whenever I try to run my training Python file:
xpu
/home/axel/ai_tr/lib/python3.10/site-packages/intel_extension_for_pytorch/xpu/lazy_init.py:73: UserWarning: DPCPP Device count is zero! (Triggered internally at /build/intel-pytorch-extension/csrc/gpu/runtime/Device.cpp:120.)
  _C._initExtension()
/home/axel/ai_tr/lib/python3.10/site-packages/torch/nn/modules/module.py:985: UserWarning: dpcppSetDevice: device_id is out of range (Triggered internally at /build/intel-pytorch-extension/csrc/gpu/runtime/Device.cpp:159.)
  return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
Traceback (most recent call last):
  File "/home/axel/ai_tr/cod/train.py", line 190, in <module>
    m = model.to(device)
  File "/home/axel/ai_tr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in to
    return self._apply(convert)
  File "/home/axel/ai_tr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/home/axel/ai_tr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 662, in _apply
    param_applied = fn(param)
  File "/home/axel/ai_tr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 985, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: Number of dpcpp devices should be greater than zero!

Everything related to MKL is installed correctly, and the paths are set correctly and working. I am on Ubuntu 22.04 using torch 13.1 on WSL 2 on Windows 11, with the Intel drivers installed on Windows 11, on an Arc 770 with an i9 13900K.
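Because this setup runs under WSL 2, one thing that could be checked is whether the GPU is exposed to the Linux side at all. The sketch below rests on assumptions: WSL 2 normally exposes the host GPU through `/dev/dxg`, while native Linux drivers use `/dev/dri/renderD*` nodes. `gpu_device_nodes` is a hypothetical helper name, and finding a node does not prove that the compute runtime can use the device.

```python
from pathlib import Path


def gpu_device_nodes(dev_root: str = "/dev") -> list:
    """List GPU-related device nodes found under dev_root.

    Hypothetical helper for illustration. It looks for:
      - dxg            (WSL 2 GPU paravirtualization device, assumed present on WSL 2)
      - dri/renderD*   (render nodes created by native Linux GPU drivers)
    """
    root = Path(dev_root)
    nodes = []
    if (root / "dxg").exists():
        nodes.append(str(root / "dxg"))
    dri = root / "dri"
    if dri.is_dir():
        # Sort so the result is stable across runs.
        nodes.extend(sorted(str(p) for p in dri.glob("renderD*")))
    return nodes


if __name__ == "__main__":
    found = gpu_device_nodes()
    print("GPU device nodes:", found if found else "none found")
```

An empty result would suggest the problem sits below PyTorch, in the WSL or driver layer, rather than in the training script.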

The error is triggered here:

```
# The line below triggers the error
m = model.to(device)
m = ipex.optimize(m)
# print the number of parameters in the model
print(sum(p.numel() for p in m.parameters())/1e6, 'M parameters')

# create a PyTorch optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
```
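To make the failure easier to diagnose, the device count could be queried before the model is moved, instead of letting `model.to(device)` raise. This is only a sketch under assumptions: `pick_device` and `count_xpu_devices` are hypothetical helper names, and it assumes the GPU build of the extension registers `torch.xpu` with a `device_count()` function once it is imported.

```python
def pick_device(xpu_count: int) -> str:
    """Return "xpu" when at least one XPU device is reported, else "cpu"."""
    return "xpu" if xpu_count > 0 else "cpu"


def count_xpu_devices() -> int:
    """Best-effort XPU device count; returns 0 if anything is missing.

    Assumption: importing intel_extension_for_pytorch registers torch.xpu.
    """
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401
    except Exception:
        # torch or the extension is not installed, or failed to load.
        return 0
    xpu = getattr(torch, "xpu", None)
    if xpu is None:
        return 0
    try:
        return int(xpu.device_count())
    except Exception:
        return 0


if __name__ == "__main__":
    count = count_xpu_devices()
    print(f"XPU devices: {count}, using device: {pick_device(count)}")
```

With a count of zero this falls back to "cpu", which lets the script run while showing that the extension itself sees no device.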

Also, optimize is an odd name for this function.

It does seem to be an out-of-range issue, but I have no idea how to solve it.

Labels: ARC (ARC GPU), Crash (Execution crashes)
