Description
What happened?
The SYCL build has been crashing since b3805, with this output:
llama_kv_cache_init: SYCL0 KV buffer size = 2688.00 MiB
llama_new_context_with_model: KV self size = 2688.00 MiB, K (f16): 1344.00 MiB, V (f16): 1344.00 MiB
llama_new_context_with_model: SYCL_Host output buffer size = 0.98 MiB
llama_new_context_with_model: SYCL0 compute buffer size = 507.00 MiB
llama_new_context_with_model: SYCL_Host compute buffer size = 39.01 MiB
llama_new_context_with_model: graph nodes = 1690
llama_new_context_with_model: graph splits = 2
llama_init_from_gpt_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
MKL Warning: Incompatible OpenCL driver version. GPU performance may be reduced.
Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)
Exception caught at file:D:/a/llama.cpp/llama.cpp/ggml/src/ggml-sycl.cpp, line:3438, func:operator()
SYCL error: CHECK_TRY_ERROR(dpct::gemm_batch( *main_stream, oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans, ne01, ne11, ne10, alpha, (const void **)(ptrs_src.get() + 0 * ne23), dpct::library_data_t::real_half, nb01 / nb00, (const void **)(ptrs_src.get() + 1 * ne23), dpct::library_data_t::real_half, nb11 / nb10, beta, (void **)(ptrs_dst.get() + 0 * ne23), cu_data_type, ne01, ne23, cu_compute_type)): Meet error in this line code!
in function ggml_sycl_mul_mat_batched_sycl at D:/a/llama.cpp/llama.cpp/ggml/src/ggml-sycl.cpp:3438
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-sycl\common.hpp:107: SYCL error
Name and Version
version: 3808 (1e7b929)
built with MSVC 19.41.34120.0 for
What operating system are you seeing the problem on?
No response
Relevant log output
No response