REPRODUCER: Google tests using an iterator in SYCL kernels fail on GPU. #661
The goal of this PR is to demonstrate the issue and, ideally, get recommendations on how to fix it.
I prepared a simpler reproducer that works on CPU but fails on GPU.
#include <iostream>
#include <CL/sycl.hpp>

class MyClass
{
public:
    int get_idx(int idx)
    {
        return idx;
    }
};

int main() {
    /*
    The error below does not occur on CPU (sycl::cpu_selector):
    terminate called after throwing an instance of 'cl::sycl::runtime_error'
      what(): Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
    Aborted (core dumped)
    */
    // sycl::cpu_selector device_selector;
    sycl::gpu_selector device_selector;
    sycl::queue myQueue { device_selector };

    int *data = sycl::malloc_shared<int>(1024, myQueue);
    // myObj points to ordinary host heap memory allocated with new, not to a
    // USM allocation, so it is not guaranteed to be accessible from the device.
    MyClass *myObj = new MyClass();

    myQueue.parallel_for(1024, [=](sycl::id<1> idx) {
        data[idx] = myObj->get_idx(idx);
    });
    myQueue.wait();

    for (int i = 0; i < 1024; i++)
        std::cout << "data[" << i << "] = " << data[i] << std::endl;

    delete myObj;
    sycl::free(data, myQueue);
    return 0;
}
Try to allocate memory for
Is it ready to merge?
@densmirn @shssf
If we allocate
The example below works on both CPU and GPU, but the patched test
#include <iostream>
#include <new>
#include <CL/sycl.hpp>

template <typename T>
class MyClass
{
public:
    MyClass() : base_value(T(0)) {}
    MyClass(T value) : base_value(value) {}
    T get_value(T value)
    {
        return base_value + value;
    }
private:
    T base_value;
};

int main() {
    // sycl::cpu_selector device_selector;
    sycl::gpu_selector device_selector;
    sycl::queue myQueue { device_selector };

    int *data = sycl::malloc_shared<int>(1024, myQueue);
    // Allocate the object itself in USM shared memory and construct it in place,
    // so the pointer captured by the kernel refers to device-accessible memory.
    MyClass<int> *myObj = sycl::malloc_shared<MyClass<int>>(1, myQueue);
    new (myObj) MyClass<int>(10);

    myQueue.parallel_for(1024, [=](sycl::id<1> idx) {
        data[idx] = myObj->get_value(idx);
    });
    myQueue.wait();

    for (int i = 0; i < 1024; i++)
        std::cout << "data[" << i << "] = " << data[i] << std::endl;

    // An object created with placement new needs an explicit destructor call,
    // and its USM storage must be freed separately.
    myObj->~MyClass();
    sycl::free(myObj, myQueue);
    sycl::free(data, myQueue);
    return 0;
}
Current status: the tests no longer abort on GPU, but they fail on both CPU and GPU; this is under investigation.
Now for test
Two tests still fail on GPU.
@@ -351,21 +351,29 @@ class DPNPC_id final
 free_iteration_memory();
 free_output_memory();

-axes = get_validated_axes(__axes, input_shape_size);
+std::vector<size_type> valid_axes = get_validated_axes(__axes, input_shape_size);
You create a copy of the array here, and the class member axis remains uninitialized.
The class member axis is initialized in the for-loop below.
Google tests that use the iterator in a SYCL kernel fail on GPU. Test sycl_reduce_axis fails, as does sycl_get_first. The issue reproduces without broadcasting (#655). It is under investigation.