REPRODUCER: Google tests with usage iterator in SYCL are failed on GPU. #661

Closed
wants to merge 14 commits into from

densmirn
Contributor

Google tests that use the iterator inside a SYCL kernel fail on GPU. Both sycl_reduce_axis and sycl_get_first fail.
The issue reproduces without broadcasting (#655).
The issue is under investigation.

$ ./dpnpc_tests --gtest_filter=TestUtilsIterator.sycl_get_first
Available SYCL devices:
 - id=4466, type=8, gws=67108864, cu=12, name=Intel(R) FPGA Emulation Device
 - id=32902, type=2, gws=8192, cu=12, name=Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
 - id=32902, type=4, gws=256, cu=24, name=Intel(R) Graphics [0x9bca]
 - id=32902, type=4, gws=256, cu=24, name=Intel(R) Graphics [0x9bca]
 - id=32902, type=18, gws=1, cu=1, name=SYCL host device
Running on: Intel(R) Graphics [0x9bca]
queue initialization time: 0.000174863 (sec.)
SYCL kernels link time: 0.115128 (sec.)

Note: Google Test filter = TestUtilsIterator.sycl_get_first
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestUtilsIterator
[ RUN      ] TestUtilsIterator.sycl_get_first
unknown file: Failure
C++ exception with description "Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)" thrown in the test body.
[  FAILED  ] TestUtilsIterator.sycl_get_first (251 ms)
[----------] 1 test from TestUtilsIterator (251 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (251 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestUtilsIterator.sycl_get_first

 1 FAILED TEST

@densmirn densmirn added the bug Something isn't working label Mar 25, 2021
@densmirn densmirn requested a review from shssf March 25, 2021 14:57
@densmirn
Contributor Author

The goal of the PR is to show the issue and maybe get some recommendations on how to get it fixed.

@densmirn densmirn changed the title Google tests with usage iterator in SYCL are failed on GPU. REPRODUCER: Google tests with usage iterator in SYCL are failed on GPU. Mar 25, 2021
@shssf
Contributor

shssf commented Mar 25, 2021

👍

@densmirn
Contributor Author

Prepared a simpler reproducer that works on CPU but fails on GPU.

$ dpcpp test_sycl.cpp -o test_sycl
$ ./test_sycl
terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
Aborted (core dumped)
#include <iostream>
#include <CL/sycl.hpp>

class MyClass
{
public:

    int get_idx(int idx)
    {
        return idx;
    }
};

int main() {
    /*
    There is no below issue on CPU with sycl::cpu_selector

    terminate called after throwing an instance of 'cl::sycl::runtime_error'
      what():  Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
    Aborted (core dumped)
    */

    // sycl::cpu_selector device_selector;
    sycl::gpu_selector device_selector;
    sycl::queue myQueue { device_selector };

    int *data = sycl::malloc_shared<int>(1024, myQueue);

    MyClass *myObj = new MyClass();

    myQueue.parallel_for(1024, [=](sycl::id<1> idx) {
        data[idx] = myObj->get_idx(idx);
    });

    myQueue.wait();

    for (int i = 0; i < 1024; i++)
        std::cout << "data[" << i << "] = " << data[i] << std::endl;

    sycl::free(data, myQueue);

    return 0;
}

@samir-nasibli

samir-nasibli commented Mar 26, 2021

MyClass *myObj = new MyClass();

Try allocating the memory for myObj with sycl::malloc_shared.

@shssf
Contributor

shssf commented Mar 26, 2021

Is it ready to merge?

@Alexander-Makaryev
Contributor

@densmirn @shssf
This is well-known behavior.
Whenever a kernel dereferences a pointer, the data it points to must be associated with the same device (or queue, or something similar; I am not sure, since so far we have worked with a single queue and device).
With a CPU device, host memory and device memory belong to the same device, which is why the code works there.
It can be reproduced with a very simple example:

    int *input = new int[size];
    int *data = sycl::malloc_shared<int>(size, myQueue);
    myQueue.parallel_for(size, [=](sycl::id<1> idx) {
        data[idx] = input[idx];
    });

If we allocate input as int *input = sycl::malloc_shared<int>(size, myQueue); it works fine.
The same applies to a class: something like MyClass *myObj = sycl::malloc_shared<MyClass>(1, myQueue); will fix this case.

@densmirn
Contributor Author

The example below works on both CPU and GPU, but the patched test sycl_get_first still fails. It's under investigation.

#include <iostream>
#include <CL/sycl.hpp>

template <typename T>
class MyClass
{
public:
    MyClass() : base_value(T(0)) {}
    MyClass(T value) : base_value(value) {}

    T get_value(T value)
    {
        return base_value + value;
    }

private:
    T base_value;
};

int main() {
    // sycl::cpu_selector device_selector;
    sycl::gpu_selector device_selector;
    sycl::queue myQueue { device_selector };

    int *data = sycl::malloc_shared<int>(1024, myQueue);

    MyClass<int> *myObj = sycl::malloc_shared<MyClass<int>>(1, myQueue);
    new (myObj) MyClass(10);

    myQueue.parallel_for(1024, [=](sycl::id<1> idx) {
        data[idx] = myObj->get_value(idx);
    });

    myQueue.wait();

    for (int i = 0; i < 1024; i++)
        std::cout << "data[" << i << "] = " << data[i] << std::endl;

    // Destroy the object explicitly, then release both USM allocations.
    myObj->~MyClass();
    sycl::free(myObj, myQueue);
    sycl::free(data, myQueue);

    return 0;
}
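The object lifecycle used above (raw allocation, placement new, explicit destructor call, release) can be sketched in plain C++ so that it builds without a SYCL toolchain. This is only an illustration: std::malloc and std::free stand in for sycl::malloc_shared and sycl::free, and the helpers create_in_raw_storage and destroy_in_raw_storage are hypothetical names that do not exist in the PR.

```cpp
#include <cstdlib>
#include <new>

// Same shape as the MyClass template in the example above.
template <typename T>
class MyClass
{
public:
    MyClass() : base_value(T(0)) {}
    explicit MyClass(T value) : base_value(value) {}

    T get_value(T value) const
    {
        return base_value + value;
    }

private:
    T base_value;
};

// Hypothetical helper. std::malloc is a stand-in for sycl::malloc_shared:
// both return raw, uninitialized storage, so no constructor has run yet.
template <typename T>
MyClass<T>* create_in_raw_storage(T value)
{
    void* raw = std::malloc(sizeof(MyClass<T>));
    if (raw == nullptr)
        return nullptr;
    // Step 1: construct the object in the already-allocated storage.
    return new (raw) MyClass<T>(value);
}

// Hypothetical helper. Reverses the two steps in the opposite order.
template <typename T>
void destroy_in_raw_storage(MyClass<T>* obj)
{
    if (obj == nullptr)
        return;
    // Step 2: run the destructor explicitly; placement new never pairs with delete.
    obj->~MyClass<T>();
    // Step 3: release the storage (with USM this would be sycl::free(obj, queue)).
    std::free(obj);
}
```

A caller would write auto* obj = create_in_raw_storage<int>(10); use obj->get_value(...), and finish with destroy_in_raw_storage(obj);. Skipping the release step leaks the allocation even though the destructor has already run.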

@densmirn
Contributor Author

Current status: the tests no longer abort on GPU, but they now fail on both CPU and GPU. This is under investigation.

@densmirn densmirn changed the title REPRODUCER: Google tests with usage iterator in SYCL are failed on GPU. REPRODUCER: Google tests with usage iterator in SYCL are failed. Mar 30, 2021
@densmirn densmirn changed the title REPRODUCER: Google tests with usage iterator in SYCL are failed. REPRODUCER: Google tests with usage iterator in SYCL are failed on GPU. Mar 30, 2021
@densmirn
Contributor Author

Now for test TestUtilsIterator/IteratorReduction.sycl_reduce_axis/1, where
GetParam() = IteratorParameters(input_shape={2, 3, 4}, axis={1}, result={15, 18, 21, 24, 51, 54, 57, 60})
we have the following results:
on CPU: result={15, 18, 21, 24, 51, 54, 57, 60} - EXPECTED
on GPU: result={15, 18, 21, 24, 27, 30, 33, 36} - UNEXPECTED
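A host-only reference makes the two result vectors easier to compare. The sketch below assumes the input holds 1..24 in row-major order, which is consistent with the expected result but is not stated in this thread. reduce_axis1 is a plain reference reduction. reduce_axis1_flat_offset is a hypothetical faulty variant that uses the flat output index as the base offset and so drops the outer stride; it happens to reproduce the GPU values, which may or may not be the actual cause.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a vector with 1, 2, ..., n (assumed test input, see the note above).
std::vector<int> iota_from_one(std::size_t n)
{
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(i + 1);
    return v;
}

// Reference: sum a row-major array of shape {d0, d1, d2} over axis 1.
std::vector<int> reduce_axis1(const std::vector<int>& in,
                              std::size_t d0, std::size_t d1, std::size_t d2)
{
    assert(in.size() == d0 * d1 * d2);
    std::vector<int> out(d0 * d2, 0);
    for (std::size_t i = 0; i < d0; ++i)
        for (std::size_t k = 0; k < d2; ++k)
            for (std::size_t j = 0; j < d1; ++j)
                out[i * d2 + k] += in[(i * d1 + j) * d2 + k];
    return out;
}

// Hypothetical faulty variant: the flat output index o is used directly as
// the input base offset, so the outer stride d1 * d2 is never applied.
std::vector<int> reduce_axis1_flat_offset(const std::vector<int>& in,
                                          std::size_t d0, std::size_t d1, std::size_t d2)
{
    assert(in.size() == d0 * d1 * d2);
    std::vector<int> out(d0 * d2, 0);
    for (std::size_t o = 0; o < d0 * d2; ++o)
        for (std::size_t j = 0; j < d1; ++j)
            out[o] += in[o + j * d2];
    return out;
}
```

With iota_from_one(24) and shape {2, 3, 4}, reduce_axis1 gives {15, 18, 21, 24, 51, 54, 57, 60} and reduce_axis1_flat_offset gives {15, 18, 21, 24, 27, 30, 33, 36}. The first half matches because the outer index is 0 there, so the missing stride has no effect.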

@densmirn
Contributor Author

densmirn commented Mar 30, 2021

Two tests still fail on GPU:
TestUtilsIterator/IteratorReduction.sycl_reduce_axis/24, where GetParam() = IteratorParameters(input_shape={2, 3, 0, 5}, axis={1, 3}, result={})
and
TestUtilsIterator/IteratorReduction.sycl_reduce_axis/27, where GetParam() = IteratorParameters(input_shape={0}, axis={}, result={})
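Both remaining cases have a zero-length dimension, so the output is empty. A minimal host-side sketch of that arithmetic follows. It assumes that an empty axis list means "reduce nothing", which is consistent with result={} for input_shape={0}. The names reduced_shape, element_count and should_launch_kernel are hypothetical and not part of DPNP. Whether a zero-sized launch is what actually fails on GPU is not confirmed in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using shape_t = std::vector<std::size_t>;

// Shape that remains after the listed axes are reduced away.
// Assumption: an empty axes list reduces nothing.
shape_t reduced_shape(const shape_t& input_shape, const std::vector<std::size_t>& axes)
{
    shape_t out;
    for (std::size_t d = 0; d < input_shape.size(); ++d)
    {
        const bool is_reduced = std::find(axes.begin(), axes.end(), d) != axes.end();
        if (!is_reduced)
            out.push_back(input_shape[d]);
    }
    return out;
}

// Number of elements described by a shape; any zero dimension gives zero.
std::size_t element_count(const shape_t& shape)
{
    std::size_t n = 1;
    for (std::size_t dim : shape)
        n *= dim;
    return n;
}

// Hypothetical guard: skip the kernel entirely when there is nothing to write.
bool should_launch_kernel(const shape_t& input_shape, const std::vector<std::size_t>& axes)
{
    return element_count(reduced_shape(input_shape, axes)) != 0;
}
```

For input_shape={2, 3, 0, 5} with axis={1, 3} the remaining shape is {2, 0}, and for input_shape={0} with axis={} it is {0}. Both have zero elements, so a guard like should_launch_kernel would return early with an empty result instead of submitting work to the queue.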

@@ -351,21 +351,29 @@ class DPNPC_id final
 free_iteration_memory();
 free_output_memory();

-axes = get_validated_axes(__axes, input_shape_size);
+std::vector<size_type> valid_axes = get_validated_axes(__axes, input_shape_size);
Contributor

You create a copy of the array here, and the class member axis remains uninitialized.

Contributor Author

The class member axis is initialized in the for-loop below.

@densmirn densmirn closed this Sep 20, 2021
@densmirn densmirn deleted the fix_iterator_gpu branch September 20, 2021 11:35