
MULTIPLY enable broadcasting #655


Merged: 10 commits merged into IntelPython:master on Apr 6, 2021
Conversation

densmirn (Contributor)

This PR shows the current status of the broadcasting implementation through the USM iterator.

The issue below is currently under investigation:

DPNP_QUEUE_GPU=1 python -c "import dpnp; a = dpnp.array([7]); b = dpnp.array([7]); c = dpnp.multiply(a, b); print(c)"
Available SYCL devices:
 - id=4466, type=8, gws=67108864, cu=12, name=Intel(R) FPGA Emulation Device
 - id=32902, type=2, gws=8192, cu=12, name=Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
 - id=32902, type=4, gws=256, cu=24, name=Intel(R) Graphics [0x9bca]
 - id=32902, type=4, gws=256, cu=24, name=Intel(R) Graphics [0x9bca]
 - id=32902, type=18, gws=1, cu=1, name=SYCL host device
Running on: Intel(R) Graphics [0x9bca]
queue initialization time: 0.000503686 (sec.)
SYCL kernels link time: 0.112728 (sec.)

terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
Aborted (core dumped)

@densmirn added the "in progress: Please do not merge. Work is in progress." label on Mar 19, 2021
@densmirn requested a review from shssf on Mar 19, 2021 at 15:58
_DataType_output* result = reinterpret_cast<_DataType_output*>(result_out); \
\
std::vector<size_t> result_shape = get_common_shape(input1_shape, input1_shape_ndim, \
Contributor:

Maybe "get_result_shape" is a better name for this function? Not sure...

input2_shape, input2_shape_ndim); \
\
DPNPC_id<_DataType_input1> input1(input1_data, input1_shape, input1_shape_ndim); \
input1.broadcast(result_shape); \
Contributor:

"broadcast_to_shape" func name?

@shssf (Contributor) commented Mar 19, 2021

gtests need to be implemented to test the "broadcast" feature in the iterator (in a separate file, not in the same place as the reduce iterator).
Perhaps it is better to start with gtests and later use the "broadcast iterator" in real algorithms.
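
A rough sketch of what such a gtest could look like, assuming DPNPC_id gains a broadcast-to-shape operation and element access via operator[]; the names below are illustrative placeholders, not the final interface of this PR:

// Hypothetical broadcast-iterator gtest, kept in its own file as suggested.
// Assumes DPNPC_id(ptr, shape), a broadcast_to_shape() method and operator[];
// all of these names are assumptions for whatever the PR settles on.
#include <gtest/gtest.h>

#include <numeric>
#include <vector>

#include "dpnp_iterator.hpp"

TEST(TestBroadcastIterator, broadcast_2D_to_3D_values)
{
    using data_t = double;

    // source data with shape {3, 4}
    std::vector<data_t> input_data(3 * 4);
    std::iota(input_data.begin(), input_data.end(), data_t{1});

    DPNPC_id<data_t> input(input_data.data(), {3, 4});
    input.broadcast_to_shape({2, 3, 4}); // hypothetical name, see the naming discussion above

    // broadcasting {3, 4} over {2, 3, 4} repeats the source block along the new axis
    for (size_t i = 0; i < 2 * 3 * 4; ++i)
    {
        EXPECT_EQ(input[i], input_data[i % (3 * 4)]);
    }
}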

@densmirn removed the "in progress: Please do not merge. Work is in progress." label on Mar 25, 2021
@densmirn changed the title from "MULTIPLY enable broadcasting" to "MULTIPLY enable broadcasting on CPU" on Mar 25, 2021
@densmirn requested a review from shssf on Mar 26, 2021 at 12:23

#include "dpnp_iterator.hpp"

#define DPNP_LOCAL_QUEUE 1 // TODO need to fix build procedure and remove this workaround. Issue #551
Contributor:

@samir-nasibli it looks like it requires your attention

using dpnpc_index_t = dpnpc_it_t::size_type;

template <typename _DataType>
vector<_DataType> get_input_data(const vector<dpnpc_index_t>& shape)
Contributor:

Code duplication. It looks like it would be better to move this function to some common place for the test suite.

}
}

TEST(TestBroadcastIterator, take_value_broadcast_loop_3D)
Contributor:

It is not clear why these tests are not part of the value-parameterized IteratorParameters tests.

struct IteratorParameters
{
vector<dpnpc_it_t::size_type> input_shape;
vector<dpnpc_it_t::size_type> output_shape;
@shssf (Contributor) commented Mar 26, 2021

Is this defined or standardized somewhere? I mean using output_shape as an input.
I see at least two approaches:

  1. We keep the inputs internally: input_shape and output_shape. We also internally calculate step distances and the other things needed for iteration.
  2. We keep the inputs internally: input_shape and broadcast_axis. We also internally calculate output_shape, step distances and the other things needed for iteration.

Option 2 would match the existing interface code (the set_axes function) and would perhaps require some extra parameter in the ctor, like:
DPNPC_id<dpnpc_value_t, BROADCAST> result_obj(input_data.data(), {3, 4});

Contributor:

I don't know what complexity you are ready to implement and how much NumPy compatibility is required.
As far as I understand, we have two input shapes ("several" in general). Let's take:

shape1={2, 3, 4}
shape2={3, 4}

It looks like we can broadcast shape2 into shape1 in two steps:

1. [3, 4] => [1, 3, 4]
2. [1, 3, 4] => [2, 3, 4]

Also, as I understand it, shapes [3, 4] and [3, 1, 4, 1] have the same layout in memory.
I think everything depends on the input parameters. If we have shapes as input, that is fine, but I think having the axes to broadcast is the more flexible and scalable solution.
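
For reference, a minimal sketch of a NumPy-style common-shape computation in the spirit of the get_common_shape call quoted above; the function name, signature, and error handling here are assumptions, not the actual code of this PR:

// Align shapes on the right; each axis pair must be equal or contain a 1.
// Illustrative only; not the get_common_shape implementation from this PR.
#include <stdexcept>
#include <vector>

std::vector<size_t> broadcast_shapes(std::vector<size_t> a, std::vector<size_t> b)
{
    if (a.size() < b.size())
        a.swap(b); // make 'a' the longer shape

    std::vector<size_t> result = a;
    const size_t offset = a.size() - b.size();

    for (size_t i = 0; i < b.size(); ++i)
    {
        const size_t ax = a[offset + i]; // axes aligned from the right
        const size_t bx = b[i];
        if (bx == ax || bx == 1)
            result[offset + i] = ax;
        else if (ax == 1)
            result[offset + i] = bx;
        else
            throw std::runtime_error("shapes are not broadcastable");
    }
    return result; // e.g. {2, 3, 4} and {3, 4} -> {2, 3, 4}
}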

@shssf (Contributor) commented Mar 29, 2021

From another perspective, it is NOT a good idea.
Maybe it is better to leave it as is but make the function names clearer.

INSTANTIATE_TEST_SUITE_P(
TestBroadcastIterator,
IteratorBroadcasting,
testing::Values(
Contributor:

👍
Do you think it would be good to add some combinations like "{1, 0}, {0}"?

Contributor (Author):

I wanted to add something like IteratorParameters{{2, 0, 1}, {2, 0, 4}, {}} as a test parameter, but in that case the result is empty, so a SYCL kernel makes no sense here, because we iterate over the result size, which is 0 in this case. In a real case we should return from the function before submitting such a kernel to the queue. Moreover, we can see the issue below on GPU when submitting such a kernel to the queue:
C++ exception with description "Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)" thrown in the test body.
That is why 2 reduction tests fail, see #661 (comment).
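
A minimal sketch of such an early-return guard, placed where the macro above has computed result_shape (hypothetical, assuming <numeric> and <functional> are included; not the exact code of this PR):

// If any result dimension is 0 there is nothing to compute, so do not submit
// a kernel with a zero-sized range (which raises CL_INVALID_VALUE on some GPUs).
const size_t result_size = std::accumulate(result_shape.begin(), result_shape.end(),
                                           size_t{1}, std::multiplies<size_t>());
if (result_size == 0)
{
    return; // skip kernel submission for empty results
}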

@densmirn changed the title from "MULTIPLY enable broadcasting on CPU" to "MULTIPLY enable broadcasting" on Apr 1, 2021
@densmirn requested a review from shssf on Apr 1, 2021 at 14:25
Comment on lines +421 to +422
input1_it->~DPNPC_id(); \
input2_it->~DPNPC_id(); \

Not needed here.

Contributor (Author):

Needed: we are responsible for destroying objects constructed with placement new.
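
For context, a standalone sketch of the pattern in question (illustrative only; Tracker and the raw buffer stand in for DPNPC_id and the USM allocation): when an object is constructed with placement new into memory that is never passed to delete, the destructor has to be invoked explicitly.

#include <new> // placement new

// Any type with a non-trivial destructor stands in for DPNPC_id here.
struct Tracker
{
    ~Tracker() { /* release internal resources */ }
};

int main()
{
    // pre-allocated storage; in the backend this would be a USM allocation
    alignas(Tracker) unsigned char buffer[sizeof(Tracker)];

    Tracker* obj = new (buffer) Tracker(); // construct in place, no heap allocation
    // ... use *obj ...
    obj->~Tracker(); // must be called explicitly: the storage is never delete'd,
                     // so nothing else would ever run the destructor
    return 0;
}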

@samir-nasibli commented Apr 2, 2021

What you are saying is obvious, but that is not quite the point.

The comment was not about freeing resources in general; it is that we should avoid explicitly calling the destructor like this.
I can see that the current iterator implementation has flaws, and updates to the iterator interface are required to address this.
In any case, this is not a problem of the current PR.
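
One possible direction for such an interface update, sketched as an RAII wrapper around the placement-new pattern; usm_alloc and usm_free here are hypothetical stand-ins for the backend's memory routines, not the actual dpnp API:

#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-ins for the backend's USM allocation routines;
// they just wrap malloc/free so the sketch is self-contained.
inline void* usm_alloc(std::size_t size) { return std::malloc(size); }
inline void usm_free(void* ptr) { std::free(ptr); }

// Construct T in externally managed memory and tie its lifetime to a smart
// pointer, so the explicit destructor call and the free live in one place.
template <typename T, typename... Args>
auto make_usm_scoped(Args&&... args)
{
    void* raw = usm_alloc(sizeof(T));
    T* obj = new (raw) T(std::forward<Args>(args)...); // placement new

    auto deleter = [](T* p) {
        p->~T();     // destructor runs here, hidden behind the wrapper
        usm_free(p); // then the allocation is released
    };
    return std::unique_ptr<T, decltype(deleter)>(obj, deleter);
}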

}

input_it->~DPNPC_id();

no need to explicitly call the destructor.

Contributor (Author):

Needed for placement new.

@shssf merged commit 21dc351 into IntelPython:master on Apr 6, 2021
@densmirn deleted the feature/mul branch on September 20, 2021