
MULTIPLY enable broadcasting #655


Merged: 10 commits merged into IntelPython:master on Apr 6, 2021
Conversation

densmirn (Contributor)

This PR shows the current status of the broadcasting implementation through the USM iterator.

The issue below is currently under investigation:

DPNP_QUEUE_GPU=1 python -c "import dpnp; a = dpnp.array([7]); b = dpnp.array([7]); c = dpnp.multiply(a, b); print(c)"
Available SYCL devices:
 - id=4466, type=8, gws=67108864, cu=12, name=Intel(R) FPGA Emulation Device
 - id=32902, type=2, gws=8192, cu=12, name=Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
 - id=32902, type=4, gws=256, cu=24, name=Intel(R) Graphics [0x9bca]
 - id=32902, type=4, gws=256, cu=24, name=Intel(R) Graphics [0x9bca]
 - id=32902, type=18, gws=1, cu=1, name=SYCL host device
Running on: Intel(R) Graphics [0x9bca]
queue initialization time: 0.000503686 (sec.)
SYCL kernels link time: 0.112728 (sec.)

terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)
Aborted (core dumped)

@densmirn added the "in progress: Please do not merge. Work is in progress." label on Mar 19, 2021
@densmirn requested a review from shssf on Mar 19, 2021 at 15:58
_DataType_output* result = reinterpret_cast<_DataType_output*>(result_out); \
\
std::vector<size_t> result_shape = get_common_shape(input1_shape, input1_shape_ndim, \
Contributor:

Maybe "get_result_shape" is a better name for this function? Not sure...

input2_shape, input2_shape_ndim); \
\
DPNPC_id<_DataType_input1> input1(input1_data, input1_shape, input1_shape_ndim); \
input1.broadcast(result_shape); \
Contributor:

"broadcast_to_shape" func name?

@shssf (Contributor) commented Mar 19, 2021

gtests need to be implemented to test the "broadcast" feature in the iterator (in a separate file, not in the same place as the reduce iterator).
Perhaps it is better to start with gtests and later use the "broadcast iterator" in real algorithms.
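
A rough sketch of what such a gtest could look like, assuming DPNPC_id gains a broadcast-to-shape operation and element access via operator[]; the names below are illustrative placeholders, not the final interface of this PR:

// Hypothetical broadcast-iterator gtest, kept in its own file as suggested.
// Assumes DPNPC_id(ptr, shape), a broadcast_to_shape() method and operator[];
// all of these names are assumptions for whatever the PR settles on.
#include <gtest/gtest.h>

#include <numeric>
#include <vector>

#include "dpnp_iterator.hpp"

TEST(TestBroadcastIterator, broadcast_2D_to_3D_values)
{
    using data_t = double;

    // source data with shape {3, 4}
    std::vector<data_t> input_data(3 * 4);
    std::iota(input_data.begin(), input_data.end(), data_t{1});

    DPNPC_id<data_t> input(input_data.data(), {3, 4});
    input.broadcast_to_shape({2, 3, 4}); // hypothetical name, see the naming discussion above

    // broadcasting {3, 4} over {2, 3, 4} repeats the source block along the new axis
    for (size_t i = 0; i < 2 * 3 * 4; ++i)
    {
        EXPECT_EQ(input[i], input_data[i % (3 * 4)]);
    }
}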

@densmirn removed the "in progress: Please do not merge. Work is in progress." label on Mar 25, 2021
@densmirn changed the title from "MULTIPLY enable broadcasting" to "MULTIPLY enable broadcasting on CPU" on Mar 25, 2021
@densmirn requested a review from shssf on Mar 26, 2021 at 12:23

#include "dpnp_iterator.hpp"

#define DPNP_LOCAL_QUEUE 1 // TODO need to fix build procedure and remove this workaround. Issue #551
Contributor:

@samir-nasibli it looks like it requires your attention

using dpnpc_index_t = dpnpc_it_t::size_type;

template <typename _DataType>
vector<_DataType> get_input_data(const vector<dpnpc_index_t>& shape)
Contributor:

Code duplication. It looks like it would be better to move this function to some common place for the test suite.

}
}

TEST(TestBroadcastIterator, take_value_broadcast_loop_3D)
Contributor:

It is not clear why these tests are not part of the value-parameterized IteratorParameters tests.

struct IteratorParameters
{
vector<dpnpc_it_t::size_type> input_shape;
vector<dpnpc_it_t::size_type> output_shape;
@shssf (Contributor) commented Mar 26, 2021

Is this defined or standardized somewhere? I mean using output_shape as an input.
I see at least two approaches:

  1. We keep the inputs internally: input_shape and output_shape. We also internally calculate step distances and the other things needed for iteration.
  2. We keep the inputs internally: input_shape and broadcast_axis. We also internally calculate output_shape, step distances and the other things needed for iteration.

Option 2 would match the existing interface code (the set_axes function) and would perhaps require some extra parameter in the ctor, like:
DPNPC_id<dpnpc_value_t, BROADCAST> result_obj(input_data.data(), {3, 4});

Contributor:

I don't know what complexity you are ready to implement and how much NumPy compatibility is required.
As far as I understand, we have two input shapes ("several" in general). Let's take:

shape1={2, 3, 4}
shape2={3, 4}

It looks like we can broadcast shape2 into shape1 in two steps:

1. [3, 4] => [1, 3, 4]
2. [1, 3, 4] => [2, 3, 4]

Also, as I understand it, shapes [3, 4] and [3, 1, 4, 1] have the same layout in memory.
I think everything depends on the input parameters. If we have shapes as input, that is fine, but I think having the axes to broadcast is the more flexible and scalable solution.
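
For reference, a minimal sketch of a NumPy-style common-shape computation in the spirit of the get_common_shape call quoted above; the function name, signature, and error handling here are assumptions, not the actual code of this PR:

// Align shapes on the right; each axis pair must be equal or contain a 1.
// Illustrative only; not the get_common_shape implementation from this PR.
#include <stdexcept>
#include <vector>

std::vector<size_t> broadcast_shapes(std::vector<size_t> a, std::vector<size_t> b)
{
    if (a.size() < b.size())
        a.swap(b); // make 'a' the longer shape

    std::vector<size_t> result = a;
    const size_t offset = a.size() - b.size();

    for (size_t i = 0; i < b.size(); ++i)
    {
        const size_t ax = a[offset + i]; // axes aligned from the right
        const size_t bx = b[i];
        if (bx == ax || bx == 1)
            result[offset + i] = ax;
        else if (ax == 1)
            result[offset + i] = bx;
        else
            throw std::runtime_error("shapes are not broadcastable");
    }
    return result; // e.g. {2, 3, 4} and {3, 4} -> {2, 3, 4}
}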

@shssf (Contributor) commented Mar 29, 2021

From another perspective, it is NOT a good idea.
Maybe it is better to leave it as is but make the function names clearer.

INSTANTIATE_TEST_SUITE_P(
TestBroadcastIterator,
IteratorBroadcasting,
testing::Values(
Contributor:

👍
Do you think it would be good to add some combinations like "{1, 0}, {0}"?

Contributor (Author):

I wanted to add something like IteratorParameters{{2, 0, 1}, {2, 0, 4}, {}} as a test parameter, but in that case the result is empty, so a SYCL kernel makes no sense here, because we iterate over the result size, which is 0 in this case. In a real case we should return from the function before submitting such a kernel to the queue. Moreover, we can see the issue below on GPU when submitting such a kernel to the queue:
C++ exception with description "Native API failed. Native API returns: -30 (CL_INVALID_VALUE) -30 (CL_INVALID_VALUE)" thrown in the test body.
That is why 2 reduction tests fail, see #661 (comment).
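
A minimal sketch of such an early-return guard, placed where the macro above has computed result_shape (hypothetical, assuming <numeric> and <functional> are included; not the exact code of this PR):

// If any result dimension is 0 there is nothing to compute, so do not submit
// a kernel with a zero-sized range (which raises CL_INVALID_VALUE on some GPUs).
const size_t result_size = std::accumulate(result_shape.begin(), result_shape.end(),
                                           size_t{1}, std::multiplies<size_t>());
if (result_size == 0)
{
    return; // skip kernel submission for empty results
}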

@densmirn changed the title from "MULTIPLY enable broadcasting on CPU" to "MULTIPLY enable broadcasting" on Apr 1, 2021
@densmirn requested a review from shssf on Apr 1, 2021 at 14:25
Comment on lines +421 to +422
input1_it->~DPNPC_id(); \
input2_it->~DPNPC_id(); \

Not needed here.

Contributor (Author):

Needed: we are responsible for destroying objects constructed with placement new.
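
For context, a standalone sketch of the pattern in question (illustrative only; Tracker and the raw buffer stand in for DPNPC_id and the USM allocation): when an object is constructed with placement new into memory that is never passed to delete, the destructor has to be invoked explicitly.

#include <new> // placement new

// Any type with a non-trivial destructor stands in for DPNPC_id here.
struct Tracker
{
    ~Tracker() { /* release internal resources */ }
};

int main()
{
    // pre-allocated storage; in the backend this would be a USM allocation
    alignas(Tracker) unsigned char buffer[sizeof(Tracker)];

    Tracker* obj = new (buffer) Tracker(); // construct in place, no heap allocation
    // ... use *obj ...
    obj->~Tracker(); // must be called explicitly: the storage is never delete'd,
                     // so nothing else would ever run the destructor
    return 0;
}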

@samir-nasibli commented Apr 2, 2021

What you are saying is obvious, but that is not quite the point.

The comment was not about freeing resources in general; it is that we should avoid explicitly calling the destructor like this.
I can see that the current iterator implementation has flaws, and updates to the iterator interface are required to address this.
In any case, this is not a problem of the current PR.
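
One possible direction for such an interface update, sketched as an RAII wrapper around the placement-new pattern; usm_alloc and usm_free here are hypothetical stand-ins for the backend's memory routines, not the actual dpnp API:

#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-ins for the backend's USM allocation routines;
// they just wrap malloc/free so the sketch is self-contained.
inline void* usm_alloc(std::size_t size) { return std::malloc(size); }
inline void usm_free(void* ptr) { std::free(ptr); }

// Construct T in externally managed memory and tie its lifetime to a smart
// pointer, so the explicit destructor call and the free live in one place.
template <typename T, typename... Args>
auto make_usm_scoped(Args&&... args)
{
    void* raw = usm_alloc(sizeof(T));
    T* obj = new (raw) T(std::forward<Args>(args)...); // placement new

    auto deleter = [](T* p) {
        p->~T();     // destructor runs here, hidden behind the wrapper
        usm_free(p); // then the allocation is released
    };
    return std::unique_ptr<T, decltype(deleter)>(obj, deleter);
}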

}

input_it->~DPNPC_id();

no need to explicitly call the destructor.

Contributor (Author):

Needed for placement new.

@shssf merged commit 21dc351 into IntelPython:master on Apr 6, 2021
@densmirn deleted the feature/mul branch on September 20, 2021