Currently I don't see a way to implement `einsum` or batched matmul using the standard. It can be done with an explicit loop, but that does not count.
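For reference, here is what the explicit-loop fallback looks like; a sketch using NumPy as a stand-in for a standard-conforming library (the function name `batched_matmul_loop` is hypothetical):

```python
import numpy as np

def batched_matmul_loop(a, b):
    # a: (B, I, J), b: (B, J, K) -> out: (B, I, K)
    # Loops over the batch axis in Python, one 2-D matmul per step --
    # exactly the kind of workaround that "does not count".
    B, I, J = a.shape
    K = b.shape[2]
    out = np.empty((B, I, K), dtype=np.result_type(a, b))
    for i in range(B):
        out[i] = a[i] @ b[i]
    return out
```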
Some ways this can be addressed:
- a minimal implementation would include `batched_matmul` (signature `bij,bjk->bik`). Other operations can be reduced to it with reshapes, though this may introduce additional copies; modifying `tensordot` to accept batched dimensions should thus be more efficient, but AFAIK that method is not surfaced in any major framework
- include `einsum` in the standard
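To illustrate the first option, here is a sketch of reducing a contraction with extra batch dimensions (`abij,abjk->abik`) to the proposed `bij,bjk->bik` primitive via reshapes; NumPy is used as a stand-in, and both function names are hypothetical:

```python
import numpy as np

def batched_matmul(a, b):
    # The proposed primitive: (B, I, J), (B, J, K) -> (B, I, K).
    return np.stack([x @ y for x, y in zip(a, b)])

def matmul_2batch(a, b):
    # 'abij,abjk->abik' reduced to the primitive by folding the two
    # batch axes into one. The reshapes may force extra copies, which
    # is why a batched tensordot could be more efficient.
    A, B, I, J = a.shape
    K = b.shape[-1]
    out = batched_matmul(a.reshape(A * B, I, J), b.reshape(A * B, J, K))
    return out.reshape(A, B, I, K)
```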