Dot and blas slowed by negative strides #1388

### Context for the issue:

Currently, cumulative sum is implemented as a wrapper for the corresponding NumPy function.
When testing with vectors, instead of using
```python
pt.cumsum(x)
```
using
```python
pt.dot(pt.tril(pt.ones((d,d))), x)
```
where `d` is the length of vector `x`, seems to lead to considerably faster sampling performance.
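The equivalence behind this observation can be checked with a minimal sketch. It uses plain NumPy as a stand-in for the `pt` calls above (an assumption made so the snippet runs without PyTensor), and it only checks that the results match, not the timing:

```python
import numpy as np

d = 5
x = np.arange(1.0, d + 1.0)  # example vector [1, 2, 3, 4, 5]

# Lower triangular matrix of ones: row i has ones in columns 0..i,
# so (L @ x)[i] is the sum of x[0..i], i.e. the cumulative sum.
L = np.tril(np.ones((d, d)))

via_dot = L @ x
via_cumsum = np.cumsum(x)

assert np.allclose(via_dot, via_cumsum)
```

Note that the matrix form needs O(d²) memory and arithmetic, whereas a direct cumulative sum needs O(d), so any speedup presumably depends on `d` staying moderate.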

See [this gist](https://gist.github.com/TeemuSailynoja/9ab551ab1ec9f95bedcaa1052a216b3d) for a quick demo.

### Proposal:
Keep the API unchanged, but change the internals to compute the dot product with the lower triangular matrix of ones along the dimension specified in the `axis` argument of `cumsum`.
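As a rough illustration of what the axis-aware version could look like, here is a NumPy sketch. The helper name `cumsum_via_dot` is hypothetical, NumPy stands in for the PyTensor internals, and the sketch assumes an integer `axis` (it does not cover the flattening behaviour of `axis=None`):

```python
import numpy as np

def cumsum_via_dot(x, axis=-1):
    """Cumulative sum along `axis`, computed as a product with a
    lower triangular matrix of ones (hypothetical helper)."""
    x = np.asarray(x)
    d = x.shape[axis]
    lower = np.tril(np.ones((d, d), dtype=x.dtype))

    # Move the target axis to the end so the matrix product acts on it.
    moved = np.moveaxis(x, axis, -1)

    # out[..., i] = sum_j moved[..., j] * lower[i, j] = sum of moved[..., 0..i]
    out = moved @ lower.T

    # Restore the original axis order.
    return np.moveaxis(out, -1, axis)

x = np.arange(24.0).reshape(2, 3, 4)
result = cumsum_via_dot(x, axis=1)
```

Moving the axis to the end keeps the public signature the same as a regular `cumsum(x, axis=...)` call while routing the work through a single matrix product.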
