Description
Context for the issue:
Currently, cumulative sum is implemented as a wrapper around the corresponding NumPy function.
When testing with vectors, replacing `pt.cumsum(x)` with `pt.dot(pt.tril(pt.ones((d, d))), x)`, where `d` is the length of the vector `x`, appears to give considerably faster sampling performance.
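To see why the substitution is valid, note that the lower-triangular matrix of ones has entry 1 at `(i, j)` whenever `j <= i`. Row `i` of the product with `x` therefore sums `x[0]` through `x[i]`, which is exactly the cumulative sum. The sketch below checks this identity in plain NumPy so that it runs standalone. It assumes `pt` in the text above refers to `pytensor.tensor`, whose `tril`, `ones` and `dot` are meant to mirror the NumPy functions used here.

```python
import numpy as np

d = 5
x = np.arange(1.0, d + 1)          # example vector: 1, 2, 3, 4, 5

# Lower-triangular matrix of ones: lower[i, j] == 1 if j <= i, else 0.
lower = np.tril(np.ones((d, d)))

# Row i of lower @ x is x[0] + ... + x[i], i.e. the cumulative sum.
via_dot = np.dot(lower, x)

# Both give 1, 3, 6, 10, 15.
assert np.allclose(via_dot, np.cumsum(x))
```

This trick builds a dense `d x d` matrix and does O(d²) work, compared with O(d) for a sequential cumulative sum. Any speedup observed in sampling may therefore depend on the vector length.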
See this gist for a quick demo.
Proposal:
Keep the API unchanged, but change the internals so that `cumsum` computes the dot product with a lower-triangular matrix of ones along the dimension given by its `axis` argument.
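One way the proposal could generalize from vectors to an arbitrary `axis` is to move that axis to the last position, multiply by the transposed triangular matrix, and move the axis back. The sketch below does this in NumPy. `cumsum_via_tril` is a hypothetical helper, not an existing PyTensor or NumPy function. Unlike `np.cumsum`, it does not handle `axis=None` (flattening).

```python
import numpy as np


def cumsum_via_tril(x, axis=-1):
    """Hypothetical helper: cumulative sum along `axis`, computed as a
    product with a lower-triangular matrix of ones (NumPy sketch only)."""
    x = np.asarray(x)
    # Move the target axis last so that matmul contracts over it.
    moved = np.moveaxis(x, axis, -1)
    d = moved.shape[-1]
    # lower[i, j] == 1 if j <= i, so out[..., i] = sum_{j <= i} moved[..., j].
    lower = np.tril(np.ones((d, d), dtype=moved.dtype))
    # moved has shape (..., d); moved @ lower.T has entries
    # sum_j moved[..., j] * lower[i, j], i.e. the running sum along the last axis.
    out = moved @ lower.T
    # Restore the original axis order.
    return np.moveaxis(out, -1, axis)


x = np.arange(24, dtype=float).reshape(2, 3, 4)
for ax in (0, 1, 2, -1):
    assert np.allclose(cumsum_via_tril(x, axis=ax), np.cumsum(x, axis=ax))
```

Moving the axis rather than building a per-axis contraction keeps the helper to a single `matmul`, which broadcasts over all leading dimensions.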