## Description

Right now we have `grad`, `L_op`, and `R_op`.
### Deprecate `grad` in favor of `L_op`

`grad` is exactly the same as `L_op`, except that it doesn't have access to the outputs of the node being differentiated (see lines 366 to 393 in 24b67a8).
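For reference, a minimal sketch of the two method signatures on an `Op` subclass (`MyOp` is a placeholder; the method names and argument order follow the existing `Op` interface):

```python
from pytensor.graph.op import Op


class MyOp(Op):
    # make_node / perform omitted; only the differentiation interface is shown

    def grad(self, inputs, output_grads):
        # Only the symbolic inputs and the upstream gradients are available;
        # anything that is also an output of the node has to be rebuilt from
        # `inputs`, creating a duplicate node the merge optimizer must remove.
        raise NotImplementedError

    def L_op(self, inputs, outputs, output_grads):
        # Same contract, but the node's symbolic outputs are passed in and
        # can be reused directly in the gradient expression.
        raise NotImplementedError
```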
`L_op` allows one to reuse the same output when it's needed in the gradient, which means there is one less node to be merged during compilation. This is mostly relevant for nodes that are costly to merge, such as `Scan` (see 0f5a06d).

It also saves the time spent on `make_node` (e.g., inferring static type shapes). In the scalar `Op`s it is used everywhere to quickly check whether the output types are discrete (see fd628c5). There are still some missed opportunities; for example, the gradient of `Exp` (`pytensor/pytensor/scalar/basic.py`, lines 3096 to 3107 in 24b67a8) could instead return `(gz * outputs[0],)`, as sketched below.
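A rough sketch of what that could look like (illustrative only; the real method in `basic.py` also handles complex and discrete types, which is omitted here):

```python
from pytensor.scalar.basic import UnaryScalarOp


class Exp(UnaryScalarOp):  # simplified sketch of the existing scalar `Exp`
    def L_op(self, inputs, outputs, output_grads):
        (x,) = inputs
        (exp_x,) = outputs        # the already-computed exp(x)
        (gz,) = output_grads
        # Reuse the node's own output instead of rebuilding exp(x), so no
        # duplicate Exp node has to be created and merged away later.
        return (gz * exp_x,)
```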
More importantly for this issue, I think we should deprecate `grad` completely, since everything can be done equally well with `L_op`.
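One possible shape of such a deprecation, purely as an illustration (the warning text and detection logic are assumptions, not an existing API), assuming the default `L_op` keeps delegating to a legacy `grad` implementation:

```python
import warnings


class Op:  # illustrative stub, not the real pytensor.graph.op.Op
    def grad(self, inputs, output_grads):
        raise NotImplementedError

    def L_op(self, inputs, outputs, output_grads):
        # Hypothetical deprecation path: keep delegating to a legacy `grad`
        # implementation for now, but warn Op authors to migrate to `L_op`.
        if type(self).grad is not Op.grad:
            warnings.warn(
                f"{type(self).__name__}.grad is deprecated; implement L_op instead.",
                DeprecationWarning,
            )
        return self.grad(inputs, output_grads)
```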
### Rename `L_op` and `R_op`?

The names are pretty non-intuitive, and I don't think they are used in any other auto-diff library. The JAX equivalents are `vjp` and `jvp` (you can find a direct translation in https://www.pymc-labs.io/blog-posts/jax-functions-in-pymc-3-quick-examples/).
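For comparison, a minimal JAX example of the two primitives (just to illustrate the naming; `f` here is an arbitrary function):

```python
import jax
import jax.numpy as jnp


def f(x):
    return jnp.tanh(x) * x


x = jnp.array([0.5, -1.0, 2.0])
v = jnp.array([1.0, 0.0, 0.5])

# jvp: forward mode, pushes a tangent vector through f (the R_op analogue)
y, jv = jax.jvp(f, (x,), (v,))

# vjp: reverse mode, pulls a cotangent vector back through f (the L_op analogue)
y, f_vjp = jax.vjp(f, x)
(vj,) = f_vjp(v)
```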
Other suggestions were discussed some time ago by Theano devs here: https://groups.google.com/g/theano-dev/c/8-z2C59rmQk/m/gm432ifVAg0J?pli=1
### Remove `R_op` in favor of double application of `L_op` (or make it a default fallback)

There was some fanfare a while ago about `R_op` being completely redundant in a framework with dead code elimination: Theano/Theano#6035

That thread also suggests that the double `L_op` may generate more efficient graphs in some cases (because most of our rewrites target the kind of graphs generated by `L_op`?). A rough sketch of the double-`L_op` trick is shown below.
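For concreteness, a sketch of the trick using the existing `Lop`/`Rop` helpers in `pytensor.gradient` (the dummy-cotangent construction is the standard recipe from that thread, not a new API):

```python
import pytensor.tensor as pt
from pytensor.gradient import Lop, Rop

x = pt.vector("x")
v = pt.vector("v")                 # direction in input space, same shape as x
y = pt.tanh(x) * pt.sum(x**2)      # any differentiable expression of x

# Dummy cotangent with the same type as y; it is differentiated away below,
# so its value never matters.
u = y.type("u")

# First L_op: g = u^T J, which is linear in u
g = Lop(y, x, u)

# Second L_op with respect to the dummy: d(v . g)/du = J v, i.e. the R-operator
jv_via_double_lop = Lop(g, u, v)

# Mathematically equivalent to the dedicated implementation
jv_direct = Rop(y, x, v)
```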
It probably makes sense to retain `R_op` for cases where we/users know that it is the best approach, but perhaps default/revert to the double `L_op` otherwise.

Stale PRs that never quite made it into Theano: