Implement tensor.jacobian
tensor.grad implements the gradient of a scalar cost with respect to some variables. Sometimes we would like to compute the 'gradient' (in fact, the Jacobian matrix) of a vector with respect to another vector (or several other ones).
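In the meantime, a Jacobian can already be assembled by hand by scanning tensor.grad over the entries of the output. A minimal sketch, assuming a vector output y and a vector input x (the Jacobian is built row by row, one symbolic grad per output entry):

```python
import theano
import theano.tensor as T

x = T.dvector('x')
y = x ** 2

# Row i of the Jacobian is grad(y[i], x); scan loops i over the
# entries of y and stacks the resulting rows into a matrix.
J, updates = theano.scan(lambda i, y, x: T.grad(y[i], x),
                         sequences=T.arange(y.shape[0]),
                         non_sequences=[y, x])
f = theano.function([x], J, updates=updates)

print(f([4.0, 4.0]))  # [[ 8.  0.]
                      #  [ 0.  8.]]
```

This costs one symbolic grad call per entry of y; a dedicated tensor.jacobian could hide this scan or do something smarter.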
The behaviour also needs to be defined when the tensors in question are not vectors but have higher dimensionality.
On Thu, Nov 11, 2010, Olivier Delalleau wrote:
> After thinking a bit about it, the only solution I could come up with that I
> actually like is to systematically flatten both parameters of grad, to end
> up with (at most) a 2D tensor. This would yield
> grad(a, c) == grad(a, b) . grad(b, c)
> i.e. the chain rule would be respected.
> Unfortunately this would break the current implementation of grad, so if we
> do this it should probably be named differently (I was thinking of
> 'tensor_grad', but 'tensor.tensor_grad' looks ugly).
> Another option (maybe simpler to implement) is to say that the shape of
> grad(a, b) is a.shape + b.shape, which would be consistent with the current
> implementation, but it wouldn't be consistent with the chain rule (at least
> following Numpy's implementation of dot).
> I think even if we only want to implement a Hessian function right now, we
> should decide what of the above 2 options (or maybe a different 3rd one) we
> will want to pursue in the future for a more generic grad implementation.
> Otherwise we may end up with a Hessian inconsistent with applying grad
> twice.
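To make the two options concrete, here is a small NumPy sketch (a hypothetical example, not from the thread) where c is a vector, b = g(c) is a matrix, and a = h(b) is a vector. With the a.shape + b.shape convention, chaining Jacobians requires a tensordot over all of b's axes; with the flattened convention, it is an ordinary matrix product:

```python
import numpy as np

rng = np.random.RandomState(0)
W = rng.randn(2)
c = rng.randn(3)
b = np.outer(W, c)           # b = g(c), shape (2, 3)
a = b.sum(axis=1)            # a = h(b), shape (2,)

# Option 2: grad(a, b).shape == a.shape + b.shape.
J_ab = np.zeros((2, 2, 3))   # d a[m] / d b[i, j] = delta(m, i)
for m in range(2):
    J_ab[m, m, :] = 1.0
J_bc = np.zeros((2, 3, 3))   # d b[i, j] / d c[k] = W[i] * delta(j, k)
for i in range(2):
    for j in range(3):
        J_bc[i, j, j] = W[i]

# Numpy's dot contracts a single axis, so it does not implement the
# chain rule here: the result has shape (2, 2, 2, 3), not (2, 3).
print(np.dot(J_ab, J_bc).shape)

# The chain rule needs a contraction over *all* of b's axes:
J_ac = np.tensordot(J_ab, J_bc, axes=b.ndim)   # shape (2, 3)

# Option 1: flatten both arguments, so every Jacobian is (at most) 2D
# and the chain rule is a plain matrix product.
J_ab_flat = J_ab.reshape(a.size, b.size)       # (2, 6)
J_bc_flat = J_bc.reshape(b.size, c.size)       # (6, 3)
assert np.allclose(J_ab_flat.dot(J_bc_flat),
                   J_ac.reshape(a.size, c.size))

# Sanity check: a[m] = W[m] * c.sum(), so d a[m] / d c[k] = W[m].
assert np.allclose(J_ac, np.outer(W, np.ones(3)))
```

The two conventions agree whenever both arguments are scalars or vectors; they only diverge for higher-rank tensors, which is exactly where the Hessian-consistency concern above applies.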
See also #544.