Test environment:
torch 1.7.1 + CPU

Python 3.6

I. The autograd process

This part is adapted from the Zhihu article "Pytorch的Autograd".

1. The structure of a tensor

import torch as t
a = t.tensor(2.0,requires_grad=True)
print(a)
b = a.exp()
print(b)
tensor(2., requires_grad=True)
tensor(7.3891, grad_fn=<ExpBackward>)

Parameter notes: requires_grad=True tells autograd to track the operations on this tensor so that gradients can flow back to it during backpropagation.
To save computation, requires_grad defaults to False.

grad_fn records the forward operation that produced the tensor; here it is the "exp" operation.

c = t.tensor(2.0)
print(c)
print(c.exp())
tensor(2.)
tensor(7.3891)

The tensor c created this way cannot take part in backpropagation.
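Trying to backpropagate from it raises an error (sketched below; the exact wording may differ slightly between PyTorch versions):

# c.exp().backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn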

To sum up, when requires_grad=True a tensor carries:

  • grad: the gradient
  • grad_fn: the operation that produced it
  • data: the underlying values
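A small sketch showing all three attributes, rebuilding the a/b example above and running a backward pass:

a = t.tensor(2.0, requires_grad=True)
b = a.exp()
b.backward()                        # d(b)/d(a) = exp(a)
print(a.data, a.grad, a.grad_fn)    # expected: tensor(2.) tensor(7.3891) None  (a is a user-created leaf)
print(b.data, b.grad_fn)            # expected: tensor(7.3891) <ExpBackward object at ...>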

2. A backpropagation experiment

The expressions used in this experiment:

l1 = input × w1

l2 = l1 + w2

l3 = l1 × w3

l4 = l2 × l3

loss = mean(l4)

Example:

input_data = t.ones([2,2],requires_grad=False)
w1 = t.tensor(2.0,requires_grad=True)
w2 = t.tensor(3.0,requires_grad=True)
w3 = t.tensor(4.0,requires_grad=True)
l1 = input_data * w1
l2 = l1 + w2
l3 = l1 * w3
l4 = l2 * l3
loss = l4.mean()

w1, w2 and w3 are created directly by the user, so they have no grad_fn.
They are also called leaf tensors (they sit at the leaf positions of the computation graph).

print(w1.is_leaf,w2.is_leaf,w3.is_leaf)
print(w1.grad_fn,w2.grad_fn,w3.grad_fn)
print(w1.grad, w2.grad, w3.grad)
True True True
None None None
None None None
print(l1.is_leaf,l2.is_leaf,l3.is_leaf,l4.is_leaf)
print(l1.grad_fn,l2.grad_fn,l3.grad_fn,l4.grad_fn)
print(l1.grad, l2.grad, l3.grad,l4.grad)
False False False False
<MulBackward0 object at 0x000001C575845108> <AddBackward0 object at 0x000001C575843888> <MulBackward0 object at 0x000001C575843E48> <MulBackward0 object at 0x...>
None None None None


c:\users\user\appdata\local\programs\python\python37\lib\site-packages\ipykernel_launcher.py:3: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  This is separate from the ipykernel package so we can avoid doing imports until
loss.backward()
print(w1.grad, w2.grad, w3.grad)
print(l1.grad, l2.grad, l3.grad,l4.grad)
tensor(28.) tensor(8.) tensor(10.)
None None None None
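A quick sanity check of these numbers: every element of input_data is 1, so the mean over the 2×2 elements changes nothing, and elementwise l1 = 2, l2 = 5, l3 = 8. Then

dloss/dw1 = l3 + l2*w3 = 8 + 5*4 = 28
dloss/dw2 = l3 = 8
dloss/dw3 = l2*l1 = 5*2 = 10

which matches w1.grad, w2.grad and w3.grad printed above.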



Note:

It may look odd that the gradients of l1, l2, l3 and l4 are all None.
The reason: in a neural network there is usually no need to keep the gradients of non-leaf (intermediate) nodes, so to save memory PyTorch does not retain them by default.


If you do need the gradient of a non-leaf node, there are two options (a retain_grad sketch follows below):

tensor.retain_grad(), called before backward(), makes autograd keep that tensor's gradient.

tensor.register_hook() registers a function that is called with the tensor's gradient during backpropagation, which is handy for debugging.
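A minimal sketch of retain_grad(), rebuilding the same graph as above (input_data, w1, w2 and w3 are assumed to still hold the values defined earlier):

l1 = input_data * w1
l1.retain_grad()          # keep l1's gradient after backward
l2 = l1 + w2
l3 = l1 * w3
l4 = l2 * l3
l4.mean().backward()      # note: this also accumulates further gradients into w1, w2, w3
print(l1.grad)            # expected: tensor([[7., 7.], [7., 7.]])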

#loss = l4.mean()
#l1.register_hook(lambda grad: print('l1 grad: ', grad))
# l4.register_hook(lambda grad: print('l4 grad: ', grad))
# loss.register_hook(lambda grad: print('loss grad: ', grad))
# loss.backward(retain_graph=True)

At the time I really could not figure out this error and gave up after a lot of debugging. (In hindsight, the most likely cause is that loss.backward() had already been called on this graph above, which frees the intermediate buffers; calling backward() again then fails unless the whole forward pass is re-run, and retain_graph=True would have had to be set on the first call, not the second.)

Let's start over with a fresh example:

x = t.ones(3, requires_grad=True)
w = t.rand(3, requires_grad=True)
y = x * w
def variable_hook(grad):
    print("gradient of y:\n", grad)
hook_handle = y.register_hook(variable_hook)
z = y.sum()
z.backward()
hook_handle.remove()  # remove the hook once it is no longer needed
gradient of y:
 tensor([1., 1., 1.])

3. In-place operations

An in-place operation modifies a tensor's underlying data directly.

In-place operations are a common source of bugs, so PyTorch forbids most in-place operations on tensors involved in gradient computation; violating this raises something like: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: ...
PyTorch detects this with a counter, tensor._version, which is incremented every time the tensor is modified in place.

1) Non-leaf nodes

a = t.tensor([1.0, 3.0], requires_grad=True)
b = a + 2
print(b._version)
0
b[0] = 100
print(b._version)
1
loss = (b * b).mean()
loss.backward()
print(b._version)
1

The version saved when b entered the graph for loss matches its version at backward time, so no error is raised.
The snippet below, however, does raise an error: here b is modified in place after it has already been used to compute loss.

# a = t.tensor([1.0, 3.0], requires_grad=True)
# b = a + 2
# print(b._version) # 0
# loss = (b * b).mean()
# b[0] = 1000.0
# print(b._version) # 1
# loss.backward()

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2]], which is output 0 of AddBackward0, is at version 1; expected version 0 instead.

2) Leaf nodes

For a leaf tensor, you should not modify its values in place even before it is used in any computation:

a = t.tensor([10., 5., 2., 3.], requires_grad=True)
print(a, a.is_leaf)

a[:] = 0
print(a, a.is_leaf)

loss = (a*a).mean()
#loss.backward()
tensor([10.,  5.,  2.,  3.], requires_grad=True) True
tensor([0., 0., 0., 0.], grad_fn=<CopySlices>) False

RuntimeError: leaf variable has been moved into the graph interior
The error says the leaf variable has been moved into the interior of the graph: the in-place assignment turned a into a non-leaf node (note the grad_fn=<CopySlices> above), and a non-leaf node does not retain its gradient, so autograd refuses to build such a graph.
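If you genuinely need to overwrite a leaf tensor in place (for example to re-initialize it), a common pattern is to do the assignment under torch.no_grad() so that it is not recorded in the graph and the tensor stays a leaf; a minimal sketch:

a = t.tensor([10., 5., 2., 3.], requires_grad=True)
with t.no_grad():
    a[:] = 0
print(a, a.is_leaf)   # expected: tensor([0., 0., 0., 0.], requires_grad=True) True
loss = (a * a).mean()
loss.backward()
print(a.grad)         # expected: tensor([0., 0., 0., 0.])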

II. Extending autograd

Based on the official reference documentation and the book 《深度学习框架pytorch: 入门与实践》.

If you want to define your own activation function (or any other new operation), how do you make autograd differentiate it?
The answer is to write a Function:

  • 1. Subclass torch.autograd.Function.
  • 2. Implement its forward and backward methods; no other methods are needed.
  • 3. The first argument of both methods is ctx, a context object (playing a role similar to self/cls) used to pass information from forward to backward; it is filled in automatically, so you never pass it yourself when calling the function.

The example below implements an affine (fully connected) layer:
import torch as t
class LinearFunction(t.autograd.Function):

    # Note: both forward and backward must be static methods (@staticmethod).
    @staticmethod
    # bias is an optional argument; the argument list of forward is up to you.
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)  # stash the inputs for use in backward
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # The affine layer has a single output, so backward receives a single grad_output.
    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors  # retrieve what forward saved
        grad_input = grad_weight = grad_bias = None

        # needs_input_grad is a tuple of booleans marking which inputs require a gradient.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)

        return grad_input, grad_weight, grad_bias

To use it, call LinearFunction.apply directly:

linear = LinearFunction.apply
input_data = t.tensor([[2.0]],requires_grad=True)
# Wrong version 1: input_data = t.Tensor([[2.0]],requires_grad=True)   (the legacy t.Tensor constructor does not accept requires_grad)
# Wrong version 2: input_data = t.tensor(2.0,requires_grad=True)       (a 0-d tensor cannot be passed to mm, which expects 2-D matrices)

w = t.tensor([[3.0]], requires_grad=True)
loss = linear(input_data,w)
loss
tensor([[6.]], grad_fn=<LinearFunctionBackward>)
print(w.grad,input_data.grad)
None None
loss.backward()
print(w.grad,input_data.grad)
tensor([[2.]]) tensor([[3.]])
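As a quick cross-check, the built-in matrix multiply gives the same gradients (fresh leaf tensors are used here purely for illustration, so that gradients do not accumulate into the ones above):

input_check = t.tensor([[2.0]], requires_grad=True)
w_check = t.tensor([[3.0]], requires_grad=True)
input_check.mm(w_check.t()).backward()
print(w_check.grad, input_check.grad)  # expected: tensor([[2.]]) tensor([[3.]])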

You can also use torch.autograd.gradcheck to check that a hand-written backward is correct: it compares the analytical gradients against numerically estimated (finite-difference) gradients.

test_input1 = t.tensor(t.rand(2,7),requires_grad=True)
test_input2 = t.tensor(t.rand(5,7),requires_grad=True)
c:\users\user\appdata\local\programs\python\python37\lib\site-packages\ipykernel_launcher.py:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  """Entry point for launching an IPython kernel.
c:\users\user\appdata\local\programs\python\python37\lib\site-packages\ipykernel_launcher.py:2: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
t.autograd.gradcheck(LinearFunction.apply,(test_input1,test_input2),eps=1e-2)
True
  • The first argument is the function to check: here LinearFunction.apply (not an instance of the class).
  • The second argument is a tuple of the inputs that will be passed to forward.
  • eps is the step size of the finite-difference (numerical) gradient, not an error tolerance; the comparison tolerances are the separate atol/rtol arguments. A cleaner way to build the test inputs is sketched below.
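To avoid the copy-construct warning above and to make the numerical check more reliable, the test inputs are usually created directly as double-precision leaf tensors (a sketch, reusing the LinearFunction defined above):

test_input1 = t.rand(2, 7, dtype=t.double, requires_grad=True)
test_input2 = t.rand(5, 7, dtype=t.double, requires_grad=True)
print(t.autograd.gradcheck(LinearFunction.apply, (test_input1, test_input2)))  # default eps=1e-6; expected to print True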