https://www.bilibili.com/video/BV1CueXzdEmy?vd_source=2d1af2a9b2aa8124af12180f29fb55f6&spm_id_from=333.788.player.switch&p=17
Supplement:
import numpy as np
import torch

x = torch.tensor(0.0, requires_grad=True)  # x needs a gradient
a = torch.tensor(1.0)
b = torch.tensor(-2.0)
c = torch.tensor(1.0)
y = a*torch.pow(x, 2) + b*x + c

y.backward()
dy_dx = x.grad
print(dy_dx)  # tensor(-2.)
Mathematical principle
The derivative of f(x) = x² - 2x + 1 is:
f'(x) = 2x - 2
At x = 0:
f'(0) = 2×0 - 2 = -2
So the output tensor(-2.) is correct.
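The same check can be run programmatically: compare autograd's result against the analytic derivative f'(x) = 2x - 2 at a few sample points (a minimal sketch; the sample points are arbitrary):

```python
import torch

# f(x) = x^2 - 2x + 1, analytic derivative f'(x) = 2x - 2
for x0 in [0.0, 1.0, 3.0]:
    x = torch.tensor(x0, requires_grad=True)
    y = x**2 - 2*x + 1
    y.backward()                      # populates x.grad with dy/dx
    analytic = 2*x0 - 2
    assert x.grad.item() == analytic  # autograd agrees with calculus
```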
PyTorch's autograd mechanism
Computation graph: PyTorch dynamically builds a computation graph that records every operation
requires_grad=True: marks a tensor whose gradient should be computed
backward(): automatically computes gradients for all tensors with requires_grad
x.grad: stores the computed gradient value
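These four points can be observed directly on a tiny example (a sketch; is_leaf and grad_fn are standard tensor attributes):

```python
import torch

x = torch.tensor(0.0, requires_grad=True)
y = x**2 - 2*x + 1        # each operation is recorded in the dynamic graph

print(x.is_leaf)          # True: x is a leaf tensor created by the user
print(y.grad_fn)          # the graph node that produced y
print(x.grad)             # None: no gradient until backward() runs

y.backward()              # walk the graph from y back to x
print(x.grad)             # tensor(-2.)
```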
2. Backpropagation for non-scalar outputs
import numpy as np
import torch

x = torch.tensor([[0.0, 0.0], [1.0, 2.0]], requires_grad=True)
a = torch.tensor(1.0)
b = torch.tensor(-2.0)
c = torch.tensor(1.0)
y = a*torch.pow(x, 2) + b*x + c

gradient = torch.tensor([[1.0, 1.0], [1.0, 1.0]])

print("x:\n", x)
print("y:\n", y)
y.backward(gradient=gradient)
x_grad = x.grad
print("x_grad:\n", x_grad)

Output:
x:
 tensor([[0., 0.],
        [1., 2.]], requires_grad=True)
y:
 tensor([[1., 1.],
        [0., 1.]], grad_fn=<AddBackward0>)
x_grad:
 tensor([[-2., -2.],
        [ 0.,  2.]])
Computation process
1. Applying f(x) = x² - 2x + 1 to each element:
f(0) = 0² - 2×0 + 1 = 1
f(0) = 1
f(1) = 1² - 2×1 + 1 = 0
f(2) = 2² - 2×2 + 1 = 1
So the output y is:
[[1., 1.],
 [0., 1.]]
2. The derivative f'(x) = 2x - 2, evaluated at each position:
f'(0) = -2
f'(0) = -2
f'(1) = 0
f'(2) = 2
Key point: the gradient argument
When the output is not a scalar, backward() requires a gradient tensor with the same shape as y, acting as per-element "weights":
gradient = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
y.backward(gradient) computes the gradient of sum(y * gradient).
The actual computation
loss = y[0,0]*1 + y[0,1]*1 + y[1,0]*1 + y[1,1]*1
x.grad = d(loss)/dx
Since gradient is all ones, x.grad is simply the elementwise derivative:
x_grad = [[f'(0), f'(0)], [f'(1), f'(2)]] = [[-2, -2], [0, 2]]
Mathematical verification
Position [0,0]: x=0, f'(0) = -2 ✓
Position [0,1]: x=0, f'(0) = -2 ✓
Position [1,0]: x=1, f'(1) = 0 ✓
Position [1,1]: x=2, f'(2) = 2 ✓
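The same result can also be obtained with torch.autograd.grad, which returns the gradient instead of accumulating it into x.grad (a sketch; grad_outputs plays the role of the gradient argument):

```python
import torch

x = torch.tensor([[0.0, 0.0], [1.0, 2.0]], requires_grad=True)
y = x**2 - 2*x + 1
gradient = torch.ones_like(y)

# returns a tuple of gradients, one entry per input tensor
(dx,) = torch.autograd.grad(outputs=y, inputs=x, grad_outputs=gradient)
print(dx)  # tensor([[-2., -2.], [ 0., 2.]])
```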
An equivalent formulation: reduce y to a scalar first, then call backward() on the scalar:

import numpy as np
import torch

x = torch.tensor([[0.0, 0.0], [1.0, 2.0]], requires_grad=True)
a = torch.tensor(1.0)
b = torch.tensor(-2.0)
c = torch.tensor(1.0)
y = a*torch.pow(x, 2) + b*x + c

gradient = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
z = torch.sum(y*gradient)

print("x:", x)
print("y:", y)
z.backward()
x_grad = x.grad
print("x_grad:\n", x_grad)

Output:
x: tensor([[0., 0.],
        [1., 2.]], requires_grad=True)
y: tensor([[1., 1.],
        [0., 1.]], grad_fn=<AddBackward0>)
x_grad:
 tensor([[-2., -2.],
        [ 0.,  2.]])
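To see that gradient really acts as a per-element weight, try non-uniform weights: each entry of x.grad becomes weight × f'(x) (a sketch; the weight values here are arbitrary, chosen for illustration):

```python
import torch

x = torch.tensor([[0.0, 0.0], [1.0, 2.0]], requires_grad=True)
y = x**2 - 2*x + 1

# arbitrary non-uniform weights, same shape as y
gradient = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y.backward(gradient=gradient)

# x.grad[i,j] = gradient[i,j] * f'(x[i,j]) = gradient[i,j] * (2*x[i,j] - 2)
print(x.grad)  # tensor([[-2., -4.], [ 0., 8.]])
```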
Reference: https://download.csdn.net/blog/column/9886176/108303645