Conventions
Let me start with a big pitfall: matrix derivatives come in two layouts, the numerator layout (also called the Jacobian formulation) and the denominator layout (also called the Hessian formulation). The two are transposes of each other; my own way of remembering it is that it comes down to which side gets expanded first. See the Wikipedia article [3] for the detailed differences. Why is this a pitfall? Because different books may use different layouts, so the same formula can be very confusing to read. I was completely lost when I first studied this, and a single book may even mix both layouts (the famous The Matrix Cookbook, for example). Unless otherwise stated, this post uses the denominator layout, and vectors are column vectors.
| | scalar $y$ | vector $\mathbf y \in \mathbb R^m$ | matrix $\mathbf Y \in \mathbb R^{m\times n}$ |
|---|---|---|---|
| scalar $x$ | scalar | ${\partial \mathbf y \over \partial x} \in \mathbb R^{m}$ | ${\partial \mathbf Y \over \partial x} \in \mathbb R^{n\times m}$ (numerator layout only) |
| vector $\mathbf x \in \mathbb R^n$ | ${\partial y \over \partial \mathbf x} \in \mathbb R^n$ | ${\partial \mathbf y \over \partial \mathbf x} \in \mathbb R^{n\times m}$ | |
| matrix $\mathbf X \in \mathbb R^{p\times q}$ | ${\partial y \over \partial \mathbf X} \in \mathbb R^{p\times q}$ | | |
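As a quick illustration of the difference (a small example of my own, not from the table above): for the linear map $\mathbf y = \mathbf A \mathbf x$ with constant $\mathbf A \in \mathbb R^{m\times n}$,
$$
\underbrace{{\partial \mathbf y \over \partial \mathbf x} = \mathbf A \in \mathbb R^{m\times n}}_{\text{numerator layout}}
\qquad
\underbrace{{\partial \mathbf y \over \partial \mathbf x} = \mathbf A^{\rm T} \in \mathbb R^{n\times m}}_{\text{denominator layout}}
$$
The denominator-layout form is what appears as (1.2) in the quick reference below.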
Definitions
Vector-by-scalar and scalar-by-vector derivatives
Both results are vectors; the difference is that one is a row vector and the other a column vector.
$$
\begin{align}
\left[ {\partial \mathbf{x} \over \partial a} \right]_i &= {\partial x_i \over \partial a} \\
\left[ {\partial a \over \partial \mathbf{x}} \right]_i &= {\partial a \over \partial x_i} \\
{\partial \mathbf y \over \partial a} &= \begin{bmatrix}
{\partial y_1 \over \partial a} & {\partial y_2 \over \partial a} & \cdots & {\partial y_m \over \partial a} \end{bmatrix} \\
{\partial a \over \partial \mathbf y} &= \begin{bmatrix}
{\partial a \over \partial y_1} \\
{\partial a \over \partial y_2} \\ \vdots \\ {\partial a \over \partial y_m}
\end{bmatrix}
\end{align}
$$
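A tiny worked example of my own to make the row/column distinction concrete: for $\mathbf y(a) = \begin{bmatrix} a^2 & \sin a \end{bmatrix}^{\rm T}$ and $f(\mathbf x) = x_1^2 + x_2$,
$$
{\partial \mathbf y \over \partial a} = \begin{bmatrix} 2a & \cos a \end{bmatrix}, \qquad
{\partial f \over \partial \mathbf x} = \begin{bmatrix} 2x_1 \\ 1 \end{bmatrix}.
$$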
Matrix-by-scalar and scalar-by-matrix derivatives
Both results are matrices, with entries:
$$
\begin{align}
\left[ {\partial x \over \partial \mathbf{A}} \right]_{ij} &= {\partial x \over \partial a_{ij} } \\
\left[ {\partial \mathbf{A} \over \partial x} \right]_{ij} &= {\partial a_{ji} \over \partial x} \\
\frac{\partial y}{\partial\mathbf{X}} &= \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1n}}\\
\frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y}{\partial x_{m1}} & \frac{\partial y}{\partial x_{m2}} & \cdots & \frac{\partial y}{\partial x_{mn}}
\end{bmatrix} \\
\frac{\partial\mathbf{Y}}{\partial x} &= \begin{bmatrix}\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{21}}{\partial x} & \cdots & \frac{\partial y_{m1}}{\partial x}\\
\frac{\partial y_{12}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{m2}}{\partial x}\\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1n}}{\partial x} & \frac{\partial y_{2n}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x}
\end{bmatrix}
\end{align}
$$
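Two small examples of my own that follow directly from these definitions: for a scalar $x$ and a constant $\mathbf A \in \mathbb R^{m\times n}$, the matrix $\mathbf Y = x\mathbf A$ has $[\partial \mathbf Y/\partial x]_{ij} = \partial(x\,a_{ji})/\partial x = a_{ji}$, and for a square $\mathbf X$ the trace has $[\partial \operatorname{tr}(\mathbf X)/\partial \mathbf X]_{ij} = \delta_{ij}$, so
$$
{\partial (x\mathbf A) \over \partial x} = \mathbf A^{\rm T} \in \mathbb R^{n\times m}, \qquad
{\partial \operatorname{tr}(\mathbf X) \over \partial \mathbf X} = \mathbf I.
$$
Note how the first one picks up a transpose, exactly the $n\times m$ shape listed in the conventions table.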
Vector-by-vector derivatives
The result is a matrix.
$$
\begin{align}
\left[ \frac{\partial\mathbf{y}} {\partial\mathbf{x}} \right]_{ij} &=
{\partial y_j \over \partial x_i} \\
\frac{\partial\mathbf{y}}{\partial\mathbf{x}} &= \begin{bmatrix}
{\partial \mathbf y \over \partial x_1} \\
{\partial \mathbf y \over \partial x_2} \\
\vdots \\
{\partial \mathbf y \over \partial x_n}
\end{bmatrix} = \begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\
\frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{2}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{1}}{\partial x_{n}} & \frac{\partial y_{2}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{bmatrix}
\end{align}
$$
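To make this convention concrete, here is a minimal numerical sketch of mine (the helper name `numerical_jacobian` is not from any reference): a finite-difference Jacobian laid out row-by-$x_i$ as above, checked on $\mathbf y = \mathbf A\mathbf x$, where it should return $\mathbf A^{\rm T}$.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian in denominator layout:
    result[i, j] ≈ ∂y_j / ∂x_i, shape (len(x), len(f(x)))."""
    y0 = f(x)
    J = np.zeros((x.size, y0.size))
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += eps
        J[i, :] = (f(x_step) - y0) / eps
    return J

# y = A x: the denominator-layout Jacobian should equal A^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # A ∈ R^{3×4}
x = rng.standard_normal(4)        # x ∈ R^4
print(np.allclose(numerical_jacobian(lambda v: A @ v, x), A.T, atol=1e-4))  # True
```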
How to compute a derivative
Always keep the dimensions matched.
1. First identify which type of derivative it is (scalar-by-matrix, scalar-by-vector, etc.); this determines the dimensions of the result.
2. Expand the expression being differentiated (the numerator of the derivative).
3. Work out a single entry of the result.
4. Match the entries against a known pattern to obtain the derivative.
Warm-up
Starting from the definitions, let's derive a commonly used result:
$$
{\partial \mathbf x^{\rm T} \mathbf A \mathbf x \over \partial \mathbf x }
$$
where $\mathbf x \in \mathbb R^m,\mathbf A \in \mathbb R^{m\times m}$, and $\mathbf A$ does not depend on $\mathbf x$.
Step 1: This is a scalar-by-vector derivative, so the result should be a column vector with the same dimension as $\mathbf x$.
Step 2: Expand the numerator (the expression being differentiated)
$$
\text{let } f = \mathbf{x}^{\mathrm{T}}\mathbf{Ax} = \sum_i \sum_j x_i A_{ij} x_j \in \mathbb R
$$
Step 3: Work out one entry of the result
$$
\left[ {\partial \mathbf x^{\rm T} \mathbf A \mathbf x \over \partial \mathbf x} \right]_p = {\partial f \over \partial x_p} = \sum_{i\neq p} x_i A_{ip} + \sum_{j \neq p} A_{pj} x_j + 2A_{pp} x_p = \sum_i (A_{ip} + A_{pi})x_i
$$
Step 4: Match the entries to obtain the derivative
$$
{\partial \mathbf x^{\rm T} \mathbf A \mathbf x \over \partial \mathbf x } = (\mathbf A + \mathbf A^{\rm T}) \mathbf x
$$
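A quick numerical sanity check of this result (my own addition, not part of the original derivation), using central differences on $f(\mathbf x) = \mathbf x^{\rm T}\mathbf A\mathbf x$:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5
A = rng.standard_normal((m, m))   # generic, non-symmetric A
x = rng.standard_normal(m)

f = lambda v: v @ A @ v           # f(x) = x^T A x

# Central-difference gradient, one entry at a time (step 3 of the recipe).
eps = 1e-6
grad_fd = np.zeros(m)
for p in range(m):
    e = np.zeros(m)
    e[p] = eps
    grad_fd[p] = (f(x + e) - f(x - e)) / (2 * eps)

print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))  # True
```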
More generally, for $\mathbf u \in \mathbb R^m, \mathbf v \in \mathbb R^n, \mathbf A \in \mathbb R^{m\times n}$, where $\mathbf u$ and $\mathbf v$ depend on $\mathbf x$ but $\mathbf A$ does not, we have:
$$
{\partial \mathbf u^{\rm T} \mathbf A \mathbf v \over \partial \mathbf x } = {\partial \mathbf u \over \partial \mathbf x} \mathbf A \mathbf v + {\partial \mathbf v \over \partial \mathbf x} \mathbf A^{\rm T} \mathbf u
$$
For the detailed derivation, see [4], Eq. (11).
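To see the general formula in action, here is a sketch with a toy choice of my own (not the derivation in [4]): take $\mathbf u = \mathbf B\mathbf x$ and $\mathbf v = \mathbf C\mathbf x$, so that in denominator layout $\partial\mathbf u/\partial\mathbf x = \mathbf B^{\rm T}$ and $\partial\mathbf v/\partial\mathbf x = \mathbf C^{\rm T}$, and the formula predicts $\mathbf B^{\rm T}\mathbf A\mathbf C\mathbf x + \mathbf C^{\rm T}\mathbf A^{\rm T}\mathbf B\mathbf x$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 3, 4, 5                    # u ∈ R^m, v ∈ R^n, x ∈ R^k
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, k))      # u = B x
C = rng.standard_normal((n, k))      # v = C x
x = rng.standard_normal(k)

f = lambda z: (B @ z) @ A @ (C @ z)  # f(x) = u^T A v

# Finite-difference gradient of the scalar f with respect to x.
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * np.eye(k)[p]) - f(x - eps * np.eye(k)[p])) / (2 * eps)
    for p in range(k)
])

# Formula: (∂u/∂x) A v + (∂v/∂x) A^T u, with ∂u/∂x = B^T and ∂v/∂x = C^T.
pred = B.T @ A @ (C @ x) + C.T @ A.T @ (B @ x)
print(np.allclose(grad_fd, pred, atol=1e-5))  # True
```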
Chain rule
Todo
Formula quick reference
Derivatives with respect to a vector ($\mathbf a$ does not depend on $\mathbf x$)
$$
\begin{align}
{\partial \mathbf x^{\rm T} \mathbf a \over \partial \mathbf x} &= {\partial \mathbf a^{\rm T} \mathbf x \over \partial \mathbf x} = \mathbf{a} \tag{1.1}\\
{\partial \mathbf A \mathbf x \over \partial \mathbf x} &= \mathbf{A}^{\rm T} \tag{1.2} \\
{\partial \mathbf x^{\rm T} \mathbf A \over \partial \mathbf x} &= \mathbf{A} \tag{1.3} \\
{\partial \mathbf x^{\rm T} \mathbf A \mathbf x \over \partial \mathbf x } &= (\mathbf A + \mathbf A^{\rm T}) \mathbf{x} \tag{1.4} \\
{\partial \mathbf x^{\rm T} \mathbf A \mathbf x \over \partial \mathbf x } &= 2 \mathbf A \mathbf{x} \text{ if } \mathbf A \text{ is symmetric} \tag{1.5} \\
{\partial \mathbf u^{\rm T} \mathbf A \mathbf v \over \partial \mathbf x } &= {\partial \mathbf u \over \partial \mathbf x} \mathbf A \mathbf v + {\partial \mathbf v \over \partial \mathbf x} \mathbf A^{\rm T} \mathbf u \tag{1.6}
\end{align}
$$
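These identities are easy to spot-check numerically. Below is a small sketch of mine (same denominator-layout convention) verifying (1.1), (1.2) and (1.4) by finite differences; `fd_grad` is just an ad-hoc helper, not a library function.

```python
import numpy as np

def fd_grad(f, x, eps=1e-6):
    """Central-difference derivative of a scalar- or vector-valued f,
    arranged in denominator layout (rows indexed by the entries of x)."""
    G = np.zeros((x.size, np.atleast_1d(f(x)).size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        G[i, :] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * eps)
    return G.squeeze()

rng = np.random.default_rng(3)
n = 4
a = rng.standard_normal(n)
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

print(np.allclose(fd_grad(lambda v: v @ a, x), a, atol=1e-5))                  # (1.1)
print(np.allclose(fd_grad(lambda v: A @ v, x), A.T, atol=1e-5))                # (1.2)
print(np.allclose(fd_grad(lambda v: v @ A @ v, x), (A + A.T) @ x, atol=1e-5))  # (1.4)
```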
Derivatives with respect to a matrix ($\mathbf a, \mathbf b$ do not depend on $\mathbf X$)
$$
\begin{align}
{\partial \mathbf a^T \mathbf X \mathbf b \over \partial \mathbf X} &= \mathbf a \mathbf b^T \tag{2.1}\\
{\partial \mathbf a^T \mathbf X^T \mathbf b \over \partial \mathbf X} &= {\partial \mathbf b^T \mathbf X \mathbf a \over \partial \mathbf X}= \mathbf b\mathbf a^T \tag{2.2}\\
{\partial \mathbf a^T \mathbf X \mathbf a \over \partial \mathbf X} & = {\partial \mathbf a^T \mathbf X^T \mathbf a \over \partial \mathbf X} =\mathbf a \mathbf a^T \tag{2.3}
\end{align}
$$
The first four of these come from The Matrix Cookbook, Eqs. (69)–(72).
Proof of (1.3):
$$
\begin{align}
[\mathbf x^{\rm T} \mathbf A]_p &= \sum_i A_{ip} x_i \\
\left[ {\partial \mathbf x^{\rm T} \mathbf A \over \partial \mathbf x} \right]_{pq} &= {\partial \sum_i A_{iq} x_i \over \partial x_p} = A_{pq}
\end{align}
$$
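Formula (2.1) can be spot-checked the same way; a minimal sketch of mine (not from The Matrix Cookbook), differentiating $\mathbf a^{\rm T}\mathbf X\mathbf b$ with respect to each entry of $\mathbf X$:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

f = lambda M: a @ M @ b            # f(X) = a^T X b

# Central-difference derivative with respect to every entry X_{ij}.
eps = 1e-6
G = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = eps
        G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(G, np.outer(a, b), atol=1e-5))  # (2.1): ∂(aᵀXb)/∂X = a bᵀ
```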
References
[1] http://www.cnblogs.com/huashiyiqike/p/3568922.html
[2] http://xuehy.github.io/2014/04/18/2014-04-18-matrixcalc/
[3] Wikipedia, Matrix calculus: https://en.wikipedia.org/wiki/Matrix_calculus
[4] http://www.kamperh.com/notes/kamper_matrixcalculus13.pdf
[5] The Matrix Cookbook https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf