UMich DL for CV

Neural Network

Posted by Sirin on December 12, 2024


Feature Transformation

For example, the original space might be a Cartesian coordinate system. After a mathematical transformation, we can turn it into a polar coordinate system, which we call the feature space.

In this situation, a classifier that is nonlinear in the original space can become linear in the feature space, which is much easier to implement.
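A minimal sketch of this idea (the toy data and threshold below are made up for illustration): points in an inner cluster and an outer ring cannot be separated by any line in Cartesian coordinates, but after mapping to polar coordinates a simple threshold on the radius separates them.

import numpy as np

def to_polar(points):
    # Map Cartesian (x, y) points to polar (r, theta) features
    x, y = points[:, 0], points[:, 1]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    return np.stack([r, theta], axis=1)

# Toy data: inner cluster (class 0) surrounded by a ring (class 1)
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100),   # class 0
                        rng.uniform(2.0, 3.0, 100)])  # class 1
points = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

# In feature space, a linear rule on r alone separates the two classes
features = to_polar(points)
pred = (features[:, 0] > 1.5).astype(int)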

An application is the Histogram of Oriented Gradients (HoG) feature; the method (a rough sketch of the per-region histogram step follows the list):

  1. Compute the edge direction and strength at each pixel
  2. Divide the image into 8×8-pixel regions
  3. Within each region, compute a histogram of edge directions weighted by the edge strength.
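A rough sketch of these three steps, assuming a grayscale image given as a NumPy array (illustrative only, not a full HoG implementation, which would also normalize histograms over blocks of cells):

import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    # Step 1: edge strength and direction at each pixel
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation in [0, 180)

    # Step 2: split the image into cell x cell regions
    H, W = img.shape
    hists = np.zeros((H // cell, W // cell, bins))
    for i in range(H // cell):
        for j in range(W // cell):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            # Step 3: direction histogram weighted by edge strength
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            hists[i, j] = hist
    return hists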

Fully-connected Neural Network

Lec5-FullyConnected.png

The max function in $f = W_2\max(0, W_1x)$ is called the activation function; a classic choice is ReLU:

#Rectified Linear Unit
def ReLU(z):
	return max(0, z)

Q: What if we build a neural network without an activation function?

A: In that case $f = W_2W_1x$; letting $W_3 = W_2W_1$, we get $f = W_3x$. This is still a linear classifier!
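A quick numerical illustration of the collapse (the shapes below are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)           # input
W1 = rng.standard_normal((3, 4))     # first "layer"
W2 = rng.standard_normal((2, 3))     # second "layer"

# Two stacked linear layers with no activation...
f_two_layers = W2 @ (W1 @ x)

# ...collapse to a single linear layer W3 = W2 @ W1
W3 = W2 @ W1
f_one_layer = W3 @ x

print(np.allclose(f_two_layers, f_one_layer))  # True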

Why does ReLU work?

Lec5-SpaceWarping.png

Lec5-DataClouds.png

The two pictures above show that ReLU can transform a non-linear boundary in the original space into a linear boundary in the feature space.
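A tiny concrete case of this (the weights below are hand-picked for illustration): XOR labels are not linearly separable in the input space, but after one ReLU layer a single linear readout suffices.

import numpy as np

def relu(z):
    return np.maximum(0, z)

# XOR data: no single linear classifier separates it in input space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([0, 1, 1, 0])

# Hand-picked first layer: h = relu(x @ W1.T + b1) warps the space
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
H = relu(X @ W1.T + b1)          # feature space

# In feature space a single linear readout now works
w2 = np.array([1.0, -2.0])
scores = H @ w2                  # 0, 1, 1, 0 -> threshold at 0.5
pred = (scores > 0.5).astype(int)
print(pred, labels)              # [0 1 1 0] [0 1 1 0]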

Back Propagation

An example of backpropagation:

Take $f=(x+y)z$ as an example. The forward pass of this expression can be written as $q=x+y\quad f=qz$, and the goal of backpropagation is to compute the partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$. By the chain rule, $\frac{\partial f}{\partial x}=\frac{\partial f}{\partial q}\frac{\partial q}{\partial x}$, where, viewed from the $q=x+y$ node, $\frac{\partial f}{\partial x}$ is called the Downstream Gradient, $\frac{\partial q}{\partial x}$ the Local Gradient, and $\frac{\partial f}{\partial q}$ the Upstream Gradient.
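The same example in code, with concrete input values chosen purely for illustration:

# Forward and backward pass for f = (x + y) * z
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y                        # q = 3
f = q * z                        # f = -12

# backward pass (chain rule)
df_dq = z                        # upstream gradient arriving at the add node
df_dz = q
dq_dx, dq_dy = 1.0, 1.0          # local gradients of q = x + y
df_dx = df_dq * dq_dx            # downstream gradient: -4
df_dy = df_dq * dq_dy            # downstream gradient: -4

print(df_dx, df_dy, df_dz)       # -4.0 -4.0 3.0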

Sigmoid Function

$\sigma(x)=\large \frac{1}{1+e^{-x}}$

$\frac{\partial \sigma(x)}{\partial x}=(1-\sigma(x))\sigma(x)$
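One line of algebra shows where this identity comes from:

$\frac{\partial \sigma(x)}{\partial x}=\frac{e^{-x}}{(1+e^{-x})^2}=\frac{e^{-x}}{1+e^{-x}}\cdot\frac{1}{1+e^{-x}}=(1-\sigma(x))\sigma(x)$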

Patterns in Gradient Flow

lec6-pattern.png

Matrix Operation

lec6-MatrixGradient.png

In a real neural network, the inputs and outputs are usually given as matrices, and in that case the computation of $\frac{\partial L}{\partial x}$ is as shown in the figure above. Because the matrix dimensions D and M are both large, computing all the gradients explicitly is impossible (out of memory), so we instead work slice by slice, computing part of the gradient at a time. For example, to compute $\frac{\partial L}{\partial x_{1,1}}$ we need $\frac{\partial L}{\partial y}\frac{\partial y}{\partial x_{1,1}}$. Here $\frac{\partial L}{\partial y}$ is the upstream gradient and is already known, and $\frac{\partial y}{\partial x_{1,1}}$ has the same shape as $y$, so we only need to work out each $\frac{\partial y_{i,j}}{\partial x_{1,1}}$ in turn. The figure shows how to compute $\frac{\partial y_{1,2}}{\partial x_{1,1}}$; as another example, consider $\frac{\partial y_{2,3}}{\partial x_{1,1}}$: since $y_{2,3}=x_{2,1}w_{1,3}+x_{2,2}w_{2,3}+x_{2,3}w_{3,3}$, we have $\frac{\partial y_{2,3}}{\partial x_{1,1}}=0$.

From the derivation above, we can summarize a general rule:

$\LARGE \frac{\partial L}{\partial x_{i,j}}=\frac{\partial L}{\partial y}\frac{\partial y}{\partial x_{i,j}}=(w_{j,:})\cdot (\frac{\partial L}{\partial y_{i,:}})$
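In matrix form this rule is simply $\frac{\partial L}{\partial x}=\frac{\partial L}{\partial y}\,w^{\top}$. A minimal NumPy check of the element-wise rule against that form (the shapes below are arbitrary):

import numpy as np

# y = x @ w with arbitrary shapes N x D and D x M
N, D, M = 4, 3, 5
rng = np.random.default_rng(0)
x = rng.standard_normal((N, D))
w = rng.standard_normal((D, M))
dL_dy = rng.standard_normal((N, M))    # upstream gradient, assumed given

# Vectorized backward rule for the matrix multiply
dL_dx = dL_dy @ w.T

# Element-wise rule: dL/dx[i, j] = w[j, :] . dL/dy[i, :]
dL_dx_slow = np.zeros_like(x)
for i in range(N):
    for j in range(D):
        dL_dx_slow[i, j] = np.dot(w[j, :], dL_dy[i, :])

print(np.allclose(dL_dx, dL_dx_slow))  # True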