
Machine Learning Course Notes - 4

Notes about Andrew Ng's Machine Learning course on Coursera, part 4

Original article. When reposting, please credit Luozm's Blog.

1. Neural Networks: Representation

1.1 Motivations

1.2 Model Representation

At a very simple level, neurons are basically computational units that take inputs (dendrites) as electrical inputs (called “spikes”) that are channeled to outputs (axons). In our model, our dendrites are like the input features $x_1, \cdots, x_n$, and the output is the result of our hypothesis function. In this model, our $x_0$ input node is sometimes called the “bias unit”; it is always equal to 1. In neural networks, we use the same logistic function as in classification, $\frac{1}{1+e^{-\theta^T x}}$, yet we sometimes call it a sigmoid (logistic) activation function. In this situation, our “theta” parameters are sometimes called “weights”.
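As a quick illustration, here is a minimal sketch of the sigmoid activation in Python with NumPy (my own choice of language; the course itself uses Octave/MATLAB):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs map to ~0, large positive inputs to ~1.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[0.000045, 0.5, 0.999955]
```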

A simple neural network can be represented as:

$$\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \rightarrow \big[\ \big] \rightarrow h_\theta(x)$$

Here the first layer is the input layer, the brackets in the middle are the hidden layer, and the final layer, which outputs the hypothesis function, is the output layer.

A model with one hidden layer looks like:

$$\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} \rightarrow \begin{bmatrix} a_1^{(2)} \\ a_2^{(2)} \\ a_3^{(2)} \end{bmatrix} \rightarrow h_\Theta(x)$$

where $a_i^{(j)}$ denotes the “activation” of unit $i$ in layer $j$, and $\Theta^{(j)}$ is the matrix of weights controlling the mapping from layer $j$ to layer $j+1$, so that

$$a_1^{(2)} = g\left(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3\right)$$

$$h_\Theta(x) = a_1^{(3)} = g\left(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}\right)$$

(and similarly for $a_2^{(2)}$ and $a_3^{(2)}$).

If a network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$; the $+1$ comes from the bias unit. For example, if layer 1 has 2 units and layer 2 has 4 units, $\Theta^{(1)}$ is $4 \times 3$.


Vectorized Version

Rewriting the model above with a new variable $z_k^{(j)}$ for the argument of $g$:

$$a_1^{(2)} = g(z_1^{(2)}), \quad a_2^{(2)} = g(z_2^{(2)}), \quad a_3^{(2)} = g(z_3^{(2)})$$

where, for layer $j = 2$ and node $k$, the variable $z$ is:

$$z_k^{(2)} = \Theta_{k,0}^{(1)}x_0 + \Theta_{k,1}^{(1)}x_1 + \cdots + \Theta_{k,n}^{(1)}x_n$$

Vectorizing $x$ and $z^{(j)}$:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} \qquad z^{(j)} = \begin{bmatrix} z_1^{(j)} \\ z_2^{(j)} \\ \vdots \\ z_n^{(j)} \end{bmatrix}$$

Setting $x = a^{(1)}$, the vectorized equations can be rewritten as:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}, \qquad a^{(j)} = g\left(z^{(j)}\right)$$

(after computing $a^{(j)}$ we add the bias unit $a_0^{(j)} = 1$ before moving to the next layer).

Note that in the last layer we are doing exactly the same thing as in logistic regression:

$$h_\Theta(x) = a^{(j+1)} = g\left(z^{(j+1)}\right)$$
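A minimal NumPy sketch of this vectorized forward propagation; the layer sizes, the random weights, and the names `sigmoid`/`forward` are illustrative assumptions, not code from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward propagation: a^(1) = x, z^(j) = Theta^(j-1) a^(j-1), a^(j) = g(z^(j))."""
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # add bias unit a_0 = 1
        z = theta @ a                    # z^(j) = Theta^(j-1) a^(j-1)
        a = sigmoid(z)                   # a^(j) = g(z^(j))
    return a                             # last activation is h_Theta(x)

# Illustrative network: 3 inputs -> 3 hidden units -> 1 output.
rng = np.random.default_rng(0)
theta1 = rng.standard_normal((3, 4))     # s_2 x (s_1 + 1) = 3 x 4
theta2 = rng.standard_normal((1, 4))     # s_3 x (s_2 + 1) = 1 x 4
x = np.array([0.5, -1.2, 2.0])
print(forward(x, [theta1, theta2]))      # h_Theta(x), a single value in (0, 1)
```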

1.3 Applications

Simulating “AND”

The model we use is:

$$\Theta^{(1)} = \begin{bmatrix} -30 & 20 & 20 \end{bmatrix}, \qquad h_\Theta(x) = g(-30 + 20x_1 + 20x_2)$$

The output is 1 if and only if both $x_1$ and $x_2$ are 1:

$$\begin{aligned} x_1 = 0,\ x_2 = 0 &: \ g(-30) \approx 0 \\ x_1 = 0,\ x_2 = 1 &: \ g(-10) \approx 0 \\ x_1 = 1,\ x_2 = 0 &: \ g(-10) \approx 0 \\ x_1 = 1,\ x_2 = 1 &: \ g(10) \approx 1 \end{aligned}$$
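A quick sketch checking this AND network numerically (NumPy assumed, as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta_and = np.array([-30.0, 20.0, 20.0])    # weights of the AND network
for x1 in (0, 1):
    for x2 in (0, 1):
        h = sigmoid(theta_and @ np.array([1.0, x1, x2]))
        print(x1, x2, round(h, 4))           # ~1 only when x1 = x2 = 1
```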

Simulating “OR”

The model is the same as above; only the weights change:

$$\Theta^{(1)} = \begin{bmatrix} -10 & 20 & 20 \end{bmatrix}, \qquad h_\Theta(x) = g(-10 + 20x_1 + 20x_2)$$

The results:

$$\begin{aligned} x_1 = 0,\ x_2 = 0 &: \ g(-10) \approx 0 \\ x_1 = 0,\ x_2 = 1 &: \ g(10) \approx 1 \\ x_1 = 1,\ x_2 = 0 &: \ g(10) \approx 1 \\ x_1 = 1,\ x_2 = 1 &: \ g(30) \approx 1 \end{aligned}$$

Simulating other logical operations

To summarize, the parameters for the different operations are:

$$\text{AND}: \Theta^{(1)} = \begin{bmatrix} -30 & 20 & 20 \end{bmatrix} \qquad \text{NOR}: \Theta^{(1)} = \begin{bmatrix} 10 & -20 & -20 \end{bmatrix} \qquad \text{OR}: \Theta^{(1)} = \begin{bmatrix} -10 & 20 & 20 \end{bmatrix}$$

“XNOR”: we combine the parameters above to simulate “XNOR” (which gives 1 if $x_1$ and $x_2$ are both 0 or both 1):

The first layer's parameters combine “AND” and “NOR”:

$$\Theta^{(1)} = \begin{bmatrix} -30 & 20 & 20 \\ 10 & -20 & -20 \end{bmatrix}$$

The second layer's parameters are the same as “OR”:

$$\Theta^{(2)} = \begin{bmatrix} -10 & 20 & 20 \end{bmatrix}$$

The full model is:

$$a^{(2)} = g\left(\Theta^{(1)} \cdot x\right), \qquad h_\Theta(x) = a^{(3)} = g\left(\Theta^{(2)} \cdot a^{(2)}\right)$$

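Here is a sketch of the full two-layer XNOR network, stacking the AND/NOR weights in the first layer and the OR weights in the second (NumPy assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta1 = np.array([[-30.0,  20.0,  20.0],    # AND
                   [ 10.0, -20.0, -20.0]])   # NOR (NOT x1 AND NOT x2)
theta2 = np.array([-10.0, 20.0, 20.0])       # OR over the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        a2 = sigmoid(theta1 @ np.array([1.0, x1, x2]))    # hidden layer: [AND, NOR]
        h  = sigmoid(theta2 @ np.concatenate(([1.0], a2)))
        print(x1, x2, round(h, 4))           # ~1 when x1 == x2 (XNOR)
```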

Multiclass Classification

To perform multiclass classification, we use the “one vs. all” strategy, i.e., we effectively run logistic regression for each class at the same time.

We define the labels $y$ as one-hot vectors:

$$y^{(i)} \in \left\{ \begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\0\\1\end{bmatrix} \right\}$$

and predict all classes at once: the hypothesis outputs one value per class, $h_\Theta(x) \in \mathbb{R}^4$.


The output may look like the following, which means the predicted class is Motorcycle (the third class in the pedestrian/car/motorcycle/truck example):

$$h_\Theta(x) \approx \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$
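As a sketch, the predicted class is simply the index of the largest output; the class names, their ordering, and the output values below are illustrative assumptions:

```python
import numpy as np

classes = ["pedestrian", "car", "motorcycle", "truck"]   # illustrative ordering
h = np.array([0.1, 0.05, 0.9, 0.2])                      # example network output
print(classes[int(np.argmax(h))])                        # -> "motorcycle"
```

In practice the raw outputs need not be exactly 0 or 1; we just take the class whose output is largest.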
