
Commit 7c1b3ca

update readme
1 parent a309fab commit 7c1b3ca

3 files changed

Lines changed: 265 additions & 6 deletions

File tree

1_Getting_Started_入门.md

Lines changed: 3 additions & 4 deletions
@@ -70,8 +70,7 @@ import numpy
 ###A primer on supervised optimization for deep learning
 ####Learning a classifier
 #####Zero-one loss
-f(x)=argmax(k) P(Y=k|x,theta)
-L=sum(I(f(x)==y))
+f(x)=argmax(k) P(Y=k|x,theta) L=sum(I(f(x)==y))

 ```Python
 # zero_one_loss is a Theano variable representing a symbolic
@@ -164,7 +163,7 @@ for (x_batch, y_batch) in train_batches:

 ####Regularization
 Regularization is meant to prevent overfitting during MSGD training. To combat overfitting, we introduce several techniques: L1/L2 regularization and early stopping.
-######L1/L2 regularization
+#####L1/L2 regularization
 L1/L2 regularization consists of adding an extra term to the loss function that penalizes certain parameter configurations. L2 regularization is also known as "weight decay".
 In principle, adding a regularization term smooths the mapping computed by the neural network (by penalizing large parameter values, it reduces the amount of nonlinearity the model expresses). Minimizing this sum therefore finds a model that fits the training data while generalizing better. Following Occam's razor, the best model is the simplest one.
 Of course, a simple model does not necessarily generalize well. Empirically, though, this regularization scheme improves the generalization of neural networks, especially on small datasets. In the code below, we give each of the two regularization terms its own weight.
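As an aside (not part of the diff above), the penalized loss that this hunk describes is easy to state concretely. The following NumPy sketch uses entirely hypothetical values for the weight matrix, the base loss, and the two regularization weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))     # hypothetical weight matrix
nll = 1.25                      # hypothetical mean negative log-likelihood
l1_reg, l2_reg = 0.001, 0.0001  # one weight per regularization term

# penalized cost: NLL + lambda_1 * ||W||_1 + lambda_2 * ||W||_2^2
cost = nll + l1_reg * np.abs(W).sum() + l2_reg * (W ** 2).sum()
print(cost > nll)  # True: both penalty terms are non-negative
```

Because both penalties are non-negative, minimizing `cost` trades training fit against parameter size, which is the smoothing effect described above.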
@@ -249,7 +248,7 @@ while (epoch < n_epochs) and (not done_looping):
 This concludes the optimization section. Early stopping requires splitting the data into a training set, a validation set, and a test set. The objective function is approximated on minibatches with stochastic gradient descent, and L1/L2 regularization terms are added to combat overfitting.

 ###Theano/Python tips
-#####Loading and saving models
+####Loading and saving models
 When you run experiments, gradient descent can take hours to find a good solution, so you will want to save the weights once you find them. You may also want to save the current best solution as the search progresses.

 #####Storing numpy ndarrays in shared variables with Pickle
Lines changed: 260 additions & 0 deletions
@@ -0,0 +1,260 @@
Classifying MNIST digits using Logistic Regression
=============================
This section assumes the reader is familiar with the following Theano concepts: [shared variables](http://deeplearning.net/software/theano/tutorial/examples.html#using-shared-variables), [basic arithmetic ops](http://deeplearning.net/software/theano/tutorial/adding.html#adding-two-scalars), [T.grad](http://deeplearning.net/software/theano/tutorial/examples.html#computing-gradients), and [floatX (float64 by default)](http://deeplearning.net/software/theano/library/config.html#config.floatX). If you want to run the code on a GPU, also read [GPU](http://deeplearning.net/software/theano/tutorial/using_gpu.html).
All code for this section can be downloaded [here](http://deeplearning.net/tutorial/code/logistic_sgd.py).

In this section we show how Theano can be used to implement the most basic classifier: the logistic regression classifier. We begin with a quick primer on the model, which serves both as a refresher and to anchor the mathematical notation, and then show how the mathematical expressions map onto a Theano graph.

###The model
Logistic regression is a linear probabilistic classifier. It is parameterized by a weight matrix W and a bias vector b. Classification is done by projecting the input vector onto a set of hyperplanes, one per class; the distance from the input to a hyperplane reflects the probability that the input belongs to the corresponding class.
The Theano code is as follows.

```Python
# initialize with 0 the weights W as a matrix of shape (n_in, n_out)
self.W = theano.shared(
    value=numpy.zeros(
        (n_in, n_out),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)
# initialize the biases b as a vector of n_out 0s
self.b = theano.shared(
    value=numpy.zeros(
        (n_out,),
        dtype=theano.config.floatX
    ),
    name='b',
    borrow=True
)

# symbolic expression for computing the matrix of class-membership
# probabilities
# Where:
# W is a matrix where column-k represents the separation hyperplane for
# class-k
# x is a matrix where row-j represents input training sample-j
# b is a vector where element-k represents the free parameter of
# hyperplane-k
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

# symbolic description of how to compute prediction as class whose
# probability is maximal
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
```

Since the model's parameters must be repeatedly read and updated during training, W and b are defined as shared variables. The dot product and softmax operators then compute P(Y|x,W,b); the result `p_y_given_x` is a symbolic vector of class-membership probabilities.
To obtain the actual model prediction, we use the `T.argmax` operator, which returns the class y for which `p_y_given_x` is largest.
For a complete list of Theano ops, see the [list of ops](http://deeplearning.net/software/theano/library/tensor/basic.html#basic-tensor-functionality).
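To make the computation concrete, here is an eager NumPy sketch of the same forward pass (a toy stand-in for the Theano graph, not code from the tutorial; the minibatch values are hypothetical):

```python
import numpy as np

def softmax(z):
    # subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_in, n_out = 28 * 28, 10       # MNIST sizes, as in the tutorial
W = np.zeros((n_in, n_out))     # zero-initialized, like the shared variables
b = np.zeros(n_out)

x = rng.random((5, n_in))       # a toy minibatch of 5 "images"
p_y_given_x = softmax(x @ W + b)
y_pred = p_y_given_x.argmax(axis=1)

print(p_y_given_x.shape)              # (5, 10): one probability row per example
print(np.allclose(p_y_given_x, 0.1))  # True: with zero weights, all classes are equally likely
```

Each row of `p_y_given_x` sums to 1, which is exactly the property `T.nnet.softmax` guarantees symbolically.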
###Defining a loss function
Learning optimal model parameters means minimizing a loss function. For multi-class logistic regression, the natural choice is the negative log-likelihood as the loss.
Although entire books are devoted to the topic of minimization, gradient descent is by far the simplest method for minimizing arbitrary nonlinear functions. This tutorial uses minibatch stochastic gradient descent; see [Stochastic Gradient Descent](http://deeplearning.net/tutorial/gettingstarted.html#opt-sgd) for more details.
The following code defines the (symbolic) loss for a given minibatch.

```Python
# y.shape[0] is (symbolically) the number of rows in y, i.e.,
# number of examples (call it n) in the minibatch
# T.arange(y.shape[0]) is a symbolic vector which will contain
# [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
# Log-Probabilities (call it LP) with one row per example and
# one column per class LP[T.arange(y.shape[0]),y] is a vector
# v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
# LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
# the mean (across minibatch examples) of the elements in v,
# i.e., the mean log-likelihood across the minibatch.
return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
```
Note that we use the mean rather than the sum of the losses, so that the learning rate is less dependent on the minibatch size.
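The fancy-indexing trick described in the comments above can be replayed eagerly in NumPy (toy probabilities and labels, purely for illustration):

```python
import numpy as np

# toy class probabilities for a minibatch of 3 examples over 4 classes
p_y_given_x = np.array([[0.70, 0.10, 0.10, 0.10],
                        [0.20, 0.50, 0.20, 0.10],
                        [0.25, 0.25, 0.25, 0.25]])
y = np.array([0, 1, 3])           # correct label of each example

LP = np.log(p_y_given_x)          # log-probabilities, one row per example
v = LP[np.arange(y.shape[0]), y]  # [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]]]
nll = -v.mean()                   # mean negative log-likelihood over the minibatch
```

Here `nll` is -(log 0.7 + log 0.5 + log 0.25)/3, i.e. each example contributes only the log-probability it assigned to its correct class.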
###Creating a LogisticRegression class
We now define a `LogisticRegression` class that encapsulates the basic behaviour of logistic regression. The code has already been covered above, so we will not explain it again.

```Python
class LogisticRegression(object):
    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie

        """
        # start-snippet-1
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represents the separation hyperplane
        # for class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represents the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose
        # probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # end-snippet-1

        # parameters of the model
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        """Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
              the learning rate is less dependent on the batch size
        """
        # start-snippet-2
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
        # Log-Probabilities (call it LP) with one row per example and
        # one column per class LP[T.arange(y.shape[0]),y] is a vector
        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
        # the mean (across minibatch examples) of the elements in v,
        # i.e., the mean log-likelihood across the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
        # end-snippet-2

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch ; zero one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
```
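The `errors` method above is just the mean zero-one loss. As a quick eager NumPy illustration (toy values, not code from the tutorial):

```python
import numpy as np

# hypothetical predictions vs. labels for a minibatch of 5 examples
y_pred = np.array([0, 1, 2, 2, 1])
y = np.array([0, 1, 1, 2, 0])

# T.mean(T.neq(self.y_pred, y)) corresponds to the mean of the mismatches:
error_rate = np.mean(y_pred != y)
print(error_rate)  # 0.4 (2 of the 5 predictions are wrong)
```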
We instantiate this class as follows.

```Python
# generate symbolic variables for input (x and y represent a
# minibatch)
x = T.matrix('x')  # data, presented as rasterized images
y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

# construct the logistic regression class
# Each MNIST image has size 28*28
classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
```
Note that the input x and its associated labels y are defined outside the `LogisticRegression` instance; the class takes the input as a parameter of its `__init__` method. This is useful when chaining instances of such classes to build a deep network: the output of one layer can serve as the input of the next.
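To illustrate the chaining idea, here is a hypothetical eager NumPy stand-in (not Theano, and not from the tutorial) for the "input at construction time" pattern:

```python
import numpy as np

class Layer:
    """Hypothetical eager stand-in: a layer receives its input when it is
    constructed, so instances chain naturally."""
    def __init__(self, input, n_in, n_out, rng):
        self.W = rng.normal(size=(n_in, n_out)) * 0.01
        self.b = np.zeros(n_out)
        self.output = np.tanh(input @ self.W + self.b)

rng = np.random.default_rng(0)
x = rng.random((4, 28 * 28))                                  # toy minibatch
hidden = Layer(input=x, n_in=28 * 28, n_out=50, rng=rng)
top = Layer(input=hidden.output, n_in=50, n_out=10, rng=rng)  # chained layers
print(top.output.shape)  # (4, 10)
```

In Theano the same pattern wires up a symbolic graph instead of computing values immediately, but the composition of layers is identical.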
Finally, we define a `cost` variable to minimize.

```Python
# the cost we minimize during training is the negative log likelihood of
# the model in symbolic format
cost = classifier.negative_log_likelihood(y)
```

###Learning the model
In most programming languages, implementing MSGD requires manually deriving the gradient of the loss with respect to each parameter.
In Theano, this is trivial: it performs automatic differentiation, and it also applies certain mathematical transforms to improve numerical stability.

```Python
g_W = T.grad(cost=cost, wrt=classifier.W)
g_b = T.grad(cost=cost, wrt=classifier.b)
```
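Under the hood, `T.grad` derives the analytic gradient symbolically. For the softmax negative log-likelihood, that gradient has a well-known closed form, which this NumPy sketch (not from the tutorial; all sizes and values hypothetical) verifies against a finite difference:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(W, b, x, y):
    p = softmax(x @ W + b)
    return -np.log(p[np.arange(y.size), y]).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))         # toy minibatch: 8 examples, 5 features
y = rng.integers(0, 3, size=8)      # 3 classes
W = rng.normal(size=(5, 3)) * 0.1
b = np.zeros(3)

# closed form for the mean-NLL gradients: dL/dW = x^T (p - onehot(y)) / n
p = softmax(x @ W + b)
onehot = np.eye(3)[y]
g_W = x.T @ (p - onehot) / y.size
g_b = (p - onehot).mean(axis=0)

# check one entry of g_W against a finite difference of the loss
eps = 1e-6
W2 = W.copy()
W2[0, 0] += eps
fd = (nll(W2, b, x, y) - nll(W, b, x, y)) / eps
print(abs(fd - g_W[0, 0]) < 1e-4)  # True
```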
The function `train_model` can then be defined as follows.
```Python
# specify how to update the parameters of the model as a list of
# (variable, update expression) pairs.
updates = [(classifier.W, classifier.W - learning_rate * g_W),
           (classifier.b, classifier.b - learning_rate * g_b)]

# compiling a Theano function `train_model` that returns the cost, but in
# the same time updates the parameter of the model based on the rules
# defined in `updates`
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
```
`updates` is a list of pairs specifying how to update each parameter at every step. `givens` is a dictionary mapping symbolic variables to the data they take on at that step. `train_model` is defined in terms of:
*
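Stripped of the Theano machinery, each call to `train_model` amounts to one SGD step. The loop below is a plain NumPy sketch of that behaviour on toy data (all names, sizes, and values hypothetical; it is not the compiled function itself):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# a toy 3-class problem in 2 dimensions, 30 examples per class
x = np.vstack([rng.normal(loc=c * 4.0, size=(30, 2)) for c in range(3)])
y = np.repeat(np.arange(3), 30)

W = np.zeros((2, 3))
b = np.zeros(3)
learning_rate = 0.1

for step in range(200):                # each iteration mimics one train_model call
    p = softmax(x @ W + b)
    onehot = np.eye(3)[y]
    g_W = x.T @ (p - onehot) / y.size  # gradients T.grad would derive symbolically
    g_b = (p - onehot).mean(axis=0)
    W = W - learning_rate * g_W        # the two (variable, update) pairs in `updates`
    b = b - learning_rate * g_b

acc = (softmax(x @ W + b).argmax(axis=1) == y).mean()
print(acc)  # accuracy on the toy training set
```

The `givens` substitution in the real `train_model` simply selects which minibatch slice of the dataset plays the role of `x` and `y` on a given call.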

README.md

Lines changed: 2 additions & 2 deletions
@@ -11,10 +11,10 @@ This is a `Chinese tutorial` which is translated from [DeepLearning 0.1 document
 
-##Contents
+##内容/Contents
 
 * `入门`(Getting Started)
-* Classifying MNIST digits using Logistic Regression
+* `使用逻辑回归进行MNIST分类` Classifying MNIST digits using Logistic Regression
 * Multilayer Perceptron
 * Convolutional Neural Networks(LeNet)
 * Denoising Autoencoders(dA)
