Classifying MNIST Digits Using a Logistic Regression Classifier
===============================================================
 This section assumes the reader is familiar with the following Theano concepts: [shared variables](http://deeplearning.net/software/theano/tutorial/examples.html#using-shared-variables), [basic arithmetic ops](http://deeplearning.net/software/theano/tutorial/adding.html#adding-two-scalars), [computing gradients (T.grad)](http://deeplearning.net/software/theano/tutorial/examples.html#computing-gradients), and [floatX](http://deeplearning.net/software/theano/library/config.html#config.floatX) (float64 by default). If you intend to run the code on a GPU, also read [GPU](http://deeplearning.net/software/theano/tutorial/using_gpu.html).
 All of the code for this section can be downloaded [here](http://deeplearning.net/tutorial/code/logistic_sgd.py).

In this section, we show how Theano can be used to implement the most basic classifier: the logistic regression classifier. We start with a quick primer on the model, which serves both as a refresher and to anchor the mathematical notation, and then show how the mathematical expressions map onto a Theano graph.

### The Model
Logistic regression is a probabilistic, linear classifier. It is parametrized by a weight matrix W and a bias vector b. Classification is done by projecting an input vector onto a set of hyperplanes, one per class; the distance from the input to a hyperplane reflects the probability that the input belongs to the corresponding class.
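Formally, the class-membership probability and the resulting prediction described above can be written as:

```latex
P(Y=i \mid x, W, b) = \mathrm{softmax}_i(Wx + b)
                    = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}},
\qquad
y_{pred} = \operatorname*{argmax}_i P(Y=i \mid x, W, b)
```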
The corresponding Theano code is as follows.

```Python
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represents the separating hyperplane
        # for class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represents the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute the prediction as the class
        # whose probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
```

Since the parameters of the model must maintain a persistent state throughout training, we allocate W and b as shared variables. The dot and softmax operators are then used to compute the vector P(Y|x, W, b). The result `p_y_given_x` is a symbolic variable of vector type.
To get the actual model prediction, we use the `T.argmax` operator, which returns the index at which `p_y_given_x` is maximal (i.e., the class with the highest probability).
For a complete list of Theano ops, see the [list of ops](http://deeplearning.net/software/theano/library/tensor/basic.html#basic-tensor-functionality).

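To make the computation concrete, here is a minimal NumPy sketch of the same forward pass (softmax followed by argmax). The toy sizes and random data are invented stand-ins for MNIST's 784 inputs and 10 classes, and `softmax` is a hand-rolled stand-in for `T.nnet.softmax`:

```Python
import numpy as np

rng = np.random.RandomState(0)

n_in, n_out, batch = 4, 3, 5           # toy sizes in place of MNIST's 784/10
x = rng.randn(batch, n_in)             # one row per input example
W = np.zeros((n_in, n_out))            # same zero initialization as above
b = np.zeros(n_out)

def softmax(z):
    # subtract the row-wise max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

p_y_given_x = softmax(x.dot(W) + b)    # shape (batch, n_out); rows sum to 1
y_pred = p_y_given_x.argmax(axis=1)    # most probable class for each example
```

With the zero-initialized W and b, every class receives probability 1/n_out, which mirrors the starting state of the Theano model above.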
### Defining a Loss Function
Learning optimal model parameters involves minimizing a loss function. In the case of multi-class logistic regression, the natural choice is to use the negative log-likelihood as the loss.
While entire books are dedicated to the topic of minimization, gradient descent is by far the simplest method for minimizing arbitrary non-linear functions. This tutorial uses minibatch stochastic gradient descent (MSGD); see [Stochastic Gradient Descent](http://deeplearning.net/tutorial/gettingstarted.html#opt-sgd) for more details.
The following code defines the (symbolic) loss for a given minibatch.

```Python
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
        # Log-Probabilities (call it LP) with one row per example and
        # one column per class LP[T.arange(y.shape[0]),y] is a vector
        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
        # the mean (across minibatch examples) of the elements in v,
        # i.e., the mean log-likelihood across the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
```
Note that we use the mean of the per-example losses rather than the sum, so that the learning rate is less dependent on the minibatch size.

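The fancy-indexing trick described in the comments above can be checked directly in NumPy. This small sketch, with made-up probabilities, shows that `LP[arange(n), y]` picks out exactly the log-probability each example assigns to its correct label:

```Python
import numpy as np

# made-up class-membership probabilities for n = 3 examples and 2 classes
p_y_given_x = np.array([[0.9, 0.1],
                        [0.2, 0.8],
                        [0.6, 0.4]])
y = np.array([0, 1, 1])           # correct label of each example

LP = np.log(p_y_given_x)          # log-probabilities, one row per example
v = LP[np.arange(y.shape[0]), y]  # [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]]]
nll = -v.mean()                   # mean negative log-likelihood
```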
### Creating a LogisticRegression Class
We now define a `LogisticRegression` class that encapsulates the basic behaviour of logistic regression. The code is largely what we have already covered, so it should be self-explanatory.

```Python
class LogisticRegression(object):
    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`
    and bias vector :math:`b`. Classification is done by projecting data
    points onto a set of hyperplanes, the distance to which is used to
    determine a class membership probability.
    """

    def __init__(self, input, n_in, n_out):
        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
                      architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
                     which the datapoints lie

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
                      which the labels lie

        """
        # start-snippet-1
        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
        self.W = theano.shared(
            value=numpy.zeros(
                (n_in, n_out),
                dtype=theano.config.floatX
            ),
            name='W',
            borrow=True
        )
        # initialize the biases b as a vector of n_out 0s
        self.b = theano.shared(
            value=numpy.zeros(
                (n_out,),
                dtype=theano.config.floatX
            ),
            name='b',
            borrow=True
        )

        # symbolic expression for computing the matrix of class-membership
        # probabilities
        # Where:
        # W is a matrix where column-k represents the separating hyperplane
        # for class-k
        # x is a matrix where row-j represents input training sample-j
        # b is a vector where element-k represents the free parameter of
        # hyperplane-k
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute the prediction as the class
        # whose probability is maximal
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        # end-snippet-1

        # parameters of the model
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        """Return the mean of the negative log-likelihood of the prediction
        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}
                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label

        Note: we use the mean instead of the sum so that
              the learning rate is less dependent on the batch size
        """
        # start-snippet-2
        # y.shape[0] is (symbolically) the number of rows in y, i.e.,
        # number of examples (call it n) in the minibatch
        # T.arange(y.shape[0]) is a symbolic vector which will contain
        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
        # Log-Probabilities (call it LP) with one row per example and
        # one column per class LP[T.arange(y.shape[0]),y] is a vector
        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
        # the mean (across minibatch examples) of the elements in v,
        # i.e., the mean log-likelihood across the minibatch.
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
        # end-snippet-2

    def errors(self, y):
        """Return a float representing the number of errors in the minibatch
        over the total number of examples of the minibatch; zero-one
        loss over the size of the minibatch

        :type y: theano.tensor.TensorType
        :param y: corresponds to a vector that gives for each example the
                  correct label
        """

        # check if y has the same dimension as y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
```
We instantiate this class as follows.

```Python
    # generate symbolic variables for input (x and y represent a
    # minibatch)
    x = T.matrix('x')  # data, presented as rasterized images
    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class
    # Each MNIST image has size 28*28
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
```
Note that the symbolic input x and its corresponding labels y are defined outside of the `LogisticRegression` instance; the class takes its input as a parameter of the `__init__` function. This is especially useful when you want to connect instances of such classes to build a deep network: the output of one layer can serve as the input of the next.
Finally, we define a `cost` variable to minimize.

```Python
    # the cost we minimize during training is the negative log likelihood of
    # the model in symbolic format
    cost = classifier.negative_log_likelihood(y)
```

### Learning the Model
In most programming languages, implementing MSGD requires manually deriving the gradient (derivative) of the loss function with respect to each parameter.
In Theano, this is remarkably simple: it performs automatic differentiation and applies certain mathematical transformations to improve numerical stability.

```Python
    g_W = T.grad(cost=cost, wrt=classifier.W)
    g_b = T.grad(cost=cost, wrt=classifier.b)
```
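Theano derives these gradients automatically, but for softmax regression they also have a well-known closed form: the gradient of the mean negative log-likelihood with respect to W is xᵀ(p − onehot(y))/n. As a sanity check, this NumPy sketch on invented toy data (not part of the tutorial code) compares that closed form against a central finite-difference estimate:

```Python
import numpy as np

rng = np.random.RandomState(0)
n, n_in, n_out = 6, 4, 3                  # toy sizes
x = rng.randn(n, n_in)
y = rng.randint(n_out, size=n)
W = rng.randn(n_in, n_out) * 0.01
b = np.zeros(n_out)

def nll(W, b):
    """Mean negative log-likelihood of the toy minibatch."""
    z = x.dot(W) + b
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(n), y]))

# analytic gradient: x^T (p - onehot(y)) / n
z = x.dot(W) + b
z = z - z.max(axis=1, keepdims=True)
p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
g_W = x.T.dot(p - np.eye(n_out)[y]) / n

# central finite-difference estimate for one entry of W
eps = 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
g_fd = (nll(W_plus, b) - nll(W_minus, b)) / (2 * eps)
```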
The function `train_model` can then be defined as follows.
```Python
    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs.
    updates = [(classifier.W, classifier.W - learning_rate * g_W),
               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, while
    # at the same time updating the parameters of the model based on the
    # rules defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
```
`updates` is a list of (variable, update expression) pairs specifying how to update each parameter at every step. `givens` is a dictionary mapping symbolic variables to the values to substitute for them at that step. Each time `train_model(index)` is called, it computes and returns the cost of the corresponding minibatch and, at the same time, applies the `updates` rules, so W and b take one MSGD step. The whole learning algorithm thus consists of looping over all minibatches and repeatedly calling `train_model`.
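To make the effect of `givens` and `updates` concrete, here is a hedged NumPy-only sketch of what repeated calls to such a `train_model` compute: slicing out a minibatch, returning its cost, and taking one gradient step. The toy data and sizes are invented stand-ins for MNIST:

```Python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_out, batch_size, learning_rate = 4, 3, 10, 0.5

# toy training set standing in for MNIST
train_set_x = rng.randn(100, n_in)
train_set_y = rng.randint(n_out, size=100)

W = np.zeros((n_in, n_out))
b = np.zeros(n_out)

def train_model(index):
    """One MSGD step on minibatch `index`; returns the minibatch cost."""
    global W, b
    x = train_set_x[index * batch_size: (index + 1) * batch_size]
    y = train_set_y[index * batch_size: (index + 1) * batch_size]
    z = x.dot(W) + b
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    cost = -np.mean(np.log(p[np.arange(len(y)), y]))
    grad = (p - np.eye(n_out)[y]) / len(y)        # d cost / d (xW + b)
    W -= learning_rate * x.T.dot(grad)            # the `updates` rule for W
    b -= learning_rate * grad.sum(axis=0)         # the `updates` rule for b
    return cost

n_batches = train_set_x.shape[0] // batch_size
first = train_model(0)                            # log(3) at zero init
costs = [train_model(i) for i in range(n_batches)]
```

With W and b initialized to zero, the very first cost equals log(n_out), since every class starts with equal probability.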