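0. Personal understanding
1. Basic usage
2. Getting started with MNIST (multiclass classification)
3. Deep MNIST
4. Convolutional neural networks: classifying the CIFAR-10 dataset
5. Vector representations of words
6. Recurrent neural networks (RNN) and long short-term memory (LSTM)
7. Building a chatbot with a deep learning network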


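0. Personal understanding

At the very start of my study, I am writing down a rough personal understanding of deep learning and neural networks. Corrections are welcome.

1. What is a deep learning neural network actually learning? In my view it learns a set of parameters, or a selection pattern, which is what we usually call a classifier. This classifier may be high-dimensional and is made up of a set of parameters.
2. Take image CAPTCHA recognition as an example. The parameters here are the weight distribution over the image region (the digits 1 and 2 have different weight distributions in pixel space). If we pick the pixel space (say 32 * 32) plus the RGB color channels (3) as input features (which is essentially feature engineering), TensorFlow treats these features as neurons, combines them at each layer, and computes a result. The neurons of the next layer then combine the outputs of this layer. During combination, backpropagation automatically gives each combination a different weight according to how accurate the previous prediction was. This continues until the weights reach a best fit, which is usually the pixel-space weighting closest to the real image.
3. Everything in the world can be abstracted as a high-dimensional matrix. Different fields extract and abstract in different ways, and this is feature engineering. Note that domain expertise helps a great deal when doing feature engineering.
4. Once the domain-specific objects to be classified or recognized have been abstracted as high-dimensional matrices and enter the deep learning model, they become neurons (nodes). What the model does next is called "fitting".
5. To classify and recognize, deep learning aims to find a fitting matrix (concretely, a separating surface with one dimension fewer than the feature space). This requires three elements:
  1) A fitting function (activation function): generates the fitting surface.
  2) An error function (loss function): measures, while the network is computing, how far the surface produced by the current parameters is from the optimum, so that the parameters can be adjusted at any time.
  3) A network structure: deep learning differs from an ordinary neural network in the number of layers. A deep network usually has three or more layers (input layer, hidden layers, output layer). As layers are added, choosing how to combine and connect each layer becomes hard. There is currently no complete theory that can compute exactly which structure gives the best result. The usual practice is to keep trying different structures for different business scenarios until a relatively good one is "found by trial", and then tune the parameters on top of that structure.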




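0x1: What does a neural network actually understand

Some people say that the hierarchical decoupling of the input space learned by convolutional neural networks mimics the behavior of the human visual cortex. This may or may not be true; so far we have no strong evidence to confirm or refute it. Of course, one might expect the human visual cortex to learn in a similar way; to some extent this is a natural decomposition of our visual world (just as the Fourier transform is a natural decomposition of periodic sound signals). [That is: just as the Fourier transform of a sound signal expresses its different frequency components in a natural, physical way, we may believe that we recognize visual information hierarchically: round things are wheels, something with four wheels is a car, a stylish car is a sports car, and so on.] However, the way humans filter, layer, and process visual signals is quite possibly nothing like our feeble convolutional networks. The visual cortex is not convolutional. It is layered, but its layers have a structure of cortical columns whose real purpose is still unknown, and this structure has not yet appeared in our artificial neural networks (although Geoff Hinton is working in this direction). In addition, human vision involves far more than a classifier of static images: it is continuous and active rather than static and passive, and it is controlled in complex ways by mechanisms such as eye movement.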




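1. Basic usage

0x1: Overview

1. Use graphs to represent computation tasks.
2. Execute graphs in a context called a session (Session).
3. Use tensors to represent data.
4. Maintain state with variables (Variable).
5. Use feed and fetch to put data into, or get data out of, arbitrary operations.

TensorFlow is a programming system that represents computations as graphs. Nodes in the graph are called ops (short for operations). An op takes zero or more Tensors, performs some computation, and produces zero or more Tensors. Each Tensor is a typed multi-dimensional array. For example, you can represent a mini-batch of images as a 4-D array of floating point numbers with dimensions [batch, height, width, channels].
A TensorFlow graph describes a computation. To compute anything, the graph must be launched in a session. A session places the graph's ops onto devices such as CPUs or GPUs and provides methods to execute them. These methods return the tensors produced by the ops: as numpy ndarray objects in Python, and as tensorflow::Tensor instances in C and C++.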

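0x2: The computation graph

TensorFlow programs are usually structured into a construction phase and an execution phase. The construction phase describes the ops and how they connect as a graph. The execution phase uses a session to execute the ops in the graph.
For example, it is common to create a graph that represents and trains a neural network in the construction phase, and then to execute the training ops of that graph repeatedly in the execution phase.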
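The split between building and running can be illustrated without TensorFlow. The following is only a minimal pure-Python sketch of the idea, not how TensorFlow is implemented; the names `Constant`, `MatMul`, and `run` are hypothetical and exist only for this illustration.

```python
class Constant:
    """Source node: needs no inputs and simply holds a value."""

    def __init__(self, value):
        self.value = value
        self.inputs = []

    def compute(self):
        return self.value


class MatMul:
    """Node that multiplies the outputs of two upstream nodes."""

    def __init__(self, a, b):
        self.inputs = [a, b]

    def compute(self, left, right):
        # Plain nested-list matrix product.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*right)]
                for row in left]


def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    args = [run(dep) for dep in node.inputs]
    return node.compute(*args)


# Construction phase: only the graph is described, nothing is computed.
matrix1 = Constant([[3.0, 3.0]])
matrix2 = Constant([[2.0], [2.0]])
product = MatMul(matrix1, matrix2)

# Execution phase: the multiplication happens only now.
result = run(product)
print(result)
```

Creating `product` does no arithmetic; the work is deferred until `run(product)` walks the dependencies, which is roughly the role a session plays.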

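1. Building the graph (abstracting the objects to classify as high-dimensional matrices)

The first step in building a graph is to create source ops. A source op needs no input, for example a constant (Constant). The output of a source op is passed on to other ops that do the computation.
In the Python library, the value returned by an op constructor stands for the output of the constructed op, and it can be passed to other op constructors as input.

# -*- coding:utf-8 -*-
import tensorflow as tf

if __name__ == "__main__":
    # Create a constant op that produces a 1x2 matrix. The op is added
    # as a node to the default graph.
    #
    # The value returned by the constructor represents the output of the constant op.
    matrix1 = tf.constant([[3., 3.]])

    # Create another constant op that produces a 2x1 matrix.
    matrix2 = tf.constant([[2.], [2.]])

    # Create a matmul op that takes 'matrix1' and 'matrix2' as inputs.
    # The returned value 'product' represents the result of the matrix multiplication.
    product = tf.matmul(matrix1, matrix2)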

The default graph now has three nodes: two constant() ops and one matmul() op. To actually multiply the matrices and get the result, the graph must be launched in a session.

2. Launching the graph in a session

The graph can be launched only after the construction phase is complete. The first step is to create a Session object. Without arguments, the session constructor launches the default graph.

# -*- coding:utf-8 -*-
import tensorflow as tf

if __name__ == "__main__":
    # Build the same graph as before: two constant ops and one matmul op.
    matrix1 = tf.constant([[3., 3.]])
    matrix2 = tf.constant([[2.], [2.]])
    product = tf.matmul(matrix1, matrix2)

    # Launch the default graph.
    sess = tf.Session()

    # Call the session's run() method to execute the matmul op, passing
    # 'product' to indicate that we want the output of that op back.
    #
    # The whole execution is automatic: the session passes every input an
    # op needs, and ops are typically executed in parallel.
    #
    # The call run(product) triggers the execution of three ops in the
    # graph (two constants and one matmul).
    #
    # The returned 'result' is a numpy ndarray object.
    result = sess.run(product)
    print(result)
    # ==> [[ 12.]]

    # Done; close the session.
    sess.close()

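Under the hood, TensorFlow translates the graph definition into operations distributed across the available compute resources (such as the CPU or a GPU). You usually do not have to specify CPU or GPU explicitly; TensorFlow detects them automatically. If a GPU is found, TensorFlow uses the first GPU it finds for as many operations as possible.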

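0x3: Tensor

TensorFlow programs use the tensor data structure to represent all data; only tensors are passed between operations in the computation graph. You can think of a TensorFlow tensor as an n-dimensional array or list. A tensor has a static type, a rank, and a shape.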

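0x4: Variables

Variables maintain state across executions of the graph. The following example shows a variable used as a simple counter.

# -*- coding:utf-8 -*-
import tensorflow as tf

if __name__ == "__main__":
    # Create a variable initialized to the scalar 0.
    state = tf.Variable(0, name="counter")

    # Create an op that adds 1 to 'state'.
    one = tf.constant(1)
    new_value = tf.add(state, one)
    update = tf.assign(state, new_value)

    # Variables must be initialized by running an init op after the graph
    # is launched, so first add that init op to the graph.
    # (tf.initialize_all_variables() is the older, deprecated name.)
    init_op = tf.global_variables_initializer()

    # Launch the graph and run the ops.
    with tf.Session() as sess:
        # Run the init op.
        sess.run(init_op)
        # Print the initial value of 'state'.
        print(sess.run(state))
        # Run the update op and print 'state' each time.
        for _ in range(3):
            sess.run(update)
            print(sess.run(state))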

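The assign() operation in this code is part of the expression described by the graph, just like add(). It therefore does not perform the assignment until run() executes the expression.
The parameters of a statistical model are typically represented as a set of variables. For example, you can store the weights of a neural network as a tensor in a variable. During training, this tensor is updated by running the training graph repeatedly.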

0x5: Fetch

To fetch the outputs of operations, pass the tensors you want to retrieve to the run() call on the Session object when executing the graph. In the previous example we fetched only the single node state, but you can also fetch several tensors at once:

# -*- coding:utf-8 -*-
import tensorflow as tf

if __name__ == "__main__":
    input1 = tf.constant(3.0)
    input2 = tf.constant(2.0)
    input3 = tf.constant(5.0)
    intermed = tf.add(input2, input3)
    mul = tf.multiply(input1, intermed)

    # Launch the default graph and fetch two tensors in a single run() call.
    with tf.Session() as sess:
        result = sess.run([mul, intermed])
        print(result)

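0x6: Feed

The examples above introduce tensors into the computation graph by storing them in constants or variables. TensorFlow also provides a feed mechanism, which can temporarily replace the tensor of any operation in the graph, in effect patching a tensor directly into that operation.
A feed temporarily replaces the output of an operation with a tensor value. You supply the feed data as an argument to a run() call. The feed is valid only for the run() call it is passed to; when the call ends, the feed is gone. The most common use is to designate specific operations as "feed" operations by creating placeholders for them with tf.placeholder().

# -*- coding:utf-8 -*-
import tensorflow as tf

if __name__ == "__main__":
    input1 = tf.placeholder(tf.float32)
    input2 = tf.placeholder(tf.float32)
    output = tf.multiply(input1, input2)

    with tf.Session() as sess:
        # Running 'output' without a feed_dict would raise an error,
        # because a placeholder has no value until it is fed.
        print(sess.run([output], feed_dict={input1: [7.], input2: [2.]}))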

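0x7: batch


1. The first approach iterates over the whole dataset to compute the loss once, then computes the gradient with respect to each parameter and updates the parameters. Every update has to look at every sample in the dataset, so it is computationally expensive, slow, and does not support online learning. This is called batch gradient descent.
2. The other approach computes the loss after every single sample, then computes the gradient and updates the parameters. This is called stochastic gradient descent. It is faster, but it converges less well: it may wander around the optimum without hitting it, and two successive updates may cancel each other out, which makes the objective function oscillate strongly.

To overcome the drawbacks of both, the usual compromise today is mini-batch gradient descent. The data is split into batches and the parameters are updated batch by batch. The samples in a batch jointly determine the direction of the gradient, so the descent is less likely to go astray and the randomness is reduced. At the same time, a batch is much smaller than the whole dataset, so the cost of each update stays small.
The SGD optimizer we often see in code stands for stochastic gradient descent, but that does not mean it updates once per sample; it still works on mini-batches.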
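The three variants differ only in how many samples contribute to each update. As a rough sketch in plain Python (no TensorFlow), the function below fits a line by gradient descent with a configurable batch size. The function name `minibatch_sgd`, the toy data, and the hyperparameters are assumptions chosen for illustration.

```python
import random


def minibatch_sgd(xs, ys, batch_size, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average the gradient over the samples in this batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [i / 10.0 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

# batch_size=len(xs) is batch gradient descent, 1 is per-sample SGD,
# and anything in between is mini-batch gradient descent.
for size in (len(xs), 1, 5):
    print(size, minibatch_sgd(xs, ys, batch_size=size))
```

With noiseless toy data all three settings approach the same line; the difference shows up in how many updates each one performs per pass over the data.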




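2. Getting started with MNIST (multiclass classification)

0x1: The MNIST dataset


As mentioned earlier, every MNIST data point has two parts: an image of a handwritten digit and a corresponding label (in supervised learning, correctly labeled samples are especially important). We call the images "xs" and the labels "ys". Both the training set and the test set contain xs and ys; for example, the training images are mnist.train.images and the training labels are mnist.train.labels.

Each image is 28x28 pixels, and we flatten this array into a vector of length 28x28 = 784. How the array is flattened (the order of the numbers) does not matter, as long as every image is flattened the same way. From this point of view, the MNIST images are just points in a 784-dimensional vector space with a fairly rich structure (warning: visualizing this kind of data is computationally intensive).
Flattening throws away the 2-D structure of the image. This is clearly not ideal, and the best computer vision methods do exploit that structure, but the simple model we are studying here, softmax regression, does not.
As a result, mnist.train.images is a tensor of shape [55000, 784] (input_data.read_data_sets holds out 5,000 of the 60,000 MNIST training images as a validation set). The first dimension indexes the images and the second indexes the pixels of each image. Each entry in the tensor is the intensity of one pixel in one image, a value between 0 and 1 (the images are grayscale).

The corresponding MNIST labels are numbers between 0 and 9 that say which digit a given image shows. For this tutorial we want the labels as "one-hot vectors". A one-hot vector is 0 in most dimensions and 1 in a single dimension. Here, the digit n is represented as a 10-dimensional vector that is 1 in dimension n (counting from 0). For example, the label 0 is ([1,0,0,0,0,0,0,0,0,0]). Consequently, mnist.train.labels is a [55000, 10] array of numbers.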
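The one-hot encoding described above is small enough to write by hand. This is a minimal sketch in plain Python; the helper names `one_hot` and `from_one_hot` are hypothetical and are not part of TensorFlow or of input_data.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def from_one_hot(vector):
    """Recover the digit: the index of the largest entry."""
    return max(range(len(vector)), key=lambda i: vector[i])


print(one_hot(0))
print(one_hot(3))
print(from_one_hot(one_hot(7)))
```

Each encoded label has exactly ten entries, which is why the label array has a second dimension of 10.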

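0x2: Softmax regression

This is a classic use case for the softmax regression model. A softmax model assigns probabilities to different classes. Even later, when we train more sophisticated models, the final step is still a softmax layer that assigns probabilities.
Softmax regression has two steps:

1. Step one


We also add an extra bias, because the input often carries some irrelevant noise. The evidence that a given input image x shows digit i can therefore be written as

$$\text{evidence}_i = \sum_j W_{i,j} x_j + b_i$$

where $W_i$ is the weights and $b_i$ the bias for digit class i, and j is an index that sums over the pixels of the input image x. The softmax function then converts this evidence into probabilities y:

$$y = \text{softmax}(\text{evidence}), \qquad \text{softmax}(z)_i = \frac{\exp(z_i)}{\sum_k \exp(z_k)}$$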
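To make the two steps concrete, here is a small plain-Python sketch that computes the evidence for each class and then normalizes it with softmax. The tiny 3-class, 2-pixel weights are made-up numbers for illustration, and subtracting the maximum before exponentiating is a common numerical-stability habit rather than something the text above prescribes.

```python
import math


def evidence(weights, biases, pixels):
    """evidence_i = sum_j W[i][j] * x[j] + b[i] for every class i."""
    return [sum(w * x for w, x in zip(row, pixels)) + b
            for row, b in zip(weights, biases)]


def softmax(values):
    """Exponentiate and normalize so that the outputs sum to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-class, 2-pixel example (real MNIST uses 10 classes, 784 pixels).
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.1]
x = [0.2, 0.9]

probs = softmax(evidence(W, b, x))
print(probs)
```

The class with the largest evidence also receives the largest probability, and the probabilities always sum to 1.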









0x3: Implementing the regression model

y = tf.nn.softmax(tf.matmul(x,W) + b)


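0x4: Training the model


We use cross-entropy as the cost function:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

Here y is our predicted probability distribution and y' is the true distribution (the one-hot vector we feed in). Roughly speaking, cross-entropy measures how inefficient our predictions are at describing the truth: the less accurate the description, the higher the uncertainty and the larger the entropy.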
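A quick numeric check makes the behavior of cross-entropy easier to see. The sketch below is plain Python; the function name `cross_entropy`, the `eps` guard against log(0), and the example distributions are assumptions made for illustration.

```python
import math


def cross_entropy(true_dist, predicted, eps=1e-12):
    """-sum_i y'_i * log(y_i); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_dist, predicted))


label = [0.0, 1.0, 0.0]          # one-hot: the true class is 1
confident = [0.05, 0.90, 0.05]   # puts most probability on the true class
unsure = [0.40, 0.30, 0.30]      # puts most probability elsewhere

print(cross_entropy(label, confident))
print(cross_entropy(label, unsure))
```

With a one-hot label only the term for the true class survives, so the loss is simply the negative log of the probability assigned to the correct digit, and it grows as that probability shrinks.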

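Because TensorFlow has a graph that describes all of your computations, it can automatically use the backpropagation algorithm to determine efficiently how your variables affect the cost you want to minimize. It then applies the optimization algorithm you choose to keep modifying the variables and reduce the cost.

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Here we ask TensorFlow to minimize the cross-entropy using the gradient descent algorithm with a learning rate of 0.01. Gradient descent is a simple procedure in which TensorFlow shifts each variable a little bit in the direction that reduces the cost.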


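0x5: Evaluating our model

First, let's find where we predicted the correct label. tf.argmax is a very useful function that returns the index of the largest entry of a tensor along some axis (softmax produces something like [1,0,0,0,0,0,0,0,0,0], and the position holding the largest probability is the predicted digit). Because the label vectors consist of 0s and 1s, the index of the 1 is the class label. For example, tf.argmax(y,1) is the label our model predicts for each input x, while tf.argmax(y_,1) is the correct label. We can use tf.equal to check whether the prediction matches the true label (the same index means a match).

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

This gives us a list of booleans. To determine the fraction that are correct, we cast the booleans to floating point numbers and take the mean. For example, [True, False, True, True] becomes [1,0,1,1], whose mean is 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))


print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))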
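The same evaluation logic can be traced by hand on a few rows. This is a plain-Python sketch of the argmax, compare, and average steps; the helper names and the four sample predictions are made up for illustration.

```python
def argmax(values):
    """Index of the largest entry, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
    return sum(1.0 if c else 0.0 for c in correct) / len(correct)


# Four hypothetical 3-class predictions; the second one is wrong.
preds = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
truth = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]

print(accuracy(preds, truth))
```

Three of the four rows match, so this mirrors the [True, False, True, True] example from the text.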

0x6: mnist_softmax.py

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""A very simple MNIST classifier.

See extensive documentation at
http://tensorflow.org/tutorials/mnist/beginners/index.md
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys

import input_data
import tensorflow as tf

FLAGS = None


def main(_):
  # Import data
  mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

  # Create the model
  x = tf.placeholder(tf.float32, [None, 784])
  W = tf.Variable(tf.zeros([784, 10]))
  b = tf.Variable(tf.zeros([10]))
  y = tf.matmul(x, W) + b

  # Define loss and optimizer
  y_ = tf.placeholder(tf.float32, [None, 10])

  # The raw formulation of cross-entropy,
  #
  #   tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
  #                                 reduction_indices=[1]))
  #
  # can be numerically unstable.
  #
  # So here we use tf.nn.softmax_cross_entropy_with_logits on the raw
  # outputs of 'y', and then average across the batch.
  cross_entropy = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

  sess = tf.InteractiveSession()
  tf.global_variables_initializer().run()
  # Train
  for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

  # Test trained model
  correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                      y_: mnist.test.labels}))


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str, default='MNIST_data/',
                      help='Directory for storing input data')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)




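3. Deep MNIST

0x1: Building a multilayer convolutional network (a multilayer deep neural network)

1. Weight initialization

To create this model we need to create a lot of weights and biases. The weights should be initialized with a small amount of noise to break symmetry and to avoid zero gradients. Since we are using ReLU neurons, it is also good practice to initialize the biases with a small positive value so that neurons do not end up with an output that is always 0 ("dead neurons"). To avoid repeating this initialization while we build the model, we define two helper functions.

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)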
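The "small amount of noise" comes from a truncated normal distribution, which re-draws any sample that lands more than two standard deviations from the mean. A rough plain-Python sketch of that sampling rule follows; the function name `truncated_normal` and the seed are assumptions for this illustration and this is not the TensorFlow implementation.

```python
import random


def truncated_normal(count, stddev=0.1, mean=0.0, seed=None):
    """Draw `count` samples, re-drawing any that fall beyond 2 stddev."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)
    return samples


weights = truncated_normal(5, stddev=0.1, seed=42)
biases = [0.1] * 5  # small positive constant, as in bias_variable()
print(weights)
print(biases)
```

With stddev=0.1 every initial weight stays within 0.2 of zero, so no unit starts out with an extreme weight, while the randomness still breaks the symmetry between units.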

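2. Convolution and pooling (expanding low-dimensional features into a higher-dimensional space)

TensorFlow gives us a lot of flexibility in convolution and pooling operations. How do we handle the boundaries? What should the stride be? In this example we always use the vanilla version. Our convolutions use a stride of 1 and are zero-padded so that the output is the same size as the input. Our pooling is plain max pooling over 2x2 blocks. To keep the code cleaner, we wrap these operations in functions.

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')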

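3. First convolutional layer

We can now implement the first layer. It consists of a convolution followed by max pooling. The convolution computes 32 features for each 5x5 patch. Its weight tensor has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the next is the number of input channels, and the last is the number of output channels. There is also one bias for each output channel.

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

To apply the layer, we first reshape x into a 4-D tensor whose second and third dimensions are the image height and width and whose last dimension is the number of color channels.

x_image = tf.reshape(x, [-1,28,28,1])

We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool.

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)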

4. Second convolutional layer


W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

5. Densely connected layer


W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
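The 7 * 7 * 64 in the dense layer follows from the shapes of the earlier layers. With 'SAME' padding, the spatial output size is the input size divided by the stride, rounded up. The sketch below traces one spatial dimension through the network in plain Python; the helper names are hypothetical.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a 'SAME'-padded conv or pool: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


def trace_shapes(size=28):
    """Follow one spatial dimension through conv1, pool1, conv2, pool2."""
    shapes = [("input", size, 1)]
    for name, stride, channels in [("conv1", 1, 32), ("pool1", 2, 32),
                                   ("conv2", 1, 64), ("pool2", 2, 64)]:
        size = same_padding_output(size, stride)
        shapes.append((name, size, channels))
    return shapes


for name, size, channels in trace_shapes():
    print("%-6s %2d x %2d x %d" % (name, size, size, channels))

last_name, last_size, last_channels = trace_shapes()[-1]
flat_length = last_size * last_size * last_channels
print(flat_length)
```

The stride-1 convolutions keep the 28x28 size, each 2x2 pooling halves it (28 to 14 to 7), and the second convolution outputs 64 channels, which gives the flattened length used by W_fc1.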

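6. Dropout

To reduce overfitting, we apply dropout before the output layer. We use a placeholder for the probability that a neuron's output is kept during dropout. This lets us turn dropout on during training and off during testing. Besides masking neuron outputs, TensorFlow's tf.nn.dropout op also scales the remaining outputs automatically, so dropout works without any additional scaling.

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)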
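The automatic scaling mentioned above means each kept output is multiplied by 1 / keep_prob, so the expected sum of the layer stays unchanged. Here is a minimal plain-Python sketch of that behavior; the function name `dropout` mirrors the idea only and the seed and sizes are arbitrary choices for illustration.

```python
import random


def dropout(activations, keep_prob, rng=random):
    """Keep each value with probability keep_prob and scale it by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10000

# Training: roughly half of the outputs are zeroed, the rest are doubled.
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
print(sum(dropped) / len(dropped))

# Evaluation: keep_prob=1.0 leaves every output untouched.
print(dropout(hidden[:5], keep_prob=1.0))
```

This is why the training loop feeds keep_prob: 0.5 while the accuracy evaluations feed keep_prob: 1.0.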

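7. Output layer

Finally, we add a softmax layer, just like in the single-layer softmax regression above.

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

Note the difference from plain softmax regression: there, the softmax layer takes the 784 raw image pixels as input, whereas here it takes the 1024-dimensional feature vector produced by the convolutional and dense layers, which is a more abstract representation.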

8. Training and evaluating the model


cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
# tf.initialize_all_variables() is the older, deprecated name for this op.
sess.run(tf.global_variables_initializer())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i % 100 == 0:
    # Report accuracy on the current batch with dropout disabled.
    train_accuracy = accuracy.eval(feed_dict={
        x: batch[0], y_: batch[1], keep_prob: 1.0})
    print("step %d, training accuracy %g" % (i, train_accuracy))
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

9. tensorflow-deep_convolution.py

import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
import tensorflow as tf


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


sess = tf.InteractiveSession()

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

# First convolutional layer.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolutional layer.
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely connected layer.
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout.
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Output layer.
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Summaries for TensorBoard.
tf.summary.scalar('Training error', cross_entropy)
tf.summary.scalar('Training accuracy', accuracy)
tf.summary.scalar('sparsity', tf.nn.zero_fraction(h_fc1))

sess.run(tf.global_variables_initializer())

merged_summary_op = tf.summary.merge_all()
print(merged_summary_op)
summary_writer = tf.summary.FileWriter('./mnist_logs', sess.graph)

for i in range(20000):
    batch = mnist.train.next_batch(50)
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
        summary_str = sess.run(merged_summary_op, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
        summary_writer.add_summary(summary_str, i)

print("test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

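0x2: Feed-forward neural network: fully connected MNIST

1. Build the graph

After creating placeholders for the data, the graph is built from the mnist.py file according to a three-stage pattern: inference(), loss(), and training().

1. inference(): builds the graph as far as is required for running the network forward to make predictions.
2. loss(): adds to the inference graph the ops required to generate the loss.
3. training(): adds to the loss graph the ops required to compute and apply gradients.


The inference() function builds the graph as far as needed to return the tensor that contains the output predictions.
It takes the images placeholder as input and builds on top of it a pair of fully connected layers with ReLU (Rectified Linear Unit) activations, followed by a ten-node linear layer that produces the output logits.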

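with tf.name_scope('hidden1') as scope:
  weights = tf.Variable(
      tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                          stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
      name='weights')
  biases = tf.Variable(tf.zeros([hidden1_units]),
                       name='biases')

Each variable is given an initializer op as part of its construction.
In this most common case, the weights are initialized with tf.truncated_normal and given a 2-D shape: the first dimension is the number of units in the layer the weights connect from, and the second is the number of units in the layer the weights connect to. For the first layer, named hidden1, the shape is [IMAGE_PIXELS, hidden1_units] (the input of the first layer is the image pixels), because the weights connect the image inputs to the hidden1 layer. The tf.truncated_normal initializer generates a random distribution with the given mean and standard deviation.
The biases are then initialized with tf.zeros so that they all start at 0, and their shape is simply the number of units in the layer they connect to.

# Each of the following lines lives inside its own name scope and uses
# that layer's own 'weights' and 'biases' variables.
hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
logits = tf.matmul(hidden2, weights) + biases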



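First, the values from labels_placeholder are encoded as a tensor of 1-hot values. For example, if the class identifier is "3", the value is converted to:

[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]


batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
# TensorFlow 1.x argument order and names; releases before 1.0 used
# tf.concat(1, [indices, labels]) and tf.pack here.
concated = tf.concat([indices, labels], 1)
onehot_labels = tf.sparse_to_dense(
    concated, tf.stack([batch_size, NUM_CLASSES]), 1.0, 0.0)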

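Next, a tf.nn.softmax_cross_entropy_with_logits op is added to compare the logits tensor output by inference() with the 1-hot labels.

# TensorFlow 1.x requires the labels and logits to be passed by keyword.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels,
                                                        logits=logits,
                                                        name='xentropy')

Then tf.reduce_mean averages the cross-entropy values across the batch dimension (the first dimension), and this average is used as the total loss.

loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')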



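The training() function adds the ops needed to minimize the loss through gradient descent.
First, it takes the loss tensor from the loss() function and hands it to a scalar summary op (tf.summary.scalar in TensorFlow 1.x, tf.scalar_summary in earlier releases). Used together with a summary writer (see below), this op generates summary values in the events file. Every time the summaries are written out, it emits the snapshot value of the loss tensor.

tf.summary.scalar(loss.op.name, loss)

Next, we instantiate a tf.train.GradientDescentOptimizer, which applies gradient descent with the requested learning rate.

optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)

We then create a variable that holds a counter for the global training step, and use minimize() both to update the trainable weights in the system and to increment the global step. By convention this op is called train_op; it is the op a TensorFlow session must run to perform one full training step.

global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)

Finally, the function returns the tensor that contains the output of the training op.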

2. Training the model


3. The training loop


for step in xrange(max_steps):
    sess.run(train_op)


At each step, the code generates a feed dictionary that contains the examples to train on for that step, keyed by the placeholder ops they represent.

images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)


feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}





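When calling sess.run, the code specifies the two values it wants to fetch: [train_op, loss]

for step in xrange(FLAGS.max_steps):
    feed_dict = fill_feed_dict(data_sets.train,
                               images_placeholder,
                               labels_placeholder)
    _, loss_value = sess.run([train_op, loss],
                             feed_dict=feed_dict)

Because there are two values to fetch, sess.run() returns a tuple with two items. Each tensor in the list of values to fetch corresponds to a numpy array in the returned tuple, filled with the value of that tensor during this training step. Since train_op is an operation with no output value, its element in the returned tuple is None and is discarded. The value of the loss tensor, however, may become NaN if the model diverges during training, so we capture it and log it.

if step % 100 == 0:
    print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))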
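Since a diverging model shows up as a NaN loss, one simple safeguard is to test the fetched value at every step and stop early. The sketch below is plain Python; the helper name `check_loss` and the simulated loss history are assumptions for illustration and are not part of the tutorial code.

```python
import math


def check_loss(step, loss_value):
    """Raise as soon as the loss stops being a finite number."""
    if math.isnan(loss_value) or math.isinf(loss_value):
        raise ValueError("loss diverged at step %d: %r" % (step, loss_value))
    return loss_value


# Hypothetical loss history; the last value simulates a diverged model.
history = [2.31, 1.87, 0.94, float("nan")]
for step, value in enumerate(history):
    try:
        check_loss(step, value)
    except ValueError as error:
        print(error)
        break
```

In a real loop the value passed in would be the loss_value returned by sess.run, and a common reaction to divergence is to lower the learning rate and restart.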


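To emit the events files used by TensorBoard, all of the summaries (in this case, only one) are collected into a single op during the graph-building phase.

# TensorFlow 1.x name; earlier releases used tf.merge_all_summaries().
summary_op = tf.summary.merge_all()


# TensorFlow 1.x name; earlier releases used tf.train.SummaryWriter.
summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)


summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)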



To produce a checkpoint file that can later be used to restore the model for further training or evaluation, we instantiate a tf.train.Saver.

saver = tf.train.Saver()


saver.save(sess, FLAGS.train_dir, global_step=step)


saver.restore(sess, FLAGS.train_dir)

4. Evaluating the model


print('Training Data Eval:')
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.train)
print('Validation Data Eval:')
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.validation)
print('Test Data Eval:')
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.test)

5. fully_connected_feed.py

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Trains and Evaluates the MNIST network using a feed dictionary."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# pylint: disable=missing-docstring
import argparse
import os.path
import sys
import time

from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import mnist

# Basic model parameters as external flags.
FLAGS = None


def placeholder_inputs(batch_size):
  """Generate placeholder variables to represent the input tensors.

  These placeholders are used as inputs by the rest of the model building
  code and will be fed from the downloaded data in the .run() loop, below.

  Args:
    batch_size: The batch size will be baked into both placeholders.

  Returns:
    images_placeholder: Images placeholder.
    labels_placeholder: Labels placeholder.
  """
  # Note that the shapes of the placeholders match the shapes of the full
  # image and label tensors, except the first dimension is now batch_size
  # rather than the full size of the train or test data sets.
  images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                         mnist.IMAGE_PIXELS))
  labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
  return images_placeholder, labels_placeholder


def fill_feed_dict(data_set, images_pl, labels_pl):
  """Fills the feed_dict for training the given step.

  A feed_dict takes the form of:
  feed_dict = {
      , ....
  }

  Args:
    data_set: The set of images and labels, from input_data.read_data_sets()
    images_pl: The images placeholder, from placeholder_inputs().
    labels_pl: The labels placeholder, from placeholder_inputs().

  Returns:
    feed_dict: The feed dictionary mapping from placeholders to values.
  """
  # Create the feed_dict for the placeholders filled with the next
  # `batch size` examples.
  images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
                                                 FLAGS.fake_data)
  feed_dict = {
      images_pl: images_feed,
      labels_pl: labels_feed,
  }
  return feed_dict


def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
  """Runs one evaluation against the full epoch of data.

  Args:
    sess: The session in which the model has been trained.
    eval_correct: The Tensor that returns the number of correct predictions.
    images_placeholder: The images placeholder.
    labels_placeholder: The labels placeholder.
    data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
  """
  # And run one epoch of eval.
  true_count = 0  # Counts the number of correct predictions.
  steps_per_epoch = data_set.num_examples // FLAGS.batch_size
  num_examples = steps_per_epoch * FLAGS.batch_size
  for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set,
                               images_placeholder,
                               labels_placeholder)
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
  precision = float(true_count) / num_examples
  print(' Num examples: %d Num correct: %d Precision @ 1: %0.04f' %
        (num_examples, true_count, precision))


def run_training():
  """Train MNIST for a number of steps."""
  # Get the sets of images and labels for training, validation, and
  # test on MNIST.
  data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

  # Tell TensorFlow that the model will be built into the default Graph.
  with tf.Graph().as_default():
    # Generate placeholders for the images and labels.
    images_placeholder, labels_placeholder = placeholder_inputs(
        FLAGS.batch_size)

    # Build a Graph that computes predictions from the inference model.
    logits = mnist.inference(images_placeholder,
                             FLAGS.hidden1,
                             FLAGS.hidden2)

    # Add to the Graph the Ops for loss calculation.
    loss = mnist.loss(logits, labels_placeholder)

    # Add to the Graph the Ops that calculate and apply gradients.
    train_op = mnist.training(loss, FLAGS.learning_rate)

    # Add the Op to compare the logits to the labels during evaluation.
    eval_correct = mnist.evaluation(logits, labels_placeholder)

    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all()

    # Add the variable initializer Op.
    init = tf.global_variables_initializer()

    # Create a saver for writing training checkpoints.
    saver = tf.train.Saver()

    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # Instantiate a SummaryWriter to output summaries and the Graph.
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

    # And then after everything is built:

    # Run the Op to initialize the variables.
    sess.run(init)

    # Start the training loop.
    for step in xrange(FLAGS.max_steps):
      start_time = time.time()

      # Fill a feed dictionary with the actual set of images and labels
      # for this particular training step.
      feed_dict = fill_feed_dict(data_sets.train,
                                 images_placeholder,
                                 labels_placeholder)

      # Run one step of the model.  The return values are the activations
      # from the `train_op` (which is discarded) and the `loss` Op.  To
      # inspect the values of your Ops or variables, you may include them
      # in the list passed to sess.run() and the value tensors will be
      # returned in the tuple from the call.
      _, loss_value = sess.run([train_op, loss],
                               feed_dict=feed_dict)

      duration = time.time() - start_time

      # Write the summaries and print an overview fairly often.
      if step % 100 == 0:
        # Print status to stdout.
        print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
        # Update the events file.
        summary_str = sess.run(summary, feed_dict=feed_dict)
        summary_writer.add_summary(summary_str, step)
        summary_writer.flush()

      # Save a checkpoint and evaluate the model periodically.
      if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        checkpoint_file = os.path.join(FLAGS.log_dir, 'model.ckpt')
        saver.save(sess, checkpoint_file, global_step=step)
        # Evaluate against the training set.
        print('Training Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.train)
        # Evaluate against the validation set.
        print('Validation Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.validation)
        # Evaluate against the test set.
        print('Test Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.test)


def main(_):
  if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)
  tf.gfile.MakeDirs(FLAGS.log_dir)
  run_training()


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--learning_rate',
      type=float,
      default=0.01,
      help='Initial learning rate.'
  )
  parser.add_argument(
      '--max_steps',
      type=int,
      default=20000,
      help='Number of steps to run trainer.'
  )
  parser.add_argument(
      '--hidden1',
      type=int,
      default=128,
      help='Number of units in hidden layer 1.'
  )
  parser.add_argument(
      '--hidden2',
      type=int,
      default=32,
      help='Number of units in hidden layer 2.'
  )
  parser.add_argument(
      '--batch_size',
      type=int,
      default=100,
      help='Batch size.  Must divide evenly into the dataset sizes.'
  )
  parser.add_argument(
      '--input_data_dir',
      type=str,
      default='MNIST_data/',
      help='Directory to put the input data.'
  )
  parser.add_argument(
      '--log_dir',
      type=str,
      default='./mnist_logs',
      help='Directory to put the log data.'
  )
  parser.add_argument(
      '--fake_data',
      default=False,
      help='If true, uses fake data for unit testing.',
      action='store_true'
  )

  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)




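4. Convolutional neural networks: classifying the CIFAR-10 dataset (expanding pixel space into a higher-dimensional space through convolution, then feeding it to a CNN)

CIFAR-10 classification is a common benchmark problem in machine learning. The task is to classify a set of 32x32 RGB images across 10 categories:

airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck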

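0x1: Model architecture


1. Model inputs

The input part of the model is built by the inputs() and distorted_inputs() functions, which read images from the CIFAR-10 binary data files. Because every image is stored in a fixed number of bytes, tf.FixedLengthRecordReader can be used to read them.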





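2. Model prediction

The prediction part of the model is constructed by inference(), which adds the operations needed to compute the logits of the predictions. The model is organized as follows:

conv1             convolution and rectified linear activation.
pool1             max pooling.
norm1             local response normalization.
conv2             convolution and rectified linear activation.
norm2             local response normalization.
pool2             max pooling.
local3            fully connected layer with rectified linear activation.
local4            fully connected layer with rectified linear activation.
softmax_linear    linear transformation to produce logits.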

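0x2: Model training

The usual way to train a network for N-way classification is multinomial logistic regression, also known as softmax regression. Softmax regression applies a softmax nonlinearity to the output of the network and computes the cross-entropy between the normalized predictions and a 1-hot encoding of the label. For regularization, we also apply a weight decay loss to all learned variables (as with handwritten digit recognition, the essence of image recognition is that the regions matching a given shape carry higher weights, which is also how people recognize images, even distorted ones). The objective function of the model is the sum of the cross-entropy loss and all the weight decay terms, and this is the value returned by the loss() function.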
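The objective described above can be written out in a few lines. This plain-Python sketch adds a weight decay term, a factor times half the sum of squared weights (the quantity tf.nn.l2_loss computes), to a mean cross-entropy value. The function names, the example weights, and the decay factor 0.004 are assumptions chosen for illustration.

```python
def l2_loss(weights):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return sum(w * w for w in weights) / 2.0


def total_loss(cross_entropy_mean, weight_groups, wd):
    """Cross-entropy plus wd * l2_loss for each decayed weight group."""
    decay = sum(wd * l2_loss(group) for group in weight_groups)
    return cross_entropy_mean + decay


# Hypothetical numbers: two small weight groups and a decay factor of 0.004.
print(total_loss(1.25, [[0.5, -0.5], [1.0, 2.0]], wd=0.004))
```

Larger weights raise the total loss even when the predictions do not change, which nudges the optimizer toward smaller weights.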

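The train() function adds the operations needed to minimize the objective, including computing the gradients and updating the learned variables (GradientDescentOptimizer). It returns an operation that executes all the calculations needed to train and update the model for one batch of images.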

0x3: Launching and training the model





# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Builds the CIFAR-10 network.

Summary of available functions:

 # Compute input images and labels for training. If you would like to run
 # evaluations, use inputs() instead.
 inputs, labels = distorted_inputs()

 # Compute inference on the model inputs to make a prediction.
 predictions = inference(inputs)

 # Compute the total loss of the prediction with respect to the labels.
 loss = loss(predictions, labels)

 # Create a graph to run one step of training with respect to the loss.
 train_op = train(loss, global_step)
"""
# pylint: disable=missing-docstring
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip
import os
import re
import sys
import tarfile

from six.moves import urllib
import tensorflow as tf

import cifar10_input

FLAGS = tf.app.flags.FLAGS

# Basic model parameters.
tf.app.flags.DEFINE_integer('batch_size', 128,
                            """Number of images to process in a batch.""")
tf.app.flags.DEFINE_string('data_dir', './cifar10_data',
                           """Path to the CIFAR-10 data directory.""")

# Global constants describing the CIFAR-10 data set.
IMAGE_SIZE = cifar10_input.IMAGE_SIZE
NUM_CLASSES = cifar10_input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

# Constants describing the training process.
MOVING_AVERAGE_DECAY = 0.9999     # The decay to use for the moving average.
NUM_EPOCHS_PER_DECAY = 350.0      # Epochs after which learning rate decays.
LEARNING_RATE_DECAY_FACTOR = 0.1  # Learning rate decay factor.
INITIAL_LEARNING_RATE = 0.1       # Initial learning rate.

# If a model is trained with multiple GPU's prefix all Op names with tower_name
# to differentiate the operations. Note that this prefix is removed from the
# names of the summaries when visualizing a model.
TOWER_NAME = 'tower'

DATA_URL = 'http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'


def _activation_summary(x):
  """Helper to create summaries for activations.

  Creates a summary that provides a histogram of activations.
  Creates a summary that measure the sparsity of activations.

  Args:
    x: Tensor
  Returns:
    nothing
  """
  # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
  # session. This helps the clarity of presentation on tensorboard.
  tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name)
  tf.summary.histogram(tensor_name + '/activations', x)
  tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x))


def _variable_on_cpu(name, shape, initializer):
  """Helper to create a Variable stored on CPU memory.

  Args:
    name: name of the variable
    shape: list of ints
    initializer: initializer for Variable

  Returns:
    Variable Tensor
  """
  with tf.device('/cpu:0'):
    var = tf.get_variable(name, shape, initializer=initializer)
  return var


def _variable_with_weight_decay(name, shape, stddev, wd):
  """Helper to create an initialized Variable with weight decay.

  Note that the Variable is initialized with a truncated normal distribution.
  A weight decay is added only if one is specified.

  Args:
    name: name of the variable
    shape: list of ints
    stddev: standard deviation of a truncated Gaussian
    wd: add L2Loss weight decay multiplied by this float. If None, weight
        decay is not added for this Variable.

  Returns:
    Variable Tensor
  """
  var = _variable_on_cpu(name, shape,
                         tf.truncated_normal_initializer(stddev=stddev))
  if wd:
    weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
    tf.add_to_collection('losses', weight_decay)
  return var


def distorted_inputs():
  """Construct distorted input for CIFAR training using the Reader ops.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.

  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  return cifar10_input.distorted_inputs(data_dir=data_dir,
                                        batch_size=FLAGS.batch_size)


def inputs(eval_data):
  """Construct input for CIFAR evaluation using the Reader ops.

  Args:
    eval_data: bool, indicating if one should use the train or eval data set.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.

  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  return cifar10_input.inputs(eval_data=eval_data, data_dir=data_dir,
                              batch_size=FLAGS.batch_size)


def inference(images):
  """Build the CIFAR-10 model.

  Args:
    images: Images returned from distorted_inputs() or inputs().

  Returns:
    Logits.
  """
  # We instantiate all variables using tf.get_variable() instead of
  # tf.Variable() in order to share variables across multiple GPU training runs.
  # If we only ran this model on a single GPU, we could simplify this function
  # by replacing all instances of tf.get_variable() with tf.Variable().
  #
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                         stddev=1e-4, wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv1)

  # pool1
  pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='pool1')
  # norm1
  norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm1')

  # conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],
                                         stddev=1e-4, wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv2)

  # norm2
  norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm2')
  # pool2
  pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name='pool2')

  # local3
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    dim = 1
    for d in pool2.get_shape()[1:].as_list():
      dim *= d
    reshape = tf.reshape(pool2, [FLAGS.batch_size, dim])

    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)

  # local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)

  # softmax, i.e. softmax(WX + b)
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)

  return softmax_linear


def loss(logits, labels):
  """Add L2Loss to all the trainable variables.

  Add summary for "Loss" and "Loss/avg".

  Args:
    logits: Logits from inference().
    labels: Labels from distorted_inputs or inputs(). 1-D tensor
            of shape [batch_size]

  Returns:
    Loss tensor of type float.
  """
  # Calculate the average cross entropy loss across the batch.
  labels = tf.cast(labels, tf.int64)
  cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
      logits=logits, labels=labels, name='cross_entropy_per_example')
  cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
  tf.add_to_collection('losses', cross_entropy_mean)

  # The total loss is defined as the cross entropy loss plus all of the weight
  # decay terms (L2 loss).
  return tf.add_n(tf.get_collection('losses'), name='total_loss')


def _add_loss_summaries(total_loss):
  """Add summaries for losses in CIFAR-10 model.

  Generates moving average for all losses and associated summaries for
  visualizing the performance of the network.

  Args:
    total_loss: Total loss from loss().

  Returns:
    loss_averages_op: op for generating moving averages of losses.
  """
  # Compute the moving average of all individual losses and the total loss.
  loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
  losses = tf.get_collection('losses')
  loss_averages_op = loss_averages.apply(losses + [total_loss])

  # Attach a scalar summary to all individual losses and the total loss; do the
  # same for the averaged version of the losses.
  for l in losses + [total_loss]:
    # Name each loss as '(raw)' and name the moving average version of the loss
    # as the original loss name.
    tf.summary.scalar(l.op.name + ' (raw)', l)
    tf.summary.scalar(l.op.name, loss_averages.average(l))

  return loss_averages_op


def train(total_loss, global_step):
  """Train CIFAR-10 model.

  Create an optimizer and apply to all trainable variables. Add moving
  average for all trainable variables.

  Args:
    total_loss: Total loss from loss().
    global_step: Integer Variable counting the number of training steps
      processed.

  Returns:
    train_op: op for training.
  """
  # Variables that affect learning rate.
  num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
  decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)

  # Decay the learning rate exponentially based on the number of steps.
  lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                  global_step,
                                  decay_steps,
                                  LEARNING_RATE_DECAY_FACTOR,
                                  staircase=True)
  tf.summary.scalar('learning_rate', lr)

  # Generate moving averages of all losses and associated summaries.
  loss_averages_op = _add_loss_summaries(total_loss)

  # Compute gradients.
  with tf.control_dependencies([loss_averages_op]):
    opt = tf.train.GradientDescentOptimizer(lr)
    grads = opt.compute_gradients(total_loss)

  # Apply gradients.
  apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

  # Add histograms for trainable variables.
  for var in tf.trainable_variables():
    tf.summary.histogram(var.op.name, var)

  # Add histograms for gradients.
  for grad, var in grads:
    if grad is not None:
      tf.summary.histogram(var.op.name + '/gradients', grad)

  # Track the moving averages of all trainable variables.
  variable_averages = tf.train.ExponentialMovingAverage(
      MOVING_AVERAGE_DECAY, global_step)
  variables_averages_op = variable_averages.apply(tf.trainable_variables())

  with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
    train_op = tf.no_op(name='train')

  return train_op


def maybe_download_and_extract():
  """Download and extract the tarball from Alex's website."""
  dest_directory = FLAGS.data_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = DATA_URL.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):
    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
          float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()
    filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath,
                                             reporthook=_progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    tarfile.open(filepath, 'r:gz').extractall(dest_directory)


# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""A binary to train CIFAR-10 using a single GPU.

Accuracy:
cifar10_train.py achieves ~86% accuracy after 100K steps (256 epochs of
data) as judged by cifar10_eval.py.

Speed: With batch_size 128.

System        | Step Time (sec/batch)  |     Accuracy
------------------------------------------------------------------
1 Tesla K20m  | 0.35-0.60              | ~86% at 60K steps  (5 hours)
1 Tesla K40m  | 0.25-0.35              | ~86% at 100K steps (4 hours)

Usage:
Please see the tutorial and website for how to download the CIFAR-10
data set, compile the program and train the model.

http://tensorflow.org/tutorials/deep_cnn/
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import os.path
import time

import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

import cifar10

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('train_dir', './cifar10_train',
                           """Directory where to write event logs """
                           """and checkpoint.""")
tf.app.flags.DEFINE_integer('max_steps', 1000000,
                            """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")


def train():
  """Train CIFAR-10 for a number of steps."""
  with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)

    # Get images and labels for CIFAR-10.
    images, labels = cifar10.distorted_inputs()

    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)

    # Calculate loss.
    loss = cifar10.loss(logits, labels)

    # Build a Graph that trains the model with one batch of examples and
    # updates the model parameters.
    train_op = cifar10.train(loss, global_step)

    # Create a saver.
    saver = tf.train.Saver(tf.global_variables())

    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.summary.merge_all()

    # Build an initialization operation to run below.
    init = tf.global_variables_initializer()

    # Start running operations on the Graph.
    sess = tf.Session(config=tf.ConfigProto(
        log_device_placement=FLAGS.log_device_placement))
    sess.run(init)

    # Start the queue runners.
    tf.train.start_queue_runners(sess=sess)

    summary_writer = tf.summary.FileWriter(FLAGS.train_dir,
                                           graph=sess.graph)

    for step in xrange(FLAGS.max_steps):
      start_time = time.time()
      _, loss_value = sess.run([train_op, loss])
      duration = time.time() - start_time

      assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

      if step % 10 == 0:
        num_examples_per_step = FLAGS.batch_size
        examples_per_sec = num_examples_per_step / duration
        sec_per_batch = float(duration)

        format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                      'sec/batch)')
        print(format_str % (datetime.now(), step, loss_value,
                            examples_per_sec, sec_per_batch))

      if step % 100 == 0:
        summary_str = sess.run(summary_op)
        summary_writer.add_summary(summary_str, step)

      # Save the model checkpoint periodically.
      if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
        saver.save(sess, checkpoint_path, global_step=step)


def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
  tf.gfile.MakeDirs(FLAGS.train_dir)
  train()


if __name__ == '__main__':
  tf.app.run()


# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Routine for decoding the CIFAR-10 binary file format."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

# Process images of this size. Note that this differs from the original CIFAR
# image size of 32 x 32. If one alters this number, then the entire model
# architecture will change and any model would need to be retrained.
IMAGE_SIZE = 24

# Global constants describing the CIFAR-10 data set.
NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000


def read_cifar10(filename_queue):
  """Reads and parses examples from CIFAR10 data files.

  Recommendation: if you want N-way read parallelism, call this function
  N times.  This will give you N independent Readers reading different
  files & positions within those files, which will give better mixing of
  examples.

  Args:
    filename_queue: A queue of strings with the filenames to read from.

  Returns:
    An object representing a single example, with the following fields:
      height: number of rows in the result (32)
      width: number of columns in the result (32)
      depth: number of color channels in the result (3)
      key: a scalar string Tensor describing the filename & record number
        for this example.
      label: an int32 Tensor with the label in the range 0..9.
      uint8image: a [height, width, depth] uint8 Tensor with the image data
  """

  class CIFAR10Record(object):
    pass
  result = CIFAR10Record()

  # Dimensions of the images in the CIFAR-10 dataset.
  # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
  # input format.
  label_bytes = 1  # 2 for CIFAR-100
  result.height = 32
  result.width = 32
  result.depth = 3
  image_bytes = result.height * result.width * result.depth
  # Every record consists of a label followed by the image, with a
  # fixed number of bytes for each.
  record_bytes = label_bytes + image_bytes

  # Read a record, getting filenames from the filename_queue.  No
  # header or footer in the CIFAR-10 format, so we leave header_bytes
  # and footer_bytes at their default of 0.
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

  # Convert from a string to a vector of uint8 that is record_bytes long.
  record_bytes = tf.decode_raw(value, tf.uint8)

  # The first bytes represent the label, which we convert from uint8->int32.
  result.label = tf.cast(
      tf.slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                           [result.depth, result.height, result.width])
  # Convert from [depth, height, width] to [height, width, depth].
  result.uint8image = tf.transpose(depth_major, [1, 2, 0])

  return result


def _generate_image_and_label_batch(image, label, min_queue_examples,
                                    batch_size):
  """Construct a queued batch of images and labels.

  Args:
    image: 3-D Tensor of [height, width, 3] of type.float32.
    label: 1-D Tensor of type.int32
    min_queue_examples: int32, minimum number of samples to retain
      in the queue that provides of batches of examples.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, height, width, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  # Create a queue that shuffles the examples, and then
  # read 'batch_size' images + labels from the example queue.
  num_preprocess_threads = 16
  images, label_batch = tf.train.shuffle_batch(
      [image, label],
      batch_size=batch_size,
      num_threads=num_preprocess_threads,
      capacity=min_queue_examples + 3 * batch_size,
      min_after_dequeue=min_queue_examples)

  # Display the training images in the visualizer.
  tf.summary.image('images', images)

  return images, tf.reshape(label_batch, [batch_size])


def distorted_inputs(data_dir, batch_size):
  """Construct distorted input for CIFAR training using the Reader ops.

  Args:
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
               for i in xrange(1, 6)]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for training the network. Note the many random
  # distortions applied to the image.

  # Randomly crop a [height, width] section of the image.
  distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

  # Randomly flip the image horizontally.
  distorted_image = tf.image.random_flip_left_right(distorted_image)

  # Because these operations are not commutative, consider randomizing
  # the order of their operation.
  distorted_image = tf.image.random_brightness(distorted_image,
                                               max_delta=63)
  distorted_image = tf.image.random_contrast(distorted_image,
                                             lower=0.2, upper=1.8)

  # Subtract off the mean and divide by the variance of the pixels.
  float_image = tf.image.per_image_standardization(distorted_image)

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                           min_fraction_of_examples_in_queue)
  print('Filling queue with %d CIFAR images before starting to train. '
        'This will take a few minutes.' % min_queue_examples)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size)


def inputs(eval_data, data_dir, batch_size):
  """Construct input for CIFAR evaluation using the Reader ops.

  Args:
    eval_data: bool, indicating if one should use the train or eval data set.
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  if not eval_data:
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                 for i in xrange(1, 6)]
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
  else:
    filenames = [os.path.join(data_dir, 'test_batch.bin')]
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for evaluation.
  # Crop the central [height, width] of the image.
  resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
                                                         height, width)

  # Subtract off the mean and divide by the variance of the pixels.
  # (tf.image.per_image_whitening was renamed to per_image_standardization
  # in TensorFlow 1.0; distorted_inputs() above already uses the new name.)
  float_image = tf.image.per_image_standardization(resized_image)

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(num_examples_per_epoch *
                           min_fraction_of_examples_in_queue)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size)

0x4: Evaluating the model

We can now evaluate how well the trained model performs on a held-out data set. The script cifar10_eval.py rebuilds the model with inference() and tests it on all 10,000 images of the CIFAR-10 evaluation set. It calculates the precision at 1: how often the top prediction (the class with the highest score) matches the true label of the image.
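Precision at 1 can be sketched in plain Python as follows. This is an illustration rather than the tf.nn.in_top_k op used by cifar10_eval.py: the function names are invented, ties between scores are ignored, and the logits are toy values. The last lines repeat the batch arithmetic that eval_once() performs with the default flags (10,000 examples, batch size 128).

```python
import math


def in_top_1(logits, label):
    """True if the highest-scoring class is the true label (ties ignored)."""
    return logits.index(max(logits)) == label


def precision_at_1(batch_logits, batch_labels):
    """Fraction of examples whose top prediction matches the label."""
    hits = sum(in_top_1(l, y) for l, y in zip(batch_logits, batch_labels))
    return hits / len(batch_labels)


# Toy example: the third prediction is wrong, so precision @ 1 is 2/3.
toy_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
toy_labels = [1, 0, 0]
print(precision_at_1(toy_logits, toy_labels))

# Batch arithmetic as in eval_once(): the number of batches is rounded up.
num_examples, batch_size = 10000, 128
num_iter = int(math.ceil(num_examples / batch_size))
total_sample_count = num_iter * batch_size
print(num_iter, total_sample_count)
```

Because the number of batches is rounded up, the script divides by 79 * 128 = 10,112 samples rather than exactly 10,000, so a few evaluation images are counted more than once.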


# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Evaluation for CIFAR-10.

Accuracy:
cifar10_train.py achieves 83.0% accuracy after 100K steps (256 epochs
of data) as judged by cifar10_eval.py.

Speed:
On a single Tesla K40, cifar10_train.py processes a single batch of 128 images
in 0.25-0.35 sec (i.e. 350 - 600 images /sec). The model reaches ~86%
accuracy after 100K steps in 8 hours of training time.

Usage:
Please see the tutorial and website for how to download the CIFAR-10
data set, compile the program and train the model.

http://tensorflow.org/tutorials/deep_cnn/
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import math
import time

import tensorflow.python.platform
from tensorflow.python.platform import gfile
import numpy as np
import tensorflow as tf

import cifar10

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('eval_dir', './cifar10_eval',
                           """Directory where to write event logs.""")
tf.app.flags.DEFINE_string('eval_data', 'test',
                           """Either 'test' or 'train_eval'.""")
tf.app.flags.DEFINE_string('checkpoint_dir', './cifar10_train',
                           """Directory where to read model checkpoints.""")
tf.app.flags.DEFINE_integer('eval_interval_secs', 60 * 5,
                            """How often to run the eval.""")
tf.app.flags.DEFINE_integer('num_examples', 10000,
                            """Number of examples to run.""")
tf.app.flags.DEFINE_boolean('run_once', False,
                            """Whether to run eval only once.""")


def eval_once(saver, summary_writer, top_k_op, summary_op):
  """Run Eval once.

  Args:
    saver: Saver.
    summary_writer: Summary writer.
    top_k_op: Top K op.
    summary_op: Summary op.
  """
  with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
      # Restores from checkpoint
      saver.restore(sess, ckpt.model_checkpoint_path)
      # Assuming model_checkpoint_path looks something like:
      #   /my-favorite-path/cifar10_train/model.ckpt-0,
      # extract global_step from it.
      global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
    else:
      print('No checkpoint file found')
      return

    # Start the queue runners.
    coord = tf.train.Coordinator()
    try:
      threads = []
      for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
        threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
                                         start=True))

      num_iter = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size))
      true_count = 0  # Counts the number of correct predictions.
      total_sample_count = num_iter * FLAGS.batch_size
      step = 0
      while step < num_iter and not coord.should_stop():
        predictions = sess.run([top_k_op])
        true_count += np.sum(predictions)
        step += 1

      # Compute precision @ 1.
      precision = true_count / total_sample_count
      print('%s: precision @ 1 = %.3f' % (datetime.now(), precision))

      summary = tf.Summary()
      summary.ParseFromString(sess.run(summary_op))
      summary.value.add(tag='Precision @ 1', simple_value=precision)
      summary_writer.add_summary(summary, global_step)
    except Exception as e:  # pylint: disable=broad-except
      coord.request_stop(e)

    coord.request_stop()
    coord.join(threads, stop_grace_period_secs=10)


def evaluate():
  """Eval CIFAR-10 for a number of steps."""
  with tf.Graph().as_default():
    # Get images and labels for CIFAR-10.
    eval_data = FLAGS.eval_data == 'test'
    images, labels = cifar10.inputs(eval_data=eval_data)

    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)

    # Calculate predictions.
    top_k_op = tf.nn.in_top_k(logits, labels, 1)

    # Restore the moving average version of the learned variables for eval.
    variable_averages = tf.train.ExponentialMovingAverage(
        cifar10.MOVING_AVERAGE_DECAY)
    variables_to_restore = variable_averages.variables_to_restore()
    saver = tf.train.Saver(variables_to_restore)

    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.summary.merge_all()

    graph = tf.get_default_graph().as_graph_def()
    summary_writer = tf.summary.FileWriter(FLAGS.eval_dir,
                                           graph=graph)

    while True:
      eval_once(saver, summary_writer, top_k_op, summary_op)
      if FLAGS.run_once:
        break
      time.sleep(FLAGS.eval_interval_secs)


def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if gfile.Exists(FLAGS.eval_dir):
    gfile.DeleteRecursively(FLAGS.eval_dir)
  gfile.MakeDirs(FLAGS.eval_dir)
  evaluate()


if __name__ == '__main__':
  tf.app.run()

Google's TensorFlow API changed considerably with the 1.0 release, so older code has to be updated to the new API names when it is migrated to 1.0.
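As an illustration of the kind of renaming involved, the sketch below maps a few pre-1.0 names to their 1.0 equivalents and applies them to source text with a regular expression. The mapping only lists renames that are relevant to the CIFAR-10 scripts in this post and is far from complete; the helper rename_legacy_calls is hypothetical and is not the official upgrade tool. It also does not handle calls whose argument order changed (for example tf.concat, which takes the list of tensors first and the axis second in 1.0).

```python
import re

# Old (pre-1.0) name -> TensorFlow 1.0 name. Partial list, for illustration only.
RENAMES = {
    'tf.scalar_summary': 'tf.summary.scalar',
    'tf.histogram_summary': 'tf.summary.histogram',
    'tf.image_summary': 'tf.summary.image',
    'tf.merge_all_summaries': 'tf.summary.merge_all',
    'tf.train.SummaryWriter': 'tf.summary.FileWriter',
    'tf.initialize_all_variables': 'tf.global_variables_initializer',
    'tf.all_variables': 'tf.global_variables',
    'tf.mul': 'tf.multiply',
    'tf.image.per_image_whitening': 'tf.image.per_image_standardization',
}


def rename_legacy_calls(source):
    """Replace whole-name occurrences of the old API names in a source string."""
    for old, new in RENAMES.items():
        # The lookbehind and \b keep e.g. tf.mul from matching inside tf.multiply.
        pattern = r'(?<![\w.])' + re.escape(old) + r'\b'
        source = re.sub(pattern, new, source)
    return source


print(rename_legacy_calls('init = tf.initialize_all_variables()'))
print(rename_legacy_calls('loss = tf.mul(tf.nn.l2_loss(var), wd)'))
```

A text-level rewrite like this is only a starting point; the result still needs to be reviewed and run against the target TensorFlow version.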

0x5: Running on GPUs
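The multi-GPU script in this section runs one copy of the model (a "tower") per GPU and averages the gradients of each shared variable across towers before applying them. The plain-Python sketch below shows only that averaging step, with lists of floats standing in for gradient tensors and strings standing in for variables. These stand-ins are assumptions made for illustration.

```python
def average_gradients(tower_grads):
    """Average per-variable gradients across towers.

    tower_grads: one list per tower, each holding (gradient, variable) pairs
    in the same variable order.
    """
    averaged = []
    # zip(*tower_grads) groups the pairs that belong to the same variable.
    for grad_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grad_and_vars]
        # Element-wise mean over the tower dimension.
        mean_grad = [sum(values) / len(values) for values in zip(*grads)]
        # The variable is shared, so the first tower's reference is enough.
        variable = grad_and_vars[0][1]
        averaged.append((mean_grad, variable))
    return averaged


# Two towers, two shared variables 'w' and 'b'.
tower_grads = [
    [([1.0, 2.0], 'w'), ([0.5], 'b')],
    [([3.0, 4.0], 'w'), ([1.5], 'b')],
]
result = average_gradients(tower_grads)
print(result)
```

Averaging acts as a synchronization point: no update is applied until every tower has produced its gradients for the current step.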

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""A binary to train CIFAR-10 using multiple GPU's with synchronous updates.

Accuracy:
cifar10_multi_gpu_train.py achieves ~86% accuracy after 100K steps (256
epochs of data) as judged by cifar10_eval.py.

Speed: With batch_size 128.

System        | Step Time (sec/batch)  |     Accuracy
--------------------------------------------------------------------
1 Tesla K20m  | 0.35-0.60              | ~86% at 60K steps  (5 hours)
1 Tesla K40m  | 0.25-0.35              | ~86% at 100K steps (4 hours)
2 Tesla K20m  | 0.13-0.20              | ~84% at 30K steps  (2.5 hours)
3 Tesla K20m  | 0.13-0.18              | ~84% at 30K steps
4 Tesla K20m  | ~0.10                  | ~84% at 30K steps

Usage:
Please see the tutorial and website for how to download the CIFAR-10
data set, compile the program and train the model.

http://tensorflow.org/tutorials/deep_cnn/
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import os.path
import re
import time

import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf
import cifar10

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('train_dir', './cifar10_train',
                           """Directory where to write event logs """
                           """and checkpoint.""")
tf.app.flags.DEFINE_integer('max_steps', 1000000,
                            """Number of batches to run.""")
tf.app.flags.DEFINE_integer('num_gpus', 1,
                            """How many GPUs to use.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")


def tower_loss(scope):
  """Calculate the total loss on a single tower running the CIFAR model.

  Args:
    scope: unique prefix string identifying the CIFAR tower, e.g. 'tower_0'

  Returns:
     Tensor of shape [] containing the total loss for a batch of data
  """
  # Get images and labels for CIFAR-10.
  images, labels = cifar10.distorted_inputs()

  # Build inference Graph.
  logits = cifar10.inference(images)

  # Build the portion of the Graph calculating the losses. Note that we will
  # assemble the total_loss using a custom function below.
  _ = cifar10.loss(logits, labels)

  # Assemble all of the losses for the current tower only.
  losses = tf.get_collection('losses', scope)

  # Calculate the total loss for the current tower.
  total_loss = tf.add_n(losses, name='total_loss')

  # Compute the moving average of all individual losses and the total loss.
  loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
  loss_averages_op = loss_averages.apply(losses + [total_loss])

  # Attach a scalar summary to all individual losses and the total loss; do the
  # same for the averaged version of the losses.
  for l in losses + [total_loss]:
    # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
    # session. This helps the clarity of presentation on tensorboard.
    loss_name = re.sub('%s_[0-9]*/' % cifar10.TOWER_NAME, '', l.op.name)
    # Name each loss as '(raw)' and name the moving average version of the loss
    # as the original loss name.
    tf.summary.scalar(loss_name + ' (raw)', l)
    tf.summary.scalar(loss_name, loss_averages.average(l))

  with tf.control_dependencies([loss_averages_op]):
    total_loss = tf.identity(total_loss)
  return total_loss


def average_gradients(tower_grads):
  """Calculate the average gradient for each shared variable across all towers.

  Note that this function provides a synchronization point across all towers.

  Args:
    tower_grads: List of lists of (gradient, variable) tuples. The outer list
      is over individual gradients. The inner list is over the gradient
      calculation for each tower.

  Returns:
     List of pairs of (gradient, variable) where the gradient has been averaged
     across all towers.
  """
  average_grads = []
  for grad_and_vars in zip(*tower_grads):
    # Note that each grad_and_vars looks like the following:
    #   ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
    grads = []
    for g, _ in grad_and_vars:
      # Add 0 dimension to the gradients to represent the tower.
      expanded_g = tf.expand_dims(g, 0)

      # Append on a 'tower' dimension which we will average over below.
      grads.append(expanded_g)

    # Average over the 'tower' dimension.
    # TensorFlow 1.0 takes the values first and the axis second; the pre-1.0
    # call tf.concat(0, grads) fails under 1.0.
    grad = tf.concat(axis=0, values=grads)
    grad = tf.reduce_mean(grad, 0)

    # Keep in mind that the Variables are redundant because they are shared
    # across towers. So .. we will just return the first tower's pointer to
    # the Variable.
    v = grad_and_vars[0][1]
    grad_and_var = (grad, v)
    average_grads.append(grad_and_var)
  return average_grads


def train():
  """Train CIFAR-10 for a number of steps."""
  with tf.Graph().as_default(), tf.device('/cpu:0'):
    # Create a variable to count the number of train() calls. This equals the
    # number of batches processed * FLAGS.num_gpus.
    global_step = tf.get_variable(
        'global_step', [],
        initializer=tf.constant_initializer(0), trainable=False)

    # Calculate the learning rate schedule.
    num_batches_per_epoch = (cifar10.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
                             FLAGS.batch_size)
    decay_steps = int(num_batches_per_epoch * cifar10.NUM_EPOCHS_PER_DECAY)

    # Decay the learning rate exponentially based on the number of steps.
    lr = tf.train.exponential_decay(cifar10.INITIAL_LEARNING_RATE,
                                    global_step,
                                    decay_steps,
                                    cifar10.LEARNING_RATE_DECAY_FACTOR,
                                    staircase=True)

    # Create an optimizer that performs gradient descent.
    opt = tf.train.GradientDescentOptimizer(lr)

    # Calculate the gradients for each model tower.
    tower_grads = []
    for i in xrange(FLAGS.num_gpus):
      with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
          # Calculate the loss for one tower of the CIFAR model. This function
          # constructs the entire CIFAR model but shares the variables across
          # all towers.
          loss = tower_loss(scope)

          # Reuse variables for the next tower.
          tf.get_variable_scope().reuse_variables()

          # Retain the summaries from the final tower.
          summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

          # Calculate the gradients for the batch of data on this CIFAR tower.
          grads = opt.compute_gradients(loss)

          # Keep track of the gradients across all towers.
          tower_grads.append(grads)

    # We must calculate the mean of each gradient. Note that this is the
    # synchronization point across all towers.
    grads = average_gradients(tower_grads)

    # Add a summary to track the learning rate.
    summaries.append(tf.summary.scalar('learning_rate', lr))

    # Add histograms for gradients.
    for grad, var in grads:
      if grad is not None:
        summaries.append(
            tf.summary.histogram(var.op.name + '/gradients', grad))

    # Apply the gradients to adjust the shared variables.
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

    # Add histograms for trainable variables.
    for var in tf.trainable_variables():
      summaries.append(tf.summary.histogram(var.op.name, var))

    # Track the moving averages of all trainable variables.
    variable_averages = tf.train.ExponentialMovingAverage(
        cifar10.MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

    # Group all updates to into a single train op.
    train_op = tf.group(apply_gradient_op, variables_averages_op)

    # Create a saver.
    saver = tf.train.Saver(tf.global_variables())

    # Build the summary operation from the last tower summaries.
    summary_op = tf.summary.merge(summaries)

    # Build an initialization operation to run below.
    init = tf.global_variables_initializer()

    # Start running operations on the Graph. allow_soft_placement must be set to
    # True to build towers on GPU, as some of the ops do not have GPU
    # implementations.
    sess = tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=FLAGS.log_device_placement))
    sess.run(init)

    # Start the queue runners.
    tf.train.start_queue_runners(sess=sess)

    summary_writer = tf.summary.FileWriter(FLAGS.train_dir,
                                           graph=sess.graph)

    for step in xrange(FLAGS.max_steps):
      start_time = time.time()
      _, loss_value = sess.run([train_op, loss])
      duration = time.time() - start_time

      assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

      if step % 10 == 0:
        num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
        examples_per_sec = num_examples_per_step / duration
        sec_per_batch = duration / FLAGS.num_gpus

        format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                      'sec/batch)')
        print(format_str % (datetime.now(), step, loss_value,
                            examples_per_sec, sec_per_batch))

      if step % 100 == 0:
        summary_str = sess.run(summary_op)
        summary_writer.add_summary(summary_str, step)

      # Save the model checkpoint periodically.
      if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
        saver.save(sess, checkpoint_path, global_step=step)


def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
  tf.gfile.MakeDirs(FLAGS.train_dir)
  train()


if __name__ == '__main__':
  tf.app.run()
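
The synchronization point in the listing is average_gradients: every tower computes gradients for the same shared variables, and the per-variable gradients are averaged before a single update is applied. The sketch below mirrors that logic in plain Python so it can be run without TensorFlow or a GPU. Gradients are represented as flat lists of floats and variables as name strings, which is an illustrative simplification rather than how TensorFlow stores them.

```python
def average_gradients(tower_grads):
    """Average per-variable gradients across towers.

    tower_grads: one list per tower of (gradient, variable_name) pairs,
    where each gradient is a flat list of floats (a stand-in for a tensor).
    """
    averaged = []
    # zip(*tower_grads) groups the pairs for the same variable across towers:
    #   ((grad0_gpu0, var0), ..., (grad0_gpuN, var0))
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        n = len(grads)
        # Element-wise mean over the tower dimension.
        mean = [sum(vals) / n for vals in zip(*grads)]
        # Variables are shared, so the first tower's name is enough.
        averaged.append((mean, grads_and_vars[0][1]))
    return averaged


tower0 = [([1.0, 2.0], "w"), ([0.5], "b")]
tower1 = [([3.0, 4.0], "w"), ([1.5], "b")]
result = average_gradients([tower0, tower1])
print(result)
```

Because every tower has to finish before the mean can be taken, the slowest GPU sets the pace of each step. That is the cost of the synchronous update scheme.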

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda-8.0/bin:$PATH
screen python cifar10_multi_gpu_train.py --num_gpus=1
python cifar10_eval.py

Relevant Link:



5. Vector Representations of Words

0x1: Word Embeddings

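Image and audio processing systems usually work with rich, high-dimensional datasets encoded as vectors: the raw pixel intensities (per channel) of an image, or the power spectral density coefficients of an audio signal. For tasks such as object or speech recognition, all the information needed is already encoded in the raw data (humans, after all, perform these tasks from the raw data alone).
Natural language processing systems, by contrast, traditionally treat words as discrete atomic symbols: "cat" may be represented as Id537 and "dog" as Id143. These encodings are arbitrary and carry no information about the relationships that may exist between words. In other words, when processing data about "dogs" the model cannot use what it has already learned about "cats" (for example, that both are animals, have four legs, and are kept as pets). Representing words as unique, discrete IDs also leads to data sparsity, which means more data is needed to train statistical models successfully. Vector representations of words overcome these obstacles.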
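
A small sketch of the difference, using made-up numbers: with one-hot IDs every pair of distinct words is equally unrelated (cosine similarity 0), whereas dense vectors can place related words close together. The 2-dimensional `dense` vectors below are hand-picked for illustration, not learned from data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocab = ["cat", "dog", "car"]

# Discrete IDs written as one-hot vectors: one dimension per word.
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

# Hypothetical dense embeddings (hand-picked values for illustration only).
dense = {"cat": [0.9, 0.8], "dog": [0.8, 0.9], "car": [-0.7, 0.1]}

print("one-hot cat/dog:", cosine(one_hot["cat"], one_hot["dog"]))
print("dense   cat/dog:", cosine(dense["cat"], dense["dog"]))
print("dense   cat/car:", cosine(dense["cat"], dense["car"]))
```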

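Vector space models (VSMs) represent (embed) words in a continuous vector space in which semantically similar words are mapped to nearby points. VSMs have a long and rich history in NLP, but nearly all methods based on them rely on the distributional hypothesis, which states that words appearing in the same contexts share similar meanings. Approaches that build on this hypothesis fall into two categories: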

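Count-based methods (e.g. Latent Semantic Analysis): compute how often a word co-occurs with its neighboring words in a large corpus, along with related statistics, and then map these statistics down to a small, dense vector for each word.
Predictive methods (e.g. neural probabilistic language models): try to predict a word directly from its neighbors, using small, dense embedding vectors that are learned as parameters of the model.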
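
As a rough sketch of the first step of a count-based method, the function below counts how often each ordered pair of words occurs within a fixed window. The toy corpus and the name `cooccurrence_counts` are made up for illustration. The second step of a method such as LSA, compressing the count matrix into dense vectors (for example with an SVD), is deliberately left out.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=1):
    """Count (word, neighbor) pairs for neighbors within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts


tokens = "the cat sat on the mat the dog sat on the rug".split()
counts = cooccurrence_counts(tokens)
print(counts[("sat", "on")])
print(counts[("the", "cat")])
```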


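Continuous Bag-of-Words (CBOW): algorithmically the two models are very similar. The difference is that CBOW predicts a target word (e.g. 'mat') from its source context words ('the cat sits on the').
Skip-Gram: does the inverse and predicts the source context words from the target word.
The motivation for Skip-Gram inverting CBOW is statistical:
    1) CBOW smooths over a lot of the distributional information (for example, by treating an entire context as a single observation). For smaller datasets this usually turns out to be helpful.
    2) Skip-Gram, by contrast, treats each context-target pair as a new observation, which tends to work better on larger datasets.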

0x2: Scaling up with Noise-Contrastive Training

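Neural probabilistic language models are traditionally trained with the maximum likelihood (ML) principle: a softmax function is used to maximize the probability of the next word w_t (for "target") given the previous words h (for "history"):

$$P(w_t \mid h) = \mathrm{softmax}(\mathrm{score}(w_t, h)) = \frac{\exp\{\mathrm{score}(w_t, h)\}}{\sum_{w' \in V} \exp\{\mathrm{score}(w', h)\}}$$

where score(w_t, h) computes the compatibility of word w_t with the context h (a dot product is commonly used). We train the model by maximizing its log-likelihood on the training set, i.e. by maximizing:

$$J_\mathrm{ML} = \log P(w_t \mid h) = \mathrm{score}(w_t, h) - \log\Big(\sum_{w' \in V} \exp\{\mathrm{score}(w', h)\}\Big)$$

This yields a properly normalized probabilistic model for language modeling. It is very expensive in practice, however, because at every training step we have to compute and normalize the score of every other word w' in the vocabulary V for the current context h.

For feature learning in word2vec, on the other hand, we do not need a full probabilistic model. To avoid the normalization cost, the CBOW and Skip-Gram models are instead trained with a binary classification objective (logistic regression) that discriminates the real target word w_t from k imaginary (noise) words \tilde{w} in the same context. We describe this for CBOW below. For Skip-Gram the direction is simply inverted.

Mathematically, the objective (for each example) is to maximize:

$$J_\mathrm{NEG} = \log Q_\theta(D=1 \mid w_t, h) + k \mathop{\mathbb{E}}_{\tilde{w} \sim P_\mathrm{noise}}\left[\log Q_\theta(D=0 \mid \tilde{w}, h)\right]$$

where Q_\theta(D=1 | w, h) is the binary logistic regression probability, under the model, of seeing the word w in the context h in the dataset D, computed from the learned embedding vectors \theta. In practice we approximate the expectation by drawing k contrastive words from the noise distribution (i.e. we compute a Monte Carlo average).
The objective is maximized when the model assigns high probabilities to the real words and low probabilities to the noise words. Technically this is called negative sampling, and there is good mathematical motivation for using this loss function: the updates it proposes approximate the updates of the softmax function in the limit. Computationally it is especially appealing, because computing the loss now scales only with the number k of noise words we select, not with all the words in the vocabulary V. This makes training much faster. We will actually use the very similar noise-contrastive estimation (NCE) loss, for which TensorFlow provides a convenient helper function, tf.nn.nce_loss().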
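
To make the cost difference concrete, here is a plain-Python sketch of both objectives for a single (context, target) example. All vectors are hypothetical toy values, score is taken to be a dot product, and the expectation in the negative-sampling objective is replaced by a sum over the drawn noise words. Note that softmax_log_likelihood has to touch every word in the vocabulary, while negative_sampling_objective only touches the target and the k noise words.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax_log_likelihood(h, target, out_vecs):
    """J_ML: requires a score for every word in the vocabulary."""
    scores = {w: dot(h, v) for w, v in out_vecs.items()}
    log_norm = math.log(sum(math.exp(s) for s in scores.values()))
    return scores[target] - log_norm


def negative_sampling_objective(h, target, noise_words, out_vecs):
    """J_NEG, with the expectation replaced by a sum over the noise words."""
    j = math.log(sigmoid(dot(h, out_vecs[target])))
    for w in noise_words:
        # Q(D=0 | w, h) = 1 - sigmoid(score) = sigmoid(-score)
        j += math.log(sigmoid(-dot(h, out_vecs[w])))
    return j


# Hypothetical context vector and output vectors (toy values).
h = [1.0, 0.5]
out_vecs = {
    "mat": [2.0, 1.0],
    "sheep": [-1.0, 0.0],
    "car": [0.0, -1.0],
    "sky": [-0.5, -0.5],
}

j_ml = softmax_log_likelihood(h, "mat", out_vecs)
j_neg = negative_sampling_objective(h, "mat", ["sheep"], out_vecs)
print(j_ml, j_neg)
```

With a vocabulary of tens of thousands of words, the loop inside softmax_log_likelihood is what dominates training time. The negative-sampling version does k + 1 dot products regardless of vocabulary size.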

0x3: The Skip-gram Model


the quick brown fox jumped over the lazy dog

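We first build a dataset of words and the contexts in which they appear. "Context" can be defined in any reasonable way. People have used syntactic contexts (e.g. the syntactic dependents of the current target word), the words to the left of the target, the words to the right of the target, and so on. Here we take the words immediately to the left and right of the target word as its context. With a window size of 1, this gives a dataset of (context, target) pairs: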

([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ...

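As noted above, the Skip-Gram model inverts targets and contexts. In this example that means predicting 'the' and 'brown' from 'quick', and 'quick' and 'fox' from 'brown'. The dataset therefore becomes a set of (input, output) pairs: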

(quick, the), (quick, brown), (brown, quick), (brown, fox), ...
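
The pairs above can be generated mechanically. The sketch below is one minimal way to do it; the function name `skipgram_pairs` is made up for this note. Unlike the hand-written list, it also emits pairs for words at the edges of the sentence, which have only one neighbor, so the first pair it returns is (the, quick).

```python
def skipgram_pairs(tokens, window=1):
    """Return (input, output) pairs: each word paired with every neighbor
    that lies within `window` positions of it."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


sentence = "the quick brown fox jumped over the lazy dog".split()
pairs = skipgram_pairs(sentence)
print(pairs[1:5])
```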

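The objective function is defined over the entire dataset, but we typically optimize it with stochastic gradient descent (SGD), processing one example at a time (or a small minibatch of batch_size examples, typically with 16 <= batch_size <= 512). Let us look at one step of this process.
Suppose that at training step t we observe the first training case above, where the goal is to predict the from quick. We draw num_noise noisy (contrastive) examples from a noise distribution, typically the unigram distribution P(w). For simplicity, let num_noise=1 and suppose we pick sheep as the noise word. We can then compute the loss for this pair of observed and noisy examples. The objective at step t becomes:

$$J^{(t)}_\mathrm{NEG} = \log Q_\theta(D=1 \mid \text{the}, \text{quick}) + \log\big(Q_\theta(D=0 \mid \text{sheep}, \text{quick})\big)$$
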
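The goal is to update the embedding parameters θ so as to improve this objective (in this example, to maximize it), i.e. so that the model assigns the highest probability to the real word and the lowest to the noise word. To do this we compute the gradient of the loss with respect to the embedding parameters θ and take a small step in that direction. As this is repeated over the entire dataset, the effect is to keep moving each word's embedding vector until the model discriminates real words from noise words well.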
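
The single training step just described (input quick, real word the, noise word sheep) can be worked through by hand. The sketch below uses hypothetical 2-dimensional starting vectors and a learning rate of 0.5, both chosen only for illustration. It applies one gradient-ascent update and prints the objective before and after.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def objective(v_in, u_true, u_noise):
    """log Q(D=1 | true, input) + log Q(D=0 | noise, input)."""
    return (math.log(sigmoid(dot(u_true, v_in))) +
            math.log(sigmoid(-dot(u_noise, v_in))))


def ascent_step(v_in, u_true, u_noise, lr=0.5):
    """One gradient-ascent update of all three vectors."""
    g_true = 1.0 - sigmoid(dot(u_true, v_in))   # d/ds log sigmoid(s)
    g_noise = -sigmoid(dot(u_noise, v_in))      # d/ds log sigmoid(-s)
    new_v = [v + lr * (g_true * ut + g_noise * un)
             for v, ut, un in zip(v_in, u_true, u_noise)]
    new_true = [ut + lr * g_true * v for ut, v in zip(u_true, v_in)]
    new_noise = [un + lr * g_noise * v for un, v in zip(u_noise, v_in)]
    return new_v, new_true, new_noise


# Hypothetical starting vectors.
quick = [0.1, -0.2]   # input embedding of 'quick'
the = [0.3, 0.1]      # output vector of the real word 'the'
sheep = [0.2, 0.4]    # output vector of the noise word 'sheep'

before = objective(quick, the, sheep)
quick, the, sheep = ascent_step(quick, the, sheep)
after = objective(quick, the, sheep)
print(before, after)
```

The update pulls quick and the toward each other and pushes quick and sheep apart, which is exactly the "moving the vectors around" effect described above.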
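We can visualize the learned vectors by projecting them down to 2 dimensions, for instance with the t-SNE dimensionality reduction technique. Inspecting these visualizations makes it clear that the vectors capture semantic information about words and the relationships between them, which is very useful in practice. It was a striking finding that certain directions in the induced vector space correspond to specific semantic relationships between words, such as male-female (gender) and even country-capital.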


0x4: Building the Graph


embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

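The noise-contrastive estimation loss is defined in terms of a logistic regression model. For this we need a weight and a bias for each word in the vocabulary (also called the output weights, as opposed to the input embeddings). They are defined as follows: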

import math

nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))


# Placeholders for the inputs.
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])


embed = tf.nn.embedding_lookup(embeddings, train_inputs)
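
Conceptually, the lookup above just selects rows of the embedding matrix: for a batch of word IDs it returns one embedding vector per ID, giving a result of shape [batch_size, embedding_size]. The pure-Python stand-in below is only meant to show that behavior. It is not how TensorFlow implements the op, and the toy matrix values are made up.

```python
def embedding_lookup(embeddings, ids):
    """Return the rows of `embeddings` selected by `ids`, in order."""
    return [embeddings[i] for i in ids]


# A toy embedding matrix: vocabulary_size = 4, embedding_size = 2.
embeddings = [
    [0.0, 0.1],
    [1.0, 1.1],
    [2.0, 2.1],
    [3.0, 3.1],
]

# A batch of three word IDs. The same ID may appear more than once.
batch = embedding_lookup(embeddings, [3, 0, 3])
print(batch)
```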


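# Compute the NCE loss, drawing a new sample of negative labels on each
# evaluation. Keyword arguments are used because TensorFlow 1.0 expects
# labels before inputs.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))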


# Use the SGD optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)

0x5: Training the Model

Training is straightforward: in a loop, keep feeding data into the placeholders through a feed_dict and call session.run.

for inputs, labels in generate_batch(...):
  # The keys must be the placeholders defined earlier (train_inputs, train_labels).
  feed_dict = {train_inputs: inputs, train_labels: labels}
  _, cur_loss = session.run([optimizer, loss], feed_dict=feed_dict)

0x6: Visualizing the Learned Embeddings

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import math
import os
import random
import zipfile

import numpy as np
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

# Step 1: Download the data.
url = 'http://mattmahoney.net/dc/'


def maybe_download(filename, expected_bytes):
  """Download a file if not present, and make sure it's the right size."""
  if not os.path.exists(filename):
    filename, _ = urllib.request.urlretrieve(url + filename, filename)
  statinfo = os.stat(filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', filename)
  else:
    print(statinfo.st_size)
    raise Exception(
        'Failed to verify ' + filename + '. Can you get to it with a browser?')
  return filename

filename = maybe_download('text8.zip', 31344016)


# Read the data into a list of strings.
def read_data(filename):
  """Extract the first file enclosed in a zip file as a list of words"""
  with zipfile.ZipFile(filename) as f:
    data = tf.compat.as_str(f.read(f.namelist()[0])).split()
  return data

words = read_data(filename)
print('Data size', len(words))

# Step 2: Build the dictionary and replace rare words with UNK token.
vocabulary_size = 50000


def build_dataset(words):
  count = [['UNK', -1]]
  count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
  dictionary = dict()
  for word, _ in count:
    dictionary[word] = len(dictionary)
  data = list()
  unk_count = 0
  for word in words:
    if word in dictionary:
      index = dictionary[word]
    else:
      index = 0  # dictionary['UNK']
      unk_count += 1
    data.append(index)
  count[0][1] = unk_count
  reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
  return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
del words  # Hint to reduce memory.
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10], [reverse_dictionary[i] for i in data[:10]])

data_index = 0


# Step 3: Function to generate a training batch for the skip-gram model.
def generate_batch(batch_size, num_skips, skip_window):
  global data_index
  assert batch_size % num_skips == 0
  assert num_skips <= 2 * skip_window
  batch = np.ndarray(shape=(batch_size), dtype=np.int32)
  labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
  span = 2 * skip_window + 1  # [ skip_window target skip_window ]
  buffer = collections.deque(maxlen=span)
  for _ in range(span):
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  for i in range(batch_size // num_skips):
    target = skip_window  # target label at the center of the buffer
    targets_to_avoid = [skip_window]
    for j in range(num_skips):
      while target in targets_to_avoid:
        target = random.randint(0, span - 1)
      targets_to_avoid.append(target)
      batch[i * num_skips + j] = buffer[skip_window]
      labels[i * num_skips + j, 0] = buffer[target]
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  # Backtrack a little bit to avoid skipping words in the end of a batch
  data_index = (data_index + len(data) - span) % len(data)
  return batch, labels

batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
for i in range(8):
  print(batch[i], reverse_dictionary[batch[i]],
        '->', labels[i, 0], reverse_dictionary[labels[i, 0]])

# Step 4: Build and train a skip-gram model.

batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
skip_window = 1       # How many words to consider left and right.
num_skips = 2         # How many times to reuse an input to generate a label.

# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16     # Random set of words to evaluate similarity on.
valid_window = 100  # Only pick dev samples in the head of the distribution.
valid_examples = np.random.choice(valid_window, valid_size, replace=False)
num_sampled = 64    # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default():

  # Input data.
  train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
  train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
  valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

  # Ops and variables pinned to the CPU because of missing GPU implementation
  with tf.device('/cpu:0'):
    # Look up embeddings for inputs.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    # Construct the variables for the NCE loss
    nce_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

  # Compute the average NCE loss for the batch.
  # tf.nce_loss automatically draws a new sample of the negative labels each
  # time we evaluate the loss.
  loss = tf.reduce_mean(
      tf.nn.nce_loss(weights=nce_weights,
                     biases=nce_biases,
                     labels=train_labels,
                     inputs=embed,
                     num_sampled=num_sampled,
                     num_classes=vocabulary_size))

  # Construct the SGD optimizer using a learning rate of 1.0.
  optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

  # Compute the cosine similarity between minibatch examples and all embeddings.
  norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
  normalized_embeddings = embeddings / norm
  valid_embeddings = tf.nn.embedding_lookup(
      normalized_embeddings, valid_dataset)
  similarity = tf.matmul(
      valid_embeddings, normalized_embeddings, transpose_b=True)

  # Add variable initializer.
  init = tf.global_variables_initializer()

# Step 5: Begin training.
num_steps = 100001

with tf.Session(graph=graph) as session:
  # We must initialize all variables before we use them.
  init.run()
  print("Initialized")

  average_loss = 0
  for step in xrange(num_steps):
    batch_inputs, batch_labels = generate_batch(
        batch_size, num_skips, skip_window)
    feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

    # We perform one update step by evaluating the optimizer op (including it
    # in the list of returned values for session.run()
    _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += loss_val

    if step % 2000 == 0:
      if step > 0:
        average_loss /= 2000
      # The average loss is an estimate of the loss over the last 2000 batches.
      print("Average loss at step ", step, ": ", average_loss)
      average_loss = 0

    # Note that this is expensive (~20% slowdown if computed every 500 steps)
    if step % 10000 == 0:
      sim = similarity.eval()
      for i in xrange(valid_size):
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8  # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k + 1]
        log_str = "Nearest to %s:" % valid_word
        for k in xrange(top_k):
          close_word = reverse_dictionary[nearest[k]]
          log_str = "%s %s," % (log_str, close_word)
        print(log_str)
  final_embeddings = normalized_embeddings.eval()

# Step 6: Visualize the embeddings.


def plot_with_labels(low_dim_embs, labels, filename='tsne.png'):
  assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
  plt.figure(figsize=(18, 18))  # in inches
  for i, label in enumerate(labels):
    x, y = low_dim_embs[i, :]
    plt.scatter(x, y)
    plt.annotate(label,
                 xy=(x, y),
                 xytext=(5, 2),
                 textcoords='offset points',
                 ha='right',
                 va='bottom')

  plt.savefig(filename)

try:
  from sklearn.manifold import TSNE
  import matplotlib.pyplot as plt

  tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
  plot_only = 500
  low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only, :])
  labels = [reverse_dictionary[i] for i in xrange(plot_only)]
  plot_with_labels(low_dim_embs, labels)

except ImportError:
  print("Please install sklearn, matplotlib, and scipy to visualize embeddings.")

0x7: Evaluating Embeddings: Analogical Reasoning

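Word embeddings are very useful and widely used in NLP prediction tasks. To check whether a model has learned to distinguish parts of speech or proper nouns well, the simplest approach is to test directly how well it predicts syntactic and semantic relationships, for example by asking it to solve questions of the form king is to queen as father is to ?. This is called analogical reasoning.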
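
A common way to answer such a question with embeddings is vector arithmetic: normalize the vectors, form father + (queen - king), and return the nearest remaining word by cosine similarity. The sketch below does this in plain Python. The 2-dimensional embeddings are hand-picked toy values so that the example works, and the function name `analogy` is made up for this note.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def analogy(a, b, c, emb):
    """Return the word d such that a is to b as c is to d.

    The candidate set excludes a, b and c themselves.
    """
    unit = {w: normalize(v) for w, v in emb.items()}
    # Expected position of d: c + (b - a).
    target = [cv + bv - av for av, bv, cv in zip(unit[a], unit[b], unit[c])]
    best, best_sim = None, -float("inf")
    for w, v in unit.items():
        if w in (a, b, c):
            continue
        sim = sum(x * y for x, y in zip(target, v))
        if sim > best_sim:
            best, best_sim = w, sim
    return best


# Hypothetical embeddings: the second coordinate loosely encodes gender.
emb = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "father": [0.5, 0.6],
    "mother": [0.5, -0.6],
    "apple": [-0.9, 0.1],
}

answer = analogy("king", "queen", "father", emb)
print(answer)
```

Excluding the three question words from the candidates matters in practice, because the input word c is often the nearest neighbor of the target point.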

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Multi-threaded word2vec mini-batched skip-gram model.

Trains the model described in:
(Mikolov, et. al.) Efficient Estimation of Word Representations in Vector Space
ICLR 2013.
http://arxiv.org/abs/1301.3781
This model does traditional minibatching.

The key ops used are:
* placeholder for feeding in tensors for each example.
* embedding_lookup for fetching rows from the embedding matrix.
* sigmoid_cross_entropy_with_logits to calculate the loss.
* GradientDescentOptimizer for optimizing the loss.
* skipgram custom op that does input processing.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import threading
import time

import tensorflow.python.platform
from six.moves import xrange  # pylint: disable=redefined-builtin

import numpy as np
import tensorflow as tf

from tensorflow.models.embedding import gen_word2vec as word2vec

flags = tf.app.flags

flags.DEFINE_string("save_path", None, "Directory to write the model and "
                    "training summaries.")
flags.DEFINE_string("train_data", None, "Training text file. "
                    "E.g., unzipped file http://mattmahoney.net/dc/text8.zip.")
flags.DEFINE_string(
    "eval_data", None, "File consisting of analogies of four tokens."
    "embedding 2 - embedding 1 + embedding 3 should be close "
    "to embedding 4."
    "E.g. https://word2vec.googlecode.com/svn/trunk/questions-words.txt.")
flags.DEFINE_integer("embedding_size", 200, "The embedding dimension size.")
flags.DEFINE_integer(
    "epochs_to_train", 15,
    "Number of epochs to train. Each epoch processes the training data once "
    "completely.")
flags.DEFINE_float("learning_rate", 0.2, "Initial learning rate.")
flags.DEFINE_integer("num_neg_samples", 100,
                     "Negative samples per training example.")
flags.DEFINE_integer("batch_size", 16,
                     "Number of training examples processed per step "
                     "(size of a minibatch).")
flags.DEFINE_integer("concurrent_steps", 12,
                     "The number of concurrent training steps.")
flags.DEFINE_integer("window_size", 5,
                     "The number of words to predict to the left and right "
                     "of the target word.")
flags.DEFINE_integer("min_count", 5,
                     "The minimum number of word occurrences for it to be "
                     "included in the vocabulary.")
flags.DEFINE_float("subsample", 1e-3,
                   "Subsample threshold for word occurrence. Words that appear "
                   "with higher frequency will be randomly down-sampled. Set "
                   "to 0 to disable.")
flags.DEFINE_boolean(
    "interactive", False,
    "If true, enters an IPython interactive session to play with the trained "
    "model. E.g., try model.analogy('france', 'paris', 'russia') and "
    "model.nearby(['proton', 'elephant', 'maxwell']")
flags.DEFINE_integer("statistics_interval", 5,
                     "Print statistics every n seconds.")
flags.DEFINE_integer("summary_interval", 5,
                     "Save training summary to file every n seconds (rounded "
                     "up to statistics interval.")
flags.DEFINE_integer("checkpoint_interval", 600,
                     "Checkpoint the model (i.e. save the parameters) every n "
                     "seconds (rounded up to statistics interval.")

FLAGS = flags.FLAGS


class Options(object):
  """Options used by our word2vec model."""

  def __init__(self):
    # Model options.

    # Embedding dimension.
    self.emb_dim = FLAGS.embedding_size

    # Training options.
    # The training text file.
    self.train_data = FLAGS.train_data

    # Number of negative samples per example.
    self.num_samples = FLAGS.num_neg_samples

    # The initial learning rate.
    self.learning_rate = FLAGS.learning_rate

    # Number of epochs to train. After these many epochs, the learning
    # rate decays linearly to zero and the training stops.
    self.epochs_to_train = FLAGS.epochs_to_train

    # Concurrent training steps.
    self.concurrent_steps = FLAGS.concurrent_steps

    # Number of examples for one training step.
    self.batch_size = FLAGS.batch_size

    # The number of words to predict to the left and right of the target word.
    self.window_size = FLAGS.window_size

    # The minimum number of word occurrences for it to be included in the
    # vocabulary.
    self.min_count = FLAGS.min_count

    # Subsampling threshold for word occurrence.
    self.subsample = FLAGS.subsample

    # How often to print statistics.
    self.statistics_interval = FLAGS.statistics_interval

    # How often to write to the summary file (rounds up to the nearest
    # statistics_interval).
    self.summary_interval = FLAGS.summary_interval

    # How often to write checkpoints (rounds up to the nearest statistics
    # interval).
    self.checkpoint_interval = FLAGS.checkpoint_interval

    # Where to write out summaries.
    self.save_path = FLAGS.save_path

    # Eval options.
    # The text file for eval.
    self.eval_data = FLAGS.eval_data


class Word2Vec(object):
  """Word2Vec model (Skipgram)."""

  def __init__(self, options, session):
    self._options = options
    self._session = session
    self._word2id = {}
    self._id2word = []
    self.build_graph()
    self.build_eval_graph()
    self.save_vocab()
    self._read_analogies()

  def _read_analogies(self):
    """Reads through the analogy question file.

    Returns:
      questions: a [n, 4] numpy array containing the analogy question's
                 word ids.
      questions_skipped: questions skipped due to unknown words.
    """
    questions = []
    questions_skipped = 0
    with open(self._options.eval_data, "rb") as analogy_f:
      for line in analogy_f:
        if line.startswith(b":"):  # Skip comments.
          continue
        words = line.strip().lower().split(b" ")
        ids = [self._word2id.get(w.strip()) for w in words]
        if None in ids or len(ids) != 4:
          questions_skipped += 1
        else:
          questions.append(np.array(ids))
    print("Eval analogy file: ", self._options.eval_data)
    print("Questions: ", len(questions))
    print("Skipped: ", questions_skipped)
    self._analogy_questions = np.array(questions, dtype=np.int32)

  def forward(self, examples, labels):
    """Build the graph for the forward pass."""
    opts = self._options

    # Declare all variables we need.
    # Embedding: [vocab_size, emb_dim]
    init_width = 0.5 / opts.emb_dim
    emb = tf.Variable(
        tf.random_uniform(
            [opts.vocab_size, opts.emb_dim], -init_width, init_width),
        name="emb")
    self._emb = emb

    # Softmax weight: [vocab_size, emb_dim]. Transposed.
    sm_w_t = tf.Variable(
        tf.zeros([opts.vocab_size, opts.emb_dim]),
        name="sm_w_t")

    # Softmax bias: [emb_dim].
    sm_b = tf.Variable(tf.zeros([opts.vocab_size]), name="sm_b")

    # Global step: scalar, i.e., shape [].
    self.global_step = tf.Variable(0, name="global_step")

    # Nodes to compute the nce loss w/ candidate sampling.
    labels_matrix = tf.reshape(
        tf.cast(labels,
                dtype=tf.int64),
        [opts.batch_size, 1])

    # Negative sampling.
    sampled_ids, _, _ = (tf.nn.fixed_unigram_candidate_sampler(
        true_classes=labels_matrix,
        num_true=1,
        num_sampled=opts.num_samples,
        unique=True,
        range_max=opts.vocab_size,
        distortion=0.75,
        unigrams=opts.vocab_counts.tolist()))

    # Embeddings for examples: [batch_size, emb_dim]
    example_emb = tf.nn.embedding_lookup(emb, examples)

    # Weights for labels: [batch_size, emb_dim]
    true_w = tf.nn.embedding_lookup(sm_w_t, labels)
    # Biases for labels: [batch_size, 1]
    true_b = tf.nn.embedding_lookup(sm_b, labels)

    # Weights for sampled ids: [num_sampled, emb_dim]
    sampled_w = tf.nn.embedding_lookup(sm_w_t, sampled_ids)
    # Biases for sampled ids: [num_sampled, 1]
    sampled_b = tf.nn.embedding_lookup(sm_b, sampled_ids)

    # True logits: [batch_size, 1]
    true_logits = tf.reduce_sum(tf.mul(example_emb, true_w), 1) + true_b

    # Sampled logits: [batch_size, num_sampled]
    # We replicate sampled noise lables for all examples in the batch
    # using the matmul.
    sampled_b_vec = tf.reshape(sampled_b, [opts.num_samples])
    sampled_logits = tf.matmul(example_emb,
                               sampled_w,
                               transpose_b=True) + sampled_b_vec
    return true_logits, sampled_logits

  def nce_loss(self, true_logits, sampled_logits):
    """Build the graph for the NCE loss."""

    # cross-entropy(logits, labels)
    opts = self._options
    true_xent = tf.nn.sigmoid_cross_entropy_with_logits(
        true_logits, tf.ones_like(true_logits))
    sampled_xent = tf.nn.sigmoid_cross_entropy_with_logits(
        sampled_logits, tf.zeros_like(sampled_logits))

    # NCE-loss is the sum of the true and noise (sampled words)
    # contributions, averaged over the batch.
    nce_loss_tensor = (tf.reduce_sum(true_xent) +
                       tf.reduce_sum(sampled_xent)) / opts.batch_size
    return nce_loss_tensor

  def optimize(self, loss):
    """Build the graph to optimize the loss function."""

    # Optimizer nodes.
    # Linear learning rate decay.
    opts = self._options
    words_to_train = float(opts.words_per_epoch * opts.epochs_to_train)
    lr = opts.learning_rate * tf.maximum(
        0.0001, 1.0 - tf.cast(self._words, tf.float32) / words_to_train)
    self._lr = lr
    optimizer = tf.train.GradientDescentOptimizer(lr)
    train = optimizer.minimize(loss,
                               global_step=self.global_step,
                               gate_gradients=optimizer.GATE_NONE)
    self._train = train

  def build_eval_graph(self):
    """Build the eval graph."""
    # Eval graph

    # Each analogy task is to predict the 4th word (d) given three
    # words: a, b, c.  E.g., a=italy, b=rome, c=france, we should
    # predict d=paris.

    # The eval feeds three vectors of word ids for a, b, c, each of
    # which is of size N, where N is the number of analogies we want to
    # evaluate in one batch.
    analogy_a = tf.placeholder(dtype=tf.int32)  # [N]
    analogy_b = tf.placeholder(dtype=tf.int32)  # [N]
    analogy_c = tf.placeholder(dtype=tf.int32)  # [N]

    # Normalized word embeddings of shape [vocab_size, emb_dim].
    nemb = tf.nn.l2_normalize(self._emb, 1)

    # Each row of a_emb, b_emb, c_emb is a word's embedding vector.
    # They all have the shape [N, emb_dim]
    a_emb = tf.gather(nemb, analogy_a)  # a's embs
    b_emb = tf.gather(nemb, analogy_b)  # b's embs
    c_emb = tf.gather(nemb, analogy_c)  # c's embs

    # We expect that d's embedding vectors on the unit hyper-sphere is
    # near: c_emb + (b_emb - a_emb), which has the shape [N, emb_dim].
    target = c_emb + (b_emb - a_emb)

    # Compute cosine distance between each pair of target and vocab.
    # dist has shape [N, vocab_size].
    dist = tf.matmul(target, nemb, transpose_b=True)

    # For each question (row in dist), find the top 4 words.
    _, pred_idx = tf.nn.top_k(dist, 4)

    # Nodes for computing neighbors for a given word according to
    # their cosine distance.
    nearby_word = tf.placeholder(dtype=tf.int32)  # word id
    nearby_emb = tf.gather(nemb, nearby_word)
    nearby_dist = tf.matmul(nearby_emb, nemb, transpose_b=True)
    nearby_val, nearby_idx = tf.nn.top_k(nearby_dist,
                                         min(1000, self._options.vocab_size))

    # Nodes in the construct graph which are used by training and
    # evaluation to run/feed/fetch.
    self._analogy_a = analogy_a
    self._analogy_b = analogy_b
    self._analogy_c = analogy_c
    self._analogy_pred_idx = pred_idx
    self._nearby_word = nearby_word
    self._nearby_val = nearby_val
    self._nearby_idx = nearby_idx

  def build_graph(self):
    """Build the graph for the full model."""
    opts = self._options
    # The training data. A text file.
    (words, counts, words_per_epoch, self._epoch, self._words, examples,
     labels) = word2vec.skipgram(filename=opts.train_data,
                                 batch_size=opts.batch_size,
                                 window_size=opts.window_size,
                                 min_count=opts.min_count,
                                 subsample=opts.subsample)
    (opts.vocab_words, opts.vocab_counts,
     opts.words_per_epoch) = self._session.run([words, counts, words_per_epoch])
    opts.vocab_size = len(opts.vocab_words)
    print("Data file: ", opts.train_data)
    print("Vocab size: ", opts.vocab_size - 1, " + UNK")
    print("Words per epoch: ", opts.words_per_epoch)
    self._examples = examples
    self._labels = labels
    self._id2word = opts.vocab_words
    for i, w in enumerate(self._id2word):
      self._word2id[w] = i
    true_logits, sampled_logits = self.forward(examples, labels)
    loss = self.nce_loss(true_logits, sampled_logits)
    tf.scalar_summary("NCE loss", loss)
    self._loss = loss
    self.optimize(loss)

    # Properly initialize all variables.
    tf.initialize_all_variables().run()

    self.saver = tf.train.Saver()

  def save_vocab(self):
    """Save the vocabulary to a file so the model can be reloaded."""
    opts = self._options
    with open(os.path.join(opts.save_path, "vocab.txt"), "w") as f:
      for i in xrange(opts.vocab_size):
        f.write("%s %d\n" % (tf.compat.as_text(opts.vocab_words[i]),
                             opts.vocab_counts[i]))

  def _train_thread_body(self):
    initial_epoch, = self._session.run([self._epoch])
    while True:
      _, epoch = self._session.run([self._train, self._epoch])
      if epoch != initial_epoch:
        break

  def train(self):
    """Train the model."""
    opts = self._options

    initial_epoch, initial_words = self._session.run([self._epoch, self._words])

    summary_op = tf.merge_all_summaries()
    summary_writer = tf.train.SummaryWriter(opts.save_path,
                                            graph_def=self._session.graph_def)
    workers = []
    for _ in xrange(opts.concurrent_steps):
      t = threading.Thread(target=self._train_thread_body)
      t.start()
      workers.append(t)

    last_words, last_time, last_summary_time = initial_words, time.time(), 0
    last_checkpoint_time = 0
    while True:
      time.sleep(opts.statistics_interval)  # Reports our progress once a while.
(epoch, step, loss, words, lr) = self._session.run(          [self._epoch, self.global_step, self._loss, self._words, self._lr])      now = time.time()      last_words, last_time, rate = words, now, (words - last_words) / (          now - last_time)      print("Epoch %4d Step %8d: lr = %5.3f loss = %6.2f words/sec = %8.0f\r" %            (epoch, step, lr, loss, rate), end="")      sys.stdout.flush()      if now - last_summary_time > opts.summary_interval:        summary_str = self._session.run(summary_op)        summary_writer.add_summary(summary_str, step)        last_summary_time = now      if now - last_checkpoint_time > opts.checkpoint_interval:        self.saver.save(self._session,                        opts.save_path + "model",                        global_step=step.astype(int))        last_checkpoint_time = now      if epoch != initial_epoch:        break    for t in workers:      t.join()    return epoch  def _predict(self, analogy):    """Predict the top 4 answers for analogy questions."""    idx, = self._session.run([self._analogy_pred_idx], {        self._analogy_a: analogy[:, 0],        self._analogy_b: analogy[:, 1],        self._analogy_c: analogy[:, 2]    })    return idx  def eval(self):    """Evaluate analogy questions and reports accuracy."""    # How many questions we get right at precision@1.    correct = 0    total = self._analogy_questions.shape[0]    start = 0    while start < total:      limit = start + 2500      sub = self._analogy_questions[start:limit, :]      idx = self._predict(sub)      start = limit      for question in xrange(sub.shape[0]):        for j in xrange(4):          if idx[question, j] == sub[question, 3]:            # Bingo! We predicted correctly. E.g., [italy, rome, france, paris].            correct += 1            break          elif idx[question, j] in sub[question, :3]:            # We need to skip words already in the question.            
continue          else:            # The correct label is not the precision@1            break    print()    print("Eval %4d/%d accuracy = %4.1f%%" % (correct, total,                                              correct * 100.0 / total))  def analogy(self, w0, w1, w2):    """Predict word w3 as in w0:w1 vs w2:w3."""    wid = np.array([[self._word2id.get(w, 0) for w in [w0, w1, w2]]])    idx = self._predict(wid)    for c in [self._id2word[i] for i in idx[0, :]]:      if c not in [w0, w1, w2]:        return c    return "unknown"  def nearby(self, words, num=20):    """Prints out nearby words given a list of words."""    ids = np.array([self._word2id.get(x, 0) for x in words])    vals, idx = self._session.run(        [self._nearby_val, self._nearby_idx], {self._nearby_word: ids})    for i in xrange(len(words)):      print("\n%s\n=====================================" % (words[i]))      for (neighbor, distance) in zip(idx[i, :num], vals[i, :num]):        print("%-20s %6.4f" % (self._id2word[neighbor], distance))def _start_shell(local_ns=None):  # An interactive shell is useful for debugging/development.  import IPython  user_ns = {}  if local_ns:    user_ns.update(local_ns)  user_ns.update(globals())  IPython.start_ipython(argv=[], user_ns=user_ns)def main(_):  """Train a word2vec model."""  if not FLAGS.train_data or not FLAGS.eval_data or not FLAGS.save_path:    print("--train_data --eval_data and --save_path must be specified.")    sys.exit(1)  opts = Options()  with tf.Graph().as_default(), tf.Session() as session:    model = Word2Vec(opts, session)    for _ in xrange(opts.epochs_to_train):      model.train()  # Process one epoch      model.eval()  # Eval analogies.    # Perform a final save.    
model.saver.save(session,                     os.path.join(opts.save_path, "model.ckpt"),                     global_step=model.global_step)    if FLAGS.interactive:      # E.g.,      # [0]: model.analogy('france', 'paris', 'russia')      # [1]: model.nearby(['proton', 'elephant', 'maxwell'])      _start_shell(locals())if __name__ == "__main__":  tf.app.run()

curl http://mattmahoney.net/dc/text8.zip > text8.zip
unzip text8.zip
curl https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip > source-archive.zip
unzip -p source-archive.zip  word2vec/trunk/questions-words.txt > questions-words.txt
rm text8.zip source-archive.zip

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared word2vec_ops.cc word2vec_kernels.cc -o word2vec_ops.so -fPIC -I $TF_INC -O2 -D_GLIBCXX_USE_CXX11_ABI=0

python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=./
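The analogy evaluation in build_eval_graph above can be hard to picture from the graph code alone. Below is a minimal NumPy sketch of the same idea: normalize the embeddings, form c + (b - a), and rank the vocabulary by dot product. The four-word vocabulary and the 3-dimensional vectors are made up for illustration and are not trained embeddings.

```python
import numpy as np

# Toy vocabulary and hand-made embeddings (hypothetical values, not trained).
words = ["italy", "rome", "france", "paris"]
word2id = {w: i for i, w in enumerate(words)}
emb = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [0.0, 1.0, 1.0]])

# L2-normalize each row, as tf.nn.l2_normalize(emb, 1) does in the graph.
nemb = emb / np.linalg.norm(emb, axis=1, keepdims=True)


def analogy(a, b, c, top_k=4):
    """Return the word d such that a:b is like c:d, skipping question words."""
    ia, ib, ic = (word2id[w] for w in (a, b, c))
    target = nemb[ic] + (nemb[ib] - nemb[ia])
    scores = nemb @ target              # similarity to every vocabulary word
    for idx in np.argsort(-scores)[:top_k]:
        if words[idx] not in (a, b, c):
            return words[idx]
    return "unknown"


print(analogy("italy", "rome", "france"))
```

As in the original analogy method, candidates that already appear in the question are skipped, because the nearest neighbor of the target vector is often one of the input words.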




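6. Recurrent Neural Networks (RNN) and LSTM (Long Short-Term Memory)

0x1: Language model

This tutorial shows how to train a recurrent neural network on a challenging language-modeling task. The goal is to fit a probabilistic model that assigns a probability to a sentence. It does so by predicting each next word from the words that came before it. We use the PTB (Penn Tree Bank) dataset, a common benchmark for such models that is also small and relatively fast to train on.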
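To make "the probability of a sentence" concrete, here is a small sketch of the chain rule that a language model relies on: the sentence probability is the product of the probabilities of each word given its history. For brevity this sketch conditions only on the previous word (a bigram simplification), whereas the RNN conditions on the whole history. The probability table is invented for illustration.

```python
import math

# Hypothetical conditional probabilities P(word | previous word).
bigram = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.1,
}


def sentence_log_prob(words, cond_prob):
    """Sum log P(w_t | w_{t-1}) over the sentence, starting from <s>."""
    log_p = 0.0
    prev = "<s>"
    for w in words:
        log_p += math.log(cond_prob[(prev, w)])
        prev = w
    return log_p


log_p = sentence_log_prob(["the", "cat", "sat"], bigram)
print(math.exp(log_p))  # product 0.5 * 0.2 * 0.1
```

Working in log space avoids numerical underflow when sentences get long.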

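0x2: LSTM

The core of the model is an LSTM cell that processes one word at a time and computes the probabilities of the possible continuations of the sentence. The memory state of the network is initialized with a vector of zeros and updated after reading each word. For computational reasons, we process the data in mini-batches of size batch_size.

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to predict the next word.
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)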
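For intuition about what a single call to the LSTM cell computes, here is a minimal NumPy sketch of one standard LSTM step. It is not the BasicLSTMCell implementation; the gate ordering, the single fused weight matrix W, and the absence of a forget-gate bias offset are assumptions made to keep the sketch short.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step.

    x:      [batch, input_size]   current input
    h_prev: [batch, hidden_size]  previous output
    c_prev: [batch, hidden_size]  previous memory cell
    W:      [input_size + hidden_size, 4 * hidden_size]
    b:      [4 * hidden_size]
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    i, j, f, o = np.split(z, 4, axis=1)   # input gate, candidate, forget gate, output gate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(j)
    h = sigmoid(o) * np.tanh(c)
    return h, c


batch_size, input_size, hidden_size = 2, 3, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(input_size + hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros((batch_size, hidden_size))   # zero initial state, as in the text
c = np.zeros((batch_size, hidden_size))
for _ in range(5):                        # feed five random "words"
    x = rng.normal(size=(batch_size, input_size))
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

The pair (h, c) plays the role of the state value that is threaded through the loop in the snippet above.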

0x3: Truncated backpropagation

To keep learning tractable, it is common practice to truncate the gradients for backpropagation at a fixed number (num_steps) of unrolled time steps. This is easy to implement by feeding inputs of length num_steps at a time and running a backward pass after each iteration.

# Placeholder for the inputs in a given iteration.
words = tf.placeholder(tf.int32, [batch_size, num_steps])

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(num_steps):
    # The state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state

# A numpy array holding the state of the LSTM after each batch of words.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # Initialize the LSTM state from the previous iteration.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss
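The loop above assumes that words_in_dataset yields windows of shape [batch_size, num_steps]. A minimal sketch of how one long sequence of word ids could be cut into such windows, with targets shifted by one position, is shown below. The function name iterate_windows is hypothetical, and the layout (one contiguous stream per batch row) is an assumption modeled on common practice rather than taken from the text.

```python
import numpy as np


def iterate_windows(raw_ids, batch_size, num_steps):
    """Yield (inputs, targets) pairs of shape [batch_size, num_steps]."""
    raw_ids = np.asarray(raw_ids, dtype=np.int32)
    batch_len = len(raw_ids) // batch_size
    # Each row is one contiguous stream, so state can carry across windows.
    data = raw_ids[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]  # next-word targets
        yield x, y


for x, y in iterate_windows(range(20), batch_size=2, num_steps=3):
    print(x.tolist(), y.tolist())
```

Because consecutive windows in the same row are adjacent in the text, feeding the final state of one window as the initial state of the next keeps the context even though gradients stop at the window boundary.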

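0x4: Inputs

Before being fed to the LSTM, the word IDs are embedded into a dense representation (word vectors give the model a basis for relating different words to one another). This lets the model represent words efficiently, and it is also easy to write in code.

# embedding_matrix is a tensor of shape [vocabulary_size, embedding_size]
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)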
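For a single embedding matrix, the lookup is just row selection. The NumPy sketch below mirrors that behavior with integer-array indexing; the matrix values and sizes are arbitrary and chosen only so that the result is easy to read.

```python
import numpy as np

vocabulary_size, embedding_size = 5, 3
# Row i holds the vector for word id i (arbitrary illustrative values).
embedding_matrix = np.arange(vocabulary_size * embedding_size,
                             dtype=np.float32).reshape(vocabulary_size,
                                                       embedding_size)

# A batch of word ids with shape [batch_size, num_steps].
word_ids = np.array([[0, 2],
                     [4, 2]])

# Integer-array indexing picks one row per id: [batch_size, num_steps, embedding_size].
word_embeddings = embedding_matrix[word_ids]
print(word_embeddings.shape)
```

The output shape is the shape of word_ids with embedding_size appended, which is what the LSTM then consumes one time step at a time.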


0x5: Loss function
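A common choice of loss for a language model of this kind, assumed here rather than stated in the text, is the average negative log probability that the model assigns to the target words. Its exponential, the per-word perplexity, is the number usually reported. A minimal NumPy sketch with made-up probabilities:

```python
import numpy as np


def average_neg_log_prob(probabilities, target_ids):
    """Mean of -ln p(target word) over all positions.

    probabilities: [N, vocab_size], each row sums to 1
    target_ids:    [N] integer ids of the true next words
    """
    probabilities = np.asarray(probabilities, dtype=np.float64)
    target_ids = np.asarray(target_ids)
    picked = probabilities[np.arange(len(target_ids)), target_ids]
    return -np.mean(np.log(picked))


# Hypothetical softmax outputs for two positions over a 3-word vocabulary.
probs = [[0.50, 0.25, 0.25],
         [0.25, 0.50, 0.25]]
targets = [0, 1]

loss = average_neg_log_prob(probs, targets)
perplexity = np.exp(loss)
print(loss, perplexity)
```

A perplexity of k can be read as the model being, on average, as uncertain as if it were choosing uniformly among k words.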




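0x6: Stacking multiple LSTM layers

To give the model more expressive power, we can add multiple LSTM layers to process the data. The output of the first layer becomes the input of the second, and so on.
The MultiRNNCell class makes this seamless.

lstm = rnn_cell.BasicLSTMCell(lstm_size)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * number_of_layers)

initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32)
for i in range(num_steps):
    # The state is updated after processing each batch of words.
    output, state = stacked_lstm(words[:, i], state)

    # The rest of the code.
    # ...

final_state = state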
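One Python detail in the snippet above is worth seeing in isolation: [lstm] * number_of_layers builds a list that refers to the same object several times, not several independent cells. Whether that matters depends on how the library version in use creates variables for each layer, which is not checked here. The sketch below only demonstrates the Python semantics, using a hypothetical Cell class as a stand-in for an RNN cell.

```python
class Cell:
    """Hypothetical stand-in for an RNN cell that would own its own weights."""

    def __init__(self, size):
        self.size = size


number_of_layers = 3

# List multiplication repeats the reference: every entry is the same object.
shared = [Cell(4)] * number_of_layers

# A comprehension calls the constructor once per layer: distinct objects.
separate = [Cell(4) for _ in range(number_of_layers)]

print(all(c is shared[0] for c in shared))
print(len({id(c) for c in separate}))
```

If independent per-layer cells are needed, the comprehension form is the safer way to build the list.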

0x7: Compile and run on a GPU

wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
tar xvf simple-examples.tgz
python ptb_word_lm.py --data_path=./simple-examples/data/ --alsologtostderr --model large




7. Building a chatbot with a deep learning network

python udc_train.py --num_gpus=1
python udc_test.py --model_dir=./data
python udc_predict.py --model_dir=./data

import os
import time
import itertools
import sys
import numpy as np
import tensorflow as tf
import udc_model
import udc_hparams
import udc_metrics
import udc_inputs
from models.dual_encoder import dual_encoder_model
from models.helpers import load_vocab

tf.flags.DEFINE_string("model_dir", None, "Directory to load model checkpoints from")
tf.flags.DEFINE_string("vocab_processor_file", "./data/vocab_processor.bin", "Saved vocabulary processor file")
FLAGS = tf.flags.FLAGS

if not FLAGS.model_dir:
  print("You must specify a model directory")
  sys.exit(1)

def tokenizer_fn(iterator):
  return (x.split(" ") for x in iterator)

# Load vocabulary
vp = tf.contrib.learn.preprocessing.VocabularyProcessor.restore(
  FLAGS.vocab_processor_file)

# Load your own data here
INPUT_CONTEXT = "how old are you!"
POTENTIAL_RESPONSES = ["fine, thanks", "twenty six years old"]

def get_features(context, utterance):
  context_matrix = np.array(list(vp.transform([context])))
  utterance_matrix = np.array(list(vp.transform([utterance])))
  context_len = len(context.split(" "))
  utterance_len = len(utterance.split(" "))
  features = {
    "context": tf.convert_to_tensor(context_matrix, dtype=tf.int64),
    "context_len": tf.constant(context_len, shape=[1,1], dtype=tf.int64),
    "utterance": tf.convert_to_tensor(utterance_matrix, dtype=tf.int64),
    "utterance_len": tf.constant(utterance_len, shape=[1,1], dtype=tf.int64),
  }
  return features, None

if __name__ == "__main__":
  hparams = udc_hparams.create_hparams()
  model_fn = udc_model.create_model_fn(hparams, model_impl=dual_encoder_model)
  estimator = tf.contrib.learn.Estimator(model_fn=model_fn, model_dir=FLAGS.model_dir)

  # Ugly hack, seems to be a bug in Tensorflow
  # estimator.predict doesn't work without this line
  estimator._targets_info = tf.contrib.learn.estimators.tensor_signature.TensorSignature(tf.constant(0, shape=[1,1]))

  print("Context: {}".format(INPUT_CONTEXT))
  for r in POTENTIAL_RESPONSES:
    prob = estimator.predict(input_fn=lambda: get_features(INPUT_CONTEXT, r))
    print("{}: {:g}".format(r, prob[0,0]))


INPUT_CONTEXT = "how old are you!"
POTENTIAL_RESPONSES = ["fine, thanks", "twenty six years old"]


Relevant Link:

http://naturali.io/deeplearning/chatbot/introduction/2016/04/28/chatbot-part1.html
http://naturali.io/deeplearning/chatbot/introduction/2016/05/16/chatbot-part2.html
https://arxiv.org/abs/1506.08909
https://github.com/dennybritz/chatbot-retrieval
