[Example to-read paper submission] Visualizing and understanding recurrent networks #1

Linusp · 2017-03-05T03:18:20Z

Authors

Andrej Karpathy
Justin Johnson
Li Fei-Fei

Publication date

2015

Abstract

Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM), are enjoying renewed interest as a result of successful applications in a wide range of machine learning problems that involve sequential data. However, while LSTMs provide exceptional results in practice, the source of their performance and their limitations remain rather poorly understood. Using character-level language models as an interpretable testbed, we aim to bridge this gap by providing an analysis of their representations, predictions and error types. In particular, our experiments reveal the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets. Moreover, our comparative analysis with finite horizon n-gram models traces the source of the LSTM improvements to long-range structural dependencies. Finally, we provide analysis of the remaining errors and suggest areas for further study.

[Example paper-notes submission]

Authors

Andrej Karpathy
Justin Johnson
Li Fei-Fei

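Key points

LSTMs achieve very good results in practice, but the sources of their performance and their limitations are still poorly understood
Earlier analyses mostly judged LSTMs by global perplexity on the final test set; they did not examine behavior on "real data" and were not very intuitive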

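Models / Experiments / Conclusions

Datasets:

The text of Tolstoy's War and Peace, 3,258,246 characters in total
Linux kernel source code, 6,206,996 characters in total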

Models:

RNN: depths of 1, 2, and 3 layers and hidden sizes of 64, 128, 256, and 512 were tried, giving 12 models
LSTM: same configurations as the RNN
GRU: same configurations as the RNN
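The model grid above can be enumerated in a few lines. This is only an illustrative sketch: the names `CELL_TYPES`, `NUM_LAYERS`, `HIDDEN_SIZES`, and `model_grid` are hypothetical, and only the listed values come from the notes.

```python
from itertools import product

# Hypothetical constant names; the values are the ones listed in the notes.
CELL_TYPES = ["RNN", "LSTM", "GRU"]
NUM_LAYERS = [1, 2, 3]
HIDDEN_SIZES = [64, 128, 256, 512]


def model_grid():
    """Return every (cell type, layer count, hidden size) combination."""
    return list(product(CELL_TYPES, NUM_LAYERS, HIDDEN_SIZES))


grid = model_grid()
# 3 cell types * 3 depths * 4 hidden sizes = 36 configurations
print(len(grid))
```

Each cell type contributes 12 configurations, which matches the count given for the RNN.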

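Experiments:

Train language models on both datasets with the models above, compute the cross-entropy loss on the test set, and compare the results across all 36 models of the three types
Visualize the distribution of gate outputs in the LSTM/GRU. In the figure below, each small circle represents one neuron; the horizontal axis is the fraction of time its gate value exceeds 0.9, and the vertical axis is the fraction of time its gate value is below 0.1

(Images can be uploaded by drag and drop)

Analyze the types of errors the LSTM makes on the War and Peace text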
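The gate visualization described in the experiments reduces each neuron to two numbers. A minimal sketch of that computation follows, assuming the gate activations of one unit have already been recorded as a plain list of floats; the function name `saturation_fractions` and the example values are made up for illustration.

```python
def saturation_fractions(gate_values, low=0.1, high=0.9):
    """Return (fraction of steps with gate > high, fraction with gate < low).

    The first value is the horizontal coordinate and the second the vertical
    coordinate of one circle in the plot described in the notes.
    """
    if not gate_values:
        raise ValueError("gate_values must be non-empty")
    n = len(gate_values)
    above = sum(1 for g in gate_values if g > high) / n
    below = sum(1 for g in gate_values if g < low) / n
    return above, below


# Made-up activations of a single gate unit over 8 time steps.
example = [0.95, 0.99, 0.92, 0.97, 0.05, 0.5, 0.93, 0.98]
x, y = saturation_fractions(example)
print(x, y)
```

A unit that lands near the right edge of the plot keeps its gate almost always open, and one near the top keeps it almost always closed.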

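Conclusions:

Models with multiple hidden layers outperform models with a single hidden layer
LSTM and GRU perform about equally well, and both are significantly better than the vanilla RNN
The LSTM shows the ability to remember long-range structure; for example, when processing long text enclosed in quotation marks, it responds specifically to the opening and closing quotes
In multi-layer LSTMs/GRUs, neurons in the higher layers start to differentiate: some tend to take in new information, while others tend to retain old information
The first layer of the GRU makes almost no use of old information, and even its higher layers still prefer the current input
The LSTM's ability to model long-range dependencies far exceeds that of n-gram models: an 11 MB LSTM model slightly outperforms a 3 GB 20-gram model
On text with obvious structure (such as kernel code), the LSTM and the 20-gram model differ little when the sequence length is below 10, but the gap widens as sequences get longer; on the Linux kernel code, the LSTM can remember dependencies spanning up to roughly 70 characters
During training, the LSTM first models short-range dependencies and then gradually learns to model long-range dependencies on top of them. This is also why reversing the source sequence helps in the seq2seq paper: it lets the model start with short-range dependencies before moving on to long-range ones
The LSTM does not fully exploit the most recent inputs: 42% of its errors are cases that n-gram models of order 0 to 9 predict correctly
Attending directly over the recent history of the sequence, as Memory Networks do, is believed to be a way to improve RNN language models
Larger datasets and unsupervised pretraining can improve the LSTM's predictions of rare words
Increasing model size significantly reduces n-gram errors but brings no clear improvement on the other error types, which suggests that scaling the model alone is not enough and that better, newer architectures may be needed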
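The "errors that n-gram models predict correctly" statistic in the conclusions can be sketched as follows. This is a rough illustration rather than the authors' code: it assumes a prediction counts as an error when the probability given to the true character falls below 0.5, and the function name and toy numbers are hypothetical.

```python
def ngram_fixable_share(lstm_probs, ngram_probs, threshold=0.5):
    """Share of LSTM errors that at least one n-gram model predicts correctly.

    lstm_probs[i]  : probability the LSTM assigned to the true character i.
    ngram_probs[i] : probabilities the n-gram models assigned to that character.
    threshold      : assumed cutoff separating errors from correct predictions.
    """
    errors = [i for i, p in enumerate(lstm_probs) if p < threshold]
    if not errors:
        return 0.0
    fixed = sum(
        1 for i in errors if any(q > threshold for q in ngram_probs[i])
    )
    return fixed / len(errors)


# Toy data: five characters, two hypothetical n-gram models.
lstm_probs = [0.9, 0.2, 0.4, 0.1, 0.8]
ngram_probs = [[0.3, 0.6], [0.7, 0.2], [0.1, 0.2], [0.4, 0.9], [0.5, 0.5]]
share = ngram_fixable_share(lstm_probs, ngram_probs)
print(share)
```

In the toy data the LSTM makes three errors and an n-gram model recovers two of them; the notes report 42% for the real experiment.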

Linusp added a commit that referenced this issue Mar 12, 2017

Merge pull request #1 from swarma/master

98574ff

update

Linusp added P2.已被领取 (claimed) P1.论文过审 (paper approved) P.论文 (paper) and removed P1.论文过审 P.论文 P2.已被领取 labels Mar 12, 2017

Linusp assigned Linusp and pimgeek and unassigned Linusp and pimgeek Mar 12, 2017

pimgeek self-assigned this Mar 12, 2017

pimgeek closed this as completed Mar 12, 2017

pimgeek reopened this Mar 12, 2017

Linusp self-assigned this Mar 12, 2017

Linusp added the P2.已被领取 (claimed) label Mar 12, 2017

Linusp added the P3.笔记完成 (notes completed) label Mar 12, 2017

Linusp unassigned pimgeek Mar 12, 2017

Linusp added the P4.笔记过审 (notes approved) label Mar 12, 2017
