deep_vocab.txt (6631 lines, 45.1 KB)
[PAD]
[EOS]
[UNK]
[CLS]
[SEP]
[MASK]
the
of
to
a
is
and
in
that
x
we
for
with
as
1
are
can
be
The
by
learning
##s
on
model
an
this
In
from
not
it
function
or
2
training
y
A
have
This
distribution
i
h
each
example
p
network
models
al
networks
et
one
CHAPTER
0
but
gradient
input
neural
deep
use
We
5
set
at
θ
has
##ing
other
these
some
more
data
algorithm
f
all
t
which
probability
only
than
many
3
output
may
using
such
units
different
For
then
J
v
time
used
P
variables
machine
log
will
linear
also
parameters
where
when
over
matrix
between
6
error
10
if
hidden
same
layer
value
so
how
##d
L
##ed
4
very
because
number
examples
w
values
s
its
two
b
7
based
20
8
n
k
algorithms
point
g
##y
cost
optimization
##e
given
most
approach
graph
large
D
vector
do
representation
functions
case
problem
m
any
W
If
possible
learn
j
small
way
section
Y
step
##ly
9
into
G
E
must
variable
z
image
DEEP
S
figure
unit
make
It
H
space
features
see
c
M
they
R
inference
high
train
Bengio
weights
single
no
back
12
q
does
task
X
mean
likelihood
layers
usually
first
there
well
sampling
order
described
random
When
zero
size
propagation
information
One
convolutional
C
often
Figure
Learning
would
both
To
18
samples
##n
sequence
been
##2
parameter
Hinton
points
their
simple
distributions
weight
our
descent
2015
2014
need
about
was
test
form
called
V
Boltzmann
recurrent
local
rather
computational
while
These
However
14
##1
conditional
tasks
19
even
trained
structure
Neural
d
applied
As
inputs
should
language
estimate
16
through
state
term
much
useful
means
##t
unsupervised
process
could
variance
negative
T
supervised
new
just
compute
K
able
I
non
11
like
2011
respect
regularization
general
##es
being
equation
feature
Deep
2013
words
difficult
B
perform
Gaussian
important
defined
obtain
every
15
convolution
u
terms
O
##3
##r
sparse
maximum
α
sample
pages
another
us
operation
methods
dimensional
chain
regression
performance
images
##l
dataset
corresponding
##7
second
generative
chapter
true
real
system
manifold
represent
recognition
17
were
direction
them
latent
fixed
factors
autoencoder
##5
##4
##0
whether
context
N
An
##ng
probabilistic
idea
FOR
specific
loss
Markov
##6
##er
##a
objective
method
2010
13
positive
cases
##on
##al
capacity
typically
problems
known
generalization
##9
steps
several
prior
graphical
µ
update
thus
solution
norm
common
simply
requires
memory
machines
##i
what
undirected
representations
Many
rate
binary
lower
arXiv
Z
now
kind
describe
long
less
kernel
energy
approximate
2012
without
derivatives
applications
noise
low
approximation
word
solve
Because
want
U
RBM
2009
##8
stochastic
feedforward
outputs
directed
σ
better
view
still
multiple
learned
allows
results
pretraining
pooling
estimator
vectors
provide
design
softmax
procedure
field
associated
RNN
require
region
e
change
NIPS
regions
modeling
derivative
class
##o
##k
similar
pmodel
part
minimum
might
best
autoencoders
See
gradients
connections
amount
Another
search
product
partition
enough
choose
work
effect
approaches
τ
λ
find
discrete
##h
standard
define
choice
via
out
optimal
object
distributed
connected
##x
provides
observed
nonlinear
hi
far
natural
factor
encoder
element
xi
good
techniques
practice
hyperparameters
entire
computing
computation
Conference
properties
correct
exp
covariance
Hessian
phase
operations
end
bias
think
take
seen
max
decay
continuous
know
classification
visible
up
makes
due
directly
belief
Each
x1
particular
kinds
independent
computer
code
classifier
Instead
variational
right
processing
original
l
Some
IEEE
##ion
##g
within
squared
speech
per
Machine
yˆ
valued
three
result
research
φ
whose
sense
rule
matching
human
designed
activation
across
##p
strategy
statistical
include
forward
actually
Q
family
entropy
corresponds
There
you
underlying
those
introduced
estimation
after
too
target
pixels
penalty
elements
average
always
along
2008
scale
near
measure
knowledge
interactions
individual
book
analysis
allow
Most
##c
sigmoid
least
improve
hyperparameter
discussed
containing
become
F
2006
##m
predict
length
increase
illustrated
generalize
dropout
cannot
Goodfellow
AND
ω
wish
separate
joint
depth
density
bound
architecture
reduce
larger
initial
efficient
directions
node
intractable
convex
constant
coding
International
representing
o
mapping
making
further
diagonal
β
your
ways
sum
reconstruction
present
edges
decoder
constraint
numbers
level
equal
course
Processing
Proceedings
MACHINE
LEARNING
##w
##j
systems
setting
expected
exactly
close
becomes
GENERATIVE
represented
probabilities
likely
left
itself
instead
drawn
determine
component
add
above
Bayesian
##b
under
theory
principle
necessary
either
early
complicated
brain
Specifically
Other
Networks
DBM
##ting
tangent
parametric
multiplication
minibatch
importance
higher
capture
adding
LeCun
ICML
translation
cross
computed
components
before
##an
scalar
next
L2
##th
restricted
related
path
move
matrices
during
chosen
behavior
apply
27
##en
transformation
top
strategies
prediction
performing
mixture
including
fact
ensemble
advantage
Suppose
FEEDFORWARD
wise
tree
states
shown
previous
nodes
minima
map
increases
few
extremely
criterion
contains
concepts
changes
avoid
arg
While
Press
22
##te
times
quadratic
pixel
generally
explicitly
datasets
basic
On
Information
though
posterior
obtained
multi
here
extra
editors
divergence
consists
application
24
##re
##ce
score
r
needed
generator
eigenvalues
consider
complete
TRAINING
OPTIMIZATION
At
##ve
##ble
##ation
validation
uses
sometimes
sharing
produce
precision
minimize
having
equations
correspond
cells
batch
assume
achieve
Systems
By
therefore
taking
saddle
line
easy
assumption
architectures
PCA
Monte
Gibbs
user
traditional
straightforward
stopping
smaller
show
initialization
generating
generate
definition
current
23
##le
write
valid
understand
subset
rectified
normalization
net
minimizing
equivalent
distance
dimension
complex
Unfortunately
Carlo
until
structured
relatively
refer
momentum
found
following
effective
constraints
additional
1996
##z
##u
x2
version
together
remains
normal
magnitude
goal
free
entries
##st
##ne
visual
reason
provided
property
predictions
modern
made
involve
exponentially
denoising
appropriate
accuracy
2005
works
wide
variety
sub
sign
run
learns
fully
difference
describing
Salakhutdinov
SEQUENCE
RECURSIVE
RECURRENT
Newton
MODELING
Computer
26
##se
variation
shows
required
report
optimize
neurons
generated
evaluate
deviation
de
convergence
causes
active
Such
Rn
RBMs
2003
1992
##ies
ˆ
statistics
sequences
main
identity
highly
help
get
especially
dependencies
cell
around
already
Training
Technical
REGULARIZATION
Here
AI
2007
##E
##man
##ive
Ω
vision
table
objects
nets
nearest
logistic
grid
graphs
expensive
draw
construct
approximately
applying
They
1989
##S
##v
##ted
##ar
##able
square
remain
pdata
past
numerical
global
expectation