deep_vocab.txt (6631 lines, 45.1 KB)
[PAD]
[EOS]
[UNK]
[CLS]
[SEP]
[MASK]
the
of
to
a
is
and
in
that
x
we
for
with
as
1
are
can
be
The
by
learning
##s
on
model
an
this
In
from
not
it
function
or
2
training
y
A
have
This
distribution
i
h
each
example
p
network
models
al
networks
et
one
CHAPTER
0
but
gradient
input
neural
deep
use
We
5
set
at
θ
has
##ing
other
these
some
more
data
algorithm
f
all
t
which
probability
only
than
many
3
output
may
using
such
units
different
For
then
J
v
time
used
P
variables
machine
log
will
linear
also
parameters
where
when
over
matrix
between
6
error
10
if
hidden
same
layer
value
so
how
##d
L
##ed
4
very
because
number
examples
w
values
s
its
two
b
7
based
20
8
n
k
algorithms
point
g
##y
cost
optimization
##e
given
most
approach
graph
large
D
vector
do
representation
functions
case
problem
m
any
W
If
possible
learn
j
small
way
section
Y
step
##ly
9
into
G
E
must
variable
z
image
DEEP
S
figure
unit
make
It
H
space
features
see
c
M
they
R
inference
high
train
Bengio
weights
single
no
back
12
q
does
task
X
mean
likelihood
layers
usually
first
there
well
sampling
order
described
random
When
zero
size
propagation
information
One
convolutional
C
often
Figure
Learning
would
both
To
18
samples
##n
sequence
been
##2
parameter
Hinton
points
their
simple
distributions
weight
our
descent
2015
2014
need
about
was
test
form
called
V
Boltzmann
recurrent
local
rather
computational
while
These
However
14
##1
conditional
tasks
19
even
trained
structure
Neural
d
applied
As
inputs
should
language
estimate
16
through
state
term
much
useful
means
##t
unsupervised
process
could
variance
negative
T
supervised
new
just
compute
K
able
I
non
11
like
2011
respect
regularization
general
##es
being
equation
feature
Deep
2013
words
difficult
B
perform
Gaussian
important
defined
obtain
every
15
convolution
u
terms
O
##3
##r
sparse
maximum
α
sample
pages
another
us
operation
methods
dimensional
chain
regression
performance
images
##l
dataset
corresponding
##7
second
generative
chapter
true
real
system
manifold
represent
recognition
17
were
direction
them
latent
fixed
factors
autoencoder
##5
##4
##0
whether
context
N
An
##ng
probabilistic
idea
FOR
specific
loss
Markov
##6
##er
##a
objective
method
2010
13
positive
cases
##on
##al
capacity
typically
problems
known
generalization
##9
steps
several
prior
graphical
µ
update
thus
solution
norm
common
simply
requires
memory
machines
##i
what
undirected
representations
Many
rate
binary
lower
arXiv
Z
now
kind
describe
long
less
kernel
energy
approximate
2012
without
derivatives
applications
noise
low
approximation
word
solve
Because
want
U
RBM
2009
##8
stochastic
feedforward
outputs
directed
σ
better
view
still
multiple
learned
allows
results
pretraining
pooling
estimator
vectors
provide
design
softmax
procedure
field
associated
RNN
require
region
e
change
NIPS
regions
modeling
derivative
class
##o
##k
similar
pmodel
part
minimum
might
best
autoencoders
See
gradients
connections
amount
Another
search
product
partition
enough
choose
work
effect
approaches
τ
λ
find
discrete
##h
standard
define
choice
via
out
optimal
object
distributed
connected
##x
provides
observed
nonlinear
hi
far
natural
factor
encoder
element
xi
good
techniques
practice
hyperparameters
entire
computing
computation
Conference
properties
correct
exp
covariance
Hessian
phase
operations
end
bias
think
take
seen
max
decay
continuous
know
classification
visible
up
makes
due
directly
belief
Each
x1
particular
kinds
independent
computer
code
classifier
Instead
variational
right
processing
original
l
Some
IEEE
##ion
##g
within
squared
speech
per
Machine
yˆ
valued
three
result
research
φ
whose
sense
rule
matching
human
designed
activation
across
##p
strategy
statistical
include
forward
actually
Q
family
entropy
corresponds
There
you
underlying
those
introduced
estimation
after
too
target
pixels
penalty
elements
average
always
along
2008
scale
near
measure
knowledge
interactions
individual
book
analysis
allow
Most
##c
sigmoid
least
improve
hyperparameter
discussed
containing
become
F
2006
##m
predict
length
increase
illustrated
generalize
dropout
cannot
Goodfellow
AND
ω
wish
separate
joint
depth
density
bound
architecture
reduce
larger
initial
efficient
directions
node
intractable
convex
constant
coding
International
representing
o
mapping
making
further
diagonal
β
your
ways
sum
reconstruction
present
edges
decoder
constraint
numbers
level
equal
course
Processing
Proceedings
MACHINE
LEARNING
##w
##j
systems
setting
expected
exactly
close
becomes
GENERATIVE
represented
probabilities
likely
left
itself
instead
drawn
determine
component
add
above
Bayesian
##b
under
theory
principle
necessary
either
early
complicated
brain
Specifically
Other
Networks
DBM
##ting
tangent
parametric
multiplication
minibatch
importance
higher
capture
adding
LeCun
ICML
translation
cross
computed
components
before
##an
scalar
next
L2
##th
restricted
related
path
move
matrices
during
chosen
behavior
apply
27
##en
transformation
top
strategies
prediction
performing
mixture
including
fact
ensemble
advantage
Suppose
FEEDFORWARD
wise
tree
states
shown
previous
nodes
minima
map
increases
few
extremely
criterion
contains
concepts
changes
avoid
arg
While
Press
22
##te
times
quadratic
pixel
generally
explicitly
datasets
basic
On
Information
though
posterior
obtained
multi
here
extra
editors
divergence
consists
application
24
##re
##ce
score
r
needed
generator
eigenvalues
consider
complete
TRAINING
OPTIMIZATION
At
##ve
##ble
##ation
validation
uses
sometimes
sharing
produce
precision
minimize
having
equations
correspond
cells
batch
assume
achieve
Systems
By
therefore
taking
saddle
line
easy
assumption
architectures
PCA
Monte
Gibbs
user
traditional
straightforward
stopping
smaller
show
initialization
generating
generate
definition
current
23
##le
write
valid
understand
subset
rectified
normalization
net
minimizing
equivalent
distance
dimension
complex
Unfortunately
Carlo
until
structured
relatively
refer
momentum
found
following
effective
constraints
additional
1996
##z
##u
x2
version
together
remains
normal
magnitude
goal
free
entries
##st
##ne
visual
reason
provided
property
predictions
modern
made
involve
exponentially
denoising
appropriate
accuracy
2005
works
wide
variety
sub
sign
run
learns
fully
difference
describing
Salakhutdinov
SEQUENCE
RECURSIVE
RECURRENT
Newton
MODELING
Computer
26
##se
variation
shows
required
report
optimize
neurons
generated
evaluate
deviation
de
convergence
causes
active
Such
Rn
RBMs
2003
1992
##ies
ˆ
statistics
sequences
main
identity
highly
help
get
especially
dependencies
cell
around
already
Training
Technical
REGULARIZATION
Here
AI
2007
##E
##man
##ive
Ω
vision
table
objects
nets
nearest
logistic
grid
graphs
expensive
draw
construct
approximately
applying
They
1989
##S
##v
##ted
##ar
##able
square
remain
pdata
past
numerical
global
expectation