The vanilla transformer uses sinusoidal positional encoding (position_encoding=true). We recommend using "maximum relative positions" encoding instead (max_relative_positions=20, position_encoding=false), which adds only a small overhead.
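As a sketch, the recommendation above would look like this in an OpenNMT-py training YAML config (key names taken from this post; whether they belong at the top level of your config depends on your setup):

```yaml
# Disable sinusoidal positional encoding...
position_encoding: false
# ...and use relative position encoding instead (small overhead)
max_relative_positions: 20
```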
We kept "fusedadam" (old legacy code), which provides the best speed (compared to PyTorch AMP Adam fp16 and Apex level O1/O2). We also tested the new Adam(fused=true) released with PyTorch 1.13, but it is way slower.
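A minimal config fragment for this choice, assuming the standard OpenNMT-py `optim` key and fp16 training via `model_dtype` (the fp16 key is an assumption, not stated in this post):

```yaml
# Legacy fused Adam: fastest option in the benchmarks above
optim: fusedadam
# assumption: fp16 training, as in the AMP/Apex comparisons mentioned
model_dtype: fp16
```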
Always use the highest batch size possible (within your GPU RAM capacity) and set the accumulation count according to the "true batch size" you want. For instance, if your GPU can accept 8192 tokens and you use accum_count=12, you will have a true batch size of 8192 × 12 = 98304 tokens.
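The example above, sketched as a config fragment (assuming token-based batching via `batch_type`, which this post does not state explicitly but the token counts imply):

```yaml
# batch size counted in tokens, not sentences (assumption)
batch_type: tokens
# as many tokens per GPU step as fit in GPU RAM
batch_size: 8192
# gradient accumulation: true batch size = 8192 x 12 = 98304 tokens
accum_count: [12]
```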
Adjust the bucket size to your CPU RAM. Most of the time a bucket between 200K and 500K examples will be suitable. The higher your bucket size, the less padding you will have, since examples are sorted within this bucket and batches are yielded from it.
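Assuming the corresponding config key is `bucket_size` (the post only names the concept, not the key), this would be set as:

```yaml
# Pool of examples sorted together before batching.
# Larger bucket -> less padding per batch, but more CPU RAM used.
bucket_size: 262144   # ~260K examples, within the 200K-500K range above
```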
https://forum.opennmt.net/t/opennmt-py-v3-0-is-out/5077