The vanilla transformer uses sinusoidal positional encoding (position_encoding=true). We recommend using "maximum relative positions" encoding instead (max_relative_positions=20, position_encoding=false), which adds only a small overhead.
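As a sketch, the recommendation above would look like this in an OpenNMT-py training YAML config (key names taken from this post; whether they belong at the top level of your config depends on your setup):

```yaml
# Disable sinusoidal positional encoding...
position_encoding: false
# ...and use relative position encoding instead (small overhead)
max_relative_positions: 20
```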
We kept "fusedadam" (old legacy code), which provides the best speed (compared to PyTorch AMP Adam fp16 and Apex level O1/O2). We also tested the new Adam(fused=true) released with PyTorch 1.13, but it is way slower.
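A minimal config fragment for this choice, assuming the standard OpenNMT-py `optim` key and fp16 training via `model_dtype` (the fp16 key is an assumption, not stated in this post):

```yaml
# Legacy fused Adam: fastest option in the benchmarks above
optim: fusedadam
# assumption: fp16 training, as in the AMP/Apex comparisons mentioned
model_dtype: fp16
```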
Always use the highest batch size possible (within your GPU RAM capacity) and set the accumulation count according to the "true batch size" you want. For instance, if your GPU can accept 8192 tokens and you use accum_count=12, you will have a true batch size of 8192 × 12 = 98304 tokens.
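The example above, sketched as a config fragment (assuming token-based batching via `batch_type`, which this post does not state explicitly but the token counts imply):

```yaml
# batch size counted in tokens, not sentences (assumption)
batch_type: tokens
# as many tokens per GPU step as fit in GPU RAM
batch_size: 8192
# gradient accumulation: true batch size = 8192 x 12 = 98304 tokens
accum_count: [12]
```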
Adjust the bucket size to your CPU RAM. Most of the time a bucket between 200K and 500K examples will be suitable. The higher your bucket size, the less padding you will have, since examples are sorted within this bucket and batches are yielded from it.
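Assuming the corresponding config key is `bucket_size` (the post only names the concept, not the key), this would be set as:

```yaml
# Pool of examples sorted together before batching.
# Larger bucket -> less padding per batch, but more CPU RAM used.
bucket_size: 262144   # ~260K examples, within the 200K-500K range above
```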
https://forum.opennmt.net/t/opennmt-py-v3-0-is-out/5077