
Marian v1.10.0

@emjotde released this 06 Feb 23:37

[1.10.0] - 2021-02-06

Added

  • Added the intgemm8(ssse3|avx|avx512)? and intgemm16(sse2|avx|avx512)? types to marian-conv, which use the intgemm backend. The types intgemm8 and intgemm16 are hardware-agnostic; the others are hardware-specific.
  • The shortlist size is now always a multiple of eight.
  • Added an architecture-agnostic binary format for intgemm 8-bit and 16-bit integer models.
  • Add --train-embedder-rank for fine-tuning any encoder(-decoder) model for multi-lingual similarity via a softmax-margin loss
  • Add --logical-epoch, which redefines the displayed epoch counter as a multiple of n data epochs, updates, or labels. A second argument defines the width of the fractional part.
  • Add --metrics chrf for computing ChrF according to https://www.aclweb.org/anthology/W15-3049/ and the SacreBLEU reference implementation
  • Add the --after option, which is meant to replace --after-batches and --after-epochs and can take label-based criteria
  • Add --transformer-postprocess-top option to enable correctly normalized prenorm behavior
  • Add --task transformer-base-prenorm and --task transformer-big-prenorm
  • Turing and Ampere GPU optimisation support, if the CUDA version supports it.
  • Printing word-level scores in marian-scorer
  • Optimize LayerNormalization on CPU by 6x through vectorization (ffast-math) and by fixing a performance regression introduced with strides in 77a420
  • Decoding multi-source models in marian-server with --tsv
  • GitHub workflows on Ubuntu, Windows, and MacOS
  • LSH indexing to replace the shortlist
  • ONNX support for transformer models
  • Add topk operator like PyTorch's topk
  • On CPU, use cblas_sgemm_batch instead of a loop over cblas_sgemm as the batched_gemm implementation
  • Support for relative paths in shortlist and sqlite options
  • Training and scoring from STDIN
  • Support for reading TSV input from STDIN and other sources during training
    and translation with the options --tsv and --tsv-fields n.
  • Internal optional parameter in n-best list generation that skips empty hypotheses.
  • Quantized training (fixed-point or log-based quantization) with the --quantize-bits N option
  • Support for using Apple Accelerate as the BLAS library
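
For the shortlist entry, rounding a size up to the next multiple of eight is a one-line bit operation. The sketch below assumes only the size is padded; the helper name roundUpToMultipleOf8 is hypothetical and not part of Marian, and the release note does not say why eight was chosen (alignment for SIMD or integer GEMM kernels is a plausible but unconfirmed reason).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a shortlist size up to the next multiple of 8.
// Adding 7 and clearing the low three bits works because 8 is a power of two.
std::size_t roundUpToMultipleOf8(std::size_t n) {
  return (n + 7) & ~static_cast<std::size_t>(7);
}
```

With this helper, a shortlist of 1001 entries would be padded to 1008, while a shortlist of 1000 entries would stay unchanged.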
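
The --metrics chrf entry refers to the character n-gram F-score from the cited paper. The sketch below follows the paper's formulation at sentence level: spaces are removed, character n-gram precision and recall are averaged over orders 1 to 6, and the two are combined with beta = 2. It is not Marian's code; the names are hypothetical, it works on bytes rather than Unicode characters, it skips orders that are longer than one of the strings, and it returns a score in [0, 1]. SacreBLEU reports the score on a 0 to 100 scale, sums statistics over the whole corpus, and may treat edge cases differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Count character n-grams of order n (byte-based, so not Unicode-aware).
static std::map<std::string, int> charNgrams(const std::string& s, std::size_t n) {
  std::map<std::string, int> counts;
  for (std::size_t i = 0; i + n <= s.size(); ++i) ++counts[s.substr(i, n)];
  return counts;
}

// Sentence-level chrF sketch: remove spaces, average character n-gram
// precision and recall over the orders 1..maxOrder that occur in both
// strings, then combine them into an F-beta score in [0, 1].
double chrF(std::string hyp, std::string ref, std::size_t maxOrder = 6, double beta = 2.0) {
  auto isSpace = [](char c) { return c == ' '; };
  hyp.erase(std::remove_if(hyp.begin(), hyp.end(), isSpace), hyp.end());
  ref.erase(std::remove_if(ref.begin(), ref.end(), isSpace), ref.end());

  double precSum = 0.0, recSum = 0.0;
  std::size_t effectiveOrders = 0;
  for (std::size_t n = 1; n <= maxOrder; ++n) {
    const auto hypGrams = charNgrams(hyp, n);
    const auto refGrams = charNgrams(ref, n);
    int hypTotal = 0, refTotal = 0, matches = 0;
    for (const auto& kv : hypGrams) {
      hypTotal += kv.second;
      auto it = refGrams.find(kv.first);
      if (it != refGrams.end()) matches += std::min(kv.second, it->second);  // clipped match count
    }
    for (const auto& kv : refGrams) refTotal += kv.second;
    if (hypTotal == 0 || refTotal == 0) continue;  // order too long for one side
    precSum += static_cast<double>(matches) / hypTotal;
    recSum += static_cast<double>(matches) / refTotal;
    ++effectiveOrders;
  }
  if (effectiveOrders == 0) return 0.0;
  const double p = precSum / effectiveOrders;
  const double r = recSum / effectiveOrders;
  const double b2 = beta * beta;  // beta = 2 weights recall more than precision
  const double denom = b2 * p + r;
  return denom > 0.0 ? (1.0 + b2) * p * r / denom : 0.0;
}
```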
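
The topk operator is described as working like PyTorch's topk. As a rough illustration of that behavior on a flat vector (Marian's operator works on tensors, and its exact interface is not shown here), the hypothetical function below returns the k largest values and their indices in descending order, which matches torch.topk with its default largest=True and sorted=True settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Return the k largest values of x together with their indices, largest first.
// Ties are not ordered stably, because std::partial_sort is not stable.
std::pair<std::vector<float>, std::vector<std::size_t>>
topk(const std::vector<float>& x, std::size_t k) {
  k = std::min(k, x.size());
  std::vector<std::size_t> idx(x.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  // Only the first k positions need to end up sorted, which is cheaper than a full sort.
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](std::size_t a, std::size_t b) { return x[a] > x[b]; });
  idx.resize(k);
  std::vector<float> vals;
  vals.reserve(k);
  for (std::size_t i : idx) vals.push_back(x[i]);
  return {vals, idx};
}
```

Using std::partial_sort keeps the cost at roughly O(n log k) instead of sorting the whole input, which matters when k is small relative to the vocabulary size.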
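
The quantized-training entry mentions fixed-point quantization controlled by --quantize-bits N. The sketch below shows a generic symmetric fixed-point "quantize, then dequantize" step of the kind commonly used to simulate N-bit weights during training. It is an assumption for illustration only: Marian's actual scaling scheme, and its log-based variant, are not described in this note, and the name fakeQuantize is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generic symmetric fixed-point quantization to `bits` bits (not Marian's
// implementation): snap each weight to one of 2^(bits-1)-1 evenly spaced
// levels on each side of zero, scaled by the largest magnitude, and return
// the dequantized values so training can continue in floating point.
std::vector<float> fakeQuantize(const std::vector<float>& w, int bits) {
  assert(bits >= 2 && bits <= 16);
  float maxAbs = 0.0f;
  for (float v : w) maxAbs = std::max(maxAbs, std::fabs(v));
  if (maxAbs == 0.0f) return w;  // all zeros: nothing to scale
  const float levels = static_cast<float>((1 << (bits - 1)) - 1);  // 127 for 8 bits
  const float step = maxAbs / levels;
  std::vector<float> out;
  out.reserve(w.size());
  for (float v : w) {
    float q = std::clamp(std::round(v / step), -levels, levels);  // integer level
    out.push_back(q * step);                                       // back to float
  }
  return out;
}
```

The rounding error per weight is at most half a step, so fewer bits means coarser levels and a larger error.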

Fixed

  • The segfault of spm_train when compiled with -DUSE_STATIC_LIBS=ON seems to have gone away with the update to a newer SentencePiece version.
  • Fix bug causing certain reductions into scalars to be 0 on the GPU backend. Removed unnecessary warp shuffle instructions.
  • Do not apply dropout in embedding layers during inference with --dropout-src/--dropout-trg
  • Print "server is listening on port" message after it is accepting connections
  • Fix compilation without BLAS installed
  • Providing a single value to vector-like options using the equals sign, e.g. --models=model.npz
  • Fix quiet-translation in marian-server
  • CMake-based compilation on Windows
  • Fix minor issues with compilation on MacOS
  • Fix warnings in Windows MSVC builds using CMake
  • Fix building server with Boost 1.72
  • Make mini-batch scaling depend on mini-batch-words and not on mini-batch-words-ref
  • In concatenation, make sure that we do not multiply 0 by NaN (which results in NaN)
  • Change Approx.epsilon(0.01) to Approx.margin(0.001) in unit tests. The tolerance is now
    absolute rather than relative; we had incorrectly assumed that epsilon is an absolute tolerance.
  • Fixed bug in finding .git/logs/HEAD when Marian is a submodule in another project.
  • Properly record cmake variables in the cmake build directory instead of the source tree.
  • Added a default of "none" for the shuffle option in BatchGenerator, so that it works in executables where shuffle is not an option.
  • Added a few missing header files in shortlist.h and beam_search.h.
  • Improved handling of SIGTERM received during training. By default, SIGTERM triggers 'save (now) and exit'. Prior to this fix, batch pre-fetching did not check for this signal, potentially delaying exit considerably; it now does. The default save-and-exit behaviour can also be disabled on the command line with --sigterm exit-immediately.
  • Fix runtime failures of FASTOPT on 32-bit builds (wasm happens to be 32-bit) caused by hashing with an inconsistent mix of uint64_t and size_t.
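
The concatenation fix rests on an IEEE 754 rule: 0 * NaN is NaN, not 0, so zeroing values by multiplying with a 0/1 mask lets NaNs leak through. The two hypothetical scalar helpers below only illustrate that pitfall and the usual remedy of selecting instead of multiplying; the note does not show how Marian's concatenation code was actually changed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Masking by multiplication: meant to zero `value` when mask == 0,
// but IEEE 754 defines 0 * NaN as NaN, so a NaN survives the mask.
float maskByMultiply(float value, float mask) { return value * mask; }

// Masking by selection: when mask == 0 the (possibly NaN) value is never used.
float maskBySelect(float value, float mask) { return mask != 0.0f ? value : 0.0f; }
```

For 0/1 masks both helpers agree on finite inputs; they differ only when the masked-out value is NaN or infinite. Note that this behavior can change when compiling with -ffast-math, which lets the compiler assume NaNs do not occur.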
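
The Approx entry hinges on the difference between relative and absolute tolerance. Judging by the method names, Approx here appears to be Catch2's Approx class, whose epsilon scales with the magnitude of the compared value while margin is a fixed bound. The two hypothetical helpers below reproduce that distinction in simplified form; they ignore Catch2's scale parameter and are not Catch2's code.

```cpp
#include <cassert>
#include <cmath>

// Absolute tolerance, as expressed by Approx(...).margin(m) in Catch2.
bool withinAbsolute(double actual, double expected, double margin) {
  return std::fabs(actual - expected) <= margin;
}

// Relative tolerance, roughly what Approx(...).epsilon(e) expresses in Catch2:
// the allowed error grows with the magnitude of the expected value.
bool withinRelative(double actual, double expected, double epsilon) {
  return std::fabs(actual - expected) <= epsilon * std::fabs(expected);
}
```

A relative epsilon of 0.01 accepts 1005 for an expected 1000 but rejects any nonzero result for an expected 0, which is why an absolute margin better matches tests that compare small values.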
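
The SIGTERM fix is about a background loop noticing the signal promptly. A common pattern, sketched below under the assumption that a flag-based approach is used (this is not Marian's code, and all names are hypothetical), is a handler that only sets a std::sig_atomic_t flag, plus a pre-fetching loop that checks the flag before each batch. Marian itself exposes the choice of behaviour through --sigterm exit-immediately, which this sketch does not model.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>

// Set asynchronously by the handler; volatile std::sig_atomic_t is the type
// the C++ standard guarantees can be written safely from a signal handler.
static volatile std::sig_atomic_t g_sigtermReceived = 0;

extern "C" void onSigterm(int) { g_sigtermReceived = 1; }

void installSigtermHandler() { std::signal(SIGTERM, onSigterm); }

// Hypothetical pre-fetching loop: it checks the flag before preparing each
// batch, so after SIGTERM it stops at the next batch boundary instead of
// filling the whole queue first. Returns how many batches were prepared.
std::size_t prefetchBatches(std::size_t wanted) {
  std::size_t prepared = 0;
  while (prepared < wanted && !g_sigtermReceived) {
    ++prepared;  // stand-in for reading and batching training data
  }
  return prepared;
}
```

The training loop would then check the same flag to decide when to save and exit.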

Changed

  • Remove --clip-gemm, which is obsolete and was never used anyway
  • Removed the --optimize switch; the compute type is now determined from the binary model.
  • Updated SentencePiece repository to version 8336bbd0c1cfba02a879afe625bf1ddaf7cd93c5 from https://github.com/google/sentencepiece.
  • Enabled compilation of SentencePiece by default, since it no longer depends on protobuf.
  • Changed the default value of --sentencepiece-max-lines from 10000000 to 2000000, since the new version apparently no longer samples automatically (it is not yet clear how this affects vocabulary quality).
  • Change the mini-batch-fit search stopping criterion to stop at the ideal binary search threshold.
  • --metric bleu now always detokenizes SacreBLEU-style if the vocabulary supports it; use bleu-segmented to compute BLEU on word ids. bleu-detok is now a synonym for bleu.
  • Move label-smoothing computation into Cross-entropy node
  • Move Simple-WebSocket-Server to submodule
  • Python scripts start with #!/usr/bin/env python3 instead of python
  • Changed compile flag -Ofast to -O3 and removed --ffinite-math
  • Moved old graph groups to the deprecated folder
  • Make cuBLAS and cuSPARSE handle initialization lazy to save memory when they are unused
  • Replaced exception-based implementation for type determination in FastOpt::makeScalar
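
The label-smoothing entry moves that computation into the cross-entropy node. For reference, a standard formulation of label-smoothed cross-entropy for one target position, with smoothing factor eps spread uniformly over a vocabulary of size V and log-probabilities log p_v, is (1 - eps) * (-log p_gold) + eps * (1/V) * sum_v (-log p_v). The sketch below computes exactly that; whether Marian's node matches it term for term is an assumption, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Label-smoothed cross-entropy for one position, from log-probabilities
// (e.g. a log-softmax output): (1 - eps) times the NLL of the gold label,
// plus eps times the average NLL over the whole vocabulary.
double labelSmoothedCE(const std::vector<double>& logProbs, std::size_t gold, double eps) {
  assert(!logProbs.empty() && gold < logProbs.size());
  double meanNll = 0.0;
  for (double lp : logProbs) meanNll -= lp;  // accumulate -log p_v
  meanNll /= static_cast<double>(logProbs.size());
  return (1.0 - eps) * (-logProbs[gold]) + eps * meanNll;
}
```

With eps = 0 this reduces to ordinary cross-entropy; for a confident correct prediction, a positive eps raises the loss, which discourages over-confident distributions.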