CHANGELOG.md

Bug fixes

  • Fix missing support for sequence/parser analyzers in the classify tools

New features

  • Support building with biicode
  • Add Vagrantfile for virtual machine configuration
  • Add Dockerfile for Docker support

Enhancements

  • Improve ir_eval unit tests

Bug fixes

  • Fix ir_eval::ndcg, which used an incorrect log base and added where it should have subtracted in the IDCG calculation
  • Fix incorrect early termination in ir_eval::avg_p
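
For readers checking their own numbers against the nDCG fix above, the following is a minimal sketch of one common nDCG formulation, with gain `2^rel - 1` and a `log2(rank + 1)` discount. It is an assumption that this matches what `ir_eval::ndcg` computes. The names `dcg_at` and `ndcg_at` are hypothetical, and the ideal ranking is derived only from the grades passed in, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of MeTA): DCG over the first k grades.
// Gain is 2^rel - 1 (note the subtraction); discount is log2(rank + 1).
double dcg_at(const std::vector<int>& rels, std::size_t k)
{
    double sum = 0.0;
    const std::size_t n = std::min(k, rels.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        // i is zero-based, so the one-based rank is i + 1 and the
        // discount argument is rank + 1 = i + 2.
        sum += (std::pow(2.0, rels[i]) - 1.0)
               / std::log2(static_cast<double>(i) + 2.0);
    }
    return sum;
}

// Hypothetical helper: nDCG = DCG of the given order / DCG of the ideal order.
double ndcg_at(std::vector<int> rels, std::size_t k)
{
    assert(k > 0 && "cutoff must be positive");
    const double actual = dcg_at(rels, k);
    // Ideal ordering: grades sorted from most to least relevant.
    std::sort(rels.begin(), rels.end(), std::greater<int>());
    const double ideal = dcg_at(rels, k);
    return ideal > 0.0 ? actual / ideal : 0.0;
}
```

Under these assumptions, a ranking that is already sorted by grade scores exactly 1, and a ranking with no relevant documents scores 0.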

Bug fixes

  • Fix issues with system-defined integer widths in binary model files (mainly impacted the greedy tagger and parser); please re-download any parser model files you may have had before
  • Fix bug where parser model directory is not created if a non-standard prefix is used (anything other than "parser")

Enhancements

  • Silence the inconsistent-missing-override warning on clang >= 3.6

Bug fixes

  • fix potentially incorrect generation of vocabulary map files on 32-bit systems (this appears to have only impacted non-default block sizes)

Bug fixes

  • fix calculation of average precision in ir_eval (the denominator was incorrect)
  • specify that labels are required for the file_corpus document list; this allows spaces in the path to each document
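
To illustrate why the denominator matters in the average precision fix above, here is a minimal sketch of the textbook definition, in which the sum of precision values at each relevant hit is divided by the total number of relevant documents for the query. This is an assumption about the intended definition, not a copy of the `ir_eval` code, and `avg_precision` with its parameters is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MeTA).
// is_relevant[i] says whether the result at rank i + 1 is relevant;
// total_relevant is the number of relevant documents for the query,
// including any that were never retrieved.
double avg_precision(const std::vector<bool>& is_relevant,
                     std::size_t total_relevant)
{
    if (total_relevant == 0)
        return 0.0;

    double sum = 0.0;
    std::size_t hits = 0;
    for (std::size_t i = 0; i < is_relevant.size(); ++i)
    {
        if (is_relevant[i])
        {
            ++hits;
            // Precision at this rank: relevant results so far / rank.
            sum += static_cast<double>(hits) / static_cast<double>(i + 1);
        }
    }
    // A result list cannot contain more relevant documents than exist.
    assert(hits <= total_relevant);
    return sum / static_cast<double>(total_relevant);
}
```

Dividing by the number of hits actually retrieved instead of `total_relevant` would overstate the score whenever relevant documents are missing from the result list.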

New features

  • additions to the graph library:
    • myopic search
    • BFS
    • preferential attachment graph generation model (supports node attractiveness from different distributions)
    • betweenness centrality
    • eigenvector centrality
  • added a new natural language parsing library:
    • parse tree library (visitor-based)
    • shift-reduce constituency parser for generating phrase structure trees
    • reimplementation of evalb metrics for evaluating parsers
    • new filter for Penn Treebank-style normalization
  • added a greedy averaged Perceptron-based tagger
  • demo application for various basic text processing (profile)
  • basic iostreams that support gzip compression (if compiled with ZLib support)
  • added iteration method for stats::multinomial seen events
  • added expected value and entropy functions to stats namespace
  • added linear_model: a generic multiclass classifier storage class
  • added gz_corpus: a compressed version of line_corpus
  • added macros for generating type safe identifiers with user defined literal suffixes
  • added a persistent stack data structure to meta::util

Enhancements

  • added operator== for util::optional<T>
  • better CMake support for building the libsvm modules
  • better CMake support for downloading unit-test data
  • improved setup guide in README (for OS X, Ubuntu, Arch, and EWS/ENGRIT)
  • tree analyzers refactored to use the new parser library (removes dependency on outside toolkits for generating tree files)
  • analyzers that are not part of the "core" have been moved into their respective folders (so ngram_pos_analyzer is in src/sequence, tree_analyzer is in src/parser)
  • make_index now checks if the files exist before loading an index, and if they are missing creates a new one (as opposed to just throwing an exception on a nonexistent file)
  • cpptoml upgraded to support TOML v0.4.0
  • enable extra warnings (-Wextra) for clang++ and g++

Bug fixes

  • fix sequence_analyzer::analyze() const when applied to untagged sequences (was throwing when it shouldn't)
  • ensure that the inverted index object is destroyed before uninverting occurs in the creation of a forward_index
  • fix bug where icu_tokenizer would output spaces as tokens
  • fix bugs where index objects were not destroyed before trying to delete their files in the unit tests
  • fix bug in sparse_vector::find() where it would return a non-end iterator when asked to find an element that does not exist
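
As an illustration of the class of bug fixed in `sparse_vector::find()`, the sketch below shows a lookup over sorted key/value pairs. It assumes a binary-search-based lookup, which may not be how `sparse_vector` is implemented, and the names `entry`, `storage`, and `find_key` are hypothetical. The point is that `std::lower_bound` returns the first element that is not less than the key, so the result has to be checked for equality before it is returned.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sparse vector: pairs sorted by key.
using entry = std::pair<int, double>;
using storage = std::vector<entry>;

storage::const_iterator find_key(const storage& sorted, int key)
{
    // Precondition: entries are ordered by key.
    assert(std::is_sorted(sorted.begin(), sorted.end(),
                          [](const entry& a, const entry& b) {
                              return a.first < b.first;
                          }));

    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const entry& e, int k) { return e.first < k; });

    // lower_bound may point at a larger key; only an exact match counts.
    if (it != sorted.end() && it->first == key)
        return it;
    return sorted.end();
}
```

Without the equality check, a search for a missing key that falls between two stored keys would return an iterator to the next larger key rather than `end()`.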

New features

  • demo application for CRF-based POS tagging
  • nearest_centroid classifier
  • basic statistics library for representing relevant probability distributions
  • sparse_vector utility class

Enhancements

  • ngram_pos_analyzer now uses the CRF internally (see issue #46)
  • knn classifier now supports weighted knn
  • filesystem::copy_file() now reports progress on large files instead of appearing to hang
  • CMake build system now includes INTERFACE targets (better inclusion as a subproject in external projects)
  • MeTA can now (optionally) be built with C++14 support

Bug fixes

  • language_model_ranker scoring function corrected (see issue #50)
  • naive_bayes classifier scoring corrected
  • several incorrect instances of numeric_limits<double>::min() replaced with the intended numeric_limits<double>::lowest()
  • fix compilation with versions of ICU < 4.4
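
The `numeric_limits<double>::min()` fix above reflects a well-known C++ pitfall: for floating-point types, `min()` is the smallest positive normalized value, while `lowest()` is the most negative finite value. The sketch below shows the difference when seeding a running maximum. The function names are hypothetical and the code is not taken from MeTA.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Buggy seed: min() is a tiny positive number (about 2.2e-308), so any
// input consisting only of negative scores never replaces it.
double max_score_buggy(const std::vector<double>& scores)
{
    assert(!scores.empty());
    double best = std::numeric_limits<double>::min();
    for (double s : scores)
        if (s > best)
            best = s;
    return best;
}

// Intended seed: lowest() is the most negative finite double, so every
// finite score compares greater than or equal to it.
double max_score(const std::vector<double>& scores)
{
    assert(!scores.empty());
    double best = std::numeric_limits<double>::lowest();
    for (double s : scores)
        if (s > best)
            best = s;
    return best;
}
```

This matters for log-probability scores in particular, since those are typically all negative.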

Changes

  • sequence analyzer and CRF implementation
  • basic language model
  • basic directed and undirected graphs
  • restructure CMakeLists
  • Initial release.