CHANGELOG

3.0.0
-----
* Update FARO to support plug-ins
* Decouple FARO 2.0.0 functionality so that it runs as separate plug-ins
* Add plug-in template to use as a guide for new plug-in integration
* Add plug-in example (address_bitcoin plus tests) based on plug-in template
* Add option to run all plug-ins in a configurable path
* Move tests to separate package
* Simplify configuration
* Add support for logging configuration
* Update to tika 1.24


2.0.0
-----
* Detect password-protected/encrypted files and score them as high sensitivity
* Remove gensim dependency
* Remove pandas dependency
* Remove luhn dependency
* Remove murmurhash dependency
* Remove custom regex library dependency and use standard package
* Clean obsolete or transitive dependencies from requirements
* Fix the spider's handling of relative paths with deep ancestors. Switch output to absolute paths, since they give more context
* Allow non-ASCII characters in the detailed entities output file
* Include new contributors
* Simplify configuration
* Add testing and coverage metrics
* Replace custom ML models with standard ones (spaCy). The cost-benefit ratio signals that this is a better approach
* Remove scikit-learn and sklearn-crfsuite
* Update spaCy to the most recent version
* Decouple tika
* Add Docker Compose to set up development and production environments


1.1.2
-----
* Fix issue with logging while forcing OCR on PDF documents

1.1.1
-----

* Update to tika 1.23
* Add Docker Hub image and update documentation on its use: https://hub.docker.com/r/gradiant/faro
* Fix #32: logging duplicates
* Fix #37: metadata handling when a list is extracted for some fields (dates and pages)

1.1.0
-----

* Add OCR capabilities
* Add option to disable OCR for performance reasons
* Let tika handle the supported file formats
* Allow basic document classification by adding metadata to the output: document type, author, creation date, file size, etc.
* Rewrite metadata handling
* Move log and OCR configuration to environment variables to integrate better with Docker

1.0.1
-----

* Add Docker support
* Fix issue with paths containing spaces
* Fix sensitive information patterns and redesign the two-phase approach
* Add more contextual validations

1.0.0
-----

* Initial release