
Commit

Proofreading
Gallaecio committed Oct 30, 2024
1 parent 8e0a429 commit 3413d1b
Showing 2 changed files with 8 additions and 5 deletions.
4 changes: 2 additions & 2 deletions docs/contributing.rst
@@ -25,8 +25,8 @@ examples. Use "Add New Pages" and "Annotate" IPython notebooks for that.

If you want to improve Formasaurus ML models check :ref:`how-it-works` section.

-Generating built-in models
---------------------------
+Generating the built-in model
+-----------------------------

Every time we improve the training data, we should re-train the built-in model:

9 changes: 6 additions & 3 deletions formasaurus/classifiers.py
@@ -172,11 +172,14 @@ def trained_on(cls, data_folder):
     def save(self, filename):
         if self.form_classifier is None or self._field_model is None:
             raise ValueError("FormFieldExtractor is not trained")
+        # Using joblib here is fine because we have control over
+        # sklearn-cfrsuite, used for the field model.
+        joblib.dump(self._field_model, self._field_filename(filename), compress=3)
+        # For the form classifier we use a custom serialization implementation,
+        # as using joblib could lead to breakages when mixing different
+        # scikit-learn versions.
         with open(self._form_filename(filename), "w") as fp:
             json.dump(self.form_classifier.to_dict(), fp)
-        # Using joblib is here because we have control over sklearn-cfrsuite,
-        # used for the field model.
-        joblib.dump(self._field_model, self._field_filename(filename), compress=3)

     def train(self, annotations):
         """Train FormFieldExtractor on a list of FormAnnotation objects."""
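
Note: the comments in this hunk contrast joblib pickling with a custom `to_dict()` plus JSON dump for the form classifier. A minimal sketch of that general pattern follows. It is not the Formasaurus implementation: `TinyLinearModel`, `from_dict`, `save_model`, and `load_model` are hypothetical names invented for illustration, and only `to_dict()` and the `json.dump` call mirror what the diff shows.

```python
import json


class TinyLinearModel:
    """Hypothetical stand-in for a trained classifier (not Formasaurus code)."""

    def __init__(self, classes, weights, bias):
        self.classes = list(classes)
        self.weights = [list(row) for row in weights]
        self.bias = list(bias)

    def to_dict(self):
        # Only plain lists, floats and strings, so the result is JSON-safe
        # and does not depend on any library's pickled object layout.
        return {"classes": self.classes, "weights": self.weights, "bias": self.bias}

    @classmethod
    def from_dict(cls, data):
        # Rebuild the model from the plain data written by to_dict().
        return cls(data["classes"], data["weights"], data["bias"])

    def predict(self, features):
        # One linear score per class; return the class with the highest score.
        scores = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return self.classes[scores.index(max(scores))]


def save_model(model, path):
    # Same shape as the save() call in the diff: dump the dict as JSON text.
    with open(path, "w") as fp:
        json.dump(model.to_dict(), fp)


def load_model(path):
    with open(path) as fp:
        return TinyLinearModel.from_dict(json.load(fp))
```

Because the file holds only learned parameters as plain data, loading it should not depend on the pickled layout of any particular library version, which is the concern the new comment raises about joblib.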