Proofreading

scrapinghub · Oct 30, 2024 · 3413d1b
1 parent 8e0a429
commit 3413d1b

Showing 2 changed files with 8 additions and 5 deletions.
diff --git a/docs/contributing.rst b/docs/contributing.rst
@@ -25,8 +25,8 @@ examples. Use "Add New Pages" and "Annotate" IPython notebooks for that.
 
 If you want to improve Formasaurus ML models check :ref:`how-it-works` section.
 
-Generating built-in models
---------------------------
+Generating the built-in model
+-----------------------------
 
 Every time we improve the training data, we should re-train the built-in model:
 

diff --git a/formasaurus/classifiers.py b/formasaurus/classifiers.py
@@ -172,11 +172,14 @@ def trained_on(cls, data_folder):
     def save(self, filename):
         if self.form_classifier is None or self._field_model is None:
             raise ValueError("FormFieldExtractor is not trained")
+        # Using joblib here is fine because we have control over
+        # sklearn-crfsuite, used for the field model.
+        joblib.dump(self._field_model, self._field_filename(filename), compress=3)
+        # For the form classifier we use a custom serialization implementation,
+        # as using joblib could lead to breakages when mixing different
+        # scikit-learn versions.
         with open(self._form_filename(filename), "w") as fp:
             json.dump(self.form_classifier.to_dict(), fp)
-        # Using joblib is here because we have control over sklearn-crfsuite,
-        # used for the field model.
-        joblib.dump(self._field_model, self._field_filename(filename), compress=3)
 
     def train(self, annotations):
         """Train FormFieldExtractor on a list of FormAnnotation objects."""
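
Note on the serialization comments in `save`: the new comments say the form classifier is written through `to_dict()` and `json.dump` rather than joblib, so that the saved file does not depend on the scikit-learn version that produced it. The diff does not show what `to_dict()` contains, so the following is only a minimal sketch of the general pattern, using standard-library Python. `TinyLinearModel`, `save_model` and `load_model` are hypothetical names invented for illustration and are not part of Formasaurus.

```python
import json
import os
import tempfile


class TinyLinearModel:
    """Hypothetical stand-in for a trained classifier (not Formasaurus code)."""

    def __init__(self, classes, weights, bias):
        self.classes = list(classes)
        self.weights = [list(row) for row in weights]  # one weight row per class
        self.bias = list(bias)

    def to_dict(self):
        # Only plain lists, floats and strings are stored, so the file
        # does not embed any library-specific pickled objects.
        return {"classes": self.classes, "weights": self.weights, "bias": self.bias}

    @classmethod
    def from_dict(cls, data):
        return cls(data["classes"], data["weights"], data["bias"])

    def predict(self, features):
        # Score each class with a dot product plus bias, return the best class.
        scores = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return self.classes[scores.index(max(scores))]


def save_model(model, path):
    with open(path, "w") as fp:
        json.dump(model.to_dict(), fp)


def load_model(path):
    with open(path) as fp:
        return TinyLinearModel.from_dict(json.load(fp))


model = TinyLinearModel(["login", "search"], [[1.0, -0.5], [-1.0, 0.5]], [0.0, 0.1])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "form_model.json")
    save_model(model, path)
    restored = load_model(path)
```

The trade-off in this kind of design is that every learned parameter has to be listed explicitly in `to_dict()` and restored in `from_dict()`, in exchange for a file format that the project controls itself.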