I have tried many solutions but still get this error:
StopIteration Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/dedupe/api.py:259, in DedupeMatching.pairs(self, data)
257 self.fingerprinter.index_all(data)
--> 259 id_type = core.sqlite_id_type(data)
261 # Blocking and pair generation are typically the first memory
262 # bottlenecks, so we'll use sqlite3 to avoid doing them in memory
File ~/.local/lib/python3.10/site-packages/dedupe/core.py:335, in sqlite_id_type(data)
334 def sqlite_id_type(data: Data) -> Literal["text", "integer"]:
--> 335 example = next(iter(data.keys()))
336 python_type = type(example)
StopIteration:
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/dedupe/api.py:125, in IntegralMatching.score(self, pairs)
124 try:
--> 125 matches = core.scoreDuplicates(
126 pairs, self.data_model.distances, self.classifier, self.num_cores
127 )
128 except RuntimeError:
File ~/.local/lib/python3.10/site-packages/dedupe/core.py:124, in scoreDuplicates(record_pairs, featurizer, classifier, num_cores)
122 from .backport import Process, Queue # type: ignore
--> 124 first, record_pairs = peek(record_pairs)
125 if first is None:
File ~/.local/lib/python3.10/site-packages/dedupe/core.py:278, in peek(seq)
277 try:
--> 278 first = next(seq)
279 except TypeError as e:
RuntimeError: generator raised StopIteration
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
Cell In[14], line 117
114 deduper.write_settings(sf)
116 print('clustering...')
--> 117 clustered_dupes = deduper.partition(data_d, 0.7)
119 print('# duplicate sets', len(clustered_dupes))
121 cluster_membership = {}
File ~/.local/lib/python3.10/site-packages/dedupe/api.py:200, in DedupeMatching.partition(self, data, threshold)
162 """
163 Identifies records that all refer to the same entity, returns
164 tuples containing a sequence of record ids and corresponding
(...)
197 ]
198 """
199 pairs = self.pairs(data)
--> 200 pair_scores = self.score(pairs)
201 clusters = self.cluster(pair_scores, threshold)
202 clusters = self._add_singletons(data.keys(), clusters)
File ~/.local/lib/python3.10/site-packages/dedupe/api.py:129, in IntegralMatching.score(self, pairs)
125 matches = core.scoreDuplicates(
126 pairs, self.data_model.distances, self.classifier, self.num_cores
127 )
128 except RuntimeError:
--> 129 raise RuntimeError(
130 """
131 You need to either turn off multiprocessing or protect
132 the calls to the Dedupe methods with a
133 if __name__ == '__main__' in your main module, see
134 https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods"""
135 )
137 return matches
RuntimeError:
You need to either turn off multiprocessing or protect
the calls to the Dedupe methods with a
if __name__ == '__main__'
in your main module, see https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods
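
Reading the traceback from the top, the first exception is a `StopIteration` from `next(iter(data.keys()))`. That call only fails this way when `data` has no keys, so one possible cause is that `data_d` is empty when `partition` is called, and the multiprocessing message may be a side effect rather than the real problem. The sketch below is not dedupe's code: `first_key_type`, `pairs`, `run`, and `check_not_empty` are hypothetical stand-ins that reproduce the same chain of exceptions with plain Python and show a guard that could be placed before the `partition` call.

```python
def first_key_type(data):
    # Stand-in for the first-key lookup seen in the traceback.
    # next() raises StopIteration when the mapping has no keys.
    example = next(iter(data.keys()))
    return type(example)


def pairs(data):
    # Stand-in for a generator that inspects the data before yielding.
    id_type = first_key_type(data)
    for key in data:
        yield key, id_type


def run(data):
    # A StopIteration escaping a generator body is converted by Python
    # into RuntimeError("generator raised StopIteration") (PEP 479).
    try:
        return list(pairs(data))
    except RuntimeError as exc:
        return repr(exc)


def check_not_empty(data):
    # Hypothetical guard: fail early with a clear message.
    if not data:
        raise ValueError("no records loaded; check the input file and the reader")
    return data


empty_result = run({})
non_empty_result = run({1: {"name": "a"}})
print(empty_result)
print(non_empty_result)
```

If this is what is happening, printing `len(data_d)` just before `deduper.partition(data_d, 0.7)` would show `0`, and the fix would be in how the records are loaded rather than in the multiprocessing setup.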