Text from Create Table cannot be used for text mining #169

Open
wvdvegte opened this issue Jul 12, 2024 · 1 comment


Educational version

0.8.0

Orange version

3.37.0

Expected behavior

Text entered in Create Table should be usable for text mining: use Edit Domain on the table output to reinterpret the variable as text (rather than categorical), connect Corpus to Edit Domain, and select the text variable under "Used text features".

Actual behavior

Connecting Corpus to Edit Domain results in an error:

Error encountered in widget Corpus:

Traceback (most recent call last):
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/widgets/owcorpus.py", line 336, in update_feature_selection
    corpus = self.corpus.copy()
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 481, in copy
    c = super().copy()
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 1491, in copy
    t = self.__class__(self)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 71, in __new__
    return super().__new__(cls, *args, **kwargs)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/table.py", line 718, in __new__
    return cls.from_table(args[0].domain, args[0], **kwargs)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 558, in from_table
    Corpus.retain_preprocessing(source, c, row_indices)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 649, in retain_preprocessing
    new.text_features = list(filter(None, [
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 650, in <listcomp>
    new._find_identical_feature(tf)
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/orangecontrib/text/corpus.py", line 129, in _find_identical_feature
    var == feature
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/data/variable.py", line 418, in __eq__
    and var1._compute_value == var2._compute_value
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/Orange/preprocess/transformation.py", line 240, in __eq__
    and np.allclose(self.lookup_table, other.lookup_table,
  File "<__array_function__ internals>", line 180, in allclose
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/numeric.py", line 2265, in allclose
    res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
  File "<__array_function__ internals>", line 180, in isclose
  File "/Applications/Orange.app/Contents/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/numpy/core/numeric.py", line 2372, in isclose
    xfin = isfinite(x)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
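
The last frames of the traceback can be reproduced outside Orange. The sketch below assumes that reinterpreting a categorical variable as text gives it a lookup table of strings stored in an object array; that is inferred from the traceback, not checked against the Edit Domain source. `lookup_table`, `tables_equal_unsafe`, and `tables_equal_safe` are hypothetical names used only for illustration. It shows that `np.allclose` raises `TypeError` on such an array, and one possible guard that falls back to exact comparison for non-numeric tables.

```python
import numpy as np

# Assumed stand-in for the lookup table of a categorical variable that was
# reinterpreted as text: category index -> string, stored as dtype=object.
lookup_table = np.array(["first text", "second text"], dtype=object)


def tables_equal_unsafe(a, b):
    # Same kind of comparison as in the traceback: np.allclose is only
    # defined for numeric data and raises TypeError on arrays of strings.
    return np.allclose(a, b, equal_nan=True)


def tables_equal_safe(a, b):
    # Illustrative guard: use np.allclose for numeric tables only and
    # fall back to exact element-wise equality otherwise.
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return False
    if np.issubdtype(a.dtype, np.number) and np.issubdtype(b.dtype, np.number):
        return bool(np.allclose(a, b, equal_nan=True))
    return bool(np.array_equal(a, b))


try:
    tables_equal_unsafe(lookup_table, lookup_table.copy())
    raised = False
except TypeError:
    raised = True

print("allclose raised TypeError:", raised)
print("safe comparison:", tables_equal_safe(lookup_table, lookup_table.copy()))
```

Whether a guard of this kind belongs in the transformation's `__eq__` or elsewhere is for the maintainers to decide; the sketch only isolates the failing comparison.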

If the error is dismissed and the text variable is then selected under "Used text features", Corpus ignores the selection and outputs the default corpus, book-excerpts.tab, instead.

Steps to reproduce the behavior

Open Create table with text.ows.zip and connect Corpus to Edit Domain to reproduce the behavior described above.

janezd (Collaborator) commented Jul 12, 2024

@ajdapretnar, could you take a look?
