Topic Modeling Crash #820

Closed
layudhi opened this issue Apr 19, 2022 · 55 comments · Fixed by #885

Comments

@layudhi

layudhi commented Apr 19, 2022

What's wrong?

Topic Modeling Crash when used with Twitter Widget

How can we reproduce the problem?

Please see the screenshots. I cannot save the workflow with Topic Modelling connected, because the app freezes and then shows "Not Responding" after I connect the widget to Corpus.

What's your environment?

  • Operating system: Windows 10 64-bit
  • Orange version: 3.32
  • How you installed Orange: by clicking the standalone installation files.
  • Orange installation folder: C:\Program Files\Orange
  • Miniconda3 installation folder: C:\Users\ASUS\miniconda3

Screenshot 2022-04-19 165813
Screenshot 2022-04-19 170110

@ajdapretnar
Collaborator

First of all, don't use another Corpus between Preprocess Text and Topic Modelling because you will override all the preprocessing results.

Second, I believe this is the reason for the crash. You did the preprocessing, but overrode it, so now the default preprocessing is run, resulting in a large number of tokens which don't fit into your RAM. Decrease the number of tokens and try again.

janezd transferred this issue from biolab/orange3 on Apr 19, 2022

@ajdapretnar
Collaborator

I've collated all the reports of the same kind. Two other users say Topic Modelling crashes on Windows (it doesn't happen on Mac). One user doesn't even have a large corpus (174 documents and 909 tokens).

We need to research why this happens. @noahnovsak @PrimozGodec

@PrimozGodec
Collaborator

@djukicn has already identified the reason for the crashes. I think the crash described in this issue has the same source.

The problem is that corpus._ngrams_corpus stays defined when tokens are reset (by the Corpus widget). Something similar happens when bag-of-words features are subsampled: _ngrams_corpus is subsampled row-wise but stays the same column-wise, while the dictionary is different. In those cases we would also need to reset or subsample _ngrams_corpus column-wise, or remember the dictionary of columns.

Anyway, we came up with a different solution that can solve all of these problems and minimize the probability of errors: we would deprecate _ngrams_corpus and take bag-of-words counts from the Table if they exist (columns with the bow attribute flag). If they do not exist, Topic Modelling would compute bag-of-words features as it already does (when _ngrams_corpus is not defined).

I think this solution would minimize the chance that something does not work, and it would even give users the option to manipulate bag-of-words features before topic modelling.
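
To make the failure mode described above concrete, here is a minimal sketch in plain Python. TinyCorpus, _bow_cache and set_tokens are hypothetical names invented for illustration; this is not Orange's Corpus class or the actual fix. It only shows how a cached bag-of-words matrix can stop matching the dictionary when tokens change and the cache is not cleared.

```python
from collections import Counter


class TinyCorpus:
    """Hypothetical stand-in for a corpus; not Orange's actual Corpus class."""

    def __init__(self, documents):
        self.tokens = [doc.split() for doc in documents]
        self._bow_cache = None  # plays the role of a cached n-gram count matrix

    def dictionary(self):
        # Vocabulary derived from the *current* tokens.
        return sorted({tok for doc in self.tokens for tok in doc})

    def bow(self):
        # Compute counts once and cache them together with their vocabulary.
        if self._bow_cache is None:
            vocab = self.dictionary()
            rows = [[Counter(doc)[w] for w in vocab] for doc in self.tokens]
            self._bow_cache = (vocab, rows)
        return self._bow_cache

    def set_tokens(self, tokens, reset_cache=True):
        self.tokens = tokens
        if reset_cache:
            self._bow_cache = None  # clearing the cache keeps counts and dictionary in sync


corpus = TinyCorpus(["the cat sat", "the dog sat down"])
corpus.bow()  # cache built for the original five-word vocabulary

# Replace the tokens without clearing the cache: cached columns no longer match.
corpus.set_tokens([["cat"], ["dog", "down"]], reset_cache=False)
cached_vocab, cached_rows = corpus.bow()
stale = len(cached_rows[0]) != len(corpus.dictionary())

# Same change with the cache cleared: counts and dictionary agree again.
corpus.set_tokens([["cat"], ["dog", "down"]], reset_cache=True)
fresh_vocab, fresh_rows = corpus.bow()
consistent = fresh_vocab == corpus.dictionary()
```

In this toy version the mismatch is only a wrong column count; in a real pipeline a consumer that trusts the stale matrix could misbehave in less obvious ways.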

@nadiaelen

Hi, I am sorry, perhaps I am in the wrong place, but is there a solution for this? I can't recover my work, orange crashes after the latest update when running topic modeling. I use LDA...

@NAsic123

I have the same problem: Topic Modelling crashes when I try to run it.

@PrabodhaCha

I have the same issue on Windows.
It crashes even with 100 tweets.
image

@nadiaelen

Hi, any news on this? Thanks...

@ajdapretnar
Collaborator

We just released Orange3-Text v. 1.10.0. Please update the add-on and let us know if it works.

If not, we would appreciate it if you could provide a workflow, a data sample (if possible) and, if you installed Orange via the terminal, the pip freeze output.

@NAsic123

NAsic123 commented Jul 8, 2022

@ajdapretnar I updated the add-on and it still crashes. I am sending additional information below.

What's your environment/workflow?

  • processor Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz 1.50 GHz

  • Installed RAM: 8.00 GB (7.80 GB usable)

  • System type: 64-bit operating system, x64-based processor

  • Windows Windows 11 Home, version 21H2

  • Orange version: 3.32.0

How you installed Orange:

My data sample is here.
kurent_neprecisceno.xlsx

Pip freeze output:
When I run Topic Modelling it is like this for a few minutes:
image

And then it is like:
image

And then it says:
image

@ajdapretnar
Collaborator

@NAsic123 What happens if you select to wait?

@ajdapretnar
Collaborator

Also, I tried it on Mac with your data. I am assuming you are using the default preprocessing and LDA? It works normally for me.
@djukicn Any ideas? Could it be a Windows issue?

@NAsic123

NAsic123 commented Jul 8, 2022

@ajdapretnar thank you for your answer and help. I will run LDA and I will leave it running and see what will happen. Then I will report what happens. Maybe it needs extra time.

No, I am not using default preprocessing, I am using these preprocessors:
image

@ajdapretnar
Collaborator

Two comments, unrelated to the crashing widget. In Preprocess Text, you don't need Regexp, because the tokenization you've set already omits all punctuation. Also, your POS tag filter doesn't do anything: your data is not tagged, so filtering cannot work.

@NAsic123

NAsic123 commented Jul 8, 2022

@ajdapretnar thank you. I will correct it.

And I let LDA run for one hour; it was still at 0 % and then it crashed.

@djukicn
Collaborator

djukicn commented Jul 8, 2022

@NAsic123 Is your gensim version currently 4.2.0? If so, could you please install 4.1.2 and see whether the same problem occurs?

@NAsic123

NAsic123 commented Jul 8, 2022

@djukicn I am sorry for the very basic question, but where do I install gensim?
I am mostly an R user rather than a Python user, so I'm not that familiar with Python.

Do I have to type this in the Orange Command Prompt?
C:\Users\amit_>pip install gensim

@ajdapretnar
Collaborator

@NAsic123 Sorry, this was a bit technical. I would be wary of tampering with the set up environment.
However, there is a special program called Orange Command Prompt, which you can find from the start menu. Open it and first run pip freeze and post the output here. This will help us identify which version of the gensim library you currently have. Then we can proceed with carefully downgrading and then upgrading again.

@NAsic123

NAsic123 commented Jul 11, 2022

@ajdapretnar thank you for your instruction. I ran pip freeze and got this information below (I copied and saved it in .txt). I hope it is useful and thank you again.

pip freeze.txt

@ajdapretnar
Collaborator

Ok, it does indeed seem like you have gensim==4.2.0.
Now please try running pip install gensim==4.1.2.
Then open Orange and see if Text works. Please let me know.
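
For readers who would rather not scan a long pip freeze listing by eye, the lookup can be sketched in a few lines of Python. The function name find_pinned_version and the sample text are made up for illustration (the sample is not the reporter's actual output). The sketch only understands plain name==version lines, so entries that pip prints in another form are skipped.

```python
def find_pinned_version(freeze_text, package):
    """Return the version pinned for `package` in `pip freeze` output, or None."""
    wanted = package.lower().replace("_", "-")
    for line in freeze_text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep and name.lower().replace("_", "-") == wanted:
            return version
    return None


# Made-up excerpt for illustration only.
sample = "Orange3==3.32.0\ngensim==4.2.0\nnumpy==1.22.3\n"

version = find_pinned_version(sample, "gensim")
# 4.2.0 is the release reported as problematic in this thread.
needs_downgrade = version == "4.2.0"
```

To use it on real output, save the pip freeze text to a file, read it into a string and pass that string instead of sample.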

@NAsic123

Thank you, I installed it and now it works, Topic Modelling does not crash.
But I get this notification: __init__() got an unexpected keyword argument 'random_seed'. Could it be that I entered the wrong preprocessing settings?
image

My Preprocessing settings:
image

@ajdapretnar
Collaborator

@NAsic123 No, your preprocessing is fine. This one needs to be solved by the core team.

@djukicn It seems that gensim==4.1.2 works, but the LSI model has a new random_seed parameter that was added in version 4.2.0. Do you perhaps have an idea what causes gensim 4.2.0 to not work on Windows?
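
One generic way to tolerate a keyword argument that exists only in newer releases is to inspect the constructor before passing it. The sketch below illustrates that idea with two hypothetical stand-in classes (OldLsi and NewLsi); it is not gensim's code and not necessarily how the add-on resolved the issue. The helper name build_model is likewise an assumption.

```python
import inspect


def build_model(model_cls, seed=None, **kwargs):
    """Pass `random_seed` only if the constructor of `model_cls` accepts it."""
    params = inspect.signature(model_cls.__init__).parameters
    if seed is not None and "random_seed" in params:
        kwargs["random_seed"] = seed
    return model_cls(**kwargs)


# Hypothetical stand-ins for an older and a newer model class.
class OldLsi:
    def __init__(self, num_topics=10):
        self.num_topics = num_topics
        self.random_seed = None


class NewLsi:
    def __init__(self, num_topics=10, random_seed=None):
        self.num_topics = num_topics
        self.random_seed = random_seed


old = build_model(OldLsi, seed=0, num_topics=5)  # seed is dropped, no TypeError
new = build_model(NewLsi, seed=0, num_topics=5)  # seed is forwarded
```

The trade-off is that the seed is silently ignored on the older class, so results there are not reproducible. The check also does not detect constructors that accept the argument only through **kwargs.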

@NAsic123

@ajdapretnar thank you. I ran LDA and I get the results, so now it works. Thank you so much for your help.
But with LDA I also get the same random_seed message.
image

@djukicn
Collaborator

djukicn commented Jul 11, 2022

@ajdapretnar I was actually able to reproduce the error on Ubuntu (although for me the results were produced after clicking "Wait" a few times), so it's not just a Windows issue. Somewhere in the background gensim raises an exception. I'll look into it today and see what can be done.

@ajdapretnar
Collaborator

@djukicn Fantastic! Thanks!

@nadiaelen

It would be highly appreciated if you could also provide info on updating that gensim library. I understand that's where the issue might be, but I don't know how to update it. Thank you.

@PrimozGodec
Collaborator

We have reported the issue to Gensim (the library which computes topics) and hope they will look into it soon: piskvorky/gensim#3368

@ajdapretnar
Collaborator

@nadiaelen If you are on Windows, you could try opening Orange Command Prompt (a separate program available from Start menu). Then enter pip install gensim==4.1.2. Hopefully, this will solve the issue. It's the best I can give at the moment.

@Katzengurke

Hey, I tried a source installation with the merged files from #885 (newest biolab repository), but it didn't work. Orange still crashes. I also tried different versions of orange3 with older add-on versions, which didn't work either.
I'm using Windows 11. Any idea how to fix it?

@ajdapretnar
Collaborator

@Katzengurke When you say Orange crashes, do you mean the software or the topic modelling widget?
Could you perhaps open Orange Command Prompt, run python -m Orange.canvas and try the workflow that results in a crash? Then copy and paste the log here, please.
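
If the console window disappears before the log can be copied, one option is to write the output to a file instead. The sketch below is an assumption-laden helper, not an official Orange tool: run_and_log, ORANGE_COMMAND and the log file name are made-up names. The -X faulthandler option is a standard Python flag that asks the interpreter to dump a traceback on a fatal error, which may help when Windows only reports "Python stopped working".

```python
import subprocess
import sys


def run_and_log(args, log_path):
    """Run a command, write its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log:
        completed = subprocess.run(args, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


# Command that starts Orange with the fault handler enabled. It assumes this
# script is run with the same Python that Orange uses.
ORANGE_COMMAND = [sys.executable, "-X", "faulthandler", "-m", "Orange.canvas"]

# Example call, commented out so that loading this sketch does not start Orange:
# exit_code = run_and_log(ORANGE_COMMAND, "orange_crash.log")
```

After reproducing the crash, the contents of the log file can be pasted into the issue.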

@Katzengurke

Katzengurke commented Aug 18, 2022

The software crashes.
I tried running the command prompt, but I only get the message "Python stopped working".

Edit: Tried the same on a Windows 10 laptop where I didn't tamper with any files whatsoever, and it works there. Is it maybe Windows 11?

@ajdapretnar
Collaborator

Tried the same on a Windows 10 laptop where I didn't tamper with any files whatsoever, and it works there. Is it maybe Windows 11?

It might be. Does it happen even if you uninstall orange3-text add-on?

@ajdapretnar
Collaborator

Can you try running python in Orange Command Prompt and let me know the Python version it reports? I'll check if any known Python+Win11 bugs exist.

@Katzengurke

Tried the same on a Windows 10 laptop where I didn't tamper with any files whatsoever, and it works there. Is it maybe Windows 11?

It might be. Does it happen even if you uninstall orange3-text add-on?

Yes, sadly. I'll uninstall my Python, my Anaconda and Orange, then try again with a new installation, and let you know in a couple of minutes.

Can you try running python in Orange Command Prompt and let me know the Python version it reports? I'll check if any known Python+Win11 bugs exist.

Same with that.

@ajdapretnar
Collaborator

Ok, so it is a Python bug not an Orange bug.

Does running python --version work?

@Katzengurke

Katzengurke commented Aug 18, 2022

Alright, I got a bit farther, but the bug got stranger too.
I uninstalled everything like I said, and reinstalled Orange 3.32 and text from source. Afterwards I tried connecting a corpus widget to topic modelling, but Python stopped working again. I went for python -m Orange.canvas in order to get the log (which works now), but once I tried getting the log it stopped working again, and changed my resolution settings.

Python is 3.8.8

Edit: I actually got a log in the shell, although only for the topic modeler without the connection to the corpus I think

C:\Users\xxx\AppData\Local\Programs\Orange\lib\site-packages\xgboost\compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
C:\Users\xxx\AppData\Local\Programs\Orange\lib\site-packages\orangecanvas\scheme\link.py:74: RuntimeWarning: Failed to resolve name 'orangecontrib.network.Network' to a type: ModuleNotFoundError: No module named 'orangecontrib.network'

return tuple(filter(None, resolve_types(types)))

Edit 2: Yep, retried it a couple of times, no chance to get the log of the crash. As soon as I connect corpus to topic modeler, "Python stopped working"

@nadiaelen

So, the same thing remains: even with the new gensim library, it runs into the same problem, it just hangs forever, after all install, deinstall, etc...:(

@Katzengurke

Katzengurke commented Aug 22, 2022

So, the same thing remains: even with the new gensim library, it runs into the same problem, it just hangs forever, after all install, deinstall, etc...:(

The fix #885 definitely works under Windows 10; I was able to install it and work with it without crashes so far. You have to use the source installation with git and the Orange Command Prompt. It doesn't work on my Windows 11 laptop though.

Edit: Never mind, it works now. I have to open Orange via python -m Orange.canvas though. I also uninstalled a bunch of programs like node.js.

@JosieVor

Hey,

I am using orange data mining for my master thesis. However, when I try topic modelling, it crashes. I already read this thread but it was a bit too technical for me to understand since I really do not have any experience with Orange or Python. Can someone help me out?

My environment: Windows 11 Home 64Bit

@nadiaelen

I really love Orange and appreciate you, your work and everything, but, truly, when it comes to topic modelling, which is the hottest topic right now, one week it works, the next week it crashes and stays like that for a month...

@PrimozGodec
Collaborator

@JosieVor and @nadiaelen, sorry for the late response. I tried to reproduce the error on MacOS and Windows, and it works for me. Can you please give me more information so I can dig deeper into the problem?

  • @nadiaelen, do you also use Windows?
  • Can you prepare a workflow which crashes? What dataset do you use?
  • Which Topic modelling model causes the crash?
  • What is the version of the Text add-on that you use? You can see the version if you go under Options>Add-ons and see the currently installed version in the list.
  • What version of the Orange do you have? You can find the version on Windows under the Help>About menu.
  • What is the version of the Gensim library? You can get it if you open the Orange command prompt (click the Start button on Windows and then start typing the Orange command prompt). Type pip freeze in the command prompt and then press enter. Scroll through the list and find Gensim.

Thank you in advance.
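
As an alternative to reading the versions from several menus, they can be collected in one go with a short script. This is only a sketch: collect_versions is a made-up helper, the distribution names Orange3, Orange3-Text and gensim are assumed to match what is installed, and the script has to be run with the Python that Orange itself uses (for example from the Orange Command Prompt) to report the right environment.

```python
import platform
from importlib import metadata


def collect_versions(packages=("Orange3", "Orange3-Text", "gensim")):
    """Return OS and Python details plus the installed version of each package."""
    report = {
        "OS": f"{platform.system()} {platform.release()}",
        "Python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Print one "name: version" line per entry, ready to paste into a comment.
for key, value in collect_versions().items():
    print(f"{key}: {value}")
```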

PrimozGodec reopened this on Mar 24, 2023

@JosieVor

I am using a Twitter dataset, which I scraped directly in Orange. My dataset is quite big (about 30,000 tweets), but I have also tried topic modelling on smaller datasets (about 100 tweets), and it still did not work.

I am using LDA, which crashes every time. LSI sometimes works, but more often than not it does not work, either.

Text Add-On Version: 1.12.0

Orange Version: 3.34.0

Gensim Version: the pip freeze command does not work for me, but when I use pip list, it says my version is 4.1.2

@calliope212

Hi, I'm using Windows 10 and have the same issue. I already ran pip freeze in the Orange Command Prompt, but Orange still crashes. I also tried to reinstall, with the same result. Is there any other solution?

Orange version: 3.34.0
Text Add-On version: 1.12.1
Gensim: 4.1.2

@PrimozGodec
Collaborator

Thank you @JosieVor and @calliope212, for the additional information. We noticed that the newest release didn't support gensim>=4.3.0 (on the master branch we had already switched to >=4.3.0).

We fixed the release. Can you please update the Text addon to version 1.12.2 and try again? Please let us know if it helps.

@calliope212

Thank you @PrimozGodec for the suggestion. I tried it, but unfortunately it still won't work. When I check pip list, it says my gensim version is 4.3.1. Does that affect the result?

@sohbl

sohbl commented Jun 27, 2023

Hi, I also have similar issues when I run Topic Modelling, it hangs for more than 2 minutes and I have to kill it ultimately.
I'm running on Windows 11.
Orange version = 3.34.0
Text Add-on version = 1.13.1
Gensim version = 4.3.0

Another laptop has the same problem. It is running on Windows 10.
Orange version = 3.34.0
Text Add-on version = 1.13.1
Gensim version = 4.3.1

Is there any solution for this problem?

@IzaClaro

Hi!
I am also having this problem. Whenever I run the Topic Modelling widget, the program crashes and ultimately I have to force a shutdown of my computer, as everything seems to fail afterwards!
I am running windows 11 as well.

Orange version 3.35
Gensim version = 4.3.0

Do you already have any solution for this problem?
Thanks.

@PrimozGodec
Collaborator

The new version of Gensim (4.3.2) is out. Can you try with this version? We still cannot reproduce the bug on the new or previous version.
How do I upgrade to a new Gensim (select any of the following options)?

  • Open Orange Command Prompt from the start menu and write conda update gensim orange3-text (Windows only)
  • Open the Addons dialogue in Orange, click Add more and type Gensim. Check gensim at the bottom of the list. Also, check orange3-text to be upgraded (if not yet).

@ajdapretnar
Collaborator

I am closing this as stale. If the error persists, please open a new issue.

@Bragheto

It is 2024 and the error persists. If Preprocess uses an n-gram range, it crashes.

@ajdapretnar
Collaborator

Is this still a Windows issue? Because I ran grimm-tales with n-grams and TM worked. I have the latest gensim installed (gensim==4.3.2).

@Bragheto

Bragheto commented Apr 24, 2024

Is this still a Windows issue? Because I ran grimm-tales with n-grams and TM worked. I have the latest gensim installed (gensim==4.3.2).
image
Same here.
Yes, I am using the GUI on Windows 11 (build 22631.3447 and Windows Feature Experience Pack 1000.22688.1000.0).

As soon as I disabled n_gram range in the preprocess workflow, everything worked.
I am using a custom dataset, with imported .pdf/.txt documents.
