Update index too slow #37

Open
boh5 opened this issue Apr 26, 2020 · 2 comments

boh5 commented Apr 26, 2020

I use the Whoosh backend with the jieba analyzer, and SQLAlchemy connected to MySQL.
I use flask-apscheduler to refresh the Whoosh index on a schedule. The code looks like this:

@scheduler.task('cron', id='do_refresh_whoosh_index', hour=2)
def refresh_whoosh_index():
    app = scheduler.app
    with app.app_context():
        search.update_index()

The first index update is very quick (no index files exist yet): thousands of rows per second. But when update_index() runs again, it is very slow, only about 10 rows per second. I have to delete the index and recreate it.
Is there another solution? Or did I make a mistake?
Thanks!

@boh5 boh5 changed the title update index too slow Update index too slow Apr 26, 2020
@honmaple
Owner

flask-msearch updates the index automatically after rows are created or updated, so you shouldn't do it manually. search.update_index() always updates all rows rather than only the new ones.

If you want to update the whole index manually, you should disable MSEARCH_ENABLE and increase the size of yield_per.
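
A minimal sketch of the suggestion above, assuming `MSEARCH_ENABLE` is read from the Flask config and that `search.update_index()` accepts a `yield_per` keyword, as the comment implies. `ManualIndexConfig`, `full_reindex`, and the batch size of 1000 are hypothetical names and values chosen for illustration; `search` stands for the application's flask-msearch `Search` instance.

```python
class ManualIndexConfig:
    """Hypothetical Flask config class for manual-only indexing."""

    MSEARCH_BACKEND = "whoosh"
    # Turn off the automatic per-row index updates so that only the
    # scheduled job touches the index.
    MSEARCH_ENABLE = False


def full_reindex(search, yield_per=1000):
    """Re-index every row, passing a larger batch size to the backend.

    Assumption: `search` is a flask-msearch Search instance whose
    update_index() accepts a yield_per keyword. Check the installed
    version before relying on this.
    """
    search.update_index(yield_per=yield_per)
    return yield_per
```

Inside the scheduled task this would be called as `full_reindex(search)` within `app.app_context()`, in place of the bare `search.update_index()` call.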

@boh5
Author

boh5 commented Apr 30, 2020

Because another application updates some rows of my database table every day, I have to update the index manually every day. (Is that right? When the database is updated by other applications, flask_msearch cannot update the index automatically.)
Now the problem is that delete_index() and update_index() are too slow, and increasing yield_per does not help because the CPU is the bottleneck.
So I have to delete all the index files manually and create the index again. create_index() is thousands of times faster than update_index() and delete_index().
Is there a problem with the algorithm?
Thanks!
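
The workaround described above (remove the index files, then build a fresh index) could be scripted roughly as follows. This is a sketch under assumptions, not a fix for the slow `update_index()` path: `index_dir` is wherever the Whoosh index files live in your deployment, `search` is the flask-msearch `Search` instance, and `rebuild_index_from_scratch` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def rebuild_index_from_scratch(search, index_dir):
    """Delete the on-disk index directory, then create a new index.

    Assumptions: no other process reads or writes the index while this
    runs, and search.create_index() repopulates an empty index.
    """
    path = Path(index_dir)
    if path.is_dir():
        # Remove every index file: the manual step described above.
        shutil.rmtree(path)
    search.create_index()
    return path
```

It is probably safest to run this from a fresh process (for example a one-off command) inside `app.app_context()`, so that no index handle opened before the deletion is reused.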
