
Optimize the function for building database #1612

Closed
blurHY opened this issue Sep 15, 2018 · 13 comments

Comments

@blurHY
Contributor

blurHY commented Sep 15, 2018

rev3594
It takes about 5 minutes to build the database for Horizon, and during the build I can't do anything in ZeroNet; it just says loading.

@HelloZeroNet
Owner

HelloZeroNet commented Sep 15, 2018

For me: INFO Site:1CjMsv..uxBn Imported 10 data file in 11.4669420719s on a 20 USD/yr VPS, which I think is pretty OK since it reads and inserts around 70 MB of data.

I will look into moving the db operations to a separate thread as part of the move to Python 3.
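The separate-thread idea mentioned above could be sketched roughly like this (a hypothetical illustration, not ZeroNet's actual code; the DbWriter class, table, and column names are invented for the demo): a single writer thread owns the SQLite connection and importers enqueue batches of rows, so request handling is never blocked by a long import.

```python
import os
import queue
import sqlite3
import tempfile
import threading

class DbWriter(threading.Thread):
    """Hypothetical sketch: one background thread owns the db connection."""

    def __init__(self, db_path, schema):
        super().__init__(daemon=True)
        self.db_path = db_path
        self.schema = schema
        self.tasks = queue.Queue()

    def run(self):
        # The connection is created here because a default sqlite3
        # connection may only be used from the thread that created it.
        conn = sqlite3.connect(self.db_path)
        conn.executescript(self.schema)
        while True:
            sql, rows = self.tasks.get()
            if sql is None:  # shutdown sentinel
                break
            conn.executemany(sql, rows)
            conn.commit()
        conn.close()

    def insert(self, sql, rows):
        self.tasks.put((sql, rows))  # returns immediately, caller is not blocked

    def close(self):
        self.tasks.put((None, None))
        self.join()

db_path = os.path.join(tempfile.mkdtemp(), "content.db")
writer = DbWriter(db_path, "CREATE TABLE keyword (hash TEXT, word TEXT)")
writer.start()
writer.insert("INSERT INTO keyword VALUES (?, ?)",
              [("hash%d" % i, "word") for i in range(1000)])
writer.close()  # wait for the queued work to finish

count = sqlite3.connect(db_path).execute(
    "SELECT COUNT(*) FROM keyword").fetchone()[0]
print(count)  # 1000
```

The key design point is that the UI thread only ever touches the queue, never the connection, so a slow import cannot make the client "just say loading".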

@blurHY
Contributor Author

blurHY commented Sep 15, 2018

70 MB? It should be 216 MB. I'm running ZeroNet on an HDD, not an SSD, so it will be slower. Maybe some caching in RAM is needed. Also, the data keeps growing, as I said.

@HelloZeroNet
Owner

$ gzip -l *.json.gz
         compressed        uncompressed  ratio uncompressed_name
            3303011            27374599  87.9% data_keywords.json
            2570148            21068213  87.8% data_main.json
            1006795             4339919  76.8% data_phrases.json
            1460239            15571250  90.6% data_relationship.json
            8340193            68353981  87.8% (totals)

SSD highly recommended for ZeroNet

@blurHY
Contributor Author

blurHY commented Sep 15, 2018

Not everyone uses an SSD, and small-file reads/writes are exactly what HDDs are bad at. So we need a way to avoid small reads/writes, or to cache the db reads/writes in RAM before hitting the filesystem.

@HelloZeroNet
Owner

The file reads are cached and writes are buffered by the operating system. The db cache is handled by the sqlite module.
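For anyone who wants to experiment with the sqlite-side caching mentioned above, SQLite exposes it through standard PRAGMAs (these are stock SQLite settings, not ZeroNet-specific configuration; the values chosen here are just examples):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory db for the demo
conn.execute("PRAGMA cache_size = -20000")    # ~20 MB page cache (negative = KiB)
conn.execute("PRAGMA journal_mode = MEMORY")  # keep the rollback journal in RAM
conn.execute("PRAGMA synchronous = OFF")      # skip fsync; risks corruption on power loss

# Read the settings back to confirm they took effect
cache = conn.execute("PRAGMA cache_size").fetchone()[0]
sync = conn.execute("PRAGMA synchronous").fetchone()[0]
print(cache, sync)  # -20000 0
```

`synchronous = OFF` and `journal_mode = MEMORY` trade durability for speed, which may be acceptable for a database that can always be rebuilt from the JSON files.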

@slrslr

slrslr commented Sep 17, 2018

@HelloZeroNet I also have this issue, but 4 times slower in my case to load the .db (24 minutes), plus CPU overload. As @blurHY says, not everyone will use an SSD. And think about smartphone users. I found this thread because I wanted to submit an issue about the same thing. On the mentioned Horizon site, it took my older Pentium computer 15 minutes of full CPU load (HDD activity was not exhausted the whole time) to finish rechecking the Horizon site.

This is what I did on my Ubuntu 16.04 Linux computer with the latest ZeroNet:
cd ~/Apps/ZeroBundle/ZeroNet/data/
find ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn -delete
mkdir ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn
git clone https://github.com/blurHY/Horizon.git
mv Horizon/* ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn/
../zeronet.py sitePublish 1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn

Then I go to ZeroHello and click "Check files" next to the Horizon site. The result was about 15 minutes of CPU overload; debug.log didn't go crazy, but in the Horizon site (0) menu I saw that the site has a 300 MB .db.
And this size is nothing special; ZeroNet should be able to cope with dynamic sites having, say, 10 GB databases.

cd ~/Apps/ZeroBundle/ZeroNet/data/1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn/data
$ gzip -l *.json.gz

         compressed        uncompressed  ratio uncompressed_name
            2743133            23285428  88.2% data_keywords1.json
            2208428            23940669  90.8% data_keywords2.json
            2602574            23455548  88.9% data_keywords3.json
            2656768            23706139  88.8% data_keywords4.json
             902385             8252620  89.1% data_keywords5.json
            2789194            24995191  88.8% data_main1.json
            2155699            15099977  85.7% data_main2.json
            3347288            19561521  82.9% data_phrases1.json
            2197791            23694010  90.7% data_relationship1.json
            2425075            24690472  90.2% data_relationship2.json
            1197466            14682409  91.8% data_relationship3.json
             303130             1297035  76.6% data_zites1.json
           25528931           226661019  88.7% (totals)

I think this has happened to me several times on this site, because in debug-last.log I see:
Site:1CjMsv..uxBn Imported 12 data file in 1443.11739898s

It may be related to the unsolved issue where a ZeroMe db took days to rebuild (ZeroTalk topic), also described in HelloZeroNet/ZeroMe#121.

@HelloZeroNet
Owner

The problem is that it's limited by IO/SQLite, so we can't do much about it. You can experiment by removing some indexes, as that's one of the factors in insert performance.
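The index experiment suggested above can be tried in isolation: drop the index, bulk-insert inside a single transaction, then rebuild the index once at the end. A minimal sketch (table, column, and index names are invented for the demo; real timings depend on the machine):

```python
import sqlite3
import time

rows = [("site%d" % i, "keyword%d" % (i % 500)) for i in range(50000)]

def build(drop_index):
    """Insert 50k rows, either with the index live or rebuilt afterwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keyword (site TEXT, word TEXT)")
    conn.execute("CREATE INDEX idx_word ON keyword (word)")
    if drop_index:
        conn.execute("DROP INDEX idx_word")
    t0 = time.time()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO keyword VALUES (?, ?)", rows)
    if drop_index:
        # Rebuilding once is usually cheaper than updating per-row
        conn.execute("CREATE INDEX idx_word ON keyword (word)")
    elapsed = time.time() - t0
    n = conn.execute("SELECT COUNT(*) FROM keyword").fetchone()[0]
    conn.close()
    return elapsed, n

t_with, n1 = build(False)  # index maintained during insert
t_drop, n2 = build(True)   # index dropped and rebuilt afterwards
print(n1, n2)  # 50000 50000
```

Either way the final table and index are identical, so query speed afterwards is unaffected; only the build path changes.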

@blurHY
Contributor Author

blurHY commented Sep 18, 2018

removing some indexes

Then will queries be slower?

@HelloZeroNet
Owner

It's not necessarily going to be slower. Worth experimenting with.

@blurHY
Contributor Author

blurHY commented Sep 18, 2018

1 minute 30 seconds to build after removing all the indexes. And queries seem quicker; maybe the reason is that the CPU is now idle.

@skwerlman

What about making db writes async? That way it at least won't lock up the whole client.

@tangdou1
Contributor

tangdou1 commented Sep 20, 2018

Each time I add this zite (Horizon) to my poor VPS (only 500 MB of memory), the zeronet.py program gets killed by the system due to out-of-memory.

@blurHY
Contributor Author

blurHY commented Sep 20, 2018

Is there any way to know that the database has been built? Also, a progress bar should be shown while the db is building.
No progress is shown when the db is big but not yet filled with user data, so users don't know what ZeroNet is doing.

@blurHY blurHY closed this as completed Mar 15, 2022