Statistics #59
Building on top of what @SANDR00 did. Relationship Statistics: currently running here just for testing: http://104.198.227.113/ The first two IPs belong to the relationship algorithms (scaled 2 times). The scaling is not high at the moment; the more we scale, the more parallel requests can be served! It is running on the 765 wiki pages scraped by team 2 here ( MusicConnectionMachine/UnstructuredData#65 (comment) ). The resulting data can be checked at 35.187.17.177, with username and database set to the Postgres defaults. A snapshot after some elapsed time:
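For anyone who wants to poke at the resulting data, here is a minimal sketch using the `pg` client. Only the host and the default username/database come from the comment above; the password and the `relationships` table name are placeholders, not the actual schema.

```typescript
import { Client } from "pg";

// Connect with the defaults mentioned above and count the result rows.
async function checkResults(): Promise<void> {
  const client = new Client({
    host: "35.187.17.177",
    user: "postgres",     // default username, as stated above
    database: "postgres", // default database, as stated above
    password: "postgres", // assumption for illustration
  });
  await client.connect();
  // "relationships" is a placeholder table name for the algorithm output.
  const res = await client.query("SELECT COUNT(*) AS n FROM relationships");
  console.log(`rows so far: ${res.rows[0].n}`);
  await client.end();
}

checkResults().catch(console.error);
```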
@ansjin this looks goodish? What is the status column? Is that the number of finished requests of a total of 1707 after 1022 seconds?
@kordianbruck yes, the status says how many are done. The total counts all requests (incl. the remaining ones). The "design" was just to get started; it's not very intuitive for now.
Great. Breaking it down: so those take roughly 30 seconds to process one request. That's a lot.
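For reference, the rough arithmetic behind that estimate, assuming the two scaled instances process requests in parallel. The elapsed time and instance count are from the snapshot discussion above; the completed count is a hypothetical value for illustration.

```typescript
// With N instances working in parallel, the average per-request time is
// elapsedSeconds * instances / completedRequests.
function secondsPerRequest(elapsedSeconds: number, instances: number, completed: number): number {
  return (elapsedSeconds * instances) / completed;
}

// Hypothetical: if ~68 of the 1707 requests were done after 1022 s on 2 instances,
// that works out to roughly 30 s per request.
console.log(secondsPerRequest(1022, 2, 68)); // ≈ 30.1
```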
@kordianbruck it's on the free tier of Google, so it's not that powerful. If we give it 100 machines instead of 2, we'll get it done faster.
And one more thing: the size of a request has to be taken into account before drawing a conclusion about the speed. Currently, one request is about half a website, as far as I know.
@kordianbruck And yes, the other relationship algorithms take around 30 seconds to 1 minute to process a request, but if we had multiple machines running, those requests could be processed in parallel. We could also use the Kubernetes feature that allows creating multiple pods on the same machine (each pod acting as a separate machine) to fully utilize the compute power of a VM; see the manifest sketch below.
Status: Inferences
For the later deployment, we should use VMs from different zones in our cluster, so that if there is an issue with one zone, the service is still available from the VMs in another zone.
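A minimal Kubernetes manifest sketching both ideas: several pods per VM (via replicas plus small CPU requests) and spreading pods across zones for availability. All names and the image are placeholders, not the project's actual configuration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: relationship-algorithm      # placeholder name
spec:
  replicas: 6                       # e.g. 3 pods on each of 2 VMs
  selector:
    matchLabels:
      app: relationship-algorithm
  template:
    metadata:
      labels:
        app: relationship-algorithm
    spec:
      containers:
        - name: worker
          image: mcm/relationship-algorithm:latest  # placeholder image
          resources:
            requests:
              cpu: 500m             # sized so several pods fit on one VM
      topologySpreadConstraints:    # keep pods spread over multiple zones
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: relationship-algorithm
```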
Thanks for the updates / explanation! What does "completes too fast" mean? Is it not working? Why is too fast a problem? A 1-2% error rate is fine; anything above that we should investigate.
@kordianbruck about investigation: in the case of the date event extraction we once had a lot of errors and just tweaked some parameters on the Google side (number of pods per machine) and on our side (number of parallel requests). Since they were the same errors, we hope to be able to reduce those for the relationship algorithms to virtually zero as well with the same approach.
@kordianbruck Completing so fast is a good thing for us. It simply doesn't need as much processing as the other algorithms, which is why it finishes quickly. @simonzachau Those errors were mostly socket timeouts, and they occurred because we were sending 5 parallel requests at a time when there were only 2 machines in the back-end to serve them. So out of those 5, one or two requests were timing out, and our error count kept increasing; the sketch below shows capping the client-side concurrency to match the back-end.
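A small sketch of that fix: cap the number of in-flight requests at the number of back-end machines, so no request sits in a queue long enough to hit a socket timeout. The endpoint is the test deployment mentioned above; the payload shape is an assumption.

```typescript
// Process all pages with a fixed concurrency limit. With 2 back-end
// machines, call processAll(pages, 2) instead of firing 5 at once.
async function processAll(pages: string[], concurrency: number): Promise<void> {
  let next = 0;
  const workers = Array.from({ length: concurrency }, async () => {
    while (next < pages.length) {
      const page = pages[next++];
      try {
        await fetch("http://104.198.227.113/", {   // test deployment from above
          method: "POST",
          body: JSON.stringify({ page }),          // placeholder payload shape
          headers: { "Content-Type": "application/json" },
        });
      } catch (err) {
        // With matched concurrency these timeouts should mostly disappear.
        console.error(`request for ${page} failed:`, err);
      }
    }
  });
  await Promise.all(workers);
}
```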
Gather statistics about the stability of our algorithms in order to find bottlenecks in the whole pipeline.
additional: