This repository has been archived by the owner on Apr 12, 2019. It is now read-only.

Statistics #59

Open
4 of 6 tasks
Sandr0x00 opened this issue Apr 13, 2017 · 11 comments

Comments

Sandr0x00 commented Apr 13, 2017

Gather statistics about the stability of our algorithms in order to find bottlenecks in the whole pipeline (a rough sketch of what such a collector could track follows the task list).

  • How many requests were processed
  • How many errors occurred
  • Which errors occurred: @ansjin please commit the error outputs I saw in the version you last used
  • How long everything took
  • Average time for each algorithm

additional:

  • improve design
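
A rough sketch of what such a statistics collector could track per algorithm (all names here are illustrative, not part of the existing codebase):

```typescript
// Hypothetical per-algorithm statistics, covering the points in the task list above:
// requests processed, which errors occurred (and how often), and average time.
interface AlgorithmStats {
    processed: number;
    errors: Map<string, number>;
    totalTimeMs: number;
}

class StatsCollector {
    private stats = new Map<string, AlgorithmStats>();

    recordSuccess(algorithm: string, durationMs: number): void {
        const s = this.get(algorithm);
        s.processed++;
        s.totalTimeMs += durationMs;
    }

    recordError(algorithm: string, error: string): void {
        const s = this.get(algorithm);
        s.errors.set(error, (s.errors.get(error) || 0) + 1);
    }

    averageTimeMs(algorithm: string): number {
        const s = this.get(algorithm);
        return s.processed === 0 ? 0 : s.totalTimeMs / s.processed;
    }

    private get(algorithm: string): AlgorithmStats {
        let s = this.stats.get(algorithm);
        if (!s) {
            s = { processed: 0, errors: new Map(), totalTimeMs: 0 };
            this.stats.set(algorithm, s);
        }
        return s;
    }
}
```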
Sandr0x00 self-assigned this Apr 13, 2017
ansjin (Member) commented Apr 13, 2017

Building on top of what @SANDR00 did.

Relationship Statistics:

Currently running here, just for testing: http://104.198.227.113/

The first two IPs are the relationship algorithms (scaled 2 times); the last one is the date/event extraction algorithm (scaled 2 times).

Currently the scaling is not high; the more we scale, the more parallel requests can be served.

Running on 765 wiki pages scraped by team 2 (see MusicConnectionMachine/UnstructuredData#65 (comment)).

The resulting data can also be checked at 35.187.17.177, with the username and database set to the Postgres defaults.
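
For anyone who wants to peek at that database from Node, here is a minimal sketch using the pg client; the table name is a placeholder since the schema isn't shown here, and the connection values follow the Postgres defaults mentioned above:

```typescript
import { Client } from "pg";

// Connection values follow the comment above: Postgres defaults for user and database.
const client = new Client({
    host: "35.187.17.177",
    user: "postgres",
    database: "postgres",
});

async function peek(): Promise<void> {
    await client.connect();
    // "relationships" is a placeholder table name; substitute the actual table.
    const result = await client.query("SELECT * FROM relationships LIMIT 10");
    console.log(result.rows);
    await client.end();
}

peek().catch(console.error);
```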

A snapshot after some elapsed time:

[screenshot: statistics snapshot]

ansjin mentioned this issue Apr 13, 2017
kordianbruck (Contributor) commented Apr 13, 2017

@ansjin this looks goodish? What is the status column? Is that the number of finished requests of a total of 1707 after 1022 seconds?

simonzachau (Contributor) commented Apr 13, 2017

@kordianbruck yes, the status says how many are done. The total is all requests (incl. the remaining ones). The "design" was just to get started; it's not very intuitive for now.

kordianbruck (Contributor) commented:

Great. Breaking it down: those take roughly 30 seconds to process one request. That's a lot.

simonzachau (Contributor) commented:

@kordianbruck it's on Google's free tier, so it's not that powerful. If we give it 100 machines instead of 2, we'll get it done faster.

simonzachau (Contributor) commented:

And one more thing: the size of a request has to be taken into account before drawing conclusions about speed. Currently, one request is about half a website, as far as I know.

ansjin (Member) commented Apr 13, 2017

@kordianbruck
Actually, the timer doesn't stop when one of the algorithms has completed all of its requests; it keeps running until all requests are finished.
The last one is the DateEventExtraction, which is very fast compared to the other NLP algorithms; it processes roughly that many requests in less than 200 seconds without much scaling.

And yes, the other relationship algorithms take around 30 seconds to 1 minute to process a request, but if we had many machines running, those requests could be processed in parallel. We could also use the Kubernetes feature that allows creating multiple pods on the same machine (each of which effectively acts as a new machine) to fully utilize the compute power of a VM.
Currently this is just a test we are running to see how our complete application works and how much we will have to scale up when we are running on Azure.
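
A rough sketch, assuming node-fetch, of how requests could be fanned out across several algorithm instances in parallel; the second instance URL, the /process path, and the plain-text payload are placeholders, not the project's real API:

```typescript
import fetch from "node-fetch";

// Hypothetical fan-out: the first URL is the test IP mentioned above, the second is a
// placeholder; the /process path and payload shape are assumptions, not the real API.
const INSTANCES = [
    "http://104.198.227.113",
    "http://relationship-instance-2.example",
];

async function processAll(requests: string[]): Promise<void> {
    // Round-robin each request onto one of the instances and run them all in parallel.
    const jobs = requests.map((body, i) =>
        fetch(`${INSTANCES[i % INSTANCES.length]}/process`, { method: "POST", body })
    );
    await Promise.all(jobs);
}
```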

ansjin (Member) commented Apr 13, 2017

Status:

[screenshot: updated statistics snapshot]

Inferences

  1. OpenIE, the second algorithm, was running on a VM in a different zone than the other algorithms. I checked my account, and it showed a few seconds of downtime for some machines in that zone. I think that is why we got so many ECONNREFUSED errors for that algorithm.

     For the later deployment, we should use VMs from different zones in our cluster, so that if there is an issue with one zone the service is still available from a VM in another zone.

  2. Date Event Extraction completes too fast!

  3. The errors are either socket hang up or ECONNREFUSED, which I think will go away once we have better VMs and more compute power (see the retry sketch below).
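
A minimal sketch, assuming node-fetch and a plain retry-with-backoff policy, of how those transient ECONNREFUSED / socket hang up errors could be retried before being counted (the function name and retry counts are illustrative, not the project's actual code):

```typescript
import fetch from "node-fetch";

// Hypothetical retry wrapper for transient network errors (ECONNREFUSED, socket hang up).
// Retries a few times with a growing delay before giving up and reporting the error.
async function fetchWithRetry(url: string, retries = 3, delayMs = 1000) {
    for (let attempt = 1; attempt <= retries; attempt++) {
        try {
            return await fetch(url);
        } catch (err) {
            if (attempt === retries) throw err; // only now count it as a real error
            await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
        }
    }
}
```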

kordianbruck (Contributor) commented:

Thanks for the updates / explanation!

What does "completes too fast" mean? Is it not working? Why is too fast a problem?

A 1-2% error rate is fine. Anything above that we should investigate.

simonzachau (Contributor) commented:

@kordianbruck about investigation:

In the case of the date event extraction, we once had a lot of errors and just tweaked some parameters on the Google side (number of pods per machine) and on our side (number of parallel requests; a sketch of such a cap follows the list below). On the one hand, since they were the same errors, we hope to be able to reduce those for the relationship algorithms to virtually zero as well with the same approach.
On the other hand, there are differences: compared to the date event extraction, the relationship algorithms

  • require a lot more processing power -> solved by scaling up and out
  • vary in processing power: "difficult" input takes more time -> I think @MusicConnectionMachine/group-2 might have improved their output since the wiki pages we are using were generated; regardless of that, we are definitely optimising the requests
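
A minimal sketch of capping how many requests are in flight at once, so the cap can be tuned to the number of back-end instances; the `runWithLimit` helper and the default limit of 2 are illustrative, not the project's actual code:

```typescript
// Hypothetical helper: run tasks with at most `limit` requests in flight at once,
// so concurrency can be matched to the number of back-end instances (here 2).
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit = 2): Promise<T[]> {
    const results: T[] = [];
    let next = 0;

    // Each worker pulls the next unstarted task until none are left.
    async function worker(): Promise<void> {
        while (next < tasks.length) {
            const i = next++;
            results[i] = await tasks[i]();
        }
    }

    await Promise.all(Array.from({ length: limit }, () => worker()));
    return results;
}
```

For example, wrapping each page request in a closure and passing the array to `runWithLimit(requests, 2)` would keep at most two requests outstanding at any time.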

ansjin (Member) commented Apr 15, 2017

@kordianbruck Completing so fast is a good thing for us. It doesn't need as much processing as the other algorithms, which is why it gets done so quickly.

@simonzachau Those errors were mostly socket timeouts, and they occurred because we were sending 5 parallel requests at a time when there were only 2 machines in the back end to serve them. So out of those 5, one or two requests would time out and our error count would increase.

simonzachau self-assigned this Apr 15, 2017
kordianbruck added this to the 21.04 - Pre Hackathon milestone Apr 17, 2017
Sandr0x00 changed the title from Stability to Statistics Apr 24, 2017