Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to format incoming requests w multiple images? #6

Closed
jywarren opened this issue Mar 19, 2019 · 80 comments
Closed

How to format incoming requests w multiple images? #6

jywarren opened this issue Mar 19, 2019 · 80 comments

Comments

@jywarren
Copy link
Member

Should it be a Json format with an array of steps, and should we start by importing a list of URLs and assigning them short variable names in the Json?

Or should we basically be submitting a Js function that runs using an array of URLs as additional input?

@jywarren
Copy link
Member Author

@tech4GT how did you address this in the initial implementation? Thanks!!

@tech4GT
Copy link
Member

tech4GT commented Mar 19, 2019

Hi Jeff, right now I am taking the urls as an array in the request json and the steps are taken as a sequencer string.

@jywarren
Copy link
Member Author

jywarren commented Mar 19, 2019 via email

@harshkhandeparkar
Copy link
Member

Is there a way to compress the images if they exceed a limit. Maybe a size limit like 20MB?

@tech4GT
Copy link
Member

tech4GT commented Mar 19, 2019

@jywarren For now I used a common string but I'll update it to this following architecture

{
// request body
   images: [
         {
          url: <URL>
          steps: <String> 
         },
         {
          url: <>
          steps: <>
         }
   ]
}

@tech4GT
Copy link
Member

tech4GT commented Mar 19, 2019

We can then expand each of the images with as much metadata as we want like coordinates and orientation.

@jywarren
Copy link
Member Author

jywarren commented Mar 19, 2019 via email

@tech4GT
Copy link
Member

tech4GT commented Mar 19, 2019

But it's gonna be that anyway, right? I mean if we transfer the image in part initially from the client side, then too the total data remains the same, I think the best option we have is to compress them in mapknitter and host those images on the urls. We can then create a step like decompress which gets the original image back.

@jywarren
Copy link
Member Author

jywarren commented Mar 19, 2019 via email

@tech4GT
Copy link
Member

tech4GT commented Mar 19, 2019

Yeah this sounds good to me!! 👍

@tech4GT
Copy link
Member

tech4GT commented Mar 22, 2019

Hi @jywarren , I updated the api structure and now on using this url
http://localhost:4000/api/v1/process/?images=[{"url":"./Images/Mario.png","sequence":"invert"},{"url":"./Images/Mario.png","sequence":"colormap"}]

this is the result:
Screen Shot 2019-03-22 at 2 08 08 PM

@tech4GT
Copy link
Member

tech4GT commented Mar 22, 2019

I think this completes our part 1 majorly? Should I get started on the stitch module using https://github.com/publiclab/mapknitter-exporter/blob/main/lib/mapknitterExporter.rb#L252 as reference??

@jywarren
Copy link
Member Author

jywarren commented Mar 22, 2019 via email

@tech4GT
Copy link
Member

tech4GT commented Mar 24, 2019

Hmm, this sounds right! Maybe what we can do is have a common sequence that runs once others are done and then every image can have a sequence property which is specific to that image

{
images: [
    {
       url: <>,
       sequence: <>
    }
],
sequence: <Common sequence>
}

@tech4GT
Copy link
Member

tech4GT commented Mar 24, 2019

Once we get this sorted we can move on to part 2 finally 😃

@jywarren
Copy link
Member Author

jywarren commented Mar 24, 2019 via email

@tech4GT
Copy link
Member

tech4GT commented Mar 24, 2019

Hmm, Sure! So should I update it according to the format you suggested earlier?
One thing I can't wrap my head around though, is that how would we use these IDs to generate the dependency info?

@jywarren
Copy link
Member Author

jywarren commented Mar 25, 2019 via email

@tech4GT
Copy link
Member

tech4GT commented Mar 26, 2019

Hello Jeff, sorry for the late reply! Okay this sounds good!!

  • Every Image has an ID which is used for ordering and dependency resolution
  • Inside the each image object we can have the properties, pre and post which will be specific to that particular image.
  • We also have a requisites property on the image which is an array of IDs which need to be completed before this can be started.

@tech4GT
Copy link
Member

tech4GT commented Mar 26, 2019

This makes a lot of sense, I was meaning to ask that if we don't mind using some extra memory, I can implement a stripped down version of the directed graph data structure. We can then run a topological sorting algorithm on it which will take care of the dependencies. How does that sound?

@tech4GT
Copy link
Member

tech4GT commented Mar 26, 2019

So the API request will kind of look like

{
images: [
   {
      <ID>
      <pre>
      <post>
      <sequence>
      <requisites>
   }
],
common_sequence: <String>
upload: <Boolean>
redirect: <Boolean>
}

@tech4GT
Copy link
Member

tech4GT commented Mar 27, 2019

cc @jywarren

@harshkhandeparkar
Copy link
Member

harshkhandeparkar commented Mar 27, 2019

  • I had an idea, since upload times can be slow, how about creating multiple requests instead of one and each request can have a special ID.
  • Each request can contain say n images or images of total size n MB. For each request with the same ID, an asynchronous (separate) thread can be opened with a maximum of let's say t threads. Each thread will process the img(s) it receives and put it in some kind of an output queue.
  • Each output queue will have it's own ID(same as that of the request) and as the requests come in, images will be processed asynchronously.
  • After processing each image and creating the output queue, the images can be sent back to the same ID asynchronously
OR
  • Every request can have an index and the total number of indices can be put in every request. So if there are x requests, the first request will have an index 0, the next one will have 1 and so on. The server can process the requests asynchronously and put them in the output queue in the order of their indices.
  • The API can listen for more requests, if requests with some indices haven't arrived (with a timeout of course)
  • Finally the whole output queue can be sent at once or asynchronously (to reduce download times as well).
  • As old requests are processed, threads will be freed and new ones can be processed asynchronously.
  • Again the maximum number of requests that can be sent with the same ID should also have a limit (i think).

@harshkhandeparkar
Copy link
Member

cc @jywarren @tech4GT

@harshkhandeparkar
Copy link
Member

One thing is that the frontend will also have to be developed to calculate the no of requests and no of images per request and to distribute those images accordingly.

@harshkhandeparkar
Copy link
Member

harshkhandeparkar commented Mar 27, 2019

  • If the frontend isn't developed for this kind of behaviour, a single request with all the imgs can also be taken.
  • So the server has to get the no of images i and the the number of requests r. If the number of requests is small, each request should be allowed a smaller no of images for the asynchronous processing to work efficiently. If there is a single request, no of imgs per request can be more as only a single thread will be utilised for it and no performance will be reduced.

Idea 2

  • If the inet speed of the user is fast and the user uploads a lot of images at once in the same request, the server can distribute the work itself between different threads and provide the full output at once (because inet speed is fast, that won't be a bottleneck).

@jywarren
Copy link
Member Author

jywarren commented Mar 27, 2019 via email

@harshkhandeparkar
Copy link
Member

@jywarren

passing a JSON post request to this app with URLs

Where would the URLs link to? Some other site like imgur? If yes, how will the user upload images to that site? Wouldn't that be a bottleneck?

@harshkhandeparkar
Copy link
Member

Also is there a way to "stream" the images? I mean the rgba data values for each pixel can be sent in a stream for a max of a few imgs at once. The output can also be received in a stream.
This method can only be used for a few modules like contrast which perform the same func on each pixel(on use changePixel func).
Maybe the optimal method can be chosen by the user(or frontend) at runtime based on the requirements. The API should just support different options.

@harshkhandeparkar
Copy link
Member

My concern is that process time(local processing time) should not be less than the upload time for multiple imgs so parallelizing upload, processing and download I think can be a good way to do this. I'm not sure though.

@jywarren
Copy link
Member Author

jywarren commented Apr 19, 2019

OK, removing the parameters, to just use the default webgl-distort settings, finally it tried to boot Chrome!

https://us-central1-public-lab.cloudfunctions.net/is-function/api/v1/process/?steps=[{"id":1,"input":"https://publiclab.org/i/31655.jpg","steps":"webgl-distort"}]

But it did not succeed:

E  is-function n2cbn71yu75d Unhandled rejection is-function n2cbn71yu75d 
  undefined
E  is-function n2cbn71yu75d Error: Failed to launch chrome!
[0419/212419.930186:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.


TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

    at onClose (/srv/node_modules/puppeteer/lib/Launcher.js:342:14)
    at Interface.helper.addEventListener (/srv/node_modules/puppeteer/lib/Launcher.js:331:50)
    at emitNone (events.js:111:20)
    at Interface.emit (events.js:208:7)
    at Interface.close (readline.js:368:8)
    at Socket.onend (readline.js:147:10)
    at emitNone (events.js:111:20)
    at Socket.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:139:11) is-function n2cbn71yu75d 
  undefined

@jywarren
Copy link
Member Author

@jywarren
Copy link
Member Author

OK, also opened publiclab/image-sequencer#1033 and publiclab/image-sequencer#1032 which also need to be solved before this will run. 🎉

@jywarren
Copy link
Member Author

jywarren commented Apr 19, 2019

Once we can confirm this is running, we'll need to make a converter function from the input.json format in publiclab/mapknitter-exporter-sinatra#1 (comment) to this URL format shown here:

https://us-central1-public-lab.cloudfunctions.net/is-function/api/v1/process/?steps=[{"id":1,"input":"https://publiclab.org/i/31655.jpg","steps":"webgl-distort{nw:0,101|ne:1023,-51|se:1223,864|sw:100,764}"}]

@jywarren
Copy link
Member Author

@jywarren
Copy link
Member Author

jywarren commented Apr 30, 2019

OK, well -- with a small image, this works!!!

https://us-central1-public-lab.cloudfunctions.net/is-function-edge/api/v1/process/?steps=[{%22id%22:1,%22input%22:%22https://i.publiclab.org/system/images/photos/000/000/354/thumb/Boots-ground-02.png%22,%22steps%22:%22webgl-distort%22}]

Input:

Output:

image

Here's a better test image:


OK! We had a JSON parse error but it seems to relate to the encoding of the punctuation marks; this URL will bypass it:

https://us-central1-public-lab.cloudfunctions.net/is-function-edge/api/v1/process/?steps=[{"id":1,"input":"https://i.publiclab.org/i/31778.png","steps":"webgl-distort"},{"id":2,"input":"https://i.publiclab.org/i/31778.png","steps":"webgl-distort"}]

It's not completing... but it meets a timeout: Function execution took 540002 ms, finished with status: 'timeout'

We either need to optimize a LOT, or to move to a non Cloud Functions approach where we can leave this open longer and/or increase power to the container we run it in.

anyways, next step would be to try to pull in the output of one sequencer into the other, with a sequence like: import-image{url:https://i.publiclab.org/i/31778.png},overlay{x:10|y:20}

Actually, with the function code @tech4GT has written, we should be able to point at the output of the earlier step with a number as input, and then use overlay:

https://us-central1-public-lab.cloudfunctions.net/is-function-edge/api/v1/process/?steps=[{"id":1,"input":"https://i.publiclab.org/i/31778.png","steps":"webgl-distort"},{"id":2,"input":1,"steps":"import-image{url:https://i.publiclab.org/i/31778.png},overlay"}]

@jywarren
Copy link
Member Author

jywarren commented Apr 30, 2019

Trying something that doesn't require Puppeteer, to confirm that we can do this overlay thing: https://us-central1-public-lab.cloudfunctions.net/is-function-edge/api/v1/process/?steps=[{%22id%22:1,%22input%22:%22https://publiclab.org/i/31655.jpg%22,%22steps%22:%22colormap%22},{%22id%22:2,%22input%22:1,%22steps%22:%22import-image{url:https://publiclab.org/i/31655.jpg},overlay%22}]

Update: OK! Got an error for the data-url here:

Bad image path { src: 'data:text/html; charset=utf-8;base64,PGh0bWw+PGJvZHk+WW91IGFyZSBiZWluZyA8YSBocmVmPSJodHRwczovL3B1YmxpY2xhYi5vcmcvc3lzdGVtL2ltYWdlcy9waG90b3MvMDAwLzAzMS82NTUvbGFyZ2UvREpJXzAwMjRfc21hbGwuanBnIj5yZWRpcmVjdGVkPC9hPi48L2JvZHk+PC9odG1sPg==', 

Update: haha, cool, that data-url wasn't an image -- it displayed:

You are being redirected.

@tech4GT
Copy link
Member

tech4GT commented May 8, 2019

@jywarren Is the parsing error specific to google cloud or are we facing it locally too?

@tech4GT
Copy link
Member

tech4GT commented May 8, 2019

Can you guide me to replicate the error locally?

@jywarren
Copy link
Member Author

jywarren commented May 8, 2019 via email

@jywarren
Copy link
Member Author

jywarren commented May 8, 2019

publiclab/image-sequencer#1033 shows how to replicate it in the CLI mode - it gives the same error, so it seems likely it's the same parsing issue!

@jywarren
Copy link
Member Author

jywarren commented May 8, 2019

I'm sorry - wait. The error is not the same. But @VibhorCodecianGupta has gone into some detail on the parsing in there, so there is more context available in publiclab/image-sequencer#1033.

@jywarren
Copy link
Member Author

jywarren commented May 8, 2019

So, i'd go by this comment to replicate it: #6 (comment)

And I bet you could get this running locally: https://gist.github.com/jywarren/7721a38725affdbc4fad2598ff1be59e

@tech4GT
Copy link
Member

tech4GT commented May 16, 2019

@jywarren So to sum this up, the problem is with using | marks with our regular express framework, correct?
The query that is being passed to the sequencer gets messed up on using them as we did in the urls above, so this should happen locally too, right?
I tried running webgl-distort with options on the app running locally on my machine!
Great!! I got the error locally!!
On this now.

@tech4GT
Copy link
Member

tech4GT commented May 16, 2019

Okay I have isolated this error to our import string functions. Okay I'll make sure to write some tests for the string parsing code to prevent this in the future. 👍

@tech4GT
Copy link
Member

tech4GT commented May 16, 2019

Also @jywarren I found out why the topological sort wasn't working! Actually that one is my fault since I didn't factor in that the graph might have nodes which are entirely disconnected. The library we are using does not support such nodes so all we need to do make that work is add those nodes in a loop. I'll make the necessary changes in a separate PR. 🎉

@jywarren
Copy link
Member Author

Oh this is great to hear you reproduced it!

@jywarren
Copy link
Member Author

Also this issue with the data url returned -- after these two fixes, would you mind taking a look? Thanks!

#6 (comment)

@jywarren
Copy link
Member Author

And Varun, if we get this version of multi-sequencer (from cloud functions) running in a Docker container, we can move it to a non Cloud Functions part of Google Cloud and get longer runtimes and more memory. I think @icarito can help us with this. So, let's aim to get all these bugs fixed and we'll ask @icarito for some help getting this booting in a docker container?

@tech4GT
Copy link
Member

tech4GT commented May 16, 2019

Sure! The project is already set up with docker, so it shouldn't be a problem adding the multi sequencer! 😄

@tech4GT
Copy link
Member

tech4GT commented May 17, 2019

Okay finally figured this out! The commas in the input for webgl-distort are messing with the commas separating steps!

@tech4GT
Copy link
Member

tech4GT commented May 17, 2019

@jywarren the problem is with the sequencer string not being URL safe. So how importString works is the inputs to every step, that is the stuff between {...} needs to be run through encodeURIComponent. So webgl-distort{nw:0,101|ne:1023,-51|se:1223,868|sw:100,764} will not work but this will webgl-distort{nw:0%2C101|ne:1023%2C-51|se:1223%2C868|sw:100%2C764}

@tech4GT
Copy link
Member

tech4GT commented May 17, 2019

I did it this way to make sure that complete functions can be passed in as arguments like in the blend step. 😄

@tech4GT
Copy link
Member

tech4GT commented May 28, 2019

Okay also closing this up, since we have fixed the API structure and implemented parallel processing!

@tech4GT tech4GT closed this as completed May 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants