
MapKnitterExporter architecture discussion #298

Open
jywarren opened this issue Jan 16, 2019 · 61 comments

Comments

@jywarren
Member

jywarren commented Jan 16, 2019

We are exploring parallel tracks for cloud-based MapKnitter exporting, and one option is a JavaScript based process.

The base idea is to run the export process as a scalable web service, possibly "serverless" or REST, in Google Cloud and/or other cloud providers like Amazon AWS Lambda (primarily Google Cloud but compatible with others). Comments/suggestions/eurekas welcome! 🎉

Importantly, either track would ideally present the same API so that we could compare their performance.

JavaScript track

In this track, more experimentally, we'd use Image Sequencer, possibly with the webgl-distort library.

The major challenges here, I'd guess, would be:

  1. handling very big image files (up to 8 MB each?) in memory
  2. serious speed improvements in Image Sequencer (IS), such as the proposed WebAssembly or WebGL adapters
  3. figuring out the best way to persist images for later access, and how to integrate the exporter with this (passing a callback function to upload them to a given store? Credentials?)
  4. trying to duplicate or integrate GDAL's generation of a giant combined GeoTIFF (just really huge images to manage in memory?)
  5. trying to duplicate or integrate GDAL's generation of TMS-formatted map tiles

For these last two, see #296 where there are some JS options to experiment with.
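To put challenges 1 and 4 in rough numbers: a file's compressed size says little about its memory footprint, because once decoded an image is held as raw pixels. A back-of-the-envelope sketch, assuming 8-bit RGBA buffers (4 bytes per pixel, as in canvas `ImageData`); the function names are illustrative:

```javascript
// Rough memory estimate for images held as decoded pixels.
// Assumption: 8-bit RGBA, i.e. 4 bytes per pixel (the canvas ImageData layout).
const BYTES_PER_PIXEL = 4;

function decodedBytes(width, height) {
  return width * height * BYTES_PER_PIXEL;
}

// Total for several images held in memory at once, in mebibytes (MiB).
function totalMiB(dimensions) {
  const bytes = dimensions.reduce((sum, [w, h]) => sum + decodedBytes(w, h), 0);
  return bytes / (1024 * 1024);
}

// A 4000 x 3000 px photo needs 48,000,000 bytes once decoded,
// however small its JPEG file is on disk.
console.log(decodedBytes(4000, 3000)); // 48000000
console.log(totalMiB([[4000, 3000], [4000, 3000]]).toFixed(1)); // 91.6
```

By the same arithmetic, a single 20,000 x 20,000 px composite would need 1.6 GB as RGBA, which is why the giant combined GeoTIFF is listed as a separate challenge.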

Also, we would try to develop this track in such a way as to make it possible to run locally in the browser, natively or in an Electron-style local JS app.

Ruby/ImageMagick/GDAL track

A more traditional approach is being explored here: #258, where we take the exporting sections currently featured in MapKnitter, and duplicate them in a minimal Ruby container that can be run on-demand.

Spec

To guide the development of both tracks, we're imagining a basic common behavior of:

  1. receiving a collection of image URLs or data-URLs of images AND a scale (cm/px or final pixel size)
  2. outputting a combined JPG image at a given scale or pixel size
  3. advanced versions might cut tiles or output GeoTiffs (see challenges in JS version above)
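For item 1, turning a cm/px scale into a final pixel size is simple arithmetic once the ground extent of the combined map is known. A sketch, assuming that extent is already available in meters (deriving it from the images' corner coordinates is not shown); `outputSize` and `scaleForWidth` are hypothetical names:

```javascript
// Hypothetical helper: turn a ground extent (meters) and a scale (cm per pixel)
// into output pixel dimensions, rounded to the nearest pixel.
function outputSize(widthMeters, heightMeters, cmPerPixel) {
  if (!(cmPerPixel > 0)) throw new RangeError("cmPerPixel must be positive");
  return {
    width: Math.round((widthMeters * 100) / cmPerPixel),
    height: Math.round((heightMeters * 100) / cmPerPixel),
  };
}

// The inverse: the cm/px scale implied by a requested final pixel width.
function scaleForWidth(widthMeters, targetWidthPx) {
  return (widthMeters * 100) / targetWidthPx;
}

console.log(outputSize(250, 120, 5)); // { width: 5000, height: 2400 }
console.log(scaleForWidth(250, 2500)); // 10
```

Accepting either form (cm/px or final pixel size) then reduces to converting one into the other before running the export.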

Links and resources are being compiled here: #296


What have I missed? @tech4GT @icarito would you mind adding any questions, clarifications?


Update: diagrams

I've put together a diagram of the current exporter workflow, which I hope is helpful. It's also largely ported into a standalone Ruby library in #341 -- soon to potentially be a Gem:

[Image: screenshot 2019-02-16 at 2 49 30 pm]

Image Sequencer should allow us to parallelize this, and improve its speed, as illustrated in this diagram:

[Image: screenshot 2019-02-16 at 2 49 20 pm]
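The fan-out in the second diagram can be sketched as running the per-image steps concurrently, with a cap on how many images are in flight at once so memory use stays bounded. This is a generic sketch rather than Image Sequencer's own mechanism; `mapWithLimit` and `processImage` are hypothetical, and the latter only simulates work:

```javascript
// Run an async task over every item, with at most `limit` tasks in flight.
// Results are returned in input order.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}

// Hypothetical per-image step; it only simulates asynchronous work here.
const processImage = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`warped:${url}`), 10));

mapWithLimit(["a.jpg", "b.jpg", "c.jpg"], 2, processImage).then((out) => {
  console.log(out); // [ 'warped:a.jpg', 'warped:b.jpg', 'warped:c.jpg' ]
});
```

The cap is the design choice that matters here: unbounded `Promise.all` over every image would decode them all at once, which runs straight into the memory concern above.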

@tech4GT
Member

tech4GT commented Jan 16, 2019

@jywarren this looks really nice! Things immediately make a lot more sense!🎉

@icarito
Member

icarito commented Jan 22, 2019

I've just spent some time deploying a learning project to Google Cloud Platform (App Engine) as a Docker container. I've got a better understanding now of what is required! Thanks!

@jywarren
Member Author

jywarren commented Jan 22, 2019 via email

@jywarren
Member Author

jywarren commented Jan 28, 2019

OK, I'd like to add an overview of the export system, step by step. I've left notes where we might make changes or improvements, and will link to the lines of code where these things currently happen!

Also -- a couple ideas:

  1. Idea: produce separate GeoTiffs to skip the
  2. Idea: produce TMS tiles from any collection of images, given tile coordinates and image sources (with known corner coordinates)
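For idea 2, the first question is which tiles each image touches. A sketch using the standard Web Mercator ("slippy map") tile formulas, converted to TMS row numbering (TMS counts rows from the south, XYZ from the north); approximating each image by the bounding box of its corner coordinates is a simplifying assumption, and the function names are illustrative:

```javascript
// Standard Web Mercator ("slippy map") tile math at zoom level `z`.
const clampTile = (v, z) => Math.min(2 ** z - 1, Math.max(0, v));

function lonToTileX(lon, z) {
  return clampTile(Math.floor(((lon + 180) / 360) * 2 ** z), z);
}

function latToTileY(lat, z) {
  const rad = (lat * Math.PI) / 180;
  const y = (1 - Math.log(Math.tan(rad) + 1 / Math.cos(rad)) / Math.PI) / 2;
  return clampTile(Math.floor(y * 2 ** z), z);
}

// XYZ counts rows from the north edge; TMS counts them from the south edge.
const xyzToTmsY = (y, z) => 2 ** z - 1 - y;

// TMS tiles covering the bounding box of an image's corners [{ lat, lon }, ...].
function tmsTilesForCorners(corners, z) {
  const lats = corners.map((c) => c.lat);
  const lons = corners.map((c) => c.lon);
  const xMin = lonToTileX(Math.min(...lons), z);
  const xMax = lonToTileX(Math.max(...lons), z);
  // The northernmost latitude has the smallest XYZ row number.
  const yMin = latToTileY(Math.max(...lats), z);
  const yMax = latToTileY(Math.min(...lats), z);
  const tiles = [];
  for (let x = xMin; x <= xMax; x++) {
    for (let y = yMin; y <= yMax; y++) {
      tiles.push({ z, x, y: xyzToTmsY(y, z) });
    }
  }
  return tiles;
}

console.log(tmsTilesForCorners([{ lat: 40, lon: -70 }, { lat: 40.001, lon: -69.999 }], 2));
// [ { z: 2, x: 1, y: 2 } ]
```

Each listed tile would then be rendered from only the images whose boxes include it, which is what makes per-image tiling parallelizable.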

Breaking down the export process

Separable steps

  1. collect set of image URLs and their corner coordinates
  2. for each image (could do this from existing Ruby code or in an npm module):
    • determine image pixel dimensions
    • convert corner coordinates to pixel positions
  3. for each image (using existing Ruby/ImageMagick code or in remote Image Sequencer container):
  4. given collection of warped images, calculate pixel positions of image collection relative to each other (Ruby code exists)
    • (optional alternative) produce SVG or PDF containing images at relative positions (less memory use)
    • this code currently lives in https://github.com/publiclab/mapknitter/blob/main/app/models/map.rb#L231, in the methods
      run_export, distort_warpables, generate_composite_tiff, generate_tiles, and generate_jpg
    • produce composite/merged image using this data
    • save and return URL of combined image for download
    • (optional) produce GeoTiff of combined image
    • (optional) pass GeoTiff to GDAL for conversion into traditional TMS tileset
  5. Possible next steps:
    • produce a merged TMS tileset from per-image TMS tiles made in step 3, instead of generating it from step 4's giant GeoTiff
    • produce a single TMS tileset from the combined per-image GeoTiffs produced at the end of step 3
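The core of step 2 is projecting corner coordinates into pixel space. A sketch, assuming spherical Web Mercator at a chosen zoom level (the projection used by standard web map tiles); MapKnitter's existing Ruby code may do this differently, and `toPixel`/`cornersToPixels` are hypothetical names:

```javascript
// Pixels per tile edge in standard web map tiling.
const TILE_SIZE = 256;

// Project lat/lon to global Web Mercator pixel coordinates at zoom `z`.
// x grows eastward, y grows southward, origin at the north-west corner.
function toPixel(lat, lon, z) {
  const worldSize = TILE_SIZE * 2 ** z;
  // Clamp sin(lat) to avoid an infinite y near the poles.
  const s = Math.min(Math.max(Math.sin((lat * Math.PI) / 180), -0.9999), 0.9999);
  return {
    x: ((lon + 180) / 360) * worldSize,
    y: (0.5 - Math.log((1 + s) / (1 - s)) / (4 * Math.PI)) * worldSize,
  };
}

// Step 2 for one image: its corner coordinates as pixel positions.
function cornersToPixels(corners, z) {
  return corners.map(({ lat, lon }) => toPixel(lat, lon, z));
}

console.log(toPixel(0, 0, 0)); // { x: 128, y: 128 }
```

Because every image is projected into the same global pixel space, step 4's relative positions fall out by subtracting the collection's minimum x and y.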

@jywarren
Member Author

@SidharthBansal @tech4GT @icarito, just so you see this additional note breaking down the export process: some portions could be accomplished with the traditional ImageMagick/GDAL combo, simply by breaking out the Ruby-controlled code in our codebase (see #296, but I'll copy in more here). However, I am hoping we can accomplish a lot in stand-alone containers, in a serverless or at least a remote REST model.

@tech4GT
Member

tech4GT commented Feb 7, 2019

Starting work now! @icarito Can you please share some of the resources you have been going through? That would be a big help for me :)

@tech4GT
Member

tech4GT commented Feb 7, 2019

@jywarren @icarito I'll start with a basic Express configuration that takes an image URL and a sequencer string and returns the final output. Can we create a repository for this on publiclab, or should I make it on my GitHub?

@tech4GT
Member

tech4GT commented Feb 7, 2019

Okay, a couple of things here:

  • We should add a flag to the run config that lets us disable the progress logs (otherwise they'll unnecessarily slow down the server)
  • I have some ideas in mind to speed up the pixelManipulation API, which will in turn speed up most modules
  • Should we return the output as a data URI or use the imgur service like we originally planned? Or maybe we can have a parameter in the request that allows both options:
/* Request Body */
{
  'url': <String>,       // URL of the input image
  'sequence': <String>,  // the sequence string to be imported into Image Sequencer
  'upload': <Boolean>    // whether to upload to imgur and return that URL (true) or return a data URI (false)
}
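A sketch of how such a request body might be checked before starting a potentially expensive export. The field names follow the proposal above; defaulting `upload` to `false` (return a data URI), the accepted URL schemes, and the error messages are all assumptions:

```javascript
// Hypothetical validator for the proposed body: { url, sequence, upload }.
// Returns { ok: true, value } or { ok: false, errors }.
function validateExportRequest(body) {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const errors = [];
  const { url, sequence, upload = false } = body; // assumption: default to data URI
  if (typeof url !== "string" || !isHttpOrDataUrl(url)) {
    errors.push("url must be an http(s) URL or an image data URL");
  }
  if (typeof sequence !== "string" || sequence.trim() === "") {
    errors.push("sequence must be a non-empty string");
  }
  if (typeof upload !== "boolean") {
    errors.push("upload must be a boolean");
  }
  return errors.length ? { ok: false, errors } : { ok: true, value: { url, sequence, upload } };
}

// Accept http(s) URLs and image data URLs; reject everything else.
function isHttpOrDataUrl(s) {
  if (s.startsWith("data:image/")) return true;
  try {
    const u = new URL(s);
    return u.protocol === "http:" || u.protocol === "https:";
  } catch {
    return false;
  }
}
```

An Express route (or any other HTTP handler) could call `validateExportRequest(req.body)` and respond with HTTP 400 and the `errors` list whenever `ok` is false.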

How does this sound @jywarren @icarito ??

@jywarren
Member Author

jywarren commented Feb 7, 2019 via email

@tech4GT
Member

tech4GT commented Feb 7, 2019

@jywarren Ok pushing the most basic setup now!

@tech4GT
Member

tech4GT commented Feb 7, 2019

@jywarren Can you please grant me push access to the repository 😅

@jywarren
Member Author

jywarren commented Feb 7, 2019 via email

@tech4GT
Member

tech4GT commented Feb 7, 2019

@jywarren One more thing, do you want me to get cracking on the optimizations for sequencer first or deploy the container first?

@jywarren
Member Author

jywarren commented Feb 7, 2019 via email

@tech4GT
Member

tech4GT commented Feb 7, 2019

Okay, I'll try to deploy the container with a very basic setup tomorrow, and then I'll open an issue for the optimizations, where I can document some of my ideas too!
Also, on a different note, I tried out the app locally and it works like a charm ✌️

@jywarren
Member Author

jywarren commented Feb 7, 2019 via email

@tech4GT
Member

tech4GT commented Feb 7, 2019

One thing I am concerned about, though: if we do switch to WebAssembly, which parts of the main code would we need to rewrite? Or should we just switch to something like OpenCV entirely?
I think we can start by making optimizations in JavaScript and then move towards WebAssembly if that gets unmanageable. What do you think?

@jywarren
Member Author

jywarren commented Feb 7, 2019 via email

@tech4GT
Member

tech4GT commented Feb 7, 2019

I think you are right. Also, please do have a look at the repository: I have pushed the basic file I wrote earlier today. I'll be extending this A LOT, but I think it gives us a start.

@icarito
Member

icarito commented Feb 8, 2019

Just a note that Google App Engine has a Standard Environment and a Flexible Environment, and Ruby seems to be supported only on the Flexible Environment, which is significantly more expensive: https://cloud.google.com/appengine/docs/standard/appengine-generation

@tech4GT
Member

tech4GT commented Feb 9, 2019

Oh, but is-app will be pure Node.js, so I guess that's not a problem!

@tech4GT
Member

tech4GT commented Mar 19, 2019

Also, I meant to ask whether Leaflet works in Node. I actually have no idea about that. Does it use gl?

@jywarren
Member Author

jywarren commented Mar 19, 2019 via email

@tech4GT
Member

tech4GT commented Mar 19, 2019

Okay, so do you want me to update import-image to work in Node, using a method similar to load-image?

@jywarren
Member Author

jywarren commented Mar 19, 2019 via email

@tech4GT
Member

tech4GT commented Mar 19, 2019

I mean without that there's no way of merging images anyway, so let's go with this for now!

@tech4GT
Member

tech4GT commented Mar 19, 2019

Okay, another thing: we need a way to figure out exactly where to overlay the images onto one another, e.g. if some image needs to be rotated first. Would it be possible for you to give me a simple sample to work this out? I tried doing this with the test dataset you mentioned, but it's kind of confusing. I was wondering if I can get a stripped-down version of that data on which I can figure this out.
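One way to frame the placement problem: once each image's four corners are known in a shared pixel space (after any rotation or distortion), its overlay position is the top-left of its axis-aligned bounding box, and the canvas must cover the union of all boxes so nothing is cut off. A sketch with illustrative names, assuming the corners are already in pixels:

```javascript
// Axis-aligned bounding box of a quadrilateral given as [{ x, y }, ...].
function bbox(corners) {
  const xs = corners.map((c) => c.x);
  const ys = corners.map((c) => c.y);
  return {
    minX: Math.min(...xs), minY: Math.min(...ys),
    maxX: Math.max(...xs), maxY: Math.max(...ys),
  };
}

// For each warped image: the offset at which to overlay it on a shared canvas,
// plus the canvas size needed so that no image is clipped.
function overlayLayout(images) {
  const boxes = images.map((img) => ({ id: img.id, box: bbox(img.corners) }));
  const minX = Math.min(...boxes.map((b) => b.box.minX));
  const minY = Math.min(...boxes.map((b) => b.box.minY));
  const maxX = Math.max(...boxes.map((b) => b.box.maxX));
  const maxY = Math.max(...boxes.map((b) => b.box.maxY));
  return {
    canvas: { width: Math.ceil(maxX - minX), height: Math.ceil(maxY - minY) },
    placements: boxes.map(({ id, box }) => ({
      id,
      x: Math.round(box.minX - minX),
      y: Math.round(box.minY - minY),
    })),
  };
}

// A square rotated 45 degrees ("a") and an axis-aligned rectangle ("b").
console.log(overlayLayout([
  { id: "a", corners: [{ x: 50, y: 0 }, { x: 100, y: 50 }, { x: 50, y: 100 }, { x: 0, y: 50 }] },
  { id: "b", corners: [{ x: 80, y: 80 }, { x: 180, y: 80 }, { x: 180, y: 130 }, { x: 80, y: 130 }] },
]).canvas); // { width: 180, height: 130 }
```

The rotation itself happens when each image is warped; after that, compositing only needs these offsets and the canvas size.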

@jywarren
Member Author

jywarren commented Mar 19, 2019 via email

@tech4GT
Member

tech4GT commented Mar 19, 2019

I'll check this out!

@tech4GT
Member

tech4GT commented Mar 20, 2019

Alright, so we have import-image working now. I'll try a workflow of combining images; right now overlay cuts the image off, but we can fix that in a stitch module later.

@jywarren
Member Author

Yes, or perhaps we can make a "resize-canvas" or "canvas-size" module; that might be useful later anyway.
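A `canvas-resize` step could be as simple as copying the pixels into a larger, transparent buffer at an offset, so a later overlay no longer clips the image. A sketch over raw RGBA arrays (the canvas `ImageData` layout); it does not follow Image Sequencer's module interface, and the function name and parameters are assumptions:

```javascript
// Place an RGBA image (width x height) into a larger transparent canvas
// (newWidth x newHeight) at (offsetX, offsetY). Pixels that would land
// outside the new canvas are dropped.
function resizeCanvas(pixels, width, height, newWidth, newHeight, offsetX = 0, offsetY = 0) {
  const out = new Uint8ClampedArray(newWidth * newHeight * 4); // zeros = transparent
  for (let y = 0; y < height; y++) {
    const ty = y + offsetY;
    if (ty < 0 || ty >= newHeight) continue;
    for (let x = 0; x < width; x++) {
      const tx = x + offsetX;
      if (tx < 0 || tx >= newWidth) continue;
      const src = (y * width + x) * 4;
      const dst = (ty * newWidth + tx) * 4;
      // Copy R, G, B and A for this pixel.
      for (let c = 0; c < 4; c++) out[dst + c] = pixels[src + c];
    }
  }
  return out;
}

// A single red pixel moved into the bottom-right of a 3 x 2 canvas.
const padded = resizeCanvas(new Uint8ClampedArray([255, 0, 0, 255]), 1, 1, 3, 2, 2, 1);
console.log(Array.from(padded.slice(20, 24))); // [ 255, 0, 0, 255 ]
```

Combined with per-image placement offsets, this gives every image a full-size, transparent-padded layer that can be overlaid without cropping.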

@jywarren
Member Author

Portions of the above in the Ruby workflow are coming online:

Thanks @icarito !!!

@tech4GT
Member

tech4GT commented Apr 4, 2019

@jywarren Since divy is looking into Puppeteer for the GL-based implementation, I was wondering what we should pick next?
I mean, on the JS track we are left with canvas-resize, GeoTIFF, and JSON conversion. Did I miss anything?

@jywarren
Member Author

jywarren commented Apr 4, 2019 via email

@tech4GT
Member

tech4GT commented Apr 5, 2019

Sure! I'll get started on canvas-resize then. 😄

@tech4GT
Member

tech4GT commented May 21, 2019

Okay, just to sum up: we have a basic exporter process running in gCloud, based on is-app.
Now the next steps would be:

  • Complete the function that converts the lat/lon coordinates in the JSON to the inputs for the sequencer module
  • Export the completed app as a Docker container and deploy it on cloud engine
  • Add a GeoTIFF generation module to Image Sequencer
  • Optimize the export process to make it faster.
    cc @jywarren @icarito

@jywarren
Member Author

This seems just right, although geotiff can wait, for sure! I think the key here is getting the canvas-resize, the overlay/compositing, and the whole thing running to generate a correct output for a given map, which need not include geotiff for now. And actually the optimization is important, but with more memory, we should be OK on that too, and can circle back to that later.

For reference, here's where we're testing the output of map exports in the Ruby-based parallel exporter project: publiclab/mapknitter-exporter-sinatra#23

Those maps and the warpables.json files they're based on are great for demonstrating that this system is working properly and for comparing the output to what MapKnitter shows!

@tech4GT
Member

tech4GT commented May 28, 2019

Update on the node.js exporter:

  • The export process is now using multiSequencer parallelized workflow!
  • The app is set up with docker, and is ready to be deployed and tested.
    cc @jywarren

@jywarren
Member Author

jywarren commented May 28, 2019 via email

@tech4GT
Member

tech4GT commented May 28, 2019

@jywarren I was also thinking of adding a benchmarking test like the one we have on the main Image Sequencer repository. That'll tell us how each of the optimizations we make affects the server response times!
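A benchmark along these lines could time repeated requests and report the median, which is less sensitive to outliers than the mean. A minimal sketch using Node's built-in `perf_hooks`; `bench` is a hypothetical name, and the example task only stands in for a real request to the exporter:

```javascript
const { performance } = require("perf_hooks");

// Time `runs` sequential executions of an async task and summarize them (ms).
async function bench(task, runs = 10) {
  if (!(runs >= 1)) throw new RangeError("runs must be at least 1");
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await task();
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  const mid = Math.floor(times.length / 2);
  // Median: middle value, or mean of the two middle values for even counts.
  const median = times.length % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
  return { runs, min: times[0], median, max: times[times.length - 1] };
}

// Example task: a stand-in for one request to the exporter.
bench(() => new Promise((resolve) => setTimeout(resolve, 20)), 5).then((stats) =>
  console.log(stats)
);
```

Running it before and after each optimization, on the same machine and input map, keeps the comparison meaningful.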

One more thing: what kind of workload can we expect on this service? I mean, is it possible we would be dealing with multiple requests at the same time?
Right now I have not scaled the app for multiple processors, and how we want to scale it will depend on our load.
If we can have concurrent requests, then I recommend using the cluster API to run a server instance on each core.
And if we are looking at a single request at a time, we can run the different steps on different cores to make the process even faster?
😄
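A minimal sketch of the cluster option, using Node's built-in `cluster` module: the primary process forks one worker per CPU core and the workers share the listening port. `startCluster`, `workerCount`, and the handler are hypothetical; `cluster.isPrimary` exists from Node 16 (older versions expose `cluster.isMaster`):

```javascript
const cluster = require("cluster");
const http = require("http");
const os = require("os");

// Number of workers: the requested count, capped at the number of CPU cores.
function workerCount(requested) {
  const cores = os.cpus().length || 1;
  return Math.max(1, Math.min(requested || cores, cores));
}

// Fork workers in the primary process; each worker serves `handler` on `port`.
function startCluster(port, handler, requested) {
  const isPrimary = cluster.isPrimary ?? cluster.isMaster;
  if (isPrimary) {
    for (let i = 0; i < workerCount(requested); i++) cluster.fork();
    // Replace workers that exit so serving capacity stays constant.
    cluster.on("exit", (worker, code) => {
      console.log(`worker ${worker.process.pid} exited (${code}); restarting`);
      cluster.fork();
    });
  } else {
    http.createServer(handler).listen(port);
  }
}

// Possible usage (not run here):
// startCluster(8080, (req, res) => res.end("ok"));
```

This matters less if requests are already load balanced across several containers by the hosting infrastructure, in which case one process per container may be enough.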

@jywarren
Member Author

jywarren commented May 28, 2019 via email

@tech4GT
Member

tech4GT commented May 28, 2019

@jywarren Could you please also respond to the query I mentioned above! Thanks a ton! :)

@jywarren
Member Author

We should be able to load balance a cluster using the infrastructure Sebastian has made, yes!

@tech4GT
Member

tech4GT commented May 28, 2019

Oh! In that case I don't have to worry about it! Awesome!! Thanks a lot!! 😃

@jywarren
Member Author

jywarren commented May 28, 2019 via email

@jywarren
Member Author

Getting closer in #1192

Projects
None yet
Development

No branches or pull requests

5 participants