
Consistency between local and distributed completion service file specifications #397

Open
d1donlydfink opened this issue Feb 21, 2020 · 0 comments


d1donlydfink commented Feb 21, 2020

When running distributed completion service jobs against a data center with Minio S3 emulation in it, I must specify my file endpoints as s3://<host>:<port>/<bucket>/<path>, otherwise nothing works.

When running a similar job locally, I instead need to specify my file endpoints as http://<host>/<bucket>/<path>


There are really two problems here, both having to do with the consistency of file specification between local and distributed worker execution in the studio Python library:

  1. When doing local runs (everything entirely within the studioml Python library), I cannot specify s3://-style file paths, and thus I am not able to use the same Minio S3 emulation for my local runs at all. What you get is an exception ending with:

     File "/home/danfink/venv/enn-3.6/lib/python3.6/site-packages/studio/util.py", line 333, in _get_active_s3_client
       raise NotImplementedError("Artifact store is not set up or has the wrong type")

  2. When doing distributed runs in a data center (using Minio and GoRunner), I cannot specify http://-style file paths at all to point at real S3 when the studio database/storage requires the AWS credentials of Minio, even when the http://-style file reference points to something on S3 that requires no credentials to access. (For example, I can wget it just fine, but somehow GoRunner doesn't like it in the context of fetching a file for a job running in the data center.)
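To illustrate problem 1: for a local worker to accept the same references as the distributed path, it would need to recognize the endpoint-qualified s3://<host>:<port>/<bucket>/<path> form and translate it into an endpoint URL plus bucket/key for an S3 client. A minimal sketch of that translation (the function name, return shape, and the assumption of an http endpoint are all hypothetical, not studioml API):

```python
from urllib.parse import urlparse

def parse_minio_s3_ref(ref):
    """Split an endpoint-qualified s3://<host>:<port>/<bucket>/<path>
    reference into the pieces an S3 client needs: endpoint URL, bucket,
    and object key. Hypothetical helper for illustration only."""
    parsed = urlparse(ref)
    if parsed.scheme != "s3":
        raise ValueError("expected an s3:// reference, got %r" % ref)
    # netloc carries the host[:port] of the Minio endpoint; whether it is
    # reached over http or https would need to be configured -- http assumed
    endpoint = "http://" + parsed.netloc
    # first path segment is the bucket, the remainder is the object key
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return endpoint, bucket, key

print(parse_minio_s3_ref("s3://minio.local:9000/mybucket/data/train.npz"))
```

The resulting endpoint would then be handed to an S3 client (e.g. boto3's `endpoint_url` parameter) together with the Minio credentials, which is presumably what the distributed path already does internally.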

Ideally, what I am really looking for is consistency between the two types of runs.
I would love to be able to set my files to either s3:// style (to access the data center)
or http:// style (to access real S3) -- set it and forget it -- while still being able to switch back and forth between local and distributed workers.

Pure conjecture as to some potential hurdles to this:

  • the studioml Python libs that handle the local workers probably just do not know how to deal with s3://-style file references at all, which would be needed for credentialed access to S3/Minio
  • GoRunner for distributed execution might need to make a distinction between s3:// and http://, where s3:// is maybe always something that goes through an AWS API and http:// is maybe something that never really requires credentials?
  • It's possible that the current conventions for GoRunner data centers do not allow firewall holes for regular S3 access, even for plain-old http/wget access to open-access buckets
  • There is nothing within the completion file dictionary spec that allows for multiple credentials (separate issue filed for that)
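The scheme distinction conjectured in the second bullet could be sketched as a simple dispatch on the URL scheme. This is only a sketch of the conjectured rule (s3:// always means credentialed access through an S3 API, http(s):// always means a plain anonymous download); neither rule is confirmed GoRunner behavior:

```python
from urllib.parse import urlparse

def access_mode(ref):
    """Decide how a runner would fetch a file reference based on its
    scheme. Sketch of a conjectured rule only, not actual GoRunner logic."""
    scheme = urlparse(ref).scheme
    if scheme == "s3":
        # would go through an S3 API client with AWS/Minio credentials
        return "s3-api-with-credentials"
    if scheme in ("http", "https"):
        # would be a plain GET with no credentials (e.g. open-access buckets)
        return "plain-http-anonymous"
    raise ValueError("unsupported scheme: %r" % scheme)

print(access_mode("s3://minio.local:9000/bucket/obj"))
print(access_mode("http://mybucket.s3.amazonaws.com/obj"))
```

If both the Python library and GoRunner applied the same dispatch rule, the two run modes could accept identical file specifications.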

Feel free to split this up into multiple issues if that is the best way to attack the problems listed here, but again, what I am really looking for is file-specification consistency between the two main modes of running completion service jobs.

It's also possible that #381 addresses part of this, but in my recent testing with studioml==0.0.15, it's not all there yet.
