Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker build #815

Closed
wants to merge 7 commits into from
Closed

Docker build #815

wants to merge 7 commits into from

Conversation

ccharest93
Copy link
Contributor

Added a docker API client, and changed the build function to its low-level version. This makes it return an event stream that we can then log to screen for real-time update on the image build process. This prints the same things as running docker build locally, so docker RUN commands can get quite verbose, we could filter the 'stream' events and only print the ones starting with 'Step ' or '---> ' .

Also changed the log message from api.py to make more sense (encompasses the case where dockerfile.torchx is used).

#813

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 3, 2024
@ccharest93
Copy link
Contributor Author

ccharest93 commented Feb 6, 2024

  • Fixed the pyre typing errors
  • Unitest are failling because of codecov failing to upload?
  • Slurm integration stuck in post

I also removed a redundant error case in my commit ( next() doesnt raise ValueError) and renamed output to build_msg.

@ccharest93 ccharest93 marked this pull request as draft February 11, 2024 12:49
@ccharest93
Copy link
Contributor Author

ccharest93 commented Feb 11, 2024

My previous commit #809 was flawed, it didn't change logic cause if it did, it would introduce a bug. Ill fix it in a different commit, in the meantime this commit is in draft mode.

@d4l3k
Copy link
Contributor

d4l3k commented Apr 9, 2024

@ccharest93 are you interested in polishing this up and landing it? cc @clumsy

if len(old_imgs) == 0 or role.image not in old_imgs:
role.image = image.id
build_msg: Dict[str, str] = {}
while True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use a simple for loop?

for build_msg in build_events:
    if "stream" in build_msg:
        output_str = build_msg["stream"].strip("\r\n").strip("\n")
        if output_str:
            log.info(output_str)
    if "aux" in build_msg:
        image_id = build_msg["aux"]["ID"]
        break

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build events is a stream, until stop_iteration is given we need to wait for further input on the stream. A for_loop wouldn't wait for the stream to complete and would only print events up to this point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it how Iterables/generators work in for loop in Python? If you give a for loop an Iterable/generator (something that has __next__()) it will call __next__() until it faces StopIteration and exists the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem to be working fine, please check the latest version of #874

@@ -119,7 +131,7 @@ def build_workspace_and_update_role(
f"failed to pull image {role.image}, falling back to local: {e}"
)
log.info("Building workspace docker image (this may take a while)...")
image, _ = self._docker_client.images.build(
build_events = self._docker_api_client.build(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we simply use the second value in the tuple that self._docker_client.images.build() returns? It's the same build_events Iterable, isn't it? Then we don't have to copy the logic to retrieve the image id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it done so that we can print all the output leading to the error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so we don't seem to print error events

Copy link
Contributor Author

@ccharest93 ccharest93 Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i recall correctly, the first call is blocking and returns all the events at the same time, the second isnt and returns a stream which we can then print/process in real time

@ccharest93
Copy link
Contributor Author

Ok i cleaned up the commit, i still need to rebuild my devcontainer to run tests and fix any errors that show up, will do this weekend. Would still like feedback on my original post though. Don't want to end up cluttering the torchx logs too much is my opinion.

This prints the same things as running docker build locally, so docker RUN commands can get quite verbose, we could filter the 'stream' events and only print the ones starting with 'Step ' or '---> ' .

@clumsy
Copy link
Contributor

clumsy commented Apr 10, 2024

@ccharest93 adding quiet scheduler_arg should help suppressing where it's needed.

@ccharest93
Copy link
Contributor Author

This was landed in #874 right? I can close this PR? @clumsy

@ccharest93 ccharest93 closed this Apr 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants