Shrink size of terra-jupyter-base #333

Open
SHuang-Broad opened this issue Jun 15, 2022 · 8 comments

Comments

@SHuang-Broad

Currently, the image is 8.4 GB:

https://console.cloud.google.com/gcr/images/broad-dsp-gcr-public/US/terra-jupyter-base@sha256:8b8da2a3ac90e04015694b0fae20518eb38db4a5f7bc18144d3fbf81e8d27066/details?tag=latest

Pulling it takes a long time (on my configuration, it took over 25 minutes, though I expect it to be faster on GCP's network). This eats into the 30-minute limit for creating a custom environment on Terra.
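
As a rough back-of-the-envelope check (a sketch, not a measurement): 8.4 GB in 25 minutes works out to about 5.6 MB/s on average. Since the pull took over 25 minutes, that figure is an upper bound on the effective throughput. The helper names below are made up for illustration, and 1 GB is taken as 1000 MB.

```shell
# Hypothetical helpers for back-of-the-envelope pull-time estimates.
# Assumes 1 GB = 1000 MB; ignores decompression and layer extraction time.

# implied_mb_per_s SIZE_GB MINUTES -> average throughput in MB/s
implied_mb_per_s() {
  awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / (min * 60) }'
}

# pull_minutes SIZE_GB MB_PER_S -> estimated pull time in minutes
pull_minutes() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 60 }'
}

implied_mb_per_s 8.4 25   # 8.4 GB in 25 minutes; prints 5.6
pull_minutes 8.4 5.6      # the same image at 5.6 MB/s; prints 25.0
```

At that rate, the pull alone leaves roughly 5 of the 30 minutes for anything else.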

Thank you.

@sjfleming

I agree with this sentiment. Even if pulling the image does not eat into the time limit (somehow), the image is just too large. It's hard to develop on top of this as a base image when it is so huge.

Maybe the solution is to have a "developer" base image that is totally minimal?

@Qi77Qi
Collaborator

Qi77Qi commented Jun 15, 2022

Thanks for reporting. The image is big because we're using https://cloud.google.com/deep-learning-containers/docs/choosing-container, and these images are around 10 GB themselves.

We do want to improve this somehow, but it's not currently prioritized.

@SHuang-Broad
Author

To follow up, I just ran an experiment using us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:latest as the custom image, with the following startup script:

#!/bin/bash

set -eu

echo "test"

It timed out.

Admittedly, this is just one data point. But it's consistent with my previous experience with other startup scripts: one that only installed samtools and bcftools took close to 30 minutes to finish.
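
One way to tell how much of the 30 minutes goes to the script itself, rather than to pulling the image, is to log each step's duration from inside the startup script. A minimal sketch: `time_step` is a hypothetical helper, not something Terra provides, and it assumes the script's stderr ends up somewhere readable.

```shell
# Hypothetical helper: run a command and log how long it took (whole seconds).
# Uses `date +%s` so it also works in shells without bash's $SECONDS.
# Note: it sets the plain variables label, start, end and status.
time_step() {
  label="$1"
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?          # capture the exit code even under `set -e`
  end=$(date +%s)
  echo "${label}: $((end - start))s (exit ${status})" >&2
  return "$status"
}

# Example: time the trivial step from the experiment above.
time_step "echo test" echo "test"
```

If every step logs only a few seconds and the environment still times out, the overhead is coming from before the script runs.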

We could build a custom image based on us.gcr.io/broad-dsp-gcr-public/terra-jupyter-base:latest, but it will inevitably be bigger and may time out easily, given this large initial overhead.

Can you please help with this?

@SHuang-Broad
Author

I see.

It's conceivable that some of us don't need the DL functionality but do need some of the Jupyter-related tooling, which is what @sjfleming suggested.

So I think his suggestion makes sense, i.e. to truly make this a "base" image, since, IMHO, my base case doesn't need GPU/DL.

That being said, I understand it's not a priority for you, so we'd appreciate instructions for a PR to make that happen.

@sjfleming

sjfleming commented Jul 27, 2022

Just to add another observation: us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:2.0.5 is currently 20.1 GB.
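
To see where the size comes from, an image can be inspected locally after pulling it. A sketch, assuming Docker is installed: `bytes_to_gb` and `image_size_gb` are hypothetical helpers, and the size Docker reports locally is for the unpacked layers, so it may not match the figure shown in the registry console.

```shell
# Hypothetical helper: convert a byte count to decimal gigabytes (1 GB = 10^9 bytes).
bytes_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1e9 }'
}

# image_size_gb IMAGE -> size of a locally pulled image in GB.
image_size_gb() {
  bytes_to_gb "$(docker image inspect --format '{{.Size}}' "$1")"
}

# Usage (requires Docker and an already pulled image):
#   image_size_gb us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:2.0.5
#   docker history us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:2.0.5   # per-layer sizes
```

The per-layer view from `docker history` is usually the quickest way to spot which build steps contribute most.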

@sjfleming

And @Qi77Qi, as @SHuang-Broad mentioned, I would also be happy to contribute a PR to make this happen, if we know what the requirements for an acceptable change would be.

@Qi77Qi
Collaborator

Qi77Qi commented Jul 27, 2022

@sjfleming It is something we want to address at some point, but we really haven't had the bandwidth. If you don't mind taking a stab at it, would you mind writing a design doc before you attempt an implementation, so that we can review the proposal and give feedback before you spend too much time on this?

@sjfleming

Hi @Qi77Qi, thanks for your response. I can write a design doc; however, having zero experience with design docs, I wonder if you could point me to an example! :)
Thanks!


3 participants