
Pulling images fails when image is bigger #242

Open
woonghu opened this issue Feb 1, 2019 · 7 comments

woonghu commented Feb 1, 2019

Hi. I want to download and convert some Docker images to Shifter images.
Pulling works when the image is smaller than about 4 GB, but it fails when the image is bigger than that.

There are endless "PULLING" messages and the pull never finishes, like this:

Message: {
"ENTRY": "MISSING",
"ENV": "MISSING",
"WORKDIR": "MISSING",
"groupACL": [],
"id": "MISSING",
"itype": "docker",
"last_pull": 1549005472.860614,
"status": "PULLING",
"status_message": "Extracting Layers",
"system": "mycluster",
"tag": [],
"userACL": []
}
2019-02-01T07:58:59 Pulling Image: docker:image_name:1.0.0-12, status: PULLING

Here are the error and access logs from the same pull:

==> error.log <==
[2019-02-01 07:59:37 +0000] [51665] [DEBUG] Closing connection.
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] pull system=mycluster imgtype=docker tag=si-swhong:1.0.0-12
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {'tag': u'si-swhong:1.0.0-12', 'itype': u'docker', 'system': u'mycluster'}
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {'magic': 'imagemngrmagic', 'uid': 0, 'system': u'mycluster', 'tokens': {u'soe-db1:5000': u'u:p', u'default': u'u:p'}, 'gid': 0, 'user': 'root', 'group': 'root'}
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] Pull called Test Mode=0
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {u'status': u'PULLING', u'ostcount': u'0', u'itype': u'docker', u'format': u'squashfs', u'last_heartbeat': 1549005477.959195, u'os': u'linux', u'groupACL': [], u'system': u'mycluster', u'private': None, u'status_message': u'Extracting Layers', u'pulltag': u'si-swhong:1.0.0-12', u'replication': u'1', u'tag': [], u'userACL': [], u'location': u'', u'last_pull': 1549005472.860614, u'remotetype': u'dockerv2', u'_id': ObjectId('5c53f2a0227509ba7a533871'), u'arch': u'amd64'}

...

[2019-02-01 08:19:08 +0000] [51691] [DEBUG] Closing connection.
[2019-02-01 08:19:09 +0000] [51659] [CRITICAL] WORKER TIMEOUT (pid:51666)
[2019-02-01 08:19:09 +0000] [51666] [WARNING] 1
[2019-02-01 08:19:09 +0000] [51666] [ERROR] ERROR: dopull failed system=mycluster tag=si-swhong:1.0.0-12
[2019-02-01 08:19:09 +0000] [51666] [INFO] Worker exiting (pid: 51666)
[2019-02-01 08:19:09 +0000] [51685] [WARNING] Operation failed for 5c5400b6227509c9d2e26974
[2019-02-01 08:19:09 +0000] [51685] [INFO] Shutting down Status Thread
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] pull system=mycluster imgtype=docker tag=si-swhong:1.0.0-12
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {'tag': u'si-swhong:1.0.0-12', 'itype': u'docker', 'system': u'mycluster'}
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {'magic': 'imagemngrmagic', 'uid': 0, 'system': u'mycluster', 'tokens': {u'soe-db1:5000': u'u:p', u'default': u'u:p'}, 'gid': 0, 'user': 'root', 'group': 'root'}
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] Pull called Test Mode=0
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {u'status': u'FAILURE', u'ostcount': u'0', u'itype': u'docker', u'format': u'squashfs', u'last_heartbeat': 1549009149.245521, u'os': u'linux', u'groupACL': [], u'system': u'mycluster', u'private': None, u'status_message': u'FAILURE', u'pulltag': u'si-swhong:1.0.0-12', u'replication': u'1', u'tag': [], u'userACL': [], u'location': u'', u'last_pull': 1549009078.383216, u'remotetype': u'dockerv2', u'_id': ObjectId('5c5400b6227509c9d2e26974'), u'arch': u'amd64'}

==> access.log <==
127.0.0.1 - - [01/Feb/2019:07:59:37 +0000] "POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/ HTTP/1.1" 200 290 "-" "-"

I have tried upgrading gunicorn (to 19.9) and installing gevent (1.3.6), but that did not help.

ExecStart=/usr/bin/gunicorn \
    -b 0.0.0.0:6000 --backlog 2048 \
    --log-level=debug \
    --access-logfile=/var/log/shifter_imagegw/access.log \
    --log-file=/var/log/shifter_imagegw/error.log \
    --timeout 60 \
    --workers 4 \
    --threads 4 \
    --worker-class=gevent \
    shifter_imagegw.api:app

How can I resolve this problem?


scanon commented Feb 1, 2019

Which version are you running? Specifically, is this the version that still uses Celery? There is a timeout parameter that needs to be boosted.


woonghu commented Feb 1, 2019

I’m using 18.03.
Do you mean the ‘PullUpdateTimeout’ parameter?
I set it to 3600, but it does not help.


scanon commented Feb 3, 2019

I had to remind myself how I had fixed this before. The issue is with the gunicorn timeout.
I just submitted a PR for this, but you can take a look at it to see what has to be changed.

#243

You just need to modify the service script to add the -t 3600 option. That gives it an hour, which should be enough even for very large images.
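Applied to the ExecStart posted above, the change would look roughly like this (only the timeout value changes; gunicorn's -t and --timeout are the same option):

ExecStart=/usr/bin/gunicorn \
    -b 0.0.0.0:6000 --backlog 2048 \
    --log-level=debug \
    --access-logfile=/var/log/shifter_imagegw/access.log \
    --log-file=/var/log/shifter_imagegw/error.log \
    --timeout 3600 \
    --workers 4 \
    --threads 4 \
    --worker-class=gevent \
    shifter_imagegw.api:app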


woonghu commented Feb 3, 2019

Thank you for the response.

I have another question.

Why does pulling and converting an image take so much more time than Docker?

Is there a way to improve that?

Happy new year :)


scanon commented Feb 11, 2019

I meant to reply to this sooner.

Shifter has to do the expansion and squash on each fresh pull. It does cache the layers, but it has to re-unzip each layer to build the squash image. I have noticed that the unzip for some layers can be very slow but have never been able to get to the bottom of it. I think it has something to do with the zip Python library we use and how we are using it.
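As a rough sketch of that flow (this is not Shifter's actual code; the function name, paths, and mksquashfs invocation are only illustrative), the expand-and-squash step looks conceptually like this:

# Conceptual sketch only -- not Shifter's implementation. Each cached layer
# (a gzipped tar) is decompressed and extracted into an expand directory,
# then the expanded tree is squashed into a single image file.
import subprocess
import tarfile
from pathlib import Path

def expand_and_squash(layer_tarballs, expand_dir, squash_file):
    expand_dir = Path(expand_dir)
    expand_dir.mkdir(parents=True, exist_ok=True)

    # The per-layer decompression below is what dominates pull time: even
    # when the compressed layers are already cached, each one has to be
    # re-unzipped on every fresh pull before the squashfs can be built.
    for layer in layer_tarballs:
        with tarfile.open(layer, mode="r:gz") as tar:
            tar.extractall(path=expand_dir)

    # Squash the expanded tree into one image file.
    subprocess.check_call(["mksquashfs", str(expand_dir), str(squash_file)])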

I would recommend using a fast file system for the temporary space where the API/worker runs. If it is a large memory node (> ~32 GB), you can even use /dev/shm for the expand directory. This can help to some degree.

Let me know if you need the exact parameter to adjust.


woonghu commented Feb 11, 2019

Thank you, Canon.

Your answer is very helpful.

Would you give me the exact parameter to adjust?


scanon commented Feb 11, 2019

Look at this line in the example config file. You just need to change that to /dev/shm or some other location that is on fast local storage or RAM.

https://github.com/NERSC/shifter/blob/master/imagegw/imagemanager.json.example#L12
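Assuming the linked line is the ExpandDirectory entry in imagemanager.json (worth double-checking against your local copy of the example file), the change is a one-line edit along these lines; the path shown is just a placeholder:

    "ExpandDirectory": "/dev/shm/imagegw/expand/",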
