
Pulling images fails when image is bigger #242

Open
woonghu opened this issue Feb 1, 2019 · 7 comments

woonghu commented Feb 1, 2019

Hi. I want to download and convert some Docker images to Shifter images.
Pulling works when the image is smaller than about 4 GB, but it fails when the image is bigger than that.

There are endless "PULLING" messages and the pull never finishes, like this:

Message: {
"ENTRY": "MISSING",
"ENV": "MISSING",
"WORKDIR": "MISSING",
"groupACL": [],
"id": "MISSING",
"itype": "docker",
"last_pull": 1549005472.860614,
"status": "PULLING",
"status_message": "Extracting Layers",
"system": "mycluster",
"tag": [],
"userACL": []
}
2019-02-01T07:58:59 Pulling Image: docker:image_name:1.0.0-12, status: PULLING

Here are the error and access logs from the same pull:

==> error.log <==
[2019-02-01 07:59:37 +0000] [51665] [DEBUG] Closing connection.
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] pull system=mycluster imgtype=docker tag=si-swhong:1.0.0-12
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {'tag': u'si-swhong:1.0.0-12', 'itype': u'docker', 'system': u'mycluster'}
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {'magic': 'imagemngrmagic', 'uid': 0, 'system': u'mycluster', 'tokens': {u'soe-db1:5000': u'u:p', u'default': u'u:p'}, 'gid': 0, 'user': 'root', 'group': 'root'}
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] Pull called Test Mode=0
[2019-02-01 07:59:37 +0000] [51691] [DEBUG] {u'status': u'PULLING', u'ostcount': u'0', u'itype': u'docker', u'format': u'squashfs', u'last_heartbeat': 1549005477.959195, u'os': u'linux', u'groupACL': [], u'system': u'mycluster', u'private': None, u'status_message': u'Extracting Layers', u'pulltag': u'si-swhong:1.0.0-12', u'replication': u'1', u'tag': [], u'userACL': [], u'location': u'', u'last_pull': 1549005472.860614, u'remotetype': u'dockerv2', u'_id': ObjectId('5c53f2a0227509ba7a533871'), u'arch': u'amd64'}

...

[2019-02-01 08:19:08 +0000] [51691] [DEBUG] Closing connection.
[2019-02-01 08:19:09 +0000] [51659] [CRITICAL] WORKER TIMEOUT (pid:51666)
[2019-02-01 08:19:09 +0000] [51666] [WARNING] 1
[2019-02-01 08:19:09 +0000] [51666] [ERROR] ERROR: dopull failed system=mycluster tag=si-swhong:1.0.0-12
[2019-02-01 08:19:09 +0000] [51666] [INFO] Worker exiting (pid: 51666)
[2019-02-01 08:19:09 +0000] [51685] [WARNING] Operation failed for 5c5400b6227509c9d2e26974
[2019-02-01 08:19:09 +0000] [51685] [INFO] Shutting down Status Thread
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] pull system=mycluster imgtype=docker tag=si-swhong:1.0.0-12
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {'tag': u'si-swhong:1.0.0-12', 'itype': u'docker', 'system': u'mycluster'}
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {'magic': 'imagemngrmagic', 'uid': 0, 'system': u'mycluster', 'tokens': {u'soe-db1:5000': u'u:p', u'default': u'u:p'}, 'gid': 0, 'user': 'root', 'group': 'root'}
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] Pull called Test Mode=0
[2019-02-01 08:19:09 +0000] [51691] [DEBUG] {u'status': u'FAILURE', u'ostcount': u'0', u'itype': u'docker', u'format': u'squashfs', u'last_heartbeat': 1549009149.245521, u'os': u'linux', u'groupACL': [], u'system': u'mycluster', u'private': None, u'status_message': u'FAILURE', u'pulltag': u'si-swhong:1.0.0-12', u'replication': u'1', u'tag': [], u'userACL': [], u'location': u'', u'last_pull': 1549009078.383216, u'remotetype': u'dockerv2', u'_id': ObjectId('5c5400b6227509c9d2e26974'), u'arch': u'amd64'}

==> access.log <==
127.0.0.1 - - [01/Feb/2019:07:59:37 +0000] "POST /api/pull/mycluster/docker/si-swhong%3A1.0.0-12/ HTTP/1.1" 200 290 "-" "-"

I have tried upgrading gunicorn (to 19.9) and installing gevent (1.3.6), but that did not help.

ExecStart=/usr/bin/gunicorn \
    -b 0.0.0.0:6000 --backlog 2048 \
    --log-level=debug \
    --access-logfile=/var/log/shifter_imagegw/access.log \
    --log-file=/var/log/shifter_imagegw/error.log \
    --timeout 60 \
    --workers 4 \
    --threads 4 \
    --worker-class=gevent \
    shifter_imagegw.api:app

How can I resolve this problem?


scanon commented Feb 1, 2019

Which version are you running? Specifically, is this the version that still uses Celery? There is a timeout parameter that needs to be boosted.


woonghu commented Feb 1, 2019

I’m using 18.03.
Do you mean the ‘PullUpdateTimeout’ parameter?
I set it to 3600, but it does not help.


scanon commented Feb 3, 2019

I had to remind myself how I had fixed this before. The issue is with the gunicorn timeout.
I just submitted a PR for this, but you can take a look at it to see what has to be changed.

#243

You just need to modify the service script to add the -t 3600 option. That gives it an hour, which should be enough even for very large images.
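Applied to the ExecStart posted above, the change would look roughly like this (only the timeout value changes; gunicorn's -t and --timeout are the same option):

ExecStart=/usr/bin/gunicorn \
    -b 0.0.0.0:6000 --backlog 2048 \
    --log-level=debug \
    --access-logfile=/var/log/shifter_imagegw/access.log \
    --log-file=/var/log/shifter_imagegw/error.log \
    --timeout 3600 \
    --workers 4 \
    --threads 4 \
    --worker-class=gevent \
    shifter_imagegw.api:app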


woonghu commented Feb 3, 2019

Thank you for the response.

I have another question.

Why does pulling and converting an image take so much more time than Docker?

Is there a way to improve that?

Happy new year :)


scanon commented Feb 11, 2019

I meant to reply to this sooner.

Shifter has to do the expansion and squash on each fresh pull. It does cache the layers, but it has to re-unzip each layer to build the squash image. I have noticed that the unzip for some layers can be very slow but have never been able to get to the bottom of it. I think it has something to do with the zip Python library we use and how we are using it.
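As a rough sketch of that flow (this is not Shifter's actual code; the function name, paths, and mksquashfs invocation are only illustrative), the expand-and-squash step looks conceptually like this:

# Conceptual sketch only -- not Shifter's implementation. Each cached layer
# (a gzipped tar) is decompressed and extracted into an expand directory,
# then the expanded tree is squashed into a single image file.
import subprocess
import tarfile
from pathlib import Path

def expand_and_squash(layer_tarballs, expand_dir, squash_file):
    expand_dir = Path(expand_dir)
    expand_dir.mkdir(parents=True, exist_ok=True)

    # The per-layer decompression below is what dominates pull time: even
    # when the compressed layers are already cached, each one has to be
    # re-unzipped on every fresh pull before the squashfs can be built.
    for layer in layer_tarballs:
        with tarfile.open(layer, mode="r:gz") as tar:
            tar.extractall(path=expand_dir)

    # Squash the expanded tree into one image file.
    subprocess.check_call(["mksquashfs", str(expand_dir), str(squash_file)])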

I would recommend using a fast file system for the temporary space where the API/worker runs. If it is a large memory node (> ~32 GB), you can even use /dev/shm for the expand directory. This can help to some degree.

Let me know if you need the exact parameter to adjust.


woonghu commented Feb 11, 2019

Thank you, Canon.

Your answer is very helpful.

Would you give me the exact parameter to adjust?


scanon commented Feb 11, 2019

Look at this line in the example config file. You just need to change that to /dev/shm or some other location that is on fast local storage or RAM.

https://github.com/NERSC/shifter/blob/master/imagegw/imagemanager.json.example#L12
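Assuming the linked line is the ExpandDirectory entry in imagemanager.json (worth double-checking against your local copy of the example file), the change is a one-line edit along these lines; the path shown is just a placeholder:

    "ExpandDirectory": "/dev/shm/imagegw/expand/",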
