MAINT: use Github based GPU instance #181

mmcky · 2024-05-09T04:51:21Z

This PR makes use of the Tesla T4 instance now available on GitHub Actions as a beta instance.

- uses GitHub Actions-supplied instances
- deploys from GitHub containers for our Docker environment to speed up builds

This migrates the following workflows to run on GitHub Actions:

- ci
- publish
- cache

netlify · 2024-05-09T04:51:37Z

✅ Deploy Preview for incomparable-parfait-2417f8 ready!

Name	Link
🔨 Latest commit	`a672c97`
🔍 Latest deploy log	https://app.netlify.com/sites/incomparable-parfait-2417f8/deploys/666a2c387abc8e0008e222c7
😎 Deploy Preview	https://deploy-preview-181--incomparable-parfait-2417f8.netlify.app

github-actions · 2024-05-09T05:03:27Z

🚀 Deployed on https://666a36597c14415a37c64edb--incomparable-parfait-2417f8.netlify.app

mmcky · 2024-05-09T05:14:43Z

- re-enable build cache
- remove Docker dependency and test local builds using Anaconda (simpler)

mmcky · 2024-05-09T06:01:49Z

Timing results: EC2 (left) vs. GitHub Actions (right)

@jstac there is a really interesting mix of timing results here between the V100 on EC2 and the T4 on GitHub. Many times are lower with a few exceptions such as wealth_dynamics. I will try and understand the root causes.

mmcky · 2024-05-09T07:47:51Z

Just triggered a new publish so we are comparing like for like. (https://github.com/QuantEcon/lecture-jax/releases/tag/publish-2024may09)

kp992 · 2024-05-09T10:59:55Z

This is interesting @mmcky! Now that we are using GA's GPU on trial, shall we compare the costs we were having on AWS with those on GA? We may need to figure out how we are going to compare the costs, since I believe it will depend on the frequency of commit pushes.

mmcky · 2024-05-09T23:06:15Z


that's right @kp992 -- the pricing is:

Service	Cost	Units
EC2 (`p2.xlarge`)	$0.90	per instance-hour
GA (`Ubuntu GPU 4-core`)	$0.07	per minute

So for a 10-minute job:

Service	Cost
EC2 (`p2.xlarge`)	$0.90
GA (`Ubuntu GPU 4-core`)	$0.70

So the pricing really depends on the frequency of long runs vs. short runs. Honestly, while the per-hour price on GA is a lot higher ($0.07 × 60 = $4.20 per hour, vs. $0.90 on EC2), I think it will work out to be pretty similar.
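To see where the break-even point sits, here is a minimal Python sketch using the two rates quoted above. It assumes, as the 10-minute example implies, that an EC2 job is charged for every started instance-hour, and that GitHub rounds each job up to the next whole minute. The helper names `ec2_cost`, `ga_cost` and `break_even_minutes` are hypothetical.

```python
import math

# Rates quoted in the pricing table above (USD).
EC2_P2_XLARGE_PER_HOUR = 0.90   # assumption: billed per started instance-hour
GA_GPU_4CORE_PER_MINUTE = 0.07  # assumption: billed per started minute


def ec2_cost(minutes: float) -> float:
    """Cost of a single job on EC2, charging every started hour in full."""
    return math.ceil(minutes / 60) * EC2_P2_XLARGE_PER_HOUR


def ga_cost(minutes: float) -> float:
    """Cost of a single job on the GitHub-hosted GPU runner."""
    return math.ceil(minutes) * GA_GPU_4CORE_PER_MINUTE


def break_even_minutes() -> int:
    """Longest whole-minute job for which the GitHub runner is still cheaper."""
    minutes = 1
    while ga_cost(minutes + 1) < ec2_cost(minutes + 1):
        minutes += 1
    return minutes


for job in (10, 12, 13, 45):
    print(f"{job:>3} min: EC2 ${ec2_cost(job):.2f}  GA ${ga_cost(job):.2f}")
print("GA is cheaper for jobs up to", break_even_minutes(), "minutes")
```

Under these assumptions the GitHub runner is cheaper only for jobs of 12 minutes or less (a 13-minute job already costs $0.91 vs. $0.90), so the overall bill hinges on how many long full builds run compared with short ones.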

mmcky · 2024-05-09T23:11:02Z

@kp992 this is the like-for-like time comparison now with the current live site.

There is still an interesting mix of performance differences.

Machine Details:

EC2:

Name	GPUs	vCPUs	RAM (GiB)	Network Bandwidth	On-Demand Price/Hour	Reserved Instance Price/Hour
p2.xlarge	1	4	61	High	$0.900	$0.425

GitHub:

CPU	GPU	GPU card	Memory (RAM)	GPU memory (VRAM)	Storage (SSD)	Operating system (OS)
4	1	Tesla T4	28 GB	16 GB	176 GB	Ubuntu, Windows

So it appears we are running on a machine with less RAM (28 GB vs. 61 GiB), which is interesting.
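The two tables quote memory in different units (GiB for EC2, GB for GitHub). A small sketch like the following converts between them and logs how much physical memory the runner actually exposes. It assumes a Linux/POSIX host (the GitHub runner is Ubuntu); `gib_to_gb` and `host_ram_bytes` are hypothetical helpers.

```python
import os

BYTES_PER_GIB = 1024 ** 3  # binary unit used in the EC2 table
BYTES_PER_GB = 1000 ** 3   # decimal unit


def gib_to_gb(gib: float) -> float:
    """Convert gibibytes to decimal gigabytes."""
    return gib * BYTES_PER_GIB / BYTES_PER_GB


def host_ram_bytes() -> int:
    """Total physical memory of the current machine, via POSIX sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


print(f"EC2 p2.xlarge: {gib_to_gb(61):.1f} GB")
print(f"This machine:  {host_ram_bytes() / BYTES_PER_GB:.1f} GB")
```

In decimal units the EC2 instance has about 65.5 GB, so the gap to the runner's 28 GB is slightly larger than the raw numbers suggest.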

mmcky · 2024-05-10T01:23:57Z

Remove the Docker container layer to see if that speeds up compute times.

This reverts commit 6b3a167.

mmcky · 2024-05-10T02:41:33Z

Currently the kernel dies when installing directly onto the VM provided by GitHub (rather than using our Docker container). It would be quicker and more efficient to get this route working.

kp992 · 2024-05-13T05:23:04Z

Thanks @mmcky, are we moving forward with GitHub Actions VMs for all our repos that currently use AWS?

mmcky · 2024-05-13T05:26:23Z


I would like to if we can, as that is less to maintain. But currently I am getting issues with kernels dying, which suggests that the JAX install isn't working properly (without a container).
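One way to separate "JAX is installed badly" from "a notebook crashes" is a quick smoke test run as its own CI step before the lecture build. The sketch below is an assumption about how such a step could look (the function name `jax_smoke_test` is hypothetical); it only uses `jax.default_backend()` and `jax.devices()`. A hard crash such as a dying kernel cannot be caught in-process, but if this step itself dies, that is already a clear signal.

```python
def jax_smoke_test() -> dict:
    """Check that JAX imports, report its backend, and run a tiny computation.

    Returns a dict rather than raising, so a CI step can log the outcome
    before the notebooks are executed.
    """
    try:
        import jax
        import jax.numpy as jnp
    except ImportError as exc:
        return {"ok": False, "error": f"JAX import failed: {exc}"}

    backend = jax.default_backend()            # expect "gpu" on the T4 runner
    devices = [str(d) for d in jax.devices()]
    total = int(jnp.arange(10).sum())          # forces compilation and execution
    return {"ok": total == 45, "backend": backend, "devices": devices}


print(jax_smoke_test())
```

If the backend comes back as `"cpu"` on the GPU runner, JAX has silently fallen back and the CUDA setup is the first thing to check.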

mmcky · 2024-05-21T00:59:49Z

The driver versions under docker are:

NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.3

and when using the native VM

NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2

So is the CUDA version likely causing the issue?
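To compare environments without eyeballing logs, the banner line of `nvidia-smi` can be parsed and diffed. This is a sketch assuming the banner keeps the `NVIDIA-SMI … Driver Version: … CUDA Version: …` layout shown above; `parse_smi_header` is a hypothetical helper. Note that the `CUDA Version` field generally reflects what the driver stack supports, not necessarily the toolkit a program was built against.

```python
import re

# Matches the banner line printed at the top of `nvidia-smi` output.
SMI_HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(line: str) -> dict:
    """Extract the nvidia-smi, driver and CUDA versions from the banner line."""
    match = SMI_HEADER.search(line)
    if match is None:
        raise ValueError(f"not an nvidia-smi banner line: {line!r}")
    return match.groupdict()


# The two banner lines reported in the comment above.
docker = parse_smi_header("NVIDIA-SMI 535.54.03   Driver Version: 535.54.03   CUDA Version: 12.3")
native = parse_smi_header(" NVIDIA-SMI 535.54.03  Driver Version: 535.54.03   CUDA Version: 12.2")
for key in ("driver", "cuda"):
    if docker[key] != native[key]:
        print(f"{key} differs: docker={docker[key]} native={native[key]}")
```

For the two lines above, only the CUDA field differs; the driver version is identical.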

mmcky · 2024-05-21T01:04:58Z

@kp992 any ideas on why the Jupyter kernel is dying when running directly on the VM but the Docker container is OK?

@mmcky can we host the Docker container on GitHub to speed up the compute?

kp992 · 2024-05-21T12:56:14Z

Thanks @mmcky, I will try to look into it. I will create a new PR on top of these commits so I can test and play around separately.

mmcky · 2024-06-10T06:18:59Z

@mmcky working on using GitHub containers to store the Docker container here:
QuantEcon/lecture-python.docker#4

mmcky · 2024-06-11T04:19:49Z

@kp992 the fetch from GitHub containers takes about 10 min. That is pretty good, right?

mmcky · 2024-06-11T04:21:25Z

@kp992 it looks like these instances may have CUDA=12.3 installed. Our Docker container is configured for CUDA=12.5, so there are a lot of ptxas warnings. We may need to adjust the Docker container for this setup (or upgrade the CUDA drivers). I think CUDA upgrades would require a reboot, but I'm looking into it.
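A simplified pre-flight check can flag this mismatch before the build starts. The sketch below is an assumption, not how JAX/XLA itself decides when to warn: it only compares major.minor versions and ignores CUDA's forward- and minor-version compatibility rules. `version_tuple` and `driver_is_older` are hypothetical helpers.

```python
def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as "12.5" into (12, 5)."""
    return tuple(int(part) for part in version.split("."))


def driver_is_older(driver_cuda: str, toolkit_cuda: str) -> bool:
    """True when the container's CUDA version is newer than the driver reports.

    Simplified check only: compares major.minor and ignores CUDA's
    compatibility packages and minor-version compatibility rules.
    """
    return version_tuple(driver_cuda)[:2] < version_tuple(toolkit_cuda)[:2]


host_driver_cuda = "12.3"        # as reported by nvidia-smi on the runner
container_toolkit_cuda = "12.5"  # what the Docker image is built against
if driver_is_older(host_driver_cuda, container_toolkit_cuda):
    print(
        f"warning: driver reports CUDA {host_driver_cuda}, "
        f"container uses {container_toolkit_cuda}; expect ptxas warnings"
    )
```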

mmcky · 2024-06-11T05:36:45Z

@kp992 looks like the newer CUDA driver is working. Will post a speed comparison with the current live site once I get the preview.

mmcky · 2024-06-11T06:11:54Z

@jstac and @kp992 here are the latest results after moving our computations to the GitHub-based GPU instance. LHS = current live site (built on EC2) and RHS = this PR (built on GitHub using the CUDA=12.5 driver). Many times are improved except for Wealth Dynamics (@kp992 would you mind reviewing this lecture to see why this might be?)
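Once per-lecture execution times are exported from both builds, a sketch like this can flag regressions automatically instead of comparing side by side. The function `compare_timings`, the 5% tolerance and all the numbers below are hypothetical and for illustration only; the real timings are in the comparison above.

```python
def compare_timings(old: dict, new: dict, tolerance: float = 0.05) -> dict:
    """Classify each lecture present in both builds by relative change in run time.

    `old` and `new` map lecture name -> execution time in seconds. Changes within
    +/- `tolerance` (as a fraction of the old time) count as unchanged.
    """
    result = {"faster": [], "slower": [], "unchanged": []}
    for lecture in sorted(old.keys() & new.keys()):
        change = (new[lecture] - old[lecture]) / old[lecture]
        if change < -tolerance:
            result["faster"].append(lecture)
        elif change > tolerance:
            result["slower"].append(lecture)
        else:
            result["unchanged"].append(lecture)
    return result


# Hypothetical timings (seconds), not taken from the actual builds.
ec2_build = {"lecture_a": 120.0, "lecture_b": 80.0, "lecture_c": 60.0}
github_build = {"lecture_a": 90.0, "lecture_b": 82.0, "lecture_c": 95.0}
print(compare_timings(ec2_build, github_build))
```

Lectures that land in `"slower"` are the ones worth profiling individually.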

jstac · 2024-06-12T00:12:30Z

Thanks @mmcky , good to know.

kp992 · 2024-06-12T02:13:22Z

Thanks @mmcky, this looks great. I can look at the wealth dynamics timings difference.

mmcky · 2024-06-12T04:08:00Z

Thanks @jstac and @kp992. I am doing one final round of review on this and then I will migrate to GitHub instances for this lecture series as well.

mmcky · 2024-06-12T05:13:59Z

Check this closely: nvidia-smi is reporting the following, while the Docker container is using CUDA=12.5

NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 12.3

Aha! That page hasn't been re-executed, as its date is 6 June. This will refresh in a full build.

mmcky · 2024-06-12T23:26:01Z

@kp992 I think this is ready. If you can cast your eye over it one more time then I'll merge.

kp992

Looks perfect! Thanks @mmcky

MAINT: use Github based GPU instance

1e5709d

github-actions bot temporarily deployed to pull request May 9, 2024 05:03

TMP: disable build cache

d1fac0d

github-actions bot temporarily deployed to pull request May 9, 2024 05:57

update status page

6e55c8f

mmcky added 4 commits May 10, 2024 11:30

TST: remove docker layer and test locally

61d5561

try building html only and add numpyro

398dcd0

try cuda_pip installer

6b3a167

Revert "try cuda_pip installer"

a1d4efb

This reverts commit 6b3a167.

github-actions bot temporarily deployed to pull request May 10, 2024 02:50

remove numpyro

11fc84f

revert to docker container to check nvidia driver versions

543e50a

github-actions bot temporarily deployed to pull request May 13, 2024 05:36

github-actions bot temporarily deployed to pull request May 13, 2024 06:10

kp992 mentioned this pull request May 22, 2024

TEST: use Github based GPU instance for CI #183

Closed

TST: fetch github actions hosted container

8274171

mmcky added the in-work label Jun 11, 2024

try install updated nvidia drivers (need reboot?)

ecb00c6

mmcky added 2 commits June 11, 2024 14:27

update apt sources

92f9247

use latest cuda=12.5 container

dd66b38

github-actions bot temporarily deployed to pull request June 11, 2024 06:02

enable latex and download notebook builds

4c5cbf9

github-actions bot temporarily deployed to pull request June 12, 2024 04:52

switch cache and publish to github actions

a672c97

github-actions bot temporarily deployed to pull request June 12, 2024 23:59

mmcky added ready and removed in-work labels Jun 13, 2024

mmcky mentioned this pull request Jun 13, 2024

MAINT: Update to use the latest docker for GPU lectures QuantEcon/meta#131

Open

3 tasks

kp992 approved these changes Jun 13, 2024


mmcky merged commit c995915 into main Jun 13, 2024
6 checks passed

mmcky deleted the maint-gpu-runner branch June 13, 2024 03:30
