cgroups are done ‼️ this course is going to be so good

btholt · Apr 14, 2024
1 parent 11104ef
commit ad10f5d
Showing 3 changed files with 87 additions and 90 deletions.
diff --git a/lessons/02-crafting-containers-by-hand/C-namespaces.md b/lessons/02-crafting-containers-by-hand/C-namespaces.md
@@ -32,6 +32,7 @@ exit # from our chroot'd environment if you're still running it, if not skip thi
 
 # install debootstrap
 
+```bash
 apt-get update -y
 apt-get install debootstrap -y
 debootstrap --variant=minbase jammy /better-root
@@ -42,7 +43,7 @@ unshare --mount --uts --ipc --net --pid --fork --user --map-root-user chroot /be
 mount -t proc none /proc # process namespace
 mount -t sysfs none /sys # filesystem
 mount -t tmpfs none /tmp # filesystem
-``
+```
 
 This will create a new environment that's isolated on the system with its own PIDs, mounts (like storage and volumes), and network stack. Now we can't see any of the processes!
 

diff --git a/lessons/02-crafting-containers-by-hand/D-cgroups.md b/lessons/02-crafting-containers-by-hand/D-cgroups.md
@@ -2,62 +2,6 @@
 title: cgroups
 ---
 
-### TODO - This is the valid cgroup2 code
-
-```bash
-
-grep -c cgroup /proc/mounts # if this is greater than 0, then you're on cgroups v1 and this won't work
-
-mkdir /sys/fs/cgroup/sandbox # creates the cgroup
-
-# Find your PID, it's the bash one immediately after the unshare
-cat /sys/fs/cgroup/cgroups.proc # should see the process in the root cgroup
-echo <PID> > /sys/fs/cgroup/sandbox/cgroup.procs # puts the unshared env into the cgroup called sandbox
-cat /sys/fs/cgroup/sandbox/cgroup.proc # should see the process in the sandbox cgroup
-
-cat /sys/fs/cgroup/cgroups.proc # should see the process no longer in the root cgroup - processes belong to exactly 1 cgroup
-mkdir /sys/fs/cgroup/other-procs # make new cgroup for the rest of the processes, you can't modify cgroups that have processes and by default Docker doesn't include any subtree_controllers
-echo <PID> > /sys/fs/cgroup/other-procs/cgroup.procs # you have to do this one at a time for each process
-
-cat /sys/fs/cgroup/sandbox/cgroup.controllers # no controllers
-cat /sys/fs/cgroup/cgroup.controllers # should see all the available controllers
-echo "+cpuset +cpu +io +memory +hugetlb +pids +rdma" > /sys/fs/cgroup/cgroup.subtree_control # add the controllers
-cat /sys/fs/cgroup/sandbox/cgroup.controllers # all the controllers now available
-
-### Peg the CPU
-
-apt-get install htop # a cool visual representation of CPU and RAM being used
-htop
-
-yes > /dev/null # inside #1 / the cgroup/unshare – this will peg one core of a CPU at 100% of the resources available, see it peg 1 CPU
-kill -9 <PID of yes> # from #2, (you'll have to stop htop with CTRL+C) to stop the CPU from being pegged
-htop
-
-echo '5000 100000' > /sys/fs/cgroup/sandbox/cpu.max # this allows the cgroup to only use 5% of a CPU
-yes > /dev/null # inside #1 / the cgroup/unshare – this will peg one core of a CPU at 5% since we limited it
-kill -9 <PID of yes> # from #2, to stop the CPU from being pegged
-htop
-
-### Limit memory
-
-yes | tr \\n x | head -c 1048576000 | grep n # run this from #3 terminal and watch it in htop to see it consume about a gig of RAM and 100% of CPU core, CTRL+C to stop it
-cat /sys/fs/cgroup/sandbox/memory.max # should see max, so the memory is unlimited
-echo 83886080 > /sys/fs/cgroup/sandbox/pids.max # set the limit to 80MB of RAM
-yes | tr \\n x | head -c 1048576000 | grep n # from inside #1, see it limit both the CPU and the RAM taken up
-
-### Stop fork bombs
-
-cat /sys/fs/cgroup/sandbox/pids.current # See how many processes the cgroup has at the moment
-cat /sys/fs/cgroup/sandbox/pids.max # See how many processes the cgroup can create before being limited (max)
-echo 5 > /sys/fs/cgroup/sandbox/pids.max # set a limit that the cgroup can only run 5 processes
-for a in $(seq 1 5); do sleep 60 & done # this runs 5 60 second processes that run and then stop. run this from within #2 and watch it work. now run it in #1 and watch it not be able to.
-
-:(){ :|:& };: # DO NOT RUN THIS ON YOUR COMPUTER. This is a fork bomb. If not accounted for, this would bring down your computer. However we can safely run inside our #1 because we've limited the amount of PIDs available. It will end up spawning about 100 processes total but eventually will run out of forks to fork.
-
-```
-
-### END TODO - below is the old cgroup1 code
-
 Okay, so now we've hidden the processes from Eve so Bob and Alice can engage in commerce in privacy and peace. So we're all good, right? They can no longer mess with each other, right? Not quite. We're almost there.
 
 So now say it's Black Friday, Boxing Day or Singles' Day (three of the biggest shopping days in the year, pick the one that makes the most sense to you 😄) and Bob and Alice are gearing up for their biggest sales day of the year. Everything is ready to go and at 9:00AM their site suddenly goes down without warning. What happened!? They log on to their chroot'd, unshare'd shell on your server and see that the CPU is pegged at 100% and there's no more memory available to allocate! Oh no! What happened?
@@ -72,69 +16,116 @@ Enter the hero of this story: cgroups, or control groups. Google saw this same p
 
 This is a bit more difficult to accomplish but let's go ahead and give it a shot.
 
-``bash
+> cgroups v2 is now the standard. Run `grep -c cgroup /proc/mounts` in your terminal. If the number is **greater than one**, the system you're using is cgroups v1. [Click here][move-to-v2] if you want to try to move your system from cgroups v1 to v2. As this is fairly involved, I would suggest just using a more recent version of Ubuntu, as it will have cgroups v2 on it.
+>
+> If you want to learn cgroups v1 (which I would not suggest, as they're getting phased out), [the first version of this course][v1] teaches them.
 
-# in #2, outside of unshare'd environment get the tools we'll need here
+cgroups, as we have said, allow you to move processes and their children into groups, which then let you limit various aspects of them. Imagine you're running a single physical server for Google with both Maps and GMail having virtual servers on it. If Maps ships an infinite loop bug and it pins the CPU usage of the server to 100%, you only want Maps to go down and _not_ GMail just because it happens to be colocated with Maps. Let's see how to do that.
 
-apt-get install -y cgroup-tools htop
+You interact with cgroups through a pseudo-file system. Honestly the whole interface feels weird to me but that is what it is! Inside your #2 terminal (the non-unshared one) run `cd /sys/fs/cgroup` and then run `ls`. You'll see a bunch of "files" that look like `cpu.max`, `cgroup.procs`, and `memory.high`. Each one of these represents a setting of the cgroup that you can play with. In this case, we are looking at the root cgroup: all cgroups will be children of this root cgroup. The way you make your own cgroup is by creating a folder inside of its parent cgroup.
 
-# create new cgroups
+```bash
+mkdir /sys/fs/cgroup/sandbox # creates the cgroup
+ls /sys/fs/cgroup/sandbox # look at all the files created automatically
+```
 
-cgcreate -g cpu,memory,blkio,devices,freezer:/sandbox
+We now have a sandbox cgroup which is a child of the root cgroup, and we can put limits on it! If we wanted to create a child of sandbox, as you may have guessed, we would just create another folder inside of sandbox.
 
-# add our unshare'd env to our cgroup
+Let's move our unshared environment into the cgroup. Every process belongs to exactly one cgroup. If you move a process to a cgroup, it will automatically be removed from the cgroup it was in. If we move our unshared bash process from the root cgroup to the sandbox cgroup, it will be removed from the root cgroup without you doing anything.
 
-ps aux # grab the bash PID that's right after the unshare one
-cgclassify -g cpu,memory,blkio,devices,freezer:sandbox <PID>
+```bash
+ps aux # Find your isolated bash PID, it's the bash one immediately after the unshare
+cat /sys/fs/cgroup/cgroup.procs # should see the process in the root cgroup
+echo <PID> > /sys/fs/cgroup/sandbox/cgroup.procs # puts the unshared env into the cgroup called sandbox
+cat /sys/fs/cgroup/sandbox/cgroup.procs # should see the process in the sandbox cgroup
+cat /sys/fs/cgroup/cgroup.procs # should see the process no longer in the root cgroup - processes belong to exactly 1 cgroup
+```
 
-# list tasks associated to the sandbox cpu group, we should see the above PID
+We now have moved our unshared bash process into a cgroup. We haven't placed any limits on it yet but it's there, ready to be managed. We have a minor problem at the moment though that we need to solve.
 
-cat /sys/fs/cgroup/cpu/sandbox/tasks
+```bash
+cat /sys/fs/cgroup/cgroup.controllers # should see all the available controllers
+cat /sys/fs/cgroup/sandbox/cgroup.controllers # there are no controllers
+cat /sys/fs/cgroup/cgroup.subtree_control # there are no controllers enabled for its children
+```
 
-# show the cpu share of the sandbox cpu group, this is the number that determines priority between competing resources, higher is is higher priority
+You have to enable controllers for the children and none of them are enabled at the moment. You can see the root cgroup has them all available, but it hasn't enabled them in its subtree_control, so none are available in sandbox's controllers. Easy, right? We just add them to subtree_control, right? Yes, but one problem: you can't add new subtree_control configs while the cgroup itself has processes in it. So we're going to create another cgroup, add the rest of the processes to that one, and then enable the subtree_control configs for the root cgroup.
 
-cat /sys/fs/cgroup/cpu/sandbox/cpu.shares
+```bash
+mkdir /sys/fs/cgroup/other-procs # make new cgroup for the rest of the processes, you can't modify cgroups that have processes and by default Docker doesn't include any subtree_controllers
 
-# kill all of sandbox's processes if you need it
+cat /sys/fs/cgroup/cgroup.procs # see all the processes you need to move, rerun each time after you add as it may move multiple processes at once due to some being parent / child
+echo <PID> > /sys/fs/cgroup/other-procs/cgroup.procs # you have to do this one at a time for each process
 
-# kill -9 $(cat /sys/fs/cgroup/cpu/sandbox/tasks)
+echo "+cpuset +cpu +io +memory +hugetlb +pids +rdma" > /sys/fs/cgroup/cgroup.subtree_control # add the controllers
 
-# Limit usage at 5% for a multi core system
+ls /sys/fs/cgroup/sandbox # notice how few files there are
+cat /sys/fs/cgroup/sandbox/cgroup.controllers # all the controllers now available
+ls /sys/fs/cgroup/sandbox # notice how many more files there are now
+```
 
-cgset -r cpu.cfs_period_us=100000 -r cpu.cfs_quota_us=$[ 5000 * $(getconf _NPROCESSORS_ONLN) ] sandbox
+We did it! We went ahead and added all the possible controllers but normally you should add just the ones you need. If you want to learn more about what each of them does, [the kernel docs are quite readable][kernel].
 
-# Set a limit of 80M
+Let's get a third terminal going. From your host OS (Windows or macOS or your own Linux distro, not within Docker) run another `docker exec -it docker-host bash`. That way we can have #1 inside the unshared environment, #2 running our commands, and #3 giving us a visual display of what's going on with `htop`, a visual tool for seeing what processes, CPU cores, and memory are doing.
 
-cgset -r memory.limit_in_bytes=80M sandbox
+So, let's go through three little exercises of what we can do with a cgroup. First let's make it so the unshared environment only has access to 80MB of memory instead of all of it.
 
-# Get memory stats used by the cgroup
+```bash
+apt-get install htop # a cool visual representation of CPU and RAM being used
+htop # from #3 so we can watch what's happening
 
-cgget -r memory.stat sandbox
+yes | tr \\n x | head -c 1048576000 | grep n # run this from #1 terminal and watch it in htop to see it consume about a gig of RAM and 100% of CPU core
+kill -9 <PID of yes> # from #2, (you can get the PID from htop) to stop the CPU from being pegged and memory from being consumed
+cat /sys/fs/cgroup/sandbox/memory.max # should see max, so the memory is unlimited
+echo 83886080 > /sys/fs/cgroup/sandbox/memory.max # set the limit to 80MB of RAM (the number is 80MB in bytes)
+yes | tr \\n x | head -c 1048576000 | grep n # from inside #1, see it limit the RAM taken up; because the RAM is limited, the CPU usage is limited
+```
 
-# in terminal session #2, outside of the unshare'd env
+I think this is very cool. We just made it so our unshared environment only has access to 80MB of RAM, so even though we ran a script that does literally nothing but consume RAM, it was limited to consuming only 80MB of it.
 
-htop # will allow us to see resources being used with a nice visualizer
+However, as you saw, the user inside of the container could still peg the CPU if they wanted to. Let's fix that. Let's only give them 5% of a core.
 
-# in terminal session #1, inside unshared'd env
+```bash
+yes > /dev/null # inside #1 / the cgroup/unshare – this will peg one core of a CPU at 100% of the resources available, see it peg 1 CPU
+kill -9 <PID of yes> # from #2, (you can get the PID from htop) to stop the CPU from being pegged
 
-yes > /dev/null # this will instantly consume one core's worth of CPU power
+echo '5000 100000' > /sys/fs/cgroup/sandbox/cpu.max # from #2 this allows the cgroup to only use 5% of a CPU
+yes > /dev/null # inside #1 / the cgroup/unshare – this will peg one core of a CPU at 5% since we limited it
+kill -9 <PID of yes> # from #2, to stop the CPU from being pegged, get the PID from htop
+```
 
-# notice it's only taking 5% of the CPU, like we set
+Pretty cool, right? Now, no matter how bad of code we run inside of our chroot'd, unshare'd, cgroup'd environment, we cannot take more than 5% of a CPU core.
 
-# if you want, run the docker exec from above to get a third session to see the above command take 100% of the available resources
+One more demo, the dreaded [fork bomb][fork-bomb]. A fork bomb is a script that forks itself into multiple processes, which then fork themselves, which then fork themselves, etc. until all resources are consumed and it crashes the computer. It can be written plainly as
 
-# CTRL+C stops the above any time
+```bash
+fork() {
+    fork | fork &
+}
+fork
+```
 
-# in terminal session #1, inside unshare'd env
+but you'll see it written as `:(){ :|:& };:` where `:` is the name of the function instead of `fork`.
 
-yes | tr \\n x | head -c 1048576000 | grep n # this will ramp up to consume ~1GB of RAM
+So someone could run a fork bomb on our system right now and the cgroup would limit the blast radius of CPU and RAM, but creating and destroying so many processes still takes a toll on the system. What we can do to more fully prevent a fork bomb is limit how many PIDs can be active at once. Let's try that.
 
-# notice in htop it'll keep the memory closer to 80MB due to our cgroup
+```bash
+cat /sys/fs/cgroup/sandbox/pids.current # See how many processes the cgroup has at the moment
+cat /sys/fs/cgroup/sandbox/pids.max # See how many processes the cgroup can create before being limited (max)
+echo 3 > /sys/fs/cgroup/sandbox/pids.max # set a limit that the cgroup can only run 3 processes at a time
+for a in $(seq 1 5); do sleep 15 & done # this runs 5 15 second processes that run and then stop. run this from within #2 and watch it work. now run it in #1 and watch it not be able to. it will have to retry several times
 
-# as above, connect with a third terminal to see it work outside of a cgroup
+:(){ :|:& };: # DO NOT RUN THIS ON YOUR COMPUTER. This is a fork bomb. If not accounted for, this would bring down your computer. However we can safely run inside our #1 because we've limited the amount of PIDs available. It will end up spawning about 100 processes total but eventually will run out of forks to fork.
+```
 
-``
+Attack prevented! 3 processes is way too few for anyone to do anything meaningful, but limiting the max PIDs available lets you limit what damage could be done. I'll be honest, this is the first time I've run a fork bomb on a computer and it's pretty exhilarating. I felt like I was in the movie Hackers. [Hack the planet!][hackers].
 
-And now we can call this a container. Using these features together, we allow Bob, Alice, and Eve to run whatever code they want and the only people they can mess with is themselves.
+And now we can call this a container. You have handcrafted a container. A container is literally nothing more than what we did together. There are other sorts of technologies that will accompany containers like runtimes and daemons, but the containers themselves are just a combination of chroot, namespaces, and cgroups! Using these features together, we allow Bob, Alice, and Eve to run whatever code they want and the only people they can mess with are themselves.
 
 So while this is a container in its most basic sense, we haven't broached more advanced topics like networking, deploying, bundling, or anything else that something like Docker takes care of for us. But now you know at its most base level what a container is, what it does, and how you _could_ do this yourself but you'll be grateful that Docker does it for you. On to the next lesson!
+
+[move-to-v2]: https://medium.com/@charles.vissol/cgroup-v2-in-details-8c138088f9ba#aa07
+[v1]: https://btholt.github.io/complete-intro-to-containers/cgroups
+[kernel]: https://docs.kernel.org/admin-guide/cgroup-v2.html#controllers
+[fork-bomb]: https://en.wikipedia.org/wiki/Fork_bomb
+[hackers]: https://youtu.be/Rn2cf_wJ4f4
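
A note on the magic numbers in the lesson above: `83886080` written to `memory.max` and `5000 100000` written to `cpu.max` are easy to mistype. The sketch below shows where they come from. The helper names `mib_to_bytes` and `cpu_max_for_percent` are hypothetical and not part of the lesson; the only assumption taken from the lesson is that `cpu.max` takes a quota and a period (both in microseconds) and `memory.max` takes bytes.

```shell
# Hypothetical helpers (not from the lesson): compute the values
# that the lesson writes into memory.max and cpu.max by hand.

# Convert mebibytes to bytes, the unit used for memory.max here
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Build the "<quota> <period>" pair for cpu.max.
# $1 is the percentage of one core, $2 is the period in microseconds
# (defaults to 100000, the period used in the lesson).
cpu_max_for_percent() {
  local percent=$1
  local period=${2:-100000}
  echo "$(( period * percent / 100 )) ${period}"
}

mib_to_bytes 80        # prints 83886080
cpu_max_for_percent 5  # prints 5000 100000
```

With the sandbox cgroup from the lesson in place (and run as root on a cgroups v2 system), these could be used as `mib_to_bytes 80 > /sys/fs/cgroup/sandbox/memory.max` and `cpu_max_for_percent 5 > /sys/fs/cgroup/sandbox/cpu.max`.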
diff --git a/styles/courses.css b/styles/courses.css
@@ -303,6 +303,11 @@ header .cta-btn {
   display: block;
 }
 
+.lesson-content pre {
+  overflow-x: auto;
+  white-space: pre-wrap;
+}
+
 .lesson-flex {
   display: flex;
   flex-direction: column;