Commit

Add IO to OpenEducation methodology
Signed-off-by: Mihnea Firoiu <[email protected]>
Mihnea0Firoiu committed Aug 6, 2024
1 parent eb0056b commit d84d43a
Showing 339 changed files with 1,448 additions and 4,461 deletions.
File renamed without changes.
33 changes: 33 additions & 0 deletions chapters/io/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
RVMD = reveal-md
MDPP = markdown-pp
FFMPEG = ffmpeg

SLIDES ?= slides.mdpp
SLIDES_OUT ?= slides.md
MEDIA_DIR ?= media
SITE ?= _site
OPEN ?= xdg-open

.PHONY: all html clean videos open

all: videos html

html: $(SITE)

$(SITE): $(SLIDES)
	$(MDPP) $< -o $(SLIDES_OUT)
	$(RVMD) $(SLIDES_OUT) --static $@

videos:
	for TARGET in $(TARGETS); do \
		$(FFMPEG) -framerate 0.5 -f image2 -y \
			-i "$(MEDIA_DIR)/$$TARGET/$$TARGET-%d.svg" -vf format=yuv420p $(MEDIA_DIR)/$$TARGET-generated.gif; \
	done

open: $(SITE)
	$(OPEN) $</index.html

clean:
	-rm -f $(MEDIA_DIR)/*-generated.gif
	-rm -f *~
	-rm -fr $(SITE) $(SLIDES_OUT)
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -58,7 +58,7 @@ struct file_operations {
OK, there will be no Blackjack...
for now at least.
But there **will** be pipes.
Navigate back to `support/mini-shell/mini_shell.c` and add support for piping 2 commands together like this:
Navigate back to `mini-shell/support/mini_shell.c` and add support for piping 2 commands together like this:

```console
> cat bosses.txt | head -n 5
@@ -71,7 +71,7 @@ Ancient Dragon

## To Drop or Not to Drop?

Remember `support/buffering/benchmark_buffering.sh` or `support/file-mappings/benchmark_cp.sh`.
Remember `buffering/support/benchmark_buffering.sh` or `file-mappings/support/benchmark_cp.sh`.
They both used this line:

```bash
@@ -96,12 +96,12 @@ This makes I/O faster.
The line from which this discussion started invalidates those caches and forces the OS to perform I/O operations "the slow way" by interrogating the disk.
The scripts use it to benchmark only the C code, not the OS.
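
The effect of the page cache can be probed with a minimal Python sketch such as the one below.
It is not part of the lab's support code: the helper `timed_read()` and the 8 MB scratch file are assumptions made for illustration.
The sketch reads the same file twice using `read()` syscalls and prints the duration of each pass.
Because the file has just been written, both passes are probably served from memory here; dropping the caches between passes, as the benchmark scripts do, is what would force the slow path.

```python
import os
import tempfile
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file with read() syscalls; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty result means end of file
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, time.perf_counter() - start


# Hypothetical 8 MB scratch file; the lab scripts use a much larger one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    demo_path = tmp.name

first_size, first_time = timed_read(demo_path)
second_size, second_time = timed_read(demo_path)
print(f"first pass:  {first_size} bytes in {first_time:.4f}s")
print(f"second pass: {second_size} bytes in {second_time:.4f}s")
os.remove(demo_path)
```

The timings depend on the machine and on what else is cached, so treat them as a rough indication only.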

To see just how much faster this type of caching is, navigate to `support/buffering/benchmark_buffering.sh` once again and comment out the line with `sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"`.
To see just how much faster this type of caching is, navigate to `buffering/support/benchmark_buffering.sh` once again and comment out the line with `sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"`.
Now run the script **a few times** and compare the results.
You should see some drastic improvements in the running times, such as:

```console
student@os:/.../support/file-mappings$ ./benchmark_cp.sh
student@os:/.../file-mappings/support$ ./benchmark_cp.sh
make: Nothing to be done for 'all'.
Benchmarking mmap_cp on a 1 GB file...

@@ -115,7 +115,7 @@ user 0m0,013s
sys 0m1,301s


student@os:/.../support/file-mappings$ ./benchmark_cp.sh
student@os:/.../file-mappings/support$ ./benchmark_cp.sh
make: Nothing to be done for 'all'.
Benchmarking mmap_cp on a 1 GB file...

Original file line number Diff line number Diff line change
@@ -29,7 +29,7 @@ In case of asynchronous I/O, the "backend" used to implement the operations may

## Practice

Enter the `support/async/` folder for some implementations of a simple request-reply server in Python or in C.
Enter the `async/support/` folder for some implementations of a simple request-reply server in Python or in C.
The server gets requests and serves them in different ways: synchronous, multiprocess-based, multi-threading-based, asynchronous.
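
As a rough idea of what the asynchronous variant looks like, here is a minimal sketch built on Python's `asyncio` streams.
It is not the lab's `async_server_3.6.py`: the names `handle_request()`, `send_request()` and `demo()` are hypothetical, the server listens on an OS-chosen port instead of taking one as an argument, and `asyncio.run()` assumes Python 3.7 or newer.

```python
import asyncio


async def handle_request(reader, writer):
    """Serve one client: read a line and reply with it upper-cased."""
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def send_request(port, message):
    """Hypothetical client: send one line and return the reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.strip()


async def demo():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle_request, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Three clients are served concurrently by a single thread.
        return await asyncio.gather(
            *(send_request(port, f"request {i}".encode()) for i in range(3))
        )


replies = asyncio.run(demo())
print(replies)
```

All three requests are handled by one thread: whenever a coroutine waits for the network, the event loop switches to another one.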

We use two implementations, in Python and in C.
@@ -57,19 +57,19 @@ We use two implementations, in Python and in C.
To start the server, run each of these commands (one at a time to test the respective server type):

```console
student@os:/.../support/async/python$ ./server.py 2999
student@os:/.../async/support/python$ ./server.py 2999

student@os:/.../support/async/python$ ./mp_server.py 2999
student@os:/.../async/support/python$ ./mp_server.py 2999

student@os:/.../support/async/python$ ./mt_server.py 2999
student@os:/.../async/support/python$ ./mt_server.py 2999

student@os:/.../support/async/python$ ./async_server_3.6.py 2999
student@os:/.../async/support/python$ ./async_server_3.6.py 2999
```

For each server, in a different console, we can test to see how well it behaves by running:

```console
student@os:/.../support/async/python$ time ./client_bench.sh
student@os:/.../async/support/python$ time ./client_bench.sh
```

You will see a difference in running time between `mp_server.py` and the others: `mp_server.py` serves requests faster.
@@ -96,17 +96,17 @@ We use two implementations, in Python and in C.
Same as with Python, to start the server, run each of these commands (one at a time to test the respective server type):

```console
student@os:/.../support/async/c$ ./server 2999
student@os:/.../async/support/c$ ./server 2999

student@os:/.../support/async/c$ ./mp_server 2999
student@os:/.../async/support/c$ ./mp_server 2999

student@os:/.../support/async/c$ ./mt_server 2999
student@os:/.../async/support/c$ ./mt_server 2999
```

For each server, in a different console, we can test to see how well it behaves by running:

```console
student@os:/.../support/async/python$ time ./client_bench.sh
student@os:/.../async/support/python$ time ./client_bench.sh
```

We draw 2 conclusions from using the C variant:
@@ -122,7 +122,7 @@ When aiming for performance, asynchronous I/O operations are part of the game.
And it's very useful having a good understanding of what's happening behind the scenes.

For example, for the Python `async_server_3.6.py` server, a message `asyncio: Using selector: EpollSelector` is provided.
This means that the backend relies on the use of the [`epoll()` function](https://man7.org/linux/man-pages/man7/epoll.7.html) that's part of the [I/O Multiplexing section](./io-multiplexing.md).
This means that the backend relies on the use of the [`epoll()` function](https://man7.org/linux/man-pages/man7/epoll.7.html) that's part of the [I/O Multiplexing section](../../io-multiplexing/reading/io-multiplexing.md).

Also, for Python, the use of the GIL may be an issue when the operations are CPU intensive.

@@ -131,4 +131,4 @@ It's rare that servers or programs using asynchronous operations are CPU intensi
It's more likely that they are I/O intensive, and the challenge is avoiding blocking points in multiple I/O channels;
not avoiding doing a lot of processing, as is the case with the `fibonacci()` function.
In that particular case, having thread-based asynchronous I/O and the GIL will be a good option, as you rely on the thread scheduler to be able to serve multiple I/O channels simultaneously.
This latter approach is called I/O multiplexing, discussed in [the next section](./io-multiplexing.md).
This latter approach is called I/O multiplexing, discussed in [the next section](../../io-multiplexing/reading/io-multiplexing.md).
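
The claim that threads cope well with I/O-bound work despite the GIL can be checked with a small sketch.
The function `fake_io_request()` is a hypothetical stand-in for a blocking I/O call: it only sleeps, and `time.sleep()` releases the GIL in the same way a blocking `recv()` would.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_request(request_id, delay=0.2):
    """Stand-in for a blocking I/O call; time.sleep() releases the GIL."""
    time.sleep(delay)
    return request_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # The four "requests" wait at the same time instead of one after another.
    results = list(pool.map(fake_io_request, range(4)))
elapsed = time.perf_counter() - start

print(results, f"{elapsed:.2f}s")
```

The total time should be close to a single delay rather than four of them.
Replacing the sleep with a CPU-bound function such as `fibonacci()` would remove most of that gain, because only one thread can hold the GIL at a time.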
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
# Beyond Network Sockets

Up until this point, we first learned how to use the [Berkeley Sockets API](./remote-io.md#api---hail-berkeley-sockets), then we learned about the [client-server model](./client-server-model.md), based on this API.
Up until this point, we first learned how to use the [Berkeley Sockets API](./remote-io.md#api---hail-berkeley-sockets), then we learned about the [client-server model](../../client-server-model/reading/client-server-model.md), based on this API.
So now we know that sockets offer a ubiquitous interface for inter-process communication, which is great.
A program written in Python can easily send data to another one written in C, D, Java, Haskell, you name it.
However, in the [section dedicated to networking](./networking-101.md), we saw that it takes a whole stack of protocols to send this message from one process to the other.
However, in the [section dedicated to networking](../../networking-101/reading/networking-101.md), we saw that it takes a whole stack of protocols to send this message from one process to the other.
As you might imagine, this is **much slower even than local I/O using files**.

So far we've only used sockets for local communication, but in practice it is a bit counterproductive to use network sockets for local IPC due to their high latency.
@@ -13,7 +13,7 @@ Well, there is a way and it's called **UNIX sockets**.
## UNIX Sockets

UNIX sockets are created using the `socket()` syscall and are bound **TO A FILE** instead of an IP and port using `bind()`.
You may already see a similarity with [named pipes](./pipes.md#named-pipes---mkfifo).
You may already see a similarity with [named pipes](../../pipes/reading/pipes.md#named-pipes---mkfifo).
Just like them, UNIX sockets don't work by writing data to the file (that would be slow).
Instead, the kernel keeps the data in its own internal buffers: `send()` places it there and `recv()` reads it back out.
You can use `read()` and `write()` to read/write data from/to both network and UNIX sockets as well, by the way.
The differences between using `send()`/`recv()` or `write()`/`read()` are rather subtle and are described in [this Stack Overflow thread](https://stackoverflow.com/questions/1790750/what-is-the-difference-between-read-and-recv-and-between-send-and-write).
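
A minimal Python sketch of the idea follows.
It is not the lab's `receive_unix_socket.c`: the socket path `demo.sock` and the use of datagram sockets are assumptions made to keep the example short.

```python
import os
import socket
import stat
import tempfile

# Hypothetical socket path inside a fresh temporary directory.
sock_dir = tempfile.mkdtemp()
sock_path = os.path.join(sock_dir, "demo.sock")

# Receiver: a datagram UNIX socket bound to a file path, not to an IP and port.
receiver = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
receiver.bind(sock_path)

# Sender: addresses the receiver through that same path.
sender = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
sender.sendto(b"hello over a UNIX socket", sock_path)

message = receiver.recv(1024)
# bind() created a special file of type "socket" at sock_path.
is_socket_file = stat.S_ISSOCK(os.stat(sock_path).st_mode)
print(message, is_socket_file)

sender.close()
receiver.close()
os.unlink(sock_path)
os.rmdir(sock_dir)
```

The file only serves as a name through which the two processes find each other; the message itself travels through kernel memory.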
@@ -23,7 +23,7 @@ However, there are [third-party libraries](https://crates.io/crates/uds_windows)

### Practice: Receive from UNIX Socket

Navigate to `support/receive-challenges/receive_unix_socket.c`.
Navigate to `receive-challenges/support/receive_unix_socket.c`.
Don't write any code yet.
Let's compare UNIX sockets with network sockets first:

Empty file.
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ What's different from UDP is that this connection is **bidirectional**, so we ca
Notice that the syscalls have changed.
We were using `sendto()` and `recvfrom()` for UDP, and now we're using `send()` and `recv()` for TCP.
And yes, despite the fact that we're using Python, these are syscalls.
You saw them in C when you solved the [challenge](./remote-io.md#practice-network-sockets-challenge).
You saw them in C when you solved the [challenge](../../remote-io/reading/remote-io.md#practice-network-sockets-challenge).

## Server vs Client

@@ -24,7 +24,7 @@ Either way, it is **listening** for connections.

The client is the active actor, being the one who initiates the connection.
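
The two roles can be sketched in a few lines of Python.
This is an illustration only, not the lab's `server.py`: the helper `serve_one_client()` is hypothetical, and the server runs in a thread of the same process so that the example is self-contained.

```python
import socket
import threading

# Server side (passive): bind, listen, then wait in accept().
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
listener.listen()
port = listener.getsockname()[1]


def serve_one_client():
    conn, _addr = listener.accept()  # blocks until a client connects
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)


server_thread = threading.Thread(target=serve_one_client)
server_thread.start()

# Client side (active): initiates the connection with connect().
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", port))
    client.sendall(b"ping")
    reply = client.recv(1024)

server_thread.join()
listener.close()
print(reply)
```

Note that the server never calls `connect()` and the client never calls `accept()`: that asymmetry is what makes one of them passive and the other active.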

[Quiz](../quiz/client-server-sender-receiver.md)
[Quiz](../drills/questions/client-server-sender-receiver.md)

## Establishing the Connection

@@ -65,7 +65,7 @@ Below is an image summarising the steps above:

### Practice: Client

Navigate to `support/client-server/`.
Navigate to `client-server/support/`.
Here you will find a minimalistic server implementation in `server.py`.

1. Read the code and identify the steps outlined above.
@@ -77,7 +77,7 @@ Run multiple clients.

## Practice: Just a Little Bit More Deluge

We've already said that Deluge uses an [abstraction over TCP](./networking-101.md#practice-encapsulation-example-deluge-revived) to handle socket operations, so we don't have the luxury of seeing it perform remote I/O "manually".
We've already said that Deluge uses an [abstraction over TCP](../../networking-101/reading/networking-101.md#practice-encapsulation-example-deluge-revived) to handle socket operations, so we don't have the luxury of seeing it perform remote I/O "manually".
However, there are a few instances where Deluge uses socket operations itself, mostly for testing purposes.

Deluge saves its PIDs (it can spawn multiple processes) and ports in a file.
File renamed without changes.
Empty file.
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# File Descriptors

After running the code in the ["File Handlers" section](./file-handlers.md), you saw that `open()` returns a **number**.
After running the code in the ["File Handlers" section](../../file-handlers/reading/file-handlers.md), you saw that `open()` returns a **number**.
This number is a **file descriptor**.
Run the code above multiple times.
You'll always get file descriptor 3.
@@ -21,10 +21,10 @@ student@os:~$ ls 2> /dev/null

The command above uses `2> /dev/null` to **redirect** `stderr` to `/dev/null`.
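
The same redirection can be requested programmatically.
The sketch below is an assumption-laden illustration: the child program in `child_code` is hypothetical and simply writes one line to each standard stream.

```python
import subprocess
import sys

# Hypothetical child process that writes one line to each standard stream.
child_code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,     # capture what the child writes to stdout
    stderr=subprocess.DEVNULL,  # same effect as 2> /dev/null in the shell
)
print(result.stdout)
```

Only the line sent to `stdout` reaches the parent; the one sent to `stderr` is discarded.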

[Quiz](../quiz/stderr-fd.md)
[Quiz](../drills/questions/stderr-fd.md)

Good, so we know that `stderr` is file descriptor 2.
The code in `support/file-descriptors/open_directory.c` opened the directory as file descriptor 3.
The code in `file-descriptors/support/open_directory.c` opened the directory as file descriptor 3.
So what are file descriptors 0 and 1?
They are `stdin` and `stdout`, respectively.
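
The reason the number is so predictable is that `open()` returns the lowest unused file descriptor.
The sketch below illustrates this with a hypothetical scratch file.
In a fresh process where only 0, 1 and 2 are in use the result is 3, but inside a Python interpreter that already holds other descriptors the number may be higher, so the sketch only checks that a freed number is handed out again.

```python
import os
import tempfile

# Hypothetical scratch file, used only so that there is something to open.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    scratch_path = tmp.name

first_fd = os.open(scratch_path, os.O_RDONLY)
os.close(first_fd)

# close() freed the lowest unused slot, so the next open() returns it again.
second_fd = os.open(scratch_path, os.O_RDONLY)
print(first_fd, second_fd)

os.close(second_fd)
os.remove(scratch_path)
```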

@@ -43,7 +43,7 @@ And remember that the file descriptor table is bound to the process, so all its

![File Descriptors](../media/file-descriptors.svg)

To find out more about the contents of these structures, check out [this section in the Arena](./arena.md#open-file-structure-in-the-kernel).
To find out more about the contents of these structures, check out [this section in the Arena](../../arena/reading/arena.md#open-file-structure-in-the-kernel).

## Creating New File Descriptors

@@ -59,7 +59,7 @@ An optional `mode` parameter that denotes the file's permissions if the `open` m
We'll revisit `open()`'s `mode` parameter in the future.

From now on, you should consult [`open()`'s man page](https://man7.org/linux/man-pages/man2/open.2.html) whenever you encounter a new argument to this syscall.
Navigate to `support/file-descriptors/open_read_write.c`.
Navigate to `file-descriptors/support/open_read_write.c`.
The function `open_file_for_reading()` calls `open()` with the `O_RDONLY` flag, which is equivalent to opening the file with `fopen()` and setting `"r"` as the `mode`.
This means read-only.
Note that the file descriptor we get is 3, just like before.
@@ -103,7 +103,7 @@ Anyway, they are likely not the default permissions for a regular file (`rw-r--r
It is not mandatory that you get the same output as below.

```console
student@os:~/.../lab/support/file-descriptors$ ls -l
student@os:~/.../file-descriptors/support$ ls -l
total 11
drwxrwxr-x 1 student student 4096 Nov 20 18:26 ./
drwxrwxr-x 1 student student 0 Nov 20 14:11 ../
@@ -115,7 +115,7 @@ drwxrwxr-x 1 student student 0 Nov 20 14:11 ../
---------- 1 student student 45 Nov 20 18:26 write_file.txt
```

[Quiz](../quiz/write-file-permissions.md)
[Quiz](../drills/questions/write-file-permissions.md)

**Remember:**
**It is mandatory that we pass a `mode` argument to `open()` when using the `O_CREAT` flag.**
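
For comparison, here is a minimal Python sketch of creating a file with an explicit `mode`.
The file name `created.txt` is hypothetical, and the umask is pinned to `0o022` only to make the result predictable.
Note that Python's `os.open()` falls back to a default `mode` of `0o777` when the argument is omitted, so it hides the problem that the C call exposes.

```python
import os
import stat
import tempfile

scratch_dir = tempfile.mkdtemp()
new_path = os.path.join(scratch_dir, "created.txt")  # hypothetical file name

old_umask = os.umask(0o022)  # pin the umask so the result is predictable
try:
    # With O_CREAT, the third argument gives the permissions of the new file.
    fd = os.open(new_path, os.O_WRONLY | os.O_CREAT, 0o644)
    os.close(fd)
finally:
    os.umask(old_umask)

permissions = stat.S_IMODE(os.stat(new_path).st_mode)
print(oct(permissions))

os.remove(new_path)
os.rmdir(scratch_dir)
```

The permissions that end up on disk are the requested `mode` with the bits of the umask cleared.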
@@ -135,14 +135,14 @@ Remember to use a loop to make sure your data is fully written to the file.

### Practice: Write Once More

Change the message written to `write_file.txt` by `support/file-descriptors/open_read_write.c` **to a shorter one**.
Change the message written to `write_file.txt` by `file-descriptors/support/open_read_write.c` **to a shorter one**.
It is important that the new message be shorter than the first one.
Now recompile the code, then run it, and then inspect the contents of the `write_file.txt` file.

If the new message were `"Something short"`, the contents of `write_file.txt` should be:

```console
student@os:~/.../lab/support/file-descriptors$ cat ../../support/file-descriptors/write_file.txt
student@os:~/.../file-descriptors/support$ cat write_file.txt
Something shorte_file.txt: What's up, Doc?
```

@@ -154,9 +154,9 @@ We haven't talked about how redirections work in the terminal (we'll get there,
Let's write data to a file twice and observe the behaviour:

```console
student@os:~/.../lab/support/file-descriptors$ ls -l > file.txt
student@os:~/.../file-descriptors/support$ ls -l > file.txt

student@os:~/.../lab/support/file-descriptors$ cat file.txt
student@os:~/.../file-descriptors/support$ cat file.txt
total 6
-rw-rw-r-- 1 student student 0 Nov 20 21:11 file.txt
-rw-rw-r-- 1 student student 125 Nov 20 18:26 Makefile
@@ -165,9 +165,9 @@ total 6
-rw-rw-r-- 1 student student 34 Nov 20 18:26 read_file.txt
-rw-r--r-- 1 student student 45 Nov 20 20:56 write_file.txt

student@os:~/.../lab/support/file-descriptors$ ls > file.txt
student@os:~/.../file-descriptors/support$ ls > file.txt

student@os:~/.../lab/support/file-descriptors$ cat file.txt
student@os:~/.../file-descriptors/support$ cat file.txt
file.txt
Makefile
open_directory.c
@@ -182,15 +182,15 @@ Well, the reason is another flag being passed to `open()`: `O_TRUNC`.
At this point, you should be accustomed to looking for this flag in `open()`'s man page.
Go ahead and do it.
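
The difference the flag makes can be reproduced with a short Python sketch.
The scratch file, the messages and the helpers `write_all()` and `read_file()` are all hypothetical; `write_all()` also shows the write loop mentioned earlier.

```python
import os
import tempfile


def write_all(fd, data):
    """Call write() in a loop, since one call may write only part of the data."""
    while data:
        written = os.write(fd, data)
        data = data[written:]


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name  # hypothetical scratch file

fd = os.open(demo_path, os.O_WRONLY)
write_all(fd, b"A fairly long first message")
os.close(fd)

# Without O_TRUNC, the old bytes past the new data stay in place.
fd = os.open(demo_path, os.O_WRONLY)
write_all(fd, b"Short")
os.close(fd)
without_trunc = read_file(demo_path)

# With O_TRUNC, open() empties the file before anything is written.
fd = os.open(demo_path, os.O_WRONLY | os.O_TRUNC)
write_all(fd, b"Short")
os.close(fd)
with_trunc = read_file(demo_path)

print(without_trunc, with_trunc)
os.remove(demo_path)
```

The first result mixes the new message with the tail of the old one, which is the same effect seen earlier in `write_file.txt`.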

[Quiz 1](../quiz/o-trunc.md)
[Quiz 1](../drills/questions/o-trunc.md)

[Quiz 2](../quiz/fopen-w.md)
[Quiz 2](../drills/questions/fopen-w.md)

### Practice: Close'em All

Just like you use `open()` to create new file descriptors, you can use [`close()`](https://man7.org/linux/man-pages/man2/close.2.html) to destroy them.
This clears and frees the open file structure to which that entry in the file descriptor table is pointing.
Use `close()` on the file descriptors you've opened so far in `support/file-descriptors/open_read_write.c`.
Use `close()` on the file descriptors you've opened so far in `file-descriptors/support/open_read_write.c`.

Note that you don't have to close file descriptors 0, 1 and 2 manually.
The standard streams are meant to stay alive throughout the lifetime of the process.
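
What happens to a descriptor after `close()` can be observed with a small Python sketch.
It uses a pipe only as a convenient way to obtain two fresh descriptors; the variable names are hypothetical.

```python
import errno
import os

read_end, write_end = os.pipe()  # two fresh file descriptors to experiment with
os.close(read_end)
os.close(write_end)

try:
    os.write(write_end, b"too late")
    error_code = None
except OSError as exc:
    error_code = exc.errno  # EBADF: the number no longer refers to an open file

print(error_code == errno.EBADF)
```

After `close()`, the number is just an integer again, and the kernel rejects any operation that uses it until a later `open()` reassigns it.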
Empty file.
Empty file.