Possible bug in the scheduling algorithm #318

przybyszewskiw · 2021-03-31T16:53:21Z

I was analyzing the Minix code responsible for scheduling. In minix/kernel/proc.h there is a definition of the RTS_UNSET macro, which "clears flag and enqueues if the process was not runnable but is now". I wonder why this macro always uses the enqueue function. I think it should use enqueue_head when (rp)->p_cpu_time_left is positive. Otherwise, when a process blocks on I/O before using up its CPU time, it goes to the very end of its queue once it becomes runnable again.
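To make the difference concrete, here is a minimal user-space sketch. It is a toy model of a single priority queue, not Minix code: toy_proc, toy_queue, toy_enqueue, toy_enqueue_head and wake_scenario are hypothetical names, and the sketch assumes, as described above, that enqueue appends at the tail while enqueue_head inserts at the front.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one priority level's run queue. NOT Minix code: every
 * name below is hypothetical and exists only for illustration. */
struct toy_proc {
    char name;
    int cpu_time_left;          /* unused part of the quantum, in ticks */
    struct toy_proc *next;
};

struct toy_queue {
    struct toy_proc *head, *tail;
};

/* Append at the tail (the behaviour attributed to enqueue above). */
static void toy_enqueue(struct toy_queue *q, struct toy_proc *p)
{
    assert(p != NULL);
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Insert at the head (the behaviour attributed to enqueue_head). */
static void toy_enqueue_head(struct toy_queue *q, struct toy_proc *p)
{
    assert(p != NULL);
    p->next = q->head;
    q->head = p;
    if (q->tail == NULL)
        q->tail = p;
}

/* A, B and C are ready; P blocked on I/O with 4 ticks of its quantum
 * unused and has just become runnable again. Writes the resulting run
 * order into out (at least 5 bytes), e.g. "ABCP". */
static void wake_scenario(int prefer_head, char *out)
{
    struct toy_proc a = {'A', 10, NULL}, b = {'B', 10, NULL},
                    c = {'C', 10, NULL}, p = {'P', 4, NULL};
    struct toy_queue q = {NULL, NULL};
    const struct toy_proc *it;

    toy_enqueue(&q, &a);
    toy_enqueue(&q, &b);
    toy_enqueue(&q, &c);

    if (prefer_head && p.cpu_time_left > 0)
        toy_enqueue_head(&q, &p);   /* proposed: resume first */
    else
        toy_enqueue(&q, &p);        /* current: always to the back */

    for (it = q.head; it != NULL; it = it->next)
        *out++ = it->name;
    *out = '\0';
}
```

Calling wake_scenario(0, buf) yields "ABCP": P waits behind every other ready process even though it gave up the CPU early. Calling wake_scenario(1, buf) yields "PABC": P resumes first and finishes the rest of its quantum.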


stux2000 · 2021-06-20T11:59:50Z

Notes for future research:

The line in question is this:

minix/minix/kernel/proc.h, line 222 in 4db99f4:

enqueue(rp); \

enqueue is defined here:

minix/minix/kernel/proc.c, line 1595 in 4db99f4:

void enqueue(

enqueue_head is defined here, and its comments describe the use case above:

minix/minix/kernel/proc.c, line 1670 in 4db99f4:

static void enqueue_head(struct proc *rp)

It is also used here in the same way as described in the issue:

minix/minix/kernel/proc.c, line 325 in 4db99f4:

if (p->p_cpu_time_left)

Personally, I don't know whether there is a technical reason for sending the process to the back of the queue instead of the front. I'd imagine that in some cases this could lead to process starvation. However, nothing in the comments indicates that using enqueue instead of enqueue_head was intentional. Unless there is code from other OSes that suggests otherwise, this might be a good change to make.

Another thought: if this is a common pattern, it might be best to create a new function and/or macro that checks whether the process still has time left and, if so, calls enqueue_head instead of enqueue. Even if it is only used in two places, it might be a worthwhile abstraction.
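A rough sketch of such a helper follows, again as a self-contained toy rather than a patch against Minix. requeue_by_quantum and TOY_RTS_UNSET are hypothetical names, the process and queue types are simplified stand-ins, and the macro only loosely follows the documented RTS_UNSET behaviour (clear the flag, enqueue if the process was not runnable but is now).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT Minix code. All names here are hypothetical. */
struct toy_proc {
    int id;
    int rts_flags;              /* nonzero: process is not runnable */
    int cpu_time_left;          /* unused part of its quantum, in ticks */
    struct toy_proc *next;
};

struct toy_queue {
    struct toy_proc *head, *tail;
};

/* Append at the tail of the queue. */
static void toy_enqueue(struct toy_queue *q, struct toy_proc *p)
{
    assert(p != NULL);
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Insert at the head of the queue. */
static void toy_enqueue_head(struct toy_queue *q, struct toy_proc *p)
{
    assert(p != NULL);
    p->next = q->head;
    q->head = p;
    if (q->tail == NULL)
        q->tail = p;
}

/* Hypothetical helper: resume at the front if part of the quantum is
 * still unused, otherwise wait at the back as usual. */
static void requeue_by_quantum(struct toy_queue *q, struct toy_proc *p)
{
    if (p->cpu_time_left > 0)
        toy_enqueue_head(q, p);
    else
        toy_enqueue(q, p);
}

/* Loosely modeled on the documented RTS_UNSET behaviour: clear flag f and
 * enqueue rp if it was not runnable before but is runnable now. */
#define TOY_RTS_UNSET(q, rp, f)                                 \
    do {                                                        \
        int was_blocked_ = ((rp)->rts_flags != 0);              \
        (rp)->rts_flags &= ~(f);                                \
        if (was_blocked_ && (rp)->rts_flags == 0)               \
            requeue_by_quantum((q), (rp));                      \
    } while (0)
```

With this helper, a process whose last blocking flag is cleared while it still has quantum left lands at the head, one with no time left lands at the tail, and one that still has other flags set is not enqueued at all. The do { ... } while (0) wrapper keeps the multi-statement macro safe inside an unbraced if/else.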

stux2000 · 2021-06-20T12:35:11Z

I'd also like to note (as it was brought to my attention) that any such change should only be made after thorough testing and once the process scheduler is properly understood. Otherwise, the fix could introduce other subtle performance bugs.

przybyszewskiw · 2021-06-22T10:11:58Z

An example of starvation I observed is when a brand-new process is stopped by a page fault. After the page fault is handled, the process isn't resumed immediately. Chances are, therefore, that other processes at the same priority will evict that page from the cache before it runs again, and the process will hit exactly the same page fault over and over. If needed, I can prepare a reproducible example.
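The page-fault scenario can be illustrated with a small simulation. It is a deliberately simplified model, not a reproduction of how Minix handles page faults: it assumes 4 processes at one priority level, 3 page frames with LRU replacement, and that each process needs exactly one page of its own. NPROC, NFRAMES and simulate are hypothetical names.

```c
#include <assert.h>

#define NPROC   4   /* processes at one priority level (assumed) */
#define NFRAMES 3   /* resident page frames, fewer than NPROC (assumed) */

/* Toy model, NOT Minix: process i needs only page i, kept in NFRAMES
 * frames with LRU replacement. A process whose page is resident uses its
 * whole quantum and goes to the back of the queue. A process whose page
 * is missing faults with quantum left: the page is loaded and the process
 * is requeued at the back (prefer_head == 0) or at the front
 * (prefer_head != 0), as proposed in the issue.
 * Returns how many dispatches made progress out of `steps`. */
static int simulate(int prefer_head, int steps)
{
    int queue[NPROC];           /* circular run queue of process ids */
    int qhead = 0, qlen = NPROC;
    int frame_page[NFRAMES];    /* page held by each frame, -1 if empty */
    int frame_used[NFRAMES];    /* last-use time, for LRU */
    int i, t, progress = 0;

    assert(steps >= 0);
    for (i = 0; i < NPROC; i++)
        queue[i] = i;
    for (i = 0; i < NFRAMES; i++) {
        frame_page[i] = -1;
        frame_used[i] = -1;
    }

    for (t = 0; t < steps; t++) {
        int p = queue[qhead];
        int f, slot = -1, victim = 0;

        qhead = (qhead + 1) % NPROC;    /* dequeue p */
        qlen--;

        for (f = 0; f < NFRAMES; f++) {
            if (frame_page[f] == p)
                slot = f;
            if (frame_used[f] < frame_used[victim])
                victim = f;             /* least recently used (or empty) */
        }

        if (slot >= 0) {                /* page resident: run full quantum */
            frame_used[slot] = t;
            progress++;
            queue[(qhead + qlen) % NPROC] = p;        /* tail */
        } else {                        /* page fault: load page, requeue */
            frame_page[victim] = p;
            frame_used[victim] = t;
            if (prefer_head) {
                qhead = (qhead + NPROC - 1) % NPROC;  /* head */
                queue[qhead] = p;
            } else {
                queue[(qhead + qlen) % NPROC] = p;    /* tail */
            }
        }
        qlen++;
    }
    return progress;
}
```

In this model, tail requeueing makes every dispatch a fault (simulate(0, 40) returns 0): by the time a process runs again, the three other processes have pushed its page out. Head requeueing lets each fault be followed immediately by a successful run (simulate(1, 40) returns 20).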
