Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented the clone fallback when clone3 returns ENOSYS #2203

Merged
merged 2 commits into from
Aug 1, 2023

Conversation

yihuaf
Copy link
Collaborator

@yihuaf yihuaf commented Jul 24, 2023

For a number of reasons, platforms can choose to block clone3 and force return ENOSYS. We implement a clone fallback in the case that we can't use clone3.

Also, clone3 has no libc wrapper at this point. The current implementation calls the kernel version of the syscall directly. There are undefined behaviors potentially when we create process bypassing the libc. However, we have not observed any issue with our tests. This is likely because youki runs short lived process and calls exec or exit in the end. Nonetheless, we should have a backup plan and this change is our way out in the case that we discover clone3 has issue as the default code path.

Remove the use of the clone3 crate. We use clone3 is a very specific way to create a process. We don't have to support the many other flags and usecases of the clone3 call. So it is simpler for us to use the libc crate directly for the syscall. This avoids an extra dependency and reduces our binary size.

For a number of reasons, platforms can choose to block clone3 and force
return ENOSYS. We implement a clone fallback in the case that we can't
use clone3.

Also, clone3 has no libc wrapper at this point. The current
implementation calls the kernel version of the syscall directly. There
are undefined behaviors potentially when we create process bypassing the
libc. However, we have not observed any issue with our tests. This is
likely because `youki` runs short lived process and calls exec or exit
in the end. Nonetheless, we should have a backup plan and this change is
our way out in the case that we discover clone3 has issue as the default
code path.

Remove the use of the clone3 crate. We use `clone3` is a very specific
way to create a process. We don't have to support the many other flags
and usecases of the `clone3` call. So it is simpler for us to use the
libc crate directly for the syscall. This avoids an extra dependency and
reduces our binary size.

Signed-off-by: yihuaf <[email protected]>
@yihuaf yihuaf requested review from utam0k, YJDoc2 and a team July 24, 2023 00:50
@yihuaf yihuaf added enhancement New feature or request kind/feature and removed enhancement New feature or request labels Jul 24, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jul 24, 2023

Codecov Report

Merging #2203 (a67d180) into main (f3da56e) will increase coverage by 0.11%.
Report is 42 commits behind head on main.
The diff coverage is 64.65%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2203      +/-   ##
==========================================
+ Coverage   64.93%   65.05%   +0.11%     
==========================================
  Files         129      129              
  Lines       14979    15139     +160     
==========================================
+ Hits         9727     9848     +121     
- Misses       5252     5291      +39     

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Jul 24, 2023

Hey @yihuaf is it possible that we do this as a clone3 feature which is enabled by default? Maybe then we can disable this for musl, and if someone don't want / knows that clone3 cannot be used, they can disable it via features instead of allowing it to fallback.

To be clear, I am saying that we should keep the current fallback mechanism even with feature, but if the feature is disabled, we will directly use clone, without trying clone3 first. wdyt?

@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 24, 2023

Hey @yihuaf is it possible that we do this as a clone3 feature which is enabled by default? Maybe then we can disable this for musl, and if someone don't want / knows that clone3 cannot be used, they can disable it via features instead of allowing it to fallback.

To be clear, I am saying that we should keep the current fallback mechanism even with feature, but if the feature is disabled, we will directly use clone, without trying clone3 first. wdyt?

I think this is a great idea, but I would want to implement this in a different PR as a follow up. This will also make the testing of this feature easier, otherwise I have to resort to seccomp to force clone3 to return enosys.

@utam0k
Copy link
Member

utam0k commented Jul 25, 2023

Great challenge 👍 Is it possible to have it benchmarked?

@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 31, 2023

Great challenge 👍 Is it possible to have it benchmarked?

So there is no visible performance difference between clone and clone3. We benchmarked it before and here is the simple benchmark I did with this PR, using the clone and clone3 path.

This is using clone3

[eng@stardust youki]$ just hack-benchmark
Benchmark 1: sudo /home/eng/private/youki/youki create -b tutorial a && sudo /home/eng/private/youki/youki start a && sudo /home/eng/private/youki/youki delete -f a
  Time (mean ± σ):     928.4 ms ± 141.5 ms    [User: 20.8 ms, System: 232.5 ms]
  Range (min … max):   725.0 ms … 1383.5 ms    100 runs
 
[eng@stardust youki]$ 

This is using clone

[eng@stardust youki]$ just hack-benchmark
Benchmark 1: sudo /home/eng/private/youki/youki create -b tutorial a && sudo /home/eng/private/youki/youki start a && sudo /home/eng/private/youki/youki delete -f a
  Time (mean ± σ):     797.2 ms ±  42.7 ms    [User: 20.3 ms, System: 233.0 ms]
  Range (min … max):   709.1 ms … 1069.1 ms    100 runs
 
[eng@stardust youki]$ 

With the fallback path, we do need to make an extra clone3 call before making the clone call, but this is one extra syscall that will return ENOSYS. I don't anticipate there is any performance difference tbh.

@yihuaf yihuaf self-assigned this Jul 31, 2023
@yihuaf
Copy link
Collaborator Author

yihuaf commented Jul 31, 2023

If there is no blocking issue, I would like to get this merged as is and address any review comments in a follow up PR along with clone3 as a feature.

We often use this to quickly get a sense of perf of a PR change.

Signed-off-by: yihuaf <[email protected]>
Copy link
Collaborator

@YJDoc2 YJDoc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍 I think @utam0k didn't have any issues with this, but if there are any problems, we can fix them in follow up PRs. Merging this for now --- After the CI is done :)

Comment on lines +104 to +105
#!/usr/bin/env bash
set -euo pipefail
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, given that this is a single command, I don't thing we need to specify the shell here, we can just run it. Also I think this is the best way we have currently to run benchmarks, and this is not in hacks, so we can just name it benchmark instead of hack benchmark.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yihuaf take a look at this in your next PR of clone regarding changes. Merging this one as not a blocking issue.

@YJDoc2 YJDoc2 merged commit 0fc4c87 into youki-dev:main Aug 1, 2023
13 checks passed
@utam0k
Copy link
Member

utam0k commented Aug 2, 2023

Thanks a lot @yihuaf @YJDoc2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants