Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync ReadLimiter #201

Merged
merged 4 commits into from
Dec 19, 2020
Merged

Sync ReadLimiter #201

merged 4 commits into from
Dec 19, 2020

Conversation

appaquet
Copy link
Contributor

As mentioned in #191 and #121, capnp::message::Reader<capnp::serialize::OwnedSegments> isn't Sync because of the Cell<u64> in the ReadLimiter.

This solution will only work with platforms that have AtomicU64 and only works from 1.34 onward. This modification could be put behind a feature if this affects any user of the crate.

Let me know !

@dwrensha
Copy link
Member

Thanks!

Do you know which platforms support AtomicU64? This async-std pull request seems to suggest the Arm, MIPS, and Powerpc do not support it. I would prefer not to break capnp on those platforms.

}

#[inline]
pub fn can_read(&self, amount: u64) -> Result<()> {
let current = self.limit.get();
if amount > current {
let read = self.read.fetch_add(amount, Ordering::Relaxed) + amount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This potentially allows self.read to overflow and wrap around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but it also return an error at first if it reaches the limit, so I would expect this to be an undefined behaviour to call it afterward.

My first implementation was a load + cas, but that would be definitely less performant.

We could also have a second AtomicBool that indicates that we reached the limit and return right away. That could potentially be over the limit, but decreases odds of overflow.

@appaquet
Copy link
Contributor Author

Makes sense. I couldn't find the list of targets that this crate was tested against. Could be a good idea to add tests, or even just check compilation against them (ex: using cross) in CI.

How you would go about it ? We could have a different implementation for platforms that don't support it. It could also be a feature to support Sync that would need to be explicitly enabled, or fallback to non-sync implementation if it's not a supported platform.

Let me know !

@dwrensha
Copy link
Member

Some options:

  1. We could use some kind of conditional compilation (feature flag or platform detection) to control when the AtomicU64 is used. I worry that this approach might add unpleasant surprises for downstream users, with things becoming !Sync in unexpected situations.

  2. Assuming that AtomicU32 is sufficiently widely supported, we could use it instead of AtomicU64. The downside is that the read limiter would then capped at 32 gigabytes, but, come to think of it, that doesn't seem too bad, as long as we provide an explicit way to opt out of read limiting. It seems reasonable to expect that all untrusted messages people care about will be smaller than that (though maybe there are some use cases that I have not imagined).

  3. We could make read limiting customizable via a trait, perhaps folding it into the ReaderSegments trait. Then message::Reader<S> could be Sync or not depending on whether S is Sync. It's unclear to me exactly what the new API would look like, but this feels doable to me.

@appaquet
Copy link
Contributor Author

Another option would be to use AtomicUsize instead. It has the benefit from being potentially of u64 size on 64 bits platforms while being the most portable type according to https://doc.rust-lang.org/std/sync/atomic/index.html.

In both AtomicU32 and AtomicUsize option, we will have to change the type of the limit in ReaderOptions to the appropriate type though, which is an API breaking change. It could be left to a u64 and checked at runtime to make sure it doesn't overflow the u32 or usize.

In the 3rd option, if I understand correctly, you'd move the read limiting responsibility to the ReaderSegments implementations? Or would you see it more like a ReaderSegments implementation that wraps an inner ReaderSegments but with the read limiting capability ? Nevertheless, I see this 3rd option as being probably more work and API breaking changes since the we're moving the limiting functionally out of the reader.

@appaquet
Copy link
Contributor Author

@dwrensha Pushed an updated version that uses AtomicUsize behind a feature flag. I would have liked to use proper conditional compilation and compile the sync version when the AtomicUsize is available, but rust-lang/rust#32976 will need to be stabilized before. Radium could also be used as a replacement to AtomicUsize that fallbacks to a Cell<usize> if it isn't available, but I don't think the extra dependency is worth it for this sole usage.

@dwrensha
Copy link
Member

Thanks!

Having thought about this some more, I think I prefer option 2 above, i.e. switching to AtomicU32. You are correct that it would be a breaking change, so it would require a version bump. We would probably want to wait until we have other pending breaking changes, so things can be batched together and we minimize version churn. I'm not sure when that would be, but historically it has been something like every six months.

If we want to quickly land a fix/workaround, then maybe your feature flag is the way to go. I hesitate because I don't like adding this kind of complexity, but maybe it's worth it.

Are you able to workaround the problem on your end by sharing an OwnedSegments or SegmentArray object rather than a message::Reader object?

@appaquet
Copy link
Contributor Author

Unfortunately, sharing segments and creating a Reader for every access makes it impossible for me to have a proper lifetime on the message. When reading a message, its lifetime is tied to the Reader, not to the underlying segments, which means that I can't carry around that message or its data without keeping the Reader. Self-referential structs would have helped me here by providing me a way to wrap the Reader and my message together in a struct to be carried around.

@dwrensha
Copy link
Member

dwrensha commented Oct 3, 2020

I was curious about which targets support which atomics, so I tried cross compiling the following program on a sampling of targets:

#![no_std]

pub fn foo() {
    let x = core::sync::atomic::AtomicUsize::new(0);
    let x = core::sync::atomic::AtomicU64::new(0);
    let x = core::sync::atomic::AtomicU32::new(0);
}

targets that support AtomicUsize, AtomicU64, and AtomicU32

x86_64-unknown-linux-gnu
arm-unknown-linux-gnueabi
i686-unknown-linux-gnu
wasm32-unknown-unknown
aarch64-unknown-linux-gnu

targets that support AtomicUsize and AtomicU32, but not AtomicU64

arm-linux-androideabi
powerpc-unknown-linux-gnu
thumbv7em-none-eabihf

targets that do not support AtomicUsize, AtomicU64, or AtomicU32

riscv32i-unknown-none-elf


These results are more or less what I expected: while the most important targets support AtomicU64, there are some targets that we should care about that don't support it.

I'm not sure what's going on with riscv32i-unknown-none-elf.

@appaquet
Copy link
Contributor Author

appaquet commented Oct 3, 2020

I think riscv32i-unknown-none-elf just doesn't support atomics in fact: https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/riscv32i_unknown_none_elf.rs#L20

Looks like Radium also requires disabling atomics for these targets: ferrilab/radium#3

That's why I think having a feature flag to disable the sync reader is the best option to support all cases correctly, at the expense of conditional code.

Let me know if you want me to switch to AtomicU32 instead of AtomicUsize in this PR. Otherwise, if you don't want to support sync readers, just let me know too. You're the maintainer afterall. I'll try to work around it on my side or perhaps find an alternative.

@appaquet
Copy link
Contributor Author

@dwrensha What's your thoughts on this? Do you think we could get this done? Otherwise I need to plan a refactor or an alternative on my side since I can't publish without this.

@dwrensha
Copy link
Member

My current feeling is that the best solution to this and a bunch of other problems will be to expose inner Arena type as a type parameter. So instead of having a foo::Reader<'a> as you do today, you would have a foo::Reader<'a, A>, where A: ReaderArena. That way, if your arena implementation is Sync, then the typesystem will remember it, and you can do the sharing-between-threads that you need to. Moreover, we will be able to avoid a bunch of dynamic dispatch, which seems to be slowing down performance, and we'll be able make alignment checking configurable via the trait system, rather than through feature flags.

The main obstacle is that getting this to work (particularly with capnp::traits::Owned) probably requires generic associated types, and it's unclear when those will land on stable rust.

In the meantime, it could make sense to land a targeted temporary fix, like what you've proposed here. I don't like adding complexity when it seems like there is a simplifying solution within reach, but maybe it's worth it in this case.

@appaquet
Copy link
Contributor Author

Sounds good for the arena as a type parameter. Eager to see GATs landing too at some point, as it will also help simplify some code on my side too.

As for this temporary fix, from my opinion, since it's pretty constrained and isn't exposed to the users (other than the new feature), I think it's a good trade-off to get the reader Sync, but I'm probably biased ;) Let me know if there is anything you'd like me to change in my changes, or feel free to do them if it can get them to land earlier.


#[inline]
pub fn can_read(&self, amount: usize) -> Result<()> {
let read = self.read.load(Ordering::Relaxed) + amount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like this can overflow if self.read.load(Ordering::Relaxed) + amount > core::usize::MAX.

Why not write this is the same way as the non-Sync version, with a single limit: AtomicUsize field that decreases on each call? I think overflows and underflows are easier to avoid in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, unless I don't see an obvious solution here, the way it's done in the non-sync version is possible because it's single threaded. In a multi-threaded version, I need to make sure that the limit underflowing is handled correctly. If I was to get then sub without checking underflow, the limit would wrap around and allows the next readers to read usize::MAX.

I think the version I pushed is better and handle this better than my previous version. I use an extra flag to indicate that the limit has been reached when the limit got to 0 or when it underflows.

capnp/Cargo.toml Outdated
@@ -25,7 +25,7 @@ quickcheck = { version = "0.9", optional = true }
quickcheck = "0.9"

[features]
default = ["std"]
default = ["std", "sync_reader"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to make the new feature not a default feature, so that we minimize the chance of breaking anyone's code.


pub struct ReadLimiter {
pub limit: usize,
pub read: AtomicUsize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this an AtomicU64 would more closely match the current implementation, which uses a u64. I'm worried that someone might be setting the current limit to a large u64 value, and when this new feature gets enabled their code will break.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with AtomicU64, though, is that it is supported on fewer platforms. I don't see a nice solution here...

}

let prev_limit = self.limit.fetch_sub(amount, Ordering::Relaxed);
if prev_limit == amount {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this branch necessary? When prev_limit == amount, the current limit will be 0, and therefore we'll get an error on the next attempt to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's not necessary

if prev_limit == amount {
self.limit_reached.store(true, Ordering::Relaxed);
} else if prev_limit < amount {
self.limit_reached.store(true, Ordering::Relaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to cause future reads to fail too, then I think we could do

set.fetch_add(amount, Ordering::Relaxed)

here and not need to worry about adding a limit_reached flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this just resets the value to what it was right before the fetch_sub ?

If we want to use a single variable, we could set the value to 0 here when fully consumed (since the new value after fetch_sub may have underflow) and check if we have enough remaining limit before reading on each call.

@appaquet
Copy link
Contributor Author

Friendly ping @dwrensha. Let me know if there is anything I can do to land this faster :)

@dwrensha
Copy link
Member

I would like get this pull request landed, but I think we first need to resolve the usize vs u64 discrepancy. Imagine if a crate today is setting traversal_limit_in_words to u64::MAX to effectively mean "no traversal limit". If the sync_reader feature is enabled, the crate will start to panic on 32-bit systems in the ReadLimiter::new() function. Note that crates don't always get to control when features are enabled -- if the crate imports some other library that uses capnp, and that other library enables the feature, then the feature gets globally enabled for the full crate build.

I think a satisfactory path forward would be:

  1. make ReaderOptions::traversal_limit_in_words have type Option<usize>, where None means "no limit".
  2. when the sync_reader feature flag is not enabled, ReadLimiter has a limit: Cell<usize> field.
  3. when the sync_reader feature flag is enabled, ReadLimiter instead has a limit: AtomicUsize field.
  4. ReadLimiter also gets a panic_on_limit_exceeded: bool field, for when traversal_limit_in_words is None
  5. To avoid excessive branching, limit always gets updated on can_read(). panic_on_limit_exceeded is only checked when the limit is exceeded.
  6. We bump the release version to 0.14, because this is a breaking change.

How does that sound?

Relatedly, I've been working on the arena type parameter approach (work in progress in #210), and while I've made some progress and learned some things, I would no longer describe a solution as "within reach" using that approach, because there are a lot of details to be worked out, and things generally look complicated.

@appaquet
Copy link
Contributor Author

I agree for the u64 vs usize problems for 32-bit systems. The initial proposal was to prevent breaking changes, but I agree that making things right is more beneficial that the cost of the breaking change.

Steps 1-2-3 makes sense.

Step 4-5 are confusing me here. Which limit are we comparing against if the traversal_limit_in_words is none? usize::MAX? Shouldn't the semantic of a None value in ReaderOptions::traversal_limit_in_words rather be that the read limiting is disabled completely? Also, should we panic or simply return an error?

Quite massive refactor you have in #210. I like the removal of the higher-ranked trait bounds. I also don't think it is within reach because there is no clear roadmap for when GATs will actually be stabilized. There are still a few issues to be tackled according to the tracking issue.

@dwrensha
Copy link
Member

dwrensha commented Dec 15, 2020

Shouldn't the semantic of a None value in ReaderOptions::traversal_limit_in_words rather be that the read limiting is disabled completely?

To an outside observer, yes. What I'm trying to describe is an optimization.

Sorry, panic_on_limit_exceeded is not the right name for the field. I don't actually intend to panic. Better would be error_on_limit_exceeded: bool.
For the non-Sync case, I'm imagining something like this:

    pub fn can_read(&self, amount: usize) -> Result<()> {
        let current = self.limit.get();
        if amount > current && self.error_on_limit_exceeded {
            Err(Error::failed(format!("read limit exceeded")))
        } else {
            self.limit.set(current.wrapping_sub(amount));
            Ok(())
        }
    }

So when no limit is enabled, the self.limit variable still gets updated, it just never causes an error.
The point is to avoid having two branches on every call to can_read() when a limit is enabled.

Compare to this naive version, where self.limit stores a Cell<Option<usize>>:

    pub fn can_read(&self, amount: usize) -> Result<()> {
        if let Some(current) = self.limit.get() {
            if amount > current {
                Err(Error::failed(format!("read limit exceeded")))
            } else {
                self.limit.set(current - amount);
                 Ok(())
            }
        }
     }

(Of course, it's possible that this optimization won't have a noticeable effect. It seems easy enough, though, to be worth doing.)

@appaquet
Copy link
Contributor Author

I'm wondering if the sync version should do the same though. The atomic operations are probably most costly than having the extra branching. So disabling the limit should probably try to avoid the atomic load and store.

Which benchmark do you think would be impact the most here? I could probably bench the 2 alternatives.

@dwrensha
Copy link
Member

I suppose I mostly care about avoiding performance regression on the non-sync version.

You could try run_all_benchmarks in the benchmark folder or https://github.com/llogiq/serdebench.

@dwrensha
Copy link
Member

(also, fyi, I've bumped the version numbers in preparation for landing this and other breaking changes: f40b6dc)

@dwrensha
Copy link
Member

Thanks for the pull request and discussion! I am going to merge this and then add a few changes.

@dwrensha dwrensha merged commit 83c76be into capnproto:master Dec 19, 2020
@appaquet
Copy link
Contributor Author

appaquet commented Dec 19, 2020

Sorry for the radio silence. Was planning to do the changes over the weekend. But I'm happy that this got merged!

Saw the other changes you merged on master too. Can't wait for the release!

@dwrensha
Copy link
Member

I looked at a few benchmarks myself. My findings were:

  1. fetch_sub() was significantly slower than sequentially doing load() then store():
    pub fn can_read(&self, amount: usize) -> Result<()> {
    // We use separate AtomicUsize::load() and AtomicUsize::store() steps, which may
    // result in undercounting reads if multiple threads are reading at the same.
    // That's okay -- a denial of service attack will eventually hit the limit anyway.
    //
    // We could instead do a single fetch_sub() step, but that seems to be slower.
    let current = self.limit.load(Ordering::Relaxed);
    if amount > current && self.error_on_limit_exceeded {
    return Err(Error::failed(format!("read limit exceeded")));
    } else {
    // The common case is current >= amount. Note that we only branch once in that case.
    // If we combined the fields into an Option<AtomicUsize>, we would
    // need to branch twice in the common case.
    self.limit.store(current.wrapping_sub(amount), Ordering::Relaxed);
    }
    Ok(())
    }
    .
  2. I was unable to detect a performance difference between Option<AtomicUsize> and AtomicUsize,bool. I went with the latter because I feel like minimizing branches is important.

You are welcome to look into benchmarks too and to propose changes.

My plan is to release 0.14 later today or tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants