Add AsciiSet::EMPTY and boolean operators #969

Merged (3 commits) on Sep 19, 2024

84 changes: 83 additions & 1 deletion percent_encoding/src/lib.rs
@@ -51,7 +51,7 @@ use alloc::{
    string::String,
    vec::Vec,
};
-use core::{fmt, mem, slice, str};
+use core::{fmt, mem, ops, slice, str};

/// Represents a set of characters or bytes in the ASCII range.
///
@@ -66,6 +66,7 @@ use core::{fmt, mem, slice, str};
/// /// https://url.spec.whatwg.org/#fragment-percent-encode-set
/// const FRAGMENT: &AsciiSet = &CONTROLS.add(b' ').add(b'"').add(b'<').add(b'>').add(b'`');
/// ```
#[derive(Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [Chunk; ASCII_RANGE_LEN / BITS_PER_CHUNK],
}
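
As a rough illustration of the `mask` layout: each ASCII byte maps to one bit, found by dividing the byte value by the chunk width. The sketch below assumes `Chunk` is `u32` (its definition is not shown in this diff, though the four chunks indexed by `union` are consistent with that); `CHUNKS` and `locate` are hypothetical names used only for illustration.

```rust
use std::mem;

// Assumption: `Chunk` is a 32-bit integer; its definition is not shown in this diff.
type Chunk = u32;

const ASCII_RANGE_LEN: usize = 0x80;
const BITS_PER_CHUNK: usize = 8 * mem::size_of::<Chunk>();

/// Number of chunks needed to cover the whole ASCII range (hypothetical name).
const CHUNKS: usize = ASCII_RANGE_LEN / BITS_PER_CHUNK;

/// Hypothetical helper: the chunk index and single-bit mask that represent `byte`.
const fn locate(byte: u8) -> (usize, Chunk) {
    let index = byte as usize / BITS_PER_CHUNK;
    let bit = 1 << (byte as usize % BITS_PER_CHUNK);
    (index, bit)
}
```

Under that assumption the mask has four chunks, and `b'A'` (65) lands in chunk 2 at bit 1.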
@@ -77,6 +78,11 @@ const ASCII_RANGE_LEN: usize = 0x80;
const BITS_PER_CHUNK: usize = 8 * mem::size_of::<Chunk>();

impl AsciiSet {
    /// An empty set.
    pub const EMPTY: AsciiSet = AsciiSet {

@ForsakenHarmony commented on Sep 19, 2024:

This seems like it's now inconsistent with the existing constants and functions taking &'static AsciiSet.

@joshka (contributor, PR author) commented on Sep 20, 2024:

Yeah, I wasn't 100% sure about that. I went with EMPTY being a constant on the AsciiSet as empty seems like an inherent property of a type, but the other constants seem like usages of AsciiSet. I was 70/30% on this being right, so wouldn't object to this being changed to be consistent with the other constants.

The rationale for making the constants references rather than just values all seemed odd to me. What was that necessary for?

Edit: let's discuss on #970 instead of here.

        mask: [0; ASCII_RANGE_LEN / BITS_PER_CHUNK],
    };
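
The review thread above turns on value constants versus `&'static AsciiSet` constants. A minimal sketch of how the two styles look at a call site, using a stand-in `AsciiSet` (four `u32` chunks assumed) rather than the crate's own type; `SPACE_REF`, `is_in_set`, and `demo` are hypothetical names, not crate APIs.

```rust
/// Stand-in for the crate's type; only what this sketch needs.
#[derive(Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    /// Value-style constant, as added in this PR.
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    pub const fn add(&self, byte: u8) -> Self {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    pub const fn contains(&self, byte: u8) -> bool {
        self.mask[byte as usize / 32] & (1 << (byte as usize % 32)) != 0
    }
}

/// Reference-style constant, in the manner of the `FRAGMENT` doc example.
pub const SPACE_REF: &AsciiSet = &AsciiSet::EMPTY.add(b' ');

/// Hypothetical consumer with the `&'static AsciiSet` parameter the review mentions.
pub fn is_in_set(byte: u8, set: &'static AsciiSet) -> bool {
    byte.is_ascii() && set.contains(byte)
}

/// Checks a space against both constants.
pub fn demo() -> (bool, bool) {
    // The value constant needs an explicit borrow; the reference constant does not.
    (is_in_set(b' ', &AsciiSet::EMPTY), is_in_set(b' ', SPACE_REF))
}
```

Both styles can feed the same function; the visible difference is the extra `&` when passing the value constant.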

    /// Called with UTF-8 bytes rather than code points.
    /// Not used for non-ASCII bytes.
    const fn contains(&self, byte: u8) -> bool {
@@ -100,6 +106,39 @@ impl AsciiSet {
        mask[byte as usize / BITS_PER_CHUNK] &= !(1 << (byte as usize % BITS_PER_CHUNK));
        AsciiSet { mask }
    }

    /// Return the union of two sets.
    pub const fn union(&self, other: Self) -> Self {
        let mask = [
            self.mask[0] | other.mask[0],
            self.mask[1] | other.mask[1],
            self.mask[2] | other.mask[2],
            self.mask[3] | other.mask[3],
        ];
        AsciiSet { mask }
    }

    /// Return the negation of the set.
    pub const fn complement(&self) -> Self {
        let mask = [!self.mask[0], !self.mask[1], !self.mask[2], !self.mask[3]];
        AsciiSet { mask }
    }
}
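
The `union` above spells out all four chunk indices by hand. One possible alternative, shown here only as a sketch and not as what the PR does, walks the chunks with a `while` loop, since `for` loops are not usable in a `const fn` on stable Rust. The stand-in type assumes four `u32` chunks; `CHUNKS` and `union_loop` are hypothetical names.

```rust
/// Assumed chunk count for this stand-in (128 ASCII bits / 32 bits per chunk).
const CHUNKS: usize = 4;

/// Stand-in for the crate's type; only what this sketch needs.
#[derive(Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; CHUNKS],
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; CHUNKS] };

    pub const fn add(&self, byte: u8) -> Self {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    /// Hypothetical loop-based variant of `union`: ORs the masks chunk by chunk.
    pub const fn union_loop(&self, other: Self) -> Self {
        let mut mask = self.mask;
        let mut i = 0;
        while i < CHUNKS {
            mask[i] |= other.mask[i];
            i += 1;
        }
        AsciiSet { mask }
    }
}

// The loop variant is still usable in constant initializers.
pub const A: AsciiSet = AsciiSet::EMPTY.add(b'A');
pub const B: AsciiSet = AsciiSet::EMPTY.add(b'B');
pub const AB: AsciiSet = A.union_loop(B);
```

The trade-off is mostly stylistic: the explicit form is easy to read at a fixed size, while the loop would not need editing if the chunk count ever changed.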

impl ops::Add for AsciiSet {
    type Output = Self;

    fn add(self, other: Self) -> Self {
        self.union(other)
    }
}

impl ops::Not for AsciiSet {
    type Output = Self;

    fn not(self) -> Self {
        self.complement()
    }
}
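
A short sketch of how the two operator impls might be used next to the named methods. Operator traits cannot be called in a `const` initializer on stable Rust, so `+` and `!` serve runtime code while `union` and `complement` cover constants. The type below is a stand-in (four `u32` chunks assumed); `neither` and `NOT_AB` are hypothetical names.

```rust
/// Stand-in for the crate's type; only what this sketch needs.
#[derive(Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    pub const fn add(&self, byte: u8) -> Self {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    pub const fn contains(&self, byte: u8) -> bool {
        self.mask[byte as usize / 32] & (1 << (byte as usize % 32)) != 0
    }

    pub const fn union(&self, other: Self) -> Self {
        let m = self.mask;
        let o = other.mask;
        AsciiSet { mask: [m[0] | o[0], m[1] | o[1], m[2] | o[2], m[3] | o[3]] }
    }

    pub const fn complement(&self) -> Self {
        let m = self.mask;
        AsciiSet { mask: [!m[0], !m[1], !m[2], !m[3]] }
    }
}

// The traits are named by path, mirroring the diff's `ops::Add` / `ops::Not`.
impl std::ops::Add for AsciiSet {
    type Output = Self;

    fn add(self, other: Self) -> Self {
        self.union(other)
    }
}

impl std::ops::Not for AsciiSet {
    type Output = Self;

    fn not(self) -> Self {
        self.complement()
    }
}

/// Runtime composition: every ASCII byte that is in neither set.
pub fn neither(a: AsciiSet, b: AsciiSet) -> AsciiSet {
    !(a + b)
}

/// The same set as a compile-time constant, built with the named methods.
pub const NOT_AB: AsciiSet = AsciiSet::EMPTY.add(b'A').add(b'B').complement();
```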

/// The set of 0x00 to 0x1F (C0 controls), and 0x7F (DEL).
@@ -478,3 +517,46 @@ fn decode_utf8_lossy(input: Cow<'_, [u8]>) -> Cow<'_, str> {
}
}
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn add_op() {
        let left = AsciiSet::EMPTY.add(b'A');
        let right = AsciiSet::EMPTY.add(b'B');
        let expected = AsciiSet::EMPTY.add(b'A').add(b'B');
        assert_eq!(left + right, expected);
    }

    #[test]
    fn not_op() {
        let set = AsciiSet::EMPTY.add(b'A').add(b'B');
        let not_set = !set;
        assert!(!not_set.contains(b'A'));
        assert!(not_set.contains(b'C'));
    }

    /// This test ensures that we can get the union of two sets as a constant value, which is
    /// useful for defining sets in a modular way.
    #[test]
    fn union() {
        const A: AsciiSet = AsciiSet::EMPTY.add(b'A');
        const B: AsciiSet = AsciiSet::EMPTY.add(b'B');
        const UNION: AsciiSet = A.union(B);
        const EXPECTED: AsciiSet = AsciiSet::EMPTY.add(b'A').add(b'B');
        assert_eq!(UNION, EXPECTED);
    }

    /// This test ensures that we can get the complement of a set as a constant value, which is
    /// useful for defining sets in a modular way.
    #[test]
    fn complement() {
        const BOTH: AsciiSet = AsciiSet::EMPTY.add(b'A').add(b'B');
        const COMPLEMENT: AsciiSet = BOTH.complement();
        assert!(!COMPLEMENT.contains(b'A'));
        assert!(!COMPLEMENT.contains(b'B'));
        assert!(COMPLEMENT.contains(b'C'));
    }
}
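
The tests above cover union and complement, which are the only set operations this PR adds. As a hedged aside, an intersection could be derived from those two via De Morgan's law, and it stays usable in constants. The sketch uses a stand-in `AsciiSet` (four `u32` chunks assumed); `intersection` is a hypothetical helper and not part of the diff.

```rust
/// Stand-in for the crate's type; only what this sketch needs.
#[derive(Debug, PartialEq, Eq)]
pub struct AsciiSet {
    mask: [u32; 4],
}

impl AsciiSet {
    pub const EMPTY: AsciiSet = AsciiSet { mask: [0; 4] };

    pub const fn add(&self, byte: u8) -> Self {
        let mut mask = self.mask;
        mask[byte as usize / 32] |= 1 << (byte as usize % 32);
        AsciiSet { mask }
    }

    pub const fn union(&self, other: Self) -> Self {
        let m = self.mask;
        let o = other.mask;
        AsciiSet { mask: [m[0] | o[0], m[1] | o[1], m[2] | o[2], m[3] | o[3]] }
    }

    pub const fn complement(&self) -> Self {
        let m = self.mask;
        AsciiSet { mask: [!m[0], !m[1], !m[2], !m[3]] }
    }
}

/// Hypothetical helper: intersection as NOT(NOT a OR NOT b).
pub const fn intersection(a: &AsciiSet, b: &AsciiSet) -> AsciiSet {
    a.complement().union(b.complement()).complement()
}

pub const AB: AsciiSet = AsciiSet::EMPTY.add(b'A').add(b'B');
pub const BC: AsciiSet = AsciiSet::EMPTY.add(b'B').add(b'C');
pub const B_ONLY: AsciiSet = intersection(&AB, &BC);
```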