Further sequester Group/Tag code #568

Open: wants to merge 1 commit into base: master
2 changes: 1 addition & 1 deletion src/raw/bitmask.rs → src/control/bitmask.rs
@@ -1,4 +1,4 @@
-use super::imp::{
+use super::group::{
 BitMaskWord, NonZeroBitMaskWord, BITMASK_ITER_MASK, BITMASK_MASK, BITMASK_STRIDE,
 };

3 changes: 1 addition & 2 deletions src/raw/generic.rs → src/control/group/generic.rs
@@ -1,5 +1,4 @@
-use super::bitmask::BitMask;
-use super::Tag;
+use super::super::{BitMask, Tag};
 use core::{mem, ptr};

 // Use the native word size as the group size. Using a 64-bit group size on
35 changes: 35 additions & 0 deletions src/control/group/mod.rs
@@ -0,0 +1,35 @@
cfg_if! {
Review comment (Contributor Author):

This is just copied directly from the raw module, without any additional changes.

// Use the SSE2 implementation if possible: it allows us to scan 16 buckets
// at once instead of 8. We don't bother with AVX since it would require
// runtime dispatch and wouldn't gain us much anyways: the probability of
// finding a match drops off drastically after the first few buckets.
//
// I attempted an implementation on ARM using NEON instructions, but it
// turns out that most NEON instructions have multi-cycle latency, which in
// the end outweighs any gains over the generic implementation.
if #[cfg(all(
target_feature = "sse2",
any(target_arch = "x86", target_arch = "x86_64"),
not(miri),
))] {
mod sse2;
use sse2 as imp;
} else if #[cfg(all(
target_arch = "aarch64",
target_feature = "neon",
// NEON intrinsics are currently broken on big-endian targets.
// See https://github.com/rust-lang/stdarch/issues/1484.
target_endian = "little",
not(miri),
))] {
mod neon;
use neon as imp;
} else {
mod generic;
use generic as imp;
}
}
pub(crate) use self::imp::Group;
pub(super) use self::imp::{
BitMaskWord, NonZeroBitMaskWord, BITMASK_ITER_MASK, BITMASK_MASK, BITMASK_STRIDE,
};
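The cfg_if! chain above selects exactly one backend module at compile time and re-exports it as imp. As a rough sketch, the same selection can be written with plain #[cfg] attributes, where the final else branch becomes the negation of the earlier conditions. The module bodies, NAME, and the WIDTH values (16 bytes for SSE2, 8 for NEON, the native word size for the generic fallback) are illustrative assumptions, not the real Group definitions.

```rust
// Sketch of the backend selection done by `cfg_if!` in group/mod.rs, written
// with plain `#[cfg]` attributes. Each `imp` module is a hypothetical stand-in
// that only exposes a name and an assumed group width in bytes.

#[cfg(all(
    target_feature = "sse2",
    any(target_arch = "x86", target_arch = "x86_64"),
    not(miri),
))]
mod imp {
    // SSE2 scans 16 control bytes (one 128-bit register) at a time.
    pub const WIDTH: usize = 16;
    pub const NAME: &str = "sse2";
}

#[cfg(all(
    target_arch = "aarch64",
    target_feature = "neon",
    target_endian = "little",
    not(miri),
))]
mod imp {
    // Assumed: a 64-bit NEON vector, i.e. 8 control bytes.
    pub const WIDTH: usize = 8;
    pub const NAME: &str = "neon";
}

// The `else` branch becomes the negation of every earlier condition.
#[cfg(not(any(
    all(
        target_feature = "sse2",
        any(target_arch = "x86", target_arch = "x86_64"),
        not(miri),
    ),
    all(
        target_arch = "aarch64",
        target_feature = "neon",
        target_endian = "little",
        not(miri),
    ),
)))]
mod imp {
    // Generic fallback: use the native word size as the group size.
    pub const WIDTH: usize = core::mem::size_of::<usize>();
    pub const NAME: &str = "generic";
}

fn main() {
    println!("backend = {}, group width = {} bytes", imp::NAME, imp::WIDTH);
}
```

Because the conditions are mutually exclusive, exactly one imp module exists on any target, which is what lets the rest of the crate use imp::Group unconditionally.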
3 changes: 1 addition & 2 deletions src/raw/neon.rs → src/control/group/neon.rs
@@ -1,5 +1,4 @@
-use super::bitmask::BitMask;
-use super::Tag;
+use super::super::{BitMask, Tag};
 use core::arch::aarch64 as neon;
 use core::mem;
 use core::num::NonZeroU64;
3 changes: 1 addition & 2 deletions src/raw/sse2.rs → src/control/group/sse2.rs
@@ -1,5 +1,4 @@
-use super::bitmask::BitMask;
-use super::Tag;
+use super::super::{BitMask, Tag};
 use core::mem;
 use core::num::NonZeroU16;

10 changes: 10 additions & 0 deletions src/control/mod.rs
@@ -0,0 +1,10 @@
mod bitmask;
mod group;
mod tag;

use self::bitmask::BitMask;
pub(crate) use self::{
bitmask::BitMaskIter,
group::Group,
tag::{Tag, TagSliceExt},
};
81 changes: 81 additions & 0 deletions src/control/tag.rs
@@ -0,0 +1,81 @@
use core::{fmt, mem};

/// Single tag in a control group.
#[derive(Copy, Clone, PartialEq, Eq)]
#[repr(transparent)]
pub(crate) struct Tag(pub(super) u8);
impl Tag {
/// Control tag value for an empty bucket.
pub(crate) const EMPTY: Tag = Tag(0b1111_1111);

/// Control tag value for a deleted bucket.
pub(crate) const DELETED: Tag = Tag(0b1000_0000);

/// Checks whether a control tag represents a full bucket (top bit is clear).
#[inline]
pub(crate) const fn is_full(self) -> bool {
self.0 & 0x80 == 0
}

/// Checks whether a control tag represents a special value (top bit is set).
#[inline]
pub(crate) const fn is_special(self) -> bool {
self.0 & 0x80 != 0
}

/// Checks whether a special control value is EMPTY (just check 1 bit).
#[inline]
pub(crate) const fn special_is_empty(self) -> bool {
debug_assert!(self.is_special());
self.0 & 0x01 != 0
}

/// Creates a control tag representing a full bucket with the given hash.
#[inline]
#[allow(clippy::cast_possible_truncation)]
pub(crate) const fn full(hash: u64) -> Tag {
// Constant for function that grabs the top 7 bits of the hash.
const MIN_HASH_LEN: usize = if mem::size_of::<usize>() < mem::size_of::<u64>() {
mem::size_of::<usize>()
} else {
mem::size_of::<u64>()
};

// Grab the top 7 bits of the hash. While the hash is normally a full 64-bit
// value, some hash functions (such as FxHash) produce a usize result
// instead, which means that the top 32 bits are 0 on 32-bit platforms.
// So we use MIN_HASH_LEN constant to handle this.
let top7 = hash >> (MIN_HASH_LEN * 8 - 7);
Tag((top7 & 0x7f) as u8) // truncation
}
}
impl fmt::Debug for Tag {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if self.is_special() {
if self.special_is_empty() {
f.pad("EMPTY")
} else {
f.pad("DELETED")
}
} else {
f.debug_tuple("full").field(&(self.0 & 0x7F)).finish()
}
}
}
Comment on lines +52 to +64
clarfonthey (Contributor Author), Oct 6, 2024:

I added this to replace the derived Debug impl so that tags are easier to read while I debug the code I'm going to add in a future PR. If we care that much about the extra compile time of this one Debug impl, we can gate it behind cfg(debug_assertions) later.
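To show what the custom impl buys, the sketch below copies the tag encoding and the Debug impl from tag.rs into a standalone program. The derived impl would print EMPTY as Tag(255); this one prints EMPTY, DELETED, or full(n) with the 7-bit hash. Because the special values go through f.pad, width and alignment flags such as {:>7?} are honoured as well.

```rust
use core::{fmt, mem};

// Standalone copy of the Tag encoding from src/control/tag.rs, for illustration.
#[derive(Copy, Clone, PartialEq, Eq)]
#[repr(transparent)]
struct Tag(u8);

impl Tag {
    const EMPTY: Tag = Tag(0b1111_1111);
    const DELETED: Tag = Tag(0b1000_0000);

    // Special values (EMPTY, DELETED) have the top bit set.
    const fn is_special(self) -> bool {
        self.0 & 0x80 != 0
    }

    // Among special values, only EMPTY has the lowest bit set.
    const fn special_is_empty(self) -> bool {
        self.0 & 0x01 != 0
    }

    // Keeps the top 7 bits of the hash (of the low usize-sized part on 32-bit targets).
    const fn full(hash: u64) -> Tag {
        const MIN_HASH_LEN: usize = if mem::size_of::<usize>() < mem::size_of::<u64>() {
            mem::size_of::<usize>()
        } else {
            mem::size_of::<u64>()
        };
        let top7 = hash >> (MIN_HASH_LEN * 8 - 7);
        Tag((top7 & 0x7f) as u8)
    }
}

impl fmt::Debug for Tag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.is_special() {
            if self.special_is_empty() {
                f.pad("EMPTY")
            } else {
                f.pad("DELETED")
            }
        } else {
            f.debug_tuple("full").field(&(self.0 & 0x7F)).finish()
        }
    }
}

fn main() {
    // prints: EMPTY DELETED full(127) full(0)
    println!(
        "{:?} {:?} {:?} {:?}",
        Tag::EMPTY,
        Tag::DELETED,
        Tag::full(u64::MAX),
        Tag::full(0)
    );
}
```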


/// Extension trait for slices of tags.
pub(crate) trait TagSliceExt {
/// Fills the control with the given tag.
fn fill_tag(&mut self, tag: Tag);

/// Clears out the control.
fn fill_empty(&mut self) {
self.fill_tag(Tag::EMPTY)
}
}
impl TagSliceExt for [Tag] {
fn fill_tag(&mut self, tag: Tag) {
// SAFETY: We have access to the entire slice, so, we can write to the entire slice.
unsafe { self.as_mut_ptr().write_bytes(tag.0, self.len()) }
}
}
Comment on lines +66 to +81
clarfonthey (Contributor Author), Oct 6, 2024:

This is new code.

TagSliceExt was mostly an impulse decision so that we can write slice.fill_empty() instead of Tag::EMPTY.fill(slice); since it's purely an internal API, it isn't a big deal. The most important part is that it lets us keep the inner byte of a Tag private from the raw module.

We can't simply call slice.fill(Tag::EMPTY) and have it optimised into a memset on its own, which is why this exists.
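A minimal sketch of the arrangement described in this comment, assuming a binary crate in which control and raw are top-level modules; raw::reset is a hypothetical stand-in for call sites such as clear_no_drop. The tag byte is pub(super), so only code inside control can touch it, and raw clears control bytes through TagSliceExt instead.

```rust
mod control {
    mod tag {
        #[derive(Copy, Clone, PartialEq, Eq, Debug)]
        #[repr(transparent)]
        pub(crate) struct Tag(pub(super) u8); // byte visible only inside `control`

        impl Tag {
            pub(crate) const EMPTY: Tag = Tag(0b1111_1111);
            pub(crate) const DELETED: Tag = Tag(0b1000_0000);
        }

        pub(crate) trait TagSliceExt {
            fn fill_tag(&mut self, tag: Tag);

            // Default method: implementors only provide `fill_tag`.
            fn fill_empty(&mut self) {
                self.fill_tag(Tag::EMPTY)
            }
        }

        impl TagSliceExt for [Tag] {
            fn fill_tag(&mut self, tag: Tag) {
                // SAFETY: `Tag` is `repr(transparent)` over `u8`, so any byte is a valid
                // `Tag`, and `self.len()` elements cover exactly the borrowed slice.
                unsafe { self.as_mut_ptr().write_bytes(tag.0, self.len()) }
            }
        }
    }

    pub(crate) use self::tag::{Tag, TagSliceExt};
}

mod raw {
    use crate::control::{Tag, TagSliceExt};

    // Hypothetical helper standing in for `clear_no_drop`.
    pub(crate) fn reset(ctrl: &mut [Tag]) {
        // Writing `ctrl[0].0` here would not compile: the field is private to `control`.
        ctrl.fill_empty();
    }
}

fn main() {
    use crate::control::Tag;

    let mut ctrl = [Tag::DELETED; 20];
    raw::reset(&mut ctrl);
    assert!(ctrl.iter().all(|&t| t == Tag::EMPTY));
}
```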

2 changes: 2 additions & 0 deletions src/lib.rs
@@ -61,7 +61,9 @@ doc_comment::doctest!("../README.md");
 #[macro_use]
 mod macros;

+mod control;
 mod raw;
+mod util;

 mod external_trait_impls;
 mod map;
130 changes: 15 additions & 115 deletions src/raw/mod.rs
@@ -1,68 +1,19 @@
 use crate::alloc::alloc::{handle_alloc_error, Layout};
+use crate::control::{BitMaskIter, Group, Tag, TagSliceExt};
 use crate::scopeguard::{guard, ScopeGuard};
+use crate::util::{invalid_mut, likely, unlikely};
 use crate::TryReserveError;
 use core::array;
 use core::iter::FusedIterator;
 use core::marker::PhantomData;
 use core::mem;
 use core::ptr::NonNull;
+use core::slice;
 use core::{hint, ptr};

-cfg_if! {
-// Use the SSE2 implementation if possible: it allows us to scan 16 buckets
-// at once instead of 8. We don't bother with AVX since it would require
-// runtime dispatch and wouldn't gain us much anyways: the probability of
-// finding a match drops off drastically after the first few buckets.
-//
-// I attempted an implementation on ARM using NEON instructions, but it
-// turns out that most NEON instructions have multi-cycle latency, which in
-// the end outweighs any gains over the generic implementation.
-if #[cfg(all(
-target_feature = "sse2",
-any(target_arch = "x86", target_arch = "x86_64"),
-not(miri),
-))] {
-mod sse2;
-use sse2 as imp;
-} else if #[cfg(all(
-target_arch = "aarch64",
-target_feature = "neon",
-// NEON intrinsics are currently broken on big-endian targets.
-// See https://github.com/rust-lang/stdarch/issues/1484.
-target_endian = "little",
-not(miri),
-))] {
-mod neon;
-use neon as imp;
-} else {
-mod generic;
-use generic as imp;
-}
-}
-
 mod alloc;
 pub(crate) use self::alloc::{do_alloc, Allocator, Global};

-mod bitmask;
-
-use self::bitmask::BitMaskIter;
-use self::imp::Group;
-
-// Branch prediction hint. This is currently only available on nightly but it
-// consistently improves performance by 10-15%.
-#[cfg(not(feature = "nightly"))]
-use core::convert::{identity as likely, identity as unlikely};
-#[cfg(feature = "nightly")]
-use core::intrinsics::{likely, unlikely};
-
-// FIXME: use strict provenance functions once they are stable.
-// Implement it with a transmute for now.
-#[inline(always)]
-#[allow(clippy::useless_transmute)] // clippy is wrong, cast and transmute are different here
-fn invalid_mut<T>(addr: usize) -> *mut T {
-unsafe { core::mem::transmute(addr) }
-}
-
 #[inline]
 unsafe fn offset_from<T>(to: *const T, from: *const T) -> usize {
 to.offset_from(from) as usize
@@ -102,56 +53,6 @@ trait SizedTypeProperties: Sized {

 impl<T> SizedTypeProperties for T {}

-/// Single tag in a control group.
-#[derive(Copy, Clone, PartialEq, Eq, Debug)]
-#[repr(transparent)]
-struct Tag(u8);
-impl Tag {
-/// Control tag value for an empty bucket.
-const EMPTY: Tag = Tag(0b1111_1111);
-
-/// Control tag value for a deleted bucket.
-const DELETED: Tag = Tag(0b1000_0000);
-
-/// Checks whether a control tag represents a full bucket (top bit is clear).
-#[inline]
-const fn is_full(self) -> bool {
-self.0 & 0x80 == 0
-}
-
-/// Checks whether a control tag represents a special value (top bit is set).
-#[inline]
-const fn is_special(self) -> bool {
-self.0 & 0x80 != 0
-}
-
-/// Checks whether a special control value is EMPTY (just check 1 bit).
-#[inline]
-const fn special_is_empty(self) -> bool {
-debug_assert!(self.is_special());
-self.0 & 0x01 != 0
-}
-
-/// Creates a control tag representing a full bucket with the given hash.
-#[inline]
-#[allow(clippy::cast_possible_truncation)]
-const fn full(hash: u64) -> Tag {
-// Constant for function that grabs the top 7 bits of the hash.
-const MIN_HASH_LEN: usize = if mem::size_of::<usize>() < mem::size_of::<u64>() {
-mem::size_of::<usize>()
-} else {
-mem::size_of::<u64>()
-};
-
-// Grab the top 7 bits of the hash. While the hash is normally a full 64-bit
-// value, some hash functions (such as FxHash) produce a usize result
-// instead, which means that the top 32 bits are 0 on 32-bit platforms.
-// So we use MIN_HASH_LEN constant to handle this.
-let top7 = hash >> (MIN_HASH_LEN * 8 - 7);
-Tag((top7 & 0x7f) as u8) // truncation
-}
-}
-
 /// Primary hash function, used to select the initial bucket to probe from.
 #[inline]
 #[allow(clippy::cast_possible_truncation)]
@@ -1577,13 +1478,12 @@ impl RawTableInner {
 let buckets =
 capacity_to_buckets(capacity).ok_or_else(|| fallibility.capacity_overflow())?;

-let result = Self::new_uninitialized(alloc, table_layout, buckets, fallibility)?;
+let mut result =
+Self::new_uninitialized(alloc, table_layout, buckets, fallibility)?;
 // SAFETY: We checked that the table is allocated and therefore the table already has
 // `self.bucket_mask + 1 + Group::WIDTH` number of control bytes (see TableLayout::calculate_layout_for)
 // so writing `self.num_ctrl_bytes() == bucket_mask + 1 + Group::WIDTH` bytes is safe.
-result
-.ctrl(0)
-.write_bytes(Tag::EMPTY.0, result.num_ctrl_bytes());
+result.ctrl_slice().fill_empty();

 Ok(result)
 }
@@ -2576,6 +2476,12 @@ impl RawTableInner {
 self.ctrl.as_ptr().add(index).cast()
 }

+/// Gets the slice of all control bytes.
+fn ctrl_slice(&mut self) -> &mut [Tag] {
+// SAFETY: We've initialized all control bytes, and have the correct number.
+unsafe { slice::from_raw_parts_mut(self.ctrl.as_ptr().cast(), self.num_ctrl_bytes()) }
+}
+
 #[inline]
 fn buckets(&self) -> usize {
 self.bucket_mask + 1
@@ -3111,10 +3017,7 @@ impl RawTableInner {
 #[inline]
 fn clear_no_drop(&mut self) {
 if !self.is_empty_singleton() {
-unsafe {
-self.ctrl(0)
-.write_bytes(Tag::EMPTY.0, self.num_ctrl_bytes());
-}
+self.ctrl_slice().fill_empty();
 }
 self.items = 0;
 self.growth_left = bucket_mask_to_capacity(self.bucket_mask);
@@ -4292,7 +4195,7 @@ mod test_map {
 unsafe {
 // SAFETY: The `buckets` is power of two and we're not
 // trying to actually use the returned RawTable.
-let table =
+let mut table =
 RawTable::<(u64, Vec<i32>)>::new_uninitialized(Global, 8, Fallibility::Infallible)
 .unwrap();

@@ -4301,10 +4204,7 @@ mod test_map {
 // SAFETY: We checked that the table is allocated and therefore the table already has
 // `self.bucket_mask + 1 + Group::WIDTH` number of control bytes (see TableLayout::calculate_layout_for)
 // so writing `table.table.num_ctrl_bytes() == bucket_mask + 1 + Group::WIDTH` bytes is safe.
-table
-.table
-.ctrl(0)
-.write_bytes(Tag::EMPTY.0, table.table.num_ctrl_bytes());
+table.table.ctrl_slice().fill_empty();

 // SAFETY: table.capacity() is guaranteed to be smaller than table.buckets()
 table.table.ctrl(0).write_bytes(0, table.capacity());
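The hunks above replace several raw ctrl(0).write_bytes(...) calls with ctrl_slice().fill_empty(). A toy model of that refactor follows, assuming a made-up ToyInner type that owns a boxed byte array, a fixed GROUP_WIDTH of 16, and plain u8 control bytes with slice::fill standing in for Tag and fill_empty. The point it illustrates is that the unsafe length argument moves from each clearing call site into the single ctrl_slice accessor.

```rust
use std::ptr::{self, NonNull};
use std::slice;

// Assumed values for illustration only.
const GROUP_WIDTH: usize = 16;
const EMPTY: u8 = 0b1111_1111;

// Hypothetical, cut-down stand-in for RawTableInner.
struct ToyInner {
    ctrl: NonNull<u8>,
    bucket_mask: usize,
}

impl ToyInner {
    // Allocates `buckets + GROUP_WIDTH` zeroed control bytes.
    fn new(buckets: usize) -> Self {
        assert!(buckets.is_power_of_two());
        let bytes = vec![0u8; buckets + GROUP_WIDTH].into_boxed_slice();
        let ctrl = NonNull::new(Box::into_raw(bytes) as *mut u8).unwrap();
        ToyInner { ctrl, bucket_mask: buckets - 1 }
    }

    fn num_ctrl_bytes(&self) -> usize {
        self.bucket_mask + 1 + GROUP_WIDTH
    }

    // Before: each call site repeats an unsafe raw write and its SAFETY argument.
    fn clear_with_write_bytes(&mut self) {
        // SAFETY: `new` allocated exactly `num_ctrl_bytes()` bytes at `ctrl`.
        unsafe { self.ctrl.as_ptr().write_bytes(EMPTY, self.num_ctrl_bytes()) }
    }

    // After: the length invariant is argued once, here; callers stay safe.
    fn ctrl_slice(&mut self) -> &mut [u8] {
        // SAFETY: `ctrl` points to `num_ctrl_bytes()` initialized bytes owned by
        // `self`, and `&mut self` gives exclusive access for the returned lifetime.
        unsafe { slice::from_raw_parts_mut(self.ctrl.as_ptr(), self.num_ctrl_bytes()) }
    }
}

impl Drop for ToyInner {
    fn drop(&mut self) {
        let len = self.num_ctrl_bytes();
        // SAFETY: rebuilds the Box<[u8]> leaked in `new`, with the same pointer and length.
        unsafe { drop(Box::from_raw(ptr::slice_from_raw_parts_mut(self.ctrl.as_ptr(), len))) }
    }
}

fn main() {
    let mut table = ToyInner::new(8);

    table.ctrl_slice().fill(0x80); // mark every byte as DELETED
    table.clear_with_write_bytes();
    assert!(table.ctrl_slice().iter().all(|&b| b == EMPTY));

    table.ctrl_slice().fill(0x80);
    table.ctrl_slice().fill(EMPTY); // safe equivalent of the raw write above
    assert!(table.ctrl_slice().iter().all(|&b| b == EMPTY));
}
```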
14 changes: 14 additions & 0 deletions src/util.rs
@@ -0,0 +1,14 @@
// FIXME: Branch prediction hint. This is currently only available on nightly
Review comment (Contributor Author):

I wasn't feeling particularly creative with the name. I'll need these helpers in the control module too in the future, so I moved them into their own module.

Maybe nightly would be a better name, but I don't think it matters much.

// but it consistently improves performance by 10-15%.
#[cfg(not(feature = "nightly"))]
pub(crate) use core::convert::{identity as likely, identity as unlikely};
#[cfg(feature = "nightly")]
pub(crate) use core::intrinsics::{likely, unlikely};

// FIXME: use strict provenance functions once they are stable.
// Implement it with a transmute for now.
#[inline(always)]
#[allow(clippy::useless_transmute)] // clippy is wrong, cast and transmute are different here
pub(crate) fn invalid_mut<T>(addr: usize) -> *mut T {
unsafe { core::mem::transmute(addr) }
}
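A minimal sketch of how the helpers moved into util.rs behave on a stable toolchain, assuming the nightly feature is off. find_byte is a hypothetical caller added only for illustration; the hint functions return their argument unchanged, so they mark intent at a branch without altering its result.

```rust
// Stable fallback (no "nightly" feature): both hints are core::convert::identity,
// so they return their argument unchanged and only mark intent at call sites.
use core::convert::{identity as likely, identity as unlikely};

// Copied from the diff: builds a pointer with the given address via transmute,
// as a stand-in until strict provenance functions can be used.
#[inline(always)]
#[allow(clippy::useless_transmute)]
fn invalid_mut<T>(addr: usize) -> *mut T {
    unsafe { core::mem::transmute(addr) }
}

// Hypothetical caller: a hit is expected to be rare, so the branch is marked unlikely.
fn find_byte(bytes: &[u8], needle: u8) -> Option<usize> {
    for (i, &b) in bytes.iter().enumerate() {
        if unlikely(b == needle) {
            return Some(i);
        }
    }
    None
}

fn main() {
    let p: *mut u64 = invalid_mut(8);
    assert_eq!(p as usize, 8);
    assert!(likely(true));
    assert_eq!(find_byte(b"abc", b'c'), Some(2));
}
```

On Rust 1.84 and later, core::ptr::without_provenance_mut is a stable strict-provenance function that performs the same address-only pointer construction.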