-
-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shared memory hash table #1379
base: develop
Are you sure you want to change the base?
Shared memory hash table #1379
Conversation
|
@thomcc Hi Thom, I'm getting a segfault here and the backtrace (comment above) is showing that the linker picked up the wrong To repro:
Thank you! |
not this shit again. |
Yeah I also lied about this being infinitely large...still hitting |
we got #1351 over that, it's weird. |
Please feel free to make PgSpinLock::new a |
Will do! |
Yes, I should finish this... |
I just kicked all the open PRs for CI reasons, don't worry~ |
Looks like |
Whoops.
|
Ah forgot a |
Look at that build queue! $$$ |
Now I feel bad! Maybe we could have just one job sniff testing basics before launching the whole test suite? |
pgrx/src/shmem_hash.rs
Outdated
|
||
#[derive(Debug, Eq, PartialEq)] | ||
#[non_exhaustive] | ||
pub enum Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A public error type shall be given a name that does not collide with the name of the Error trait.
pgrx/src/shmem_hash.rs
Outdated
/// A shared memory HashMap using Postgres' `HTAB`. | ||
/// This HashMap is used for `pg_stat_statements` and Postgres | ||
/// internals to store key/value pairs in shared memory. | ||
pub struct PgHashMap<K: Copy + Clone, V: Copy + Clone> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type for new APIs shall not be prefixed with "Pg". It's Postgres, everything's Pg everywhere, all these types are namespaced via pgrx::
, and this style is inconsistently used at best (Array vs. PgHeapTuple, for instance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ShmemHashMap would be fine.
pgrx/src/shmem_hash.rs
Outdated
impl<K: Copy + Clone, V: Copy + Clone> PgHashMap<K, V> { | ||
/// Create new `PgHashMap`. This still needs to be allocated with | ||
/// `pg_shmem_init!` just like any other shared memory structure. | ||
pub const fn new(size: u64) -> PgHashMap<K, V> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's common for ::new()
to be semantically equivalent to ::default()
, so the size parameter's purpose should be documented rather than left implicit, if we require it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should also be made clear why it's a u64 instead of the more common usize. Yes, the difference "doesn't matter", but the difference between those two types informs what purpose it serves.
pgrx/src/spinlock.rs
Outdated
@@ -28,6 +28,7 @@ use std::{cell::UnsafeCell, marker::PhantomData}; | |||
/// [`storage/spin.h`]: | |||
/// https://github.com/postgres/postgres/blob/1f0c4fa255253d223447c2383ad2b384a6f05854/src/include/storage/spin.h | |||
#[doc(alias = "slock_t")] | |||
#[derive(Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will just say
PgSpinLock { item: UnsafeCell { .. }, lock: UnsafeCell { .. } }
Please impl Debug explicitly with something more concise.
I certainly would not reject PRs improving our test suite's economy of CPU cycles! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I realize reviewing twice on entirely different parts of the PR like this may be slightly unusual(?), and I appreciate you bearing with me here. My first pass often is about smaller nits because once they are addressed, it's easier to focus on the finer details.
/// * `size` - Maximum number of elements in the HashMap. This is allocated | ||
/// at server start and cannot be changed. `i64` is the expected type | ||
/// for `pg_sys::ShmemInitHash`, so we don't attempt runtime conversions | ||
/// unnecessarily. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...okay, "you only can allocate this once ever" is pretty coherent as to why we need it, yeah. Though I believe a runtime conversion that happens once ever is... fine?
/// Get the number of elements in the HashMap. | ||
pub fn len(&self) -> u64 { | ||
pub fn len(&self) -> i64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me, this needs capacity(&self)
as well.
/// Remove the value from the `ShmemHashMap` and return it. | ||
/// If the key doesn't exist, return None. | ||
pub fn remove(&self, key: K) -> Option<V> { | ||
if let Some(value) = self.get(key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably use let-else, like so:
if let Some(value) = self.get(key) { | |
let Some(value) = self.get(key) else { return None }; |
} else { | ||
let value_ptr = value_ptr!(entry); | ||
let value = unsafe { std::ptr::read(value_ptr) }; | ||
return Some(value.value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Some(value.value); | |
Some(value.value) |
}; | ||
|
||
if entry.is_null() { | ||
return None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return None; | |
None |
return Some(value); | ||
} else { | ||
return None; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if you do use let-else this should probably be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this, but then it will want you to hit it with rustfmt.
return Some(value); | |
} else { | |
return None; | |
} | |
Some(value) |
let value_ptr = value_ptr!(entry); | ||
let value = unsafe { std::ptr::read(value_ptr) }; | ||
return Some(value.value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this returns V
and not &V
? Because we cannot rely on concurrent changes not altering the data in the K/V pair? It would be good to understand the soundness conditions better here and why things look the way they do.
/// A shared memory HashMap using Postgres' `HTAB`. | ||
/// This HashMap is used for `pg_stat_statements` and Postgres | ||
/// internals to store key/value pairs in shared memory. | ||
pub struct ShmemHashMap<K: Copy + Clone, V: Copy + Clone> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please weaken the bounds for the struct and instead keep them on the implementation. It should be made clear why certain functions require the bounds they do in order to be sound, instead of plastering the bounds across the entire API. Aside from leaking implementation details, it makes it unclear where the soundness requirements actually start, which means later refactoring that attempts to weaken the bounds can be made unsound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it's fine if the bounds are overeagerly applied in cases where it's not clear, but e.g. I have done a lot of this refactoring on Array<'_, T>
and List<T>
recently, and in general I tried to pull the scope tighter.
let htab_ptr = unsafe { | ||
pg_sys::ShmemInitHash( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preconditions, postconditions, "why is this sound?", etc.
/// HTAB protected by a SpinLock. | ||
htab: OnceCell<PgSpinLock<ShmemHashMapInner>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the use of OnceCell remain correct in the event of this being fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine if the answer is "no", I just want to check to make sure if this should get a note relating it back to that issue.
@@ -11,22 +11,35 @@ use pgrx::prelude::*; | |||
use pgrx::{pg_shmem_init, PgAtomic, PgLwLock, PgSharedMemoryInitialization}; | |||
use std::sync::atomic::AtomicBool; | |||
|
|||
#[cfg(feature = "cshim")] | |||
use pgrx::PgHashMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use pgrx::PgHashMap; | |
use pgrx::ShmemHashMap; |
static ATOMIC: PgAtomic<AtomicBool> = PgAtomic::new(); | ||
static LWLOCK: PgLwLock<bool> = PgLwLock::new(); | ||
|
||
#[cfg(feature = "cshim")] | ||
static HASH_MAP: PgHashMap<i64, i64> = PgHashMap::new(500); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static HASH_MAP: PgHashMap<i64, i64> = PgHashMap::new(500); | |
static HASH_MAP: ShmemHashMap<i64, i64> = ShmemHashMap::new(500); |
Shared memory hash table. Implementation based (pretty much copied from) on
pg_stat_statements
. Subject to the same shared memory limitations as the rest of shared memory data structures, but still cool to have a native Postgres HashMap implementation in shared memory.