-
-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
External binary representation support (SEND/RECEIVE for custom types) #887
base: develop
Are you sure you want to change the base?
External binary representation support (SEND/RECEIVE for custom types) #887
Conversation
0564fc4
to
0b5a248
Compare
For this reason, Postgres allows types to have an external binary representation. Also, some clients insist on using binary representation. Solution: introduce SendRecvFuncs trait and `sendrecvfuncs` attribute These are used to specify how external binary representation encoding is accomplished.
0b5a248
to
91d0d68
Compare
Several traits all reusing the same pattern suggests we should be attempting to create a more generic interface instead of copypasta. |
I am not against having a "more generic" interface, but since we don't have any specific proposal for one, can we work on getting this one since it's of very practical use as is? Even if under a feature gate if so desired. I've spent a good amount of time getting it done following the current approach found in the library (in/out type of traits), and it feels that saying "we should now find a common interface" is an unfair treatment of the PR. So I would propose to get it into a mergeable shape first and get it in. We can work on generalizing interfaces for these things as a follow-up. What do you think? |
My concern is not necessarily a blocking concern, just something of note. |
As discussed in private, I understand the concern of
You suggested that perhaps in most cases users will not need different serialization (between varlena and send/receive). While I tend to agree, I don't think this should mean we should not make it possible to specify a send/receive-specific implementation. I feel like this can be a very productive discussion that can help us design the next iteration of API and make this part of pgx much smoother. I'd love to be a part of the conversation. My only concern is making external binary format (as a feature) depend on the timeline of this discussion. I suggest getting it done first (with the caveat of some potentially changing APIs; but we're pre 1.0 so we can afford some refining) and then, once there's a good consensus, transition to a better design. |
Yes
I think it probably should be. We're at the point where a better approach is necessary. I've been thinking about this over the holiday and I'd like to take a stab at making all of this better. I have some ideas, but don't quite yet have the time to focus on it. |
If you don't have time for this, perhaps it's a case of "better is the enemy of good," and we should consider something like this PR to provide the functionality first, and improve it later? |
I don't quite have the time yet. I will soon and I'm not all that excited about adding some code to paper over design flaws now, just to remove it all in say, a week. |
In practice, almost all implementations and users of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are actually two changesets here, and we can accept one if it takes a slightly different shape.
/// Reads a range of bytes, modifying the underlying cursor to reflect what was read | ||
/// | ||
/// Returns None if the underlying remaining binary is smaller than requested with the range. | ||
/// | ||
/// Ranges can start from an offset, resulting in skipped information. | ||
/// | ||
/// Most common use-case for this is reading the underlying data in full (`read(..)`) | ||
pub fn read<R: RangeBounds<usize>>(&mut self, range: R) -> Option<&[u8]> { | ||
use std::ffi::c_int; | ||
let remaining = unsafe { (*self.sid).len - (*self.sid).cursor } as usize; | ||
let start = match range.start_bound() { | ||
Bound::Included(bound) => *bound, | ||
Bound::Excluded(bound) => *bound + 1, | ||
Bound::Unbounded => 0, | ||
}; | ||
let end = match range.end_bound() { | ||
Bound::Included(bound) => *bound, | ||
Bound::Excluded(bound) => *bound - 1, | ||
Bound::Unbounded => remaining, | ||
}; | ||
let total = end - start; | ||
|
||
if total > remaining { | ||
return None; | ||
} | ||
|
||
// safe: self.sid will never be null | ||
Some(unsafe { | ||
if (*self.sid).data.is_null() { | ||
&[] | ||
} else { | ||
(*self.sid).cursor += start as c_int; | ||
let result = std::slice::from_raw_parts( | ||
(*self.sid).data.add((*self.sid).cursor as usize) as *const u8, | ||
total, | ||
); | ||
(*self.sid).cursor += total as c_int; | ||
result | ||
} | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably just impl std::io::Read
using similar underlying code, which can be extracted into another PR. str
implements a similar .get(..)
but it doesn't modify cursors. I wouldn't mind seeing this function with a different name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can totally do this, but I think it would make most sense to do this after #903
/* | ||
Portions Copyright 2019-2021 ZomboDB, LLC. | ||
Portions Copyright 2021-2022 Technology Concepts & Design, Inc. <[email protected]> | ||
|
||
All rights reserved. | ||
|
||
Use of this source code is governed by the MIT license that can be found in the LICENSE file. | ||
*/ | ||
|
||
#[cfg(any(test, feature = "pg_test"))] | ||
#[pgx::pg_schema] | ||
mod tests { | ||
#[allow(unused_imports)] | ||
use crate as pgx_tests; | ||
|
||
use pgx::*; | ||
|
||
#[pg_test] | ||
fn test_string_info_read_full() { | ||
let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]); | ||
assert_eq!(string_info.read(..), Some(&[1, 2, 3, 4, 5][..])); | ||
assert_eq!(string_info.read(..), Some(&[][..])); | ||
assert_eq!(string_info.read(..=1), None); | ||
} | ||
|
||
#[pg_test] | ||
fn test_string_info_read_offset() { | ||
let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]); | ||
assert_eq!(string_info.read(1..), Some(&[2, 3, 4, 5][..])); | ||
assert_eq!(string_info.read(..), Some(&[][..])); | ||
} | ||
|
||
#[pg_test] | ||
fn test_string_info_read_cap() { | ||
let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]); | ||
assert_eq!(string_info.read(..=1), Some(&[1][..])); | ||
assert_eq!(string_info.read(1..=2), Some(&[3][..])); | ||
assert_eq!(string_info.read(..), Some(&[4, 5][..])); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests, of course, can also go in to the other PR.
This is not convenient, especially because it requires a forked version of `cargo-pgx` Solution: backport pgcentralfoundation/pgrx#887 into pg_crdt pgcentralfoundation/pgrx#887 has stalled and it is unknown when and if it'll make it into the mainline. The approach may change.
Any update on getting this merged for #1364? |
Problem: text type representation is not always efficient
For this reason, Postgres allows types to have an external binary representation. Also, some clients insist on using binary representation.
Solution: introduce SendRecvFuncs trait and
sendrecvfuncs
attributeThese are used to specify how external binary representation encoding is accomplished.