Feat/rpc endpoints to fetch data from key #4997
base: develop
Conversation
```rust
fn path_regex(&self) -> Regex {
    Regex::new(&format!(
        r"^/v2/clarity_marf_value/(?P<clarity_marf_key>(vm-epoch::epoch-version)|({})|({}))$",
```
I strongly disagree with using the MARF key as input. To do so would require an explanation to an RPC API consumer of how to construct a Clarity MARF key, which essentially requires them to read the majority of SIP-005. In addition, this would effectively require the RPC endpoint to determine whether or not the key is well-formed, since passing an ill-formed key would require a different HTTP error code than HTTP 404 (which is not something I think you want to waste your time doing).
Instead, you should do the following:
- Accept the hash of the key as input. Then, you don't have to worry about verifying that it is well-formed, and you don't have to worry about explaining to the RPC API consumer how to construct it. Furthermore, it's much safer and easier to determine if a string is a well-formed hash instead of a well-formed Clarity MARF key.
- Update the Clarity DB to take the hash as an argument, instead of the key (in support of the above).
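The hash-as-input approach could be sketched as follows. This is a hypothetical helper, not code from the PR: it only shows that checking a path segment is a well-formed 32-byte hash (64 hex characters) is trivial, whereas deciding whether an arbitrary Clarity MARF key string is well-formed would require encoding much of SIP-005.

```rust
// Hypothetical helper (not from the PR): a well-formed 32-byte key hash
// is exactly 64 hex characters. Anything else gets HTTP 400; a valid
// hash that maps to nothing gets HTTP 404.
pub fn is_valid_key_hash(segment: &str) -> bool {
    segment.len() == 64 && segment.bytes().all(|b| b.is_ascii_hexdigit())
}
```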
Also, I dislike the path you chose, since it doesn't leave room for semantic expansion. Instead, can you use the following: `/v2/clarity/marf/{key-hash}`, where `key-hash` is the hash of the SIP-005 key.
In my use case (of Clarinet consuming this API), I already know the key (but not the hash). But I agree that it's not ideal to have the key as a user input.
Can you guide me to some code where I could see how to do the key <> key_hash conversion?
Update the Clarity DB to take the hash as an argument, instead of the key (in support of the above).
Not sure to follow here, where should I perform this change?
I think I just lack knowledge about this repo to fully address this comment, thanks for providing extra context 🙏
Sure -- the (string) key's bytes are hashed via SHA512/256 to produce the MARF key.
Not sure to follow here, where should I perform this change?
The trait `ClarityBackingStore` needs to be modified to add a method like `get_data_with_proof()`, but which takes an implementation of the `MarfTrieId` trait instead of a `&str` key. There currently isn't a way to query the MARF via the ClarityDB with a bare hash.
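The shape of that trait change can be sketched with stand-in types. Everything below is a simplified, hypothetical mock: the real `ClarityBackingStore` and `TrieHash` live in the clarity and stackslib crates, and the hash function here is a toy (not SHA512/256), used only to show a hash-based lookup coexisting with the key-based one.

```rust
use std::collections::HashMap;

// Stub standing in for the real TrieHash (a 32-byte MARF trie hash).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct TrieHash(pub [u8; 32]);

// Toy FNV-style hash standing in for the real SHA512/256 key hashing.
pub fn toy_hash(key: &str) -> TrieHash {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in key.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    let mut out = [0u8; 32];
    out[..8].copy_from_slice(&h.to_be_bytes());
    TrieHash(out)
}

pub trait ClarityBackingStore {
    /// Existing-style lookup by string key.
    fn get_data(&mut self, key: &str) -> Option<String>;
    /// Proposed lookup by bare hash, so callers (like the RPC endpoint)
    /// never need to construct or validate a SIP-005 key string.
    fn get_data_from_path(&mut self, hash: &TrieHash) -> Option<String>;
}

// In-memory store keyed directly by hash: resolving a value never
// requires seeing the original &str key.
pub struct ToyStore {
    by_hash: HashMap<TrieHash, String>,
}

impl ToyStore {
    pub fn new() -> Self {
        ToyStore { by_hash: HashMap::new() }
    }
    pub fn put(&mut self, key: &str, value: &str) {
        self.by_hash.insert(toy_hash(key), value.to_string());
    }
}

impl ClarityBackingStore for ToyStore {
    fn get_data(&mut self, key: &str) -> Option<String> {
        // The key-based path reduces to hash-then-lookup.
        self.get_data_from_path(&toy_hash(key))
    }
    fn get_data_from_path(&mut self, hash: &TrieHash) -> Option<String> {
        self.by_hash.get(hash).cloned()
    }
}
```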
I strongly disagree with using the MARF key as input. To do so would require an explanation to an RPC API consumer of how to construct a Clarity MARF key, which essentially requires them to read the majority of SIP-005.
Why would getting the hash of this key be any easier for a user of this API? Wouldn't that be even harder? Is there some scenario where a user would already have the hash and then would want to use this endpoint?
Note that adding a new version of `get_data_with_proof` that takes in the `TriePath` would also require some significant changes to the `MemoryBackingStore`, since it currently directly uses the key to access the database.
It's a bit hard to follow, but I have some questions based on my understanding:
"any 32-byte value is a valid MARF key, so HTTP 400 would only be necessary if the value was not a hex-encoded 32-byte hash"
We would return a 404 if the hash encodes an invalid key, right? So the difference for the client seems to be about the same to me. It's semantically slightly more correct to return a 404 for a random 32-byte value than it is for an invalid key, but the result for the client is the same.
"so it's not a very big lift to add a trait method for loading a serialized value by `TrieHash`"
I'm actually struggling to implement it in the MemoryBackingStore (around here), because I don't have access to `TrieHash::from_data` here.
I don't see how to get from the key to the hash.
Maybe it's trivial, but I just don't know the repository well enough.
We would return a 404 if the hash encodes an invalid key right?
If this API endpoint only took a hash, then the only application responses to expect are 200 (with the serialized data) or 404 (if the hash does not map to a value). The question of whether or not the hash corresponds to a semantically-valid but absent key or a semantically-invalid key does not need to be answered by the API endpoint.
It's semantically slightly more correct to return a 404 for a random 32-byte value than it is for an invalid key, but the result for the client is the same.
That's the thing -- I think it's plausible that this makes a significant difference to the client. If the client is an application that loads Clarity data, then the difference between receiving a 400 versus a 404 is that the former requires a bugfix (the client made an invalid key) and would likely result in the user being presented with an error condition to report, whereas the latter indicates that the submitted key is valid but not mapped. Treating both as 404's would lead to a poorer user experience, since users would receive false negatives -- a bug in the application's key construction would be treated the same as the user submitting a query for nonexistent data.
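The distinction being argued for can be made concrete with a small sketch (all names here are hypothetical, not from the PR): an ill-formed key signals a client bug (HTTP 400), while a well-formed key that maps to nothing is an ordinary miss (HTTP 404). Collapsing both into 404 hides the bug from the client.

```rust
// Hypothetical outcome type for the key lookup, and its mapping to
// HTTP status codes as proposed in this review thread.
pub enum LookupOutcome {
    Found(String), // key resolved to serialized data -> 200
    IllFormedKey,  // client constructed a bad key    -> 400 (client bug)
    NotMapped,     // valid key, no value stored       -> 404 (ordinary miss)
}

pub fn status_code(outcome: &LookupOutcome) -> u16 {
    match outcome {
        LookupOutcome::Found(_) => 200,
        LookupOutcome::IllFormedKey => 400,
        LookupOutcome::NotMapped => 404,
    }
}
```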
then the difference between receiving a 400 versus a 404 is that the former requires a bugfix
Ok I get the point. Didn't see it as a big deal at first (especially for my current use case), but that's fair.
I'm happy to add the requisite Clarity DB patches to this PR
That would be super helpful. I'm worried about the time it would take me to get it right, and it would probably lead to more questions / reviews.
I went ahead and added the requisite ClarityDB functionality, and plumbed it through into this RPC endpoint.
```rust
let contract_identifier = self.contract_identifier.take().ok_or(NetError::SendError(
    "`contract_identifier` not set".to_string(),
))?;
let clarity_metadata_key = self.clarity_metadata_key.take().ok_or(NetError::SendError(
```
Clarity metadata keys have a well-defined structure, so the RPC endpoint should return HTTP 400 if the metadata key is ill-formed. You will need to expand this method (or `try_parse_request` -- up to you) to validate the structure of the Clarity metadata key, including determining whether or not the key's `StoreType` and `var_name` values are supported values. `var_name` will, unfortunately, take some effort because the Clarity DB codebase uses bare string literals in its calls to `ClarityDatabase::make_metadata_key()` instead of enums, so your PR should address this as well.
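A structural validation of the metadata key could start like this. This is a hypothetical sketch assuming the `vm-metadata::<StoreType>::<var_name>` layout described in this thread; the checks on which `StoreType`/`var_name` pairings are acceptable would be layered on top of this basic parse, and a returned `Err` would map to HTTP 400.

```rust
// Hypothetical structural validator for a Clarity metadata key of the
// assumed form "vm-metadata::<StoreType>::<var_name>". Semantic checks
// (supported StoreType values, reserved var_name pairings) come after.
pub fn parse_metadata_key(key: &str) -> Result<(u32, &str), &'static str> {
    let mut parts = key.splitn(3, "::");
    match (parts.next(), parts.next(), parts.next()) {
        (Some("vm-metadata"), Some(store_type), Some(var_name)) if !var_name.is_empty() => {
            let store_type: u32 = store_type
                .parse()
                .map_err(|_| "StoreType is not a number")?;
            Ok((store_type, var_name))
        }
        _ => Err("malformed metadata key"),
    }
}
```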
Apparently the metadata `var_name` can be an arbitrary string (here it's a variable_name, a map_name, or a token_name).
Can we really use an enum here?
Yes; however, some reserved `var_name` strings can only be paired with certain `StoreType`s. For example, the use of `StoreType::Contract` would require `var_name` to be `contract-size`, `contract-src`, `contract-data-size`, or `contract`, but nothing else.
One way to implement these constraints could be to implement an enum to capture all of the valid `StoreType` / `var_name` pairings, and permit only the `StoreType` variants which allow arbitrary `var_name` values to have arbitrary `var_name` values.
Force-pushed from dca20bd to 6260c16
…o, add `MARF::get_by_path()` to look up `MARFValue`s by `TrieHash` instead of by `&str` keys, and add the relevant `ClarityBackingStore` implementation to the stackslib's read-only and writeable MARF stores
…arityBackingStore::get_data_from_path`
…key hash, instead of by key
…ithub.com/stacks-network/stacks-blockchain into feat/rpc-endpoints-to-fetch-data-from-key
```rust
lazy_static! {
    static ref CLARITY_NAME_NO_BOUNDARIES_REGEX_STRING: String =
        "[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*|[-+=/*]|[<>]=?".into();
```
This regex is very problematic because it matches strings of unbounded length, which opens a DoS vector on the node.
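One way to close the unbounded-length hole is to replace the Kleene star with an explicit repetition bound. A sketch, where `MAX_CLARITY_NAME_LEN = 128` is an assumed limit for illustration, not the real protocol constant:

```rust
// Assumed length cap for illustration; the real protocol constant for
// Clarity name length may differ.
const MAX_CLARITY_NAME_LEN: usize = 128;

// Same alternation as the original pattern, but with a bounded repetition
// `{0,N}` instead of the unbounded `*`, so the regex can never be forced
// to match an arbitrarily long input.
pub fn bounded_clarity_name_pattern() -> String {
    format!(
        "[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*]){{0,{}}}|[-+=/*]|[<>]=?",
        MAX_CLARITY_NAME_LEN - 1
    )
}
```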
```rust
    static ref CLARITY_NAME_NO_BOUNDARIES_REGEX_STRING: String =
        "[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*|[-+=/*]|[<>]=?".into();
    static ref METADATA_KEY_REGEX_STRING: String = format!(
        r"vm-metadata::\d+::(contract|contract-size|contract-src|contract-data-size|({}))",
```
Same problem here with `\d+`.
Since the `\d+` must be only one of a handful of values, you should just match them explicitly.
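Building that explicit alternation is a one-liner; a sketch, where the discriminant values in the test are illustrative rather than the real `StoreType` set:

```rust
// Build an explicit regex alternation ("0|1|5|9") from the handful of
// valid StoreType discriminants instead of matching `\d+`.
pub fn store_type_alternation(valid: &[u8]) -> String {
    valid
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join("|")
}
```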
I went ahead and made the requisite changes to the MARF. My comments on the getter for Clarity metadata still stand, however. If we're going to take metadata keys as input, we need to validate them and return HTTP 400 if they're semantically invalid.
…nd the same value by get_by_hash(hash(key))
…ithub.com/stacks-network/stacks-blockchain into feat/rpc-endpoints-to-fetch-data-from-key
The reason we *didn't* do that is because that invokes the allocator too much, per perf.
On Sat, Nov 9, 2024, 8:48 AM, Brice Dobry commented on this pull request, in stacks-common/src/types/chainstate.rs (#4997 (comment)):
```rust
let s = format!("{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}",
    self.0[0], self.0[1], self.0[2], self.0[3],
    self.0[4], self.0[5], self.0[6], self.0[7],
    self.0[8], self.0[9], self.0[10], self.0[11],
    self.0[12], self.0[13], self.0[14], self.0[15],
    self.0[16], self.0[17], self.0[18], self.0[19],
    self.0[20], self.0[21], self.0[22], self.0[23],
    self.0[24], self.0[25], self.0[26], self.0[27],
    self.0[28], self.0[29], self.0[30], self.0[31]);
s
```

Suggested alternative:

```rust
pub fn to_string(&self) -> String {
    self.0.iter().map(|byte| format!("{:02x}", byte)).collect()
}
```
Ah, okay. Thanks for explaining.
Description
Add two new endpoints:
/v2/clarity_marf_value/:clarity_marf_key
/v2/clarity_metadata/:principal/:contract_name/:clarity_metadata_key
I'm currently working on a Clarinet feature that allows simulating mainnet execution (or, said differently, forking the mainnet state into the simnet data store). This requires the ability to fetch values from MARF and metadata keys.
These new endpoints are similar to already existing endpoints (such as `getdatavar`, `getmapentry`, `getcontractsrc`, etc.).
Applicable issues
N/A
Additional info (benefits, drawbacks, caveats)
Read more about this feature in the Clarinet issue: hirosystems/clarinet#1503
I confirmed that the 2 new endpoints achieve the desired goal by running a local devnet with these changes and forking the Clarinet simnet state from it.
Checklist
- Updated `docs/rpc/openapi.yaml` and `rpc-endpoints.md` (for v2 endpoints), `event-dispatcher.md` (for new events)