Feat/rpc endpoints to fetch data from key #4997

hugocaillard · 2024-07-23T14:28:43Z

Description

Add two new endpoints:

/v2/clarity_marf_value/:clarity_marf_key
/v2/clarity_metadata/:principal/:contract_name/:clarity_metadata_key

I'm currently working on a Clarinet feature that lets users simulate running against mainnet state (in other words, fork the mainnet state into the simnet data store). This requires the ability to fetch values from MARF and metadata keys.

These new endpoints are similar to existing endpoints (such as getdatavar, getmapentry, getcontractsrc, etc.).

Applicable issues

N/A

Additional info (benefits, drawbacks, caveats)

Read more about this feature in the Clarinet issue: hirosystems/clarinet#1503

I confirmed that the two new endpoints achieve the desired goal by running a local devnet with these changes and forking the Clarinet simnet state from it.

Checklist

Test coverage for new or modified code paths
Changelog is updated
Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)

clarity/src/vm/representations.rs

jcnelson · 2024-07-23T17:58:14Z

stackslib/src/net/api/getclaritymarfvalue.rs

+
+    fn path_regex(&self) -> Regex {
+        Regex::new(&format!(
+            r"^/v2/clarity_marf_value/(?P<clarity_marf_key>(vm-epoch::epoch-version)|({})|({}))$",


I strongly disagree with using the MARF key as input. To do so would require an explanation to an RPC API consumer of how to construct a Clarity MARF key, which essentially requires them to read the majority of SIP-005. In addition, this would effectively require the RPC endpoint to determine whether or not the key is well-formed, since passing an ill-formed key would require a different HTTP error code than HTTP 404 (which is not something I think you want to waste your time doing).

Instead, you should do the following:

Accept the hash of the key as input. Then, you don't have to worry about verifying that it is well-formed, and you don't have to worry about explaining to the RPC API consumer how to construct it. Furthermore, it's much safer and easier to determine if a string is a well-formed hash instead of a well-formed Clarity MARF key.

Update the Clarity DB to take the hash as an argument, instead of the key (in support of the above).

Also, I dislike the path you chose, since it doesn't leave room for semantic expansion. Instead, can you use the following: /v2/clarity/marf/{key-hash}, where key-hash is the hash of the SIP-005 key.

In my use case (of Clarinet consuming this API), I already know the key (but not the hash). But I agree that it's not ideal to have the key as a user input.
Can you guide me to some code where I could see how to get the key <> key_hash conversion?

Update the Clarity DB to take the hash as an argument, instead of the key (in support of the above).

Not sure I follow here. Where should I perform this change?

I think I just lack knowledge about this repo to fully address this comment, thanks for providing extra context 🙏

Sure -- the (string) key's bytes are hashed via SHA512/256 to produce the MARF key.

Not sure I follow here. Where should I perform this change?

The trait ClarityBackingStore needs to be modified to add a method like get_data_with_proof(), but which takes an implementation of the MarfTrieId trait instead of a &str key. There currently isn't a way to query the MARF via the ClarityDB with a bare hash.
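To make the suggestion concrete, here is a minimal sketch of a backing-store trait that supports lookups both by string key and by key hash. Everything in it is an assumption for illustration: `BackingStore`, `MemoryStore`, and `KeyHash` are hypothetical simplified stand-ins (not the real `ClarityBackingStore` or `TrieHash`), the method signatures are invented, and `DefaultHasher` replaces SHA512/256 only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a trie hash. The real node hashes the key's
/// bytes with SHA512/256; DefaultHasher is used here only for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct KeyHash(u64);

impl KeyHash {
    pub fn from_key(key: &str) -> KeyHash {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        KeyHash(hasher.finish())
    }
}

/// Simplified, hypothetical backing-store trait with both lookup styles.
pub trait BackingStore {
    fn put_data(&mut self, key: &str, value: &str);
    /// Lookup by string key.
    fn get_data(&self, key: &str) -> Option<String>;
    /// Lookup by the hash of the key, with no knowledge of the key itself.
    fn get_data_from_path(&self, hash: &KeyHash) -> Option<String>;
}

/// In-memory store that indexes values by key hash at write time, so both
/// lookups resolve to the same entry.
#[derive(Default)]
pub struct MemoryStore {
    by_hash: HashMap<KeyHash, String>,
}

impl BackingStore for MemoryStore {
    fn put_data(&mut self, key: &str, value: &str) {
        self.by_hash.insert(KeyHash::from_key(key), value.to_string());
    }

    fn get_data(&self, key: &str) -> Option<String> {
        // The key-based lookup is expressed in terms of the hash-based one.
        self.get_data_from_path(&KeyHash::from_key(key))
    }

    fn get_data_from_path(&self, hash: &KeyHash) -> Option<String> {
        self.by_hash.get(hash).cloned()
    }
}
```

The design choice in this sketch is to hash on write, so a store that was previously addressed only by string key can also answer hash-based queries.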

I strongly disagree with using the MARF key as input. To do so would require an explanation to an RPC API consumer of how to construct a Clarity MARF key, which essentially requires them to read the majority of SIP-005.

Why would getting the hash of this key be any easier for a user of this API? Wouldn't that be even harder? Is there some scenario where a user would already have the hash and then would want to use this endpoint?

Note that adding a new version of get_data_with_proof that takes in the TriePath would also require some significant changes to the MemoryBackingStore, since it currently directly uses the key to access the database.

It's a bit hard to follow, but I have some questions based on my understanding:

"any 32-byte value is a valid MARF key, so HTTP 400 would only be necessary if the value was not a hex-encoded 32-byte hash"

We would return a 404 if the hash encodes an invalid key right? So the difference for the client seems to be quite the same to me. It's semantically slightly more correct to return a 404 for a random 32-byte value than it is for an invalid key, but the result for the client is the same.

"so it's not a very big lift to add a trait method for loading a serialized value by TrieHash"

I'm actually struggling to implement it in the MemoryBackingStore (around here), because I don't have access to TrieHash::from_data here.
I don't see how to get from the key to the hash.
Maybe it's trivial, but I just don't know the repository well enough.

We would return a 404 if the hash encodes an invalid key right?

If this API endpoint only took a hash, then the only application responses to expect are 200 (with the serialized data) or 404 (if the hash does not map to a value). The question of whether or not the hash corresponds to a semantically-valid but absent key or a semantically-invalid key does not need to be answered by the API endpoint.
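As a rough illustration of the status-code behavior described here, the sketch below maps a hex-encoded key hash to 200 or 404, and reserves 400 for input that is not a hex-encoded 32-byte value. The function names, the `HashMap` store, and the `(status, body)` return shape are hypothetical and are not the node's actual handler API.

```rust
use std::collections::HashMap;

/// Decode a hex string into exactly 32 bytes, or None if it is malformed.
pub fn parse_hash_hex(s: &str) -> Option<[u8; 32]> {
    if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Hypothetical handler: returns (HTTP status, optional body).
/// 400 = not a well-formed hash, 404 = hash not mapped, 200 = value found.
pub fn handle_lookup(
    store: &HashMap<[u8; 32], String>,
    key_hash_hex: &str,
) -> (u16, Option<String>) {
    match parse_hash_hex(key_hash_hex) {
        None => (400, None),
        Some(hash) => match store.get(&hash) {
            Some(value) => (200, Some(value.clone())),
            None => (404, None),
        },
    }
}
```

Note that the handler never needs to decide whether the hash came from a semantically valid key; any 32-byte value is either mapped or not.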

It's semantically slightly more correct to return a 404 for a random 32-byte value than it is for an invalid key, but the result for the client is the same.

That's the thing -- I think it's plausible that this makes a significant difference to the client. If the client is an application that loads Clarity data, then the difference between receiving a 400 versus a 404 is that the former requires a bugfix (the client made an invalid key) and would likely result in the user being presented with an error condition to report, whereas the latter indicates that the submitted key is valid but not mapped. Treating both as 404's would lead to a poorer user experience, since users would receive false negatives -- a bug in the application's key construction would be treated the same as the user submitting a query for nonexistent data.

I'm happy to add the requisite Clarity DB patches to this PR, once I'm done helping @obycode with #5420.

then the difference between receiving a 400 versus a 404 is that the former requires a bugfix

Ok I get the point. Didn't see it as a big deal at first (especially for my current use case), but that's fair.

I'm happy to add the requisite Clarity DB patches to this PR

That would be super helpful. I'm worried about the time it would take me to get it right, and it would probably lead to more questions / reviews.

I went ahead and added the requisite ClarityDB functionality, and plumbed it through into this RPC endpoint.

stackslib/src/net/api/getclaritymetadata.rs

jcnelson · 2024-07-23T18:07:18Z

stackslib/src/net/api/getclaritymetadata.rs

+        let contract_identifier = self.contract_identifier.take().ok_or(NetError::SendError(
+            "`contract_identifier` not set".to_string(),
+        ))?;
+        let clarity_metadata_key = self.clarity_metadata_key.take().ok_or(NetError::SendError(


Clarity metadata keys have a well-defined structure, so the RPC endpoint should return HTTP 400 if the metadata key is ill-formed. You will need to expand this method (or try_parse_request -- up to you) to validate the structure of the Clarity metadata key, including determining whether or not the key's StoreType and var_name values are supported values. var_name will, unfortunately, take some effort because the Clarity DB codebase uses bare string literals in its calls to ClarityDatabase::make_metadata_key() instead of enums, so your PR should address this as well.

Apparently the metadata var_name can be an arbitrary string (here it's a variable_name, a map_name, or a token_name).

Can we really use an enum here?

Yes; however, some reserved var_name strings can only be paired with certain StoreTypes. For example, the use of StoreType::Contract would require var_name to be contract-size, contract-src, contract-data-size, or contract, but nothing else.

One way to implement these constraints could be to implement an enum to capture all of the valid StoreType / var-name pairings, and permit only the StoreType variants which allow arbitrary var_name values to have arbitrary var_name values.
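A minimal sketch of that idea, under stated assumptions: `StoreKind`, `ContractVar`, `MetadataKey`, and `make_key` are hypothetical names, and only three store kinds are modeled. The four reserved names for the contract store type are the ones listed above; which other store types accept arbitrary names is assumed here for illustration.

```rust
/// Reserved var_name values that may be paired with the contract store type.
#[derive(Debug, PartialEq, Eq)]
pub enum ContractVar {
    Contract,
    ContractSize,
    ContractSrc,
    ContractDataSize,
}

impl ContractVar {
    pub fn parse(s: &str) -> Option<ContractVar> {
        match s {
            "contract" => Some(ContractVar::Contract),
            "contract-size" => Some(ContractVar::ContractSize),
            "contract-src" => Some(ContractVar::ContractSrc),
            "contract-data-size" => Some(ContractVar::ContractDataSize),
            _ => None,
        }
    }
}

/// Hypothetical subset of store types, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoreKind {
    Contract,
    VariableMeta,
    DataMapMeta,
}

/// Only valid pairings are representable: the contract variant carries a
/// reserved name, the others carry an arbitrary name.
#[derive(Debug, PartialEq, Eq)]
pub enum MetadataKey {
    Contract(ContractVar),
    VariableMeta(String),
    DataMapMeta(String),
}

/// Returns None for an invalid pairing (which an endpoint could map to 400).
pub fn make_key(kind: StoreKind, var_name: &str) -> Option<MetadataKey> {
    match kind {
        StoreKind::Contract => ContractVar::parse(var_name).map(MetadataKey::Contract),
        StoreKind::VariableMeta => Some(MetadataKey::VariableMeta(var_name.to_string())),
        StoreKind::DataMapMeta => Some(MetadataKey::DataMapMeta(var_name.to_string())),
    }
}
```

With this shape, an invalid pairing such as the contract store type with an arbitrary name cannot be constructed at all, so validation happens once at parse time.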

stackslib/src/net/httpcore.rs

clarity/src/vm/representations.rs

…o, add `MARF::get_by_path()` to look up `MARFValue`s by `TrieHash` instead of by `&str` keys, and add the relevant `ClarityBackingStore` implementation to the stackslib's read-only and writeable MARF stores

…arityBackingStore::get_data_from_path`

…key hash, instead of by key

…ithub.com/stacks-network/stacks-blockchain into feat/rpc-endpoints-to-fetch-data-from-key

jcnelson · 2024-11-08T20:32:47Z

stackslib/src/net/api/getclaritymetadata.rs

+
+lazy_static! {
+    static ref CLARITY_NAME_NO_BOUNDARIES_REGEX_STRING: String =
+        "[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*|[-+=/*]|[<>]=?".into();


This regex is very problematic because it matches strings of unbounded length, which opens a DoS vector on the node.
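One possible way to address this, sketched without the regex crate: check the length first, then apply the same character rules as the pattern above by hand. `MAX_NAME_LEN` and `is_bounded_clarity_name` are hypothetical names, and the limit of 128 is an assumption for this sketch, not a value taken from the PR.

```rust
/// Assumed upper bound on name length (hypothetical value for this sketch).
pub const MAX_NAME_LEN: usize = 128;

/// Characters allowed after the first letter, mirroring the regex above.
fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "-_!?+<>=/*".contains(c)
}

/// Accepts the same shapes as the regex, but rejects over-long input up front.
pub fn is_bounded_clarity_name(s: &str) -> bool {
    // Length check happens before any per-character work.
    if s.is_empty() || s.len() > MAX_NAME_LEN {
        return false;
    }
    // Second and third alternatives: a lone operator, or < / > with optional =.
    if matches!(s, "-" | "+" | "=" | "/" | "*" | "<" | ">" | "<=" | ">=") {
        return true;
    }
    // First alternative: a letter followed by name characters.
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() => chars.all(is_name_char),
        _ => false,
    }
}
```

If the regex is kept instead, replacing the trailing `*` with a bounded repetition such as `{0,127}` would be another way to cap the match length.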

jcnelson · 2024-11-08T20:32:58Z

stackslib/src/net/api/getclaritymetadata.rs

+    static ref CLARITY_NAME_NO_BOUNDARIES_REGEX_STRING: String =
+        "[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*|[-+=/*]|[<>]=?".into();
+    static ref METADATA_KEY_REGEX_STRING: String = format!(
+        r"vm-metadata::\d+::(contract|contract-size|contract-src|contract-data-size|({}))",


Same problem here with \d+

Since the \d+ must be only one of a handful of values, you should just match them explicitly.
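A small sketch of matching the numeric component explicitly rather than with `\d+`. The `vm-metadata::<n>::<var_name>` layout is taken from the regex above; the allow-list values and the name `split_metadata_key` are placeholders, since the real set would have to come from the node's StoreType definition.

```rust
/// Hypothetical allow-list of numeric store-type identifiers. These values
/// are placeholders for illustration, not the node's actual set.
const ALLOWED_STORE_TYPES: [&str; 5] = ["5", "6", "7", "8", "9"];

/// Split "vm-metadata::<store-type>::<var-name>" into its two components,
/// rejecting any store type that is not explicitly allowed.
pub fn split_metadata_key(key: &str) -> Option<(&str, &str)> {
    let rest = key.strip_prefix("vm-metadata::")?;
    let (store_type, var_name) = rest.split_once("::")?;
    let allowed = ALLOWED_STORE_TYPES.iter().any(|t| *t == store_type);
    if !allowed || var_name.is_empty() {
        return None;
    }
    Some((store_type, var_name))
}
```

Because the store type is compared against a fixed list, arbitrarily long digit strings are rejected after a bounded number of comparisons.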

jcnelson · 2024-11-08T20:38:43Z

I went ahead and made the requisite changes to the MARF, ClarityBackingStore and all its implementations, and ClarityDB so you can load values by TrieHash. I also removed TriePath from the codebase, since it's used exactly the same way that TrieHash is.

My comments on the getter for Clarity metadata still stand, however. If we're going to take metadata keys as input, we need to validate them and return HTTP 400 if they're semantically invalid.

…nd the same value by get_by_hash(hash(key))

…ithub.com/stacks-network/stacks-blockchain into feat/rpc-endpoints-to-fetch-data-from-key

stacks-common/src/types/chainstate.rs

jcnelson · 2024-11-09T16:35:24Z

The reason we *didn't* do that is because that invokes the allocator too much, per perf.

…

On Sat, Nov 9, 2024, 8:48 AM Brice Dobry wrote:

In stacks-common/src/types/chainstate.rs:

+        let s = format!("{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}",
+            self.0[0], self.0[1], self.0[2], self.0[3],
+            self.0[4], self.0[5], self.0[6], self.0[7],
+            self.0[8], self.0[9], self.0[10], self.0[11],
+            self.0[12], self.0[13], self.0[14], self.0[15],
+            self.0[16], self.0[17], self.0[18], self.0[19],
+            self.0[20], self.0[21], self.0[22], self.0[23],
+            self.0[24], self.0[25], self.0[26], self.0[27],
+            self.0[28], self.0[29], self.0[30], self.0[31]);
+        s

pub fn to_string(&self) -> String { self.0.iter().map(|byte| format!("{:02x}", byte)).collect() }

obycode · 2024-11-09T16:41:38Z

The reason we didn't do that is because that invokes the allocator too much, per perf.

Ah, okay. Thanks for explaining.
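For readers following the allocation point, here is a sketch contrasting the per-byte `format!` approach (one temporary `String` per byte) with a version that allocates its output once. This is not the code in the PR, which uses a single large `format!` call; `to_hex_32` and `to_hex_32_per_byte` are hypothetical names, and the lookup-table approach is just one alternative.

```rust
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Hex-encode 32 bytes with a single up-front allocation.
pub fn to_hex_32(bytes: &[u8; 32]) -> String {
    let mut s = String::with_capacity(64);
    for &b in bytes.iter() {
        // High nibble, then low nibble, looked up in a static table.
        s.push(HEX[(b >> 4) as usize] as char);
        s.push(HEX[(b & 0x0f) as usize] as char);
    }
    s
}

/// Per-byte variant: each format! call builds its own temporary String,
/// which is the allocator pressure mentioned above.
pub fn to_hex_32_per_byte(bytes: &[u8; 32]) -> String {
    bytes.iter().map(|byte| format!("{:02x}", byte)).collect()
}
```

Both functions produce the same lowercase hex string; they differ only in how many intermediate allocations they make.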

lgalabru and others added 2 commits July 23, 2024 16:15

feat: given a MARF key constructed client side, retrieve clarity value

26cf0da

feat: rpc endpoint to retrieve clarity metadata

931f315

hugocaillard requested a review from obycode July 23, 2024 14:28

hugocaillard self-assigned this Jul 23, 2024

hugocaillard added 2 commits July 23, 2024 17:20

docs: add get_clarity_mark_value and get_clarity_metadata documentation

ccf60f1

refactor: improve clarity_mark_key and clarity_metadata request parsing

26c4487

hugocaillard marked this pull request as ready for review July 23, 2024 16:35

hugocaillard requested review from a team as code owners July 23, 2024 16:35