Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/rpc endpoints to fetch data from key #4997

Draft
wants to merge 20 commits into
base: develop
Choose a base branch
from

Conversation

hugocaillard
Copy link
Collaborator

Description

Add two new endpoints:

  • /v2/clarity_marf_value/:clarity_marf_key
  • /v2/clarity_metadata/:principal/:contract_name/:clarity_metadata_key

I'm currently working on a Clarinet feature that allows to simulate running (or, said differently, to fork the mainnet state in the simnet data store). This requires the ability to fetch values from marf and metadata keys.

These new endpoints are similar to already existing endpoint (such as getdatavar, getmapentry, getcontractsrc, etc).

Applicable issues

N/A

Additional info (benefits, drawbacks, caveats)

Read more about this feature in the Clarinet issue: hirosystems/clarinet#1503

I confirmed that the 2 new endpoints allow to achieve the desired goal by running a local devnet with these changes and forking the Clarinet simnet state from it.

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)

@hugocaillard hugocaillard self-assigned this Jul 23, 2024
@hugocaillard hugocaillard marked this pull request as ready for review July 23, 2024 16:35
@hugocaillard hugocaillard requested review from a team as code owners July 23, 2024 16:35

fn path_regex(&self) -> Regex {
Regex::new(&format!(
r"^/v2/clarity_marf_value/(?P<clarity_marf_key>(vm-epoch::epoch-version)|({})|({}))$",
Copy link
Member

@jcnelson jcnelson Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly disagree with using the MARF key as input. To do so would require an explanation to an RPC API consumer of how to construct a Clarity MARF key, which essentially requires them to read the majority of SIP-005. In addition, this would effectively require the RPC endpoint to determine whether or not the key is well-formed, since passing an ill-formed key would require a different HTTP error code than HTTP 404 (which is not something I think you want to waste your time doing).

Instead, you should do the following:

  • Accept the hash of the key as input. Then, you don't have to worry about verifying that it is well-formed, and you don't have to worry about explaining to the RPC API consumer how to construct it. Furthermore, it's much safer and easier to determine if a string is a well-formed hash instead of a well-formed Clarity MARF key.
  • Update the Clarity DB to take the hash as an argument, instead of the key (in support of the above).

Copy link
Member

@jcnelson jcnelson Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I dislike the path you chose, since it doesn't leave room for semantic expansion. Instead, can you use the following: /v2/clarity/marf/{key-hash}, where key-hash is the hash of the SIP-005 key.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my use case (of Clarinet consuming this API), I already know the key (but not the hash). But I agree that it's not ideal to have the key as a use input.
Can you guide me to some code where I could see how to get the key <> key_hash conversion?

Update the Clarity DB to take the hash as an argument, instead of the key (in support of the above).

Not sure to follow here, where should I perform this change?


I think I just lack knowledge about this repo to fully address this comment, thanks for providing extra context 🙏

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure -- the (string) key's bytes are hashed via SHA512/256 to produce the MARF key.

Not sure to follow here, where should I perform this change?

The trait ClarityBackingStore needs to be modified to add a method like get_data_with_proof(), but which takes an implementation of the MarfTrieId trait instead of a &str key. There currently isn't a way to query the MARF via the ClarityDB with a bare hash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly disagree with using the MARF key as input. To do so would require an explanation to an RPC API consumer of how to construct a Clarity MARF key, which essentially requires them to read the majority of SIP-005.

Why would getting the hash of this key be any easier for a user of this API? Wouldn't that be even harder? Is there some scenario where a user would already have the hash and then would want to use this endpoint?

Note that adding a new version of get_data_with_proof that takes in the TriePath would also require some significant changes to the MemoryBackingStore, since it currently directly uses the key to access the database.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard to follow but I've some questions based on my understading;

"any 32-byte value is a valid MARF key, so HTTP 400 would only be necessary if the value was not a hex-encoded 32-byte hash"

We would return a 404 if the hash encodes an invalid key right? So the difference for the client seems to be quite the same to me. It's semantically sligthly more correct to return a 404 for a random 32-byte value than it is for an invalid key, but the result for the client is the same.

"so it's not a very big lift to add a trait method for loading a serialized value by TrieHash"

I'm actually struggling to implement it in the MemoryBackingStore (around here), because I don't have access TrieHash::from_data here.
I don't see how to get from the key to the Hash.
Maybe it's trivial but I just don't know the repository good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would return a 404 if the hash encodes an invalid key right?

If this API endpoint only took a hash, then the only application responses to expect are 200 (with the serialized data) or 404 (if the hash does not map to a value). The question of whether or not the hash corresponds to a semantically-valid but absent key or a semantically-invalid key does not need to be answered by the API endpoint.

It's semantically sligthly more correct to return a 404 for a random 32-byte value than it is for an invalid key, but the result for the client is the same.

That's the thing -- I think it's plausible that this makes a significant difference to the client. If the client is an application that loads Clarity data, then the difference between receiving a 400 versus a 404 is that the former requires a bugfix (the client made an invalid key) and would likely result in the user being presented with an error condition to report, whereas the latter indicates that the submitted key is valid but not mapped. Treating both as 404's would lead to a poorer user experience, since users would receive false negatives -- a bug in the application's key construction would be treated the same as the user submitting a query for nonexistent data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to add the requisite Clarity DB patches to this PR, once I'm done helping @obycode with #5420.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then the difference between receiving a 400 versus a 404 is that the former requires a bugfix

Ok I get the point. Didn't see it as a big deal at first (especially for my current use case), but that's fair.

I'm happy to add the requisite Clarity DB patches to this PR

That would be super helpful. I'm worried about the time it would take me to get it rigth, and would probably lead to more questions / reviews

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added the requisite ClarityDB functionality, and plumbed it through into this RPC endpoint.

let contract_identifier = self.contract_identifier.take().ok_or(NetError::SendError(
"`contract_identifier` not set".to_string(),
))?;
let clarity_metadata_key = self.clarity_metadata_key.take().ok_or(NetError::SendError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarity metadata keys have a well-defined structure, so the RPC endpoint should return HTTP 400 if the metadata key is ill-formed. You will need to expand this method (or try_parse_request -- up to you) to validate the structure of the Clarity metadata key, including determining whether or not the key's StoreType and var_name values are supported values. var_name will, unfortunately, take some effort because the Clarity DB codebase uses bare string literals in its calls to ClarityDatabase::make_metadata_key() instead of enums, so your PR should address this as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently the metadata var_name can be an arbitrary string, where here it's a variable_name, a map_name, a token_name).

Can we really use an enum here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; however, some reserved var_name strings can only be paired with certain StoreTypes. For example, the use of StoreType::Contract would require var_name to be contract-size, contract-src, contract-data-size, or contract, but nothing else.

One way to implement these constraints could be to implement an enum to capture all of the valid StoreType / var-name pairings, and permit only the StoreType variants which allow arbitrary var_name values to have arbitrary var_name values.

@hugocaillard hugocaillard marked this pull request as draft July 29, 2024 10:27
@hugocaillard hugocaillard force-pushed the feat/rpc-endpoints-to-fetch-data-from-key branch from dca20bd to 6260c16 Compare November 5, 2024 17:55

lazy_static! {
static ref CLARITY_NAME_NO_BOUNDARIES_REGEX_STRING: String =
"[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*|[-+=/*]|[<>]=?".into();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is very problematic because it matches strings of unbounded length, which opens a DoS vector on the node.

static ref CLARITY_NAME_NO_BOUNDARIES_REGEX_STRING: String =
"[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*|[-+=/*]|[<>]=?".into();
static ref METADATA_KEY_REGEX_STRING: String = format!(
r"vm-metadata::\d+::(contract|contract-size|contract-src|contract-data-size|({}))",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem here with \d+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the \d+ must be only one of a handful of values, you should just match them explicitly.

@jcnelson
Copy link
Member

jcnelson commented Nov 8, 2024

I went ahead and made the requisite changes to the MARF, ClarityBackingStore and all its implementations, and ClarityDB so you can load values by TrieHash. I also removed TriePath from the codebase, since it's used exactly the same way that TrieHash is.

My comments on the getter for Clarity metadata still stand, however. If we're going to take metadata keys as input, we need to validate them and return HTTP 400 if they're semantically invalid.

@jcnelson
Copy link
Member

jcnelson commented Nov 9, 2024 via email

@obycode
Copy link
Contributor

obycode commented Nov 9, 2024

The reason we didn't do that is because that invokes the allocator too much, per perf.

Ah, okay. Thanks for explaining.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants