Simple Select Queries with Unchecked Offset #365
Conversation
Nice work!! Left a few comments regarding the API.
    planner: &mut QueryPlanner<'a>,
    results: Vec<PgSqlRow>,
) -> Result<()> {
    let mut exec_query = generate_query_execution_with_keys(&mut parsed, &planner.settings)?;
I think we need to make sure this query ALWAYS has limit and offset set by default if they're not set already, and that the limit is below a MAX_LIMIT that we can support.
Not sure it's there already.
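A minimal sketch of the kind of enforcement meant here (the ParsedQuery type, the MAX_LIMIT value, and the error handling are hypothetical stand-ins, not the real Parsil API):

```rust
// Illustrative only: `ParsedQuery` and `MAX_LIMIT` are assumed stand-ins
// for the real Parsil types and constant.
const MAX_LIMIT: u64 = 1024; // assumed maximum number of rows the circuits support

struct ParsedQuery {
    limit: Option<u64>,
    offset: Option<u64>,
}

fn enforce_limit_and_offset(query: &mut ParsedQuery) -> Result<(), String> {
    // default the offset to 0 and the limit to the maximum we can prove
    query.offset.get_or_insert(0);
    let limit = *query.limit.get_or_insert(MAX_LIMIT);
    if limit > MAX_LIMIT {
        return Err(format!("LIMIT {limit} exceeds the supported maximum {MAX_LIMIT}"));
    }
    Ok(())
}
```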
Good point. Do you think it should be done inside Parsil when parsing the query, or is it ok to let the integration test enforce it? Doing it in Parsil would be preferable, I guess, but we would need to somehow provide the MAX_LIMIT constant.
Agreed.
let row_tree_info = RowInfo {
    satisfiying_rows: matching_rows
        .iter()
        .map(|(key, _, _)| key)
        .cloned()
        .collect(),
    tree: &planner.table.row,
};
I think it's ok to use the tree now, assuming we have this small maximum LIMIT enforced. Otherwise, we need to use the cache and the wide_lineage stuff.
I initially tried to do it with wide_lineage, but I thought it was kind of useless because it yields only the keys of the nodes; for each node I would still have to call get_node_info to build the path. So the code complexity didn't seem really worth it to me.
👍
let sibling_hash = array::from_fn(|i| {
    siblings
        .get(i)
        .and_then(|sibling| {
            sibling
                .clone()
                .map(|node| node.compute_node_hash(index_id.to_field()))
        })
        .unwrap_or(*empty_poseidon_hash())
});
Why require the sibling nodes if you only compute the hash? We can access the hash of any node in the tree, so I can do let left_payload = tree.fetch(node.left); let left_hash = left_payload.hash;
Creating the NodeInfo requires many more SQL calls, since we need to load the children and even the grandchildren.
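A self-contained sketch of the point being made (the Tree and Payload types below are toy stand-ins, not the real storage API): if only the hash is needed, a single lookup of the stored payload suffices, whereas building a NodeInfo requires fetching the children and grandchildren too.

```rust
use std::collections::HashMap;

type HashOut = [u8; 32];

// Toy stand-ins for the real storage types.
struct Payload {
    hash: HashOut,
    // ...other stored columns elided
}

struct Tree {
    nodes: HashMap<u64, Payload>,
}

impl Tree {
    // One fetch per node: the hash is already stored alongside the payload,
    // so there is no need to reconstruct a full NodeInfo just to hash it.
    fn fetch_hash(&self, key: u64) -> Option<HashOut> {
        self.nodes.get(&key).map(|payload| payload.hash)
    }
}
```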
I realized this when dealing with the integration test, but I didn't change these APIs so I could get the PR ready as soon as possible. I can definitely do it now, thanks for pointing it out.
Done in commit 1d05c47
Since it's a "template" to follow for the dist-query people, we should at least show the optimized way when possible!
let pis_hash = QueryCircuitInput::ids_for_placeholder_hash(
    &planner.pis.predication_operations,
    &planner.pis.result,
    &planner.query.placeholders,
    &planner.pis.bounds,
)?;
Do we really need to call this? It seems unnecessary from the user's perspective; can't we do it internally?
The reason I added this method initially was to avoid having to pass all these inputs to RevelationCircuitInput::new_revelation_no_results_tree, so that the interface of that method wouldn't have too many parameters. So, although new_revelation_unproven_offset can already compute this internally (given that all these inputs need to be provided anyhow), I thought to leave the computation outside so that there is no difference in this regard between the 2 types of queries. If you prefer to remove it, I can add these input values to new_revelation_no_results_tree and compute everything internally. Wdyt?
What I'm thinking is that we should pass the pis directly to RevelationCircuitInput::new_revelation_unproven_offset, so the user doesn't need to do anything on top? Wdyt?
Either the PI, or a local struct in vdb that contains all the elements that the pis need to give.
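Something like the following shape (a sketch only; the stub types and constructor body are assumptions, not the actual implementation):

```rust
// Sketch of the proposed API shape; all types here are illustrative stubs.
struct QueryPlanner {
    // pis, query, table ... elided in this sketch
}
struct Proof;
struct MatchingRow;
struct RevelationCircuitInput;

impl RevelationCircuitInput {
    // The caller hands over the planner once; ids_for_placeholder_hash (and
    // everything else derivable from the planner) is computed internally.
    fn new_revelation_unproven_offset(
        _indexing_proof: Proof,
        _matching_rows: Vec<MatchingRow>,
        _planner: &QueryPlanner,
    ) -> Self {
        RevelationCircuitInput
    }
}
```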
Done in commit 5fd774f
let input = RevelationCircuitInput::new_revelation_unproven_offset(
    indexing_proof,
    matching_rows_input,
    &planner.pis.bounds,
    &planner.query.placeholders,
    pis_hash,
    &column_ids,
    &planner.pis.predication_operations,
    &planner.pis.result,
    planner.query.limit.unwrap(),
    planner.query.offset.unwrap(),
    false,
)?;
Same here: the planner provides more than 50% of the arguments. It seems like it would be better to skip the previous ids_for_placeholder_hash call, pass the planner in here, and thus remove a lot of complexity from both of these calls, wdyt?
Modified the revelation circuit APIs in commit 5fd774f to skip the computation of ids_for_placeholder_hash outside of the APIs.
    index_tree_path,
    index_tree_siblings,
);
matching_rows_input.push(MatchingRow::new(row_proof, path, result));
Can you explain why you need to pass in the results for the revelation proof? We usually use recursion and public inputs to pass information to the upper layers, so why can't we do that now?
Ok sorry, I forgot that our circuits actually compute the digest of the results for non-aggregated functions, so we need the raw values to recompute the digest in the final revelation.
b.select_hash(
    is_row_node_leaf[i],
    &row_proof.tree_hash_target(),
    &row_node_hash,
)
Why enforce that select if we compute it anyway and we have the data? Can't we just use the computed one?
Because we need to use the hash fetched from the row proof, which is generated with the universal circuit. As an optimization for the other types of queries, this circuit already computes the hash of the node if the row is stored in a leaf node; otherwise, the hash exposed as a public input is the hash of the embedded cells tree. So, when processing this proof, we need to enforce that we correctly recompute the hash of the node from the hash exposed as a public input by the row proof.
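An out-of-circuit sketch of the selection rule being described (the names and types are illustrative, not the circuit API):

```rust
type HashOut = [u8; 32];

// The row proof exposes either the node hash itself (when the row sits in a
// leaf node) or the hash of the embedded cells tree (otherwise), so the
// revelation circuit must select between "use as-is" and "recompute".
fn node_hash_from_row_proof(
    is_leaf: bool,
    proof_exposed_hash: HashOut,
    recomputed_node_hash: HashOut,
) -> HashOut {
    if is_leaf {
        // leaf: the universal circuit already computed the node hash
        proof_exposed_hash
    } else {
        // non-leaf: the exposed hash was only the cells tree hash, so the
        // node hash had to be recomputed from it
        recomputed_node_hash
    }
}
```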
max_result = if let Some(res) = &max_result {
    let current_result: [UInt256Target; S] =
        get_result(i).to_vec().try_into().unwrap();
    let is_smaller = b.is_less_than_or_equal_to_u256_arr(res, &current_result).0;
    // flag specifying whether we must enforce DISTINCT for the current result or not
    let must_be_enforced = b.and(is_matching_row, distinct);
    let is_smaller = b.and(must_be_enforced, is_smaller);
    b.connect(is_smaller.target, must_be_enforced.target);
    Some(current_result)
} else {
    Some(get_result(i).to_vec().try_into().unwrap())
};
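For readers following along, an out-of-circuit sketch of the invariant this loop enforces; plain integers stand in for the UInt256Target arrays, and the strict ordering is assumed from the DISTINCT semantics rather than taken from the circuit code:

```rust
// Plain-integer sketch, not the circuit code: when DISTINCT is requested,
// every matching result is assumed to arrive strictly greater than the
// previous one, which both orders and deduplicates the revealed results.
fn check_distinct(rows: &[(bool, u64)], distinct: bool) -> bool {
    let mut max_result: Option<u64> = None;
    for &(is_matching_row, current_result) in rows {
        if let Some(previous) = max_result {
            // DISTINCT is only enforced for matching rows
            let must_be_enforced = is_matching_row && distinct;
            if must_be_enforced && previous >= current_result {
                return false; // duplicate or out-of-order result
            }
        }
        max_result = Some(current_result);
    }
    true
}
```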
Have you considered using the digest of the results and just making sure they're all different? Checking the digest would only entail one digest selection across the S rows, instead of checking all the results for each row.
Might be worse or better depending on the constants used, I guess?
Yes, I considered it, but in the end I didn't do it because I would have needed to add a comparison gadget for digests, while one for vectors of u256 was already available. So, given that we had a strict timeline, I decided to cut the scope and do it like this. But it should actually be more efficient to implement this logic over digests rather than over all the results, so if you think it's ok to spend some time doing it (it should take half a day at most), I can definitely do it. Wdyt?
Actually, I realized that to have a comparison for digests, we would need to split each of the 11 field elements of the digest into 32-bit limbs, to employ the same order relationship we currently use for slices of 32-bit limbs. This split would be done with this method, which however requires about 2 rows per field element (due to range checks). So, splitting a digest into 32-bit limbs would need 22 Plonky2 rows; computing the comparison with another digest (already split into 32-bit limbs) would require another 8 rows, so 30 rows in total.
On the other hand, with the current comparison of the results, we need about 3 rows per result item, so with the same number of rows we can compare up to 10 result items, which looks like more than enough to me.
So, TL;DR: comparing digests instead of the actual results would become convenient only if we have more than 10 result items per row (i.e., more than 10 items specified in the SELECT statement of the query), so it doesn't seem really worth it to me, given that it would also require additional work to implement this comparison. Wdyt?
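For reference, the break-even point implied by the row counts above (the constants are taken from the estimate in this comment, not measured):

```rust
// Plonky2 row estimates quoted above.
const DIGEST_SPLIT_ROWS: usize = 11 * 2; // 11 field elements, ~2 rows each for the 32-bit limb split
const DIGEST_COMPARE_ROWS: usize = 8; // comparing two limb-decomposed digests
const DIGEST_TOTAL_ROWS: usize = DIGEST_SPLIT_ROWS + DIGEST_COMPARE_ROWS; // 30
const ROWS_PER_RESULT_ITEM: usize = 3; // current u256-vector comparison

fn main() {
    // Digest comparison only pays off past this many SELECT items per row.
    let break_even = DIGEST_TOTAL_ROWS / ROWS_PER_RESULT_ITEM;
    assert_eq!(break_even, 10);
    println!("digest comparison wins only above {break_even} result items");
}
```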
Then the problem is solved :) Thanks!
// First, we compute the digest of the results corresponding to this row, as computed in the universal
// query circuit, to check that the results correspond to the ones computed by that circuit
let cells_tree_hash =
    build_cells_tree(b, &get_result(i)[2..], &ids[2..], &is_item_included[2..]);
But the results could be something like balance * 10, for example; it wouldn't match the cells tree as stored in the tree, would it? I'm a bit lost on why we are re-creating that cells tree hash here, since we pass in the results and not the raw values stored in our db.
Because the universal circuit, which is employed to generate the proof for a single row, still assumes that a results tree will be built for this type of query. So it accumulates all the results related to this row in a cells tree, and the digest is computed from this cells tree. Here we are recomputing the cells tree in order to correctly recompute the digest. This allows us to avoid introducing another variant of the universal circuit just for this type of query, and instead to re-use the variant we will employ once the results tree is integrated.
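A toy sketch of the invariant at work (the stand-in hash below is nothing like the real Poseidon-based cells tree): rebuilding the same tree over the same result values reproduces the same digest, which is what lets the revelation circuit match the universal circuit's commitment.

```rust
// Toy stand-in hash; the real circuit uses Poseidon over field elements.
fn h(a: u64, b: u64) -> u64 {
    a.wrapping_mul(1_000_003) ^ b
}

// Fold a layer of values bottom-up into a single root, as a stand-in for
// building the cells tree over a row's results.
fn cells_tree_root(values: &[u64]) -> u64 {
    let mut layer: Vec<u64> = values.to_vec();
    while layer.len() > 1 {
        layer = layer
            .chunks(2)
            .map(|pair| if pair.len() == 2 { h(pair[0], pair[1]) } else { pair[0] })
            .collect();
    }
    layer.first().copied().unwrap_or(0)
}

fn main() {
    let results = [7u64, 42, 13]; // e.g. the computed SELECT items of one row
    // Both sides rebuild the tree from the same values, so the digests agree
    // without ever materializing a results tree in storage.
    assert_eq!(cells_tree_root(&results), cells_tree_root(&results));
}
```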
Ahh, makes sense, thanks for the explanation. Could you quickly put that in a comment above? Thanks.
Done in commit c89fc0f
LGTM, thanks! Just asked for an additional small comment.
Let's maybe hold off on merging until we get #362 into main and integrated?
This PR introduces a circuit to expose the results of simple SELECT queries without aggregation functions, avoiding the need to build a results tree.