Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(optimization): Perform array gets as early as possible #5762

Closed
wants to merge 27 commits into from

Conversation

jfecher
Copy link
Contributor

@jfecher jfecher commented Aug 19, 2024

Description

Problem*

Resolves #5027

(#5501 is a separate issue)

Summary*

Due to other optimizations out of the user's control, programs can often end up with sequences resembling the following:

    v90208 = array_set v90198, index v90207, value u8 0
    v90210 = add v90203, u64 1
    v90211 = mul v90098, v90204
    v90212 = array_get v90208, index v90207
    v90213 = array_get v90198, index v90207

Where we have an array_set that cannot be made mutable because there is still an array_get later to the unmutated array.

To fix this, I've added another pass to move array gets as early as possible in the program. The earliest they can be moved is always just after their last ValueId dependency was inserted. This usually frees up any array_sets to be made mutable afterward.

Additional Context

The program

fn main(foo: [u8; 95], toggle: bool) {
    let size: Field = 93 + toggle as Field * 2;
    let hash = sha256::sha256_var(foo, size as u64);
    println(f"{hash}");
}   

as well as the new test in execution_success/array_get_optimization now have 0 non-mutable array sets.

Not much change from most tests. I think this pattern was mostly a result of the array merge optimization to only merge changed indices since this is the only optimization that inserts array sets.

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@jfecher jfecher requested a review from a team August 19, 2024 20:14
@jfecher jfecher enabled auto-merge August 19, 2024 20:59
@jfecher jfecher disabled auto-merge August 19, 2024 21:00
@jfecher jfecher enabled auto-merge August 19, 2024 21:05
Copy link
Member

@TomAFrench TomAFrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes just for some discussion.

I'm a little concerned about this PR as it's similar to my "bubble up" optimisation which does the same operation but for constrain instructions. This resulted in situations where we ended up getting tests failing due to the noir code not being executed in the order which a user would expect.

I'm unsure how long we can maintain this guarantee however so we may need to open discussions on breaking this.

@jfecher
Copy link
Contributor Author

jfecher commented Aug 20, 2024

@TomAFrench I'm going to try changing the array merge optimization directly for now to delay the array sets there instead of moving the array gets later. This is the first approach I tried and thought it didn't work initially but I think now that the noir program was just getting cached from a previous version of the compiler. So I'll try again and hopefully we'll see similar improvements.

I'll also note that the conditional_1 test failure did not occur when I used a smaller constant value for that array merge optimization, like the previous 2. Not sure why that'd be causing the failure.

Copy link
Contributor

github-actions bot commented Aug 20, 2024

Changes to circuit sizes

Generated at commit: 9007d60710a572ada3a0e5105be3e44985058a65, compared to commit: 29bd125314b58e2eac23742ff1de022a97dcc60a

🧾 Summary (10% most significant diffs)

Program ACIR opcodes (+/-) % Circuit size (+/-) %
fold_numeric_generic_poseidon +30 ❌ +375.00% +103 ❌ +792.31%
fold_call_witness_condition +24 ❌ +480.00% +80 ❌ +727.27%
regression +359 ❌ +227.22% +1,390 ❌ +37.86%

Full diff report 👇
Program ACIR opcodes (+/-) % Circuit size (+/-) %
fold_numeric_generic_poseidon 38 (+30) +375.00% 116 (+103) +792.31%
fold_call_witness_condition 29 (+24) +480.00% 91 (+80) +727.27%
regression 517 (+359) +227.22% 5,061 (+1,390) +37.86%
hashmap 65,664 (-27,719) -29.68% 198,747 (+48,086) +31.92%
debug_logs 63 (+17) +36.96% 73 (+12) +19.67%
no_predicates_numeric_generic_poseidon 83 (+30) +56.60% 1,303 (+103) +8.58%
nested_array_in_slice 993 (-12) -1.19% 5,789 (+340) +6.24%
sha256_var_size_regression 18,472 (-762) -3.96% 77,660 (+3,059) +4.10%
regression_5252 76,794 (-204) -0.26% 86,968 (+3,106) +3.70%
bigint 1,248 (+63) +5.32% 8,349 (+261) +3.23%
conditional_1 11,077 (+6,452) +139.50% 39,785 (+881) +2.26%
array_dynamic 125 (+18) +16.82% 3,757 (+71) +1.93%
conditional_2 35 (+13) +59.09% 2,813 (+46) +1.66%
conditional_regression_661 34 (+4) +13.33% 2,798 (+16) +0.58%
sha256_regression 35,167 (-4,014) -10.24% 204,093 (+988) +0.49%
nested_array_dynamic 3,220 (-602) -15.75% 12,968 (+46) +0.36%
conditional_regression_short_circuit 860 (+11) +1.30% 41,812 (0) 0.00%
sha256 2,393 (-17) -0.71% 47,628 (0) 0.00%
sha256_var_witness_const_regression 1,966 (-17) -0.86% 44,720 (0) 0.00%

@jfecher
Copy link
Contributor Author

jfecher commented Aug 21, 2024

Running into some difficulties experimenting with this PR. Most notably is a difference in semantics between SSA Pre and Post-Flattening.

When optimizing array sets we optimize them if the array and index are known, but we never check the current enable side effects value to see if the array set may be disabled. This is fine Pre-flattening since control flow prevents referring to these directly if it is not what is wanted. Post flattening however it should be valid to optimize an array merge of the following:

b0(v1: [Field; 2], v2: u1):
  enable_side_effects v2
  v3 = array_set v1, index 0, value 1
  v4 = not v2
  enable_side_effects v4
  v5 = array_set v1, index 1, value 2
  enable_side_effects u1 1
  v6 = if v2 then v3 else if v4 then v5

to:

b0(v1: [Field; 2], v2: u1):
  enable_side_effects v2
  v3 = array_set v1, index 0, value 1
  v4 = not v2
  enable_side_effects v4
  v5 = array_set v3, index 1, value 2  // changed v1 to v3
  enable_side_effects u1 1
  // v6 = v5

Since the then branch should be disabled in the else case, we should expect v3 = v1, and can set to that directly instead. Likewise, since the else branch is disabled in the then case, we don't have to worry about v5 changing the result of the merge in that case. This also lets us write mutably in both branches. But I believe this doesn't currently work when v1 is constant because the else branch will be optimized to a constant [_, 2] even if it was not meant to be executed.

@jfecher
Copy link
Contributor Author

jfecher commented Aug 21, 2024

Going to pause work on this PR until #5782 is fixed. I'm also running into it with the current approach in this PR which is fairly similar to what we currently use. The nested dynamic arrays test is failing with this panic currently.

@jfecher
Copy link
Contributor Author

jfecher commented Aug 23, 2024

Hmm I'm surprised changing the array merge optimization to use [array1, array2][condition as u32] is so much less efficient than merging every index of the arrays and creating a new array.

@jfecher
Copy link
Contributor Author

jfecher commented Oct 7, 2024

I'm closing this for now since the results are currently negative. We can retry similar optimizations in another PR later if wanted.

@jfecher jfecher closed this Oct 7, 2024
auto-merge was automatically disabled October 7, 2024 14:05

Pull request was closed

@TomAFrench TomAFrench deleted the jf/move-array-gets branch October 7, 2024 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conditional RAM writes scale nonlinearly in loops
3 participants