Witness-based pruning #22

qhho · 2024-07-29T04:39:16Z

This implements witness-based pruning to resolve #7 and makes NativeSARSOP closer to C++ SARSOP. It is not exactly the one described in the original paper, but is the algorithm implemented in the source code of C++ SARSOP (as much as I am able to parse from it). Performance wise, it is slower for small problems but seems to be just slightly better for larger problems, and consistently has fewer alpha vectors. There is more overhead and some inefficiencies in handling witnesses in the alpha vector struct (as sets) and checking for duplicate alpha vectors during backups. There should be better ways of implementing these changes if anyone has suggestions.

TigerPOMDP

Benchmarks

Old

BenchmarkTools.Trial: 68 samples with 1 evaluation.
 Range (min … max):  62.845 ms … 167.386 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     69.542 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   74.133 ms ±  13.920 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂▅▅█ ▂  ▅                                                 
  ▅█▅▅████▅█████▁▅▁▅██▅█▁█▅▁▁▅▁█▁▁▅▅█▅▁▅▁▁▁▅▅▅▅█▁▁▁▁▁▁▅▅▅▅▁▁▁▅ ▁
  62.8 ms         Histogram: frequency by time         92.4 ms <

 Memory estimate: 4.62 MiB, allocs estimate: 38825.

New

BenchmarkTools.Trial: 42 samples with 1 evaluation.
 Range (min … max):  111.048 ms … 341.475 ms  ┊ GC (min … max): 0.00% … 66.82%
 Time  (median):     114.236 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   120.121 ms ±  35.050 ms  ┊ GC (mean ± σ):  4.52% ± 10.31%

  █▄                                                             
  ██▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  111 ms           Histogram: frequency by time          341 ms <

 Memory estimate: 6.92 MiB, allocs estimate: 46114.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   7          44        
 0.01       10         14.2589622   51.5049954   37.2460332328   5          464       
 0.03       20         18.6214722   33.7498003   15.1283280569   5          649       
 0.04       30         19.3534084   21.8312080   2.4777995367    5          657       
 0.05       40         19.3709835   19.6684906   0.2975071211    5          463       
 0.06       50         19.3713674   19.3833253   0.0119579046    6          212       
--------------------------------------------------------------------------------------
 0.07       57         19.3713684   19.3722266   0.0008581957    6          467       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   6          44        
 0.01       10         14.2571940   51.5049954   37.2478014705   5          464       
 0.03       20         18.6212128   33.7498003   15.1285874225   5          649       
 0.04       30         19.3534022   21.8312080   2.4778057485    5          657       
 0.06       40         19.3709833   19.6684906   0.2975072542    5          463       
 0.07       50         19.3713674   19.3833253   0.0119579050    5          212       
--------------------------------------------------------------------------------------
 0.09       57         19.3713684   19.3722266   0.0008581957    5          467       
--------------------------------------------------------------------------------------

BabyPOMDP

Benchmarks

Old

BenchmarkTools.Trial: 2987 samples with 1 evaluation.
 Range (min … max):  1.168 ms … 203.059 ms  ┊ GC (min … max): 0.00% … 99.16%
 Time  (median):     1.478 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.664 ms ±   3.873 ms  ┊ GC (mean ± σ):  5.13% ±  5.00%

  ▄█▄▃▂▃▇▇▆▃▁                                                  
  ███████████▇▆▇▇▆▆▄▅▄▄▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▁▂ ▄
  1.17 ms         Histogram: frequency by time        3.61 ms <

 Memory estimate: 316.27 KiB, allocs estimate: 2914.

New

BenchmarkTools.Trial: 1235 samples with 1 evaluation.
 Range (min … max):  3.438 ms … 255.760 ms  ┊ GC (min … max): 0.00% … 98.23%
 Time  (median):     3.608 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.030 ms ±   7.191 ms  ┊ GC (mean ± σ):  6.68% ±  7.00%

  ▆█▃                                                          
  ███▇▅▄▄▄▄▄▄▅▄▅▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▂▂▂▂▁▂▁▂▁▁▁▁▂▁▂▁▂▁▂ ▃
  3.44 ms         Histogram: frequency by time        6.81 ms <

 Memory estimate: 1.20 MiB, allocs estimate: 6195.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   3          30        
 0.00       10         -16.3057342  -16.2819897  0.0237444537    2          98        
--------------------------------------------------------------------------------------
 0.00       15         -16.3054861  -16.2964195  0.0090665468    2          107       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   3          30        
 0.00       10         -16.3057342  -16.2819897  0.0237444537    2          98        
--------------------------------------------------------------------------------------
 0.00       15         -16.3054861  -16.2964195  0.0090665468    2          107       
--------------------------------------------------------------------------------------

TMaze

Benchmarks

Old

BenchmarkTools.Trial: 397 samples with 1 evaluation.
 Range (min … max):   9.103 ms … 222.194 ms  ┊ GC (min … max): 0.00% … 93.96%
 Time  (median):     11.392 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.590 ms ±  11.510 ms  ┊ GC (mean ± σ):  5.88% ±  7.59%

   ▅▆▅▂█▃▆▃▅▅▆▂  ▂   ▂▅                                         
  ▅████████████▅▇█▅█▅███▇▅▁▃▁▄▃▃▃▃▁▃▃▃▄▁▁▁▁▁▁▃▁▃▁▃▁▁▃▁▁▃▁▃▁▁▃▃ ▄
  9.1 ms          Histogram: frequency by time         22.3 ms <

 Memory estimate: 2.05 MiB, allocs estimate: 18813.

New

BenchmarkTools.Trial: 474 samples with 1 evaluation.
 Range (min … max):   9.441 ms … 237.742 ms  ┊ GC (min … max): 0.00% … 95.73%
 Time  (median):      9.888 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.535 ms ±  10.514 ms  ┊ GC (mean ± σ):  6.15% ±  7.63%

  ▄▇██▇▆▃                                                       
  ████████▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▅▄▁▁▅▁▁▁▁▅ ▇
  9.44 ms       Histogram: log(frequency) by time        17 ms <

 Memory estimate: 1.74 MiB, allocs estimate: 16497.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          1.6841814    3.7931544    2.1089729930    8          47        
 0.00       10         3.4796452    3.5431555    0.0635102340    31         126       
 0.01       20         3.4796452    3.5077083    0.0280630970    21         22        
 0.01       30         3.4796452    3.4924134    0.0127681326    28         33        
 0.01       40         3.4796452    3.4923511    0.0127058956    28         49        
 0.01       50         3.4796452    3.4923511    0.0127058956    29         53        
--------------------------------------------------------------------------------------
 0.01       56         3.4861897    3.4868059    0.0006162392    47         64        
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          1.6841814    3.7931544    2.1089729930    10         47        
 0.00       10         3.4764057    3.5431555    0.0667497306    44         127       
 0.00       20         3.4764057    3.5077083    0.0313025936    22         68        
 0.01       30         3.4764057    3.4924134    0.0160076292    26         25        
 0.01       40         3.4861897    3.4868059    0.0006162392    40         48        
--------------------------------------------------------------------------------------
 0.01       41         3.4861897    3.4868059    0.0006162392    40         48        
--------------------------------------------------------------------------------------

RockSamplePOMDP(4,4)

Benchmarks

Old

BenchmarkTools.Trial: 25 samples with 1 evaluation.
 Range (min … max):  186.135 ms … 414.985 ms  ┊ GC (min … max): 0.00% … 50.48%
 Time  (median):     192.365 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   207.395 ms ±  46.439 ms  ┊ GC (mean ± σ):  4.04% ± 10.10%

  ██  ▂                                                          
  ██▆▁█▄▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  186 ms           Histogram: frequency by time          415 ms <

 Memory estimate: 11.94 MiB, allocs estimate: 70926.

New

BenchmarkTools.Trial: 20 samples with 1 evaluation.
 Range (min … max):  245.301 ms … 321.758 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     257.086 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   261.746 ms ±  16.178 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁      ▄█    ▁▁                                                
  █▁▁▁▁▆▁██▁▆▆▆██▁▁▁▁▁▁▆▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
  245 ms           Histogram: frequency by time          322 ms <

 Memory estimate: 11.91 MiB, allocs estimate: 77265.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    13         14        
 0.01       10         16.9264164   18.1564052   1.2299888405    50         85        
 0.01       20         16.9264164   17.6433978   0.7169814681    68         113       
 0.02       30         16.9264164   17.5412846   0.6148682461    75         138       
 0.03       40         16.9264164   17.5262480   0.5998316116    85         183       
 0.03       50         16.9264164   17.4972456   0.5708292429    93         216       
 0.04       60         16.9264164   17.3934666   0.4670502289    100        251       
 0.05       70         16.9264164   17.2978152   0.3713987950    106        270       
 0.06       80         16.9264164   17.2435452   0.3171288444    112        301       
 0.07       90         16.9264164   17.1895543   0.2631379708    115        332       
 0.08       100        16.9264164   17.1524727   0.2260562745    123        336       
 0.09       110        16.9264164   17.1327606   0.2063442023    132        352       
 0.10       120        16.9264164   17.1447989   0.2183825490    123        366       
 0.11       130        16.9264164   17.1027266   0.1763102212    133        381       
 0.12       140        16.9264164   17.0683429   0.1419265621    125        397       
 0.14       150        16.9264164   17.0584752   0.1320588038    134        411       
 0.15       160        16.9264164   17.1499481   0.2235316880    131        419       
 0.16       170        16.9264164   17.0295747   0.1031583490    137        379       
 0.17       180        16.9264164   17.0108753   0.0844589292    125        391       
 0.18       190        16.9264164   17.0056131   0.0791966858    130        399       
 0.19       200        16.9264164   16.9873975   0.0609810952    128        391       
 0.21       210        16.9264164   16.9797309   0.0533145265    131        400       
 0.22       220        16.9264164   16.9631314   0.0367150318    125        357       
 0.23       230        16.9264164   16.9559859   0.0295695736    125        334       
 0.23       240        16.9264164   16.9499615   0.0235451701    132        333       
 0.24       250        16.9264164   16.9352883   0.0088718960    135        212       
 0.25       260        16.9264164   16.9318942   0.0054778577    125        95        
--------------------------------------------------------------------------------------
 0.25       269        16.9264164   16.9264164   -0.0000000000   127        42        
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    12         14        
 0.01       10         16.9264164   18.1564052   1.2299888405    41         86        
 0.01       20         16.9264164   17.6433978   0.7169814681    56         115       
 0.02       30         16.9264164   17.5412846   0.6148682461    70         140       
 0.03       40         16.9264164   17.5262480   0.5998316116    82         185       
 0.03       50         16.9264164   17.4972456   0.5708292429    86         218       
 0.04       60         16.9264164   17.3934666   0.4670502289    90         253       
 0.05       70         16.9264164   17.3167306   0.3903141979    101        276       
 0.06       80         16.9264164   17.2502914   0.3238750701    105        308       
 0.07       90         16.9264164   17.1895543   0.2631379708    108        340       
 0.08       100        16.9264164   17.1493766   0.2229601837    111        351       
 0.09       110        16.9264164   17.1319633   0.2055469699    109        362       
 0.10       120        16.9264164   17.1176998   0.1912833743    111        383       
 0.11       130        16.9264164   17.1022122   0.1757958335    115        399       
 0.12       140        16.9264164   17.0764918   0.1500754354    116        425       
 0.13       150        16.9264164   17.0646959   0.1382795324    116        444       
 0.14       160        16.9264164   17.0499211   0.1235047253    116        456       
 0.15       170        16.9264164   17.0609044   0.1344880352    116        456       
 0.16       180        16.9264164   17.0247853   0.0983689543    116        434       
 0.17       190        16.9264164   17.0000852   0.0736688062    116        446       
 0.18       200        16.9264164   16.9944011   0.0679846793    116        450       
 0.19       210        16.9264164   16.9791179   0.0527014995    116        414       
 0.20       220        16.9264164   16.9571341   0.0307177075    116        416       
 0.21       230        16.9264164   16.9500348   0.0236184556    116        427       
 0.22       240        16.9264164   16.9425263   0.0161098746    116        413       
 0.23       250        16.9264164   16.9388451   0.0124287118    116        347       
 0.23       260        16.9264164   16.9319422   0.0055258671    116        258       
 0.24       270        16.9264164   16.9305489   0.0041325721    116        258       
 0.24       280        16.9264164   16.9291465   0.0027301069    116        159       
--------------------------------------------------------------------------------------
 0.25       288        16.9264164   16.9273809   0.0009645004    116        143       
--------------------------------------------------------------------------------------

RockSample(10,10)

Benchmarks

Old

BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  2.407 s …   2.517 s  ┊ GC (min … max): 19.71% … 13.22%
 Time  (median):     2.418 s              ┊ GC (median):    19.62%
 Time  (mean ± σ):   2.447 s ± 60.454 ms  ┊ GC (mean ± σ):  17.97% ±  4.24%

  █    █                                                  █  
  █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.41 s         Histogram: frequency by time        2.52 s <

 Memory estimate: 554.44 MiB, allocs estimate: 410948.

New

BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  3.343 s …    3.661 s  ┊ GC (min … max): 0.00% … 12.22%
 Time  (median):     3.502 s               ┊ GC (median):    6.39%
 Time  (mean ± σ):   3.502 s ± 225.087 ms  ┊ GC (mean ± σ):  6.39% ±  8.64%

  █                                                        █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  3.34 s         Histogram: frequency by time         3.66 s <

 Memory estimate: 558.52 MiB, allocs estimate: 412010.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.26       0          16.6787251   27.4351214   10.7563963155   37         41        
 24.13      10         19.8417349   26.0633653   6.2216303645    352        389       
 74.91      20         19.8964883   26.0633653   6.1668770392    632        729       
--------------------------------------------------------------------------------------
 123.37     28         19.9578503   25.5052713   5.5474209984    827        942       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.38       0          16.6787251   27.4351214   10.7563963155   37         41        
 23.07      10         19.8417349   26.0633653   6.2216303645    341        390       
 68.03      20         19.8964883   26.0633653   6.1668770392    589        730       
--------------------------------------------------------------------------------------
 120.98     29         19.9578503   25.4649131   5.5070627736    786        975       
--------------------------------------------------------------------------------------

RockSample(15,10)

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 2.09       0          12.9959395   19.2029357   6.2069962050    32         34        
 46.92      10         15.5546534   18.6907124   3.1360589608    367        457       
 115.83     20         15.7358298   18.5206812   2.7848514072    629        808       
--------------------------------------------------------------------------------------
 127.09     22         15.7358298   18.5206812   2.7848514072    620        840       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 2.36       0          12.9959395   19.2029357   6.2069962050    31         34        
 47.64      10         15.5546534   18.6907124   3.1360589608    361        457       
 113.82     20         15.7358298   18.5206812   2.7848514072    590        809       
--------------------------------------------------------------------------------------
 124.41     22         15.7358298   18.5206812   2.7848514072    588        841       
--------------------------------------------------------------------------------------

zsunberg · 2024-07-29T15:30:36Z

Is it possible to keep multiple pruning strategies as options? (I have not looked at the code, so perhaps you have already implemented that?)

qhho · 2024-07-30T01:32:31Z

Is it possible to keep multiple pruning strategies as options? (I have not looked at the code, so perhaps you have already implemented that?)

We can definitely do that. I briefly discussed with @WhiffleFish and we thought that having multiple pruning strategies isn't necessary if one performs better. I think with some code optimization, we can get witness-based pruning to generally perform better than the current pruning technique. That said, the original SARSOP paper does not mention witness-based pruning even though the code implementation uses it, so it might be a good idea to have both as options for completeness.

qhho added 2 commits July 27, 2024 13:30

witness pruning with sets in alpha vectors

ba8c452

duplicate check during backup

3e10a8c

qhho requested a review from WhiffleFish July 29, 2024 04:39

fix bracket in pruning check short-circuiting

f6a2fcc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Witness-based pruning #22

Witness-based pruning #22

qhho commented Jul 29, 2024 •

edited

Loading

zsunberg commented Jul 29, 2024

qhho commented Jul 30, 2024

Witness-based pruning #22

Are you sure you want to change the base?

Witness-based pruning #22

Conversation

qhho commented Jul 29, 2024 • edited Loading

TigerPOMDP

Benchmarks

Old

New

Policy Run

Old

New

BabyPOMDP

Benchmarks

Old

New

Policy Run

Old

New

TMaze

Benchmarks

Old

New

Policy Run

Old

New

RockSamplePOMDP(4,4)

Benchmarks

Old

New

Policy Run

Old

New

RockSample(10,10)

Benchmarks

Old

New

Policy Run

Old

New

RockSample(15,10)

Policy Run

Old

New

zsunberg commented Jul 29, 2024

qhho commented Jul 30, 2024

qhho commented Jul 29, 2024 •

edited

Loading