Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Witness-based pruning #22

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Witness-based pruning #22

wants to merge 3 commits into from

Conversation

qhho
Copy link

@qhho qhho commented Jul 29, 2024

This implements witness-based pruning to resolve #7 and makes NativeSARSOP closer to C++ SARSOP. It is not exactly the one described in the original paper, but is the algorithm implemented in the source code of C++ SARSOP (as much as I am able to parse from it). Performance wise, it is slower for small problems but seems to be just slightly better for larger problems, and consistently has fewer alpha vectors. There is more overhead and some inefficiencies in handling witnesses in the alpha vector struct (as sets) and checking for duplicate alpha vectors during backups. There should be better ways of implementing these changes if anyone has suggestions.

TigerPOMDP

Benchmarks

Old
BenchmarkTools.Trial: 68 samples with 1 evaluation.
 Range (min  max):  62.845 ms  167.386 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     69.542 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   74.133 ms ±  13.920 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▂▅▅█ ▂  ▅                                                 
  ▅█▅▅████▅█████▁▅▁▅██▅█▁█▅▁▁▅▁█▁▁▅▅█▅▁▅▁▁▁▅▅▅▅█▁▁▁▁▁▁▅▅▅▅▁▁▁▅ ▁
  62.8 ms         Histogram: frequency by time         92.4 ms <

 Memory estimate: 4.62 MiB, allocs estimate: 38825.
New
BenchmarkTools.Trial: 42 samples with 1 evaluation.
 Range (min  max):  111.048 ms  341.475 ms  ┊ GC (min  max): 0.00%  66.82%
 Time  (median):     114.236 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   120.121 ms ±  35.050 ms  ┊ GC (mean ± σ):  4.52% ± 10.31%

  █▄                                                             
  ██▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  111 ms           Histogram: frequency by time          341 ms <

 Memory estimate: 6.92 MiB, allocs estimate: 46114.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   7          44        
 0.01       10         14.2589622   51.5049954   37.2460332328   5          464       
 0.03       20         18.6214722   33.7498003   15.1283280569   5          649       
 0.04       30         19.3534084   21.8312080   2.4777995367    5          657       
 0.05       40         19.3709835   19.6684906   0.2975071211    5          463       
 0.06       50         19.3713674   19.3833253   0.0119579046    6          212       
--------------------------------------------------------------------------------------
 0.07       57         19.3713684   19.3722266   0.0008581957    6          467       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   6          44        
 0.01       10         14.2571940   51.5049954   37.2478014705   5          464       
 0.03       20         18.6212128   33.7498003   15.1285874225   5          649       
 0.04       30         19.3534022   21.8312080   2.4778057485    5          657       
 0.06       40         19.3709833   19.6684906   0.2975072542    5          463       
 0.07       50         19.3713674   19.3833253   0.0119579050    5          212       
--------------------------------------------------------------------------------------
 0.09       57         19.3713684   19.3722266   0.0008581957    5          467       
--------------------------------------------------------------------------------------

BabyPOMDP

Benchmarks

Old

BenchmarkTools.Trial: 2987 samples with 1 evaluation.
 Range (min  max):  1.168 ms  203.059 ms  ┊ GC (min  max): 0.00%  99.16%
 Time  (median):     1.478 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.664 ms ±   3.873 ms  ┊ GC (mean ± σ):  5.13% ±  5.00%

  ▄█▄▃▂▃▇▇▆▃▁                                                  
  ███████████▇▆▇▇▆▆▄▅▄▄▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▁▂ ▄
  1.17 ms         Histogram: frequency by time        3.61 ms <

 Memory estimate: 316.27 KiB, allocs estimate: 2914.

New

BenchmarkTools.Trial: 1235 samples with 1 evaluation.
 Range (min  max):  3.438 ms  255.760 ms  ┊ GC (min  max): 0.00%  98.23%
 Time  (median):     3.608 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.030 ms ±   7.191 ms  ┊ GC (mean ± σ):  6.68% ±  7.00%

  ▆█▃                                                          
  ███▇▅▄▄▄▄▄▄▅▄▅▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▂▂▂▂▁▂▁▂▁▁▁▁▂▁▂▁▂▁▂ ▃
  3.44 ms         Histogram: frequency by time        6.81 ms <

 Memory estimate: 1.20 MiB, allocs estimate: 6195.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   3          30        
 0.00       10         -16.3057342  -16.2819897  0.0237444537    2          98        
--------------------------------------------------------------------------------------
 0.00       15         -16.3054861  -16.2964195  0.0090665468    2          107       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   3          30        
 0.00       10         -16.3057342  -16.2819897  0.0237444537    2          98        
--------------------------------------------------------------------------------------
 0.00       15         -16.3054861  -16.2964195  0.0090665468    2          107       
--------------------------------------------------------------------------------------

TMaze

Benchmarks

Old

BenchmarkTools.Trial: 397 samples with 1 evaluation.
 Range (min  max):   9.103 ms  222.194 ms  ┊ GC (min  max): 0.00%  93.96%
 Time  (median):     11.392 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.590 ms ±  11.510 ms  ┊ GC (mean ± σ):  5.88% ±  7.59%

   ▅▆▅▂█▃▆▃▅▅▆▂  ▂   ▂▅                                         
  ▅████████████▅▇█▅█▅███▇▅▁▃▁▄▃▃▃▃▁▃▃▃▄▁▁▁▁▁▁▃▁▃▁▃▁▁▃▁▁▃▁▃▁▁▃▃ ▄
  9.1 ms          Histogram: frequency by time         22.3 ms <

 Memory estimate: 2.05 MiB, allocs estimate: 18813.

New

BenchmarkTools.Trial: 474 samples with 1 evaluation.
 Range (min  max):   9.441 ms  237.742 ms  ┊ GC (min  max): 0.00%  95.73%
 Time  (median):      9.888 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.535 ms ±  10.514 ms  ┊ GC (mean ± σ):  6.15% ±  7.63%

  ▄▇██▇▆▃                                                       
  ████████▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▅▄▁▁▅▁▁▁▁▅ ▇
  9.44 ms       Histogram: log(frequency) by time        17 ms <

 Memory estimate: 1.74 MiB, allocs estimate: 16497.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          1.6841814    3.7931544    2.1089729930    8          47        
 0.00       10         3.4796452    3.5431555    0.0635102340    31         126       
 0.01       20         3.4796452    3.5077083    0.0280630970    21         22        
 0.01       30         3.4796452    3.4924134    0.0127681326    28         33        
 0.01       40         3.4796452    3.4923511    0.0127058956    28         49        
 0.01       50         3.4796452    3.4923511    0.0127058956    29         53        
--------------------------------------------------------------------------------------
 0.01       56         3.4861897    3.4868059    0.0006162392    47         64        
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          1.6841814    3.7931544    2.1089729930    10         47        
 0.00       10         3.4764057    3.5431555    0.0667497306    44         127       
 0.00       20         3.4764057    3.5077083    0.0313025936    22         68        
 0.01       30         3.4764057    3.4924134    0.0160076292    26         25        
 0.01       40         3.4861897    3.4868059    0.0006162392    40         48        
--------------------------------------------------------------------------------------
 0.01       41         3.4861897    3.4868059    0.0006162392    40         48        
--------------------------------------------------------------------------------------

RockSamplePOMDP(4,4)

Benchmarks

Old

BenchmarkTools.Trial: 25 samples with 1 evaluation.
 Range (min  max):  186.135 ms  414.985 ms  ┊ GC (min  max): 0.00%  50.48%
 Time  (median):     192.365 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   207.395 ms ±  46.439 ms  ┊ GC (mean ± σ):  4.04% ± 10.10%

  ██  ▂                                                          
  ██▆▁█▄▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  186 ms           Histogram: frequency by time          415 ms <

 Memory estimate: 11.94 MiB, allocs estimate: 70926.

New

BenchmarkTools.Trial: 20 samples with 1 evaluation.
 Range (min  max):  245.301 ms  321.758 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     257.086 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   261.746 ms ±  16.178 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁      ▄█    ▁▁                                                
  █▁▁▁▁▆▁██▁▆▆▆██▁▁▁▁▁▁▆▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆ ▁
  245 ms           Histogram: frequency by time          322 ms <

 Memory estimate: 11.91 MiB, allocs estimate: 77265.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    13         14        
 0.01       10         16.9264164   18.1564052   1.2299888405    50         85        
 0.01       20         16.9264164   17.6433978   0.7169814681    68         113       
 0.02       30         16.9264164   17.5412846   0.6148682461    75         138       
 0.03       40         16.9264164   17.5262480   0.5998316116    85         183       
 0.03       50         16.9264164   17.4972456   0.5708292429    93         216       
 0.04       60         16.9264164   17.3934666   0.4670502289    100        251       
 0.05       70         16.9264164   17.2978152   0.3713987950    106        270       
 0.06       80         16.9264164   17.2435452   0.3171288444    112        301       
 0.07       90         16.9264164   17.1895543   0.2631379708    115        332       
 0.08       100        16.9264164   17.1524727   0.2260562745    123        336       
 0.09       110        16.9264164   17.1327606   0.2063442023    132        352       
 0.10       120        16.9264164   17.1447989   0.2183825490    123        366       
 0.11       130        16.9264164   17.1027266   0.1763102212    133        381       
 0.12       140        16.9264164   17.0683429   0.1419265621    125        397       
 0.14       150        16.9264164   17.0584752   0.1320588038    134        411       
 0.15       160        16.9264164   17.1499481   0.2235316880    131        419       
 0.16       170        16.9264164   17.0295747   0.1031583490    137        379       
 0.17       180        16.9264164   17.0108753   0.0844589292    125        391       
 0.18       190        16.9264164   17.0056131   0.0791966858    130        399       
 0.19       200        16.9264164   16.9873975   0.0609810952    128        391       
 0.21       210        16.9264164   16.9797309   0.0533145265    131        400       
 0.22       220        16.9264164   16.9631314   0.0367150318    125        357       
 0.23       230        16.9264164   16.9559859   0.0295695736    125        334       
 0.23       240        16.9264164   16.9499615   0.0235451701    132        333       
 0.24       250        16.9264164   16.9352883   0.0088718960    135        212       
 0.25       260        16.9264164   16.9318942   0.0054778577    125        95        
--------------------------------------------------------------------------------------
 0.25       269        16.9264164   16.9264164   -0.0000000000   127        42        
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    12         14        
 0.01       10         16.9264164   18.1564052   1.2299888405    41         86        
 0.01       20         16.9264164   17.6433978   0.7169814681    56         115       
 0.02       30         16.9264164   17.5412846   0.6148682461    70         140       
 0.03       40         16.9264164   17.5262480   0.5998316116    82         185       
 0.03       50         16.9264164   17.4972456   0.5708292429    86         218       
 0.04       60         16.9264164   17.3934666   0.4670502289    90         253       
 0.05       70         16.9264164   17.3167306   0.3903141979    101        276       
 0.06       80         16.9264164   17.2502914   0.3238750701    105        308       
 0.07       90         16.9264164   17.1895543   0.2631379708    108        340       
 0.08       100        16.9264164   17.1493766   0.2229601837    111        351       
 0.09       110        16.9264164   17.1319633   0.2055469699    109        362       
 0.10       120        16.9264164   17.1176998   0.1912833743    111        383       
 0.11       130        16.9264164   17.1022122   0.1757958335    115        399       
 0.12       140        16.9264164   17.0764918   0.1500754354    116        425       
 0.13       150        16.9264164   17.0646959   0.1382795324    116        444       
 0.14       160        16.9264164   17.0499211   0.1235047253    116        456       
 0.15       170        16.9264164   17.0609044   0.1344880352    116        456       
 0.16       180        16.9264164   17.0247853   0.0983689543    116        434       
 0.17       190        16.9264164   17.0000852   0.0736688062    116        446       
 0.18       200        16.9264164   16.9944011   0.0679846793    116        450       
 0.19       210        16.9264164   16.9791179   0.0527014995    116        414       
 0.20       220        16.9264164   16.9571341   0.0307177075    116        416       
 0.21       230        16.9264164   16.9500348   0.0236184556    116        427       
 0.22       240        16.9264164   16.9425263   0.0161098746    116        413       
 0.23       250        16.9264164   16.9388451   0.0124287118    116        347       
 0.23       260        16.9264164   16.9319422   0.0055258671    116        258       
 0.24       270        16.9264164   16.9305489   0.0041325721    116        258       
 0.24       280        16.9264164   16.9291465   0.0027301069    116        159       
--------------------------------------------------------------------------------------
 0.25       288        16.9264164   16.9273809   0.0009645004    116        143       
--------------------------------------------------------------------------------------

RockSample(10,10)

Benchmarks

Old

BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min  max):  2.407 s    2.517 s  ┊ GC (min  max): 19.71%  13.22%
 Time  (median):     2.418 s              ┊ GC (median):    19.62%
 Time  (mean ± σ):   2.447 s ± 60.454 ms  ┊ GC (mean ± σ):  17.97% ±  4.24%

  █    █                                                  █  
  █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.41 s         Histogram: frequency by time        2.52 s <

 Memory estimate: 554.44 MiB, allocs estimate: 410948.

New

BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min  max):  3.343 s     3.661 s  ┊ GC (min  max): 0.00%  12.22%
 Time  (median):     3.502 s               ┊ GC (median):    6.39%
 Time  (mean ± σ):   3.502 s ± 225.087 ms  ┊ GC (mean ± σ):  6.39% ±  8.64%

  █                                                        █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  3.34 s         Histogram: frequency by time         3.66 s <

 Memory estimate: 558.52 MiB, allocs estimate: 412010.

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.26       0          16.6787251   27.4351214   10.7563963155   37         41        
 24.13      10         19.8417349   26.0633653   6.2216303645    352        389       
 74.91      20         19.8964883   26.0633653   6.1668770392    632        729       
--------------------------------------------------------------------------------------
 123.37     28         19.9578503   25.5052713   5.5474209984    827        942       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.38       0          16.6787251   27.4351214   10.7563963155   37         41        
 23.07      10         19.8417349   26.0633653   6.2216303645    341        390       
 68.03      20         19.8964883   26.0633653   6.1668770392    589        730       
--------------------------------------------------------------------------------------
 120.98     29         19.9578503   25.4649131   5.5070627736    786        975       
--------------------------------------------------------------------------------------

RockSample(15,10)

Policy Run

Old

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 2.09       0          12.9959395   19.2029357   6.2069962050    32         34        
 46.92      10         15.5546534   18.6907124   3.1360589608    367        457       
 115.83     20         15.7358298   18.5206812   2.7848514072    629        808       
--------------------------------------------------------------------------------------
 127.09     22         15.7358298   18.5206812   2.7848514072    620        840       
--------------------------------------------------------------------------------------

New

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 2.36       0          12.9959395   19.2029357   6.2069962050    31         34        
 47.64      10         15.5546534   18.6907124   3.1360589608    361        457       
 113.82     20         15.7358298   18.5206812   2.7848514072    590        809       
--------------------------------------------------------------------------------------
 124.41     22         15.7358298   18.5206812   2.7848514072    588        841       
--------------------------------------------------------------------------------------

@qhho qhho requested a review from WhiffleFish July 29, 2024 04:39
@zsunberg
Copy link
Member

Is it possible to keep multiple pruning strategies as options? (I have not looked at the code, so perhaps you have already implemented that?)

@qhho
Copy link
Author

qhho commented Jul 30, 2024

Is it possible to keep multiple pruning strategies as options? (I have not looked at the code, so perhaps you have already implemented that?)

We can definitely do that. I briefly discussed with @WhiffleFish and we thought that having multiple pruning strategies isn't necessary if one performs better. I think with some code optimization, we can get witness-based pruning to generally perform better than the current pruning technique. That said, the original SARSOP paper does not mention witness-based pruning even though the code implementation uses it, so it might be a good idea to have both as options for completeness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Witness-based pruning
2 participants