Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added belief space binning #18

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Added belief space binning #18

wants to merge 9 commits into from

Conversation

dylan-asmar
Copy link
Member

@dylan-asmar dylan-asmar commented Jun 27, 2024

Addresses/closes #3

The only real improvement happens on the larger problems (Tag and RockSample(15, 10)) with longer solve times. Here are some benchmarks/performance data. I set the default to use_binning=true, but I don't have any strong feelings about that setting.

Delta = 0.1

BabyPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.1
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 3062 samples with 1 evaluation.
 Range (min  max):  1.506 ms   44.457 ms  ┊ GC (min  max): 0.00%  96.18%
 Time  (median):     1.597 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.631 ms ± 803.055 μs  ┊ GC (mean ± σ):  1.93% ±  5.36%

              ▁▄▅▃▆▆▃▇▆▆█▆▆▄▂▁▁                                
  ▂▃▆▇▆▅▇▅▆▅▆▇███████████████████▇▆▆▅▆▄▄▄▅▄▃▃▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁ ▄
  1.51 ms         Histogram: frequency by time        1.76 ms <

 Memory estimate: 431.91 KiB, allocs estimate: 3870.

With Binning

BenchmarkTools.Trial: 3441 samples with 1 evaluation.
 Range (min  max):  1.338 ms   45.109 ms  ┊ GC (min  max): 0.00%  96.38%
 Time  (median):     1.416 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.451 ms ± 781.139 μs  ┊ GC (mean ± σ):  1.99% ±  5.35%

   ▁ ▂▃▃▄▂▂▄▂▇▄▆▆█▇▆▆▆▆▇▆▄▃▄▂ ▁                                
  ▆██████████████████████████▇█▇▇▆▅▆▆▅▄▃▃▃▃▂▃▂▂▂▁▂▁▁▁▁▂▁▁▁▁▁▁ ▅
  1.34 ms         Histogram: frequency by time         1.6 ms <

 Memory estimate: 398.05 KiB, allocs estimate: 3148.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   3          30        
 0.00       10         -16.3057342  -16.2819897  0.0237444537    2          98        
 0.00       20         -16.3054833  -16.3024060  0.0030772894    2          133       
--------------------------------------------------------------------------------------
 0.00       28         -16.3054833  -16.3045949  0.0008883805    2          134       
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   3          30        
 0.00       10         -16.3078063  -16.2760282  0.0317781382    2          67        
 0.00       20         -16.3054842  -16.2999674  0.0055167873    2          86        
 0.00       30         -16.3054833  -16.3043757  0.0011076031    2          96        
--------------------------------------------------------------------------------------
 0.00       32         -16.3054833  -16.3045054  0.0009778971    2          97        
--------------------------------------------------------------------------------------

TigerPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.1
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 168 samples with 1 evaluation.
 Range (min  max):  28.393 ms  69.467 ms  ┊ GC (min  max): 0.00%  57.36%
 Time  (median):     29.336 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   29.768 ms ±  3.185 ms  ┊ GC (mean ± σ):  1.29% ±  4.80%

        ▁▄█   ▄▁  ▂▁▃                                          
  ▃▃▃▃▄▇████▆████▆█████▄▅▃▄▁▃▄▁▃▅▁▁▃▃▃▁▁▃▁▁▁▁▁▃▁▁▃▁▁▃▁▁▁▃▁▁▃▅ ▃
  28.4 ms         Histogram: frequency by time        32.5 ms <

 Memory estimate: 4.02 MiB, allocs estimate: 30937.

With Binning

BenchmarkTools.Trial: 145 samples with 1 evaluation.
 Range (min  max):  32.466 ms  72.687 ms  ┊ GC (min  max): 0.00%  53.34%
 Time  (median):     34.143 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   34.539 ms ±  3.316 ms  ┊ GC (mean ± σ):  1.28% ±  4.76%

               ▂▆▂▁▃▁█▃▁▂                                      
  ▃▄▁▁▁▁▃▃▃▇▆█▇██████████▅▄▄▃▁▁▁▁▃▄▁▃▁▁▃▁▁▁▁▁▁▁▁▄▃▁▄▁▃▁▁▁▁▃▁▃ ▃
  32.5 ms         Histogram: frequency by time          38 ms <

 Memory estimate: 4.84 MiB, allocs estimate: 35918.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   7          44        
 0.01       10         14.2589622   51.5049954   37.2460332328   5          464       
 0.01       20         18.6214722   33.7498003   15.1283280569   5          649       
 0.02       30         19.3534084   21.8312080   2.4777995367    5          657       
 0.03       40         19.3709835   19.6684906   0.2975071211    5          463       
 0.03       50         19.3713674   19.3833253   0.0119579046    6          212       
--------------------------------------------------------------------------------------
 0.04       57         19.3713684   19.3722266   0.0008581957    6          467       
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   7          44        
 0.01       10         14.2589622   51.5049954   37.2460332328   5          464       
 0.01       20         18.6214722   33.7498003   15.1283280569   5          649       
 0.02       30         19.3534084   21.8312080   2.4777995367    5          657       
 0.03       40         19.3708839   19.8157400   0.4448561113    6          700       
 0.04       50         19.3713655   19.3836390   0.0122735599    6          408       
--------------------------------------------------------------------------------------
 0.06       58         19.3713683   19.3722964   0.0009280409    6          213       
--------------------------------------------------------------------------------------

RockSamplePOMDP(5,5)

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.1
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 262 samples with 1 evaluation.
 Range (min  max):  18.404 ms   23.235 ms  ┊ GC (min  max): 0.00%  17.13%
 Time  (median):     18.959 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.136 ms ± 777.933 μs  ┊ GC (mean ± σ):  1.07% ±  3.04%

   ██▆    ▁ ▂▂      ▂                                           
  ▆███▆▄▆▄█████▆▃▃▄▄█▆▆▅▂▂▁▂▂▁▁▂▂▂▂▁▃▁▁▁▂▂▁▃▄▁▁▂▂▁▂▁▁▁▁▂▂▂▁▁▁▃ ▃
  18.4 ms         Histogram: frequency by time         21.9 ms <

 Memory estimate: 4.57 MiB, allocs estimate: 28634.

With Binning

BenchmarkTools.Trial: 253 samples with 1 evaluation.
 Range (min  max):  18.860 ms  30.679 ms  ┊ GC (min  max): 0.00%  9.04%
 Time  (median):     19.460 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.765 ms ±  1.023 ms  ┊ GC (mean ± σ):  1.11% ± 3.07%

       ▂▆▄█▃                                                   
  ▃▃▃▅▅██████▄▄▆▄▅▆▃▁▃▃▄▃▁▁▁▁▁▁▂▁▃▄▂▃▂▂▃▃▂▂▁▁▁▂▂▁▁▂▂▁▁▁▁▂▁▁▁▂ ▃
  18.9 ms         Histogram: frequency by time        22.9 ms <

 Memory estimate: 4.66 MiB, allocs estimate: 28351.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    13         14        
 0.00       10         16.9264164   18.1564052   1.2299888405    50         85        
 0.01       20         16.9264164   17.6433978   0.7169814681    68         113       
 0.01       30         16.9264164   17.5412846   0.6148682461    75         138       
...      
 0.11       230        16.9264164   16.9559859   0.0295695736    125        334       
 0.11       240        16.9264164   16.9499615   0.0235451701    132        333       
 0.12       250        16.9264164   16.9352883   0.0088718960    135        212       
 0.12       260        16.9264164   16.9318942   0.0054778577    125        95        
--------------------------------------------------------------------------------------
 0.12       269        16.9264164   16.9264164   -0.0000000000   127        42        
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    13         14        
 0.00       10         16.9264164   18.1564052   1.2299888405    50         84        
 0.01       20         16.9264164   17.6457466   0.7193302497    65         110       
 0.01       30         16.9264164   17.5514629   0.6250465481    81         136       
...      
 0.13       270        16.9264164   16.9318942   0.0054778577    128        176       
 0.13       280        16.9264164   16.9306333   0.0042169595    128        171       
 0.14       290        16.9264164   16.9280962   0.0016798214    132        169       
 0.14       300        16.9264164   16.9276488   0.0012324281    129        152       
--------------------------------------------------------------------------------------
 0.14       309        16.9264164   16.9273809   0.0009645004    130        108       
--------------------------------------------------------------------------------------

TagPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.1
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min  max):  2.074 s   2.083 s  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.079 s             ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.078 s ± 4.134 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                           █                          █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.07 s        Histogram: frequency by time        2.08 s <

 Memory estimate: 151.24 MiB, allocs estimate: 847828.

With Binning

BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min  max):  1.762 s   1.768 s  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     1.766 s             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.765 s ± 3.045 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                    █                 █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.76 s        Histogram: frequency by time        1.77 s <

 Memory estimate: 135.30 MiB, allocs estimate: 752265.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   18         48        
 0.14       10         -12.1605923  -4.9155579   7.2450343269    192        345       
 0.42       20         -11.9445215  -5.0085441   6.9359773437    285        545       
 0.81       30         -11.5468985  -5.0615592   6.4853393203    361        773       
...     
 2.52       60         -11.2755644  -5.1570304   6.1185340320    568        1297      
 3.08       70         -11.2132678  -5.2020310   6.0112367924    617        1446      
 3.87       80         -11.1744215  -5.2321641   5.9422574180    632        1578      
 4.31       90         -11.1744215  -5.2645196   5.9099018818    703        1675      
--------------------------------------------------------------------------------------
 5.02       99         -11.1403313  -5.2894235   5.8509077878    730        1810      
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   19         47        
 0.13       10         -12.1966225  -4.9149882   7.2816342861    191        312       
 0.41       20         -11.7904714  -5.0001015   6.7903698305    296        505       
 0.81       30         -11.6792231  -5.0526929   6.6265301495    371        675       
...     
 2.85       80         -11.5218232  -5.2266499   6.2951732908    504        1293      
 3.37       90         -11.4785210  -5.2482303   6.2302906807    508        1416      
 4.00       100        -11.4415221  -5.2732583   6.1682637954    529        1563      
 4.52       110        -11.3926590  -5.2948544   6.0978046278    579        1677      
--------------------------------------------------------------------------------------
 5.16       119        -11.3385556  -5.3150586   6.0234969935    587        1788      
--------------------------------------------------------------------------------------

TagPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.1
  • max_steps: 50 (for benchmarking)
  • max_time: 180.0 (for policy run)

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   18         48        
 0.19       10         -12.1605923  -4.9155579   7.2450343269    192        345       
 0.47       20         -11.9445215  -5.0085441   6.9359773437    285        545       
 0.86       30         -11.5468985  -5.0615592   6.4853393203    361        773       
 ...    
 163.26     660        -10.8677176  -5.8409155   5.0268020533    1802       8315      
 168.18     670        -10.8677176  -5.8439653   5.0237522546    1779       8425      
 171.76     680        -10.8677176  -5.8518134   5.0159041272    1932       8544      
 176.67     690        -10.8677176  -5.8548322   5.0128853608    1897       8635      
--------------------------------------------------------------------------------------
 180.02     696        -10.8677176  -5.8560566   5.0116610030    1788       8687      
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   19         47        
 0.13       10         -12.1966225  -4.9149882   7.2816342861    191        312       
 0.41       20         -11.7904714  -5.0001015   6.7903698305    296        505       
 0.73       30         -11.6792231  -5.0526929   6.6265301495    371        675       
 ...    
 165.20     750        -10.8623554  -5.8514410   5.0109144371    1768       8299      
 170.07     760        -10.8623554  -5.8564002   5.0059551836    1736       8390      
 175.11     770        -10.8623554  -5.8607945   5.0015608763    1716       8495      
 178.47     780        -10.8623554  -5.8666526   4.9957028166    1839       8582      
--------------------------------------------------------------------------------------
 181.38     785        -10.8623554  -5.8674862   4.9948692190    1713       8610      
--------------------------------------------------------------------------------------

RockSamplePOMDP(15,10)

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.1
  • max_steps: 50 (for benchmarking)
  • max_time: 180.0 (for policy run)
  • init_lower: BlindLowerBound(9223372036854775807, 60.0, 0.001, Float64[], Float64[])
  • init_upper: FastInformedBound(9223372036854775807, 60.0, 0.001, 0.0, Float64[], Float64[])

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.16       0          14.9526422   18.9521036   3.9994614354    33         36        
 20.54      10         15.5252008   18.4262322   2.9010313613    301        472       
 50.05      20         15.6821109   18.2229256   2.5408147418    531        819       
 95.07      30         15.7936995   18.0692991   2.2755996353    771        1124      
 156.85     40         15.8384434   18.0184855   2.1800421431    1049       1502      
--------------------------------------------------------------------------------------
 184.67     45         15.8634089   18.0044985   2.1410895386    1191       1659      
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.11       0          14.1789401   18.9521036   4.7731635410    29         32        
 21.63      10         15.3435740   18.4262322   3.0826581548    389        452       
 54.38      20         15.7668437   18.2229256   2.4560819561    627        775       
 101.56     30         15.9182769   18.1103209   2.1920440784    812        1075      
 169.94     40         16.0142970   18.0180518   2.0037548790    994        1437      
--------------------------------------------------------------------------------------
 185.34     44         16.0330587   18.0044985   1.9714397820    1091       1522      
--------------------------------------------------------------------------------------

Delta = 0.0001

BabyPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.0001
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 3026 samples with 1 evaluation.
 Range (min  max):  1.505 ms  45.776 ms  ┊ GC (min  max): 0.00%  96.53%
 Time  (median):     1.567 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.650 ms ±  1.221 ms  ┊ GC (mean ± σ):  1.86% ±  5.20%

  █▂▁                                                         
  ███▇▆▄▃▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▁▁▂▁▁▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  1.51 ms        Histogram: frequency by time        3.46 ms <

 Memory estimate: 431.91 KiB, allocs estimate: 3870.

With Binning

BenchmarkTools.Trial: 3466 samples with 1 evaluation.
 Range (min  max):  1.325 ms   45.407 ms  ┊ GC (min  max): 0.00%  96.55%
 Time  (median):     1.404 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.440 ms ± 772.445 μs  ┊ GC (mean ± σ):  1.95% ±  5.26%

   ▃▄▂▁▂▂▅▄▂▆▅▅█▅▄▆▂▃▁▆▃▂▂▁▃▁▁▁▁                               
  ▄█████████████████████████████▅▆▆▆▄▅▅▄▅▄▄▃▃▂▂▂▃▂▂▂▂▂▂▁▂▁▁▁▁ ▅
  1.33 ms         Histogram: frequency by time         1.6 ms <

 Memory estimate: 398.05 KiB, allocs estimate: 3148.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   2          30        
 0.00       10         -16.3057342  -16.2819897  0.0237444537    2          98        
 0.00       20         -16.3054833  -16.3024060  0.0030772894    2          133       
--------------------------------------------------------------------------------------
 0.00       28         -16.3054833  -16.3045949  0.0008883805    2          134       
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -28.3501795  -15.6037314  12.7464480985   2          30        
 0.00       10         -16.3078063  -16.2760282  0.0317781382    2          67        
 0.00       20         -16.3054842  -16.2999674  0.0055167873    2          86        
 0.00       30         -16.3054833  -16.3043757  0.0011076031    2          96        
--------------------------------------------------------------------------------------
 0.00       32         -16.3054833  -16.3045054  0.0009778971    2          97        
--------------------------------------------------------------------------------------

TigerPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.0001
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 169 samples with 1 evaluation.
 Range (min  max):  28.044 ms  69.881 ms  ┊ GC (min  max): 0.00%  56.25%
 Time  (median):     29.352 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   29.742 ms ±  3.379 ms  ┊ GC (mean ± σ):  1.26% ±  4.69%

      ▂       ▂█▂▃▂ ▄▄▄▇▄▃                                     
  ▃▃▆▆█▃█▅▇▇▆▇█████████████▆▅▅▆▃▆▅▃▅▃▃▁▃▅▁▃▁▁▃▁▁▁▃▃▅▁▁▅▁▁▃▃▁▃ ▃
  28 ms           Histogram: frequency by time        32.2 ms <

 Memory estimate: 4.02 MiB, allocs estimate: 30935.

With Binning

BenchmarkTools.Trial: 146 samples with 1 evaluation.
 Range (min  max):  33.079 ms  73.473 ms  ┊ GC (min  max): 0.00%  53.42%
 Time  (median):     33.803 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   34.312 ms ±  3.365 ms  ┊ GC (mean ± σ):  1.27% ±  4.73%

      ▆▁█▂▃ ▁▄ ▁                                               
  ▄▁▅▆█████▅████▁▄▅▄▇▄▄▄▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▃▃▁▁▁▁▃▄▃▁▃▁▁▁▃▁▁▁▁▁▁▃ ▃
  33.1 ms         Histogram: frequency by time        37.8 ms <

 Memory estimate: 4.84 MiB, allocs estimate: 35916.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   7          44        
 0.01       10         14.2589622   51.5049954   37.2460332328   5          464       
 0.01       20         18.6214722   33.7498003   15.1283280569   5          649       
 0.02       30         19.3534084   21.8312080   2.4777995367    5          657       
 0.03       40         19.3709835   19.6684906   0.2975071211    5          463       
 0.04       50         19.3713674   19.3833253   0.0119579046    5          212       
--------------------------------------------------------------------------------------
 0.04       57         19.3713684   19.3722266   0.0008581957    5          467       
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          -10.7779872  87.0980496   97.8760368879   7          44        
 0.01       10         14.2589622   51.5049954   37.2460332328   5          464       
 0.01       20         18.6214722   33.7498003   15.1283280569   5          649       
 0.02       30         19.3534084   21.8312080   2.4777995367    5          657       
 0.03       40         19.3708839   19.8157400   0.4448561113    5          700       
 0.04       50         19.3713655   19.3836390   0.0122735599    5          408       
--------------------------------------------------------------------------------------
 0.05       58         19.3713683   19.3722964   0.0009280409    5          213       
--------------------------------------------------------------------------------------

RockSamplePOMDP(5,5)

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.0001
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 234 samples with 1 evaluation.
 Range (min  max):  20.557 ms   25.743 ms  ┊ GC (min  max): 0.00%  17.11%
 Time  (median):     21.142 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.422 ms ± 856.987 μs  ┊ GC (mean ± σ):  1.08% ±  3.03%

       ▁▄█ ▃                                                    
  ▃▃▃▅▆███▆██▄▄▆█▄▆▂▁▂▁▁▁▂▂▁▃▂▁▁▁▂▃▁▁▁▁▁▂▁▁▃▁▁▃▃▂▃▂▂▂▄▁▂▂▁▁▁▁▂ ▃
  20.6 ms         Histogram: frequency by time         24.3 ms <

 Memory estimate: 4.57 MiB, allocs estimate: 28634.

With Binning

BenchmarkTools.Trial: 228 samples with 1 evaluation.
 Range (min  max):  20.784 ms   25.888 ms  ┊ GC (min  max): 0.00%  14.61%
 Time  (median):     21.737 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.926 ms ± 824.955 μs  ┊ GC (mean ± σ):  0.99% ±  2.80%

         ▄▃▃▃▁  ▃█▄                                             
  ▃▄▃▃▄▅███████▇███▅▃▄▃▃▁▁▃▁▁▃▁▃▁▁▁▄▃▄▁▃▁▃▃▃▁▁▁▃▁▃▁▁▁▃▃▁▁▁▁▁▁▃ ▃
  20.8 ms         Histogram: frequency by time         25.5 ms <

 Memory estimate: 4.66 MiB, allocs estimate: 28351.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    13         14        
 0.00       10         16.9264164   18.1564052   1.2299888405    50         85        
 0.01       20         16.9264164   17.6433978   0.7169814681    68         113       
 0.01       30         16.9264164   17.5412846   0.6148682461    75         138       
 ...      
 0.15       230        16.9264164   16.9559859   0.0295695736    125        334       
 0.15       240        16.9264164   16.9499615   0.0235451701    132        333       
 0.16       250        16.9264164   16.9352883   0.0088718960    135        212       
 0.16       260        16.9264164   16.9318942   0.0054778577    105        95        
--------------------------------------------------------------------------------------
 0.16       269        16.9264164   16.9264164   -0.0000000000   85         42        
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.00       0          15.2423183   18.3402991   3.0979807840    13         14        
 0.00       10         16.9264164   18.1564052   1.2299888405    50         84        
 0.01       20         16.9264164   17.6457466   0.7193302497    65         110       
 0.01       30         16.9264164   17.5514629   0.6250465481    81         136       
 ...     
 0.14       270        16.9264164   16.9318942   0.0054778577    127        176       
 0.14       280        16.9264164   16.9306333   0.0042169595    127        171       
 0.15       290        16.9264164   16.9280962   0.0016798214    131        169       
 0.15       300        16.9264164   16.9276488   0.0012324281    128        152       
--------------------------------------------------------------------------------------
 0.15       309        16.9264164   16.9273809   0.0009645004    129        108       
--------------------------------------------------------------------------------------

TagPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.0001
  • max_steps: 50 (for benchmarking)
  • max_time: 5.0 (for policy run)

Benchmark

No Binning

BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min  max):  4.097 s   4.103 s  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     4.100 s             ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.100 s ± 4.320 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                      █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.1 s         Histogram: frequency by time         4.1 s <

 Memory estimate: 223.08 MiB, allocs estimate: 1261179.

With Binning

BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min  max):  4.028 s    4.049 s  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     4.039 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.039 s ± 15.399 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.03 s         Histogram: frequency by time        4.05 s <

 Memory estimate: 223.31 MiB, allocs estimate: 1258769.

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   16         48        
 0.17       10         -16.8795969  -4.9353316   11.9442653748   125        413       
 0.64       20         -16.8186935  -5.0282493   11.7904441409   234        739       
 1.42       30         -16.7359900  -5.0891495   11.6468404953   355        1093      
 2.51       40         -16.7337451  -5.1183440   11.6154011517   385        1459      
 3.89       50         -16.7282187  -5.1628562   11.5653624986   417        1801      
 4.81       60         -11.7655940  -5.1908938   6.5747002521    78         2054      
--------------------------------------------------------------------------------------
 5.00       65         -11.6514521  -5.2057404   6.4457116765    122        2132      
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   16         47        
 0.17       10         -16.8796840  -4.9351478   11.9445362179   126        409       
 0.64       20         -16.7377465  -5.0283407   11.7094058753   250        752       
 1.54       30         -16.7167952  -5.0845684   11.6322268160   311        1117      
 2.68       40         -16.7111230  -5.1153538   11.5957691515   363        1478      
 3.82       50         -16.7054256  -5.1568402   11.5485853655   405        1796      
 4.75       60         -11.6143236  -5.1700763   6.4442473223    91         2030      
--------------------------------------------------------------------------------------
 5.05       68         -11.6140280  -5.2071470   6.4068810512    109        2119      
--------------------------------------------------------------------------------------

TagPOMDP

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.0001
  • max_steps: 50 (for benchmarking)
  • max_time: 180.0 (for policy run)

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   16         48        
 0.18       10         -16.8795969  -4.9353316   11.9442653748   125        413       
 0.65       20         -16.8186935  -5.0282493   11.7904441409   234        739       
 1.43       30         -16.7359900  -5.0891495   11.6468404953   355        1093      
 ...    
 161.50     600        -10.8643757  -5.8139829   5.0503927242    1023       8574      
 166.07     610        -10.8643757  -5.8184209   5.0459547732    1036       8648      
 171.04     620        -10.8643757  -5.8268039   5.0375717641    1073       8755      
 175.73     630        -10.8643757  -5.8315453   5.0328303562    1110       8860      
--------------------------------------------------------------------------------------
 180.19     639        -10.8643757  -5.8339685   5.0304072086    1114       8941      
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 0.01       0          -19.3713764  -4.6944443   14.6769321680   16         47        
 0.17       10         -16.8796840  -4.9351478   11.9445362179   126        409       
 0.63       20         -16.7377465  -5.0283407   11.7094058753   250        752       
 1.53       30         -16.7167952  -5.0845684   11.6322268160   311        1117      
 ...    
 166.64     710        -10.8729234  -5.8786049   4.9943184346    775        8954      
 171.18     720        -10.8729234  -5.8845024   4.9884209677    758        9039      
 175.76     730        -10.8729234  -5.8894912   4.9834321284    726        9118      
 179.82     740        -10.8729234  -5.8941603   4.9787630709    784        9216      
--------------------------------------------------------------------------------------
 180.14     742        -10.8729234  -5.8942929   4.9786304293    795        9223      
--------------------------------------------------------------------------------------

RockSamplePOMDP(15,10)

Settings:

  • epsilon: 0.1
  • precision: 0.001
  • delta: 0.0001
  • max_steps: 50 (for benchmarking)
  • max_time: 180.0 (for policy run)
  • init_lower: BlindLowerBound(9223372036854775807, 60.0, 0.001, Float64[], Float64[])
  • init_upper: FastInformedBound(9223372036854775807, 60.0, 0.001, 0.0, Float64[], Float64[])

Policy Run

No Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.16       0          14.9526422   18.9521036   3.9994614354    31         36        
 25.02      10         15.5252008   18.4262322   2.9010313613    298        472       
 73.36      20         15.6821109   18.2229256   2.5408147418    527        819       
 150.81     30         15.7936995   18.0692991   2.2755996353    766        1124      
--------------------------------------------------------------------------------------
 192.11     35         15.8237859   18.0477618   2.2239758629    867        1276      
--------------------------------------------------------------------------------------

With Binning

--------------------------------------------------------------------------------------
 Time       Iter       LB           UB           Precision       # Alphas   # Beliefs 
--------------------------------------------------------------------------------------
 1.07       0          14.1789401   18.9521036   4.7731635410    26         32        
 28.96      10         15.3435740   18.4262322   3.0826581548    386        452       
 82.15      20         15.7668437   18.2229256   2.4560819561    613        775       
 166.72     30         15.9182769   18.1103209   2.1920440784    798        1075      
--------------------------------------------------------------------------------------
 188.95     33         15.9374152   18.0702144   2.1327991877    823        1145      
--------------------------------------------------------------------------------------

@zsunberg
Copy link
Member

Wow @dylan-asmar , this looks really cool! Thanks for the contribution!

@WhiffleFish when can you give it a review?

src/tree.jl Outdated
Comment on lines 11 to 13
bin_levels_intervals::Vector{Dict{Symbol, Float64}}
bin_levels_nodes::Vector{Dict{Int, Dict{Symbol, Union{Tuple{Int, Int}, Float64}}}}
bin_levels::Vector{Dict{Symbol, Dict{Tuple{Int, Int}, Union{Float64, Int}}}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These vectors of dictionaries (nested with further dictionaries) are really expensive in terms of performance. Also, the union typing results in some type instabilities.

Provided that a lot of the keys for these dictionaries are always the same (e.g. bin_value, bin_count, bin_error), can you find any way to flatten everything out into vectors, use fewer dictionaries, or just use structs/named tuples?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I wasn't a big fan of the dictionary implementation when I was doing it. Reworking as vectors wouldn't be too bad since the number of bins stays the same. I can mark this as WIP until I make these changes if desired.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the implementation with 3762018. In the process, I also found a bug with the dictionaries. With the binning, we are back down to the normal level of allocations vs the original implementation.

I updated the performance comparisons and added results for both delta=0.1 and delta=0.0001.

src/tree.jl Show resolved Hide resolved
@dylan-asmar dylan-asmar marked this pull request as draft July 22, 2024 22:43
@dylan-asmar dylan-asmar marked this pull request as ready for review July 24, 2024 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Belief space binning
4 participants