AI Script/Instruction Benchmarker

This utility benchmarks the execution speed of instructions in the TPT2 AI. Using it is a bit more involved than other scripts, because it is designed for, and mainly of interest to, programmers. Here is what you'll need to do:

  1. Temporarily disable any software that runs on wakeup(), so that you have a clean slate when you start the AI with F4. This definitely includes turbo exec!
  2. Import the workspace code below into the editor at https://d0sboots.github.io/perfect-tower/. Note that this is a workspace code for the editor, not an import code for the game; you can't import it into the game directly.
  3. Modify benchmark_1 to test what you want. There are macros in there to help you write expressions that have lots of terms, so that effects which are normally small will be magnified. You can also create benchmark_2, benchmark_3, etc. to run at the same time if you want to compare multiple scenarios.
  4. (Optional) Tweak the constants in benchmark_import_lib if they're not to your liking. If you add more benchmark scripts, changing NUM_BENCHMARKS to match is required.
  5. Export the workspace and import the result into the game, and press 'b' to start the benchmark suite.

By default, the benchmark takes quite a while to finish, but it converges to a near-final value quickly. The most relevant number is the "self" value for each benchmark_N result: the amount of time, per iteration, attributable specifically to the code under test. The other values are "best", the best (minimum) time for the entire execution, and "last", the value for the last (most recent) run. "self" = "best" - "best of baseline", where the baseline is all the benchmark machinery run without any benchmark script.
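The bookkeeping behind these three values can be sketched in Python (a hypothetical helper for illustration only, not part of the workspace): each benchmark keeps the minimum elapsed time over repeated runs, and "self" subtracts the best baseline run.

```python
def summarize(samples, baseline_samples, num_cycles):
    """Sketch of the result display: per-iteration "last", "best", "self".

    samples: elapsed times for repeated runs of one benchmark
    baseline_samples: elapsed times for the empty baseline (benchmark "0") runs
    num_cycles: NUM_CYCLES, the number of turbo cycles per measurement
    """
    last = samples[-1]   # most recent run
    best = min(samples)  # best (minimum) time over all runs so far
    # "self" = "best" - "best of baseline": the time attributable
    # specifically to the code under test.
    self_time = best - min(baseline_samples)
    return {
        "last": last / num_cycles,
        "best": best / num_cycles,
        "self": self_time / num_cycles,
    }
```

For example, with runs of 10.0, 8.0 and 9.0 time units against baseline runs of 3.0 and 2.5, "best" is 8.0 and "self" is 5.5, each then divided by the cycle count to get a per-iteration figure.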

Editor import code:

{"workspaces":{"Benchmark":[["benchmark_import_lib","; Macro for keeping all the names consistent. Changing the package name here\n; changes it for all scripts.\n#script_name(name) D0S.Benchmark v1.0:{name}\n\n; Display variable for the number of elapsed turbo cycles. As a user, you will\n; only see the final count, because that's the value it has when the frame ends.\n:global int benchmark_cycles\n\n; How many cycles to run each measurement for. The script itself will run turbo\n; for a few more cycles than this to do bookkeeping work, so expect\n; \"benchmark_cycles\" to be slightly larger.\n; Although the precision of the timer is much higher, the actual accuracy of the\n; now() function appears to be 0.5ms. So to achieve a timing accuracy of 0.1us,\n; we need (at least) 5000 cycles.\n:const int NUM_CYCLES 5000\n\n; The number of benchmark programs to run. Each one should be named \"benchmark_N\",\n; inside this package.\n:const int NUM_BENCHMARKS 1\n\n; How many times to repeat the measurement, to try to get the best, stable value.\n; Note that you'll be able to see in-progress results before this point, and\n; that you can also interrupt the process with F4, so there's little downside to\n; this being large.\n:const int NUM_REPEATS 1000\n"],["turbo_trigger",":import benchmark_import_lib\n\n:name {script_name(turbo trigger)}\n\n; Doing this triggers turbo-exec behavior.\n; We use our own turbo exec implementation, because we want complete control\n; over how it works and the timing, and no overhead from other turbo\n; implementations.\nstop(\"{script_name(turbo trigger)}\")\n"],["turbo_exec",":import benchmark_import_lib\n\n:name {script_name(turbo exec)}\n\n; Call back to main, to use it as a counting script. Because turbo hasn't started\n; yet, main will run its first instruction immediately, which is a gotoif().\n; It will be behind this script in the order, so resetting benchmark_cycles\n; will happen before it starts incrementing.\nexecute(\"{script_name(main)}\")\n\n; Run our custom turbo script. Since this is the last line, it will keep executing\n; as long as turbo is active, which will happen as long as we keep running this.\n;\n; We add cycles to this, so that turbo will continue until the next loop.\n; The actual length of the loop that is timed is controlled by main.\nexecute(if(benchmark_cycles <= NUM_CYCLES + 6, \"{script_name(turbo trigger)}\", \"\"))\n"],["main",":import benchmark_import_lib\n\n:name {script_name(main)}\n\n; This is the main program for benchmarking. There should be no need for\n; manual modifications here; all the constants that need tweaking are in\n; benchmark_import_lib. But it might be useful to understand how it works.\n\n:global int benchmark_idx\n:local double start_time\n:local double elapsed_time\n\nkey.b()\n\n#current_benchmark (benchmark_idx % (NUM_BENCHMARKS + 1))\n#bench_time ldg(\"btime\" . {current_benchmark})\n#nanos(time) round(({time}) * (100. / NUM_CYCLES))\n\n; We get executed from \"turbo exec\" if we are being used to count cycles.\ngotoif(counting, impulse() == \"{script_name(turbo exec)}\")\n\nbenchmark_idx = 0\nloop:\nexecute(\"{script_name(turbo exec)}\")\n\n; Turbo exec will start on this instruction: It takes one cycle for \"turbo exec\"\n; to execute main in counting mode (which happens immediately, on the line\n; above), and then on this line it begins the execute loop.\n; When we are looping, turbo exec will continue through the goto, ending\n; on the \"execute\". So the line above will be the last instruction of the frame,\n; followed by a frame break, followed by this being the first.\nbenchmark_cycles = 0\n\n; The first iteration is for benchmark \"0\", the base case with no payload.\nexecute(\"{script_name(benchmark_)}\" . {current_benchmark})\n\n; Record start_time after the payload is already started, for maximum stability.\nstart_time = now()\n\n; benchmark_cycles will be 3 the first time this line is run. There's\n; intrinsically 2 cycles between the two time measurements, assuming we didn't\n; wait at all. So, we subtract 1 to align these two quantities, bringing the\n; number of cycles to match the desired NUM_CYCLES.\nwaituntil(benchmark_cycles - 1 >= NUM_CYCLES)\nelapsed_time = now() - start_time\nlds(\"btime\" . {current_benchmark},\\\n  min(elapsed_time, if({bench_time} == 0., 1./0., {bench_time}))\\\n)\n; Update the result display for the benchmark run\ngss(\\\n  if({current_benchmark} == 0, \"base\", \"benchmark_\" . {current_benchmark}),\\\n  \"last: \" . {nanos(elapsed_time)} . \"nS, best: \" .\\\n  {nanos({bench_time})} .\\\n  if(\\\n    {current_benchmark} == 0,\\\n    \"\",\\\n    \"nS, self: \" . {nanos({bench_time} - ldg(\"btime0\"))}\\\n  ) . \"nS\"\\\n)\n\nbenchmark_idx += 1\ngoto(if(benchmark_idx >= (NUM_BENCHMARKS + 1) * NUM_REPEATS, end, loop))\n\ncounting:\n; As the last line, this will keep getting executed until turbo ends.\nbenchmark_cycles += 1\n\n; Jumping here will loop the previous instruction in turbo, but it will quit the\n; script if we're out of turbo.\nend:\n"],["benchmark_1",":import benchmark_import_lib\n\n:name {script_name(benchmark_1)}\n\n; Benchmarks are put in scripts named like this. You can have as many as you want,\n; but you have to modify the constant NUM_BENCHMARKS in main accordingly.\n\n; Benchmarks can be any length, but single-line ones are best/easiest. This is\n; because the time is being measured overall, so longer things are harder to\n; measure, and the time will just be an average score.\n\n; Duplicates \"term\" num_copies times.\n#dup(term, num_copies) {lua(\\\n  local acc = {}\\\n  for i = 1, {num_copies} do\\\n    acc[i] = [[{term}]]\\\n  end\\\n  return table.concat(acc)\\\n)}\n\n; The following macro is useful for amplifying infix operators, such as\n; conditional terms or math, to make effects more obvious.\n; It duplicates \"term\" num_copies times, placing \"operator\" in-between\n; each copy.\n#chain_infix(term, operator, num_copies) {lua(\\\n  local acc = {}\\\n  for i = 1, {num_copies} do\\\n    acc[i] = [[{term}]]\\\n  end\\\n  return table.concat(acc, \" {operator} \")\\\n)}\n\n; The following macros are useful for amplifying binary functions. The first\n; chains them together in pre-order traversal, the 2nd in post-order.\n; Be aware that \"num_copies\" refers to the number of copies of the *function*\n; here - there will be one more \"term\" than that, in total.\n#binary_preorder(func, term, num_copies) {lua(\\\n  local acc = {}\\\n  for i = 1, {num_copies} do\\\n    acc[i] = [[{func}(]]\\\n  end\\\n  acc[#acc+1] = [[{term}]]\\\n  for i = 1, {num_copies} do\\\n    acc[#acc+1] = [[,{term})]]\\\n  end\\\n  return table.concat(acc)\\\n)}\n#binary_postorder(func, term, num_copies) {lua(\\\n  local acc = {}\\\n  for i = 1, {num_copies} do\\\n    acc[i] = [[{func}({term}, ]]\\\n  end\\\n  acc[#acc+1] = [[{term}]]\\\n  for i = 1, {num_copies} do\\\n    acc[#acc+1] = \")\"\\\n  end\\\n  return table.concat(acc)\\\n)}\n\n;waituntil({chain_infix(true, &&, 35)})\nwaituntil(contains({binary_preorder(concat, \"test\", 9)}, \"{dup(test, 10)}\"))\n"]]}}
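The Lua macros in benchmark_1 are pure string expansion. A rough sketch of what each one produces, written in Python for illustration (function names mirror the macros; these helpers are not part of the workspace):

```python
def dup(term, num_copies):
    # num_copies copies of term, concatenated with no separator.
    return term * num_copies

def chain_infix(term, operator, num_copies):
    # num_copies copies of term, joined by the infix operator.
    return f" {operator} ".join([term] * num_copies)

def binary_preorder(func, term, num_copies):
    # num_copies nested calls with the nesting in the first argument:
    # func(func(term,term),term) for num_copies = 2.
    return f"{func}(" * num_copies + term + f",{term})" * num_copies

def binary_postorder(func, term, num_copies):
    # num_copies nested calls with the nesting in the second argument:
    # func(term, func(term, term)) for num_copies = 2.
    return f"{func}({term}, " * num_copies + term + ")" * num_copies
```

So the commented-out `{chain_infix(true, &&, 35)}` line expands to 35 copies of `true` joined by `&&`, and `{binary_preorder(concat, "test", 9)}` builds nine nested `concat()` calls over ten copies of the term.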