Nicolas Cannasse edited this page Feb 7, 2020 · 8 revisions

It is possible to profile HL bytecode applications in order to get accurate CPU measurements.

Sampling Profiler

The HashLink profiler is a sampling profiler. It runs as a separate thread that captures the call stack of the other VM threads N times per second. With a large N (10000, for example), this gives you a precision of 0.1 ms for a given measurement. If you measure a running program over a long period of time, for example by capturing the frames of a game, selecting several frames and looking at the average measurements will further improve the precision.
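
The relationship between N and the precision is plain arithmetic, sketched below in Haxe. The names `Sampling`, `intervalMs` and `estimateMs` are illustrative assumptions and not part of the HashLink API, and the time estimate assumes the samples are evenly spaced.

```haxe
class Sampling {
	// Time between two consecutive samples, in milliseconds.
	public static function intervalMs(samplesPerSecond:Int):Float {
		return 1000 / samplesPerSecond;
	}

	// Rough time attributed to a function that appears in `hits` samples.
	public static function estimateMs(hits:Int, samplesPerSecond:Int):Float {
		return hits * 1000 / samplesPerSecond;
	}

	static function main() {
		trace(intervalMs(10000)); // sampling interval for --profile 10000
		trace(estimateMs(250, 10000)); // a function seen in 250 samples
	}
}
```

The more samples a function collects, the more reliable its estimate, which is why averaging over several frames helps.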

The best part about a sampling profiler is that it does not instrument the running code but only samples it from another thread, so the original code runs at the same speed as it usually does, with no need to compile a slower debug build. This also means the profiler measures your application's speed exactly as it will run on any computer.

Running the Profiler

In order to start a profiling session, you can use the --profile command line parameter:

hl --profile 10000 myapp.hl <app args>

The application will start profiling immediately. Once it terminates, a hlprofile.dump binary file will be created that contains all the profiling information for the session.

If you are using the Visual Studio Code HashLink Debugger, you can also set "profileSamples" : 10000 in your launch.json options (requires HL Debugger 0.9.0+).

Displaying results

The resulting hlprofile.dump cannot be used directly: it first needs to be analysed and converted into JSON format. To do this, compile the ProfileGen Haxe project and run it with hl profiler.hl </path/to/hlprofile.dump>. It will transform the binary dump file into a JSON file that can be displayed. By default it overwrites hlprofile.dump, but you can add the -out profile.json parameter to write the result to another file.

Once you have converted the hlprofile.dump to JSON, you can open it using Chrome Profiler.

  • open Google Chrome
  • open the developer tools
  • navigate to the Performance tab
  • click on the up arrow (Load Profile) and select your converted file (hlprofile.dump, or the file you passed to -out)

This will give you the following result:

image

From here, you can analyse the results, select a specific frame range, and see which functions run and where the bottlenecks are, for optimization purposes.

Profiler API

You can use the hl.Profile.event API to send a profiler event from your application at runtime. These events are inserted into the binary dump and can be processed by ProfileGen later, allowing you to create fully customized reports.

For example:

hl.Profile.event(44); // insert event 44
hl.Profile.event(55,"My label"); // insert event with string data
hl.Profile.eventBytes(66, myBytesData); // insert event with bytes data

While the profiler supports profiling several threads, the hl.Profile.event API is not thread safe, so use it with care.
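
One way to stay cautious is to send all events from a single thread. If that is not practical, a minimal sketch is to route every call through a lock, as below. `SafeProfile` is a hypothetical helper, not part of HashLink, and it only serializes calls made through it; whether that is enough depends on the profiler implementation, so treat it as an assumption to verify.

```haxe
import sys.thread.Mutex;

// Hypothetical helper (not part of HashLink): serializes calls to
// hl.Profile.event so that two threads never emit an event at the same time.
class SafeProfile {
	static var lock = new Mutex();

	public static function event(code:Int, ?data:String):Void {
		lock.acquire();
		hl.Profile.event(code, data);
		lock.release();
	}
}
```

Application code would then call SafeProfile.event(44) instead of hl.Profile.event(44).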

Also, the following special event codes are already implemented:

  • -1 : pause the profiler for the current thread
  • -2 : resume the profiler for the current thread
  • -3 : clear all accumulated profile data
  • -4 : pause the profiler for all threads
  • -5 : resume the profiler for all threads
  • -6 : force generation of hlprofile.dump - don't wait for program exit
  • -7 : set up and start the profiler; you can pass an optional samplesCount (default: 1000)
  • 0 : insert end-of-frame, for games and other realtime applications
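
As an illustration of how these codes might be combined, here is a minimal sketch of a game-style main loop. It assumes the program is started with hl --profile; the constant names, `loadAssets`, `update` and the frame count are placeholders introduced for this example and are not part of the HashLink API.

```haxe
class Main {
	// Names chosen for this example; the values are the special codes above.
	static inline var END_OF_FRAME = 0;
	static inline var PAUSE_ALL = -4;
	static inline var RESUME_ALL = -5;
	static inline var FORCE_DUMP = -6;

	// Placeholder for startup work we do not want in the profile.
	static function loadAssets():Void {
		var sum = 0;
		for (i in 0...1000) sum += i;
	}

	// Placeholder for the per-frame work we do want to measure.
	static function update():Void {
		var acc = 0.0;
		for (i in 0...10000) acc += Math.sqrt(i);
	}

	static function main() {
		hl.Profile.event(PAUSE_ALL); // skip sampling while loading
		loadAssets();
		hl.Profile.event(RESUME_ALL); // sample again from here on

		for (frame in 0...600) {
			update();
			hl.Profile.event(END_OF_FRAME); // mark the frame boundary
		}

		hl.Profile.event(FORCE_DUMP); // write hlprofile.dump without waiting for exit
	}
}
```

Marking each frame boundary is what makes it possible to select a specific frame range when displaying the results.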

Requirements

The profiler was introduced in HL 1.1 and is only implemented for Windows so far. See src/profile.c if you want to contribute support for other platforms.
