2019 Toronto Thursday
Assigned tasks to April, May, June and then to Q3 and Q4 of 2019.
(jrmuizel, gw, kvark, nical)
FWIW, we should expect 75%+ of Android devices to support ES3.
No texture arrays:
- use big 4k by 4k textures
- Some very old devices are limited to 2k, but those should be vanishingly few.
- can still use the rect packer over smaller 512x512 portions of it
- if we run out of space, allocate a new texture and break batches; hopefully that's rare (see the sketch below)
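A rough sketch of the fallback (hypothetical types, not actual WebRender code): pack allocations into 512x512 slabs of a 4k by 4k texture, and only allocate another texture when the current ones run out of space.

```rust
const TEXTURE_SIZE: u32 = 4096;
const SLAB_SIZE: u32 = 512;

type TextureId = u32;

struct TextureAtlas {
    textures: Vec<TextureId>,           // one or more 4k by 4k textures
    free_slabs: Vec<(usize, u32, u32)>, // (texture index, slab x, slab y)
}

impl TextureAtlas {
    // The rect packer then sub-allocates within the returned 512x512 slab.
    fn allocate_slab(&mut self, alloc_texture: &mut impl FnMut() -> TextureId) -> (usize, u32, u32) {
        if let Some(slab) = self.free_slabs.pop() {
            return slab;
        }
        // Out of space: allocate a new texture (this is what breaks batches).
        let index = self.textures.len();
        self.textures.push(alloc_texture());
        for y in (0..TEXTURE_SIZE).step_by(SLAB_SIZE as usize) {
            for x in (0..TEXTURE_SIZE).step_by(SLAB_SIZE as usize) {
                self.free_slabs.push((index, x, y));
            }
        }
        self.free_slabs.pop().unwrap()
    }
}
```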
Around 50% of ES2 devices support half-float textures:
- texture_float
- texture_half_float
- pack data into RGBA8? (see the packing sketch below)
- move some stuff into vertex attributes
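A rough sketch of the RGBA8 packing idea (my own illustration, not actual WebRender code): store a value in [0, 1) as 16-bit fixed point split across two 8-bit channels of an RGBA8 texel, with the shader reassembling it on fetch.

```rust
// Pack a normalized value into two bytes (e.g. the R and G channels of RGBA8).
fn pack_unorm16(value: f32) -> [u8; 2] {
    let fixed = (value.clamp(0.0, 1.0) * 65535.0).round() as u16;
    [(fixed >> 8) as u8, (fixed & 0xff) as u8]
}

// The inverse, matching what the shader would do after sampling the texel.
fn unpack_unorm16(bytes: [u8; 2]) -> f32 {
    let fixed = ((bytes[0] as u16) << 8) | bytes[1] as u16;
    fixed as f32 / 65535.0
}
```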
Vertex texel fetch:
- Mali-450 (an Amazon device?) doesn't support vertex texture units.
- instead of using instanced attributes, copy them over per vertex (see the sketch below)
- roughly 4x memory/bandwidth requirements for vertex buffers, could be improved
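A rough sketch of the per-vertex copy (hypothetical Instance type): duplicate each instance record for every vertex of the quad it drives, which is where the roughly 4x vertex memory/bandwidth comes from.

```rust
#[derive(Clone, Copy)]
struct Instance {
    rect: [f32; 4],
    data_index: u32,
}

fn expand_instances(instances: &[Instance]) -> Vec<Instance> {
    let mut vertices = Vec::with_capacity(instances.len() * 4);
    for instance in instances {
        // One copy per corner of the quad; the vertex shader picks the corner
        // from gl_VertexID or a separate corner attribute.
        for _corner in 0..4 {
            vertices.push(*instance);
        }
    }
    vertices
}
```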
(kvark, gw)
Idea: hide interning semantics as an implementation detail. Expose the API as device.createSomeObject() -> objectHandle. If the implementation decides to return an existing handle, the client (Gecko in our case) doesn't care. Could be done gradually, object by object (sketch below).
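A rough sketch of the idea (hypothetical types and names, not the actual WebRender interner): createSomeObject() always hands back a handle, and silently reuses an existing handle when an equal object was interned before.

```rust
use std::collections::HashMap;

// Hypothetical object description; equality/hashing drives the deduplication.
#[derive(Clone, PartialEq, Eq, Hash)]
struct SomeObject {
    key: String,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ObjectHandle(u32);

#[derive(Default)]
struct Device {
    interned: HashMap<SomeObject, ObjectHandle>,
    objects: Vec<SomeObject>,
}

impl Device {
    fn create_some_object(&mut self, object: SomeObject) -> ObjectHandle {
        if let Some(&handle) = self.interned.get(&object) {
            // Deduplicated: the client can't tell this handle is an old one.
            return handle;
        }
        let handle = ObjectHandle(self.objects.len() as u32);
        self.objects.push(object.clone());
        self.interned.insert(object, handle);
        handle
    }
}
```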
(nical, kvark)
Go from low/high priority scene builds to a scene builder (SB) thread per document. Essentially doing the same thing as now, just more granular and explicit.
Still needs a way to synchronize scene builds for both documents when resizing the Gecko window.
Situation: we don't get more than 2 frames ahead.
Problem: if we fire 2 frames in a row, there won't be enough time for the 3rd frame to make it through the pipeline. Big stall.
Option: don't limit the pipeline to 2 frames
- coalesce display lists on the WR side when needed instead of throttling (see the sketch below)
- conflicts with WebGL requirements
- can still throttle in Gecko, just to a higher number of pipeline stages
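A rough sketch of the coalescing idea (hypothetical types, not actual WebRender code): when the pipeline is full, fold a new display-list transaction into the pending one for the same pipeline instead of blocking the caller.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct PipelineId(u32);

struct DisplayListTxn {
    epoch: u32,
    payload: Vec<u8>,
}

#[derive(Default)]
struct SceneQueue {
    pending: HashMap<PipelineId, DisplayListTxn>,
}

impl SceneQueue {
    fn submit(&mut self, pipeline: PipelineId, txn: DisplayListTxn) {
        // Coalesce: a newer display list for the same pipeline replaces the
        // older, not-yet-built one rather than queueing behind it.
        self.pending.insert(pipeline, txn);
    }
}
```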
Q: why do we even have a renderer thread?
- we go through compositor because of language barrier (impl detail)
- no real reason, just convenient to implement
Idea: don't go through the RB (render backend) when asking for a scene build:
- texture cache isn't needed
- fonts can be shared
Tasks:
- serialize DL creation with the end of scene building
- remove the RB visit
The RB needs to be vsync-synchronized, because it uses the results of input sampling.
WebGL:
- the fewer frames in flight, the better for latency
- not very clean, has half a frame in flight
- transaction = drawn frame + fence
- we only pass the transaction along once the fence is reached (see the sketch below)
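A rough sketch of the transaction handoff (hypothetical `Gl` wrapper over glFenceSync / glClientWaitSync, not the actual WebGL integration): a transaction carries the drawn frame plus a fence, and is only forwarded once the fence has signaled.

```rust
struct Fence(u64); // opaque handle from the GL wrapper

trait Gl {
    // Wraps glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0).
    fn fence_sync(&self) -> Fence;
    // Wraps glClientWaitSync; returns true if the fence has signaled.
    fn client_wait_sync(&self, fence: &Fence, timeout_ns: u64) -> bool;
}

struct WebGlTransaction {
    frame_texture: u32, // the drawn WebGL frame
    fence: Fence,
}

// Returns the transaction back if the GPU hasn't finished the frame yet.
fn try_forward<G: Gl>(
    gl: &G,
    txn: WebGlTransaction,
    forward: impl FnOnce(u32),
) -> Option<WebGlTransaction> {
    if gl.client_wait_sync(&txn.fence, 0) {
        // GPU finished the WebGL frame: safe to hand it to the compositor.
        forward(txn.frame_texture);
        None
    } else {
        // Not ready: keep at most half a frame in flight and retry later.
        Some(txn)
    }
}
```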
Idea: the best way to budget frames and pipelining is to have heuristics that predict frame timing consistency.
- but we don't really want to rely on heuristics; the web is too complex
- but we already have a heuristic to estimate the time from input sampling to VSync...
Q: how do we reproduce the scheduling problems in general?
Time is only sampled at the start of the compositor. So by the time inputs are sampled, we live in the past.
Chrome approach: DL building starts at -1 vsync, rendering starts 5ms before the vsync. Current WR approach: DL building starts at -2 vsync, rendering starts at -1 vsync.
Note: Chrome has less latency but not necessarily higher throughput. The goal is to make the input latency stable (not necessarily constant).
Idea: both of these periods before vsync-0 are not related to vsyncs, strictly speaking. We need some heuristics to know when to start that work so that the GPU finishes before the target vsync. We need to:
- detach them from the refresh driver, at first have them fixed to current numbers
- start making the heuristics more flexible, based on the previous frames (see the sketch below)
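A rough sketch of such a heuristic (my own illustration, numbers are arbitrary): track an exponential moving average of recent frame costs and start the work that far, plus a safety margin, before the target vsync.

```rust
struct FrameScheduler {
    avg_frame_cost_ms: f32, // EMA of DL build + scene build + render time
}

impl FrameScheduler {
    fn record_frame(&mut self, observed_cost_ms: f32) {
        // Blend in the latest observation; 0.2 is an arbitrary smoothing factor.
        self.avg_frame_cost_ms = 0.8 * self.avg_frame_cost_ms + 0.2 * observed_cost_ms;
    }

    /// How many milliseconds before the target vsync to start building.
    fn start_offset_ms(&self, margin_ms: f32) -> f32 {
        self.avg_frame_cost_ms + margin_ms
    }
}
```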
Problem: we only know how to wake up threads on the refresh driver at the moment
- solution doesn't have to be exact: an error within 1ms is still acceptable
- need to look up the way Chrome does it
(mstange, jrmuizel, gw, kvark)
An example Intel-based MacBook has:
- 720k of L2
- 8M of L3
The total byte size of the screen buffer is 20M; it doesn't fit into the L3 cache, so we spend a lot of time waiting on RAM. Solution:
- draw to tiles instead of blitting from the full screen into tiles
- either blit or direct-composite the tiles to the screen
- don't wait for a picture to repeat itself for a few frames; always take the tiling code path
Q: What is the best tile size?
- having a tile fit in 256K keeps us fully within the L2 cache and has some benefits
- current tiles are 4x bigger (1024x256) and still fit in the L3 cache; we can make them 2-4x bigger if we want to
- small tiles cause a lot of batches (see the size math below)
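A rough size check (assuming 4 bytes per pixel and, as an example, a 2880x1800 Retina framebuffer) against the L2/L3 sizes above:

```rust
const BYTES_PER_PIXEL: usize = 4;

fn buffer_bytes(width: usize, height: usize) -> usize {
    width * height * BYTES_PER_PIXEL
}

fn main() {
    // ~20.7MB: far larger than the 8M of L3, so the full framebuffer thrashes it.
    println!("full screen: {} bytes", buffer_bytes(2880, 1800));
    // 1MB: the current 1024x256 tile is too big for L2 but fine for L3.
    println!("1024x256 tile: {} bytes", buffer_bytes(1024, 256));
    // 256KB: a 256x256 tile fits comfortably in the 720K of L2.
    println!("256x256 tile: {} bytes", buffer_bytes(256, 256));
}
```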
Q: why does drawing many instances of full-tile blends not scale linearly in GPU time?
- there is a fixed cost to load the initial framebuffer color and to write it back out at the end
- just like with tilers on mobile!