Author: MooingLemur, based on documentation written by JeffreyH
This is preliminary documentation and the specification can still change at any point.
This is a reference for the VERA FX features. It is meant to be a complement to the tutorial, currently found here.
The FX Update mainly adds "helpers" inside VERA that can be used by the CPU. There is no "magic button" that lets you do 3D graphics, for example. It mainly helps with certain CPU-intensive tasks, most notably the ones found in the (deep) inner loop of a game or graphics engine. The FX Update therefore does not fundamentally change the architecture or nature of VERA; it extends and improves it.
In other words: the CPU is still the orchestrator of everything that is done, but it is relieved of certain operations that it is not (very) good at or does not have direct access to.
The FX Update extends addressing modes; it does not add or extend renderers.
VERA is mapped as 32 8-bit registers in the memory space of the Commander X16, starting at address $9F20 and ending at $9F3F. Many of these are (fully) used, but some bits remain unused. The DCSEL field in register $9F25 (also called CTRL) has been extended to 6 bits to allow the 4 registers $9F29-$9F2C to take on additional meanings.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F25 | CTRL | Reset | DCSEL | | | | | | ADDRSEL |
The FX features use DCSEL values 2, 3, 4, 5, and 6. This effectively gives FX 20 8-bit registers. Note that 15 of these registers are write-only, 2 of them are read-only, and 3 are both readable and writable.
Important: unless DCSEL values of 2-6 are used, the behavior of VERA is exactly the same as it was before the FX update. This ensures that the FX update is backwards compatible with traditional non-FX uses of VERA.
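As a small illustration of the CTRL layout above, the following Python sketch packs a DCSEL value and the ADDRSEL bit into the byte that would be written to $9F25. The helper name `ctrl_value` is hypothetical and exists only for this example; the Reset bit is left at 0.

```python
def ctrl_value(dcsel, addrsel=0):
    """Pack DCSEL (bits 6:1) and ADDRSEL (bit 0) into a CTRL byte.

    Hypothetical helper for illustration; Reset (bit 7) is left clear.
    """
    if not 0 <= dcsel <= 63:
        raise ValueError("DCSEL is a 6-bit field")
    if addrsel not in (0, 1):
        raise ValueError("ADDRSEL is a single bit")
    return (dcsel << 1) | addrsel


# CTRL values that select each FX register bank with ADDRSEL=0.
for dcsel in range(2, 7):
    print(f"DCSEL={dcsel}: write ${ctrl_value(dcsel):02X} to $9F25")
```

This matches the `lda #(2 << 1)` pattern used in the assembly snippets later in this document.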
When DCSEL=2, the main FX configuration register becomes available (FX_CTRL/$9F29), which is both readable and writable. The 2 lower bits are the Addr1 Mode bits, which change how and when ADDR1 is updated. This puts the FX helpers in a certain "role".
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CTRL (DCSEL=2) | Transp. Writes | Cache Write Enable | Cache Fill Enable | One-byte Cache Cycling | 16-bit Hop | 4-bit Mode | Addr1 Mode | |
Addr1 Mode | Description |
---|---|
0 | Traditional VERA behavior |
1 | Line draw helper |
2 | Polygon filler helper |
3 | Affine helper |
By default, Addr1 Mode is set to 0 (=00b), which is the normal and already-known behavior of `ADDR1`.
When Addr1 Mode is set to 1 (=01b) the line draw helper is enabled.
- Set `ADDR1` to the address of the starting pixel
- Determine the octant (see below) you are going to draw in, which will inform your `ADDR0` and `ADDR1` increments.
- Set `ADDR1` increment in the direction you will always increment each step
  - For 8-bit mode: (+1, -1, -320, or +320)
  - For 4-bit mode: (-0.5, +0.5, -160, or +160)
- Set `ADDR0` increment in the direction you will sometimes increment. Even though this is the increment for `ADDR0`, we are using it in line draw mode as an incrementer for `ADDR1`.
  - For 8-bit mode: (+1, -1, -320, or +320).
  - For 4-bit mode: (-0.5, +0.5, -160, or +160)
  - For 4-bit mode, the half increments are set via the Nibble Increment bit and optionally the DECR bit in `ADDRx_H`. For the Nibble Increment bit to have effect, the main Address Increment must be set to 0, and the 4-bit Mode bit must be set in FX_CTRL ($9F29, DCSEL=2).
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F22 | ADDRx_H (x=ADDRSEL) | Address Increment | | | | DECR | Nibble Increment | Nibble Address | VRAM Address (16) |
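To make the ADDRx_H layout concrete, here is a hedged Python sketch that builds the byte for a given step. The function name `addr_h` is hypothetical. The mapping from the 4-bit Address Increment field to step sizes is taken from the base VERA register reference rather than from this section, so treat it as an assumption to verify there.

```python
# Step sizes selected by the 4-bit Address Increment field.
# Assumption: this table comes from the base VERA reference, not this section.
INCREMENT_STEPS = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 40, 80, 160, 320, 640]


def addr_h(step, addr_bit16=0, nibble_addr=0):
    """Build an ADDRx_H byte for a step such as +1, -320, +0.5 or -160."""
    decr = 1 if step < 0 else 0
    magnitude = abs(step)
    if magnitude == 0.5:
        # Half-byte steps need Address Increment = 0 plus the Nibble Increment bit
        # (and 4-bit Mode set in FX_CTRL, which this helper does not touch).
        incr_index, nibble_incr = 0, 1
    else:
        if magnitude != int(magnitude) or int(magnitude) not in INCREMENT_STEPS:
            raise ValueError(f"unsupported step: {step}")
        incr_index, nibble_incr = INCREMENT_STEPS.index(int(magnitude)), 0
    return ((incr_index << 4) | (decr << 3) | (nibble_incr << 2)
            | (nibble_addr << 1) | addr_bit16)


for step in (1, -1, 320, -320, 0.5, -0.5, 160, -160):
    print(f"step {step:+}: ADDRx_H = ${addr_h(step):02X}")
```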
Octant | 8-bit ADDR1 increment | 8-bit ADDR0 increment | 4-bit ADDR1 increment | 4-bit ADDR0 increment |
---|---|---|---|---|
0 | +1 | -320 | +0.5 | -160 |
1 | -320 | +1 | -160 | +0.5 |
2 | -320 | -1 | -160 | -0.5 |
3 | -1 | -320 | -0.5 | -160 |
4 | -1 | +320 | -0.5 | +160 |
5 | +320 | -1 | +160 | -0.5 |
6 | +320 | +1 | +160 | +0.5 |
7 | +1 | +320 | +0.5 | +160 |
- Set your slope into the two "X Increment" registers (DCSEL=3, see below). Note that increment registers are 15-bit signed fixed-point numbers, and for this mode, the range should be 0.0 to 1.0 inclusive, so you'll either want to store the value of 1, or you'll want to set only the fractional part.
Note: Of the two incrementers, the line draw helper uses only the X incrementer. However, depending on the octant you are drawing in, this incrementer represents either x or y pixel increments. So the "X" should not be taken literally here; it just means the first of the two incrementers.
- As a side effect of being in line draw mode, setting `FX_X_INCR_H` ($9F2A, DCSEL=3) automatically sets the fractional part (the lower 9 bits) of X Position to half a pixel. Furthermore, the lowest bit of the pixel position (which acts as an overflow bit) is set to 0 as well. This effectively sets the starting X position to 0.5 (the center) of a pixel.
Note: There is no need to set the higher bits of the X position, since the FX X position (accumulator) is only used to track the fractional (subpixel) part of the line draw.
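The octant table and the slope rule above can be sketched in Python as follows. This is only an illustration for a 320-pixel-wide screen: the function name `line_draw_setup` is hypothetical, the octant is derived from the sign and relative size of dx and dy as implied by the table, and the handling of ties (exact diagonals, horizontal and vertical lines) is an arbitrary choice made for this example.

```python
# (ADDR1 increment, ADDR0 increment) per octant in 8-bit mode, from the table above.
OCTANT_INCREMENTS = [
    (+1, -320), (-320, +1), (-320, -1), (-1, -320),
    (-1, +320), (+320, -1), (+320, +1), (+1, +320),
]

# Keyed by (x is the major axis, moving right, moving down).
OCTANT_BY_DIRECTION = {
    (True, True, False): 0, (False, True, False): 1,
    (False, False, False): 2, (True, False, False): 3,
    (True, False, True): 4, (False, False, True): 5,
    (False, True, True): 6, (True, True, True): 7,
}


def line_draw_setup(x0, y0, x1, y1, four_bit=False):
    """Return octant, increments and slope for a line (y grows downwards)."""
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        raise ValueError("start and end point are identical")
    x_major = abs(dx) >= abs(dy)          # tie-breaking is an assumption
    octant = OCTANT_BY_DIRECTION[(x_major, dx >= 0, dy > 0)]
    addr1_incr, addr0_incr = OCTANT_INCREMENTS[octant]
    if four_bit:
        addr1_incr, addr0_incr = addr1_incr / 2, addr0_incr / 2
    major, minor = max(abs(dx), abs(dy)), min(abs(dx), abs(dy))
    return {
        "octant": octant,
        "addr1_incr": addr1_incr,
        "addr0_incr": addr0_incr,
        # Slope in 1/512 units (9 fractional bits); always between 0 and 512.
        "slope_fixed": round(minor * 512 / major),
        "steps": major,
    }


print(line_draw_setup(10, 20, 30, 10))
```

The `slope_fixed` value is the 0.0 to 1.0 slope scaled by 512; packing it into the two X Increment registers follows the increment register layout shown further below.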
When Addr1 Mode is set to 2 (=10b) the polygon filler helper is enabled.
Assuming a 320 pixel-wide screen:
- Set `ADDR0` to the address of the y-position of the top point of the triangle and x=0 (so on the left of the screen). Set its increment to +320 (for 8-bit mode) or +160 (for 4-bit mode).
  - Note: `ADDR0` is used as "base address" for calculating `ADDR1` for each horizontal line of the triangle. `ADDR0` should therefore start at the top of the triangle and increment exactly one line each time.
  - There is no need to set `ADDR1`. This is done by VERA.
- Calculate your slopes (dx/dy) for both the left and right point. Unlike the line draw helper, these slopes can be negative and can exceed 1.0. They are not dependent on octant, but cover the whole 180 degrees downwards. Below is an illustration of some (not-to-scale) examples of increments:
- Set `ADDR1` increment to +1 (for 8-bit mode) or +0.5 (for 4-bit mode)
  - (`ADDR1` increment can also be +4 if you use 32-bit cache writes, explained later)
- Set your left slope into the two "X increment" registers and your right slope into the two "Y increment" registers (DCSEL=3, see below).
  - Important: They should be set to half the increment (or decrement) per horizontal line! This is because the polygon filler increments in two steps per line.
  - Note that increment registers are 15-bit signed fixed-point numbers:
    - 6 bits for the integer pixel increment
    - 9 bits for the fractional (subpixel) increment
    - 1 additional bit that indicates the actual value should be multiplied by 32
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_X_INCR_L (DCSEL=3) (Write only) | X Increment (-2:-9) (signed) | | | | | | | |
$9F2A | FX_X_INCR_H (DCSEL=3) (Write only) | X Incr. 32x | X Increment (5:0) (signed) | | | | | | X Incr. (-1) |
$9F2B | FX_Y_INCR_L (DCSEL=3) (Write only) | Y/X2 Increment (-2:-9) (signed) | | | | | | | |
$9F2C | FX_Y_INCR_H (DCSEL=3) (Write only) | Y/X2 Incr. 32x | Y/X2 Increment (5:0) (signed) | | | | | | Y/X2 Incr. (-1) |
- Because we are in "polygon fill" mode, setting the high bits of the "X increment" ($9F2A, DCSEL=3) automatically sets the fractional part of the "X position" (the lower 9 bits of the position in DCSEL=4 and DCSEL=5) to half a pixel. The same goes for the high bits of the Y/X2 increment ($9F2C, DCSEL=3) and the Y/X2 position.
- Set the "X position" and "Y/X2 position" to the x-pixel-position of the top triangle point.
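The increment format described above (6 integer bits, 9 fractional bits, plus the 32x bit) can be sketched in Python like this. The helper name `encode_increment` is hypothetical, and the choice to fall back to the 32x bit only when the value does not fit otherwise is a design choice of this example, not something the specification prescribes.

```python
def encode_increment(value):
    """Return (low_byte, high_byte) for an FX_*_INCR_L / FX_*_INCR_H pair."""
    fraction_scale = 1 << 9        # 9 fractional (subpixel) bits
    limit = 1 << 14                # 15-bit signed range: -16384..16383 raw units
    times_32 = 0
    raw = round(value * fraction_scale)
    if not -limit <= raw < limit:
        # Does not fit in 6 integer bits: store value/32 and set the 32x bit.
        times_32 = 1
        raw = round(value / 32 * fraction_scale)
        if not -limit <= raw < limit:
            raise ValueError("increment out of range even with the 32x bit")
    raw &= 0x7FFF                  # 15-bit two's complement
    low = raw & 0xFF               # increment bits (-2:-9)
    high = (times_32 << 7) | (raw >> 8)   # 32x bit, bits (5:0), bit (-1)
    return low, high


# Polygon filler: a left edge moving 1.5 pixels to the left per line is stored
# as half that amount, because the filler steps twice per line.
left_slope_per_line = -1.5
low, high = encode_increment(left_slope_per_line / 2)
print(f"FX_X_INCR_L = ${low:02X}, FX_X_INCR_H = ${high:02X}")
```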
Steps that are needed for filling a triangle part with lines:
- Read from `DATA1`
  - This will not return any useful data but will do two things in the background:
    - Increment/decrement the X1 and X2 positions by their corresponding increment values.
    - Set `ADDR1` to `ADDR0` + X1
- Then read the "Fill length (low)" register. Its output depends on whether you're in 4-bit or 8-bit mode.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F2B | FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=0) (Read only) | Fill Len >= 16 | X Position (1:0) | | Fill Len (3:0) | | | | 0 |
$9F2B | FX_POLY_FILL_L (DCSEL=5, 4-bit Mode=1, 2-bit Polygon=0) (Read only) | Fill Len >= 8 | X Position (1:0) | | X Pos. (2) | Fill Len (2:0) | | | 0 |

Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F2C | FX_POLY_FILL_H (DCSEL=5) (Read only) | Fill Len (9:3) | | | | | | | 0 |
- Together they give you 10 bits of fill length (ignore the other bits for now). Since `ADDR1` is already set properly, you can immediately start drawing this number of pixels (given by Fill Len): `sta DATA1 ; as many times as Fill Len states`
- Then read from `DATA0`: this will (also) increment X1 and X2
- Check if all lines of this triangle part have been drawn; if not, go to the first step.
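As a rough model of how the two fill length registers combine, the following Python sketch decodes the 10-bit fill length from the byte values read at $9F2B and $9F2C, following only the bit layouts in the tables above. The function name `fill_length` is hypothetical, and the X Position bits are simply ignored here.

```python
def fill_length(poly_fill_l, poly_fill_h, four_bit=False):
    """Decode Fill Len from FX_POLY_FILL_L / FX_POLY_FILL_H register values."""
    # 8-bit mode exposes Fill Len (3:0) in bits 4:1; 4-bit mode exposes (2:0) in bits 3:1.
    low_bits = 3 if four_bit else 4
    length_low = (poly_fill_l >> 1) & ((1 << low_bits) - 1)
    if not poly_fill_l & 0x80:
        # Bit 7 clear: the length fits entirely in the low register.
        return length_low
    # Bit 7 set: take Fill Len (9:3) from the high register, (2:0) from the low one.
    return ((poly_fill_h >> 1) << 3) | (length_low & 0x07)


print(fill_length(0x0A, 0x00))   # short span, low register only
print(fill_length(0x88, 0x18))   # longer span, both registers needed
```

On real hardware the point of bit 7 is that short spans can be handled after a single register read; this sketch takes both values as arguments only to keep the example simple.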
There is also a 2-bit polygon mode, which is better explained in the tutorial.
When Addr1 Mode is set to 3 (=11b) the affine (transformation) helper is enabled.
When reading from ADDR1 in this mode, the affine helper reads tile data from a special tile area defined by two new FX registers:
- FX_TILEBASE points to a set of 8x8 tiles in either 4-bit or 8-bit depth. FX can support up to 256 tile definitions, and this tile area can overlap the traditional layer tile bases.
- FX_MAPBASE points to a square-shaped tile map, one byte per tile. This tile map has no attribute bytes, unlike the traditional layer 0/1 tile maps.
- Affine Clip Enable changes the behavior when the X/Y positions are outside of the tile map such that it always reads data from tile 0. The default behavior is to wrap the X/Y position to the opposite side of the map.
- Map Size is a 2-bit value that affects both the width and height of the tile map.
Map Size | Dimensions |
---|---|
0 | 2×2 |
1 | 8×8 |
2 | 32×32 |
3 | 128×128 |
- The Transparent Writes toggle in FX_CTRL is especially useful in Affine helper mode. Setting this toggle causes a write of zero to leave the byte (or the nibble) at the target address intact. This toggle is not limited to affine helper mode, and it affects writes to both DATA0 and DATA1.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CTRL (DCSEL=2) | Transp. Writes | Cache Write Enable | Cache Fill Enable | One-byte Cache Cycling | 16-bit Hop | 4-bit Mode | Addr1 Mode | |
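The Map Size table and the wrap versus clip behavior described above can be modelled with a few lines of Python. This is a simplified sketch working in whole tile coordinates: the function names are hypothetical, and the row-major layout of the tile map (one row after another) is an assumption, since the text only states that the map is square with one byte per tile.

```python
def map_dimension(map_size):
    """Width and height in tiles for a Map Size value of 0..3."""
    if map_size not in (0, 1, 2, 3):
        raise ValueError("Map Size is a 2-bit value")
    return 2 << (2 * map_size)       # 2, 8, 32, 128


def lookup_tile(tile_map, map_size, tile_x, tile_y, clip=False):
    """Return the tile number used for a (possibly out-of-range) tile position."""
    dim = map_dimension(map_size)
    inside = 0 <= tile_x < dim and 0 <= tile_y < dim
    if not inside:
        if clip:
            return 0                 # Affine Clip Enable: always tile 0
        tile_x %= dim                # default: wrap to the opposite side
        tile_y %= dim
    return tile_map[tile_y * dim + tile_x]   # row-major layout is an assumption


tile_map = list(range(64))           # an 8x8 map (Map Size 1) with distinct tiles
print(lookup_tile(tile_map, 1, 9, 0))             # wraps to column 1
print(lookup_tile(tile_map, 1, 9, 0, clip=True))  # clipped to tile 0
```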
When using the affine helper, the X and Y position registers (DCSEL=4) are used to set ADDR1 to the source pixel indirectly in the aforementioned tile map, while the X and Y increments determine the step after each read of ADDR1.
The affine helper supports the full range of X and Y increment values, including negative values.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_X_INCR_L (DCSEL=3) (Write only) | X Increment (-2:-9) (signed) | | | | | | | |
$9F2A | FX_X_INCR_H (DCSEL=3) (Write only) | X Incr. 32x | X Increment (5:0) (signed) | | | | | | X Incr. (-1) |
$9F2B | FX_Y_INCR_L (DCSEL=3) (Write only) | Y/X2 Increment (-2:-9) (signed) | | | | | | | |
$9F2C | FX_Y_INCR_H (DCSEL=3) (Write only) | Y/X2 Incr. 32x | Y/X2 Increment (5:0) (signed) | | | | | | Y/X2 Incr. (-1) |
When the CPU reads a byte via DATA0 or DATA1, and "cache fill enable" is set, the value read will be copied into an indexed location inside the 32-bit cache.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CTRL (DCSEL=2) | Transp. Writes | Cache Write Enable | Cache Fill Enable | One-byte Cache Cycling | 16-bit Hop | 4-bit Mode | Addr1 Mode | |
In 8-bit mode, a byte is cached, but in 4-bit mode, a nibble is cached instead. Afterwards, by default, the index into the cache is incremented, and loops back around to 0 after the last index. The index can be set explicitly via the FX_MULT register. 8-bit mode uses bits 3:2 and ranges from 0-3. 4-bit mode uses bits 3:1 and ranges from 0-7.
Alternatively, the cache index can cycle between two adjacent bytes: 0, 1, and back to 0; or 2, 3, and back to 2. This option only has effect in 8-bit mode.
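The index behavior described in the last two paragraphs can be summarised in a short Python model. The function and parameter names are hypothetical; in particular `two_byte_cycling` just stands for the alternative mode described above.

```python
def next_cache_index(index, four_bit=False, two_byte_cycling=False):
    """Return the cache index that follows `index` after one cache fill."""
    if four_bit:
        return (index + 1) % 8       # eight nibbles; two-byte cycling has no effect
    if two_byte_cycling:
        return index ^ 1             # 0 <-> 1, or 2 <-> 3
    return (index + 1) % 4           # four bytes


def index_sequence(start, count, **mode):
    """List `count` successive indexes, beginning with `start`."""
    sequence, index = [], start
    for _ in range(count):
        sequence.append(index)
        index = next_cache_index(index, **mode)
    return sequence


print(index_sequence(0, 6))                          # 8-bit mode
print(index_sequence(2, 6, two_byte_cycling=True))   # cycling between bytes 2 and 3
```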
Instead of filling the cache by reading from DATA0 or DATA1, the cache data can also be set directly by writing to the FX_CACHE* registers. Setting the cache directly does not affect the cache index.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CACHE_L (DCSEL=6) (Write only) | Cache (7:0) / Multiplicand (7:0) (signed) | | | | | | | |
$9F2A | FX_CACHE_M (DCSEL=6) (Write only) | Cache (15:8) / Multiplicand (15:8) (signed) | | | | | | | |
$9F2B | FX_CACHE_H (DCSEL=6) (Write only) | Cache (23:16) / Multiplier (7:0) (signed) | | | | | | | |
$9F2C | FX_CACHE_U (DCSEL=6) (Write only) | Cache (31:24) / Multiplier (15:8) (signed) | | | | | | | |
If "Cache Write Enable" is set, the cache contents are written to VRAM when writing to DATA0 or DATA1. The primary use is to write all or part of the 32-bit cache to the 4-byte-aligned region of memory at the current address.
Which parts are written is controlled by the value written to DATA0 or DATA1. The value written is treated as a nibble mask where a 0-bit writes the data and a 1-bit masks the data from being written. In other words, writing a 0 will flush the entire 32-bit cache. Writing #%00001111 will write the third and fourth byte in the cache to VRAM in the third and fourth memory locations in the 4-byte-aligned region.
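A byte-granular Python model of the masked cache write may help here. It is a sketch under a stated assumption: mask bits 1:0 are taken to cover the first byte of the 4-byte-aligned region, bits 3:2 the second, and so on. Masks that cover only one nibble of a byte are deliberately not modelled, because the order of the two nibbles within a bit pair is not specified in this section. The function name `cache_write` is hypothetical.

```python
def cache_write(vram, cache, mask):
    """Return the 4-byte-aligned region after a masked 32-bit cache write.

    vram  : the 4 bytes currently in the aligned region
    cache : the 4 cache bytes, lowest byte first
    mask  : the value written to DATA0/DATA1 (0-bit = write, 1-bit = keep)
    """
    result = list(vram)
    for byte_index in range(4):
        # Assumption: bit pair (2n+1:2n) of the mask belongs to byte n.
        pair = (mask >> (2 * byte_index)) & 0b11
        if pair == 0b00:
            result[byte_index] = cache[byte_index]
        elif pair != 0b11:
            raise ValueError("half-masked byte: nibble writes are not modelled here")
    return result


cache = [0x11, 0x22, 0x33, 0x44]
print(cache_write([0, 0, 0, 0], cache, 0b00000000))   # full flush
print(cache_write([0, 0, 0, 0], cache, 0b11110000))   # first two bytes only
```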
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CTRL (DCSEL=2) | Transp. Writes | Cache Write Enable | Cache Fill Enable | One-byte Cache Cycling | 16-bit Hop | 4-bit Mode | Addr1 Mode | |
Transparent writes, when enabled, also apply to cache writes. If enabled, zero bytes (or zero nibbles in 4-bit mode) in the cache, which are treated as transparent pixels, are not written.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CTRL (DCSEL=2) | Transp. Writes | Cache Write Enable | Cache Fill Enable | One-byte Cache Cycling | 16-bit Hop | 4-bit Mode | Addr1 Mode | |
When "one-byte cache cycling" is turned on and DATA0 or DATA1 is written to, the byte at the current cache index is written to VRAM. When "Cache write enable" is set as well, the byte is duplicated 4 times when writing to VRAM.
Usually the incrementing of the cache index is only triggered by reading from DATA0 or DATA1 when cache filling is enabled. However it can also be triggered by reading from DATA0 in polygon mode when cache filling is not enabled and "one-byte cache cycling" is enabled.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CTRL (DCSEL=2) | Transp. Writes | Cache Write Enable | Cache Fill Enable | One-byte Cache Cycling | 16-bit Hop | 4-bit Mode | Addr1 Mode | |
The 32-bit cache also doubles as an input to the hardware multiplier when Multiplier Enable is set.
To do a single multiplication, put the two 16-bit inputs into the two halves of the 32-bit cache.
lda #(2 << 1)
sta VERA_CTRL ; $9F25
stz VERA_FX_CTRL ; $9F29 (mainly to reset Addr1 Mode to 0)
lda #%00010000
sta VERA_FX_MULT ; $9F2C
lda #(6 << 1)
sta VERA_CTRL ; $9F25
lda #<69
sta VERA_FX_CACHE_L ; $9F29
lda #>69
sta VERA_FX_CACHE_M ; $9F2A
lda #<420
sta VERA_FX_CACHE_H ; $9F2B
lda #>420
sta VERA_FX_CACHE_U ; $9F2C
The accumulator can be used to accumulate the sum of several multiplications. Before doing this single multiplication, ensure that the accumulator is reset to zero; otherwise its value will be added to the output before the output is written. There are two methods to do this. The first is to write a 1 into bit 7 of FX_MULT ($9F2C, DCSEL=2). The other, more convenient, method is to read FX_ACCUM_RESET (the same register location as VERA_FX_CACHE_L).
lda FX_ACCUM_RESET ; $9F29 (DCSEL=6)
To obtain the result of the multiplication, it must first be written to VRAM. This is done via the cache write mechanism. Normally the cache itself is written to VRAM if "Cache Write Enable" is set. However, if the "Multiplier Enable" bit is also set, the multiplier result is written to VRAM instead.
; Set the ADDR0 pointer to $00000 and write our multiplication result there
lda #(2 << 1)
sta VERA_CTRL ; $9F25
lda #%01000000 ; Cache Write Enable
sta VERA_FX_CTRL ; $9F29
stz VERA_ADDRx_L ; $9F20 (ADDR0)
stz VERA_ADDRx_M ; $9F21
stz VERA_ADDRx_H ; $9F22 ; no increment
stz VERA_DATA0 ; $9F23 ; multiply and write out result
lda #%00010000 ; Increment 1
sta VERA_ADDRx_H ; $9F22 ; so we can read out the result
lda VERA_DATA0
sta $0400
lda VERA_DATA0
sta $0401
lda VERA_DATA0
sta $0402
lda VERA_DATA0
sta $0403
Note: the VERA works by pre-fetching the contents from VRAM whenever the address pointer is changed or incremented. This happens even when the address increment is 0. Due to this behavior, it is possible to have stale data latched in one of the two data ports if the underlying VRAM is changed via the other data port. This example avoids this scenario by only using ADDR0/DATA0. This potential gotcha was not introduced by the FX update, but rather has always been how VERA behaves.
One can also trigger the multiplication and add it to (or subtract it from) the multiplication accumulator by triggering "accumulate" in one of two ways. We could write a 1 into bit 6 of FX_MULT ($9F2C, DCSEL=2), but more conveniently, we can read FX_ACCUM (the same register location as VERA_FX_CACHE_M).
lda FX_ACCUM ; $9F2A (DCSEL=6)
Once the accumulation is triggered, the result of the operation is stored back into the accumulator.
The default accumulation operation is (multiply then) add. This can be switched to subtraction by setting the Subtract Enable bit in FX_MULT.
If the multiplication accumulator has a nonzero value, any multiplications carried out via a VRAM Cache write will be offset by the value of the accumulator (either added to or subtracted from the accumulator), but they will not change the value of the accumulator.
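The interplay between multiplying, accumulating and writing out can be summarised with a small Python model. It is a sketch, not a description of the hardware: the class and method names are hypothetical, and the 32-bit wraparound of the accumulator and the low-byte-first order of the written result are assumptions made for this example.

```python
def to_signed16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


class FxMultiplier:
    """Toy model of the multiplier and its accumulator."""

    def __init__(self):
        self.accumulator = 0

    def _combine(self, multiplicand, multiplier, subtract):
        product = to_signed16(multiplicand) * to_signed16(multiplier)
        total = self.accumulator - product if subtract else self.accumulator + product
        return total & 0xFFFFFFFF            # assumption: 32-bit wraparound

    def reset(self):
        self.accumulator = 0                 # like reading FX_ACCUM_RESET

    def accumulate(self, multiplicand, multiplier, subtract=False):
        # Like reading FX_ACCUM: the result is stored back into the accumulator.
        self.accumulator = self._combine(multiplicand, multiplier, subtract)

    def write_result(self, multiplicand, multiplier, subtract=False):
        # Like a cache write with Multiplier Enable set: the result is offset by
        # the accumulator, but the accumulator itself is left unchanged.
        value = self._combine(multiplicand, multiplier, subtract)
        return [(value >> shift) & 0xFF for shift in (0, 8, 16, 24)]


fx = FxMultiplier()
print([f"${b:02X}" for b in fx.write_result(69, 420)])   # 69 * 420 = 28980 = $7134
```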
There is a special address increment mode that can be used to read pairs of bytes via ADDR1.
Addr | Name | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
---|---|---|---|---|---|---|---|---|---|
$9F29 | FX_CTRL (DCSEL=2) | Transp. Writes | Cache Write Enable | Cache Fill Enable | One-byte Cache Cycling | 16-bit Hop | 4-bit Mode | Addr1 Mode | |
In this mode, setting ADDR1's increment to +4 will result in alternating increments of +1 and +3. Setting it to +320 will result in alternating increments of +1 and +319. All other increment values, including negative increments, lack this special hop property.
After this bit is set, writing to ADDRx_L resets the hop alignment such that the first increment is +1.
This mode is useful for reading out a series of 16-bit values after a series of multiplications.
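To visualise the hop pattern, this Python sketch lists the addresses that successive reads would visit with 16-bit Hop enabled. The function name `hop_addresses` is hypothetical, and the sequence assumes the hop alignment has just been reset by a write to ADDRx_L, so the first increment is +1.

```python
def hop_addresses(start, increment, count):
    """Return the first `count` addresses visited with 16-bit Hop enabled."""
    if increment not in (4, 320):
        raise ValueError("only increments of +4 and +320 have the hop property")
    addresses = []
    address = start
    for step in range(count):
        addresses.append(address)
        # Alternate between +1 and the remainder of the configured increment.
        address += 1 if step % 2 == 0 else increment - 1
    return addresses


print(hop_addresses(0, 4, 6))      # pairs of bytes, 4 bytes apart
print(hop_addresses(0, 320, 4))    # pairs of bytes, one 320-byte line apart
```

With an increment of +4, each pair of reads returns the low two bytes of one 4-byte-aligned result, which is why this mode suits reading back a series of 16-bit values.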
For a more detailed explanation of chained math operations, see the tutorial.