Apple GCR disk encoding

Thomas Harte edited this page May 11, 2018 · 8 revisions

Physical Encoding

Apple's GCR encoding was designed in reaction to FM encoding, and uses the same data density and bit clock. However, in its more efficient '6 and 2' form, it uses only two-thirds as much disk surface area as FM to encode the same data. It predates MFM encoding and is not as efficient.

Two forms of Apple GCR were used during the Apple II's lifetime: '5 and 3' and '6 and 2'. Each name refers to the number of bits of information obtained from every eight flux transition windows on a disk. '5 and 3' is therefore less efficient than '6 and 2' because only five bits of information are decoded from every eight flux windows, rather than six.

GCR disks have a third encoding of data, '4 and 4', which is very similar to FM. It is used only for sector metadata, not for sector contents themselves.

Track Layout

Apple's track layout derives from that of FM data: for each sector there is a header, then a gap, then the data, then another gap.

The gaps contain synchronisation information, allowing the controller to align its read window with the on-disk data.

A header consists of the sector's track and sector number, the disk's volume identifier, and a check value. Each of those is one byte long. The check value is a simple exclusive OR of the other values.

The data consists of 256 bytes of information plus a check byte. As with the header, the check byte is a simple exclusive OR of the other values.

Flux window content rules

The Disk II utilises an 8-bit lsb-to-msb shift register. It shifts at the on-disk data rate. It will shift in 1s wherever a flux transition is found on the disk and 0s where the flux transition is absent.

If the msb of the register is 0, it will shift immediately upon detecting the flux transition. If the msb is 1, it will pause slightly before shifting in the next bit.

This is to allow adherence to a rule that for the purpose of synchronisation, encoded bytes will always have the msb set. The archetypal polling loop to obtain the next byte from the Disk II is:

.loop    LDA shift_register
         BPL .loop

A further constraint is imposed by the analogue-to-digital conversion that looks for flux transitions; its automatic gain control is prone to amplifying noise into signal if more than two consecutive flux windows pass without a transition in them.

Applying those two constraints (the msb set, and no more than two consecutive zeros) motivates '6 and 2' encoding: the number of bytes with that property is between 64 and 128, making six the largest whole number of bits that each on-disk byte can encode.

Steve Wozniak, who designed the '6 and 2' encoding, had previously been under the impression that the rule was that there could be no more than a single consecutive zero bit; the less-efficient '5 and 3' encoding is the result of conforming to that stricter constraint.
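The counting argument can be checked by brute force. The following is a minimal sketch, assuming the two rules exactly as stated above; the function names are hypothetical and it considers each byte in isolation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `byte` has its msb set and contains no run of zero bits
   longer than `max_zero_run`. */
static bool is_valid_disk_byte(uint8_t byte, int max_zero_run) {
    if (!(byte & 0x80)) return false;

    int run = 0;
    for (int bit = 7; bit >= 0; --bit) {
        if (byte & (1 << bit)) {
            run = 0;
        } else if (++run > max_zero_run) {
            return false;
        }
    }
    return true;
}

/* Counts how many of the 256 possible bytes satisfy both rules. */
static int count_valid_disk_bytes(int max_zero_run) {
    int count = 0;
    for (int value = 0; value < 256; ++value) {
        if (is_valid_disk_byte((uint8_t)value, max_zero_run)) ++count;
    }
    return count;
}
```

Under these assumptions count_valid_disk_bytes(2) gives 81, enough for the 64 values needed by six bits, while count_valid_disk_bytes(1) gives 34, enough only for the 32 values needed by five bits.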

Sync words

As above, sync words lie in the gaps between sectors and between the header and data parts of sectors.

A sync word is simply an encoded 0xff byte followed by as many zeroes as the content rules will allow: a single zero for '5 and 3', or two zeroes for '6 and 2'.

Given the top-bit-set rule, a series of sync words has the effect of bringing a CPU polling loop as above into phase with the start of each sync word.
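The self-synchronising effect can be seen in a small simulation. This is only a sketch under simplifying assumptions: the register is modelled as emptying the moment its msb becomes set, rather than pausing as the real hardware does, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Feeds `bit_count` flux windows (one array element each, 0 or 1) through a
   simplified shift register: each bit enters at the lsb, and once the msb is
   set the byte is handed over and the register starts again from empty.
   Writes at most `capacity` bytes to `out`; returns the number written. */
static size_t read_disk_bytes(const uint8_t *bits, size_t bit_count,
                              uint8_t *out, size_t capacity) {
    uint8_t shift_register = 0;
    size_t produced = 0;

    for (size_t i = 0; i < bit_count && produced < capacity; ++i) {
        shift_register = (uint8_t)((shift_register << 1) | (bits[i] & 1));
        if (shift_register & 0x80) {
            out[produced++] = shift_register;
            shift_register = 0;  /* zeros that follow leave it empty */
        }
    }
    return produced;
}

/* Appends the eight bits of `byte`, msb first, then `trailing_zeros` zero
   bits; returns the new length of `bits`. */
static size_t append_byte(uint8_t *bits, size_t length, uint8_t byte,
                          int trailing_zeros) {
    for (int bit = 7; bit >= 0; --bit) bits[length++] = (byte >> bit) & 1;
    for (int i = 0; i < trailing_zeros; ++i) bits[length++] = 0;
    return length;
}
```

Building a stream of several '6 and 2' sync words followed by 0xd5, and starting the reader at any bit offset within the first sync word, yields a few misframed bytes before the reader falls into step and returns 0xd5 intact.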

'4 and 4' Encoding

Sector header content is encoded in '4 and 4' form regardless of the encoding in use for sector contents.

'4 and 4' encoding encodes the source byte b, with bits b7, b6, b5 ... b0 as the two on-disk bytes:

1 b7 1 b5 1 b3 1 b1
1 b6 1 b4 1 b2 1 b0

This is equivalent to FM encoding other than in bit order. The bits are ordered differently to allow for efficient decoding:

(((1 b7 1 b5 1 b3 1 b1) << 1) | 1) & (1 b6 1 b4 1 b2 1 b0) = original byte
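As a minimal sketch of the same idea in C (the function names are hypothetical), encoding ORs the source byte and its right-shifted copy with 0xaa, and decoding applies the shift, OR and AND shown above:

```c
#include <stdint.h>

/* Splits `value` into its two '4 and 4' disk bytes: odd-numbered bits in
   out[0], even-numbered bits in out[1], with 1s in every other position. */
static void four_and_four_encode(uint8_t value, uint8_t out[2]) {
    out[0] = (uint8_t)((value >> 1) | 0xaa);
    out[1] = (uint8_t)(value | 0xaa);
}

/* Recombines the two disk bytes into the original value. */
static uint8_t four_and_four_decode(const uint8_t in[2]) {
    return (uint8_t)(((in[0] << 1) | 1) & in[1]);
}
```

For example, the default volume 254 (0xfe) encodes to the pair 0xff, 0xfe.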

A complete sector header is formed on disk as:

three bytes prologue: 0xd5, 0xaa, 0x96
two bytes: '4 and 4' encoded volume
two bytes: '4 and 4' encoded track
two bytes: '4 and 4' encoded sector
two bytes: '4 and 4' encoded check value — the exclusive OR of (volume, track, sector)
three bytes epilogue: 0xde, 0xaa, 0xeb
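The layout above could be assembled as follows. This is a sketch only; the names build_sector_header, put_four_and_four and HEADER_LENGTH are hypothetical, and the prologue is the one listed above.

```c
#include <stddef.h>
#include <stdint.h>

enum { HEADER_LENGTH = 14 };  /* 3 prologue + 4 * 2 fields + 3 epilogue */

/* Writes the '4 and 4' form of `value` at out[at]; returns the next index. */
static size_t put_four_and_four(uint8_t *out, size_t at, uint8_t value) {
    out[at++] = (uint8_t)((value >> 1) | 0xaa);
    out[at++] = (uint8_t)(value | 0xaa);
    return at;
}

/* Fills `out` with a complete sector header for the given fields. */
static void build_sector_header(uint8_t volume, uint8_t track, uint8_t sector,
                                uint8_t out[HEADER_LENGTH]) {
    size_t at = 0;

    out[at++] = 0xd5; out[at++] = 0xaa; out[at++] = 0x96;  /* prologue */

    at = put_four_and_four(out, at, volume);
    at = put_four_and_four(out, at, track);
    at = put_four_and_four(out, at, sector);
    at = put_four_and_four(out, at, (uint8_t)(volume ^ track ^ sector));

    out[at++] = 0xde; out[at++] = 0xaa; out[at++] = 0xeb;  /* epilogue */
}
```

With volume 254, track 0 and sector 0 this produces d5 aa 96, ff fe, aa aa, aa aa, ff fe, de aa eb.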

The track value counts upwards from zero, starting at the outermost track. The sector value counts upwards from zero, starting at the first sector on a track.

The volume value has at least two context-dependent meanings. In both of Apple's operating systems it defaults to 254 for Disk II-compatible media, and in ProDOS it is used to confirm the volume type. Some software prefers to use it as a volume number, for distinguishing different disks or sides of a disk. It should be written as 254 unless there is a reason to do otherwise.

In principle the entire disk contents could have been encoded in '4 and 4' form, to give exactly the same data density as FM encoding. In practice the more-efficient '5 and 3' encoding was the first to be deployed.

'6 and 2' Encoding

Apple's second deployed sector data encoding fits six data bits into every on-disk byte.

In overview:

  • the two lowest bits are taken from each of the 256 source bytes;
  • the remaining six bits for each of the source bytes fill the final 256 on-disk bytes of the sector;
  • 86 bytes before those 256 contain the separated low bits, so the total data size is 256+86=342 bytes;
  • an exclusive OR checksum is used, but to reduce decoding time it is applied within the six-bit data rather than as a completely orthogonal field, as described below; and
  • a three-byte prologue and a three-byte epilogue are applied.

The first 84 bytes after the prologue are consistently formed as:

byte n = {
    bits 4 & 5: low two bits of source byte n + 172, reversed
    bits 2 & 3: low two bits of source byte n + 86, reversed
    bits 0 & 1: low two bits of source byte n, reversed
}

n+172 would be out of bounds for the 85th and 86th bytes, so they are formed as:

byte n = {
    bits 2 & 3: low two bits of source byte n + 86, reversed
    bits 0 & 1: low two bits of source byte n, reversed
}

From the 87th byte onwards, i.e. from byte 86 when counting from zero as above, the content is then:

byte n + 86 = high six bits of source byte n
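The split into six-bit values might be sketched like this, counting on-disk values from zero so that the low-bit values occupy indices 0 to 85 and the high-six-bit values occupy indices 86 to 341. The function and constant names are hypothetical.

```c
#include <stdint.h>

enum { LOW_BIT_BYTES = 86, SECTOR_SIZE = 256, SIX_BIT_COUNT = 342 };

/* Returns the low two bits of `value` with their order swapped. */
static uint8_t reversed_low_bits(uint8_t value) {
    return (uint8_t)(((value & 1) << 1) | ((value >> 1) & 1));
}

/* Converts 256 source bytes into 342 six-bit values, before any
   checksumming or 6-to-8 mapping is applied. */
static void split_six_and_two(const uint8_t source[SECTOR_SIZE],
                              uint8_t values[SIX_BIT_COUNT]) {
    for (int n = 0; n < LOW_BIT_BYTES; ++n) {
        uint8_t packed = reversed_low_bits(source[n]);
        packed |= (uint8_t)(reversed_low_bits(source[n + 86]) << 2);
        if (n + 172 < SECTOR_SIZE) {  /* false only for n = 84 and n = 85 */
            packed |= (uint8_t)(reversed_low_bits(source[n + 172]) << 4);
        }
        values[n] = packed;
    }

    for (int n = 0; n < SECTOR_SIZE; ++n) {
        values[LOW_BIT_BYTES + n] = source[n] >> 2;
    }
}
```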

For checksumming purposes, the 342nd value is duplicated to create a 343rd, which is stored unmodified as the check value. Each of the first 342 six-bit values is then exclusive ORed with the original value one before it; the first value is unaltered. This allows the decoder to keep a running exclusive OR tally from the start of the data to the end: the tally is seeded with the decoded value in byte 0, and each later byte n is recovered by exclusive ORing the received value into the tally and storing the result.

As long as the final byte decoded matches the one stored on disk as the 343rd, the exclusive OR test passes.
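Both directions of that scheme can be sketched in a few lines. This assumes the 343rd value is stored as an unmodified copy of the 342nd, which is what makes the final comparison work; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum { VALUE_COUNT = 342 };

/* Encoder: out[0] is values[0], each later out[n] is values[n] XOR
   values[n - 1], and out[342] is a plain copy of values[341]. */
static void xor_chain_encode(const uint8_t values[VALUE_COUNT],
                             uint8_t out[VALUE_COUNT + 1]) {
    uint8_t previous = 0;
    for (int n = 0; n < VALUE_COUNT; ++n) {
        out[n] = (uint8_t)(values[n] ^ previous);
        previous = values[n];
    }
    out[VALUE_COUNT] = previous;
}

/* Decoder: folds each received value into a running tally, which is the
   true value at that position. Returns true if the final tally matches
   the stored check value. */
static bool xor_chain_decode(const uint8_t in[VALUE_COUNT + 1],
                             uint8_t values[VALUE_COUNT]) {
    uint8_t tally = 0;
    for (int n = 0; n < VALUE_COUNT; ++n) {
        tally ^= in[n];
        values[n] = tally;
    }
    return tally == in[VALUE_COUNT];
}
```

Because every decoded value depends on all received values before it, a single corrupted value changes the final tally and the comparison fails.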

For writing to disk each six-bit value is mapped to an eight-bit value using the table:

#include <stdint.h>

/* Indexed by six-bit value; each entry is the byte written to disk. */
const uint8_t six_and_two_mapping[] = {
	0x96, 0x97, 0x9a, 0x9b, 0x9d, 0x9e, 0x9f, 0xa6,
	0xa7, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb2, 0xb3,
	0xb4, 0xb5, 0xb6, 0xb7, 0xb9, 0xba, 0xbb, 0xbc,
	0xbd, 0xbe, 0xbf, 0xcb, 0xcd, 0xce, 0xcf, 0xd3,
	0xd6, 0xd7, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde,
	0xdf, 0xe5, 0xe6, 0xe7, 0xe9, 0xea, 0xeb, 0xec,
	0xed, 0xee, 0xef, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6,
	0xf7, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

Six-bit value n maps to the 8-bit value six_and_two_mapping[n].
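A decoder needs the reverse lookup. One way to obtain it, sketched here with a hypothetical function name and with 0xff chosen arbitrarily as the marker for bytes that are not in the table, is to invert the table once at start-up:

```c
#include <stdint.h>

static const uint8_t six_and_two_mapping[64] = {
    0x96, 0x97, 0x9a, 0x9b, 0x9d, 0x9e, 0x9f, 0xa6,
    0xa7, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb2, 0xb3,
    0xb4, 0xb5, 0xb6, 0xb7, 0xb9, 0xba, 0xbb, 0xbc,
    0xbd, 0xbe, 0xbf, 0xcb, 0xcd, 0xce, 0xcf, 0xd3,
    0xd6, 0xd7, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde,
    0xdf, 0xe5, 0xe6, 0xe7, 0xe9, 0xea, 0xeb, 0xec,
    0xed, 0xee, 0xef, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6,
    0xf7, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

/* Fills `inverse` so that inverse[disk_byte] is the six-bit value it
   represents, or 0xff (an arbitrary marker) if the byte is not in the
   table. */
static void build_six_and_two_inverse(uint8_t inverse[256]) {
    for (int i = 0; i < 256; ++i) inverse[i] = 0xff;
    for (int n = 0; n < 64; ++n) inverse[six_and_two_mapping[n]] = (uint8_t)n;
}
```

Note that 0xd5 and 0xaa do not appear in the table, so under this sketch they decode to the marker value.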

The prologue for sector data is 0xd5, 0xaa, 0xad; the epilogue is 0xde, 0xaa, 0xeb, which is the same epilogue as for sector headers.

So the complete sector data is formed on disk as:

prologue 0xd5, 0xaa, 0xad
[ XOR section:
    86 bytes containing combinations of the low two bits of source bytes
    256 bytes containing the high six bits of source bytes
], all encoded via the 6-to-8 table
the XOR check value, which is the same as the high six bits of the final source byte, 6-to-8 encoded
epilogue 0xde, 0xaa, 0xeb
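Putting the pieces together, a complete data field could be produced as below. This is a sketch under the layout described above, not a reference implementation; build_data_field, gcr_table and swap_low_bits are hypothetical names, and the table contents are copied from the mapping table above.

```c
#include <stddef.h>
#include <stdint.h>

enum { DATA_FIELD_LENGTH = 3 + 342 + 1 + 3 };

static const uint8_t gcr_table[64] = {
    0x96, 0x97, 0x9a, 0x9b, 0x9d, 0x9e, 0x9f, 0xa6,
    0xa7, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb2, 0xb3,
    0xb4, 0xb5, 0xb6, 0xb7, 0xb9, 0xba, 0xbb, 0xbc,
    0xbd, 0xbe, 0xbf, 0xcb, 0xcd, 0xce, 0xcf, 0xd3,
    0xd6, 0xd7, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde,
    0xdf, 0xe5, 0xe6, 0xe7, 0xe9, 0xea, 0xeb, 0xec,
    0xed, 0xee, 0xef, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6,
    0xf7, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};

/* Returns the low two bits of `value` with their order swapped. */
static uint8_t swap_low_bits(uint8_t value) {
    return (uint8_t)(((value & 1) << 1) | ((value >> 1) & 1));
}

/* Encodes 256 source bytes into a complete on-disk data field. */
static void build_data_field(const uint8_t source[256],
                             uint8_t out[DATA_FIELD_LENGTH]) {
    uint8_t values[342];

    /* 86 values holding the low two bits, then 256 holding the high six. */
    for (int n = 0; n < 86; ++n) {
        uint8_t packed = swap_low_bits(source[n]);
        packed |= (uint8_t)(swap_low_bits(source[n + 86]) << 2);
        if (n + 172 < 256) {
            packed |= (uint8_t)(swap_low_bits(source[n + 172]) << 4);
        }
        values[n] = packed;
    }
    for (int n = 0; n < 256; ++n) values[86 + n] = source[n] >> 2;

    size_t at = 0;
    out[at++] = 0xd5; out[at++] = 0xaa; out[at++] = 0xad;  /* prologue */

    /* XOR each value with its predecessor, then map six bits to eight. */
    uint8_t previous = 0;
    for (int n = 0; n < 342; ++n) {
        out[at++] = gcr_table[values[n] ^ previous];
        previous = values[n];
    }
    out[at++] = gcr_table[previous];                       /* check value */

    out[at++] = 0xde; out[at++] = 0xaa; out[at++] = 0xeb;  /* epilogue */
}
```

The result is 349 bytes long: three of prologue, 343 table-encoded values and three of epilogue.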

Sector interleaving and file formats

Apple's '5 and 3' operating systems physically interleave sectors on the disk surface. The '6 and 2' operating systems do not: a raw reading of the disk surface would show sector 0, followed by sector 1, followed by sector 2, etc. Instead, they apply an internal remapping of DOS logical sectors to on-disk physical sectors.

The most common type of Apple disk image (variously DSK, DO, PO and other extensions) is a sector-contents-only dump, containing the original sectors in logical order. To map them back to real media, a program must produce sectors in ordinary ascending order but pick sector contents from non-sequential parts of the file. For a DOS 3.3 image, for example, the first on-disk sector, labelled as sector 0, should contain the first sector from the file, but the second on-disk sector, labelled as sector 1, should contain the 8th sector from the file.

Therefore the proper physical interpretation of that sort of disk image is tightly coupled to the internals of the specific software that created it.

Specifically, on-disk sectors 0 to 15 of a DOS 3.3 image (ordinarily having the extension DSK or DO) should contain the contents of the image sectors at offsets: 0, 7, 14, 6, 13, 5, 12, 4, 11, 3, 10, 2, 9, 1, 8, 15; i.e. increase by 7 at each step, and take the result modulo 15 if it is out of bounds. The on-disk sectors 0 to 15 of a ProDOS image (ordinarily PO) should contain the sectors at offsets 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15; i.e. increase by 8 at each step, and take the result modulo 15 if it is out of bounds.
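Both sequences can be computed rather than tabulated. A minimal sketch, with hypothetical function names, treating sector 15 as the one value that is left alone because it is still in bounds:

```c
/* Returns the index within a DOS 3.3-order image (DSK or DO) of the
   contents for on-disk sector `physical` (0 to 15). */
static int dos33_image_sector(int physical) {
    return physical == 15 ? 15 : (physical * 7) % 15;
}

/* Returns the index within a ProDOS-order image (PO) of the contents
   for on-disk sector `physical` (0 to 15). */
static int prodos_image_sector(int physical) {
    return physical == 15 ? 15 : (physical * 8) % 15;
}
```

For example, dos33_image_sector(1) is 7, matching the earlier statement that on-disk sector 1 takes the 8th sector from the file.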