Banjo Kazooie checksum routine

bryc edited this page Dec 24, 2023 · 3 revisions

Update: A complete comparison of the disassembly in all 4 games is here: https://github.com/bryc/rare-n64-chksm/wiki/Checksum-code-disassembly. This older page largely repeats information found there.


Checksum Calculation (Banjo-Kazooie)

First Loop

8025C164: Load Byte at Address: S0 (First)

    8025C164:    LBU        T8, 0x0000 (S0)   ; T8 = bytes[i], S0=address to read byte
    8025C168:    LW         T5, 0x004C (SP)   ; T5 = (mem_DWORD & 0xFFFFFFFF);
    8025C16C:    ANDI       T9, S1, 0x000F    ; T9 = S1 & 0x000F;
    8025C170:    SLLV       T0, T8, T9        ; T0 = T8 << T9;
    8025C174:    LW         T4, 0x0048 (SP)   ; T4 = (mem_DWORD >> 32);
    8025C178:    ADDU       T7, T0, T5        ; T7 = T0 + T5;
    8025C17C:    SRA        T2, T0, 0x1F      ; T2 = T0 >> 0x1F (arithmetic): sign extension of T0, 0 or 0xFFFFFFFF
    8025C180:    SLTU       AT, T7, T5        ; if(T7 < T5){AT = 1;}else{AT = 0;} (carry out of the low-word add)
    8025C184:    ADDU       T6, AT, T2        ; T6 = AT + T2;
    8025C188:    ADDU       T6, T6, T4        ; T6 = T6 + T4;
    8025C18C:    SW         T6, 0x0048 (SP)   ; high word of mem_DWORD = T6
    8025C190:    SW         T7, 0x004C (SP)   ; low word of mem_DWORD = T7, so mem_DWORD = (T6 << 32) + T7
    8025C194:    JAL        0x8025C29C        ; Call 0x8025C29C and set RA = 0x8025C19C (the instruction after the delay slot)
    8025C198:    OR         A0, S2, R0        ; Delay slot: A0 = S2 | R0 = S2 (pointer to mem_DWORD)
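
The listing above works on two 32-bit halves, but taken together it amounts to a single 64-bit addition. A minimal sketch in Python, assuming `mem_DWORD` is stored big-endian with its high word at SP + 0x48 and its low word at SP + 0x4C. The function names are hypothetical and only illustrate the arithmetic:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def accumulate_words(hi, lo, byte, counter):
    """Mirror the 32-bit instruction sequence; returns the new (hi, lo) words."""
    t0 = (byte << (counter & 0xF)) & MASK32   # ANDI + SLLV
    t7 = (t0 + lo) & MASK32                   # ADDU: low-word sum
    t2 = MASK32 if t0 & 0x80000000 else 0     # SRA by 31: sign extension of T0
    at = 1 if t7 < lo else 0                  # SLTU: carry out of the low word
    t6 = (at + t2 + hi) & MASK32              # two ADDUs: new high word
    return t6, t7

def accumulate_u64(state, byte, counter):
    """Equivalent 64-bit view: state += byte << (counter & 15)."""
    # The shifted byte never exceeds 0xFF << 15, so it is never negative
    # as a 32-bit value and the sign-extension term is always 0.
    return (state + (byte << (counter & 0xF))) & MASK64
```

Because the shifted byte is at most 0xFF << 15 = 0x7F8000, T2 is always 0 here, and the high word only changes when the low-word addition carries.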

8025C29C: Meat of the algorithm (Bit shifting, XOR etc)

    8025C29C:    LD         A3, 0x0000 (A0)   ; A3 = mem_DWORD;
    8025C2A0:    DSLL32     A2, A3, 0x1F      ; A2 = A3 << (0x1F + 32);
    8025C2A4:    DSLL       A1, A3, 0x1F      ; A1 = A3 << 0x1F;
    8025C2A8:    DSRL       A2, A2, 0x1F      ; A2 = A2 >> 0x1F;
    8025C2AC:    DSRL32     A1, A1, 0x0       ; A1 = A1 >> (0x00 + 32);
    8025C2B0:    DSLL32     A3, A3, 0xC       ; A3 = A3 << (0x0C + 32);
    8025C2B4:    OR         A2, A2, A1        ; A2 = A2 | A1;
    8025C2B8:    DSRL32     A3, A3, 0x0       ; A3 = A3 >> (0x00 + 32);
    8025C2BC:    XOR        A2, A2, A3        ; A2 = A2 ^ A3;
    8025C2C0:    DSRL       A3, A2, 0x14      ; A3 = A2 >> 0x14;
    8025C2C4:    ANDI       A3, A3, 0x0FFF    ; A3 = A3 & 0x0FFF;
    8025C2C8:    XOR        A3, A3, A2        ; A3 = A3 ^ A2;
    8025C2CC:    DSLL32     V0, A3, 0x0       ; V0 = A3 << (0x00 + 32);
    8025C2D0:    SD         A3, 0x0000 (A0)   ; mem_DWORD = A3;
    8025C2D4:    JR         RA                ; (Jump to RA 0x8025C19C)
    8025C2D8:    DSRA32     V0, V0, 0x0       ; V0 = V0 >> (0x00 + 32);
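
The shift pairs in this subroutine are easier to follow as bit-field moves on a 64-bit value. A minimal sketch in Python that follows the listing instruction by instruction; `mix` is a hypothetical name, and the return value is reduced to the low 32 bits of V0 (the register itself holds that word sign-extended):

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mix(state):
    """Return (new_state, v0) for one call to the routine at 0x8025C29C."""
    a3 = state & MASK64
    a2 = ((a3 << 63) & MASK64) >> 31   # DSLL32 31 + DSRL 31: bit 0 moves to bit 32
    a1 = ((a3 << 31) & MASK64) >> 32   # DSLL 31 + DSRL32 0: bits 1..32 move to bits 0..31
    a3 = ((a3 << 44) & MASK64) >> 32   # DSLL32 12 + DSRL32 0: bits 0..19 move to bits 12..31
    a2 = (a2 | a1) ^ a3                # OR, then XOR
    a3 = ((a2 >> 20) & 0x0FFF) ^ a2    # DSRL 20, ANDI 0x0FFF, XOR
    v0 = a3 & MASK32                   # DSLL32 0 + DSRA32 0 keeps the low word
    return a3, v0
```

For example, `mix(1)` moves bit 0 to bit 32 and XORs in `1 << 12`, giving a new state of `0x100001000` and a V0 of `0x1000`.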

8025C19C: Jump back to increase counters, then loop until S0 == S5. (First)

    8025C19C:    ADDIU      S0, S0, 0x0001      ; (S0 = S0 + 0x0001) or (i++)
    8025C1A0:    ADDIU      S1, S1, 0x0007      ; S1 = S1 + 0x0007;
    8025C1A4:    BNE        S0, S5, 0x8025C164  ; Loop until S0 == S5. OR: for as long as (S0 < S5) / (i < sizeof(bytes))
    8025C1A8:    XOR        S3, S3, V0          ; Delay slot, runs every iteration: S3 = S3 ^ V0;

Between First and Second Loop

Set up S0 and S5 for the second loop, which walks the same bytes in reverse, then continue on to the second loop.

    8025C1AC:    LW         A3, 0x0058 (SP)     ; A3 = word at SP + 0x58 (load start address?)
    8025C1B0:    ADDIU      S0, S5, 0xFFFF      ; S0 = S5 - 1  (start address = stop address minus 1)
    8025C1B4:    SLTU       AT, S0, A3          ; if (S0 < A3) { AT = 1; } else { AT = 0; }
    8025C1B8:    BNEZ       AT, 0x8025C20C      ; if (AT != 0) skip the second loop; 0x8025C20C is the instruction right after it
    8025C1BC:    ADDIU      S2, SP, 0x0048      ; Delay slot: S2 = SP + 0x48 (would already be the same as S2? S2 = S2)
    8025C1C0:    ADDIU      S5, A3, 0xFFFF      ; S5 = A3 - 1 (stop address = start address minus 1)

Second Loop

8025C1C4: Load Byte at Address: S0 (Second)

    8025C1C4:    LBU        T1, 0x0000 (S0)   ; T1 = bytes[i];
    8025C1C8:    LW         T3, 0x004C (SP)   ; T3 = (mem_DWORD & 0xFFFFFFFF);
    8025C1CC:    ANDI       T8, S1, 0x000F    ; T8 = S1 & 0x000F;
    8025C1D0:    SLLV       T9, T1, T8        ; T9 = T1 << T8;
    8025C1D4:    LW         T2, 0x0048 (SP)   ; T2 = (mem_DWORD >> 32);
    8025C1D8:    ADDU       T5, T9, T3        ; T5 = T9 + T3;
    8025C1DC:    SRA        T0, T9, 0x1F      ; T0 = T9 >> 0x1F (arithmetic): sign extension of T9, 0 or 0xFFFFFFFF
    8025C1E0:    SLTU       AT, T5, T3        ; if (T5 < T3) { AT = 1; } else { AT = 0; } (carry out of the low-word add)
    8025C1E4:    ADDU       T4, AT, T0        ; T4 = AT + T0;
    8025C1E8:    ADDU       T4, T4, T2        ; T4 = T4 + T2;
    8025C1EC:    SW         T4, 0x0048 (SP)   ; high word of mem_DWORD = T4
    8025C1F0:    SW         T5, 0x004C (SP)   ; low word of mem_DWORD = T5, so mem_DWORD = (T4 << 32) + T5
    8025C1F4:    JAL        0x8025C29C        ; Call 0x8025C29C and set RA = 0x8025C1FC (the instruction after the delay slot)
    8025C1F8:    OR         A0, S2, R0        ; Delay slot: A0 = S2 | R0 = S2 (pointer to mem_DWORD)

8025C29C: Same algorithm as before

    8025C29C:    LD         A3, 0x0000 (A0)   ; A3 = mem_DWORD;
    8025C2A0:    DSLL32     A2, A3, 0x1F      ; A2 = A3 << (0x1F + 32);
    8025C2A4:    DSLL       A1, A3, 0x1F      ; A1 = A3 << 0x1F;
    8025C2A8:    DSRL       A2, A2, 0x1F      ; A2 = A2 >> 0x1F;
    8025C2AC:    DSRL32     A1, A1, 0x0       ; A1 = A1 >> (0x00 + 32);
    8025C2B0:    DSLL32     A3, A3, 0xC       ; A3 = A3 << (0x0C + 32);
    8025C2B4:    OR         A2, A2, A1        ; A2 = A2 | A1;
    8025C2B8:    DSRL32     A3, A3, 0x0       ; A3 = A3 >> (0x00 + 32);
    8025C2BC:    XOR        A2, A2, A3        ; A2 = A2 ^ A3;
    8025C2C0:    DSRL       A3, A2, 0x14      ; A3 = A2 >> 0x14;
    8025C2C4:    ANDI       A3, A3, 0x0FFF    ; A3 = A3 & 0x0FFF;
    8025C2C8:    XOR        A3, A3, A2        ; A3 = A3 ^ A2;
    8025C2CC:    DSLL32     V0, A3, 0x0       ; V0 = A3 << (0x00 + 32);
    8025C2D0:    SD         A3, 0x0000 (A0)   ; mem_DWORD = A3;
    8025C2D4:    JR         RA                ; (Jump to RA 0x8025C1FC)
    8025C2D8:    DSRA32     V0, V0, 0x0       ; V0 = V0 >> (0x00 + 32);

8025C1FC: Jump back to increase/decrease counters, then loop until S0 == S5. (Second)

    8025C1FC:    ADDIU      S0, S0, 0xFFFF      ; S0 = S0 - 1 (the immediate 0xFFFF is sign-extended to -1), or (i--)
    8025C200:    ADDIU      S1, S1, 0x0003      ; S1 = S1 + 0x0003;
    8025C204:    BNE        S0, S5, 0x8025C1C4  ; Loop until S0 == S5. OR: for as long as (S0 > S5) / (i >= 0)
    8025C208:    XOR        S4, S4, V0          ; Delay slot, runs every iteration: S4 = S4 ^ V0;

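At this point S3 and S4 should contain the two component checksums: S3 XOR-accumulates every V0 returned during the first (forward) loop, and S4 does the same during the second (backward) loop.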
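
Putting both loops together, the whole calculation can be sketched in Python as below. This is an illustration under stated assumptions, not the game's code: the initial values of `mem_DWORD` and S1 are set before the listed instructions and are not shown on this page, so they are plain parameters here (the zero defaults are placeholders), and S3 and S4 are assumed to start at 0. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mix(state):
    """One call to 0x8025C29C: returns (new_state, low 32 bits of V0)."""
    a2 = ((state << 63) & MASK64) >> 31   # bit 0 -> bit 32
    a1 = ((state << 31) & MASK64) >> 32   # bits 1..32 -> bits 0..31
    a3 = ((state << 44) & MASK64) >> 32   # bits 0..19 -> bits 12..31
    a2 = (a2 | a1) ^ a3
    a3 = ((a2 >> 20) & 0x0FFF) ^ a2
    return a3, a3 & MASK32

def component_checksums(data, state=0, counter=0):
    """Return (S3, S4) for a byte string.

    `state` (mem_DWORD) and `counter` (S1) are assumptions: their real
    starting values are not part of the listing.
    """
    s3 = s4 = 0  # assumed to start at 0
    # First loop: forward over the bytes, S1 advances by 7.
    for b in data:
        state = (state + (b << (counter & 0xF))) & MASK64
        state, v0 = mix(state)
        counter += 7
        s3 ^= v0
    # Second loop: backward over the same bytes, S1 advances by 3.
    # S1 and mem_DWORD carry over; nothing in the listing resets them.
    for b in reversed(data):
        state = (state + (b << (counter & 0xF))) & MASK64
        state, v0 = mix(state)
        counter += 3
        s4 ^= v0
    return s3, s4
```

With the placeholder defaults, `component_checksums(b"\x01")` returns `(0x1000, 0x81080050)`. Matching a real save file requires the actual initial values.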

After Second Loop

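The routine then immediately stores both components to memory, restores the saved registers, and returns:

checksum_save:
    lw    t6,96(sp)  ; t6 = word at sp+96, used as the output pointer
    sw    s3,0(t6) ; goldeneye and banjo-tooie use
    sw    s4,4(t6) ; s3 and s4 raw as a u64 value
    lw    ra,44(sp)  ; restore return address
    lw    s5,40(sp)  ; restore saved registers s5..s0
    lw    s4,36(sp)
    lw    s3,32(sp)
    lw    s2,28(sp)
    lw    s1,24(sp)
    lw    s0,20(sp)
    jr    ra
    addiu sp,sp,88   ; delay slot: release the 88-byte (0x58) stack frame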

Checksum verification happens in a different routine elsewhere in memory, which is also where the final XOR of the two components takes place:

    8033C040:    SW    V1, 0x0000 (A1)    ; Store V1 to the address in A1 (V1 = stored checksum?)
    8033C044:    LW    T7, 0x0028 (SP)    ; T7 = word at SP + 0x28 (S3 component?)
    8033C048:    LW    T6, 0x002C (SP)    ; T6 = word at SP + 0x2C (S4 component?)
    8033C04C:    SW    RA, 0x0014 (SP)    ; Save return address to the stack
    8033C050:    XOR   T8, T6, T7         ; T8 = T6 ^ T7, the final checksum
    8033C054:    BEQ   V1, T8, 0x8033C068 ; Branch if EQual: if V1 == T8, go to 0x8033C068
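
The comparison above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes V1 holds the checksum read from the save data and that the two loaded words are the S3 and S4 components. The second helper illustrates the note in the save code that GoldenEye and Banjo-Tooie use S3 and S4 raw as a u64 value, with S3 at offset 0 as the high word (big-endian).

```python
MASK32 = 0xFFFFFFFF

def final_checksum_xor(s3, s4):
    """Banjo-Kazooie style: fold the two components into one 32-bit value."""
    return (s3 ^ s4) & MASK32

def final_checksum_u64(s3, s4):
    """Raw 64-bit form: S3 is the high word (offset 0), S4 the low word (offset 4)."""
    return ((s3 & MASK32) << 32) | (s4 & MASK32)

def checksum_matches(stored, s3, s4):
    """Mirror of the BEQ: True when the stored value equals S3 ^ S4."""
    return (stored & MASK32) == final_checksum_xor(s3, s4)
```

The XOR form discards information, since many (S3, S4) pairs fold to the same 32-bit value, whereas the raw u64 form keeps both components intact.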