# -------------------------------------------------------------------
# GPU/DSP                       (c) Copyright 1995 KKP & Nat!
# -------------------------------------------------------------------
# These are some of the results/guesses that Klaus and I (Nat!) found
# out about the Jaguar. Since we are not under NDA or anything from
# Atari we feel free to give this to you for educational purposes
# only.
#
# Please note that this is not official documentation from Atari
# or derived work thereof (neither of us has ever seen the Atari docs)
# and Atari isn't connected with this in any way.
#
# Please use this informationphile as a starting point for your own
# exploration and not as a reference. If you find anything inaccurate,
# missing, needing more explanation etc. by all means please write
# to us:
#     nat@zumdick.rhein-main.de
# or
#     kkp@gamma.dou.dk
#
# If you could do us a small favor, don't use this information for
# those lame flamewars on r.g.v.a or the mailing list.
#
# HTML soon ?
# -------------------------------------------------------------------
# $Id: risc_doc.txt,v 1.2 1995/12/08 18:48:09 nat Exp $
# -------------------------------------------------------------------

1 RISCy Business

The RISCs have 2 register banks of 32 registers each: the Current
and the Alternative register bank. Register R31 is the stack pointer,
and R0 is normally initialized to 0 (Zero).

G_FLAGS and D_FLAGS are the status registers. The lowest bits (below
bit 3) contain the Carry, Zero and Minus flags (I THINK).

Bit 3  IMASK (Interrupt disable?)
Bit 5  Interrupt level 1 enable
Bit 10 Interrupt level 1 pending/clear? (the I2S handler sets it)
Bit 14 Register bank selection

Switching between the register banks is done like this:

        movei   #G_FLAGS,r1       ; GPU status flags
      or
        movei   #D_FLAGS,r1       ; DSP status flags
        load    (r1),r0
        bset    #14,r0
        store   r0,(r1)           ; Switch the GPU/DSP to bank 1
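The bank switch above is a plain read-modify-write of bit 14 in the
flags register. This C sketch mirrors it on an ordinary variable. The
macro and function names are hypothetical, and the bit meanings are
only the guesses listed above.

```c
#include <stdint.h>

/* Hypothetical names for the G_FLAGS/D_FLAGS bits guessed above. */
#define FLAG_IMASK    (1u << 3)   /* interrupt disable?               */
#define FLAG_INT1_ENA (1u << 5)   /* interrupt level 1 enable         */
#define FLAG_INT1_CLR (1u << 10)  /* interrupt level 1 pending/clear? */
#define FLAG_REGPAGE  (1u << 14)  /* register bank selection          */

/* Same effect as: load (r1),r0 / bset #14,r0 / store r0,(r1) */
static uint32_t select_bank1(uint32_t flags)
{
    return flags | FLAG_REGPAGE;
}

/* Presumably bclr #14,r0 instead selects bank 0 again. */
static uint32_t select_bank0(uint32_t flags)
{
    return flags & ~FLAG_REGPAGE;
}
```

All other bits (for example an enabled interrupt in bit 5) pass through
unchanged, which is why the listing loads the old value first instead
of just storing a constant.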

The RISCs use register R31 as a stack pointer. It only seems to be
used by interrupts. See the section on interrupts.


1.0 Move instructions

       move    Rn,Rm
       move    PC,Rn
       movei   #xxxxxxxx,Rn

       load    (Rn),Rn
       load    (Rm+n),Rn    * Rm = R14 | R15 !
       load    (Rm+Ri),Rn   * Rm = R14 | R15 !
       loadb   (Rn),Rn      * load byte
       loadw   (Rn),Rn      * Load word
       loadp   (Rn),Rn      * Load Phrase (GPU only)

       store   Rn,(Rn)
       store   Rn,(Rm+n)    * Rm = R14 | R15 !
       store   Rn,(Rm+Ri)   * Rm = R14 | R15 !
       storeb  Rn,(Rn)      * Store Byte
       storew  Rn,(Rn)      * Store Word
       storep  Rn,(Rn)      * Store Phrase (GPU only)

       moveta  Rn,Rn        * move to alternative register bank
       movefa  Rn,Rn        * move from alternative register bank

1.1 Logical Instructions

       or      Rn,Rn
       xor     Rn,Rn
       and     Rn,Rn

1.2 Bit Operation Instructions

       bset    #xx,Rn
       bclr    #xx,Rn
       btst    #xx,Rn

1.3 Shift Instructions

       shlq    #xx,Rn
       shrq    #xx,Rn

       sharq   #xx,Rn

       ror     Rn,Rn
       rorq    #xx,Rn


1.4 Arith. Instructions

       mult    Rn,Rn
       imult   Rn,Rn
       mmult   Rn,Rn
       imultn  Rn,Rn
       imacn   Rn,Rn
       resmac  Rn

       div     Rn,Rn          * exec seems to use max 4 i-cycles

       add     Rn,Rn
       addc    Rn,Rn          * add with carry
       addq    #xx,Rn
       addqt   #xx,Rn         * add quick, flags not affected (transparent)
       addqmod #xx,Rn         * add quick, take modulo

       sub     Rn,Rn
       subc    Rn,Rn          * subtract with carry
       subq    #xx,Rn
       subqt   #xx,Rn         * sub quick, flags not affected (transparent)
       subqmod #xx,Rn         * sub quick, take modulo

       cmp     Rn,Rn
       cmpq    #xx,Rn

       neg     Rn
       not     Rn
       abs     Rn


1.5 Program Structure Instructions

       jump     CC,(Rn)
       jump     (Rn)

       jr       CC,xxxxxx
       jr       xxxxxxx

       nop


1.6 Condition Codes

Condition codes CC can be any of

    CC (%00100) CS (%01000) EQ (%00010) MI (%11000)
    NE (%00001) PL (%10100) HI (%00101)  T (%00000).

They are used together with the jump and jr instructions.
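The listed values fit a simple pattern: bit 0 requires Z clear, bit 1
requires Z set, bit 2 requires the selected flag clear, bit 3 requires
it set, and bit 4 selects N instead of C as that flag. Every requested
condition must hold, and %00000 requests nothing, so T is always true.
The C sketch below evaluates a condition vector under that assumption;
the scheme is inferred from the values above, not from Atari docs, and
condition_true is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed meaning of the 5 condition bits (inferred from the list above). */
static bool condition_true(uint8_t cc, bool z, bool c, bool n)
{
    bool flag = (cc & 0x10) ? n : c;         /* bit 4: test N instead of C */

    if ((cc & 0x01) && z)     return false;  /* bit 0: need Z == 0   */
    if ((cc & 0x02) && !z)    return false;  /* bit 1: need Z == 1   */
    if ((cc & 0x04) && flag)  return false;  /* bit 2: need C/N == 0 */
    if ((cc & 0x08) && !flag) return false;  /* bit 3: need C/N == 1 */
    return true;                             /* %00000 (T): always   */
}
```

For example HI (%00101) combines bit 0 and bit 2, so it is taken only
when both Z and C are clear.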


2.0 Restrictions


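  These pairs are not allowed, i.e. the second instruction must not
  directly follow the jump (presumably because the instruction after a
  jump or jr is executed before the jump takes effect):

  'JR+MOVEI', 'JUMP+MOVEI', 'JR+JR', 'JR+JUMP', 'JUMP+JR', 'JUMP+JUMP',
  'JR+MOVE PC', 'JUMP+MOVE PC'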

    IMULTN must be followed by an IMACN (Error displayed)
    IMACN must be followed by an IMACN or RESMAC (Error displayed)
    RESMAC must be preceded by an IMACN (Error displayed)
    A NOP is inserted between LOAD+MMULT and STORE+MMULT (Warning displayed).
    I don't know if LOADB+MMULT, LOADW+MMULT, LOADP+MMULT, ... are valid or
    not. Currently, they are not tested...


3.0 Instruction Encoding

Most instructions are only 2 bytes long. This means that 4
instructions can be pulled from RAM in one memory access!! This also
makes the code extremely tight, which matters a lot when writing
cartridge-based programs.
The one exception is movei #x,Rn, which has the 32-bit constant right
after the 2-byte instruction word. This saves a lot of time and space
over other RISCs: the ARM, for example, needs up to four 32-bit
instructions to fill a register with an arbitrary constant (8 bits at
a time), the SPARC two (sethi and or).


3.2 Instruction Encoding

All instructions use the top 6 bits to encode the instruction.

The 2-operand instructions split the remaining 10 bits into two 5-bit
fields: the source (quick or register) and the destination register.
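As arithmetic, an instruction word is (opcode << 10) | (source << 5) |
destination. A minimal C sketch; encode2 is a hypothetical name, and
the opcode numbers come from the table in section 3.3 (for example MOVE
is %100010 = 34).

```c
#include <stdint.h>

/* Pack a 2-operand instruction: 6-bit opcode, 5-bit source, 5-bit destination. */
static uint16_t encode2(unsigned opcode, unsigned src, unsigned dst)
{
    return (uint16_t)(((opcode & 0x3Fu) << 10) |
                      ((src & 0x1Fu) << 5) |
                      (dst & 0x1Fu));
}
```

So move r1,r2 becomes encode2(34, 1, 2) = $8822, and add r1,r0 becomes
encode2(0, 1, 0) = $0020.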


3.2.1 The Implied Instructions

            iiiiii 0000000000
              /\       /\
              ||       |_============== room for extensions
              ||
              \`======================= instruction

The only implied instruction is nop.



3.2.2 The 1 Operand Instructions

            iiiiii 00000 ddddd  <====== destination register
              /\     /\
              ||     |_================ room for extensions
              ||
              \`======================= instruction

The one operand instructions are:

             neg    R0
             not    R1
             abs    R2
             resmac R3



3.2.3 The 2 Operand Instructions

Most instructions are 2-operand and follow this pattern. The
register-to-register instructions use the sssss and ddddd fields to
specify the source and destination registers, as in add r1,r0. In the
quick-to-register instructions the sssss field holds a constant
instead, as in shlq #3,r0, where the constant is between 1 and 32, and
moveq #0,r2, where the constant is between 0 and 31.

            iiiiii sssss ddddd  <====== destination register
              /\     /\
              ||     |_================ source (quick or register)
              ||
              \`======================= instruction

Examples of 2 operand instructions are:

            move  R1,R2
            bset  #31,R2
            etc...
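The [32, 1..31] range in the table only fits into 5 bits if 32 is
stored as 0. This C sketch packs a quick constant under that
assumption and uses it for addq (opcode %000010). The helper names are
hypothetical, and whether every quick instruction stores its constant
this way is not verified here.

```c
#include <stdint.h>

/* Map a quick constant to its 5-bit field.
   Assumption: for [32, 1..31] instructions (addq, subq, ...) 32 is
   stored as 0; for [0..31] instructions (moveq, btst, ...) the value
   is stored as is. Returns -1 if the value is out of range. */
static int quick_field(unsigned value, int allows_32)
{
    if (allows_32) {
        if (value < 1 || value > 32)
            return -1;
        return (int)(value & 0x1Fu);   /* 32 -> 0 */
    }
    return value <= 31 ? (int)value : -1;
}

/* addq #q,Rd with opcode %000010; returns -1 for an invalid q. */
static int32_t encode_addq(unsigned q, unsigned rd)
{
    int field = quick_field(q, 1);
    if (field < 0)
        return -1;
    return (int32_t)((2u << 10) | ((unsigned)field << 5) | (rd & 0x1Fu));
}
```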


3.2.4 The movei Instruction

The movei instruction is special: it is the only 6-byte instruction.
The instruction word follows the general structure,

            iiiiii 00000 ddddd  <====== destination register
              /\     /\
              ||     |_================ room for extensions
              ||
              \`======================= instruction ($98)

but the 32-bit constant that is to be loaded into the destination
register follows the instruction word:

           +-------------+ +------------+ +------------+
           |   Movei Rn  | | Lower word | | Upper word |
           +-------------+ +------------+ +------------+
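Following that layout, a movei assembles to three 16-bit words: the
instruction word ($98 in the high byte plus the register number), then
the lower and the upper word of the constant. A C sketch;
encode_movei is a hypothetical name.

```c
#include <stdint.h>

/* Emit movei #value,Rd as three 16-bit words, lower word first (see diagram). */
static void encode_movei(uint32_t value, unsigned rd, uint16_t out[3])
{
    out[0] = (uint16_t)((38u << 10) | (rd & 0x1Fu)); /* opcode %100110 -> $98xx */
    out[1] = (uint16_t)(value & 0xFFFFu);             /* lower word */
    out[2] = (uint16_t)(value >> 16);                 /* upper word */
}
```

For example movei #$12345678,r3 comes out as $9803 $5678 $1234.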


3.2.5 The Load & Store Instructions

The load and store instructions follow this pattern (for the stores,
the ddddd field holds the register to be stored, not a destination):

            iiiiii ppppp ddddd  <====== destination register
              /\     /\
              ||     |_================ indirect register
              ||
              \`======================= instruction


3.2.5.1 Addressing Modes For Load/Store Byte/Word/Phrase

All load and store instructions support register indirect addressing,
written (Rn).
This means that you can load the memory location pointed to by a
register into another register (or the same one).


3.2.5.2 Addressing Modes For Load/Store Longword

The load/store longword instructions support two more addressing
modes:

  * indexed register indirect addressing, written (Rn+Rm),
  * register indirect addressing with offset, written (Rn+xx).

In these addressing modes Rn _has_ to be R14 or R15!

e.g.:        load  (r14+r2),r0
             store r0,(r15+16)
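Because only R14 and R15 have these modes, the base register is
effectively part of the opcode (compare $E8 and $EC in the table in
section 3.3) rather than a register field. A sketch of how an
assembler could pick the opcode for load (Rn+Ri),Rd and reject any
other base; indexed_load_opcode is a hypothetical helper.

```c
/* Opcode for load (Rn+Ri),Rd. Only R14 and R15 are valid bases:
   $E8 = %111010 (58), $EC = %111011 (59). Returns -1 otherwise. */
static int indexed_load_opcode(unsigned base)
{
    switch (base) {
    case 14: return 58;
    case 15: return 59;
    default: return -1;   /* e.g. (r1+r2) cannot be encoded */
    }
}
```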


3.2.5.3 Load/Store Phrase (GPU Only)

The GPU has a direct 64-bit (phrase) interface to main memory.
The loadp/storep instructions access this memory at its full width.
The lower part of the phrase pointed to by (Rp) goes from/to the
register specified; the other part of the phrase is in G_HIDATA
(0xF02118, GPU Bus Interface high data).

e.g.:        storep r0,(r1)
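Assuming "lower part" means the low 32 bits of the 64-bit value,
loadp and storep split and join a phrase like this C model. The
function names are hypothetical, and which half really lands in the
register has not been verified here.

```c
#include <stdint.h>

/* Model of loadp: low 32 bits to the named register, the rest to G_HIDATA. */
static void loadp_model(uint64_t phrase, uint32_t *reg, uint32_t *hidata)
{
    *reg = (uint32_t)phrase;
    *hidata = (uint32_t)(phrase >> 32);
}

/* Model of storep: presumably G_HIDATA has to be set up before the store. */
static uint64_t storep_model(uint32_t reg, uint32_t hidata)
{
    return ((uint64_t)hidata << 32) | reg;
}
```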


3.2.6 The Program Control Instructions

Most Program Control instructions follow this pattern:

            iiiiii ddddd ccccc  <====== Condition Vector
              /\     /\
              ||     |_================ source (quick or register)
              ||
              \`======================= instruction

The ddddd field specifies either an offset (jr instruction) or a
register containing an absolute address (jump instruction). Both jump
instructions are conditional; the unconditional forms use condition T.
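For jr, the 5-bit field has to hold a small signed displacement.
Assuming it counts 16-bit words relative to the instruction following
the jr (a range of -16..+15 words), the target can be computed as in
this C sketch. The scaling and the reference point are our
assumptions, and jr_target is a hypothetical name.

```c
#include <stdint.h>

/* Target of jr, assuming a signed 5-bit word offset relative to PC+2. */
static uint32_t jr_target(uint32_t pc, unsigned field)
{
    int32_t offset = (int32_t)(field & 0x1Fu);
    if (offset & 0x10)
        offset -= 32;                        /* sign-extend: 0x1F -> -1 */
    return pc + 2u + (uint32_t)(offset * 2); /* wraps correctly for negatives */
}
```

Under this assumption an offset field of %11111 (-1) jumps back onto
the jr itself.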


3.2.6.1 Condition Codes

Condition codes ccccc can be any 5-bit vector; here are some
predefined useful values:

       CC (%00100)   CS (%01000)   EQ (%00010)  MI (%11000)
       NE (%00001)   PL (%10100)   HI (%00101)  T  (%00000)

Examples of Program Control instructions:

            jump  mi, (r5)
            jr    ne, exit
            jr    t, loop   ; loop forever
            jr    loop      ; loop forever
            jump  (r5)


3.2.7 Modulo Arithmetic (DSP only)

The instructions addqmod and subqmod work modulo a size given by
D_MOD (0xF1A118) /* DSP Modulo Instruction Mask */.
This register holds a mask that is applied to the register after the
add operation, which replaces the following two-instruction sequence:

                    movei #%111111,r1
              loop: addq #4,r0
                    and  r1,r0
                    ...
                    jr loop

With the modulo register this can be written:

                    movei #D_MOD,r3
                    movei #~%111111,r1
                    store r1,(r3)
              loop: addqmod #4,r0
                    ...
                    jr loop

This is an obvious win! You save one instruction (a cycle) in every
loop iteration!

The modulo instructions are
                   subqmod, addqmod
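The second example stores ~%111111 rather than %111111, which suggests
that bits set in D_MOD keep the old register value and only the
cleared bits take the sum. A C model of addqmod under that reading
(function names are hypothetical). Unlike the addq/and version it also
keeps the upper address bits, so a buffer would not have to start at
address 0.

```c
#include <stdint.h>

/* addqmod #q,Rn, assuming D_MOD bits that are set keep the old value. */
static uint32_t addqmod_model(uint32_t rn, uint32_t q, uint32_t d_mod)
{
    uint32_t sum = rn + q;
    return (rn & d_mod) | (sum & ~d_mod);
}

/* The two-instruction version from the first example: addq, then and. */
static uint32_t addq_and(uint32_t rn, uint32_t q, uint32_t mask)
{
    return (rn + q) & mask;
}
```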


3.2.8 Multiply and Multiply-Accumulate Locations
Here BASE is $F00000 (D_MOD = 0xF1A118 and G_HIDATA = 0xF02118 fit the same pattern).
D_MACHI         EQU     BASE+$1A120     ; DSP Hi byte of MAC operations


3.2.9 Matrix Multiply Locations
D_MTXC          EQU     BASE+$1A104     ; DSP Matrix Control
D_MTXA          EQU     BASE+$1A108     ; DSP Matrix Address
G_MTXC          EQU     BASE+$2104      ; GPU Matrix Control
G_MTXA          EQU     BASE+$2108      ; GPU Matrix Address


3.2.10 Divide Locations
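D_REMAIN        EQU     BASE+$1A11C     ; DSP Division Remainder
D_DIVCTRL       EQU     BASE+$1A11C     ; DSP Divider control
G_REMAIN        EQU     BASE+$211C      ; GPU Division Remainder
G_DIVCTRL       EQU     BASE+$211C      ; GPU Divider control
(Each REMAIN/DIVCTRL pair shares one address; presumably reading it
gives the remainder and writing it sets the divider control.)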


3.2.20 Strange things:


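3.3 Instruction numbers

The hex column is the high byte of the instruction word with the source
field set to zero, i.e. the 6-bit opcode shifted left by 2. Rows that
only show a hex value are opcodes we have not identified.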

   Mnemonic  Mode     iiiiii sssss ddddd  hex  Notes
   --------------------------------------------------------------
     ADD     Rs,Rd    000000 sssss ddddd  $00
     ADDC    Rs,Rd    000001 sssss ddddd  $04
     ADDQ    #q,Rd    000010 qqqqq ddddd  $08  q is [32, 1..31]
     ADDQT   #q,Rd    000011 qqqqq ddddd  $0C  q is [32, 1..31]

     SUB     Rs,Rd    000100 sssss ddddd  $10
     SUBC    Rs,Rd    000101 sssss ddddd  $14
     SUBQ    #q,Rd    000110 qqqqq ddddd  $18  q is [32, 1..31]
     SUBQT   #q,Rd    000111 qqqqq ddddd  $1C  q is [32, 1..31]

     NEG     Rd       001000 00000 ddddd  $20

     AND     Rs,Rd    001001 sssss ddddd  $24
     OR      Rs,Rd    001010 sssss ddddd  $28
     XOR     Rs,Rd    001011 sssss ddddd  $2C

     NOT     Rd       001100 00000 ddddd  $30

     BTST    #q,Rd    001101 qqqqq ddddd  $34  q is [0..31]
     BSET    #q,Rd    001110 qqqqq ddddd  $38  q is [0..31]
     BCLR    #q,Rd    001111 qqqqq ddddd  $3C  q is [0..31]

     MULT    Rs,Rd    010000 sssss ddddd  $40
     IMULT   Rs,Rd    010001 sssss ddddd  $44
     IMULTN  Rs,Rd    010010 sssss ddddd  $48
     RESMAC  Rd       010011 00000 ddddd  $4C
     IMACN   Rs,Rd    010100 sssss ddddd  $50

     DIV     Rs,Rd    010101 sssss ddddd  $54

     ABS     Rd       010110 00000 ddddd  $58
                                          $5C
     SHLQ    #q,Rd    011000 qqqqq ddddd  $60  q is [32, 1..31]
     SHRQ    #q,Rd    011001 qqqqq ddddd  $64  q is [32, 1..31]
                                          $68
     SHARQ   #q,Rd    011011 qqqqq ddddd  $6C  q is [32, 1..31]
     ROR     Rs,Rd    011100 sssss ddddd  $70
     RORQ    #q,Rd    011101 qqqqq ddddd  $74  q is [32, 1..31]

     CMP     Rs,Rd    011110 sssss ddddd  $78
     CMPQ    #q,Rd    011111 qqqqq ddddd  $7C  q is [0..31]

DSP  SUBQMOD #q,Rd    100000 qqqqq ddddd  $80  q is [32, 1..31]
                                          $84
     MOVE    Rs,Rd    100010 sssss ddddd  $88
     MOVEQ   #q,Rd    100011 qqqqq ddddd  $8C  q is [0..31]
     MOVETA  Rs,Rd    100100 sssss ddddd  $90
     MOVEFA  Rs,Rd    100101 sssss ddddd  $94
     MOVEI   #c32,Rd  100110 00000 ddddd  $98  followed by a 32 bit const

     LOADB   (Rp),Rd  100111 ppppp ddddd  $9C
     LOADW   (Rp),Rd  101000 ppppp ddddd  $A0
     LOAD    (Rp),Rd  101001 ppppp ddddd  $A4
GPU  LOADP   (Rp),Rd  101010 ppppp ddddd  $A8  Load Phrase
     LOAD  (R14+n),Rd 101011 nnnnn ddddd  $AC
     LOAD  (R15+n),Rd 101100 nnnnn ddddd  $B0

     STOREB  Rs,(Rp)  101101 ppppp sssss  $B4
     STOREW  Rs,(Rp)  101110 ppppp sssss  $B8
     STORE   Rs,(Rp)  101111 ppppp sssss  $BC
GPU  STOREP  Rs,(Rp)  110000 ppppp sssss  $C0  Store Phrase
     STORE Rs,(R14+n) 110001 nnnnn sssss  $C4
     STORE Rs,(R15+n) 110010 nnnnn sssss  $C8

     MOVE    PC,Rd    110011 00000 ddddd  $CC

     JUMP    CC,(Rd)  110100 ddddd ccccc  $D0
     JR      CC,q     110101 qqqqq ccccc  $D4

     MMULT   Rs,Rd    110110 sssss ddddd  $D8
                                          $DC
                                          $E0
     NOP              111001 00000 00000  $E4

     LOAD (R14+Ri),Rd 111010 iiiii ddddd  $E8
     LOAD (R15+Ri),Rd 111011 iiiii ddddd  $EC
    STORE Rs,(R14+Ri) 111100 iiiii sssss  $F0
    STORE Rs,(R15+Ri) 111101 iiiii sssss  $F4
                                          $F8
DSP  ADDQMOD #q,Rd    111111 qqqqq ddddd  $FC  q is [32, 1..31]
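The table can be read back with a few shifts. In this C sketch (struct
and function names are hypothetical) the opcode is the top 6 bits, and
the hex column equals opcode << 2, as described above the table.

```c
#include <stdint.h>

struct risc_fields {
    unsigned opcode; /* iiiiii: bits 15..10 */
    unsigned src;    /* sssss / qqqqq / ppppp / ddddd (jump): bits 9..5 */
    unsigned dst;    /* ddddd, stored reg (store), ccccc (jump/jr): bits 4..0 */
};

static struct risc_fields decode(uint16_t word)
{
    struct risc_fields f;
    f.opcode = (word >> 10) & 0x3Fu;
    f.src    = (word >> 5) & 0x1Fu;
    f.dst    = word & 0x1Fu;
    return f;
}

/* Value shown in the table's hex column for a given opcode. */
static unsigned hex_column(unsigned opcode)
{
    return (opcode & 0x3Fu) << 2;
}
```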


4.0 Interrupts

The GPU and the DSP use an interrupt scheme that looks a lot like
the 56000's way of handling interrupts.

The interrupt entry points are in the lowest part of each processor's
memory. There are 16 bytes for each interrupt, which should be enough
to jump to the real interrupt handler.

   ( If this works like on the 56000, it should be possible
     to have Fast Interrupts, where the CPU returns automatically
     when the 16 bytes have been executed and no jump
     instructions have been executed. )

For the DSP it looks like this:

000000        Reset          (or DSP control interrupt)
000010        I2S Interrupt
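Since each entry is 16 bytes, the entry for interrupt n sits at
base + 16 * n. A C sketch; the base addresses $F1B000 (DSP local RAM)
and $F03000 (GPU local RAM) are our addition and do not come from the
text above, and vector_address is a hypothetical name.

```c
#include <stdint.h>

#define DSP_RAM_BASE 0xF1B000u  /* assumption: start of DSP local RAM */
#define GPU_RAM_BASE 0xF03000u  /* assumption: start of GPU local RAM */

/* Address of the 16-byte entry for interrupt number n. */
static uint32_t vector_address(uint32_t ram_base, unsigned n)
{
    return ram_base + 16u * n;
}
```

With these bases the I2S entry (offset 000010 above) would be at
$F1B010.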


Enable the I2S interrupt:

    movei   #D_FLAGS,r1     ; DSP flags register
    load    (r1),r0
    bset    #5,r0           ; enable I2S interrupt
    store   r0,(r1)         ; save DSP flags


Handle I2S interrupts:

i2s_isr:
    movei   #D_FLAGS,r30            ; get flags ptr
    load    (r30),r12
    bclr    #3,r12          ; clear IMASK
    load    (r31),r28       ; get last instruction address
    bset    #10,r12         ; clear I2S interrupt
    addq    #2,r28          ; point at next to be executed
    addq    #4,r31          ; update the stack pointer
    ...
    jump    T,(r28)         ; and return
    store   r12,(r30)       ; restore flags (runs before the jump takes effect)
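Seen as plain arithmetic, the handler computes the new flags value,
the return address and the popped stack pointer like this C model
(function names are hypothetical). The final store sits after the
jump; presumably it still executes because the instruction following a
jump is executed before the jump takes effect (compare the restrictions
in section 2.0).

```c
#include <stdint.h>

/* New D_FLAGS value written at the end of the I2S handler. */
static uint32_t isr_exit_flags(uint32_t flags)
{
    flags &= ~(1u << 3);   /* bclr #3: clear IMASK */
    flags |= 1u << 10;     /* bset #10: clear (acknowledge) the I2S interrupt */
    return flags;
}

/* Return address: stacked address plus 2 (load (r31),r28 / addq #2,r28). */
static uint32_t isr_return_address(uint32_t stacked)
{
    return stacked + 2u;
}

/* Popping the stack: addq #4,r31. */
static uint32_t isr_pop(uint32_t r31)
{
    return r31 + 4u;
}
```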

