<HTML>
<! $Id: op.txt,v 1.18 1996/02/11 23:00:43 nat Exp $>
<HEAD>
<TITLE>OBJECTPROCESSOR</TITLE>
</HEAD>
<body background="jaguar.gif">
<PRE>
# -------------------------------------------------------------------
# OBJECTPROCESSOR                  (c) Copyright 1995-1996 Nat! & KKP
# -------------------------------------------------------------------
# These are some of the results/guesses that Klaus and Nat! found
# out about the Jaguar. Since we are not under NDA or anything from
# Atari we feel free to give this to you for educational purposes
# only.
#
# Please note, that this is not official documentation from Atari
# or derived work thereof (both of us have never seen the Atari docs)
# and Atari isn't connected with this in any way.
#
# Please use this informationphile as a starting point for your own
# exploration and not as a reference. If you find anything inaccurate,
# missing, needing more explanation etc. by all means please write
# to us:
#    nat@zumdick.rhein-main.de
# or
#    kkp@gamma.dou.dk
#
# If you could do us a small favor, don't use this information for
# those lame flamewars on r.g.v.a or the mailing list.
#
# HTML soon ?
# -------------------------------------------------------------------
# $Id: op.txt,v 1.18 1996/02/11 23:00:43 nat Exp $
#
# If there are two theories I put the more likely one first.
# -------------------------------------------------------------------
Things to know about the Objectprocessor (OP):

-1    Think of a phrase as an entity of 64 bits (or 8 bytes, for that
      matter).

0.    The object list is a linked list.

1.    The object list is traversed by the object processor for
      each! scanline.

2.    The Objectprocessor probably works like this:

      Whenever a new scanline needs to be displayed, the
      objectprocessor provides a linebuffer to the videosystem. While
      the videosystem is busy displaying this, the OP readies the next
      scanline (it uses a doublebuffering strategy). It does
      this by traversing the objectlist and interpreting each
      object in sequence. Each object gets exactly ONE chance per
      scanline to fill the linebuffer. It fills the linebuffer at
      a specified horizontal position for a specified width. The data
      in the linebuffer is always overwritten (except when the
      Read-Modify-Write bit is set). If the active object has the
      transparent bit set, it will not overwrite values in the
      linebuffer when its source pixel has the value zero.
      The 'transparency' check is done before looking up the pixel's
      color in the CLUT (the 2 - 256 color modes).

2.1   The earlier an object appears in the list, the further
      in the background it appears. The linebuffer is initialized with
      the linebufferbackgroundcolor (BG) before the objectprocessor
      starts filling the linebuffer.

      One may also assume that the OP normally traverses the
      linebuffer from left to right, except when the horizontal flip
      bit is set. (Very useful information indeed! (har) )

      Each bitmap object is made up of pixels. These pixels either
      contain the color itself (direct), as in CrY and TrueColor modes,
      or are an index into a Colorlookuptable (indirect).
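      To make the theory in 2. and 2.1 concrete, here is a small
      software model of the OP filling ONE linebuffer. It is only a
      sketch under our assumptions: CLUT pixels only, no RMW, no flip,
      and all names (line_obj, fill_linebuffer, LB_WIDTH) are made up
      for illustration, not taken from any Atari documentation.

```c
/* Hypothetical model of the OP filling one linebuffer (CLUT pixels only). */
#define LB_WIDTH 16                 /* tiny linebuffer, just for the sketch */

typedef struct
{
    const unsigned char *pixels;    /* source pixels of this object's line */
    int xpos;                       /* horizontal start in the linebuffer  */
    int width;                      /* number of pixels to write           */
    int transparent;                /* transparent flag set?               */
} line_obj;

/* Objects are handled in list order, so later objects overwrite earlier
   ones: the first object in the list ends up furthest in the background. */
void fill_linebuffer(unsigned short lb[LB_WIDTH], unsigned short bg,
                     const unsigned short clut[256],
                     const line_obj *objs, int count)
{
    for (int x = 0; x < LB_WIDTH; x++)
        lb[x] = bg;                          /* start with the BG color    */

    for (int i = 0; i < count; i++) {
        const line_obj *o = objs + i;
        for (int n = 0; n < o->width; n++) {
            int x = o->xpos + n;
            if (x < 0 || x >= LB_WIDTH)
                continue;                    /* clip to the linebuffer     */
            int src = o->pixels[n];
            if (o->transparent && src == 0)
                continue;                    /* checked BEFORE CLUT lookup */
            lb[x] = clut[src];               /* indirect (CLUT) pixel      */
        }
    }
}
```

      With two overlapping objects, the second one wins wherever its
      source pixel is non-zero; where it is zero (and transparent is
      set) the first object shows through.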

2.2   We assume that the OP writes into the linebuffer locally, so that
      the objectdata is read over the bus, but not written into the
      linebuffer over the bus (which would be way evil).

2.3   If all these theories are true, then the OP has on average one
      scanline time to prepare the linebuffer.

2.4   The videosystem can deal with 16bit RGB/Crycolor and 24bit RGB
      pixels. The size of the pixels the OP writes into the linebuffer
      and pulls out of the CLUT depends on the pixeltype chosen for
      the videosystem.

2.5   The objects in the objectlist are *modified* by the OP. This means
      that an object list is only good for one frame. You need to
      refresh your object list every VBLANK.
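      One simple way to do the refresh, sketched under the assumption
      that you keep an untouched master copy of the list somewhere
      else in RAM and call this from your VBLANK handler. The function
      name refresh_object_list is hypothetical.

```c
/* Copy a pristine master object list over the live one (the list OLP
   points to), because the OP overwrites fields like data-address and
   YPos while it walks the list.                                        */
typedef unsigned long long phrase;      /* 64 bits */

void refresh_object_list(phrase *live, const phrase *master, int n_phrases)
{
    for (int i = 0; i < n_phrases; i++)
        live[i] = master[i];            /* plain copy, phrase by phrase */
}
```

      Copying whole phrases keeps the alignment of the live list intact,
      as long as both copies start on the alignment the OP needs.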

3.    The last object must be a STOP object.

4.    The Objectlist must be doublephrase aligned. This means
      that the lower nybble of the address must be zero.
      (Maybe this is wrong and it is just object alignment that you
      should take into account)

5.    The address of the image of an object must be (as expected)
      phrase aligned (zero in the lower 3 bits).
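      The alignment rules in 4. and 5. boil down to masking the low
      address bits. A small sketch (function names are made up):

```c
/* A double phrase is 16 bytes, a phrase is 8 bytes. */
int is_doublephrase_aligned(unsigned long addr) { return (addr & 0xFUL) == 0; }
int is_phrase_aligned(unsigned long addr)       { return (addr & 0x7UL) == 0; }

/* Round addr up to the next multiple of align (align must be a power of 2). */
unsigned long align_up(unsigned long addr, unsigned long align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

      For example align_up(addr, 16) gives a legal start for the
      object list, align_up(addr, 8) a legal start for bitmap data.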

6.    There are five different objects that the Objectprocessor knows
      about. These are:

      1. Bitmapped Object
      2. Scaled bitmapped object
      3. GPU-Object (Calls the GPU to do the displaying ?? )
      4. Branchobject
      5. Stopobject (marks the end of the object list)

      The objects have different sizes. The minimum size of an object
      is a "phrase". Also note the alignment constraints.

      Object type    Number     Size in phrases  Alignment in phrases
      -------------------------------------------------------------
      BITMAP         0           2                       2
      SCALE          1           3                       4 !!
      GPU            2           1                       1
      BRANCH         3           1                       1
      STOP           4           1                       1
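      The table above as C data. This assumes, as the object layouts
      further down show, that the type number sits in bits 2-0 of an
      object's first phrase. All names are our own.

```c
typedef unsigned long long phrase;

enum op_type { OBJ_BITMAP = 0, OBJ_SCALE = 1, OBJ_GPU = 2,
               OBJ_BRANCH = 3, OBJ_STOP = 4 };

static const int op_size_phrases[5]  = { 2, 3, 1, 1, 1 };
static const int op_align_phrases[5] = { 2, 4, 1, 1, 1 };

int op_type_of(phrase p0)
{
    return (int)(p0 & 7);                 /* lowest three bits */
}

/* Size in bytes of the object starting with phrase p0, -1 if unknown. */
int op_size_bytes(phrase p0)
{
    int t = op_type_of(p0);
    return t > OBJ_STOP ? -1 : op_size_phrases[t] * 8;
}

/* Required alignment in bytes, -1 if the type is unknown. */
int op_align_bytes(phrase p0)
{
    int t = op_type_of(p0);
    return t > OBJ_STOP ? -1 : op_align_phrases[t] * 8;
}
```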


7.    To keep the Objectprocessor from fetching data (and wasting bandwidth)
      during the VBLANK you usually put two branch objects at the beginning
      of the display list, that branch to the stop object if the first
      displayable scanline has not been reached or the last displayable
      scanline has already been displayed.
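      What the two guard branches achieve, written out as C. This uses
      the comparison table from section 10 ("less than" branches if
      VCnt > VC, "greater than" if VCnt &lt; VC). first_line and
      last_line are whatever VC values bound your display; the function
      names are hypothetical.

```c
int branch_less_than(int vcnt, int vc)    { return vcnt > vc; }   /* CC 001 */
int branch_greater_than(int vcnt, int vc) { return vcnt < vc; }   /* CC 010 */

/* 1 if the OP would jump straight to the STOP object on line vc. */
int guard_skips_line(int vc, int first_line, int last_line)
{
    if (branch_less_than(first_line, vc))    /* still above the display  */
        return 1;
    if (branch_greater_than(last_line, vc))  /* already below the display */
        return 1;
    return 0;                                /* fall through to sprites   */
}
```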

8.    The OP usually hogs the bus when doing data transfers.


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
9                        Your friendly OP-registers
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

RW: OLP ($F00020)
~~~~~~~~~~~~~~~~~
 32       28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |              low_word               |           high_word               |
  +-------------------------------------+-----------------------------------+

low_word:
high_word:

   The address of the object list. The 32 bit address is word swapped.
   So you gotta store it like this:

            move.l   #objlist,d0
            swap     d0
            move.l   d0,OLP
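   The same word swap in C, for those not writing 68K assembly. The
   function name olp_value is ours; you would then write the result
   to OLP ($F00020) the same way the move.l above does.

```c
/* Exchange the two 16-bit halves of a 32-bit object list address. */
unsigned long olp_value(unsigned long objlist)
{
    objlist &= 0xFFFFFFFFUL;                     /* keep 32 bits */
    return ((objlist << 16) | (objlist >> 16)) & 0xFFFFFFFFUL;
}
```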


RW: OB ($F00010)
~~~~~~~~~~~~~~~~
 32       28        24        20       16       12        8        4        0
  +--------^---------^---------^--------^--------^--------^--------^--------+
0 |                                 objectdata                              |
  +-------------------------------------------------------------------------+

 64       60        56        52       48       44       40       36        32
  +--------^---------^---------^--------^--------^--------^--------^--------+
1 |                                 objectdata                              |
  +-------------------------------------------------------------------------+

objectdata:
   This is used to pass data/pointer to the GPU when using a GPU object.


RW: OBF ($F00026)
~~~~~~~~~~~~~~~~~
 32       28        24        20       16       12        8        4        0
  +--------^---------^---------^--------^--------^--------^--------^-----+--+
  |                                                                      |f |
  +----------------------------------------------------------------------+--+

flag (f):
   The object processor flag. The STOP object's data field is copied
   here. Furthermore you can hook up an IRQ (Level 2) to the lowest bit,
   which can in turn serve to interrupt the GPU and the 68K (and possibly
   also the DSP). This can be used to generate HBLANK-like interrupts,
   although the STOP seldom occurs during the blanking period!


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
10             This is what a branch object looks like:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0:

   63      56        48       40       32        24       16       8    3   0
  +--------^---------^-----+---^--------^--------+--------+--+-----^----+---+
  |        unused          |      Link-address   | unused |CC|   VCnt   |011|
  +------------------------+---------------------+--------+--+----------+---+
                               42..........24             15.14 13...3   2..0

   The branch objects are used to compare the current scanline
   with the value stored in the branch object. Depending on the
   branch object's comparison mode, the branch is taken on
   ==, &lt; or &gt; (or when the OP flag is set, see the table below).
   A taken branch continues with the object at the phrase address
   given in the link address. If the comparison fails, the OP simply
   examines and handles the next object in the list.


   VCnt:
      This is the value you compare the vertical scanline
      counter with (VC). For CC code 010 the operation goes:

      if( object->VCnt < VC)
         goto object->link;


   Conditioncodes (CC):

       Values     Comparison/Branch
     --------------------------------------------------
        000       Branch on equal            (VCnt==VC)
        001       Branch on less than        (VCnt>VC)
        010       Branch on greater than     (VCnt<VC)
        011       Branch if OP flag is set

      Note that 000 is a branch always if VCnt == $7FF (very strange!)
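   A sketch of building and evaluating a branch object, following the
   bit positions in the diagram above (CC in bits 15-14, VCnt in 13-3,
   link in 42-24). Since the table lists 3-bit codes but the diagram
   shows a 2-bit field, treat the CC width as uncertain. The $7FF
   special case is not modeled. Function names are hypothetical.

```c
typedef unsigned long long phrase;

/* link_addr is a byte address; the object stores it as a phrase address. */
phrase make_branch(unsigned long link_addr, unsigned cc, unsigned vcnt)
{
    phrase link = (link_addr >> 3) & 0x7FFFF;     /* 19-bit phrase address */
    return (link << 24)
         | ((phrase)(cc & 3) << 14)
         | ((phrase)(vcnt & 0x7FF) << 3)
         | 3;                                     /* type 3 = branch */
}

/* Would the branch be taken on line vc? op_flag is the OBF flag bit. */
int branch_taken(phrase p0, unsigned vc, int op_flag)
{
    unsigned cc   = (unsigned)(p0 >> 14) & 3;
    unsigned vcnt = (unsigned)(p0 >> 3) & 0x7FF;
    switch (cc) {
    case 0:  return vcnt == vc;       /* branch on equal          */
    case 1:  return vcnt > vc;        /* branch on "less than"    */
    case 2:  return vcnt < vc;        /* branch on "greater than" */
    default: return op_flag != 0;     /* branch if OP flag is set */
    }
}
```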


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
11                This is what a stop object looks like:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0 (1 of 1):

 63       56        48        40       32        24       16       8    3   0
  +--------^---------^---------^--------^--------^--------^--------^--+-+---+
  |                             unused                                |f|100|
  +-------------------------------------------------------------------+-+---+

flag (f):
   The value of 'f' is copied to the OP flag register (OBF). This might
   then in turn trigger an IRQ (HBLANKing??).


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
12.               This is what a bitmap object looks like:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Phrase #0 (1 of 2):

 63       56        48        40       32        24       16       8    3   0
  +--------^---------^-----+------------^--------+--------^--+-----^----+---+
  |        data-address    |     Link-address    |   Height  |   YPos   |000|
  +------------------------+---------------------+-----------+----------+---+
      63 .............43        42.........24      23....14    13....3   2.0
        21 bits           19 bits        10 bits     11 bits  3 bits


   data-address:  Pointer to the bitmap      ***DESTROYED BY THE OP***
   link-address:  Pointer to the next object
   height:        Height in pixels
   y-pos:         Vertical position          ***DESTROYED BY THE OP***
   type:          Object type


   data-address:  bits 63-43
      An address is a memory address in terms of phrases. To get the
      byte address you have to shift it up by 3. (For example, to get
      the data-address you would fetch the upper lword with the 68K
      and do):

         move.l   (a0),d0     ; fetch it  (bits 63-32)
         moveq    #11,d1      ; or some other less lame way
         lsr.l    d1,d0       ; shift it down for phrase address
         lsl.l    #3,d0       ; shift it up by 3 for byte address
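      The same extraction in C: take the upper lword (bits 63-32),
      drop the 11 link bits below the data-address, then multiply the
      phrase address by 8. The function name is ours.

```c
unsigned long data_byte_address(unsigned long upper_lword)
{
    unsigned long phrase_addr = (upper_lword & 0xFFFFFFFFUL) >> 11; /* 63-43 */
    return phrase_addr << 3;                                        /* * 8   */
}
```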

   link-address:  bits 42-24
      The link address strings the object list together. So it really
      is a linked list, not just an array. (OK, an array would have
      been better, and the link could have been a number of phrases
      to skip.) It misses the upper two bits to form a proper full
      24 bit address. This means that objects must reside in the
      lower 4 MB.

   height:
      The height of the object is also stored in the first phrase.
      This is the number of pixels an object has in its vertical extent.

   ypos:
      The YPos is predictably the vertical position of the object on
      the screen. The vertical position is the halfline vertical
      position.

    Theory 1:
      Like on the Falcon the screen is divided into two horizontal
      halflines. Except for really wide screens in excess of 1024
      pixels horizontally, you always stay in the first halfline.
      (That's why it's eleven bits, and the height is only 10 bits.)
      A problem with this theory is that the Xpos field is 12 bits
      anyway...

    Theory 2:
      This means that in interlace mode this is the "true"
      vertical position on the screen. In non-interlaced
      (non-flicker) modes, you should multiply your Y-Pos by two and
      stuff that into the object.
      (That's why it's eleven bits, and the height is only 10 bits.)

   type:
      Lastly the object type indicates with a 0 (000) that this object
      is a normal non-scaled bitmap object.
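   Putting phrase #0 together, as a sketch following the field
   positions drawn above. The addresses are passed as byte addresses
   and converted to phrase addresses. ypos is the raw value stored in
   the object: under Theory 2 you would pass 2 * screen_line. The
   function name is hypothetical.

```c
typedef unsigned long long phrase;

phrase make_bitmap_p0(unsigned long data_addr, unsigned long link_addr,
                      unsigned height, unsigned ypos)
{
    /* type (bits 2-0) stays 0 = bitmap */
    return ((phrase)((data_addr >> 3) & 0x1FFFFF) << 43)  /* 21 bits, 63-43 */
         | ((phrase)((link_addr >> 3) & 0x7FFFF)  << 24)  /* 19 bits, 42-24 */
         | ((phrase)(height & 0x3FF) << 14)               /* 10 bits, 23-14 */
         | ((phrase)(ypos & 0x7FF) << 3);                 /* 11 bits, 13-3  */
}
```

   Remember that the OP destroys data-address and YPos, so this is
   exactly the kind of phrase you need to rewrite every VBLANK.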


Phrase #1 (2 of 2):

   63      56        48       40       32        24       16       8       0
  +--------+---------+------+--^--+-----^---+----^----+---+---+----^--------+
  | unused | 1stpix  |flags | idx | iwidth  | dwidth  | p | d |    &lt;XPos&gt;   |
  +--------+---------+------+-----+---------+---------+---+---+-------------+
    63...55 54....49  48..45 44.38  37...28    27..18 17.15 14.12   11.....0
               6bit  4bit  7bit   10bit   10bit  3bit  3bit     12bit

   Curiously there seem to be some unused bits in the top half of
   this second phrase. Anyway starting from the left:

   1stpix:     Pixels to skip
   flags:      How to handle the source data
   index:      Index into the CLUT
   iwidth:     Width of the image
   dwidth:     Offset to the next line of the image
   pitch:      Increment for the Datapointer
   depth:      Pixeldepth of the bitmap
   x-pos:      Horizontal position of the object


   1stpix:  bits 54-49
      This is a field of 6 bits that contains the number of
      'bits' to skip before fetching the first pixel. This must be
      used whenever your bitmap data isn't phrase aligned.
      Maybe most often used for CLUT modes.
      You get the value you want to write here by calculating:

      pixelindex * bits_per_pixel (e.g. 8 for 256 color mode)
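      A sketch that takes the "bits to skip" reading above at face
      value: split a starting pixel into a phrase-aligned data pointer
      plus a 1stpix value. It assumes line_addr is already phrase
      aligned; first_fetch and split_start are made-up names.

```c
typedef struct { unsigned long data_addr; unsigned firstpix; } first_fetch;

first_fetch split_start(unsigned long line_addr, unsigned pixel, unsigned bpp)
{
    unsigned long bit_offset = (unsigned long)pixel * bpp;
    first_fetch f;
    f.data_addr = line_addr + (bit_offset / 64) * 8;  /* whole phrases, bytes */
    f.firstpix  = (unsigned)(bit_offset % 64);        /* bits left, 6 bits    */
    return f;
}
```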


   flags:   bits 48-45
      You can tell the Objectprocessor the way it should
      handle the display data. These are the values you set here:

             Bit3          Bit2          Bit1             Bit0
      ----------------------------------------------------------------
            Release     Transparent  ReadWriteModify  Horizontal Flip

      A few guesses as to what each flag does:

      Horizontal flip:
         Lets the Objectprocessor run its path from the other end 
         of the spritedata, which should effectively flip your 
         sprite data.

      ReadWriteModify:  
         The object processor reads the pixel from the linebuffer,
         does something with the bitmap pixel value and the
         linebuffer pixel value and stores the result back into the 
         linebuffer.

         Theory 1. For Crycolor the lower byte of the bitmap pixel 
         value is sign extended and added to the lower byte of the 
         linebuffer pixel value, thereby increasing or decreasing 
         (depending on the sign) the intensity of the linebuffer 
         pixel. This is a 'saturating add' meaning that you don't 
         wrap around, but subtractions stick at 0 and additions stick 
         at 255.
         The cryhues (upper byte) are mangled even more strangely, 
         the effect could (with the right values) be like looking 
         through a colored glass (your bitmap object with the 
         RMW-flag set) onto the background (the other bitmap objects 
         below it)
         This might be similar to what happens when Gouraud shading.
         Refer to the blitter docs.

         Theory 2. Both values are simply added together.


      Transparent:      
         When the source pixel is zero, this pixel will not be written. 
         This is the way to achieve transparent sprites with the GPU. 
         (Both CLUT and non-CLUT pixels)

      Release:    
         If cleared then the OP 'hogs' the bus for the time it takes to 
         fetch the scanline data of the object. If this bit is set, 
         then the bustime is shared with other processors. If you have 
         lotsa interrupts going, this might be worthwhile.
         Should apparently NOT be used on objects with more than 8 
         bitplanes, probably because then the OP might glitch. 

   index (idx):   bits 44-38
      Index into the ColorLookUpTable (CLUT).
      This information is only used for 1, 2 or 4 bitplane objects,
      to determine the offset in the CLUT to use.

         1 bitplane           2 bitplanes      4 bitplanes
      -------------------------------------------------------
           iiiiiii            iiiiii0          iiii000

      The value is shifted left once and then used as an index into
      the CLUT. Note that in 2 and 4 bitplane modes not all idx bits
      are used, because the lower bits are replaced with the pixel value.

      For example in 4-bits-per-pixel mode pixel #7 and an idx value
      of 64 gives you an index of (64*2)+7 -> 135.

      So you preload the CLUT with the colors you want to use, for
      example green at index #241. When you want to display a small
      green arrow on the screen (as a pointer) for example you set
      your object to transparent, and the index to 120. When the
      OP fetches a set pixel, it will write the green
      value into the linebuffer.
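      The index rule as a small C function, assuming the "shift left
      once, replace the low bits with the pixel" description above is
      right. bpp is 1, 2 or 4; the function name is ours.

```c
unsigned clut_index(unsigned idx, unsigned pixel, unsigned bpp)
{
    unsigned mask = (1u << bpp) - 1;            /* bits taken by the pixel */
    return (((idx & 0x7F) << 1) & ~mask) | (pixel & mask);
}
```

      Both examples above check out: clut_index(64, 7, 4) is 135 and
      clut_index(120, 1, 1) is 241.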

   iwidth:     bits 37-28
      Tells the OP how many *phrases* to draw in each line. This is
      the actual number of phrases to draw, not the horizontal offset
      used to index the next line (dwidth). This is probably not just
            #pixels_to_draw * bits_per_pixel / 64,
      but rather the number of phrases the object spans. If a 32 bit
      wide object straddles a phrase boundary, it spans two phrases and
      you should enter a two here.
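      The "number of phrases the object spans" as a formula: count the
      bits from the start of the first phrase (including the 1stpix
      skip) and round up. A sketch under that assumption; the name
      iwidth_phrases is hypothetical.

```c
unsigned iwidth_phrases(unsigned first_bit, unsigned width_px, unsigned bpp)
{
    unsigned bits = first_bit + width_px * bpp;   /* bits from phrase start */
    return (bits + 63) / 64;                      /* round up to phrases    */
}
```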

   dwidth:  bits 27-18
      The horizontal phrase offset the OP should use to index to the
      next line. If your data is laid out in consecutive strips of
      horizontal data like this:

      screen &lt;destination&gt;:
         00000000000
         11111111111
         22222222222
         33333333333

      memory &lt;source&gt;:
         00000000000111111111112222222222233333333333

      then this will be just the same as &lt;iwidth&gt;. But if your data
      is laid out like this:

      00000000000xxxxx11111111111xxxxx22222222222xxxxx33333333333xxxxx

      you should set &lt;dwidth&gt; to the proper offset so that adding
      &lt;dwidth&gt; to the phrase-address will bring you to the next line.
      (This might be useful for 'horizontally scrolling' objects.)

   pitch (p):  bits 17-15
      If you so desire you can organize your bitmap data in even
      stranger ways than one would think possible. With this value
      you control the datapointer that the OP uses to traverse your
      bitmap data. This value is added to the datapointer after each
      fetch. If you use a 0 you will be always fetching the same
      phrase over and over again. Normally you set &lt;pitch&gt; to 1, to
      advance through memory contiguously.
      Could be used with 2 screens and one common Z-buffer.
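      How dwidth and pitch combine is not documented, so here is a
      sketch under one plausible assumption: line n starts dwidth * n
      phrases after the data-address, and pitch is added after every
      fetch within a line. All values are in phrases; fetch_address is
      a made-up name.

```c
unsigned long fetch_address(unsigned long data, unsigned line, unsigned i,
                            unsigned dwidth, unsigned pitch)
{
    unsigned long line_start = data + (unsigned long)line * dwidth;
    return line_start + (unsigned long)i * pitch;    /* i-th fetch of line */
}
```

      For the 'x'-padded layout above, a 16 phrase wide strip with
      iwidth 11 would use dwidth 16; moving the data-address on by one
      phrase would then scroll the object horizontally.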

   depth (d):  bits 14-12  
      The number of bits of each pixel. This specifies the rez of the 
      object. You have the choice between direct pixel modes (16 or 
      24/32 bits) and indirect (CLUT) pixel modes. Note that using 
      transparency effectively reduces the number of available colors 
      by one (color #0).

      Values:

         0  1 bits per pixel  2 colors       CLUT
         1  2 bits per pixel  4 colors       CLUT
         2  4 bits per pixel  16 colors      CLUT
         3  8 bits per pixel  256 colors     CLUT
         4  16 bits per pixel 65536 colors   CRY
         5  24 bits per pixel 16 Mio Colors  TrueColor
         6  unused
         7  unused

   xpos:    bits 11-0    
      The horizontal position of the object on the screen (or in the 
      linebuffer if you will)


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
13.            This is what a scaled bitmap object looks like.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0 (1 of 3):

 63       56        48        40       32       24       16        8    3   0
  +--------^---------^-----+---^--------^--------+--------^--+-----^----+---+
  |       data-address     |    Link-address     |  Height   |   YPos   |001|
  +------------------------+---------------------+-----------+----------+---+
      63 .............43         42..........24   23 ..... 14 13 ..... 3 2.0
        21 bits           19 bits        10 bits     11 bits  3 bits

   Except for the type, which is different, this is just
   the same as the first phrase of the bitmap (non-scaled)
   object.


Phrase #1 (2 of 3):  This is the same as phrase #1 of the 'bitmapped' object


Phrase #2 (3 of 3):

   63      56        48       40       32        24       16       8       0
  +--------^---------^---------^--------^--------+--------+--------+--------+
  |                  unused                      | remain | VScale | HScale |
  +----------------------------------------------+--------+--------+--------+
                                                   23...16  15...8   7...0

  remainder:   Keeps the VScale remainder ***DESTROYED BY THE OP***
  v-scale:     Vertical scaling factor
  h-scale:     Horizontal scaling factor


  The scale is a fractional representation, using 3 bits for the integer
  part and 5 bits for the fractional part. Or in ASCII-Graphics:

   76543210
   iiifffff

   00100000  or 0x20 is 1.0
   00010000  or 0x10 is 0.5

  The remainder is used by the objectprocessor for the vertical scaling,
  as a memory place. You should initialize it to 0.5 for best results,
  although in a lot of democode it's initialized to 1.0.
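  Converting to and from the 3.5 format, and packing phrase #2 with
  the field positions from the diagram above. A sketch with made-up
  names; it rounds to the nearest 1/32 and clamps to the 8 bit range.

```c
typedef unsigned long long phrase;

unsigned char to_scale(double f)
{
    double v = f * 32.0 + 0.5;             /* round to nearest 1/32 */
    if (v < 0.0)   v = 0.0;
    if (v > 255.0) v = 255.0;              /* largest value 7.96875 */
    return (unsigned char)v;
}

double from_scale(unsigned char s) { return s / 32.0; }

phrase make_scale_p2(unsigned char hscale, unsigned char vscale,
                     unsigned char remainder)
{
    return ((phrase)remainder << 16) | ((phrase)vscale << 8) | hscale;
}
```

  A 1:1 object with the recommended remainder would be
  make_scale_p2(to_scale(1.0), to_scale(1.0), to_scale(0.5)).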


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
14.                     The elusive GPU-object
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0 (1 of 1):

 63       56        48        40       32       24       16        8    3   0
  +--------^---------^---------^--------^--------^--------^--------^----+---+
  |                           datafield                                 |010|
  +---------------------------------------------------------------------+---+

   The GPU gets an interrupt; it is believed that the OP is not halted
   by this. You might want to stuff some information into the
   datafield, which the GPU could then read from the OP registers (OB).
   The GPU can then be used to control OP program flow using OBF
   ($F00026) and branch object condition 3.


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
15 You can also look at the objects in terms of C structs; this is how
   they would look:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

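/* DON'T USE THESE BITFIELDS WITH ANYTHING OTHER THAN A
   ***GOOD*** C-COMPILER AND A MOTOROLA (BIG-ENDIAN) PROCESSOR.
   Bitfields of type unsigned long long are a compiler extension
   (GCC accepts them). The fields are listed from bit 63 down to
   bit 0, which matches the diagrams above only if the compiler
   allocates bitfields MSB first, as GCC does on the 68K.
*/


   typedef unsigned long long phrase;     /* 64 bits */


   typedef struct
   {
      phrase  data:21;            /* 63-43, phrase address        */
      phrase  link:19;            /* 42-24, phrase address        */
      phrase  height:10;          /* 23-14                        */
      phrase  ypos:11;            /* 13-3                         */
      phrase  type:3;             /*  2-0, 0 = bitmap, 1 = scaled */
   } bitmap_obj_phrase_0;


   typedef struct
   {
      phrase  unused:9;           /* 63-55 */
      phrase  firstpix:6;         /* 54-49 */
      phrase  flags:4;            /* 48-45 */
      phrase  index:7;            /* 44-38 */
      phrase  iwidth:10;          /* 37-28 */
      phrase  dwidth:10;          /* 27-18 */
      phrase  pitch:3;            /* 17-15 */
      phrase  depth:3;            /* 14-12 */
      phrase  x_pos:12;           /* 11-0  */
   } bitmap_obj_phrase_1;


   typedef struct
   {
      phrase  unused:40;          /* 63-24 */
      phrase  remainder:8;        /* 23-16 */
      phrase  v_scale:8;          /* 15-8  */
      phrase  h_scale:8;          /*  7-0  */
   } scale_obj_phrase_2;


   typedef struct
   {
      phrase  unused_hi:21;       /* 63-43 */
      phrase  link:19;            /* 42-24 */
      phrase  unused_lo:8;        /* 23-16, maybe index to register ? */
      phrase  conditioncode:2;    /* 15-14 */
      phrase  vcnt:11;            /* 13-3  */
      phrase  type:3;             /*  2-0, 3 = branch */
   } branch_obj_phrase_0;


   typedef struct
   {
      phrase  unused:60;          /* 63-4 */
      phrase  flag:1;             /* 3, copied to OBF */
      phrase  type:3;             /* 2-0, 4 = stop */
   } stop_obj_phrase_0;

   typedef struct
   {
      phrase  datafield:61;       /* 63-3 */
      phrase  type:3;             /*  2-0, 2 = GPU */
   } gpu_obj_phrase_0;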


   typedef struct
   {
      stop_obj_phrase_0 p0;
   } stop_obj;


   typedef struct
   {
      branch_obj_phrase_0  p0;
   } branch_obj;


   typedef struct
   {
      gpu_obj_phrase_0  p0;
   } gpu_obj;


   typedef struct
   {
      bitmap_obj_phrase_0  p0;
      bitmap_obj_phrase_1  p1;
   } bitmap_obj;


   typedef struct
   {
      bitmap_obj_phrase_0  p0;
      bitmap_obj_phrase_1  p1;
      scale_obj_phrase_2   p2;
      /* need one padding phrase ? */
   } scale_obj;




SMALL DISCUSSION:
   Since the object processor walks the object list for each
   scanline, you should consider the following:

   If you have 64 bitmap objects in your object list, a
   vertical rez of 240 lines and a refresh rate of 60Hz,
   the Objectprocessor is pulling

   60 Hz * 240 lines * 64 objects * 2 phrases = 1.8 Mio phrases/s
   ~ 14.7 Mio bytes/s for the object processor list alone!
      (ca. 14% of the system's bandwidth)
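   The estimate above as a function, so you can plug in your own
   numbers. It assumes, as the text does, that every object's header
   phrases are fetched on every displayed scanline.

```c
unsigned long long list_phrases_per_sec(unsigned hz, unsigned lines,
                                        unsigned objects, unsigned phrases_per_obj)
{
    return (unsigned long long)hz * lines * objects * phrases_per_obj;
}
```

   list_phrases_per_sec(60, 240, 64, 2) gives 1843200 phrases/s, or
   14745600 bytes/s at 8 bytes per phrase.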


   If you figure you're using 128x128x16bit sprites fully visible,
   you're doing:

   128*128*16 bits / 64 bits  = 4096 phrases per sprite
   64 sprites at 60 Hz        = 3840 sprites/s
   yields 15728640 phrases/s or ~126 Mio bytes/s (120 MB/s)
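   The sprite data rate as a function (fully visible sprites, every
   pixel fetched once per frame, name is ours):

```c
unsigned long long sprite_phrases_per_sec(unsigned w, unsigned h, unsigned bpp,
                                          unsigned sprites, unsigned hz)
{
    unsigned long long per_sprite = (unsigned long long)w * h * bpp / 64;
    return per_sprite * sprites * hz;
}
```

   sprite_phrases_per_sec(128, 128, 16, 64, 60) reproduces the
   15728640 phrases/s above, i.e. 125829120 bytes/s.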

   So it is fairly easy to unknowingly saturate the bus with
   a nice object list. (TEST THIS, possibly the OP is smart
   enough to detect when the scanline is needed by the video
   chip and stops processing the object list.)

   It should be obvious that non-"truecolor" sprites still make
   lotsa sense, when you're using the OP heavily.

   It would have been better, in our opinion, if Atari had used a
   small 2-Kbit hitbuffer (or single bit Z-Buffer) and reversed
   the object order, so that the nearest object comes first and
   the background last in the object list.

   With such a slightly more complicated scheme, the OP could
   run at a rather constant:

      hrez * vrez * refresh * average_bits_per_pixel
      ---------------------------------------------- phrases/s
               64
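   The formula for this hypothetical hit-buffer OP (front-to-back list,
   each screen pixel fetched once) as C; the name is ours:

```c
unsigned long long hitbuffer_phrases_per_sec(unsigned hrez, unsigned vrez,
                                             unsigned refresh, unsigned avg_bpp)
{
    return (unsigned long long)hrez * vrez * refresh * avg_bpp / 64;
}
```

   For a 320x240, 60 Hz, 16 bit screen that is 1152000 phrases/s, no
   matter how many objects overlap.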


   If it is true that the OP has on average one scanline time to
   prepare the linebuffer, we can do a quick estimate how complex
   such a line can be:


   NTSC
         30 Hz refresh rate (2 fields of 1/60 s each)
         525 lines per frame

   Therefore 525*30 = 15750 lines/s

      13.3 Mio phrases/s / 15750 lines/s ~ 844 phrases / scanline
   or ~ 3400 16 bit (truecolor) pixels / scanline

   this means that on a 320 pixel display you can have approximately
   ten layers of overlapping truecolor parallax (sans sprites)!!
   Or if you have a 320 pixel background, you can have about 80 
   32 bit wide truecolor sprites on the same scanline.
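   The per-scanline budget as C, using the 13.3 Mio phrases/s total
   bus figure assumed above (function names are ours):

```c
double phrases_per_scanline(double bus_phrases_per_sec, double lines_per_sec)
{
    return bus_phrases_per_sec / lines_per_sec;
}

double pixels_per_scanline(double phrases, unsigned bpp)
{
    return phrases * 64.0 / bpp;          /* 64 bits per phrase */
}
```

   phrases_per_scanline(13.3e6, 15750.0) is about 844, and at 16 bits
   per pixel that is about 3378 pixels, a bit more than ten 320 pixel
   layers.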

   It's doubtful that you'll reach these limits...

   Since the OP with a 320x200 rez is pulling data only on 200 lines
   of 525 scanlines, you can use up (without producing display errors)
   only ~40% of the Jaguar's bus resources this way. Nice!


NEEDED STUFF:
   Need to document the logic for setting up objects that cross
   boundaries (especially the scaled bitmaps).
</PRE>
<HR>
<address><a href="mailto:nat@zumdick.rhein-main.de">Nat! (nat@zumdick.rhein-main.de)</a></address>
<address><a href="mailto:kkp@gamma.dou.dk">Klaus (kkp@gamma.dou.dk)</a></address>
<P>
</BODY>
</HTML>
