Author: Don Marsh & Jim Luther
Year: 1989
Apple II
Technical Notes
_____________________________________________________________________________
Developer Technical Support
Apple IIGS
#70: Fast Graphics Hints
Written by: Don Marsh & Jim Luther September 1989
This Technical Note discusses techniques for fast animation on the Apple IIGS.
_____________________________________________________________________________
QuickDraw II gives programmers a very generalized way to draw something to the
Super Hi-Res screen or to other parts of Apple IIGS memory. Unfortunately,
the overhead in QuickDraw II makes it an unacceptable tool for all but simple
animations. If you bypass QuickDraw II, your application has to write pixel
data directly to the Super Hi-Res graphics display buffer. It also has to
control the New-Video register at $C029, and set up the scan-line control
bytes and color palettes in the graphics display buffer. Chapter 4 of the
Apple IIGS Hardware Reference documents where you can find the graphics
display buffer in memory and how the scan-line control bytes, color palettes,
and pixel data bytes are used in Super Hi-Res graphics mode. The techniques
described in this Note should be used with discretion--we do not recommend
bypassing the Apple IIGS Toolbox unless it is absolutely necessary.
Map the Stack Onto Video Memory
To achieve the fastest screen updates possible, you must remove all
unnecessary overhead from the instructions that perform graphics memory
writes. The obvious method for achieving sequential writes to the graphics
memory uses an index register, which must be incremented or decremented
between writes. These operations can be avoided by using the stack. Each
time a byte or word is pushed onto the stack, the stack pointer is
automatically decremented by the appropriate amount. This is faster than
doing an indexed store followed by a decrement instruction.
But how is the stack mapped onto the graphics memory? The stack can be
located in bank $01 instead of bank $00 by writing to the WrCardRAM auxiliary-
memory select switch at $C005. Bank $01 is shadowed into $E1 by clearing bit
3 of the Shadow register at $C035. Under these conditions, if the stack
pointer is set to $3000, the next byte pushed onto the stack is written to
$013000, then shadowed into $E13000. The stack pointer is automatically
decremented so the stage is set for another byte to be written at $E12FFF.
Warning: While the stack is mapped into bank $01, you may not call
any firmware, toolbox or operating system routines (ProDOS
8 or GS/OS). Don't even think about it.
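The address arithmetic described above can be checked away from the machine. The following is a minimal model in Python, not IIGS code. The names push_byte and push_word are hypothetical, memory is a plain dictionary keyed by (bank, address), and the model assumes the stack pointer stays inside the shadowed Super Hi-Res area of bank $01. It only illustrates where each pushed byte lands.

```python
SHADOW_BANK = 0xE1   # bank that receives the shadowed copy of bank $01 writes

def push_byte(sp, value, memory):
    """Model an 8-bit push: store at $01/sp, mirror to $E1/sp, decrement sp."""
    memory[(0x01, sp)] = value & 0xFF
    memory[(SHADOW_BANK, sp)] = value & 0xFF
    return (sp - 1) & 0xFFFF

def push_word(sp, value, memory):
    """Model a 16-bit push: the 65816 pushes the high byte first, then the low byte."""
    sp = push_byte(sp, value >> 8, memory)
    return push_byte(sp, value, memory)

memory = {}
sp = push_byte(0x3000, 0xAB, memory)   # written to $013000, shadowed to $E13000
sp = push_byte(sp, 0xCD, memory)       # next byte lands at $E12FFF
sp = push_word(sp, 0x5389, memory)     # $53 at $E12FFE, $89 at $E12FFD
```

Because the low byte of a pushed word ends up at the lower address, a word pushed this way appears in graphics memory in the same byte order as a word written with a 16-bit store.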
Unroll All Loops
Another source of overhead is the branch instructions in loops. By
"straight-lining" the code so that it moves up to a scan line's worth of
memory at a time, you avoid branch instructions. The following example shows
this technique.
    lda |164,y      ; accumulator is 16 bits for
    pha             ; best efficiency
    lda |162,y
    pha
    lda |160,y
    pha
In this example, the Y register is used to point to data to be moved to the
graphics memory, and hard-coded offsets from the Y register are used to avoid
register operations between writes.
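Unrolled code of this kind is tedious to type, so it is often generated. The following Python sketch is a host-side helper, not IIGS code, and unrolled_copy is a hypothetical name. It emits LDA/PHA pairs in the syntax of the example above, highest offset first because the stack pointer moves downward.

```python
def unrolled_copy(first_offset, word_count):
    """Return source lines that push word_count words, highest offset first."""
    lines = []
    for i in range(word_count - 1, -1, -1):
        lines.append(f"lda |{first_offset + 2 * i},y")
        lines.append("pha")
    return lines

# Three words starting at offset 160 reproduce the example above.
listing = unrolled_copy(160, 3)
```

A full 160-byte Super Hi-Res scan line takes 80 such pairs, for example unrolled_copy(0, 80).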
Hard-Code Instructions and Data
In desperate circumstances, it may be necessary to remove still more overhead
from the previous code example. You can do this by hard-coding pixel data
into your code instead of loading pixel values from a separate data space and
transferring them to the graphics memory (as the previous example does). If
you are writing an arbitrary pattern of three or fewer constant values to the
screen, for example, the following method is the fastest known:
    lda #val1
    ldx #val2
    ldy #val3
    pha             ; arbitrary pattern of pushes
    phx
    phy
    phy
    phx
In cases where many different values must be written to the screen, pixel data
can be written to the screen using immediate push instructions:
    pea $5389       ; some arbitrary pixel values
    pea $2378
    pea $A3C1
    pea $39AF
Your program can generate this mixture of PEA instructions and pixel data
itself, or it can load pixel data that already has PEA instructions
intermixed (each two-byte pixel word then carries a one-byte opcode, thus
increasing the data size by one half).
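Generating the PEA stream is mechanical. The sketch below, written in Python as a host-side build step (pea_stream is a hypothetical name), turns pixel bytes listed in ascending screen-address order into 65816 machine code. It relies on two facts: the PEA opcode is $F4 with its operand stored low byte first, and the stack grows downward, so the last word of the data must be pushed first. It assumes an even number of bytes.

```python
PEA_OPCODE = 0xF4  # 65816 PEA: push a 16-bit immediate operand

def pea_stream(pixels):
    """Encode pixel bytes (ascending address order) as a run of PEA instructions."""
    if len(pixels) % 2:
        raise ValueError("pixel data must contain an even number of bytes")
    code = bytearray()
    # Walk the words from the highest address down, matching the stack direction.
    for i in range(len(pixels) - 2, -1, -2):
        code += bytes([PEA_OPCODE, pixels[i], pixels[i + 1]])
    return bytes(code)

code = pea_stream(bytes([0x11, 0x22, 0x33, 0x44]))
```

Four data bytes become six code bytes here, which is the one-half size increase mentioned above.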
Be Aware of Slow-Side and Fast-Side Synchronization
Estimating execution speed by counting instruction cycles is always a
challenging task on the IIGS, but it is particularly tricky when one is
writing to the graphics memory. The graphics memory resides in the side of
the IIGS system controlled by the 1 MHz Mega II chip. During every write to
this memory, the fast side of the system, controlled by the Fast Processor
Interface (FPI) chip, must therefore be synchronized with the slow side of
the system, controlled by the Mega II, even if the system is running code at
full native speed. This synchronization is performed automatically and
transparently by the FPI in the IIGS, and it isn't normally of concern to the
programmer. Animation programmers must worry about synchronization delays,
however, because slight changes in graphics update code may change the
frequency of these delays, and hence the speed of the program. In practical
terms, this means that one loop writing data to the graphics memory may run at
the same speed as a second loop with a higher cycle count.
A careful analysis of the synchronization problem leads to the following
tables, which are useful for roughly estimating the speed attained by
different pieces of code. Each entry is based on the number of fast cycles
consumed between consecutive write instructions. For example, a series of PEA
instructions requires five cycles for each 16-bit write. An 8-bit PHA
instruction followed by a taken branch requires six cycles for each 8-bit
write.
Fast Cycles per Write (byte)     Actual Speed (microseconds/byte)
________________________________________________________________
        3 to 5                              2.0
        6 to 8                              3.0
        9 to 11                             4.0
________________________________________________________________

Fast Cycles per Write (word)     Actual Speed (microseconds/word)
________________________________________________________________
        4 to 6                              3.0
        7 to 8                              4.0
        9 to 11                             5.0
________________________________________________________________
The times given in the tables apply only if the same number of fast cycles
separate each consecutive write operation. The first write operation in a set
of write instructions usually takes longer than subsequent writes, because the
potentially long synchronization operation is accomplished at that time.
Memory refresh adds further, unpredictable delays, and it delays byte-wide
writes more often than word-wide writes. Therefore, it is usually preferable
from a speed standpoint to use word-wide writes to the graphics memory.
For more information on synchronization cycle timing within the IIGS, see
Chapter 2 of the Apple IIGS Hardware Reference and Apple IIGS Technical Note
#68, Tips for I/O Expansion Slot Card Design.
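The two tables above can be turned into a small lookup for rough estimates. This Python sketch is a host-side aid, not IIGS code. The name microseconds_per_write is hypothetical, and the figures are copied from the tables, so they carry the same caveats (evenly spaced writes, no refresh delays).

```python
# (lowest cycles, highest cycles, microseconds per write), copied from the tables
BYTE_WRITES = [(3, 5, 2.0), (6, 8, 3.0), (9, 11, 4.0)]
WORD_WRITES = [(4, 6, 3.0), (7, 8, 4.0), (9, 11, 5.0)]

def microseconds_per_write(fast_cycles, word):
    """Look up the estimated time per write for evenly spaced consecutive writes."""
    table = WORD_WRITES if word else BYTE_WRITES
    for low, high, microseconds in table:
        if low <= fast_cycles <= high:
            return microseconds
    raise ValueError("cycle count is outside the published tables")

pea_time = microseconds_per_write(5, word=True)    # series of PEA instructions
pha_time = microseconds_per_write(6, word=False)   # 8-bit PHA plus a branch
```

Both cases come out at the same time per write, but the PEA series moves two bytes in that time, so its cost per byte is half that of the PHA loop.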
Use Change Lists
The timing data given in the preceding section shows that it is not possible
to perform full-screen updates in the time it takes the IIGS to scan the
entire screen. At 3.0 microseconds per word, one scan of about 1/60 second
allows roughly 5,500 word writes, or about 11,000 of the 32,000 bytes of
pixel data. In practice, it would be difficult to update more than one-third
of the screen in one scan time. Therefore, it is necessary to update only
those pixels which have actually changed from the previous frame of animation.
One method of doing this is to precalculate the pixels which change by
comparing each frame against the preceding frame. For interactive animation,
fast methods must be developed for predicting which areas of the screen must
be updated (a determination of the exact pixels might require more computation
than the actual update would require).
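One way to precalculate such a list is to compare two frames word by word. The sketch below is in Python and is meant to run ahead of time on a development machine, not on the IIGS. It assumes each frame is held as the 32,000 bytes of Super Hi-Res pixel data, and change_list is a hypothetical name.

```python
def change_list(previous, current):
    """Return (byte_offset, new_word) for every 16-bit word that differs."""
    if len(previous) != len(current) or len(current) % 2:
        raise ValueError("frames must have the same even length")
    changes = []
    for offset in range(0, len(current), 2):
        if previous[offset:offset + 2] != current[offset:offset + 2]:
            # Little-endian word, as the 65816 stores it in memory.
            changes.append((offset, current[offset] | (current[offset + 1] << 8)))
    return changes

old_frame = bytes(32000)                 # 200 scan lines of 160 bytes, all zero
new_frame = bytearray(old_frame)
new_frame[160] = 0x12                    # change the first word of scan line 1
new_frame[161] = 0x34
changes = change_list(old_frame, bytes(new_frame))
```

The resulting list holds one entry per changed word, so its length gives a direct estimate of how many writes the frame requires.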
Using the Video Counters
To achieve "tear-free" screen updates, it is necessary to monitor the location
of the scan-line beam when writing to graphics memory. As described in Apple
IIGS Technical Note #39, Mega II Video Counters, the VertCnt and HorizCnt Mega
II video counter registers at $C02E-$C02F allow you to determine which scan
line is currently being drawn.
By using only the VertCnt register and ignoring the low bit of the 9-bit
vertical counter stored in HorizCnt, you can determine to within two scan
lines which scan line is currently being drawn. The VertCnt video counter
contains the number of the current scan line divided by two (discarding any
remainder), offset by $80. For example, if the scan-line beam is currently
drawing either scan line four or five, VertCnt contains $82 (4/2 + $80 or
5/2 + $80, rounded down). Vertical blanking happens during VertCnt values
$7D through $7F and $E4 through $FF.
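The VertCnt encoding described above can be expressed as a short decoding function. This is a Python model for checking the arithmetic, not IIGS code. The name scan_lines_for is hypothetical, and the argument is assumed to be a value already read from VertCnt.

```python
def scan_lines_for(vertcnt):
    """Return the pair of scan lines a VertCnt value stands for, or None in blanking."""
    if 0x80 <= vertcnt <= 0xE3:          # visible scan lines 0 through 199
        first = (vertcnt - 0x80) * 2
        return (first, first + 1)
    return None                          # $7D-$7F and $E4-$FF: vertical blanking

lines = scan_lines_for(0x82)             # the example from the text
```

Here lines is the pair (4, 5), matching the example in the text, and any value in the blanking ranges yields None.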
Clever updates can modify twice as many pixels on the screen by sacrificing
some smoothness, running at 30 frames per second instead of 60. The technique
is as follows:
    1.  Wait for the scan-line beam to reach the first scan line.
    2.  Start updates from the top of the screen, being careful not to
        pass the scan-line beam.
    3.  Continue updates while the scan-line beam progresses toward the
        bottom of the screen, then goes into vertical blanking, then
        restarts at the top of the screen.
    4.  Finish the update before the scan-line beam catches the update
        point.
Careful use of this method allows a frame to be updated during two scans of
the screen instead of just one. If you are not sufficiently careful, tearing
results.
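The timing constraint behind these steps can be put in rough numbers. The Python sketch below is a simplified model, not a measurement. It assumes the update proceeds at a constant rate, measured in screen lines updated per scan line drawn, and it takes 262 scan lines per full scan (VertCnt counts $7D through $FF, which is 131 values of two lines each). The name two_scan_update_is_safe is hypothetical.

```python
VISIBLE_LINES = 200    # Super Hi-Res scan lines
LINES_PER_SCAN = 262   # visible lines plus vertical blanking

def two_scan_update_is_safe(rate):
    """rate: screen lines updated per scan line drawn, starting when the beam is at line 0."""
    if rate <= 0:
        return False
    for line in range(VISIBLE_LINES):
        started_at = line / rate          # time, in scan lines, the update reaches this line
        finished_at = (line + 1) / rate   # time the update leaves this line
        if started_at < line:             # update ran ahead of the beam on the first scan
            return False
        if finished_at > LINES_PER_SCAN + line:   # beam caught up on the second scan
            return False
    return True

safe_at_half_speed = two_scan_update_is_safe(0.5)
```

Under these assumptions the update may run as slowly as about 0.43 lines per scan line (200/461) and still stay ahead of the beam's second pass, but it must never run faster than the beam during the first pass.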
Note: The Apple IIGS main logic board Mega II-VGC registers and
interrupts are not synchronous to the Apple II Video Overlay Card
video and therefore should not be used for time synchronization
with the Apple II Video Overlay Card video output. However, they
can be used for time synchronization with the Apple IIGS video
output. See the Apple II Video Overlay Card Development Kit for
more information.
Interrupts
It is not possible to support interrupts while sustaining a high graphics
update rate, unless jerkiness or tearing is acceptable. Be aware that many
system activities such as GS/OS and AppleTalk depend on interrupts and do not
function if interrupts are disabled.
Further Reference
_____________________________________________________________________________
o Apple IIGS Firmware Reference
o Apple IIGS Hardware Reference
o Apple II Video Overlay Card Development Kit
o Apple IIGS Technical Note #39, Mega II Video Counters
o Apple IIGS Technical Note #40, VBL Signal
o Apple IIGS Technical Note #68, Tips for I/O Expansion Slot Card Design