
This document describes the XAA (XFree86 Acceleration Architecture),
which is the new acceleration interface for the SVGA server (but not
limited to the SVGA server).

This code is not at all dependent on the SVGA server, but does
assume linear addressing at > 8bpp. It might be extendable to an
mi-based set up for configurations that can't use cfb. There are
still configurations around that need banked support for 16bpp.

To use the new acceleration interface, write low-level functions
like those in sampledrv.c and ark_accel.c, and call the
ChipInitAccel() function before screen initialization (from FbInit
in a SVGA driver, for example).

You're welcome to comment, test, debug, or add to this code.

Have fun...

Harm Hanemaayer
H.Hanemaayer@inter.nl.net

Here's a list of known problems (roughly in order of importance). If you
can confirm a problem using the latest version, please do so.

- I've seen crashes when using Netscape related to stipple functions.
  These might be caused by the "fall-back" logic still getting it
  wrong.  It seems to be triggered by a call of
  vga8256FillRectTransparentStippled32. Fixed by mod 186?

- The "NonTE" text acceleration triggers core dumps (related to an invalid
  fall-back function scheme in ValidateGC). It might also trigger lock-ups
  (which would point towards a problem in NonTE text color expansion).
  These functions are currently disabled.

- Color expanded (monochrome) 8x8 pattern may not be working correctly
  yet in all cases (not fully tested).

- The disabled non-terminal emulator font acceleration is suspect. I
  don't think it handles horizontally overlapping characters correctly
  (no visible evidence yet) in the xf86DrawNonTETextScanline function.
  I don't know enough about the X font parameters to correctly
  implement it.

- The SCANLINE_PAD_BYTE and SCANLINE_NO_PAD text transfer code for CPU
  to screen color expansion has not been fully tested, nor has the
  FIXED_BASE support.

- The pattern fill primitives are taken to have the same graphics operation
  restrictions (planemask, rop etc) as ScreenToScreenCopy.

- The support for TRIPLE_BITS_24BPP has improved, but it has not yet
  been fully tested.

- For the color expansion implementation of stipples, the graphics
  operation restrictions of color expansion are not honoured; the
  CopyArea ones are used instead. This is now sort-of fixed, but it has
  not been tested in relevant cases.

- Instead of not accelerating GXinvert operations that would normally
  access the source, we could instead do a GXinvert FillRectSolid.

When the server crashes, run 'gdb -c core XF86_SVGA' and print a
back-trace ('backtrace').

Change Log:

218. As well as GXinvert, also avoid GXclear, GXnoop, and GXset.
217. Don't accelerate functions that use source bitmap data (such as text,
     stipples, bitmaps) when the raster-op is GXinvert.
216. Truncate pixel values to pixel depth in ValidateGC.

XFree86 3.2v
215. Rotate monochrome patterns stored in video memory in opposite
     direction (David Bateman). I doubt whether this is correct.
214. Add FullPlanemask field to xf86AccelInfoRec, and use it for planemask
     checks.
213. Use GC alu instead of cfb reduced "rrop" when checking raster-op
     restrictions.
212. Add secondary restriction flag hack for stippled rectangles to
     correctly handle different restrictions for pixmap cache and color
     expansion stipple acceleration.
211. Move macros for graphics operations restriction checks from
     xf86gcmisc.c to xf86local.h.
210. Fix the check for server resets in xf86initac.c and xf86scrin.c (use
     serverGeneration instead of xf86Resetting).

XFree86 3.2u
209. Fix monochrome pattern stored in video memory with PROGRAMMED_ORIGIN
     and SCREEN_ORIGIN (Corin Anderson).

XFree86 3.2s
208. Respect CapStyle when using TwoPointLine for a non-clipped segment.
207. Fix line clipping when hardware clipping is used with multiple
     clipping regions (Xavier Ducoin).
206. Add a hack to counter cfb cheating in PolyGlyphBlt when it does not
     call ValidateGC when changing the foreground color to fill in the
     background (affecting RGB_EQUAL).
205. Remove left-over fall-back tile function setting code in xf86gcmisc.c
     that may have caused problems.
204. At ValidateGC time, take note of background color changes for
     evaluation of RGB_EQUAL restrictions.
203. Add support for a monochrome pattern with PROGRAMMED_BITS that
     needs to be rotated in software, so that all possible monochrome
     pattern variations are now supported.
202. Add NO_TEXT_COLOR_EXPANSION flag.
201. Fix bitmap (CopyPlane1ToN) color expansion acceleration at 24bpp with
     TRIPLE_BITS_24BPP defined.
200. Fix support for ScanlineScreenToScreenColorExpand with
     TRIPLE_BITS_24BPP defined.
199. Fix the case of a monochrome 8x8 pattern stored in video memory.
     The code was not consistently assuming that the patternx
     coordinate is in units of "bits" (David Bateman).
198. Invalidate the pixmap cache when VT-switching back (suggested by
     Andrew Vanderstock).
197. If color expansion is used for stipples, say so in the start-up
     messages.
196. Indicate in start-up messages whether 8x8 pattern fill is actually
     usable.
195. Better RGB_EQUAL checks for text acceleration with TRIPLE_BITS_24BPP
     (David Bateman).
194. Do not accelerate lines with non-FillSolid fill style. Stippled lines
     were rendered incorrectly as solid lines.

XFree86 3.2r
193. Check RGB_EQUAL for text acceleration with TRIPLE_BITS_24BPP
     (David Bateman).
192. Honour RGB_EQUAL when deciding CopyPlane1To24 acceleration in
     xf86plane.c.
191. Fix bugs in handling of left edge in CPU-to-screen color expansion
     of bitmaps.
190. When the server resets, don't execute the start-up benchmarks.
189. When the server resets, don't execute the main part of the
     xf86GCInfoRec and xf86AccelInfoRec initialization code, which
     depends on default values for some fields.
188. When there is any kind of accelerated stippled rectangle fill, also
     use it for stippled spans.
187. Add 10x10 CPU-to-screen color expansion benchmark.
186. Remove left-over broken fall-back stipple function setting code
     in xf86gcmisc.c, probably fixing crashes.
185. Implement "no_pixmap_cache", and new "xaa_benchmark" and
     "xaa_no_color_exp" server flags.
184. Only print detailed messages when xf86Verbose is TRUE.
183. Implement color expansion acceleration of stipple-filled rectangles
     in xf86stip.c. Requires SCANLINE_PAD_DWORD for CPU-to-screen color
     expansion, and does not support TRIPLE_BITS_24BPP.
182. Fix SCANLINE_NO_PAD CPU-to-screen color expansion by not defining
     the flag definition as zero (Koen Gadeyne).
181. Add LEFT_EDGE_CLIPPING_NEGATIVE_X color expansion flag.
180. Fix potential bug in handling of LEFT_EDGE_CLIPPING.
179. Add new file xf86tables.c with byte expansion tables for
     TRIPLE_BITS_24BPP.
178. Support TRIPLE_BITS_24BPP for bitmap color expansion, with the
     exception of non-DWORD scanline padding of CPU-to-screen color
     expansion.
177. Fix a probable bug in CPU-to-screen bitmap color expansion in
     MSB-first mode without left edge clipping.
176. When checking for hardware pattern usage for tiles, prefer the
     color expand (monochrome) pattern.
175. Improve TRIPLE_BITS_24BPP support with color expansion enabled for
     text using screen-to-screen color expansion or CPU-to-screen
     with SCANLINE_PAD_DWORD.
174. Add TRANSPARENCY_GXCOPY graphics operation flag, and take it into
     consideration for ScreenToScreenCopy with transparency.

XFree86 3.2q
173. Fix infinite loop in CPU_TRANSFER_BASE_FIXED color expansion
     (Xavier Ducoin).
172. Fix color expansion benchmark when TRIPLE_BITS_24BPP is defined
     (David Bateman).
171. When using the monochrome pattern, use an existing cache entry when
     the stipple is the same but the colors are different.
170. Fix bugs in pattern handling code.
169. Fix a bug in the MSB-first version of the Pentium-optimized text
     bitmap transfer functions (David Bateman).
168. Fully implement detection of tiles that only use two colors in order
     to use a monochrome (color-expand) hardware pattern.
167. Potentially fix the case of rotated monochrome patterns stored in
     video memory.
166. Change start-up messages a little.
165. Add support for 8x8 hardware pattern with SCREEN_ORIGIN in addition
     to PROGRAMMED_ORIGIN, and take into account the bit order when
     PROGRAMMED_BITS is defined (Radek).
164. Mark pixmaps that are found to be unsuitable for caching.
163. Make caching of transparent stipples possible for chips that don't
     have ScreenToScreenCopy with transparency but do have a color-expand
     pattern fill that supports transparency.
162. Add low-level benchmarks for 10x10 pattern fill.
161. Use 64-bit access on DEC alpha in CPU-to-framebuffer bandwidth
     benchmark.
160. Add the HARDWARE_PATTERN_MONO_TRANSPARENCY flag to further
     differentiate between mono pattern and regular color expansion.
159. Use depth instead of bitsperpixel in planemask check for line
     rectangles.
158. Reorganize the "cfbGetLongWidthAndPointer" function.
157. Support the HARDWARE_PATTERN_PROGRAMMED_ORIGIN for non-color
     expanded patterns (including patterns that must be aligned on a 64
     pixel boundary) and for color-expanded patterns that are stored
     in video memory.
156. Various fixes for the cfb8 (non-vga256) support.
155. At start-up, display a message when no acceleration primitives
     are defined.
154. Add ScratchBufferBase field to support scanline screen-to-screen
     color expansion without a linear framebuffer.
153. Support multiple buffers for scanline screen-to-screen color
     expansion. Adds the PingPongBuffers field.

XFree86 3.2n
152. Add HARDWARE_PATTERN_BIT_ORDER_MSBFIRST flag to differentiate between
     mono hardware pattern and regular color expansion.
151. Fix missing fields in xf86defs.c.
150. Delay the actual initialization of the pixmap cache until general XAA
     initialization. A server InfoRec field and pixmap cache memory boundary
     fields are added to the xf86AccelInfoRec. This also eliminates the
     dependency of the XAA code on vga256 (the SVGA server).
149. Lift the SCANLINE_PAD_DWORD requirement in xf86initac.c for the
     enabling of text color expansion.
148. Add functions in xf86expblt.c for whole text bitmap transfer in the
     case of BYTE padding or no padding at the end of scanlines, and
     support this in xf86text.c.
147. Add a cfb8-based layer to support stand-alone servers not using
     vga256.
146. Enable fixed-base CPU-to-screen color-expansion for bitmap and TE text.
145. Add FIXEDBASE support to color expansion functions in xf86expblt.c.
144. Add the HARDWARE_PATTERN_PROGRAMMED_BITS and HARDWARE_PATTERN_
     PROGRAMMED_ORIGIN flags, and implement 8x8 mono pattern code
     used when both flags are set.
143. Don't use scanline-byte-padded CPU to screen color expansion since
     it doesn't work.
142. Use MSB-first versions of Pentium-optimized text transfer functions
     when required.
141. Finally fix the flawed fall-back function schemes, potentially fixing
     crashes associated with some span and rectangle fills. Still not stable.
140. More heavily unroll the CPU to framebuffer benchmark code (mainly
     for the Cyrix 6x86).

XFree86 3.2g
139. Enable 24bpp CopyPlane acceleration (Dirk).

Version of 17 December 1996 (XFree86 3.2f)
138. Enable the Pentium-optimized text transfer functions for 6 and 8-pixel
     wide fonts.
137. Fix a problem with accelerated horizontal and vertical lines clashing
     with framebuffer lines.
136. In some places, fix the "NO_PLANEMASK" check to only check bits up to
     the actual depth (rather than PMSK).
135. Avoid recursive xf86miFillRectStippledFallBack call.
134. When the new HARDWARE_PATTERN_MOD_64_OFFSET flag is set, do use
     the hardware pattern when the framebuffer width guarantees
     correct alignment.
133. Avoid unaligned accesses in xf86expblt.c for DEC Alpha. Not tested.
     What's a mem_barrier? Do we need them?
132. Do not use 8x8 pattern stipple fill at 24bpp because there's no
     pixmap-to-pixmap CopyPlane1ToN primitive.
131. Allow CopyPlane1ToN to be accelerated at 24bpp.
130. Remove messages printed when xf86miStippleFallBack is called.
129. Add FillRectSolid graphics operation restriction flags to the line
     draw restrictions in xf86initac.c, fixing a problem with planemask
     restrictions not being honoured for some lines.
128. Add some Pentium-optimized text bitmap transfer functions in
     xf86txtblt.s, but they are not used yet.

XFree86 3.2e
127. ImakefileBPP renamed to Imakefile.BPP

Version 0.4f (XFree86 3.2d) (28 November 1996)
126. Fix typo in xf86frect.c, pixmap-cache re-enabled (Alan).
125. Disable Non-TE text acceleration.
124. Add text fall-back functions in xf86gcmisc.c.
123. Fix ImakefileBPP.

Version 0.4e (XFree86 3.2c) (24 November 1996)
122. Fix some problems with drawing of Non-TE text strings.
121. Fix compilation problem in xf86spans.c (Takaaki Nomura).
120. Add sanity checks in PolyFillRect and FillSpans for <= 0 specified
     rects or spans.
119. Make sure the cfb fall-back text functions are initialized correctly.
     This problem showed up when non-TE text acceleration was added.
118. Add the TWO_POINT_LINE_ERROR_TERM flag, but don't implement it yet.
117. Implement a stipple bitmap scanline function in xf86expblt.c for
     future use in color expansion stipple acceleration.
116. Implement color expansion text acceleration for non-terminal emulator
     fonts. Fix non-TE text scanline function in xf86expblt.c.
115. Add NO_SYNC_AFTER_CPU_COLOR_EXPAND flag.
114. When cfb MatchCommon is successful in ValidateGC, make sure the
     devPrivate.val is still correct. Fixes memory leak.
113. Fix bug in xf86PolyFillRect. This does not fix the pixmap cache.

Version 0.4d (XFree86 3.2a) (18 November 1996)
112. Prepare for integration into source tree (hw/xfree86/xaa/*).
111. Move the declaration of xf86PixmapIndex into xf86initac.c.
110. Screeninit functions renamed; vgabpp.h renamed to xf86scrin.h.
109. Cosmetic changes in preparation for integration into source tree.
108. Rename xf86gc.h to xf86xaa.h, and modify some long filenames.
107. In ValidateGC, correctly handle the case of the drawable of an
     on-screen GC being changed to a pixmap.
106. Fix a problem with byte-padded CPU to screen color expansion in
     xf86bitmap.c.
105. Initialize CPUToScreenColorExpandRange to default value of 64K
     if it is not defined.
104. Fix missing cfb stipple function mappings in vga256map.h.

Version 0.4c (15 November 1996)
103. Really fix the CPU-to-screen color expansion benchmark.
102. Disable the accidentally enabled debugging on-screen pixmap cache.

Version 0.4b (15 November 1996)
101. Add the UsingVGA256 flag to the xf86AccelInfoRec, and use this
     to adjust the address pointer for low-level line fall-backs for
     vga256 so that the non-bank checking versions will be used when
     linear addressing is enabled (implemented in cfb8GetLongWidthAndPointer
     in xf86im.c).
100. Add general line acceleration for chips that can only accelerate
     horizontal/vertical lines using FillRectSolid and for chips that
     only have TwoPointLine without fool-proof hardware clipping.
 99. Fix crash with line rectangles when the raster-op is not GXcopy.
 98. Change the 8x8 pattern benchmark a little.
 97. Add an aligned screen copy (scroll) test to the low-level benchmarks,
     and remove the transparent color expansion tests.
 96. Fix related type warnings in xf86bitmap.c (Radek).
 95. Really fix the initialization of CPUToScreenColorExpandEndMarker
     (Radek). 
 94. Fix the initialization of CPUToScreenColorExpandEndMarker in
     xf86initacl.c
 93. Fix problems with small patterns when using 8x8 hardware pattern fill.
 92. Fix for CopyPlane1to32 (resolves olvwm crash at 32bpp).
 91. Fix the CPU to screen color expansion benchmark (Radek).
 90. Use the accelerated FillPolygonSolid from the GCInfoRec in ValidateGC.
 89. In xf86orect.c, use cfbGCGetPrivate().
 88. Add monochrome 8x8 tile detection (not used yet).
 87. Fix external byte_reversed declaration in xf86expblt.c and
     xf86pcache.c (fixes problem with 8x8 color expanded pattern).
 86. Fix xf86expblt.c inline asm for different OSs (Takaaki Nomura).
 85. Support xf86bench.c on different OSs (Akio Morita).
 84. Fix a bug in the color expanded 8x8 pattern code.
 83. In ReduceTileToSize8, don't give up when not using 8bpp.
 82. Cosmetic changes to sampledrv.c.
 81. Improve the start-up messages.
 80. In the benchmark routines, avoid memset().
 79. Lift the LSBFIRST requirement for buffered screen-to-screen color
     expansion.

Version 0.4a (7 November 1996)
 78. Correct xf86AccelInfoRec.BitsPerPixel for 24bpp.
 77. Add ONLY_LEFT_TO_RIGHT_BITBLT for chips that only support screen-to-
     screen BitBLTs with xdir = 1, and support this in CopyArea.
 76. Make decisions in InitAccel about whether specified CPU-to-screen
     color expansion memory range is large enough.
 75. Add FramebufferWidth (equivalent to infoRec.displayWidth).
 74. Add CPUToScreenColorExpandRange, which is taken into account after
     each scanline in text and bitmap color-expansion operations
     (CPUToScreenColorExpandEndMarker is derived from it).
 73. Only allow text CPU-to-screen color expansion with SCANLINE_PAD_DWORD
     defined.
 72. If the CPUToScreenColorExpandBase isn't initialized, use the
     start of the framebuffer as color expansion base address.
 71. Fix bug in DrawNonTETextScanline (a function not yet used).
 70. Add ONLY_TWO_BITBLT_DIRECTIONS for chips that only support screen-to-
     screen BitBLTs with xdir = ydir, and support this in CopyArea.
 69. Add VIDEO_SOURCE_GRANULARITY_DWORD flag for color expansion.
 68. Fix cfbPushPixels8 name mapping for vga256. This was probably causing
     most of the stability problems.

Version 0.4 (5 November 1996)
 67. Add support for color-expanded 8x8 hardware patterns (untested).
 66. Fix a bug in FillSpansSolid that caused some spans to be drawn
     at the wrong position (Radek).
 65. In FillSpansSolid, correctly handle the case of no spans remaining
     after clipping.
 64. When doing the raster-op precomputations for cfb in ValidateGC,
     don't clear the flag indicating that the raster-op has changed
     since we must still evaluate accelerated functions.
 63. Correctly modify devPrivate.val when new GC ops are created in
     ValidateGC.
 62. Reduce tiles to 8x8 pixels if possible.
 61. Set the USE_TWO_POINT_LINE flag if appropriate.
 60. Reduce stipples to 8x8 pixels if possible.
 59. Add start-up benchmark timings for low-level primitives.
 58. Reduce stipples and tiles to 8 pixels wide if possible in order to
     use the 8x8 hardware pattern.
 57. Add 8x8 hardware pattern stipples.
 56. Provide a mechanism to call non-accelerated CopyPlane1toN
     directly. Adds CopyPlane1toNFallBack to GCInfoRec.
 55. Add the HARDWARE_PATTERN_ALIGN_64 flag (not supported yet).
 54. Debug the 8x8 hardware pattern.
 53. Guarantee a different transparency color instead of using the GC
     background color when caching transparent stipples.
 52. Add support for 8x8 hardware patterns, and use them for small
     tiles.
 51. Add BitsPerPixel to xf86AccelInfoRec.
 50. Assign fall-back function to xf86AccelInfoRec.ImageWrite if
     necessary for convenience.
 49. Lift the VIDEO_SOURCE_GRANULARITY_PIXEL requirement for indirect
     screen-to-screen text color expansion.
 48. Add an extra set of wide slots to the pixmap cache. Disabled.
 47. Honour ONE_RECT_CLIPPING flag when checking line drawing function.
 46. Support WriteBitmap when only non-transparent color expansion is
     supported.

Version 0.3b (31 October 1996)
 45. Update vga256 patch (vga.c) to force GC validation after a VT-switch.
 44. Support ImageText when only transparent color expansion is supported.
 43. Add PolyText color expansion for TE fonts.
 42. Use devPrivate.val to signal status of GC ops, and use this to
     modify them when required.
 41. Create new GC ops when GC ops are still pointing to a defaults
     structure when modifying GC ops in ValidateGC.
 40. As a stop-gap measure, reset all GC ops and pretend everything in
     the GC has changed when a switch-away is detected in ValidateGC.
 39. Update sampledrv.c.
 38. Disable the pixmap cache if the memory range is wrongly specified
     (Alan).
 37. Fix typo in pixmap cache initialization in sampledrv.c.
 36. Make xf86PolyRectangle use new line drawing functions for vertical
     lines.
 35. Add ErrorTermBits to the xf86AccelInfoRec for re-scaling Bresenham
     error terms when software clipping is used.
 34. Add flags to indicate whether PolySegment is supported with CapNotLast
     using TwoPointLine.
 33. Implement xf86PolyLine/Segment using BresenhamLine or TwoPointLine
     (untested).
 32. Add BresenhamLine and TwoPointLine primitives.
 32. Fix initial coordinates for color-expanded text.
 31. Move function prototypes from xf86gc.h to xf86local.h.
 30. Take into account source offset into first byte of bitmap scanline.
 29. Bitmap with buffered screen-to-screen color expansion now works.
 28. Fix prototypes for intermediate-level text functions.
 27. Fix source overrun problems in xf86DrawBitmapScanline.
 26. Initialize FramebufferBase in ScreenInit.
 25. Implement untested/unused functions for filling 24bpp pixels using
     8bpp mode color expansion in two passes.
 24. Fix color expand flag testing in xf86bitmap.c.

Version 0.3a (28 October 1996)
 23. If tiles are cached but not stipples (but stipples are accelerated),
     be aware of this in xf86PolyFillRect.
 22. Disable updating of the PolyGlyphBlt GC op in ValidateGC because of an
     unresolved problem showing up at > 8bpp.
 21. Implement a better understanding of how GC changes affect the selection
     of cfb and accelerated functions in ValidateGC.
 20. Fix the way ValidateGC handles cfb operations initialized with
     MatchCommon.
 19. Add xf86mapfuncs.h for local functions that are depth-mapped.
 18. Fix bugs accidentally introduced into xf86initacl.c in version 0.3,
     which effectively disabled pixmap caching.
 17. Add untested, unoptimized CopyPlane1to24 (GXcopy, no planemask),
     for use with stipple caching. Doesn't work yet.

Version 0.3 (27 October 1996)
 16. Use framebuffer function for some vertical lines in PolyRectangle.
 15. Fix SaveAreas and RestoreAreas for vga256.
 14. Add PolyLine and PolySegment hooks to the xf86GCInfoRec.
 13. Fix missing cfbPolyFillArc mappings in vga256map.h.
 12. Fix MatchCommon call in ValidateGC. This fixes vga256 operation.
 11. Fix updating of GC ops for text functions during ValidateGC.
 10. Optimize the CopyPlane1to16/32 functions.
  9. Update the docs, and include a sample driver template.

Version 0.2 (26 October 1996)
  8. Re-enable CopyPlane hook.
  7. Fix typo that prevented CopyArea from being accelerated.
  6. Fix confusion over arguments of cfbBitBlt helper function.
  5. Call the correct depth-specific cfbBitBlt helper function.
  4. Fix the coordinates for the transparent stipple mi fall-back.
  3. Fix problem with zero-width spans in FillSpansAsRects.
  2. Disable CopyPlane hook because it doesn't work.

Version 0.1 (25 October 1996)
  1. First logged version. Implements solid filled rectangles, arcs,
     polygons, CopyArea, pixmap caching.
     Untested are line-drawn rectangles, color expansion text, color
     expansion stipple upload, bitmaps.


Overview of XAA
---------------

1.1

Some advantages of this new interface:

- Easier implementation of accelerated functions.
- More efficient use of accelerated functions.
- Code size reduction.
- Source code size reduction (less duplicated code).
- Greater test base for higher level code.
- Improvements can be beneficial for all drivers.

Disadvantages:

- More overhead in ValidateGC.
- Arguably more complex set of acceleration primitives.


1.2	Graphics Operation Flags

GXCOPY_ONLY

    Indicates that the graphics operation only allows a GXcopy
    raster-op (copy source). If this flag is not defined, the graphics
    operation is assumed to be supported with all 16 raster operations.

NO_PLANEMASK

    Indicates that the graphics operation does not allow a write
    planemask. All bits in a pixel are written.

ONE_RECT_CLIPPING

    Indicates that an accelerated function (usually a high-level one that 
    handles clipping) only accepts one clipping rectangle. This may be
    of use for line drawing. [It is only checked for line drawing]

RGB_EQUAL

    Indicates that the graphics operation requires that the red, green,
    and blue bytes of the foreground color (and background color, if
    applicable) are equal. This is useful for 24bpp when the graphics
    coprocessor is used in 8bpp mode, which is often the case since
    most chips have no or only limited support for acceleration at
    24bpp. This way, many operations will be accelerated for the common
    case of "grayscale" colors. It should only be defined for 24bpp.

NO_TRANSPARENCY

    Indicates that the graphics operation does not handle transparency.
    This can be enabled for screen-to-screen copy.

NO_CAP_NOT_LAST

    Indicates that the graphics operation (typically PolySegment) does
    not support omitting the last pixel of a line (CapNotLast).

TRANSPARENCY_GXCOPY

    Indicates that, unlike the case of no transparency, when
    transparency is enabled only the GXcopy raster-op is allowed. This
    is valid only for ScreenToScreenCopy.
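
As an illustration of how restriction flags like these might be tested
before an accelerated primitive is chosen, here is a minimal sketch.
The flag bit values and the function names rgb_bytes_equal() and
op_is_allowed() are hypothetical stand-ins, not the definitions from
the XAA headers; only the value of GXcopy (3) is taken from X.h. The
planemask test looks only at bits up to the pixel depth, in the spirit
of change 136.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define GXCOPY_ONLY   0x01
#define NO_PLANEMASK  0x02
#define RGB_EQUAL     0x04

#define GXcopy 0x3   /* raster-op value as defined in X.h */

/* Nonzero if the red, green and blue bytes of a 24bpp pixel are equal. */
int rgb_bytes_equal(uint32_t pixel)
{
    uint32_t r = (pixel >> 16) & 0xff;
    uint32_t g = (pixel >> 8) & 0xff;
    uint32_t b = pixel & 0xff;

    return r == g && g == b;
}

/*
 * Nonzero if an operation carrying the given restriction flags can be
 * used for this raster-op, planemask and foreground color.
 * depth is the pixel depth in bits (1..32).
 */
int op_is_allowed(unsigned flags, int rop, uint32_t planemask,
                  uint32_t fg, int depth)
{
    /* Mask with one bit set for every plane that actually exists. */
    uint32_t full = (depth >= 32) ? 0xffffffffu : ((1u << depth) - 1u);

    if ((flags & GXCOPY_ONLY) && rop != GXcopy)
        return 0;
    if ((flags & NO_PLANEMASK) && (planemask & full) != full)
        return 0;
    if ((flags & RGB_EQUAL) && !rgb_bytes_equal(fg))
        return 0;
    return 1;
}
```

With these definitions, a "grayscale" foreground such as 0x808080
passes the RGB_EQUAL test at 24bpp, while 0x808081 does not and would
fall back to the unaccelerated path.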


1.3	The AccelInfoRec

Flags

    This is a set of flags that controls some overall parameters for
    the acceleration code.

    BACKGROUND_OPERATIONS

    If enabled, the "simple" acceleration functions are not assumed to
    wait until the graphic coprocessor operation is finished. The
    generic acceleration functions will call Sync() when all operations
    have been done.

    PIXMAP_CACHE

    Use a pixmap cache for tiles and stipples, when the required
    low-level functions (such as ScreenToScreenCopy) are available.

    COP_FRAMEBUFFER_CONCURRENCY

    CPU access to the framebuffer can continue while a screen-to-screen
    coprocessor operation is being executed. This is taken advantage of
    in some color expansion routines when CPU-to-screen color expansion
    is not available, and potentially in some other places.

    DO_NOT_CACHE_STIPPLES

    Do not cache stipples, but instead use the CPU-to-screen color
    expansion routines for stipples. These routines have not yet been
    implemented.

    HARDWARE_CLIP_LINE

    When a general line has to be clipped, use hardware clipping
    (SetClippingRectangle must be defined, and clipping must only
    be active for the single following general line draw).

    USE_TWO_POINT_LINE

    Use two-point lines (TwoPointLine) instead of Bresenham lines for
    general lines. This flag is automatically set if appropriate. It
    should not be set in a driver in any case.

    TWO_POINT_LINE_NOT_LAST

    Indicates that TwoPointLine supports the notlast flag that indicates
    whether the last pixel should be drawn. If this is not supported,
    PolySegment cannot support the CapNotLast CapStyle.

    TWO_POINT_LINE_ERROR_TERM

    Indicates that TwoPointLine supports the optional error term flag
    and parameter that allows the initial error term to be provided
    for software clipped lines.

    HARDWARE_PATTERN_SCREEN_ORIGIN

    Indicates that the baseline origin for hardware 8x8 pattern fills
    is the top left corner of the screen, as opposed to the top left
    corner of the area to be filled. Note that an origin offset feature
    might still be supported.

    HARDWARE_PATTERN_TRANSPARENCY

    Indicates that the hardware 8x8 pattern fill supports transparency
    color compare (does not apply to mono pattern).

    HARDWARE_PATTERN_ALIGN_64

    Indicates that the 8x8 hardware pattern must be stored on a
    64-pixel boundary in video memory, and programmed pattern start
    location must be the start of such a pattern. In the absence of a
    programmable origin, this requires a lot more pre-rotated copies to
    be made, although they should still fit within a 128x128 cache
    area.

    HARDWARE_PATTERN_MOD_64_OFFSET

    Indicates that while the 8x8 hardware pattern must be stored
    aligned on a 64-pixel boundary, the programmed pattern start
    location can in fact include a multiple-of-8-pixels offset, which
    indicates the vertical offset into the pattern. This flag is
    mutually exclusive with HARDWARE_PATTERN_ALIGN_64. If you can also
    specify the horizontal offset, do not use this flag, but instead
    use HARDWARE_PATTERN_PROGRAMMED_ORIGIN.

    HARDWARE_PATTERN_PROGRAMMED_BITS

    Indicates that the monochrome (color expand) 8x8 pattern data must be
    programmed into registers, rather than stored in video memory. This
    is only supported in combination with the following flag.

    HARDWARE_PATTERN_PROGRAMMED_ORIGIN

    Indicates that the hardware pattern supports a programmable origin
    (x and y offsets into the pattern). This is supported for all three
    pattern storage types (programmed monochrome, monochrome in video
    memory and regular (pixel depth) in video memory).

    HARDWARE_PATTERN_BIT_ORDER_MSBFIRST

    Indicates that the monochrome 8x8 pattern data is in MSB-first bit
    order ("Windows-style").

    HARDWARE_PATTERN_MONO_TRANSPARENCY

    Indicates that the monochrome 8x8 pattern supports transparency
    (signalled by a background color equal to -1).

    HARDWARE_PATTERN_NOT_LINEAR

    Indicates that the 8x8 pattern data should not be stored linearly
    in video memory, but rather, as a tiled 8x8 pattern in the cache.

    ONLY_TWO_BITBLT_DIRECTIONS

    Indicates that ScreenToScreenCopy is only allowed with xdir = ydir
    (both -1 or both 1). BitBLTs are converted to smaller BitBLTs with
    supported directions if necessary.

    ONLY_LEFT_TO_RIGHT_BITBLT

    Indicates that ScreenToScreenCopy is only allowed with xdir = 1.
    BitBLTs are converted to smaller BitBLTs with supported directions
    if necessary.

    NO_SYNC_AFTER_CPU_COLOR_EXPAND

    Indicates that a Sync() is not required after a CPU-to-screen color
    expansion operation. Generally, this can be defined if host color
    expansion data is processed by the graphics chip in the same way as
    accelerated graphics commands (it uses the command FIFO).

    NO_TEXT_COLOR_EXPANSION

    Do not use color expansion to accelerate text. Define this if
    color expansion is slower than plain framebuffer for text (which
    might happen with scanline screen-to-screen color expansion,
    when there is little video memory bandwidth but the CPU to
    framebuffer bandwidth is decent).
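
    Several of the pattern flags above come down to simple bit
    manipulation of the 8x8 monochrome pattern: rotating it when the
    origin cannot be programmed, and reversing the bit order within
    each byte for MSB-first hardware. The following is only a sketch
    of that idea, not the XAA code. The function names, the
    one-byte-per-row layout with bit 7 as the leftmost pixel, and the
    direction of rotation are all assumptions; the direction a given
    chip needs may well be the opposite.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n bits (n is taken modulo 8). */
uint8_t rotl8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* Reverse the bit order within a byte (LSB-first <-> MSB-first). */
uint8_t reverse8(uint8_t v)
{
    uint8_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (v & 1));
        v >>= 1;
    }
    return r;
}

/*
 * Rotate an 8x8 monochrome pattern so that output pixel (x, y) is
 * input pixel ((x + xoff) mod 8, (y + yoff) mod 8).
 * Assumed layout: one byte per row, bit 7 is the leftmost pixel.
 * in and out must not point to the same array.
 */
void rotate_mono8x8(const uint8_t in[8], uint8_t out[8],
                    unsigned xoff, unsigned yoff)
{
    unsigned y;

    for (y = 0; y < 8; y++)
        out[y] = rotl8(in[(y + yoff) & 7], xoff);
}
```

    A driver for hardware with the opposite bit order would run each
    rotated row through reverse8() before programming it.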

Sync()

    This function should be defined if BACKGROUND_OPERATIONS is enabled
    (and also if any kind of CPU-to-screen color expansion is used). It
    should wait for all graphics coprocessor operations to finish. It
    also provides an opportunity to clean up the coprocessor state
    after a batch of commands.

SetupForFillRectSolid(color, rop, planemask)

    Sets up the color, raster-op and planemask for a solid rectangle
    fill. It is called once before a batch of "Subsequent" fill
    commands. Currently the restrictions for the operation are set up 
    with xf86GCInfoRec.PolyFillRectSolidFlags.

    Another acceleration command might still be executing when a SetUp
    function is called (assuming BACKGROUND_OPERATIONS). You may have
    to do a Sync() here. In the current XAA code this doesn't happen,
    but it might in the future.

SubsequentFillRectSolid(x, y, w, h)

    This actually fills a rectangle. When writing spans, h will
    be 1. It is usually called many times in a row.

    A key thing to notice here is that the function call overhead
    is "eaten" when performing coprocessor operations "in the
    background" (concurrently with CPU processing). If you need to
    wait for the previous operation to finish before sending the
    commands for the next one, you can do that in this function.
    Generally, you want to avoid querying the chip as much as
    possible since PCI read operations have a devastating effect
    on performance.

    This function is taken advantage of when filling solid rectangles,
    spans, polygons and arcs, and in other places.
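
    The split between the SetUp and Subsequent functions can be
    sketched as follows. This is only an illustration: the FakeChip
    structure and its fields are hypothetical stand-ins for real
    memory-mapped registers, and the parameter types are assumed.

```c
#include <stdint.h>

/* Hypothetical register file standing in for memory-mapped chip
 * registers; a real driver would write to the chip instead. */
typedef struct {
    uint32_t fg_color, rop, planemask;      /* set once per batch */
    uint32_t dst_x, dst_y, width, height;   /* set per rectangle */
    unsigned commands_started;
} FakeChip;

static FakeChip chip;

/* Called once before a batch of fills. */
static void MySetupForFillRectSolid(int color, int rop, unsigned planemask)
{
    chip.fg_color = (uint32_t)color;
    chip.rop = (uint32_t)rop;
    chip.planemask = planemask;
}

/* Called for every rectangle in the batch (h is 1 for spans). */
static void MySubsequentFillRectSolid(int x, int y, int w, int h)
{
    /* A chip without a command queue might have to wait here for the
     * previous operation to finish. */
    chip.dst_x = (uint32_t)x;
    chip.dst_y = (uint32_t)y;
    chip.width = (uint32_t)w;
    chip.height = (uint32_t)h;
    chip.commands_started++;   /* stands in for the write that starts the fill */
}
```

    Only the coordinates and size are programmed per rectangle; the
    color, raster-op and planemask are left untouched between calls.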

SetupForScreenToScreenCopy(xdir, ydir, rop, planemask, transparency_color)

    Set up for a screen-to-screen BitBLT. The transparency color is -1
    when there is no transparency. Transparency is used when drawing
    transparent stipples from the pixmap cache. There are general flags
    (set in xf86AccelInfoRec.Flags) to indicate restrictions for the
    direction of the BitBLT (xdir, ydir); if restrictions exist, the
    generic code converts the blits to allowable blits. Currently the
    other restrictions for the operation are set up with
    xf86GCInfoRec.CopyAreaFlags.

SubsequentScreenToScreenCopy(x1, y1, x2, y2, w, h)

    Perform a screen-to-screen BitBLT. Again often there is
    a batch of commands. Note that (x1, y1) is always the top-left
    corner, regardless of the direction.

    It is used for screen-to-screen area copies (such as scrolling),
    and for the pixmap cache.
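
    Many chips want the starting corner of the blit rather than the
    top-left corner when copying right-to-left or bottom-to-top. A
    minimal sketch of that conversion follows; the helper names are
    hypothetical, and the direction rule shown is just the usual one
    for overlapping copies, not necessarily what the generic code
    does.

```c
typedef struct {
    int srcx, srcy;   /* first source pixel to be read */
    int dstx, dsty;   /* first destination pixel to be written */
} BlitStart;

/* Usual rule for overlapping areas: if the source lies to the left of
 * (above) the destination, copy right-to-left (bottom-to-top). */
static void choose_blit_dir(int x1, int y1, int x2, int y2,
                            int *xdir, int *ydir)
{
    *xdir = (x1 < x2) ? -1 : 1;
    *ydir = (y1 < y2) ? -1 : 1;
}

/* (x1, y1) and (x2, y2) are the top-left corners of source and
 * destination, as passed to SubsequentScreenToScreenCopy. */
static BlitStart blit_start(int xdir, int ydir, int x1, int y1,
                            int x2, int y2, int w, int h)
{
    BlitStart s;
    int xoff = (xdir < 0) ? w - 1 : 0;
    int yoff = (ydir < 0) ? h - 1 : 0;

    s.srcx = x1 + xoff;
    s.srcy = y1 + yoff;
    s.dstx = x2 + xoff;
    s.dsty = y2 + yoff;
    return s;
}
```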

SubsequentBresenhamLine(x1, y1, octant, err, e1, e2, length)

    Draw a line using the Bresenham algorithm. This is the most common
    general line drawing feature that chips support. The octant consists
    of bitflags that are defined as follows (miline.h defines them):

    XDECREASING	    4	Draw from right to left (a.o.t. left to right).
    YDECREASING	    2	Draw from bottom to top (top to bottom).
    YMAJOR	    1	Y is the major axis (X is the major axis).

    The error terms are usually no bigger than a screen coordinate, but
    when software clipping is used, the error term might be too big; it
    is then rescaled according to the number of bits specified in
    ErrorTermBits. When HARDWARE_CLIP_LINE is defined,
    SetClippingRectangle must be defined. It seems to me that hardware
    clipping makes the implicit assumption that the chip can handle
    coordinates in the range [-32768, 32767]. Or are coordinates
    guaranteed to be on-screen? Anyway I think having the chip trace
    lines way off the screen does not sound like a good idea.

    There is no SetUp function. SetupForFillRectSolid is called before
    a batch of lines (this is linked to the fact that horizontal lines
    are drawn with FillRectSolid; they should not be affected by
    hardware clipping).
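
    To make the parameters concrete, here is a sketch that derives
    them from two endpoints and then steps through the line in
    software, the way a chip would. The octant values are the ones
    listed above. The error term convention (err = e1 - dmajor, step
    the minor axis when err >= 0) and the length (which includes both
    endpoints here) are assumptions of this sketch; check what your
    chip expects.

```c
/* Octant bit flags, with the values listed above. */
#define YMAJOR      1
#define YDECREASING 2
#define XDECREASING 4

typedef struct {
    int octant, err, e1, e2, length;
} BresParams;

/* Derive Bresenham parameters for the line (x1,y1)-(x2,y2). */
static BresParams bres_params(int x1, int y1, int x2, int y2)
{
    BresParams p;
    int adx = x2 - x1, ady = y2 - y1, tmp;

    p.octant = 0;
    if (adx < 0) { adx = -adx; p.octant |= XDECREASING; }
    if (ady < 0) { ady = -ady; p.octant |= YDECREASING; }
    if (ady > adx) {            /* make adx the major axis delta */
        tmp = adx; adx = ady; ady = tmp;
        p.octant |= YMAJOR;
    }
    p.e1 = 2 * ady;             /* added when only the major axis steps */
    p.e2 = p.e1 - 2 * adx;      /* added when both axes step */
    p.err = p.e1 - adx;         /* initial error term (assumed convention) */
    p.length = adx + 1;         /* both endpoints included (assumed) */
    return p;
}

/* Software model of the stepping; reports the last pixel visited. */
static void bres_trace(int x, int y, BresParams p, int *lastx, int *lasty)
{
    int sx = (p.octant & XDECREASING) ? -1 : 1;
    int sy = (p.octant & YDECREASING) ? -1 : 1;
    int i, err = p.err;

    *lastx = x;
    *lasty = y;
    for (i = 0; i < p.length; i++) {
        *lastx = x;             /* a real implementation plots (x, y) here */
        *lasty = y;
        if (err >= 0) {         /* step along the minor axis */
            if (p.octant & YMAJOR) x += sx; else y += sy;
            err += p.e2;
        } else {
            err += p.e1;
        }
        if (p.octant & YMAJOR) y += sy; else x += sx;   /* major axis */
    }
}
```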

SubsequentTwoPointLine(x1, y1, x2, y2, bias)

    Draw a line between (x1, y1) and (x2, y2); the last point is drawn.
    This operation is found in some newer chips, and the generic code
    takes advantage of it when it is defined. The 8 lower bits of bias
    indicate whether 1 should be subtracted from the error term for
    each of the octants (e.g. bit 0 matches octant 0); supporting this
    parameter is not a requirement. If bit 8 (0x100) of bias is set,
    the last pixel should not be drawn (use TWO_POINT_LINE_NOT_LAST to
    indicate whether this flag is supported). This function requires
    hardware clipping.

    Note that horizontal lines are always drawn with FillRectSolid.

SetClippingRectangle(x1, y1, x2, y2)

    Set the hardware clipping rectangle. (x2, y2) is the inclusive
    right-bottom corner. Clipping should be active only for the first
    following line draw (BresenhamLine or TwoPointLine). This function
    is only used when HARDWARE_CLIP_LINE is enabled.

ImageWrite(x, y, w, h, src, srcwidth, rop, planemask)

    This hasn't been formalized yet. It is used only to upload a tile
    to the pixmap cache (usually there's not much benefit compared
    to the unaccelerated version).

SetupForFill8x8Pattern(patternx, patterny, rop, planemask, trans_col)

    Set up for hardware 8x8 pattern fill (non-color expanded). If
    neither the HARDWARE_PATTERN_SCREEN_ORIGIN flag nor the
    HARDWARE_PATTERN_PROGRAMMED_ORIGIN flag is set, patternx and
    patterny can be ignored. Otherwise, patternx and patterny just
    indicate the video memory address where the pattern is stored. The
    pattern is stored linearly in video memory. When the transparency
    color is -1 there is no transparency.

SubsequentFill8x8Pattern(patternx, patterny, x, y, w, h)

    Perform a hardware 8x8 pattern fill. If the flag
    HARDWARE_PATTERN_SCREEN_ORIGIN is set, patternx and patterny can be
    ignored; otherwise, patternx and patterny indicate the video memory
    address where the pattern is stored. However, if
    HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set, patternx and patterny
    define the origin offset into the pattern. Any rotation issues are
    handled by the generic code by generating pre-rotated copies of the
    pattern. The pattern address will always be at a multiple of 8
    pixels offset from the start of a scanline (x will be a multiple of
    8), unless the HARDWARE_PATTERN_ALIGN_64 flag is set. At the
    moment, setting HARDWARE_PATTERN_ALIGN_64 in the absence of
    HARDWARE_PATTERN_PROGRAMMED_ORIGIN will disable the use of this
    function, but this will change in a future version.

SetupFor8x8PatternColorExpand(patternx, patterny, bg, fg, rop, planemask)

    Set up for hardware color-expanded 8x8 pattern fill. If the flag
    HARDWARE_PATTERN_SCREEN_ORIGIN is set, or
    HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set in the absence of
    HARDWARE_PATTERN_PROGRAMMED_BITS, patternx and patterny indicate
    the video memory address where the pattern is stored, which will be
    on an 8 byte boundary relative to the start of a scanline.
    Otherwise, patternx and patterny can be ignored. The pattern
    x-coordinate will be in units of "bits", that is, a byte offset of
    one relative to the start of the scanline is represented by a
    patternx value of 8.

    If HARDWARE_PATTERN_PROGRAMMED_BITS is set, patternx and patterny
    are overloaded as follows: patternx holds the first 4 lines (32
    pixels) of the pattern, with each byte (MSB-first bit order if the
    HARDWARE_PATTERN_BIT_ORDER_MSBFIRST flag is set) corresponding to a
    scanline of the pattern. patterny holds the second half of the
    pattern. This is the so-called "Windows-format".
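
    As an illustration of the overloaded format, the sketch below
    packs eight pattern scanlines into the two 32-bit values. The
    helper name is hypothetical, and placing the first scanline in the
    least significant byte is an assumption of this sketch; the text
    above does not say which byte holds which scanline.

```c
#include <stdint.h>

/* Pack an 8x8 monochrome pattern (one byte per scanline) into two
 * 32-bit words. ASSUMPTION: scanline 0 goes into the least
 * significant byte of patternx, scanline 4 into that of patterny. */
static void pack_pattern_bits(const uint8_t rows[8],
                              uint32_t *patternx, uint32_t *patterny)
{
    int i;

    *patternx = 0;
    *patterny = 0;
    for (i = 0; i < 4; i++) {
        *patternx |= (uint32_t)rows[i] << (8 * i);      /* lines 0..3 */
        *patterny |= (uint32_t)rows[i + 4] << (8 * i);  /* lines 4..7 */
    }
}
```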

    A background color of -1 indicates transparency (support of
    transparency is indicated by HARDWARE_PATTERN_MONO_TRANSPARENCY).

Subsequent8x8PatternColorExpand(patternx, patterny, x, y, w, h)

    Perform a hardware color-expanded 8x8 pattern fill. If the flag
    HARDWARE_PATTERN_SCREEN_ORIGIN is set, patternx and patterny
    can be ignored; otherwise, patternx and patterny indicate the
    video memory address where the pattern is stored. Any rotating
    issues are handled by the generic code by generating pre-rotated
    copies of the pattern. Again patternx is in "bit" or "stencil"
    units.

    If HARDWARE_PATTERN_PROGRAMMED_ORIGIN is set, patternx and
    patterny hold the origin (x and y offsets into the pattern).
    HARDWARE_PATTERN_SCREEN_ORIGIN may be defined additionally;
    in that case, the following is true: patternx and patterny will
    be the same for all "Subsequent" calls. You may only need to
    program the origin in the first Subsequent call.

ColorExpandFlags

    This selects the restrictions for color expansion operations. The
    flags are extended with a set of flags that is used to define
    details about the hardware-specific implementation of color
    expansion, as performed by the low-level color expansion functions. 
    The following extra flags are defined:

    SCANLINE_NO_PAD
    SCANLINE_PAD_BYTE
    SCANLINE_PAD_DWORD

	Defines the padding at the end of a scanline of monochrome
	data, which indicates the number of bits that is ignored by the
	graphics chip at the end of each scanline in multi-scanline
	color-expansion operations from the CPU to the screen. DWORD
        padding is preferred. These flags do not apply to screen-to-screen
        color expansion. Currently, not defining SCANLINE_PAD_DWORD will
        result in non-optimized and limited use of CPU-to-screen color
        expansion.

    CPU_TRANSFER_PAD_DWORD
    CPU_TRANSFER_PAD_QWORD

	Defines the total amount of data to be transferred in a
        multi-scanline CPU-to-screen color-expansion operation. Most
        chips pad to a DWORD boundary.
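
	The effect of the two kinds of padding on the amount of data
	written can be sketched like this. The enum and function are
	hypothetical helpers for illustration (they are not the XAA
	flag values), and the arithmetic only follows the description
	above.

```c
#include <stddef.h>

/* Hypothetical names mirroring SCANLINE_NO_PAD, SCANLINE_PAD_BYTE and
 * SCANLINE_PAD_DWORD; these are not the real flag values. */
enum ScanlinePad { PAD_NONE, PAD_BYTE, PAD_DWORD };

/* Number of bytes to transfer for a w x h monochrome bitmap.
 * transfer_align is 4 for CPU_TRANSFER_PAD_DWORD and 8 for
 * CPU_TRANSFER_PAD_QWORD. */
static size_t color_expand_bytes(int w, int h, enum ScanlinePad pad,
                                 size_t transfer_align)
{
    size_t bits_per_line, total_bits, bytes;

    switch (pad) {
    case PAD_BYTE:  bits_per_line = ((size_t)w + 7) / 8 * 8;    break;
    case PAD_DWORD: bits_per_line = ((size_t)w + 31) / 32 * 32; break;
    default:        bits_per_line = (size_t)w;                  break;
    }
    total_bits = bits_per_line * (size_t)h;
    bytes = (total_bits + 7) / 8;
    /* Round the whole transfer up to the required boundary. */
    return (bytes + transfer_align - 1) / transfer_align * transfer_align;
}
```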

    CPU_TRANSFER_BASE_FIXED

	Indicates that the destination address for monochrome data for
	CPU-to-screen color-expansion is a fixed address, rather than
	a large range starting from the ColorExpandBase address.

    ONLY_TRANSPARENCY_SUPPORTED

	Indicates that the color expansion operations only work with
	transparency (bit 0 pixels are not written).
	
    TRIPLE_BITS_24BPP

	When enabled (must be in 24bpp mode), color expansion functions
	are expected to require three times the amount of bits to be
	transferred so that 24bpp grayscale colors can be used with color
	expansion in 8bpp coprocessor mode. Each bit is expanded to 3
	bits when writing the monochrome data. When defining this
        flag, also define RGB_EQUAL.
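
	The bit tripling itself is simple. A minimal sketch for one
	byte of monochrome data follows; the function name is
	hypothetical, and bit 0 is taken to be the first pixel
	(LSB-first order), which is an assumption here.

```c
#include <stdint.h>

/* Expand each of the 8 bits of mono into 3 identical bits, giving 24
 * bits of output. Input bit i ends up in output bits 3i .. 3i+2. */
static uint32_t triple_bits(uint8_t mono)
{
    uint32_t out = 0;
    int i;

    for (i = 0; i < 8; i++)
        if (mono & (1u << i))
            out |= 7u << (3 * i);
    return out;
}
```

	A real implementation would probably use a 256-entry lookup
	table instead of looping over the bits.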

    VIDEO_SOURCE_GRANULARITY_PIXEL
    VIDEO_SOURCE_GRANULARITY_BYTE
    VIDEO_SOURCE_GRANULARITY_DWORD

	This indicates the granularity of the horizontal source location
	specification for screen-to-screen color expansion operations.
	It is either one pixel, 8 pixels (a byte), or 32 pixels (a 32-bit
        word). If there's some kind of clipping mechanism available, pixel
        granularity is usually possible.

    BIT_ORDER_IN_BYTE_LSBFIRST
    BIT_ORDER_IN_BYTE_MSBFIRST

	This defines the order of bits within a byte. As far as X is
	concerned, it's best when the lowest-order bit corresponds to
	the leftmost pixel on the screen (this is the technically
	superior format), but many chips only support the "wrong" bit
	order (MSBFIRST).
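
	Converting between the two bit orders means reversing the bits
	in each byte. A sketch of the usual swap-halves trick (the
	function name is hypothetical):

```c
#include <stdint.h>

/* Reverse the order of the bits within one byte. */
static uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));  /* swap nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));  /* swap bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));  /* swap neighbours */
    return b;
}
```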

    LEFT_EDGE_CLIPPING

	This indicates that CPU-to-screen color expansion operations
	support the left-edge clipping parameter, which indicates
	the number of pixels to skip at the left edge.

    LEFT_EDGE_CLIPPING_NEGATIVE_X

	This indicates that when the left-edge clipping parameter is
        specified, the x coordinate is allowed to be negative (while
        being on-screen when the parameter is actually added to it).
        At the moment, this flag is a requirement for CPU-to-screen
        color expansion acceleration of (large) stipples.

    Note that the regular graphics operations flags for raster-op,
    planemask and color restrictions are also valid. NO_TRANSPARENCY
    indicates that color expansion does not support transparency.

SetupForCPUToScreenColorExpand(bg, fg, rop, planemask)

    Set up for CPU-to-screen color expansion operations. This is used
    for writing bitmaps and text, and (not yet) stipples. When bg is
    equal to -1, the background (bits that are 0) is transparent.

SubsequentCPUToScreenColorExpand(x, y, w, h, skipleft)

    Perform a CPU-to-screen color expansion operation. The monochrome
    data will be transferred after this function has been called.
    Sync() is called when the data has been transferred. The optional
    skipleft parameter defines a number of pixels (0 - 7) to be skipped
    at the left edge (at the start of each scanline).

SetupForScreenToScreenColorExpand(bg, fg, rop, planemask)

    Set up for screen-to-screen color expansion operations. This will
    only be used when the storing of monochrome data in the pixmap (or
    font) cache is implemented.

SubsequentScreenToScreenColorExpand(srcx, srcy, x, y, w, h)

    Perform a screen-to-screen color expansion operation. srcx is in
    pixel units (8 corresponds to one byte offset).

SetupForScanlineCPUToScreenColorExpand(x, y, w, bg, fg, rop, planemask)

    Set up for a scanline-by-scanline color expansion operation from
    the CPU to the screen. This is not of much use (except when a chip
    is not compatible with supported methods of color expanding a whole
    bitmap). It's not used currently.

SubsequentScanlineCPUToScreenColorExpand()

    Color expand a scanline from the CPU to the screen. Many chips
    automatically add the pitch of the display to the destination
    address after a scanline has been written so that it doesn't need
    to be updated. Otherwise you'll need to keep track of the address.

SetupForScanlineScreenToScreenColorExpand(x, y, w, h, bg, fg, rop, planemask)

    Set up for a scanline-by-scanline color expansion operation from
    the screen to the screen (top-down). This is typically used for
    chips that don't have usable CPU-to-screen color expansion. It is
    taken advantage of for bitmaps, text, and (not yet) stipples.

SubsequentScanlineScreenToScreenColorExpand(srcaddr)

    This performs color expansion of a scanline from the screen
    (typically a scratch buffer) to the screen. To take advantage of
    this operation, ScratchBufferAddr and ScratchBufferSize must be
    defined (> 0), and either linear addressing must be used or
    ScratchBufferBase must be defined. Being able to support
    COP_FRAMEBUFFER_CONCURRENCY is a win here. The srcaddr is the
    linear framebuffer address in (non-expanded) pixel units. The real
    address is (srcaddr / 8). When TRIPLE_BITS_24BPP is defined,
    srcaddr is in non-expanded 8bpp pixel units.

    In addition, PingPongBuffers defines the number of alternating
    buffers used. The default is two. Depending on the implementation
    and size of framebuffer and coprocessor write buffers on the chip,
    you might need more than two.
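
    The alternation between buffers, and the pixel-unit srcaddr, can
    be sketched as follows. The structure and function are
    hypothetical, and dividing the scratch buffer into equal slots is
    an assumption about the layout, not something the generic code is
    known to do.

```c
/* Hypothetical bookkeeping for the alternating scratch buffers. */
typedef struct {
    unsigned long addr;   /* ScratchBufferAddr: linear address in bytes */
    unsigned long size;   /* ScratchBufferSize in bytes */
    int count;            /* PingPongBuffers */
    int next;             /* index of the buffer to use next */
} ScratchRing;

/* Pick the buffer for the next scanline. Stores its byte address in
 * *byte_addr (where the CPU writes the monochrome data) and returns
 * srcaddr in pixel units, that is, the byte address times 8. */
static unsigned long next_scanline_srcaddr(ScratchRing *r,
                                           unsigned long *byte_addr)
{
    unsigned long slot = r->size / (unsigned long)r->count;

    *byte_addr = r->addr + (unsigned long)r->next * slot;
    r->next = (r->next + 1) % r->count;
    return *byte_addr * 8;
}
```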

CPUToScreenColorExpandBase

    This address defines the base address for writing monochrome bitmap
    data to when performing CPU-to-screen color expansion operations. 
    When the CPU_TRANSFER_BASE_FIXED flag is not set and
    CPUToScreenColorExpandRange is not defined, a large range is
    assumed to be available (at least the number of pixels in the virtual
    screen / 8). For text operations this is probably never a problem.
    At the moment hardware that has 64 bytes or so of transfer space is
    unlucky. 32-bit access is always used.

    If this is not defined, FramebufferBase will automatically be
    used.

CPUToScreenColorExpandRange

    This defines the size of the "window" starting from the base
    address for writing CPU-to-screen color-expand data. If this is
    not defined or zero, the range is assumed to be large enough.
    When it is greater than the width of the screen in pixels / 8,
    the base address will be adjusted if necessary at the end of each
    scanline. Currently, if it is smaller than that, the
    CPU_TRANSFER_BASE_FIXED flag is set.

    At the moment, the bottom line is that you need about 256 bytes
    of transfer space to use CPU-to-screen color expansion (128 bytes
    with a 1024 pixel screen width) with PCI-burst mode support.
    However, "fixed-base" operation is supported.

FramebufferBase

    This is a pointer to the framebuffer. It is required by the
    ScanlineScreenToScreenColorExpand, and is automatically
    initialized. It should not be set up in a chip-specific driver.

BitsPerPixel

    This is the number of bits per pixel, stored here for convenience.
    There's no need to initialize this from a driver.

FramebufferWidth

    This is the width of the framebuffer in pixels, stored here for
    convenience. There's no need to initialize this from a driver.

ScratchBufferAddr
ScratchBufferSize

    This specifies the linear address in bytes and size of the scratch
    buffer used for ScanlineScreenToScreenColorExpand operations.

ScratchBufferBase

    This is a pointer to the mapped video memory of the scratch buffer.
    When not defined, the scratch buffer is assumed to be at the
    specified offset (ScratchBufferAddr) into a linear framebuffer.
    This field should only be initialized when using
    ScanlineScreenToScreenColorExpand with a non-linear framebuffer,
    in which case it should be noted that it is totally independent
    from ScratchBufferAddr.

PingPongBuffers

    This field defines the number of alternating buffers used in the
    scratch buffer for ScanlineScreenToScreenColorExpand. The default
    is two. Depending on the implementation and size of framebuffer and
    coprocessor write buffers on the chip, you might need more than
    two.

ErrorTermBits

    Indicates the number of bits of precision for the Bresenham line
    error terms. The absolute values of the terms are guaranteed
    to be in the range [0, 2 ^ ErrorTermBits - 1]. If your registers
    have 14 significant bits, you would probably use 13 here because of
    the sign bit.
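
    One way to bring oversized terms into range is to halve all three
    together until they fit, which roughly preserves the slope. The
    sketch below illustrates that idea only; the function is
    hypothetical and the rescaling actually done by the generic code
    may differ.

```c
#include <stdlib.h>

/* Halve err, e1 and e2 together until each absolute value fits in
 * [0, 2^bits - 1]. Returns the number of halvings performed. */
static int rescale_error_terms(int *err, int *e1, int *e2, int bits)
{
    const int max = (1 << bits) - 1;
    int halvings = 0;

    while (abs(*err) > max || abs(*e1) > max || abs(*e2) > max) {
        *err /= 2;
        *e1 /= 2;
        *e2 /= 2;
        halvings++;
    }
    return halvings;
}
```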

ServerInfoRec

    This is a pointer to the XFree86 server InfoRec. It must be defined.
    The InitPixmapCache function initializes it for compatibility with
    earlier versions of XAA. The SVGA server initializes it
    automatically.

PixmapCacheMemoryStart
PixmapCacheMemoryEnd

   These values must be defined if the pixmap cache is enabled. The
   InitPixmapCache function initializes them, for compatibility with
   earlier versions of XAA.


1.6	Commonly Used Parameters

This section clarifies the format of some of the commonly used
parameters in the low-level functions (as described above).

Coordinates ("x", "y") are pixel coordinates unless otherwise noted.

The width and height ("w", "h") define the size of the area involved in
pixel units.

Colors (named "color", "bg" or "fg") are simple pixel values. They are
not "replicated" over the 32-bit integer argument. So for example in
8bpp mode, bits 0-7 of the value represent the pixel value, and the
rest of the bits is zero. If your chip requires a "replicated" 32-bit
pixel value (4 duplicated pixels for 8bpp), you will have to do that in
your low-level functions implementation.

The planemask is a mask that defines what bits in the pixel value
are to be modified on the screen. Again, this value cannot be assumed
to be "replicated" to 32-bit in 8bpp and 16bpp modes.
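
If your chip does want replicated values, a helper along these lines
can be called from the SetUp functions for both colors and planemasks.
The function name is hypothetical; this is just a sketch.

```c
#include <stdint.h>

/* Replicate a pixel value (or planemask) across a 32-bit word for
 * 8bpp and 16bpp modes; other depths are returned unchanged. */
static uint32_t replicate_pixel(uint32_t value, int bpp)
{
    switch (bpp) {
    case 8:
        value &= 0xFF;
        return value | (value << 8) | (value << 16) | (value << 24);
    case 16:
        value &= 0xFFFF;
        return value | (value << 16);
    default:
        return value;
    }
}
```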

The raster-op ("rop") is one of the 16 raster-operations that X
defines:

    #define GXclear			0x0	/* 0 */
    #define GXand			0x1	/* src AND dst */
    #define GXandReverse		0x2	/* src AND NOT dst */
    #define GXcopy			0x3	/* src */
    #define GXandInverted		0x4	/* NOT src AND dst */
    #define GXnoop			0x5	/* dst */
    #define GXxor			0x6	/* src XOR dst */
    #define GXor			0x7	/* src OR dst */
    #define GXnor			0x8	/* NOT src AND NOT dst */
    #define GXequiv			0x9	/* NOT src XOR dst */
    #define GXinvert			0xa	/* NOT dst */
    #define GXorReverse			0xb	/* src OR NOT dst */
    #define GXcopyInverted		0xc	/* NOT src */
    #define GXorInverted		0xd	/* NOT src OR dst */
    #define GXnand			0xe	/* NOT src OR NOT dst */
    #define GXset			0xf	/* 1 */

For each graphics operation you can define that only GXcopy is supported
by setting the GXCOPY_ONLY flag in the flags for that particular
operation. Similarly, NO_PLANEMASK indicates that the plane mask is
not supported.
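
The raster-op value is really a four-entry truth table, which makes a
software reference implementation short. The sketch below (a
hypothetical helper, handy for checking what a chip produces) also
applies the planemask as described above.

```c
#include <stdint.h>

/* Combine src and dst according to an X raster-op (0x0 .. 0xf), then
 * let only the bits selected by planemask through to the result. */
static uint32_t apply_rop(int rop, uint32_t src, uint32_t dst,
                          uint32_t planemask)
{
    uint32_t result = 0;

    if (rop & 0x1) result |= src & dst;     /* src = 1, dst = 1 */
    if (rop & 0x2) result |= src & ~dst;    /* src = 1, dst = 0 */
    if (rop & 0x4) result |= ~src & dst;    /* src = 0, dst = 1 */
    if (rop & 0x8) result |= ~src & ~dst;   /* src = 0, dst = 0 */

    return (dst & ~planemask) | (result & planemask);
}
```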


1.7	The best strategy

Start with simple filled solid rectangles and screen-to-screen copies
(BitBLT). Those two functions alone will accelerate the vast majority
of graphic operations requested. The sample driver can be used as a
starting point.

Next you might want to look at color expansion (CPUToScreen, or if
that can't be done, ScanlineScreenToScreen), BresenhamLine or
TwoPointLine, and Fill8x8Pattern/ColorExpand8x8Pattern.

The relative win of separately implementing functions that are already
accelerated with solid filled rectangles varies, but it can make a
difference since just using rectangle fills has some overhead. You may
be able to make better use of features of the graphics chip, and better
exploit CPU/graphics concurrency, although this is already done by the
generic code for some operations (such as filled polygons and arcs).


2	Acceleration hooks

Many operations can be "hooked" at a higher level, instead of just
defining the low-level functions. This can be useful for existing code
or operations for which there are no adequate low-level functions. What
follows is a description of most of the functions that can be hooked.

[This isn't complete]

2.1	Filled Rectangles

Rectangles can be filled with a single source color, or with three
different types of repeating pattern:

    Stipple: a transparent bitmap pattern where 1's correspond to the
    foreground color.

    Opaque stipple: a bitmap pattern where 0's correspond to the
    background color and 1's to the foreground color.

    Tile: an image pattern that can have full pixel depth.

2.1.1	Solid Filled Rectangles

Solid filled rectangles are a very common operation. Apart from
a regular solid fill, special raster ops are often used, for example
for inverting the destination.

To define a simple function for drawing one filled rectangle that will
be used for many kinds of operation, use this:

    xf86AccelInfoRec.SetupForFillRectSolid = MySetupForFillRectSolid;
    xf86AccelInfoRec.SubsequentFillRectSolid = MySubsequentFillRectSolid;

If you accelerate solid filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:

    xf86GCInfoRec.PolyFillRectSolid = MyPolyFillRect;

If you don't handle clipping, but do have a replacement for accelerated
solid filled rectangles, do this:

    xf86GCInfoRec.PolyFillRectSolid = xf86PolyFillRect;
    xf86AccelInfoRec.FillRectSolid = MyFillRectSolid;

In all cases, the following flags can be set in
xf86GCInfoRec.FillRectSolidFlags:

    GXCOPY_ONLY		    Only the raster-op GXcopy is supported.
    NO_PLANEMASK            No special planemask is supported.
    RGB_EQUAL		    Only a foreground color with same values
			    for red, green and blue is accepted.

2.1.2	Tiled Filled Rectangles

If you have the required low-level functions and enable PIXMAP_CACHE,
the pixmap cache will be used to draw tiles. For tiles, you just need
ScreenToScreenCopy.

If you accelerate tiled filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:

    xf86GCInfoRec.PolyFillRectTiled = MyPolyFillRect;

If you don't handle clipping, but do have accelerated tiled filled
rectangles, do this:

    xf86GCInfoRec.PolyFillRectTiled = xf86PolyFillRect;
    xf86AccelInfoRec.FillRectTiled = MyFillRectTiled;

In both cases, the following flags can be set in
xf86GCInfoRec.FillRectTiledFlags:

    GXCOPY_ONLY		    Only the raster-op GXcopy is supported.
    NO_PLANEMASK            No special planemask is supported.

2.1.3	Stippled Filled Rectangles

If you have the required low-level functions and enable PIXMAP_CACHE,
the pixmap cache will be used to draw stipples. For stipples, you just need
ScreenToScreenCopy with support for transparency.

If you accelerate stippled filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:

    xf86GCInfoRec.PolyFillRectStippled = MyPolyFillRect;

If you don't handle clipping, but do have accelerated stippled filled
rectangles, do this:

    xf86GCInfoRec.PolyFillRectStippled = xf86PolyFillRect;
    xf86AccelInfoRec.FillRectStippled = MyFillRectStippled;

In both cases, the following flags can be set in
xf86GCInfoRec.FillRectStippledFlags:

    GXCOPY_ONLY		    Only the raster-op GXcopy is supported.
    NO_PLANEMASK            No special planemask is supported.

2.1.4	Opaque Stippled Filled Rectangles

If you have the required low-level functions and enable PIXMAP_CACHE,
the pixmap cache will be used to draw stipples. For stipples, you just need
ScreenToScreenCopy.

If you accelerate opaque stippled filled rectangles, and have a complete
replacement for PolyFillRect that handles clipping, do this:

    xf86GCInfoRec.PolyFillRectOpaqueStippled = MyPolyFillRect;

If you don't handle clipping, but do have accelerated opaque stippled
filled rectangles, do this:

    xf86GCInfoRec.PolyFillRectOpaqueStippled = xf86PolyFillRect;
    xf86AccelInfoRec.FillRectOpaqueStippled = MyFillRectOpaqueStippled;

In both cases, the following flags can be set in
xf86GCInfoRec.FillRectOpaqueStippledFlags:

    GXCOPY_ONLY		    Only the raster-op GXcopy is supported.
    NO_PLANEMASK            No special planemask is supported.

2.2	Filled Spans

Filled spans can be used for many purposes, mostly filled areas
of different shapes. The fill style can be solid (by far the most
useful), tiled, stippled and opaque stippled.

If you accelerate solid filled spans, and have a complete
replacement for FillSpansSolid that handles clipping, do this:

    xf86GCInfoRec.FillSpansSolid = MyFillSpansSolid;

And similarly for other fill styles:

    xf86GCInfoRec.FillSpansTiled = MyFillSpansTiled;
    xf86GCInfoRec.FillSpansStippled = MyFillSpansStippled;
    xf86GCInfoRec.FillSpansOpaqueStippled = MyFillSpansOpaqueStippled;

If you don't handle clipping, but do have a function for drawing solid
filled spans, do this:

    xf86GCInfoRec.FillSpansSolid = xf86FillSpans;
    xf86AccelInfoRec.FillSpansSolid = MyFillSpansSolid;

In all cases, the following flags can be set in
xf86GCInfoRec.FillSpansSolidFlags (and similarly for other fill styles):

    GXCOPY_ONLY		    Only the raster-op GXcopy is supported.
    NO_PLANEMASK            No special planemask is supported.
    RGB_EQUAL		    Only a foreground color with same values
			    for red, green and blue is accepted.

2.3	Filled Arcs

If you accelerate filled solid arcs, and have a complete replacement
for PolyFillArc that handles clipping, do this:

    xf86GCInfoRec.PolyFillArc = MyPolyFillArc;

The following flags can be set in xf86GCInfoRec.PolyFillArcFlags:

    GXCOPY_ONLY		    Only the raster-op GXcopy is supported.
    NO_PLANEMASK            No special planemask is supported.

If you have a function for accelerated solid horizontal spans, it will
automatically be taken advantage of for filled arcs.

2.4	Text

There are two kinds of text, transparent text (the background is not
written), and image text (the background is filled with the background
color).

There are also two types of font. Terminal-emulator fonts, which have
characters that are all the same size, and non-terminal emulator fonts,
which have characters of varying size.

In the case of image text with a non-terminal emulator font, the filled
background corresponds to the bounding box of the text image.

2.4.1	Transparent Text

If you accelerate transparent text strings, and have a complete replacement
for PolyGlyphBlt that handles clipping, do this if you accelerate
terminal-emulator fonts:

    xf86GCInfoRec.PolyGlyphBltTE = MyPolyGlyphBltTE;

And if you also support non-terminal emulator fonts:

    xf86GCInfoRec.PolyGlyphBltNonTE = MyPolyGlyphBltNonTE;

If you don't handle clipping, but do have accelerated transparent text:

    xf86GCInfoRec.PolyGlyphBltTE = xf86PolyGlyphBltTE;
    xf86AccelInfoRec.PolyTextTE = MyPolyTextTE;

And similarly for non-terminal emulator fonts:

    xf86GCInfoRec.PolyGlyphBltNonTE = xf86PolyGlyphBltNonTE;
    xf86AccelInfoRec.PolyTextNonTE = MyPolyTextNonTE;

2.4.2	Image text

If you accelerate image text strings, and have a complete replacement
for ImageGlyphBlt that handles clipping, do this if you accelerate
terminal-emulator fonts:

    xf86GCInfoRec.ImageGlyphBltTE = MyImageGlyphBltTE;

And if you also support non-terminal emulator fonts:

    xf86GCInfoRec.ImageGlyphBltNonTE = MyImageGlyphBltNonTE;

If you don't handle clipping, but do have accelerated image text:

    xf86GCInfoRec.ImageGlyphBltTE = xf86ImageGlyphBltTE;
    xf86AccelInfoRec.ImageTextTE = MyImageTextTE;

And similarly for non-terminal emulator fonts:

    xf86GCInfoRec.ImageGlyphBltNonTE = xf86ImageGlyphBltNonTE;
    xf86AccelInfoRec.ImageTextNonTE = MyImageTextNonTE;

2.5	CopyArea

Screen-to-screen area copies (BitBLTs) are extremely useful. They are vital
for smooth scrolling and dragging of windows. Unaccelerated, this
operation is often slow because of the slowness of read operations
from the framebuffer. This function can also be used to great effect
for caching mechanisms for patterns and fonts, when support for it
is added.

If you accelerate screen-to-screen area copies (BitBLTs), and have a
complete replacement for CopyArea that handles clipping, do this:

    xf86GCInfoRec.CopyArea = MyCopyArea;

If you don't handle clipping, but do have an accelerated CopyArea:

    xf86GCInfoRec.CopyArea = xf86CopyArea;
    xf86AccelInfoRec.ScreenToScreenBitBlt = MyScreenToScreenBitBlt;

In all cases, the following flags can be set in
xf86GCInfoRec.CopyAreaFlags:

    GXCOPY_ONLY		    Only the raster-op GXcopy is supported.
    NO_PLANEMASK            No special planemask is supported.
    NO_TRANSPARENCY         Transparency color compare is not supported.


3.	Opportunities For Improvement

- The graphics operation flags aren't consistent. There should be
  separate flags indicating the restrictions for the lower-level
  functions.

- VT-switching awareness has not been extensively tested, and the
  current implementation has a few rough edges.

- Solid tile fill may be faster with cfb in some cases (if the chip
  doesn't have much video memory bandwidth to play with and the PCI
  bus bandwidth is decent).

- Having a function for clipped filled spans that clips on the fly. This
  doesn't exist yet anywhere in the source tree. This would be a minor
  speed up for things like clipped filled polygons and arcs, and wide
  lines.

- Having the pixmap cache store stipples in monochrome format, and using
  color expansion features of the graphics chip to replicate them. This
  is more efficient since less video memory bandwidth is required for
  the cached pattern source. Not all chips support this kind of operation
  easily, especially w.r.t. clipping of the leftmost edge (the first pixel
  to be drawn may start at some bit of the leftmost video memory byte), and
  defining the location of the monochrome pattern in video memory can be
  a little complex.

- Taking more advantage of built-in (8x8) chip pattern registers. This
  works OK now, but things not implemented include detection of
  tiles that have only two colors so that they can be done with
  color-expand 8x8 pattern fill, and interleaving schemes allowing 16
  and 32 pixel high patterns to be done using the hardware pattern. Also
  some chips support 16x8 and 32x8 pattern fill at 8bpp by using 16bpp
  or 32bpp pattern fill. Currently, support for chips that require the
  pattern to be aligned on a 64-pixel boundary is missing in most cases,
  which in practice means the 8x8 pattern is not usable for many chips.

- Font-caching (useful for configurations where it's not possible to use
  color expansion for text, and for certain fonts). Support for
  non-"terminal emulator" fonts is certainly a weak area of XAA.

- Complete implementation of non-terminal emulator font text acceleration
  using color expansion (the code is in place, but causes problems).

- Generic hardware-cursor code (this sounds very useful to me), including
  Harald Koenig's support for real-time software/hardware cursor switching.

- More complete 24bpp-in-8bpp-mode support. Missing is full implementation
  of color expansion schemes to allow 24bpp fills in 8bpp mode in two
  passes.

- The Pentium optimized text bitmap functions exist only for 6 and 8-pixel
  wide fonts. BTW, on a Cyrix 6x86 the Pentium-optimized 6-wide function
  seems to cause a 2% performance decrease.

- Accelerated stipples using direct color expansion would definitely be
  worthwhile.  The lowest-level function is in place (but untested). It
  would take care of cases where the font cache cannot be used (such as
  24bpp, lack of transparency color compare for transparent stipples,
  lack of off-screen video memory), or when color expansion is faster
  (generally on video memory bandwidth-starved configurations).
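
The two-color tile detection mentioned in the 8x8 pattern item above
could look roughly like the sketch below. It is not XAA code: the
function name, the 64-byte row-major 8bpp tile layout and the MSB-first
bit order of the result are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of XAA): examine an 8x8 tile stored as
 * 64 bytes (8bpp, row-major) and, if it uses at most two distinct
 * colors, convert it to a monochrome pattern plus fg/bg colors.
 *
 * mono[y] holds row y, one bit per pixel, with the leftmost pixel in
 * bit 7 (MSB first); this bit order is an assumption of the sketch.
 * Returns 1 on success, 0 if the tile has more than two colors
 * (mono, fg and bg are then left in an unspecified state).
 */
int tile_to_mono_8x8(const uint8_t tile[64], uint8_t mono[8],
                     uint8_t *fg, uint8_t *bg)
{
    uint8_t c0, c1;
    int have_c1 = 0;
    int x, y;

    assert(tile && mono && fg && bg);

    c0 = tile[0];       /* first color seen becomes the background */
    c1 = c0;            /* second color, once one is found */

    for (y = 0; y < 8; y++) {
        uint8_t bits = 0;
        for (x = 0; x < 8; x++) {
            uint8_t p = tile[y * 8 + x];
            if (p == c0)
                continue;               /* background pixel: bit stays 0 */
            if (!have_c1) {
                c1 = p;                 /* remember the second color */
                have_c1 = 1;
            } else if (p != c1) {
                return 0;               /* third color: not expandable */
            }
            bits |= (uint8_t)(0x80 >> x);
        }
        mono[y] = bits;
    }
    *bg = c0;
    *fg = c1;
    return 1;
}
```

A tile that passes this test could then be handed to a color-expand
8x8 pattern fill instead of a full-color one. A single-color tile
comes back with fg equal to bg and an all-zero pattern.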

3.1	More Concurrency?

More concurrency between graphics and CPU processing sounds very
attractive. This can be implemented by not "syncing" when leaving the
graphics drawing code, but instead allowing graphics commands to
continue while X is doing its request processing, or even during
context switching or when the client is running. The ever larger PCI
write buffers help to make this a very nice optimization. This requires
awareness of coprocessor activity at several levels in the server code
(for example, at any point where something is read or written to the
video card).

There are variations between chipsets that affect how easily they would
support such a scheme. The best behaviour is what I would call "in
order execution" of coprocessor commands and simple CPU writes to the
framebuffer. That is, if you send some graphics coprocessor commands to
the chip, and then write something to the framebuffer, it is guaranteed
that the framebuffer writes will only happen when the graphics commands
have been completed. This avoids, for example, having to check for
coprocessor activity each time something is drawn with a "dumb"
framebuffer function. I think that PCI write buffers on the motherboard
generally follow this behaviour, but graphics chips generally do not.
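
As a rough model of the idea above (not actual XAA code), the sketch
below defers the wait-for-idle until the CPU is about to write to the
framebuffer, and skips it entirely on a chip that is assumed to have
"in order execution". The AccelState structure and all function names
are hypothetical.

```c
#include <assert.h>

/* All names here are hypothetical; this models the idea, not XAA. */
typedef struct {
    int cop_busy;       /* commands issued since the last sync */
    int sync_count;     /* number of (expensive) waits performed */
    int in_order_chip;  /* chip orders framebuffer writes after commands */
} AccelState;

/* Stand-in for a chip-specific wait-for-idle routine. */
static void wait_for_idle(AccelState *s)
{
    s->sync_count++;
    s->cop_busy = 0;
}

/* Called after issuing a coprocessor command: note it, do not wait. */
void accel_command_issued(AccelState *s)
{
    assert(s);
    s->cop_busy = 1;
}

/* Called before any CPU write to the framebuffer. */
void accel_before_fb_write(AccelState *s)
{
    assert(s);
    /* On an in-order chip the hardware does the ordering for us. */
    if (s->cop_busy && !s->in_order_chip)
        wait_for_idle(s);
}
```

With this scheme a batch of commands followed by several "dumb"
framebuffer writes costs at most one wait, and none at all on an
in-order chip. Reads from the card are not modelled here.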

Of course, reading or querying anything from the graphics card is
something you will want to avoid, since in most cases this will result
in the CPU being stalled until all the PCI and on-chip write buffers
are flushed and processed. Chips that require frequent querying or do
not allow concurrent coprocessor execution and CPU framebuffer access
will benefit much less.

A somewhat wild way to test this kind of scheme is to leave the
BACKGROUND_OPERATIONS flag undefined and still not do any syncing
in the graphics primitives. Without BACKGROUND_OPERATIONS set, the XAA
code almost never calls Sync itself. Someone (inadvertently) tried
this on an ET6000, and it seemed to measurably increase
performance. This is of course hazardous and prone to lock-ups.


4.	Comparison Of Chip-specific Implementations


4.1	Current Chip-specific Implementations

ARK Logic

    Uses BACKGROUND_OPERATIONS and COP_FRAMEBUFFER_CONCURRENCY. The
    latter is vital for high-performance color expansion, since the
    ARK chips don't appear to have CPU-to-screen color expansion.
    There's no need to "sync" during a batch of accelerator commands;
    the ARK chips seem to have "PCI-Retry" support.

    Screen locations are programmed as pixel addresses. The ARK chip
    also supports coordinates, but that restricts the possible
    framebuffer widths and I don't think it would be faster.

    FillRectSolid is provided. At 24bpp, it uses 8bpp coprocessor mode,
    which leads to RGB_EQUAL and NO_PLANEMASK restrictions.
    ScreenToScreenCopy is supported, again with restrictions at 24bpp:
    NO_PLANEMASK and NO_TRANSPARENCY. BresenhamLine is very
    straightforward.
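
    The RGB_EQUAL restriction presumably follows from filling a 24bpp
    area in 8bpp mode: every byte written is the same, so only colors
    whose red, green and blue bytes are equal can be drawn that way.
    A minimal sketch of such a check, with hypothetical helper names
    and an assumed packed 0xRRGGBB color layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if a packed 0xRRGGBB color has R == G == B. */
int color_is_rgb_equal(uint32_t color)
{
    uint32_t r = (color >> 16) & 0xFF;
    uint32_t g = (color >> 8) & 0xFF;
    uint32_t b = color & 0xFF;

    return r == g && g == b;
}

/* Width, in 8bpp pixels (bytes), of a span of w pixels at 24bpp. */
int width_24bpp_as_8bpp(int w)
{
    assert(w >= 0);
    return w * 3;
}
```

    A fill that fails the check would have to fall back to an
    unaccelerated function.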

    Fill8x8Pattern is supported; the ARK chip requires the pattern to
    be aligned on a 64-pixel boundary and the address modulo 64 seems
    to indicate the vertical offset (y origin)
    (HARDWARE_PATTERN_MOD_64_OFFSET). The latter means the pattern can
    actually be used (when the framebuffer width is a multiple of 64),
    despite the limited support for 64-pixel pattern alignment in XAA.
    The ARK chips don't seem to have support for a monochrome pattern.

    Color expansion is implemented using
    ScanlineScreenToScreenColorExpand (24bpp: RGB_EQUAL, NO_PLANEMASK),
    which is pretty fast thanks to COP_FRAMEBUFFER_CONCURRENCY.
    Color expansion flags are VIDEO_SOURCE_GRANULARITY_PIXEL and
    BIT_ORDER_IN_BYTE_LSBFIRST. At 24bpp, TRIPLE_BITS_24BPP would be
    useful but is not yet supported by XAA.

    ScreenToScreenColorExpand is provided for future use by XAA. One
    thing that ARK chips can accelerate but is not yet provided by XAA
    is styled (patterned) line drawing.

Cirrus Logic GD5426/28/29/30/34/40/46 and 7543/48

    Uses BACKGROUND_OPERATIONS. The driver is shared by a very wide
    range of largely compatible chips, from the first-generation
    accelerator CL-GD5426 to the recent CL-GD5446, which is the only
    one to support COP_FRAMEBUFFER_CONCURRENCY and also doesn't need
    "sync"-ing between coprocessor operations. Screen locations are
    programmed as byte addresses (which makes the driver larger than,
    for example, ARK). The driver is compiled twice, with programmed I/O
    (required for earlier chips) and with memory-mapped I/O.

    FillRectSolid is provided (NO_PLANEMASK, since the chips don't
    support a planemask). At 24bpp, chips other than the 5436/46 use
    8bpp mode, in which case RGB_EQUAL is set.

    ScreenToScreenCopy is supported (NO_PLANEMASK). A few chips
    (5429/30/34) don't support transparency color compare at all
    (NO_TRANSPARENCY), and none of the chips support it at pixel depths
    greater than 16bpp.

    For CPU-to-screen color expansion, chips earlier than the CL-GD5436
    don't support DWORD padding of scanlines, so the XAA code isn't
    usable for them. Instead, these chips use byte-padding-aware text
    acceleration code from the old accelerated driver, and the
    ScanlineScreenToScreenColorExpand method (which isn't very fast
    on these chips) is provided for other things. NO_PLANEMASK.  The
    5436/46 support 24bpp color expansion, but only with transparency
    (ONLY_TRANSPARENCY_SUPPORTED); the others would benefit from
    TRIPLE_BITS_24BPP. The bit order is BIT_ORDER_IN_BYTE_MSBFIRST. The
    LEFT_EDGE_CLIPPING parameter (a value from 0 to 7) is supported for
    CPU-to-screen color expansion. Screen-to-screen color expansion is
    provided for future use. It requires the source to be aligned on a
    DWORD boundary (VIDEO_SOURCE_GRANULARITY_DWORD).
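
    The scanline padding and bit order issues mentioned above come
    down to a little arithmetic. The helpers below are a sketch with
    hypothetical names, not XAA functions; they assume a monochrome
    source bitmap with one bit per pixel and a 32-bit DWORD.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per scanline of a w-pixel monochrome bitmap, padded to a byte. */
int mono_bytes_byte_padded(int w)
{
    assert(w > 0);
    return (w + 7) / 8;
}

/* The same scanline padded to a 32-bit DWORD boundary. */
int mono_bytes_dword_padded(int w)
{
    assert(w > 0);
    return ((w + 31) / 32) * 4;
}

/*
 * Reverse the bit order within one byte, as needed when the bitmap
 * data is stored LSB first but the chip expects MSB first (or the
 * other way round).
 */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4)); /* swap nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2)); /* swap pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1)); /* swap bits */
    return b;
}
```

    For a 10-pixel wide glyph, byte padding gives 2 bytes per scanline
    and DWORD padding gives 4, which is why code written for one
    padding cannot simply be reused for the other.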

Matrox Millennium

    BACKGROUND_OPERATIONS
    24bpp: NO_PLANEMASK
    FillRectSolid
    ScreenToScreenCopy (NO_TRANSPARENCY)

    Color expansion:
    CPUToScreenColorExpand
    SCANLINE_PAD_DWORD
    CPU_TRANSFER_PAD_DWORD
    BIT_ORDER_IN_BYTE_LSBFIRST
    LEFT_EDGE_CLIPPING
    ScreenToScreenColorExpand
    VIDEO_SOURCE_GRANULARITY_PIXEL


4.2	Chip-specific Performance

This table is intended to help with determining what kinds of
operations best suit a particular chip. It shows the results (in MB/s)
for the low-level bandwidth benchmarks run at start-up. Because refresh
is disabled at the time the benchmark is run, the result reflects the
full DRAM bandwidth on DRAM-based cards (the dot clock doesn't really
matter). For this reason, the comparison isn't really fair (biased against
VRAM/WRAM and MDRAM). The virtual display width can have an influence.

Chip	          ARK1000PV Trid9385  CLGD5434  TGUI9440  MGA-Mill  ET6000
Memory            1MB DRAM  2MB DRAM  2MB DRAM  1MB DRAM  2MB WRAM  2MB MDRAM
CPU               DX4/100   ?         DX4/100   AMK5/100  AMK5/100  6x86P150+
Bus		  PCI 33MHz PCI       VLB 33MHz PCI 33MHz PCI 33MHz PCI 30MHz
bpp, width        8bpp 1024 8bpp      8bpp 1024 8bpp      8bpp      8bpp
-----------------------------------------------------------------------------
framebuffer        43.95     15.89     32.76     44.23     44.48     61.40
solid filled rect
10x1                7.38      3.14      4.58      8.72     34.99     28.22
40x40              85.82     89.93    120.34     62.60    369.84    143.84
400x400           108.81    157.20    211.18     80.03   1618.11    264.35
screen copy
10x10		   24.77     11.26     20.49     18.53     28.14     24.22
40x40		   38.81     43.90     41.68     32.11     89.27     70.57
400x400		   46.70     68.59     54.47     34.14    126.88    194.23
400x400 scroll       -         -       55.63     40.22       -      189.34
8x8 pattern fill
400x400		  105.16    116.08       -       80.02       -      264.34
color expansion
CPU to screen        -         -      116.25       -      261.03*      -
scanl. scr-to-scr  71.75       -       80.64       -         -      187.25
10x10 scr-to-scr   29.90       -       26.69     20.07       -         -

Chip	          MGA-Mill  MGA-Mill  MGA-Mill  TGUI9680  ARK2000PV
Memory            2MB WRAM  4MB WRAM  2MB WRAM  2MB DRAM  2MB EDO
CPU               DX4/133   P133      P133                DX4/100  
Bus		  PCI 33MHz PCI 33MHz PCI 33MHz PCI       PCI 33MHz
bpp, width        8bpp      8bpp 1024 8bpp 1152 8bpp 2048 8bpp 1024
-------------------------------------------------------------------
framebuffer        26.48     83.70     83.13     10.09     64.44   
solid filled rect
10x1               28.63     34.28     41.30      3.06      7.91   
40x40             385.13    316.47    453.02     55.97    155.76   
400x400          1656.93   1367.64   1942.59    145.16    244.35   
screen copy
10x10		   29.18     23.47     35.09     12.61     33.20   
40x40		   81.52     74.55     99.92     35.38     78.80   
400x400		  114.81    105.83    137.60     46.48     99.03   
400x400 scroll       -         -         -       51.88    100.08   
8x8 pattern fill
400x400		     -         -         -       51.74    228.70   
color expansion
CPU to screen     211.11*   419.09*   416.23*      -         -     
scanl. scr-to-scr    -         -         -         -      137.97   
10x10 scr-to-scr     -         -         -       15.26     36.79   

Chip		  ARK2000PV ARK2000PV MGA-Mill  CL-GD5426 CL-GD5446
Memory		  2MB EDO   2MB EDO   2MB WRAM  1MB DRAM  2MB DRAM
CPU		  6x86P150+ 6x86P166+ 6x86P166+ DX4/100   P133
Bus		  PCI 30MHz PCI 33MHz PCI 33MHz VLB 33MHz PCI 33MHz
bpp, width	  8bpp 1024 8bpp 1024 8bpp 1024 8bpp 1024 8bpp 1280
-------------------------------------------------------------------
framebuffer	   88.84    100.45     92.18     13.22     80.81
solid filled rect
10x1		   20.99     22.82     41.30      1.16     25.49
40x40		  157.00    157.00    458.38     28.88    167.10
400x400		  244.34    244.34   1961.74     41.74    218.02
screen copy
10x10	           33.22     33.23     35.09      5.35     34.39
40x40		   78.81     78.81     99.93     15.83     77.71
400x400		   99.13     99.02    137.87     20.98     97.03
400x400 scroll	  100.09    100.09    664.02     21.18     98.55
8x8 pattern fill
400x400		  228.68    228.72       -       41.10    217.91
color expansion
CPU to screen	     -                730.52       -      221.10
scanl. scr-to-scr 138.01    138.09       -       19.56       -   
10x10 scr-to-scr   36.87     36.86     39.19      6.16     66.46

(*) After this benchmark was taken, the color expansion benchmark was
    changed to write a pattern including both colors instead of just the
    background one, which is likely to affect the score.

The 10x1 filled rectangles score tells a lot about the command overhead
for small fills, which is important for operations that fill span-by-span.
The 10x10 and 40x40 screencopy give an impression of pixmap cache
efficiency, while the 10x10 score also indicates how a simple font cache
would perform (compare with color expansion). The 10x10 screen-to-screen
color expand score reflects a smarter kind of font cache.
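
To read the small-rectangle scores as command overhead, it can help to
convert them into operations per second. The helper below is only a
sketch: its name is made up, and it assumes that 1 MB means 1,000,000
bytes, which this document does not state.

```c
#include <assert.h>

/*
 * Convert a benchmark score in MB/s into operations per second for a
 * w x h operation at the given number of bytes per pixel.
 * Assumes 1 MB = 1,000,000 bytes; treat the result as approximate.
 */
double ops_per_second(double mb_per_s, int w, int h, int bytes_per_pixel)
{
    assert(w > 0 && h > 0 && bytes_per_pixel > 0);
    return mb_per_s * 1e6 / ((double)w * h * bytes_per_pixel);
}
```

Under that assumption, the ARK1000PV 10x1 score of 7.38 MB/s at 8bpp
corresponds to roughly 738,000 fills per second, and the MGA-Mill
score of 34.99 MB/s to roughly 3.5 million.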

If your implementation seems weak at a particular kind of operation, maybe
you are not doing it optimally and can improve it (usually by reducing
the command overhead, for example by minimizing the number of graphics
chip queries).


5.	Development notes

When adding a function to the GCInfoRec or AccelInfoRec, make sure to
have a Makefile with dependencies (run make depend after doing make
Makefile). If you don't, you're bound to get unexplainable core dumps.
That also applies to SVGA drivers using the new interface; they should
be recompiled after a new version of the generic acceleration code
is installed.

Header files:

vgabpp.h	Declares the new ScreenInit functions for each depth.

xf86xaa.h	General public definitions, including the GCInfoRec and
		AccelInfoRec.

xf86scrin.h	XAA screen initialization functions.

xf86local.h	Declares functions local to the generic acceleration code.

xf86gcmap.h	Maps names of some local functions to depth-specific
		versions.

xf86maploc.h	Declares local functions that are name-mapped depending on
		the depth.

vga256map.h	Maps the name of some cfb functions to their vga256
		equivalents. This is used for the vga256 version of the
		GC validation code.

xf86pcache.h	Some declarations for the pixmap cache.

xf86expblt.h	Declares monochrome data color-expansion blit functions
		defined in xf86expblt.c


6.	Acknowledgements

The Mach64 server by Kevin Martin has been used as a base for some parts
(notably pixmap caching), and the set of functions accelerated in the
Mach64 server provided a baseline for what to implement first.
