This is a list of things we should do but haven't done
yet.  

1) Implement some sort of Prime Factor algorithm (Temperton's?)
   (PFA is now used in the codelets)

2) Try the Winograd blocks for the base cases (partially done,
   not yet released).

3) Try on-the-fly generation of twiddle factors, to save memory
   and cache space (tried: it is slower.  We now have a better
   scheme for FFTW 1.1 that stores the twiddle factors in the same
   order they are used, thus enhancing locality).

4) Try the Fast Hartley Transform (are there patent issues here?)

5) Implement real-complex FFT (partially done, but we are
   not happy with it; the code is in rfftw/).  We first need
   to benchmark real-complex FFTs to quantify performance.

6) Return an error code instead of crashing when malloc() fails.
   [The two authors diverge on this point.  SGJ thinks an error
   code should be returned to the user.  MF thinks that C is simply
   not the right language to simulate exceptions by repeatedly
   polling a return value --- especially in recursive functions.
   If ever a well-publicized web language gets decent compilers,
   MF would be more than happy to do The Right Thing]  (This issue
   has been partially resolved by using the fftw_malloc_hook
   mechanism.)
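
Sketch for item 6.  The following is a minimal illustration of how
an allocation hook can move the out-of-memory policy out of the
library, so that recursive code never has to poll a return value.
All names here (alloc_hook, release_hook, lib_malloc, lib_free,
counting_alloc) are hypothetical stand-ins chosen for this sketch;
they are not FFTW's declarations, and the real fftw_malloc_hook
may behave differently.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hook variables (illustrative names only).
   NULL means "use the C library's malloc/free". */
void *(*alloc_hook)(size_t n) = NULL;
void (*release_hook)(void *p) = NULL;

/* Library-side allocator.  Every internal allocation goes through
   here, so callers never test the result themselves. */
void *lib_malloc(size_t n)
{
    void *p;

    if (n == 0)
        n = 1;                  /* malloc(0) may legally return NULL */
    p = alloc_hook ? alloc_hook(n) : malloc(n);
    if (p == NULL) {
        /* Last resort: neither malloc nor the hook produced memory. */
        fprintf(stderr, "lib_malloc: out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Library-side deallocator, mirroring lib_malloc. */
void lib_free(void *p)
{
    if (release_hook)
        release_hook(p);
    else
        free(p);
}

/* Application-side policy (an assumption of this sketch): count
   requests and record failures.  A real hook could instead release
   caches and retry, or longjmp to a recovery point. */
unsigned long hook_calls = 0;
int hook_saw_failure = 0;

void *counting_alloc(size_t n)
{
    void *p = malloc(n);

    ++hook_calls;
    if (p == NULL)
        hook_saw_failure = 1;
    return p;
}
```

To use it, the application sets alloc_hook = counting_alloc before
calling into the library.  The design choice is that the library
keeps a single failure path, while the application decides what
"out of memory" should mean.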

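Sketch for item 3.  The idea of storing twiddle factors in the
order they are used can be illustrated as follows.  This is only a
sketch under stated assumptions, not the FFTW 1.1 code: it assumes
one radix-r pass over n = r*m points in which butterfly i
(0 <= i < m) needs w^(i*j) for j = 1..r-1, and it assumes the
caller already has a natural-order table with natural[k] = w^k.
The names cplx, reorder_twiddles and apply_twiddles are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx;

/* Copy a natural-order table (natural[k] = w^k, 0 <= k < r*m) into
   usage order: the entry for butterfly i, factor j lands at index
   i*(r-1) + (j-1).  out must hold m*(r-1) entries. */
void reorder_twiddles(const cplx *natural, int r, int m, cplx *out)
{
    int i, j;

    assert(r >= 2 && m >= 1);
    for (i = 0; i < m; ++i)
        for (j = 1; j < r; ++j)
            *out++ = natural[(size_t)i * (size_t)j];
}

/* Consumer: multiply x[i + j*m] by w^(i*j) in place.  The table is
   read strictly left to right, so accesses to it have unit stride
   instead of jumping by i*j through the natural-order table. */
void apply_twiddles(cplx *x, const cplx *tw, int r, int m)
{
    int i, j;

    for (i = 0; i < m; ++i)
        for (j = 1; j < r; ++j) {
            cplx *d = &x[(size_t)i + (size_t)j * (size_t)m];
            cplx w = *tw++;
            double re = d->re * w.re - d->im * w.im;
            double im = d->re * w.im + d->im * w.re;

            d->re = re;
            d->im = im;
        }
}
```

The reordered table has m*(r-1) entries and repeats some values,
so it trades a little memory for sequential access during the
pass.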