Tips for performance tuning on a specific architecture:

1. Choose the optimal limb size (intDsize). This is fundamental. On 32-bit
   platforms intDsize=32 is best. On 64-bit platforms intDsize=64 may be
   better, especially if the hardware provides a 64x64-bit multiplication.

2. Tune GMP, if the library is built on top of it.

3. The break-even points between the alternative algorithms for the same
   task must be determined experimentally, in the order given below
   (later entries rely on the operations tuned before them):

   multiplication:
     cl_DS_mul.cc          karatsuba_threshold
     cl_DS_mul.cc          function cl_fftm_suitable
   division:
     cl_DS_div.cc          function cl_recip_suitable
   2-adic reciprocal:
     cl_2DS_recip.cc       recip2adic_threshold
   2-adic division:
     cl_2DS_div.cc         function cl_recip_suitable
   square root:
     cl_DS_sqrt.cc         function cl_recipsqrt_suitable
     cl_LF_sqrt.cc         "if (len > ...)"
   gcd:
     cl_I_gcd.cc           cl_gcd_double_threshold
   binary->decimal conversion:
     cl_I_to_digits.cc     cl_digits_div_threshold
   pi:
     cl_LF_pi.cc           best of 4 algorithms
   exp, log:
     cl_F_expx.cc          factor limit_slope of isqrt(d)
     cl_R_exp.cc           inside function exp
     cl_R_ln.cc            inside function ln
   eulerconst:
     cl_LF_eulerconst.cc   function compute_eulerconst
   sin, cos, sinh, cosh:
     cl_F_sinx.cc          factor limit_slope of isqrt(d)
     cl_R_sin.cc           inside function sin
     cl_R_cos.cc           inside function cos
     cl_R_cossin.cc        inside function cl_cos_sin
     cl_F_sinhx.cc         factor limit_slope of isqrt(d)
     cl_R_sinh.cc          inside function sinh
     cl_R_cosh.cc          inside function cosh
     cl_R_coshsinh.cc      inside function cl_cosh_sinh
     cl_F_atanx.cc         factor limit_slope of isqrt(d)
     cl_F_atanx.cc         inside function atanx
     cl_F_atanhx.cc        factor limit_slope of isqrt(d)
     cl_F_atanhx.cc        inside function atanhx
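
Regarding item 1: what a "64x64-bit multiplication" has to deliver is the
full 128-bit product of two 64-bit limbs. The sketch below is only an
illustration and is not taken from the library sources. The names U128 and
mul_64x64 are hypothetical. It assumes a GCC- or Clang-style compiler that
defines __SIZEOF_INT128__ when "unsigned __int128" is available; whether that
type maps to a single hardware instruction depends on the target. The
fallback branch shows the four 32x32-bit multiplications needed otherwise,
which is roughly the extra work that makes intDsize=64 less attractive
without hardware support.

```cpp
#include <cstdint>

// Hypothetical result type: high and low 64-bit halves of a 128-bit product.
struct U128 { std::uint64_t hi, lo; };

// Hypothetical helper: full 64x64 -> 128-bit unsigned multiplication.
inline U128 mul_64x64(std::uint64_t a, std::uint64_t b)
{
#if defined(__SIZEOF_INT128__)
    // Assumption: the compiler provides a 128-bit integer type.
    unsigned __int128 p = (unsigned __int128)a * b;
    return { (std::uint64_t)(p >> 64), (std::uint64_t)p };
#else
    // Portable fallback: split both operands into 32-bit halves and
    // combine the four partial products.
    const std::uint64_t mask = 0xFFFFFFFFu;
    std::uint64_t a0 = a & mask, a1 = a >> 32;
    std::uint64_t b0 = b & mask, b1 = b >> 32;
    std::uint64_t p00 = a0 * b0, p01 = a0 * b1;
    std::uint64_t p10 = a1 * b0, p11 = a1 * b1;
    // Sum of the three contributions to bits 32..63; it stays below 2^34,
    // so it cannot overflow a 64-bit word.
    std::uint64_t mid = (p00 >> 32) + (p01 & mask) + (p10 & mask);
    return { p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32),
             (mid << 32) | (p00 & mask) };
#endif
}
```

Timing a loop of such products in both branches on the target machine gives
a first hint which limb size to try.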

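Regarding item 3: one possible way to determine a break-even point
experimentally is to time both algorithms over a range of operand lengths
and take the first length from which the asymptotically faster one keeps
winning. The sketch below is a generic measurement harness under that
assumption, not the procedure used by the library authors. The names
time_best_of and find_break_even are hypothetical, and the two cost
callbacks are placeholders for calls into the routines being compared.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: run `work` `reps` times and return the best
// (smallest) wall-clock time in seconds, to reduce scheduling noise.
inline double time_best_of(const std::function<void()>& work, int reps = 5)
{
    double best = 1e300;
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}

// Hypothetical helper: scan operand lengths lo..hi and return the smallest
// length at which cost_fast starts to beat cost_simple for `confirm`
// consecutive lengths. Returns hi + 1 if no such length is found.
inline std::size_t find_break_even(
    const std::function<double(std::size_t)>& cost_simple,
    const std::function<double(std::size_t)>& cost_fast,
    std::size_t lo, std::size_t hi, std::size_t confirm = 3)
{
    assert(confirm >= 1);
    std::size_t run = 0;  // current streak of lengths won by cost_fast
    for (std::size_t n = lo; n <= hi; ++n) {
        if (cost_fast(n) < cost_simple(n)) {
            if (++run == confirm)
                return n - confirm + 1;  // first length of the streak
        } else {
            run = 0;
        }
    }
    return hi + 1;
}
```

In practice each cost callback would wrap time_best_of around one call of
the routine under test with operands of length n. The value returned by
find_break_even is then a candidate for the corresponding threshold, to be
confirmed by rerunning the measurement.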

