doc/vbccm68k.texi - vbcc - Gitiles

 This chapter documents the backend for the M68k and Coldfire
 processor families.

 @section Additional options

 This backend provides the following additional options:

 @table @option

     @item -a2scratch
               Allow using @code{A2} as scratch register.

     @item -amiga-softfloat
               Call AmigaOS MathIEEE library functions via direct inline
               code, instead of callling stub routines from @file{mieee.lib}.
               It still requires that you either link with @file{mieee.lib} or
               define @code{MathIeeeSingBasBase}, @code{MathIeeeDoubBasBase}
               and @code{MathIeeeDoubTransBase} yourself.

     @item -conservative-sr
               Restrict strength-reduction. Experimental.

     @item -const-in-data
               By default constant data will be placed in the code
               section (and therefore is accessible with faster pc-relative
               addressing modes). Using this option it will be placed in the
               data section.

               This could e.g. be useful if you want to use small data and
               small code, but your code gets too big with all the constant
               data.

               Note that on operating systems with memory protection this
               option will disable write-protection of constant data.

     @item -cpu=n
               Generate code for cpu n (e.g. @option{-cpu=68020}),
               defaults to 68000.

     @item -d2scratch
               Allow using @code{D2} as scratch register.

     @item -fastcall
               Pass function arguments in volatile registers, when possible.

     @item -fp2scratch
               Allow using @code{FP2} as scratch register.

     @item -fpu=n
               Generate code for fpu n (e.g. @option{-fpu=68881}), default: 0.

     @item -gas
               Create output suitable for the GNU assembler.

     @item -hunkdebug
               When creating debug-output (@option{-g} option) create
               Amiga debug hunks rather than DWARF2.
               Does not work with @option{-gas}.

     @item -no-delayed-popping
               By default arguments of function calls are not always popped
               from the stack immediately after the call, so that the
               arguments of several calls may be popped at once.
               With this option @command{vbcc} can be forced to pop them after every
               function call.
               This may simplify debugging and reduce the
               stack size needed by the compiled program.

     @item -no-fp-return

               Do not return floats and doubles in floating-point registers
               even if code for an fpu is generated.

     @item -no-intz
               When generating code for FPU do quick&dirty conversions
               from floating-point to integer. The code may be somewhat
               faster but will not correctly round to zero.
               Only use it if you know what you are doing.

     @item -no-mreg-return
               Do not use multiple registers to return types that do not
               fit into a single register. This is mainly for backwards
               compatibility with certain libraries.

     @item -no-peephole
               Do not perform peephole-optimizations.

     @item -no-reserve-regs
               Do not reserve temporary registers for the backend. Can lead to
               worse code generation.

     @item -old-softfloat
               Use old libcall mechanism for software floating point.
               Should not be used, will usually generate worse code.

     @item -old-libcalls
               Use old libcall mechanism for (some) integer support routines.
               Should not be used, will usually generate worse code.

     @item -phxass
               Generate assembly output for the PhxAss assembler.

     @item -prof
               Insert code for profiling.

     @item -sc
               Use small code model (see below).

     @item -sd
               Use small data model (see below).

     @item -use-commons
               Use real common symbols instead of bss symbols for
               non-initialized external variables.

     @item -use-framepointer
               By default automatic variables are addressed through a7
               instead of a5. This generates slightly better code, because
               the function entry and exit overhead is reduced and a5 can be
               used as register variable etc.

               However this may be a bit confusing when debugging and you
               can force @command{vbcc} to use a5 as a fixed framepointer.


 @end table

 @section ABI

     The current version generates assembler output for use with the
     @command{vasmm68k_mot}. Most peephole optimizations are done by the
     assembler so @command{vbcc} only does some that the assembler cannot make.
     The generated executables will probably only work with OS2.0 or higher.

     With @option{-gas} assembler output suitable for the GNU assembler is generated
     (the version must understand the Motorola syntax - some old ones do not).
     The output is only slightly modified from the @command{vasm}-output and will
     therefore result in worse code on @command{gas}.

     The register names provided by this backend are:

 @example
          a0,  a1,  a2,  a3,  a4,  a5,  a6,  a7
          d0,  d1,  d2,  d3,  d4,  d5,  d6,  d7
         fp0, fp1, fp2, fp3, fp4, fp5, fp6, fp7
 @end example

         The registers @code{a0} - @code{a7} are supported to hold pointer
         types. @code{d0} - @code{d7} can be used for integers types
         excluding @code{long long}, pointers and @code{float} if no
         FPU code is generated. @code{fp0} - @code{fp7} can be used for
         all floating point types if FPU code is generated.

         Additionally the following register pairs can be used for
         @code{long long}:

 @example
         d0/d1, d2/d3, d4/d5, d6/d7
 @end example

     The registers @code{d0, d1, a0, a1, fp0} and @code{fp1}  are used as scratch registers
     (i.e. they can be destroyed in function calls), all other registers are
     preserved.

         By default, all function arguments are passed on the stack.

     All scalar types up to 4 bytes are returned in register @code{d0},
         @code{long long} is returned in @code{d0/d1}.
         If compiled for FPU, floating point values are returned in
     @code{fp0} unless @option{-no-fpreturn} is specified.
     Types which are 8, 12 or 16 bytes large will be returned in several
     registers (@code{d0/d1/a0/a1}) unless @option{-no-mreg-return} is specified.
     All other types are returned by passing the function the address
     of the result as a hidden argument - such a function must not be called
     without a proper declaration in scope.

     Objects which have been compiled with different settings must not be
         linked together.

         @code{a7} is used as stack pointer. If @option{-sd} is used,
         @code{a4} will be used as small data pointer. If
         @option{-use-framepointer} is used, @code{a5} will be used as
         frame pointer. All other registers will be used by the
         register allocator and can be used for register parameters.

         The size of the stack frame is limited to 32KB for early members
         of the 68000 family prior to 68020.


 The basic data types are represented like:

 @example
     type        size in bits        alignment in bytes

     char                8                       1
     short              16                       2
     int                32                       2
     long               32                       2
     long long          64                       2
     all pointers       32                       2
     float(fpu)         32                       2       see below
     double(fpu)        64                       2       see below
     long double(fpu)   64                       2       see below
 @end example


 @section Small data

     @command{vbcc} can access static data in two ways. By default all such data will
     be accessed with full 32bit addresses (large data model).
     However there is a second way. You can set up an address register
         (@code{a4})
     to point into the data segment and then address data with a 16bit
     offset through this register.

     The advantages of the small data model are that the program will
     usually be smaller (because the 16bit offsets use less space and no
     relocation information is needed) and faster.

     The disadvantages are that one address register cannot be used by the
     compiler and that it can only be used if all static data occupies
     less than 64kb. Also object modules and libraries that
     have been compiled with different data models must not be mixed
     (it is possible to call functions
     compiled with large data model from object files compiled with small
     data model, but not vice versa and only functions can be called that
     way - other data cannot be accessed).

     If small data is used with functions which are called from
     functions which have not been compiled with @command{vbcc} or without the small data
     model then those functions must be declared with the @code{__saveds} attribute
     or call @code{geta4()} as the first statement (do not use
     automatic initializations prior to the call to @code{geta4}).
     Note that @code{geta4()} must not be called through a function pointer!


 @section Small code

     In the small code model calls to external functions (i.e. from
     libraries or other object files) are done with 16bit offsets through
     the program counter rather than with absolute 32bit addresses.

     The advantage is slightly smaller and faster code.
     The disadvantages are that all the code (including library functions)
     must be small enough. Objects/libraries can be linked together if they
     have been compiled with different code models.


 @section CPUs

     The values of @option{-cpu=n} have those effects:

 @table @option

     @item n<68000
         Code for the Coldfire family is generated.

     @item n>=68000
         Code for the 68k family is generated.

     @item n>=68020
 @itemize @minus
    @item 32bit multiplication/division/modulo is done with the
                   mul?.l, div?.l and div?l.l instructions.
    @item tst.l ax is used.
    @item extb.l dx is used.
    @item 16/32bit offsets are used in certain addressing modes.
    @item link.l is used.
    @item Addressing modes with scaling are used.
 @end itemize

     @item n==68040
 @itemize @minus
    @item 8bit constants are not copied in data registers.
    @item Static memory is not subject to common subexpression elimination.
 @end itemize

 @end table


 @section FPUs

     At the moment the values of -fpu=n have those effects:

 @table @option
     @item n>68000
         Floating point calculations are done using the FPU.
     @item n=68040
     @itemx n=68060
         Instructions that have to be emulated on these FPUs
                   will not be used; at the moment this only includes
                   the @code{fintrz} instruction in case of the 040.
 @end table

 @section Math

     Long multiply
     on CPUs <68020 uses inline routines. This may increase code size a bit,
     but it should be significantly faster, because function call overhead
     is not necessary.
     Long division and modulo is handled by calls to library
     functions.
     (Some operations involving constants (e.g. powers of two) are always
     implemented by more efficient inline code.)

     If no FPU is specified floating point math is done using
     math libraries. 32bit IEEE format is used for float and 64bit IEEE
     for double and long double.

     If floating point math is done with the FPU
     floating point values are kept in registers and therefore may
     have extended precision sometimes. This is not ANSI compliant but
     will usually cause no harm. When floating point values are stored in
     memory they use the same IEEE formats as without FPU.
     Return values are passed in @code{fp0}.

     Note that you must not link object files together if they were not
     compiled with the same @code{-fpu} settings and that
     a proper math library must be linked.


 @section Target-Specific Variable Attributes

     This backend offers the following variable attributes:

 @table @code
     @item __amigainterrupt
                 Used to write interrupt-handlers for AmigaOS. Stack-checking
                 for a function with this attribute will be disabled and if a value
                 is returned in d0, the
                 condition codes will be set accordingly.

     @item __chip
               Place variable in chip-memory. Only applicable on
                  AmigaOS to variables with static storage-duration.

     @item __far
                Do not place this variable in the small-data segment
                  in small data mode. No effect in large data mode.
                  Only applicable to variables with static
                 storage-duration.

     @item __interrupt
                  This is used to declare interrupt-handlers. The
                  function using this attribute will save all registers
                  it destroys (including scratch-registers) and return
                  with @code{rte} rather than @code{rts}.

     @item __near
               Currently ignored.

     @item __regargs
             Declare function to use the @option{-fastcall} ABI. The
             first arguments are passed in volatile registers.

     @item __saveds
             Load the pointer to the small data segment at
                  function-entry. Applicable only to functions.

     @item __section(<string-literal>)
                  Places the variable/function in a section named
                  according to the argument.

     @item __stdargs
             Declare function to use the standard ABI (default),
             which passes all arguments on the stack.
 @end table

 @section Target-specific pragmas

     This backend offers the following #pragmas:

 @table @code

     @item #pragma stdargs-on
           Automatically declare the following functions with the
           @code{__stdargs} attribute.

     @item #pragma stdargs-off
           Stop automatically declaring the following functions with the
           @code{__stdargs} attribute.

 @end table

 @section Predefined Macros

         This backend defines the following macros:

 @table @code
         @item __AMIGADATE__
                 This is set to current date as @code{"(DD.MM.YYYY)"},
                 useful with version strings.

         @item __COLDFIRE
                 (If a Coldfire CPU is selected.)

         @item __INTSIZE
                 Is set to the size of the @code{int} type.
                 Either 16 (vbccm68ks) or 32 (vbccm68k).

         @item __M680x0
                 (Depending on the settings of @option{-cpu}, e.g.
                 @code{__M68020}.)

         @item __M68881
                 (If @option{-fpu=68881} is selected.)

         @item __M68882
                 (If code for another FPU is selected;
                 @option{-fpu=68040} or @option{-fpu=68060} will
                 set @code{__M68882}.)

         @item __M68K__

         @item __SMALL_DATA__
                 (If @option{-sd} is selected to enable small data.)
 @end table


 @section Stack

     If the @option{-stack-check} option is used, every function-prologue will
     call the function @code{__stack_check} with the stacksize needed by the
     current function on the stack. This function has to consider its own
     stacksize and must restore all registers.

     If the compiler is able to calculate the maximum stack-size of a
     function including all callees, it will add a comment in the
     generated assembly-output (subject to change to labels).


 @section Stdarg

     A possible @file{<stdarg.h>} could look like this:

 @example

 typedef unsigned char *va_list;

 #define __va_align(type) (__alignof(type)>=4?__alignof(type):4)

 #define __va_do_align(vl,type) ((vl)=(char *)((((unsigned int)(vl))+__va_align(type)-1)/__va_align(type)*__va_align(type)))

 #define __va_mem(vl,type) (__va_do_align((vl),type),(vl)+=sizeof(type),((type*)(vl))[-1])

 #define va_start(ap, lastarg) ((ap)=(va_list)(&lastarg+1))

 #define va_arg(vl,type) __va_mem(vl,type)

 #define va_end(vl) ((vl)=0)

 #define va_copy(new,old) ((new)=(old))

 #endif


 @end example

 @section Known problems

 @itemize @minus
     @item The extended precision of the FPU registers can cause problems if
       a program depends on the exact precision. Most programs will not
       have trouble with that, but programs which do exact comparisons
       with floating point types (e.g. to try to calculate the number
       of significant bits) may not work as expected (especially if the
       optimizer was turned on).
 @end itemize
	This chapter documents the backend for the M68k and Coldfire
	processor families.

	@section Additional options

	This backend provides the following additional options:

	@table @option

	@item -a2scratch
	Allow using @code{A2} as scratch register.

	@item -amiga-softfloat
	Call AmigaOS MathIEEE library functions via direct inline
	code, instead of callling stub routines from @file{mieee.lib}.
	It still requires that you either link with @file{mieee.lib} or
	define @code{MathIeeeSingBasBase}, @code{MathIeeeDoubBasBase}
	and @code{MathIeeeDoubTransBase} yourself.

	@item -conservative-sr
	Restrict strength-reduction. Experimental.

	@item -const-in-data
	By default constant data will be placed in the code
	section (and therefore is accessible with faster pc-relative
	addressing modes). Using this option it will be placed in the
	data section.

	This could e.g. be useful if you want to use small data and
	small code, but your code gets too big with all the constant
	data.

	Note that on operating systems with memory protection this
	option will disable write-protection of constant data.

	@item -cpu=n
	Generate code for cpu n (e.g. @option{-cpu=68020}),
	defaults to 68000.

	@item -d2scratch
	Allow using @code{D2} as scratch register.

	@item -fastcall
	Pass function arguments in volatile registers, when possible.

	@item -fp2scratch
	Allow using @code{FP2} as scratch register.

	@item -fpu=n
	Generate code for fpu n (e.g. @option{-fpu=68881}), default: 0.

	@item -gas
	Create output suitable for the GNU assembler.

	@item -hunkdebug
	When creating debug-output (@option{-g} option) create
	Amiga debug hunks rather than DWARF2.
	Does not work with @option{-gas}.

	@item -no-delayed-popping
	By default arguments of function calls are not always popped
	from the stack immediately after the call, so that the
	arguments of several calls may be popped at once.
	With this option @command{vbcc} can be forced to pop them after every
	function call.
	This may simplify debugging and reduce the
	stack size needed by the compiled program.

	@item -no-fp-return

	Do not return floats and doubles in floating-point registers
	even if code for an fpu is generated.

	@item -no-intz
	When generating code for FPU do quick&dirty conversions
	from floating-point to integer. The code may be somewhat
	faster but will not correctly round to zero.
	Only use it if you know what you are doing.

	@item -no-mreg-return
	Do not use multiple registers to return types that do not
	fit into a single register. This is mainly for backwards
	compatibility with certain libraries.

	@item -no-peephole
	Do not perform peephole-optimizations.

	@item -no-reserve-regs
	Do not reserve temporary registers for the backend. Can lead to
	worse code generation.

	@item -old-softfloat
	Use old libcall mechanism for software floating point.
	Should not be used, will usually generate worse code.

	@item -old-libcalls
	Use old libcall mechanism for (some) integer support routines.
	Should not be used, will usually generate worse code.

	@item -phxass
	Generate assembly output for the PhxAss assembler.

	@item -prof
	Insert code for profiling.

	@item -sc
	Use small code model (see below).

	@item -sd
	Use small data model (see below).

	@item -use-commons
	Use real common symbols instead of bss symbols for
	non-initialized external variables.

	@item -use-framepointer
	By default automatic variables are addressed through a7
	instead of a5. This generates slightly better code, because
	the function entry and exit overhead is reduced and a5 can be
	used as register variable etc.

	However this may be a bit confusing when debugging and you
	can force @command{vbcc} to use a5 as a fixed framepointer.




	@end table

	@section ABI

	The current version generates assembler output for use with the
	@command{vasmm68k_mot}. Most peephole optimizations are done by the
	assembler so @command{vbcc} only does some that the assembler cannot make.
	The generated executables will probably only work with OS2.0 or higher.

	With @option{-gas} assembler output suitable for the GNU assembler is generated
	(the version must understand the Motorola syntax - some old ones do not).
	The output is only slightly modified from the @command{vasm}-output and will
	therefore result in worse code on @command{gas}.

	The register names provided by this backend are:

	@example
	a0, a1, a2, a3, a4, a5, a6, a7
	d0, d1, d2, d3, d4, d5, d6, d7
	fp0, fp1, fp2, fp3, fp4, fp5, fp6, fp7
	@end example

	The registers @code{a0} - @code{a7} are supported to hold pointer
	types. @code{d0} - @code{d7} can be used for integers types
	excluding @code{long long}, pointers and @code{float} if no
	FPU code is generated. @code{fp0} - @code{fp7} can be used for
	all floating point types if FPU code is generated.

	Additionally the following register pairs can be used for
	@code{long long}:

	@example
	d0/d1, d2/d3, d4/d5, d6/d7
	@end example

	The registers @code{d0, d1, a0, a1, fp0} and @code{fp1} are used as scratch registers
	(i.e. they can be destroyed in function calls), all other registers are
	preserved.

	By default, all function arguments are passed on the stack.

	All scalar types up to 4 bytes are returned in register @code{d0},
	@code{long long} is returned in @code{d0/d1}.
	If compiled for FPU, floating point values are returned in
	@code{fp0} unless @option{-no-fpreturn} is specified.
	Types which are 8, 12 or 16 bytes large will be returned in several
	registers (@code{d0/d1/a0/a1}) unless @option{-no-mreg-return} is specified.
	All other types are returned by passing the function the address
	of the result as a hidden argument - such a function must not be called
	without a proper declaration in scope.

	Objects which have been compiled with different settings must not be
	linked together.

	@code{a7} is used as stack pointer. If @option{-sd} is used,
	@code{a4} will be used as small data pointer. If
	@option{-use-framepointer} is used, @code{a5} will be used as
	frame pointer. All other registers will be used by the
	register allocator and can be used for register parameters.

	The size of the stack frame is limited to 32KB for early members
	of the 68000 family prior to 68020.


	The basic data types are represented like:

	@example
	type size in bits alignment in bytes

	char 8 1
	short 16 2
	int 32 2
	long 32 2
	long long 64 2
	all pointers 32 2
	float(fpu) 32 2 see below
	double(fpu) 64 2 see below
	long double(fpu) 64 2 see below
	@end example



	@section Small data

	@command{vbcc} can access static data in two ways. By default all such data will
	be accessed with full 32bit addresses (large data model).
	However there is a second way. You can set up an address register
	(@code{a4})
	to point into the data segment and then address data with a 16bit
	offset through this register.

	The advantages of the small data model are that the program will
	usually be smaller (because the 16bit offsets use less space and no
	relocation information is needed) and faster.

	The disadvantages are that one address register cannot be used by the
	compiler and that it can only be used if all static data occupies
	less than 64kb. Also object modules and libraries that
	have been compiled with different data models must not be mixed
	(it is possible to call functions
	compiled with large data model from object files compiled with small
	data model, but not vice versa and only functions can be called that
	way - other data cannot be accessed).

	If small data is used with functions which are called from
	functions which have not been compiled with @command{vbcc} or without the small data
	model then those functions must be declared with the @code{__saveds} attribute
	or call @code{geta4()} as the first statement (do not use
	automatic initializations prior to the call to @code{geta4}).
	Note that @code{geta4()} must not be called through a function pointer!


	@section Small code

	In the small code model calls to external functions (i.e. from
	libraries or other object files) are done with 16bit offsets through
	the program counter rather than with absolute 32bit addresses.

	The advantage is slightly smaller and faster code.
	The disadvantages are that all the code (including library functions)
	must be small enough. Objects/libraries can be linked together if they
	have been compiled with different code models.


	@section CPUs

	The values of @option{-cpu=n} have those effects:

	@table @option

	@item n<68000
	Code for the Coldfire family is generated.

	@item n>=68000
	Code for the 68k family is generated.

	@item n>=68020
	@itemize @minus
	@item 32bit multiplication/division/modulo is done with the
	mul?.l, div?.l and div?l.l instructions.
	@item tst.l ax is used.
	@item extb.l dx is used.
	@item 16/32bit offsets are used in certain addressing modes.
	@item link.l is used.
	@item Addressing modes with scaling are used.
	@end itemize

	@item n==68040
	@itemize @minus
	@item 8bit constants are not copied in data registers.
	@item Static memory is not subject to common subexpression elimination.
	@end itemize

	@end table


	@section FPUs

	At the moment the values of -fpu=n have those effects:

	@table @option
	@item n>68000
	Floating point calculations are done using the FPU.
	@item n=68040
	@itemx n=68060
	Instructions that have to be emulated on these FPUs
	will not be used; at the moment this only includes
	the @code{fintrz} instruction in case of the 040.
	@end table

	@section Math

	Long multiply
	on CPUs <68020 uses inline routines. This may increase code size a bit,
	but it should be significantly faster, because function call overhead
	is not necessary.
	Long division and modulo is handled by calls to library
	functions.
	(Some operations involving constants (e.g. powers of two) are always
	implemented by more efficient inline code.)

	If no FPU is specified floating point math is done using
	math libraries. 32bit IEEE format is used for float and 64bit IEEE
	for double and long double.

	If floating point math is done with the FPU
	floating point values are kept in registers and therefore may
	have extended precision sometimes. This is not ANSI compliant but
	will usually cause no harm. When floating point values are stored in
	memory they use the same IEEE formats as without FPU.
	Return values are passed in @code{fp0}.

	Note that you must not link object files together if they were not
	compiled with the same @code{-fpu} settings and that
	a proper math library must be linked.


	@section Target-Specific Variable Attributes

	This backend offers the following variable attributes:

	@table @code
	@item __amigainterrupt
	Used to write interrupt-handlers for AmigaOS. Stack-checking
	for a function with this attribute will be disabled and if a value
	is returned in d0, the
	condition codes will be set accordingly.

	@item __chip
	Place variable in chip-memory. Only applicable on
	AmigaOS to variables with static storage-duration.

	@item __far
	Do not place this variable in the small-data segment
	in small data mode. No effect in large data mode.
	Only applicable to variables with static
	storage-duration.

	@item __interrupt
	This is used to declare interrupt-handlers. The
	function using this attribute will save all registers
	it destroys (including scratch-registers) and return
	with @code{rte} rather than @code{rts}.

	@item __near
	Currently ignored.

	@item __regargs
	Declare function to use the @option{-fastcall} ABI. The
	first arguments are passed in volatile registers.

	@item __saveds
	Load the pointer to the small data segment at
	function-entry. Applicable only to functions.

	@item __section(<string-literal>)
	Places the variable/function in a section named
	according to the argument.

	@item __stdargs
	Declare function to use the standard ABI (default),
	which passes all arguments on the stack.
	@end table

	@section Target-specific pragmas

	This backend offers the following #pragmas:

	@table @code

	@item #pragma stdargs-on
	Automatically declare the following functions with the
	@code{__stdargs} attribute.

	@item #pragma stdargs-off
	Stop automatically declaring the following functions with the
	@code{__stdargs} attribute.

	@end table

	@section Predefined Macros

	This backend defines the following macros:

	@table @code
	@item __AMIGADATE__
	This is set to current date as @code{"(DD.MM.YYYY)"},
	useful with version strings.

	@item __COLDFIRE
	(If a Coldfire CPU is selected.)

	@item __INTSIZE
	Is set to the size of the @code{int} type.
	Either 16 (vbccm68ks) or 32 (vbccm68k).

	@item __M680x0
	(Depending on the settings of @option{-cpu}, e.g.
	@code{__M68020}.)

	@item __M68881
	(If @option{-fpu=68881} is selected.)

	@item __M68882
	(If code for another FPU is selected;
	@option{-fpu=68040} or @option{-fpu=68060} will
	set @code{__M68882}.)

	@item __M68K__

	@item __SMALL_DATA__
	(If @option{-sd} is selected to enable small data.)
	@end table


	@section Stack

	If the @option{-stack-check} option is used, every function-prologue will
	call the function @code{__stack_check} with the stacksize needed by the
	current function on the stack. This function has to consider its own
	stacksize and must restore all registers.

	If the compiler is able to calculate the maximum stack-size of a
	function including all callees, it will add a comment in the
	generated assembly-output (subject to change to labels).


	@section Stdarg

	A possible @file{<stdarg.h>} could look like this:

	@example

	typedef unsigned char *va_list;

	#define __va_align(type) (__alignof(type)>=4?__alignof(type):4)

	#define __va_do_align(vl,type) ((vl)=(char )((((unsigned int)(vl))+__va_align(type)-1)/__va_align(type)__va_align(type)))

	#define __va_mem(vl,type) (__va_do_align((vl),type),(vl)+=sizeof(type),((type*)(vl))[-1])

	#define va_start(ap, lastarg) ((ap)=(va_list)(&lastarg+1))

	#define va_arg(vl,type) __va_mem(vl,type)

	#define va_end(vl) ((vl)=0)

	#define va_copy(new,old) ((new)=(old))

	#endif


	@end example

	@section Known problems

	@itemize @minus
	@item The extended precision of the FPU registers can cause problems if
	a program depends on the exact precision. Most programs will not
	have trouble with that, but programs which do exact comparisons
	with floating point types (e.g. to try to calculate the number
	of significant bits) may not work as expected (especially if the
	optimizer was turned on).
	@end itemize