
4.1 Writing a CPU backend

The CPU backend files are the heart of the assembler. They are the only parts of the assembler which actually know anything about the instruction set of the processor.

You must define a reasonably small list of macros and functions in the CPU backend files. You may define a large number of additional macros in the CPU backend files, not all of which are documented here. You must, of course, define macros in the `.h' file, which is included by every assembler source file. You may define the functions as macros in the `.h' file, or as functions in the `.c' file.

TC_CPU
By convention, you should define this macro in the `.h' file. For example, `tc-m68k.h' defines TC_M68K. You might have to use this if it is necessary to add CPU specific code to the object format file.

TARGET_FORMAT
This macro is the BFD target name to use when creating the output file. This will normally depend upon the OBJ_FMT macro.

TARGET_ARCH
This macro is the BFD architecture to pass to bfd_set_arch_mach.

TARGET_MACH
This macro is the BFD machine number to pass to bfd_set_arch_mach. If it is not defined, GAS will use 0.

TARGET_BYTES_BIG_ENDIAN
You should define this macro to be non-zero if the target is big endian, and zero if the target is little endian.
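
As a rough sketch, the macros above might appear together in the `.h' file of a hypothetical CPU called foo. The names TC_FOO, bfd_arch_foo, and the target name "elf32-foo" are invented for illustration, and the enum merely stands in for the real BFD architecture list so that the fragment compiles on its own.

```c
/* Sketch of `tc-foo.h' for a hypothetical big endian CPU "foo".  */

/* Stand-in for the BFD architecture enum; a real backend gets it from
   `bfd.h'.  bfd_arch_foo is a made-up name.  */
enum bfd_architecture_sketch
{
  bfd_arch_unknown_sketch,
  bfd_arch_foo
};

#define TC_FOO 1

/* Hypothetical BFD target name; normally this depends upon OBJ_FMT.  */
#define TARGET_FORMAT "elf32-foo"

#define TARGET_ARCH bfd_arch_foo
#define TARGET_MACH 0                 /* the value GAS uses when undefined */
#define TARGET_BYTES_BIG_ENDIAN 1     /* non-zero: big endian target */
```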

md_shortopts
md_longopts
md_longopts_size
md_parse_option
md_show_usage
md_after_parse_args
GAS uses these variables and functions during option processing. md_shortopts is a const char * which GAS adds to the machine independent string passed to getopt. md_longopts is a struct option [] which GAS adds to the machine independent long options passed to getopt; you may use OPTION_MD_BASE, defined in `as.h', as the start of a set of long option indices, if necessary. md_longopts_size is a size_t holding the size of md_longopts. GAS will call md_parse_option whenever getopt returns an unrecognized code, presumably indicating a special code value which appears in md_longopts. GAS will call md_show_usage when a usage message is printed; it should print a description of the machine specific options. md_after_parse_args, if defined, is called after all options are processed, to let the backend override settings done by the generic option parsing.
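
The following is a minimal sketch of the option hooks for a hypothetical backend with one short option (-Q) and one long option (--foo-strict). The option names, the flag variables, the numeric value given to OPTION_MD_BASE, and the exact signature of md_parse_option are assumptions made for illustration; only the variable names and the use of struct option come from the description above.

```c
#include <getopt.h>
#include <stddef.h>

/* Stand-in for the value that `as.h' provides; the number is arbitrary.  */
#ifndef OPTION_MD_BASE
#define OPTION_MD_BASE 190
#endif

/* Hypothetical long option code for --foo-strict.  */
#define OPTION_FOO_STRICT (OPTION_MD_BASE + 0)

static int foo_strict;          /* set by --foo-strict */
static int foo_quiet;           /* set by -Q */

const char *md_shortopts = "Q";

struct option md_longopts[] = {
  { "foo-strict", no_argument, NULL, OPTION_FOO_STRICT },
  { NULL, no_argument, NULL, 0 }
};

size_t md_longopts_size = sizeof (md_longopts);

/* Return non-zero if the option code C was recognized.  The signature is
   an assumption of this sketch.  */
int
md_parse_option (int c, const char *arg)
{
  (void) arg;                   /* neither option takes an argument */

  switch (c)
    {
    case 'Q':
      foo_quiet = 1;
      return 1;
    case OPTION_FOO_STRICT:
      foo_strict = 1;
      return 1;
    default:
      return 0;
    }
}
```

Note that md_longopts_size holds the size of the array in bytes, not the number of entries.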

md_begin
GAS will call this function at the start of the assembly, after the command line arguments have been parsed and all the machine independent initializations have been completed.

md_cleanup
If you define this macro, GAS will call it at the end of each input file.

md_assemble
GAS will call this function for each input line which does not contain a pseudo-op. The argument is a null terminated string. The function should assemble the string as an instruction with operands. Normally md_assemble will do this by calling frag_more and writing out some bytes (see section 2.4 Frags). md_assemble will call fix_new to create fixups as needed (see section 2.3 Fixups). Targets which need to do special purpose relaxation will call frag_var.

md_pseudo_table
This is a const array of type pseudo_typeS. It is a mapping from pseudo-op names to functions. You should use this table to implement pseudo-ops which are specific to the CPU.
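
A sketch of such a table follows. The pseudo_typeS declaration here is a stand-in so that the example compiles by itself, the `.mode16' and `.mode32' pseudo-ops are hypothetical, and run_pseudo_op only illustrates the name to function mapping with a linear scan; it is not how the machine independent code stores or searches the table.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the pseudo_typeS type declared by GAS.  */
typedef struct
{
  const char *poc_name;            /* pseudo-op name, without the dot */
  void (*poc_handler) (int);       /* function that implements it */
  int poc_val;                     /* value passed to the handler */
} pseudo_typeS;

static int foo_mode;               /* hypothetical CPU mode, 16 or 32 */

/* Handler shared by the hypothetical `.mode16' and `.mode32'.  */
static void
s_foo_mode (int bits)
{
  foo_mode = bits;
}

const pseudo_typeS md_pseudo_table[] = {
  { "mode16", s_foo_mode, 16 },
  { "mode32", s_foo_mode, 32 },
  { NULL, NULL, 0 }                /* end of table */
};

/* Illustration only: find NAME in the table and run its handler.
   Returns 1 if the name was found, 0 otherwise.  */
static int
run_pseudo_op (const char *name)
{
  const pseudo_typeS *p;

  for (p = md_pseudo_table; p->poc_name != NULL; p++)
    if (strcmp (p->poc_name, name) == 0)
      {
        p->poc_handler (p->poc_val);
        return 1;
      }
  return 0;
}
```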

tc_conditional_pseudoop
If this macro is defined, GAS will call it with a pseudo_typeS argument. It should return non-zero if the pseudo-op is a conditional which controls whether code is assembled, such as `.if'. GAS knows about the normal conditional pseudo-ops, and you should normally not have to define this macro.

comment_chars
This is a null terminated const char array of characters which start a comment.

tc_comment_chars
If this macro is defined, GAS will use it instead of comment_chars.

tc_symbol_chars
If this macro is defined, it is a pointer to a null terminated list of characters which may appear in an operand. GAS already assumes that all alphanumeric characters, and `$', `.', and `_' may appear in an operand (see `symbol_chars' in `app.c'). This macro may be defined to treat additional characters as appearing in an operand. This affects the way in which GAS removes whitespace before passing the string to `md_assemble'.

line_comment_chars
This is a null terminated const char array of characters which start a comment when they appear at the start of a line.

line_separator_chars
This is a null terminated const char array of characters which separate lines (null and newline are such characters by default, and need not be listed in this array). Note that line_separator_chars do not separate lines if found in a comment, such as after a character in line_comment_chars or comment_chars.

EXP_CHARS
This is a null terminated const char array of characters which may be used as the exponent character in a floating point number. This is normally "eE".

FLT_CHARS
This is a null terminated const char array of characters which may be used to indicate a floating point constant. A zero followed by one of these characters is assumed to be followed by a floating point number; thus they operate the way that 0x is used to indicate a hexadecimal constant. Usually this includes `r' and `f'.
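
The character tables above might be defined as in the following sketch. The particular characters are arbitrary choices for a hypothetical target, and starts_comment is an illustrative helper, not a GAS function; it shows only the difference between comment_chars and line_comment_chars.

```c
#include <string.h>

const char comment_chars[] = ";";         /* starts a comment anywhere */
const char line_comment_chars[] = "#*";   /* only at the start of a line */
const char line_separator_chars[] = "!";  /* separates statements */
const char EXP_CHARS[] = "eE";
const char FLT_CHARS[] = "rRfF";          /* 0r1.5 and 0f1.5 are floats */

/* Illustration only: does character C, found at column COL (counting from
   zero), begin a comment?  */
static int
starts_comment (char c, int col)
{
  if (c == '\0')
    return 0;                   /* strchr would match the terminator */
  if (strchr (comment_chars, c) != NULL)
    return 1;
  return col == 0 && strchr (line_comment_chars, c) != NULL;
}
```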

LEX_AT
You may define this macro to the lexical type of the @ character. The default is zero.

Lexical types are a combination of LEX_NAME and LEX_BEGIN_NAME, both defined in `read.h'. LEX_NAME indicates that the character may appear in a name. LEX_BEGIN_NAME indicates that the character may appear at the beginning of a name.

LEX_BR
You may define this macro to the lexical type of the brace characters {, }, [, and ]. The default value is zero.

LEX_PCT
You may define this macro to the lexical type of the % character. The default value is zero.

LEX_QM
You may define this macro to the lexical type of the ? character. The default value is zero.

LEX_DOLLAR
You may define this macro to the lexical type of the $ character. The default value is LEX_NAME | LEX_BEGIN_NAME.
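
As an illustration of how the lexical types combine, the sketch below defines a few of the LEX_ macros for a hypothetical target. The numeric values given to LEX_NAME and LEX_BEGIN_NAME are assumptions standing in for the definitions in `read.h', and the two helper functions are not part of GAS.

```c
/* Stand-ins for the flags from `read.h'; the numeric values are assumed.  */
#define LEX_NAME        1   /* character may appear in a name */
#define LEX_BEGIN_NAME  2   /* character may begin a name */

/* Hypothetical choices for one target.  */
#define LEX_AT   LEX_NAME                      /* foo@bar is a single name */
#define LEX_PCT  (LEX_NAME | LEX_BEGIN_NAME)   /* %r1 is a name */
#define LEX_QM   0                             /* same as the default */

/* Illustration only: test the two properties of a lexical type.  */
static int
may_begin_name (int lex_type)
{
  return (lex_type & LEX_BEGIN_NAME) != 0;
}

static int
may_appear_in_name (int lex_type)
{
  return (lex_type & LEX_NAME) != 0;
}
```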

NUMBERS_WITH_SUFFIX
When this macro is defined to be non-zero, the parser allows the radix of a constant to be indicated with a suffix. Valid suffixes are binary (B), octal (Q), and hexadecimal (H). Case is not significant.
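
The suffix convention can be sketched as follows. This is an illustrative parser written for this description, not the code GAS uses; the function name is hypothetical, and overflow is not checked.

```c
#include <ctype.h>
#include <string.h>

/* Parse TEXT as a constant whose last character is a radix suffix:
   B (binary), Q (octal) or H (hexadecimal), in either case.  Return the
   value, or -1 if TEXT is not a valid suffixed constant.  */
static long
parse_suffixed_number (const char *text)
{
  size_t len = strlen (text);
  long value = 0;
  int radix;
  size_t i;

  if (len < 2)
    return -1;                  /* need at least one digit and a suffix */

  switch (toupper ((unsigned char) text[len - 1]))
    {
    case 'B': radix = 2; break;
    case 'Q': radix = 8; break;
    case 'H': radix = 16; break;
    default: return -1;
    }

  for (i = 0; i < len - 1; i++)
    {
      int c = toupper ((unsigned char) text[i]);
      int digit;

      if (c >= '0' && c <= '9')
        digit = c - '0';
      else if (c >= 'A' && c <= 'F')
        digit = c - 'A' + 10;
      else
        return -1;
      if (digit >= radix)
        return -1;              /* digit not valid in this radix */
      value = value * radix + digit;
    }
  return value;
}
```

Because the suffix is taken from the last character, a string such as 1Bh is read as hexadecimal 1B rather than as a binary constant.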

SINGLE_QUOTE_STRINGS
If you define this macro, GAS will treat single quotes as string delimiters. Normally only double quotes are accepted as string delimiters.

NO_STRING_ESCAPES
If you define this macro, GAS will not permit escape sequences in a string.

ONLY_STANDARD_ESCAPES
If you define this macro, GAS will warn about the use of nonstandard escape sequences in a string.

md_start_line_hook
If you define this macro, GAS will call it at the start of each line.

LABELS_WITHOUT_COLONS
If you define this macro, GAS will assume that any text at the start of a line is a label, even if it does not have a colon.

TC_START_LABEL
You may define this macro to control what GAS considers to be a label. The default definition is to accept any name followed by a colon character.

TC_START_LABEL_WITHOUT_COLON
Same as TC_START_LABEL, but should be used instead of TC_START_LABEL when LABELS_WITHOUT_COLONS is defined.

NO_PSEUDO_DOT
If you define this macro, GAS will not require pseudo-ops to start with a . character.

TC_EQUAL_IN_INSN
If you define this macro, it should return nonzero if the instruction is permitted to contain an = character. GAS will call it with two arguments, the character before the = character, and the value of input_line_pointer at that point. GAS uses this macro to decide if a = is an assignment or an instruction.

TC_EOL_IN_INSN
If you define this macro, it should return nonzero if the current input line pointer should be treated as the end of a line.

TC_CASE_SENSITIVE
Define this macro if instruction mnemonics and pseudo-ops are case sensitive. By default it is undefined, which makes the names case insensitive.

md_parse_name
If this macro is defined, GAS will call it for any symbol found in an expression. You can define this to handle special symbols in a special way. If a symbol always has a certain value, you should normally enter it in the symbol table, perhaps using reg_section.

md_undefined_symbol
GAS will call this function when a symbol table lookup fails, before it creates a new symbol. Typically this would be used to supply symbols whose name or value changes dynamically, possibly in a context sensitive way. Predefined symbols with fixed values, such as register names or condition codes, are typically entered directly into the symbol table when md_begin is called. One argument is passed, a char * for the symbol.

md_operand
GAS will call this function with one argument, an expressionS pointer, for any expression that cannot be recognized. When the function is called, input_line_pointer will point to the start of the expression.

tc_unrecognized_line
If you define this macro, GAS will call it when it finds a line that it cannot parse.

md_do_align
You may define this macro to handle an alignment directive. GAS will call it when the directive is seen in the input file. For example, the i386 backend uses this to generate efficient nop instructions of varying lengths, depending upon the number of bytes that the alignment will skip.

HANDLE_ALIGN
You may define this macro to do special handling for an alignment directive. GAS will call it at the end of the assembly.

TC_IMPLICIT_LCOMM_ALIGNMENT (size, p2var)
An .lcomm directive with no explicit alignment parameter will use this macro to set p2var to the alignment that a request for size bytes will have. The alignment is expressed as a power of two. If no alignment should take place, the macro definition should do nothing. Some targets define a .bss directive that is also affected by this macro. The default definition will set p2var to the truncated power of two of sizes up to eight bytes.
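
A definition that behaves the way the default is described above could look like the following sketch: p2var receives the truncated base two logarithm of size, capped at 3 (eight bytes). The wrapper function is only there to make the result easy to inspect and is not part of GAS.

```c
/* Sketch of a definition matching the default described above.  */
#define TC_IMPLICIT_LCOMM_ALIGNMENT(SIZE, P2VAR)  \
  do                                              \
    {                                             \
      if ((SIZE) >= 8)                            \
        (P2VAR) = 3;                              \
      else if ((SIZE) >= 4)                       \
        (P2VAR) = 2;                              \
      else if ((SIZE) >= 2)                       \
        (P2VAR) = 1;                              \
      else                                        \
        (P2VAR) = 0;                              \
    }                                             \
  while (0)

/* Illustration only: return the power of two chosen for SIZE bytes.  */
static int
implicit_lcomm_align (unsigned long size)
{
  int p2 = 0;

  TC_IMPLICIT_LCOMM_ALIGNMENT (size, p2);
  return p2;
}
```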

md_flush_pending_output
If you define this macro, GAS will call it each time it skips any space because of a space filling or alignment or data allocation pseudo-op.

TC_PARSE_CONS_EXPRESSION
You may define this macro to parse an expression used in a data allocation pseudo-op such as .word. You can use this to recognize relocation directives that may appear in such directives.

BITFIELD_CONS_EXPRESSION
If you define this macro, GAS will recognize bitfield instructions in data allocation pseudo-ops, as used on the i960.

REPEAT_CONS_EXPRESSION
If you define this macro, GAS will recognize repeat counts in data allocation pseudo-ops, as used on the MIPS.

md_cons_align
You may define this macro to do any special alignment before a data allocation pseudo-op.

TC_CONS_FIX_NEW
You may define this macro to generate a fixup for a data allocation pseudo-op.

TC_INIT_FIX_DATA (fixp)
A C statement to initialize the target specific fields of fixup fixp. These fields are defined with the TC_FIX_TYPE macro.

TC_FIX_DATA_PRINT (stream, fixp)
A C statement to output target specific debugging information for fixup fixp to stream. This macro is called by print_fixup.

TC_FRAG_INIT (fragp)
A C statement to initialize the target specific fields of frag fragp. These fields are defined with the TC_FRAG_TYPE macro.

md_number_to_chars
This should just call either number_to_chars_bigendian or number_to_chars_littleendian, whichever is appropriate. On targets like the MIPS which support options to change the endianness, which function to call is a runtime decision. On other targets, md_number_to_chars can be a simple macro.
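
The runtime decision can be sketched as below. The two helpers are stand-ins written for this example so that it compiles alone, valueT is approximated by unsigned long, and the flag variable is assumed to be set from a command line option.

```c
/* Stand-in for the GAS valueT type.  */
typedef unsigned long valueT;

/* Stand-ins for the two generic helpers named above.  Each stores the low
   N bytes of VAL into BUF.  */
static void
number_to_chars_bigendian (char *buf, valueT val, int n)
{
  while (n-- > 0)
    {
      buf[n] = (char) (val & 0xff);   /* least significant byte goes last */
      val >>= 8;
    }
}

static void
number_to_chars_littleendian (char *buf, valueT val, int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      buf[i] = (char) (val & 0xff);   /* least significant byte goes first */
      val >>= 8;
    }
}

/* Assumed to be set by an endianness option; big endian by default in
   this sketch.  */
static int target_big_endian = 1;

void
md_number_to_chars (char *buf, valueT val, int n)
{
  if (target_big_endian)
    number_to_chars_bigendian (buf, val, n);
  else
    number_to_chars_littleendian (buf, val, n);
}
```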

md_atof (type,litP,sizeP)
This function is called to convert an ASCII string into a floating point value in the format used by the CPU. It takes three arguments. The first is type, which is a byte describing the type of floating point number to be created. Possible values are 'f' or 's' for single precision, 'd' or 'r' for double precision, and 'x' or 'p' for extended precision. Either lower or upper case versions of these letters can be used.

The second parameter is litP, which is a pointer to a byte array where the converted value should be stored. The third argument is sizeP, which is a pointer to an integer that should be filled in with the number of LITTLENUMs emitted into the byte array (LITTLENUM is defined in gas/bignum.h). The function should return NULL upon success or an error string upon failure.

TC_LARGEST_EXPONENT_IS_NORMAL
This macro is used only by `atof-ieee.c'. It takes one argument, precision, and should evaluate to true if floats of the given precision use the largest exponent for normal numbers instead of NaNs and infinities. precision is `F_PRECISION' for single precision, `D_PRECISION' for double precision, or `X_PRECISION' for extended double precision.

The macro has a default definition which returns 0 for all cases.

md_reloc_size
This variable is only used in the original version of gas (not BFD_ASSEMBLER and not MANY_SEGMENTS). It holds the size of a relocation entry.

WORKING_DOT_WORD
md_short_jump_size
md_long_jump_size
md_create_short_jump
md_create_long_jump
TC_CHECK_ADJUSTED_BROKEN_DOT_WORD
If WORKING_DOT_WORD is defined, GAS will not do broken word processing (see section 6. Broken words). Otherwise, you should set md_short_jump_size to the size of a short jump (a jump that is just long enough to jump around a number of long jumps) and md_long_jump_size to the size of a long jump (a jump that can go anywhere in the function). You should define md_create_short_jump to create a short jump around a number of long jumps, and define md_create_long_jump to create a long jump. If defined, the macro TC_CHECK_ADJUSTED_BROKEN_DOT_WORD will be called for each adjusted word just before the word is output. The macro takes two arguments, an addressT with the adjusted word and a pointer to the current struct broken_word.

md_estimate_size_before_relax
This function returns an estimate of the size of a rs_machine_dependent frag before any relaxing is done. It may also create any necessary relocations.

md_relax_frag
This macro may be defined to relax a frag. GAS will call this with the segment, the frag, and the change in size of all previous frags; md_relax_frag should return the change in size of the frag. See section 5. Relaxation.

TC_GENERIC_RELAX_TABLE
If you do not define md_relax_frag, you may define TC_GENERIC_RELAX_TABLE as a table of relax_typeS structures. The machine independent code knows how to use such a table to relax PC relative references. See `tc-m68k.c' for an example. See section 5. Relaxation.
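
A relax table for a hypothetical branch with byte, word, and long displacement forms might be sketched as follows. The relax_typeS declaration is a stand-in for the real type, the reach values are invented, and relax_state_for only illustrates how such a table can be walked; the real machine independent relaxation code does considerably more.

```c
/* Stand-in for the relax_typeS type declared by GAS.  */
typedef struct
{
  long rlx_forward;          /* farthest forward reach, > 0 */
  long rlx_backward;         /* farthest backward reach, < 0 */
  unsigned char rlx_length;  /* bytes of displacement in this state */
  unsigned int rlx_more;     /* next larger state, or 0 if none */
} relax_typeS;

/* State 0 is unused so that 0 can mean "no larger state".  */
const relax_typeS md_relax_table[] = {
  { 1, 1, 0, 0 },                /* 0: unused */
  { 127, -128, 1, 2 },           /* 1: byte displacement */
  { 32767, -32768, 2, 3 },       /* 2: word displacement */
  { 0, 0, 4, 0 }                 /* 3: long displacement, always fits */
};

#define TC_GENERIC_RELAX_TABLE md_relax_table

/* Illustration only: starting from STATE, return the first state whose
   reach covers DISTANCE, or the largest state if none does.  */
static unsigned int
relax_state_for (unsigned int state, long distance)
{
  const relax_typeS *table = TC_GENERIC_RELAX_TABLE;

  while (table[state].rlx_more != 0
         && (distance > table[state].rlx_forward
             || distance < table[state].rlx_backward))
    state = table[state].rlx_more;
  return state;
}
```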

md_prepare_relax_scan
If defined, it is a C statement that is invoked prior to scanning the relax table.

LINKER_RELAXING_SHRINKS_ONLY
If you define this macro, and the global variable `linkrelax' is set (because of a command line option, or unconditionally in md_begin), a `.align' directive will cause extra space to be allocated. The linker can then discard this space when relaxing the section.

TC_LINKRELAX_FIXUP (segT)
If defined, this macro allows control over whether fixups for a given section will be processed when the linkrelax variable is set. The macro is given the N_TYPE bits for the section in its segT argument. If the macro evaluates to a non-zero value then the fixups will be converted into relocs, otherwise they will be passed to md_apply_fix3 as normal.

md_convert_frag
GAS will call this for each rs_machine_dependent fragment. The instruction is completed using the data from the relaxation pass. It may also create any necessary relocations. See section 5. Relaxation.

TC_FINALIZE_SYMS_BEFORE_SIZE_SEG
Specifies the value to be assigned to finalize_syms before the function size_segs is called. Since size_segs calls cvt_frag_to_fill which can call md_convert_frag, this constant governs whether the symbols accessed in md_convert_frag will be fully resolved. In particular it governs whether local symbols will have been resolved, and had their frag information removed. Depending upon the processing performed by md_convert_frag the frag information may or may not be necessary, as may the resolved values of the symbols. The default value is 1.

TC_VALIDATE_FIX (fixP, seg, skip)
This macro is evaluated for each fixup (when linkrelax is not set). It may be used to change the fixup in struct fix *fixP before the generic code sees it, or to fully process the fixup. In the latter case, a goto skip will bypass the generic code.

md_apply_fix3 (fixP, valP, seg)
GAS will call this for each fixup that passes the TC_VALIDATE_FIX test when linkrelax is not set. It should store the correct value in the object file. struct fix *fixP is the fixup md_apply_fix3 is operating on. valueT *valP is the value to store into the object files, or at least is the generic code's best guess. Specifically, *valP is the value of the fixup symbol, perhaps modified by MD_APPLY_SYM_VALUE, plus fixP->fx_offset (symbol addend), less MD_PCREL_FROM_SECTION for pc-relative fixups. segT seg is the section the fix is in. fixup_segment performs a generic overflow check on *valP after md_apply_fix3 returns. If the overflow check is relevant for the target machine, then md_apply_fix3 should modify *valP, typically to the value stored in the object file.
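
The description of *valP can be restated as a small computation. The sketch below uses a stand-in structure holding just the inputs named above; it is not the real struct fix, and the function is an illustration of the arithmetic rather than code from fixup_segment.

```c
/* Stand-in for the GAS valueT type.  */
typedef unsigned long valueT;

/* The inputs that feed the computation; not the real struct fix.  */
struct fix_value_sketch
{
  valueT sym_value;     /* value of the fixup symbol */
  valueT fx_offset;     /* symbol addend */
  int fx_pcrel;         /* non-zero for a pc-relative fixup */
  valueT pcrel_from;    /* what MD_PCREL_FROM_SECTION would return */
};

/* Hypothetical target setting: include the symbol value.  */
#define MD_APPLY_SYM_VALUE(FIX) 1

/* Compute the value the generic code would pass in *valP.  */
static valueT
generic_fixup_value (const struct fix_value_sketch *fix)
{
  valueT val = fix->fx_offset;

  if (MD_APPLY_SYM_VALUE (fix))
    val += fix->sym_value;      /* symbol value, if the target wants it */
  if (fix->fx_pcrel)
    val -= fix->pcrel_from;     /* make the value pc-relative */
  return val;
}
```

For example, a pc-relative fixup against a symbol at 0x1000 with addend 8, measured from 0xff4, yields 0x1000 + 8 - 0xff4 = 0x14.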

TC_FORCE_RELOCATION (fix)
If this macro returns non-zero, it guarantees that a relocation will be emitted even when the value can be resolved locally, as fixup_segment tries to reduce the number of relocations emitted. For example, a fixup expression against an absolute symbol will normally not require a reloc. If undefined, a default of (S_FORCE_RELOC ((fix)->fx_addsy)) is used.

TC_FORCE_RELOCATION_ABS (fix)
Like TC_FORCE_RELOCATION, but used only for fixup expressions against an absolute symbol. If undefined, TC_FORCE_RELOCATION will be used.

TC_FORCE_RELOCATION_LOCAL (fix)
Like TC_FORCE_RELOCATION, but used only for fixup expressions against a symbol in the current section. If undefined, fixups that are not fx_pcrel or fx_plt, or for which TC_FORCE_RELOCATION returns non-zero, will emit relocs.

TC_FORCE_RELOCATION_SUB_SAME (fix, seg)
This macro controls resolution of fixup expressions involving the difference of two symbols in the same section. If this macro returns zero, the subtrahend will be resolved and fx_subsy set to NULL for md_apply_fix3. If undefined, the default of ! SEG_NORMAL (seg) || TC_FORCE_RELOCATION (fix) will be used.

TC_FORCE_RELOCATION_SUB_ABS (fix)
Like TC_FORCE_RELOCATION_SUB_SAME, but used when the subtrahend is an absolute symbol. If the macro is undefined a default of 0 is used.

TC_FORCE_RELOCATION_SUB_LOCAL (fix)
Like TC_FORCE_RELOCATION_SUB_ABS, but the subtrahend is a symbol in the same section as the fixup.

TC_VALIDATE_FIX_SUB (fix)
This macro is evaluated for any fixup with a fx_subsy that fixup_segment cannot reduce to a number. If the macro returns false an error will be reported.

MD_APPLY_SYM_VALUE (fix)
This macro controls whether the symbol value becomes part of the value passed to md_apply_fix3. If the macro is undefined, or returns non-zero, the symbol value will be included. For ELF, a suitable definition might simply be 0, because ELF relocations don't include the symbol value in the addend.

S_FORCE_RELOC (sym, strict)
This macro (or function, for BFD_ASSEMBLER gas) returns true for symbols that should not be reduced to section symbols or eliminated from expressions, because they may be overridden by the linker; i.e., for symbols that are undefined or common, and, when strict is set, for symbols that are weak or global (for ELF assemblers that support ELF shared library linking semantics).

EXTERN_FORCE_RELOC
This macro controls whether S_FORCE_RELOC returns true for global symbols. If undefined, the default is true for ELF assemblers, and false for non-ELF.

tc_gen_reloc
A BFD_ASSEMBLER GAS will call this to generate a reloc. GAS will pass the resulting reloc to bfd_install_relocation. This currently works poorly, as bfd_install_relocation often does the wrong thing, and instances of tc_gen_reloc have been written to work around the problems, which in turn makes it difficult to fix bfd_install_relocation.

RELOC_EXPANSION_POSSIBLE
If you define this macro, it means that tc_gen_reloc may return multiple relocation entries for a single fixup. In this case, the return value of tc_gen_reloc is a pointer to a null terminated array.

MAX_RELOC_EXPANSION
You must define this if RELOC_EXPANSION_POSSIBLE is defined; it indicates the largest number of relocs which tc_gen_reloc may return for a single fixup.

tc_fix_adjustable
You may define this macro to indicate whether a fixup against a locally defined symbol should be adjusted to be against the section symbol. It should return a non-zero value if the adjustment is acceptable.

MD_PCREL_FROM_SECTION (fixp, section)
If you define this macro, it should return the position from which the PC relative adjustment for a PC relative fixup should be made. On many processors, the base of a PC relative instruction is the next instruction, so this macro would return the length of an instruction, plus the address of the PC relative fixup. The latter can be calculated as fixp->fx_where + fixp->fx_frag->fr_address.
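
The calculation just described might be sketched as follows for a hypothetical CPU with fixed four byte instructions, assuming the fixup sits at the start of its instruction. The structures are stand-ins that carry only the two fields used, the section argument is reduced to an int, and foo_pcrel_from_section is an invented name.

```c
/* Stand-in for the GAS valueT type.  */
typedef unsigned long valueT;

/* Stand-ins carrying only the fields used in the calculation.  */
struct frag_sketch
{
  valueT fr_address;            /* address of the frag */
};

struct pcrel_fix_sketch
{
  valueT fx_where;              /* offset of the fixup within the frag */
  struct frag_sketch *fx_frag;  /* frag containing the fixup */
};

#define FOO_INSN_LENGTH 4       /* hypothetical fixed instruction length */

/* Address of the fixup plus the instruction length: the address of the
   next instruction, if the fixup starts the instruction.  */
static valueT
foo_pcrel_from_section (const struct pcrel_fix_sketch *fixp, int section)
{
  (void) section;               /* not needed on this hypothetical target */
  return fixp->fx_where + fixp->fx_frag->fr_address + FOO_INSN_LENGTH;
}

#define MD_PCREL_FROM_SECTION(FIXP, SECTION) \
  foo_pcrel_from_section (FIXP, SECTION)
```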

md_pcrel_from
This is the default value of MD_PCREL_FROM_SECTION. The difference is that md_pcrel_from does not take a section argument.

tc_frob_label
If you define this macro, GAS will call it each time a label is defined.

md_section_align
GAS will call this function for each section at the end of the assembly, to permit the CPU backend to adjust the alignment of a section. The function must take two arguments, a segT for the section and a valueT for the size of the section, and return a valueT for the rounded size.
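
A common implementation rounds the size up to the alignment of the section, as in this sketch. The segT type here is a stand-in that records the alignment as a power of two; a real backend would obtain the alignment from BFD instead.

```c
/* Stand-ins for the GAS types.  */
typedef unsigned long valueT;

struct seg_sketch
{
  int align_power;              /* section alignment as a power of two */
};
typedef struct seg_sketch *segT;

/* Round SIZE up to the next multiple of the alignment of SEG.  */
valueT
md_section_align (segT seg, valueT size)
{
  valueT mask = ((valueT) 1 << seg->align_power) - 1;

  return (size + mask) & ~mask;
}
```

A backend that needs no adjustment can simply return size unchanged.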

md_macro_start
If defined, GAS will call this macro when it starts to include a macro expansion. macro_nest indicates the current macro nesting level, which includes the one being expanded.

md_macro_info
If defined, GAS will call this macro after the macro expansion has been included in the input and after parsing the macro arguments. The single argument is a pointer to the macro processing's internal representation of the macro (macro_entry *), which includes expansion of the formal arguments.

md_macro_end
Complement to md_macro_start. If defined, it is called when finished processing an inserted macro expansion, just before decrementing macro_nest.

DOUBLEBAR_PARALLEL
Affects the preprocessor so that lines containing '||' don't have their whitespace stripped following the double bar. This is useful for targets that implement parallel instructions.

KEEP_WHITE_AROUND_COLON
Normally, whitespace is compressed and removed when, in the presence of the colon, the adjoining tokens can be distinguished. This option affects the preprocessor so that whitespace around colons is preserved. This is useful when colons might be removed from the input after preprocessing but before assembling, so that adjoining tokens can still be distinguished if there is whitespace, or concatenated if there is not.

tc_frob_section
If you define this macro, a BFD_ASSEMBLER GAS will call it for each section at the end of the assembly.

tc_frob_file_before_adjust
If you define this macro, GAS will call it after the symbol values are resolved, but before the fixups have been changed from local symbols to section symbols.

tc_frob_symbol
If you define this macro, GAS will call it for each symbol. You can indicate that the symbol should not be included in the object file by defining this macro to set its second argument to a non-zero value.

tc_frob_file
If you define this macro, GAS will call it after the symbol table has been completed, but before the relocations have been generated.

tc_frob_file_after_relocs
If you define this macro, GAS will call it after the relocs have been generated.

md_post_relax_hook
If you define this macro, GAS will call it after relaxing and sizing the segments.

LISTING_HEADER
A string to use on the header line of a listing. The default value is simply "GAS LISTING".

LISTING_WORD_SIZE
The number of bytes to put into a word in a listing. This affects the way the bytes are clumped together in the listing. For example, a value of 2 might print `1234 5678' where a value of 1 would print `12 34 56 78'. The default value is 4.
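
The effect of the word size on clumping can be reproduced with a small formatter. This is an illustrative sketch written for this description; the function name is hypothetical and the code is not taken from the listing implementation.

```c
#include <stdio.h>

/* Format N bytes from DATA as hexadecimal into OUT (OUT_SIZE bytes long),
   putting a space between groups of WORD_SIZE bytes.  Returns OUT.  */
static char *
format_listing_bytes (const unsigned char *data, size_t n, size_t word_size,
                      char *out, size_t out_size)
{
  size_t i, pos = 0;

  if (out_size == 0)
    return out;
  if (word_size == 0)
    word_size = 1;              /* avoid dividing by zero */
  out[0] = '\0';

  for (i = 0; i < n; i++)
    {
      const char *sep = (i != 0 && i % word_size == 0) ? " " : "";
      int written = snprintf (out + pos, out_size - pos, "%s%02x",
                              sep, (unsigned int) data[i]);

      if (written < 0 || (size_t) written >= out_size - pos)
        {
          out[pos] = '\0';      /* drop the partial byte and stop */
          break;
        }
      pos += (size_t) written;
    }
  return out;
}
```

With the bytes 0x12 0x34 0x56 0x78, a word size of 2 gives `1234 5678' and a word size of 1 gives `12 34 56 78', matching the example above.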

LISTING_LHS_WIDTH
The number of words of data to print on the first line of a listing for a particular source line, where each word is LISTING_WORD_SIZE bytes. The default value is 1.

LISTING_LHS_WIDTH_SECOND
Like LISTING_LHS_WIDTH, but applying to the second and subsequent lines of the data printed for a particular source line. The default value is 1.

LISTING_LHS_CONT_LINES
The maximum number of continuation lines to print in a listing for a particular source line. The default value is 4.

LISTING_RHS_WIDTH
The maximum number of characters to print from one line of the input file. The default value is 100.

TC_COFF_SECTION_DEFAULT_ATTRIBUTES
The COFF .section directive will use the value of this macro to set a new section's attributes when a directive has no valid flags or when the flag is w. The default value of the macro is SEC_LOAD | SEC_DATA.

DWARF2_FORMAT ()
If you define this, it should return one of dwarf2_format_32bit, dwarf2_format_64bit, or dwarf2_format_64bit_irix to indicate the size of internal DWARF section offsets and the format of the DWARF initial length fields. When dwarf2_format_32bit is returned, the initial length field will be 4 bytes long and section offsets are 32 bits in size. For dwarf2_format_64bit and dwarf2_format_64bit_irix, section offsets are 64 bits in size, but the initial length field differs: dwarf2_format_64bit_irix indicates an 8 byte initial length field, while dwarf2_format_64bit indicates a 12 byte initial length field in which the first four bytes are 0xffffffff and the next 8 bytes are the section's length.

If you don't define this, dwarf2_format_32bit will be used as the default.

This define only affects .debug_info and .debug_line sections generated by the assembler. DWARF 2 sections generated by other tools will be unaffected by this setting.
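
The three initial length layouts can be sketched as follows. The enum is a stand-in for the real declaration, the helper names are invented, and the byte order is assumed to be little endian purely to keep the example short.

```c
#include <stddef.h>

/* Stand-in for the enum that names the three formats.  */
enum dwarf2_format
{
  dwarf2_format_32bit,
  dwarf2_format_64bit,
  dwarf2_format_64bit_irix
};

/* Store VAL in N bytes at P, least significant byte first (this sketch
   assumes a little endian target).  */
static void
put_le (unsigned char *p, unsigned long long val, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      p[i] = (unsigned char) (val & 0xff);
      val >>= 8;
    }
}

/* Write the initial length field for LENGTH into BUF, which must have
   room for 12 bytes.  Return the number of bytes written.  */
static size_t
emit_initial_length (enum dwarf2_format fmt, unsigned long long length,
                     unsigned char *buf)
{
  switch (fmt)
    {
    case dwarf2_format_64bit:
      put_le (buf, 0xffffffffULL, 4);   /* escape value, then the length */
      put_le (buf + 4, length, 8);
      return 12;
    case dwarf2_format_64bit_irix:
      put_le (buf, length, 8);
      return 8;
    case dwarf2_format_32bit:
    default:
      put_le (buf, length, 4);
      return 4;
    }
}

/* Size in bytes of a section offset for each format.  */
static size_t
dwarf2_offset_size (enum dwarf2_format fmt)
{
  return fmt == dwarf2_format_32bit ? 4 : 8;
}
```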

DWARF2_ADDR_SIZE (bfd)
It should return the size of an address, as it should be represented in debugging info. If you don't define this macro, the default definition uses the number of bits per address, as defined in bfd, divided by 8.



This document was generated by system on December, 2 2004 using texi2html