SB-Projects - SB-Assembler

DIY Crosses

So you would like to build your own cross overlay. That's great news. Let me explain how it's done by walking you through the steps I took to write the SC/MP cross overlay. This should cover most of the skills needed to roll your own cross overlay.
You can also study the source listings of the cross overlays I have written if you want to learn some more about how it all works.

And please consider sharing your own Cross Overlay with me. I can include it into the SB-Assembler package so that everyone can enjoy your work. I will of course give you full credit if you do share it.

If you're not interested in creating your own Cross Overlay you can safely skip this page. However, reading the page anyway will give you some understanding of how the SB-Assembler works.

Oh, and by the way, creating your own Cross Overlays is only possible for Version 3 of the SB-Assembler. Version 2 is a bit too complicated to explain and it would require an ancient 8086 assembler to assemble the program.

Tools

The SB-Assembler V3 is written in Python, which is an interpreted language. So if you can run the assembler you already have most of the tools required to write your own Cross Overlays for the SB-Assembler.

It will come as no surprise that you'll need a text editor too. But I don't think I'll have to explain that to you, because you need one to write your assembly source codes too.
There's one thing I should mention though. I prefer to use spaces for indentation, instead of tabs. The entire SB-Assembler is written with space indentation. And because indentation is not just for decoration in Python, it is important that you use spaces too. You could use tabs, but you may run into situations where you might have to adjust some indentation to make it work properly again.
So the easiest way to be compatible is to set your text editor to use leading spaces for indentation.

Finally a few small remarks about my coding style:
Global variable names are in CamelCase_With_Underscore.
My function names are in CamelCaseWithoutUnderscores.
Local variables are in all lower case.

If you stick to this convention it'll be quite easy to distinguish function names, local variables and global variables from each other.

There's one more little tool inside the SB-Assembler. Normally the assembler will catch programming errors in the Cross Overlay. This means that you won't get detailed error information while you're still working on your Cross Overlay.
You can disable this error catching, so you will again see the detailed information Python provides to you whenever it encounters an error. This is simply done by including the Cross Overlay using the .CRD directive. The extra D behind the CR directive obviously enables the debugging mode.

Template

To make it easier on you (and myself) I have provided a Cross Overlay template, called cr0000.py. You can start your own Cross Overlay by making a copy of this template. The template itself contains all the mandatory entry point functions (the function Help and all the functions beginning with Cross).
You'll have to make some changes to most of these functions. I'll describe these functions below.

The template contains ample comments, explaining what is expected from you. Once you have read these comments (and acted accordingly) you may delete these comments.

Naming Your Cross Overlay

Every cross overlay file name should start with the characters cr, which stand for cross. These two letters are followed by the actual name of the Cross Overlay. Finally every file name should end with the .py extension. In our example for the SC/MP processor I'm going to call the overlay file crscmp.py. I'll simply forget about the slash in the processor name because that will obviously cause problems in naming a file.
Later you dynamically link this overlay by using the .CR directive, followed by the cross name, which is scmp in our case.

Don't make the file names too long, because that might upset our good old plain DOS users, who are still restricted to the 8.3 naming convention. I also recommend using lower case characters only in the file name.

Default Changes

Before we can begin to write our own Cross Overlay there are some default changes to be made to the template. Let's start with the header, it should reflect the current project of course. You may add some more information in there if you like, for instance the author's name and date of creation.

#------------------------------------------------------------------------------
#
#   crscmp.py
#
#   Package module file for the SB-Assembler sbasm
#   See www.sbprojects.net for details
#
#   Author: San Bergmans
#   Date  : 2015-12-21
#
#   Cross Overlay for the SC/MP micro processor
#
#------------------------------------------------------------------------------

Next come the version numbers. There are two. The variable crossversion specifies the current version number of the Cross Overlay. I recommend beginning with version 3.00.00 for the initial release of the Cross Overlay, and incrementing the last part of the version number for bug fixes after you have officially released it. The number in the middle can be incremented when you add new features to the Cross Overlay.

crossversion = '3.00.00'
minversion = '3.00.00'

The variable minversion specifies the minimum version number of the SB-Assembler required to run this cross overlay. Normally this would be the main version number, version 3.00.00, unless you need a specific feature from the main SB-Assembler which was added or modified after the initial release. Usually you keep the entire sbapack together, so you may safely enter your current version number here.
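
How the assembler itself compares these two strings is not shown on this page. As a rough sketch of the idea, assuming the version strings always consist of three numeric parts separated by dots, a comparison could look like this. The names version_tuple and overlay_is_supported are made up for this illustration and are not part of the SB-Assembler.

```python
def version_tuple(version):
    """Turn a string like '3.00.00' into a comparable tuple (3, 0, 0)."""
    return tuple(int(part) for part in version.split('.'))


def overlay_is_supported(assembler_version, minversion):
    """Return True if the assembler is at least as new as minversion."""
    return version_tuple(assembler_version) >= version_tuple(minversion)


# A newer assembler can run an overlay which asks for 3.00.00
print(overlay_is_supported('3.01.00', '3.00.00'))
# An overlay asking for 3.00.02 can't run on assembler 3.00.00
print(overlay_is_supported('3.00.00', '3.00.02'))
```

Comparing tuples of integers avoids the pitfalls of comparing the strings directly, for instance when one of the parts grows beyond two digits.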

Entry Point Functions

Each cross overlay has 5 entry point functions. These functions must be present in all cross overlays, even if they don't have to do anything. The SB-Assembler will call these functions to perform cross overlay specific tasks.

def Help()

The SB-Assembler is prepared for a built in help system. But the preparation is as far as I've got. This function will allow you to add cross specific help pages.
However, since the help system hasn't been implemented yet, it's no use to change this empty function yet.

You'll find the main help function in the file help.py. And as promised, it is still empty.

def CrossInit()

This function is called every time the .CR directive is executed and thus the Cross Overlay is loaded into memory. I think it's quite obvious what the purpose of this function is. It is to initialize the freshly loaded Cross Overlay.
Here's what needs to be initialized:

dec.Asm.Instructions is a Global dictionary containing a list of all the possible processor Mnemonics. This dictionary is used by the line parser to find out how to process the given Mnemonics.
Every item of this dictionary is a key name, followed by a tuple which contains the details about this instruction. The tuple should at least hold the name of the function which knows how to parse the given instruction as a first element. For the rest it may contain the opcode, or a tuple of opcodes for this instruction, and it can hold the instruction time or a tuple of instruction times for this instruction.
In any case this information represents the opcode and execution times of the instructions. Some instructions translate directly into an opcode, while others may have to be combined with one or more parameters to get the complete opcode. Still other instructions can yield several different opcodes, depending on the addressing modes supported by the instruction.
All this information should be present in the tuple of this dictionary and will be decoded by the named function later on, when it is called from the CrossMnemonic function.

The code below shows you some examples, taken from the AVR Cross Overlay. Usually I tend to group the Mnemonics according to the supported operand types or addressing modes. This is not a strict rule though, you might prefer to order all instructions in alphabetical or even in random order if you like.

# Instructions requiring two registers
    'ADC'   : (RegReg,4+2+1,int('1C00',16),'1'),
    'ADD'   : (RegReg,4+2+1,int('0C00',16),'1'),

# Instructions requiring an instruction and immediate data
    'ADIW'  : (RegImm,4+0+0,int('9600',16),'2'),
    'ANDI'  : (RegImm,4+2+1,int('7000',16),'1'),

# Instructions requiring one register
    'ASR'   : (RegOnly,4+2+1,int('9405',16),'1'),
    'CLR'   : (RegOnly,4+2+1,int('2400',16),'1'),

# Instructions using relative jump address
    'BRBC'  : (RelJmp,4+2+1,int('F400',16),'1/2'),
    'BRBS'  : (RelJmp,4+2+1,int('F000',16),'1/2'),

# Call and Jump instructions
    'CALL'  : (CallJmp,4+0+0,int('940E',16),'4/5'),
    'JMP'   : (CallJmp,4+0+0,int('940C',16),'3'),

# Bit manipulating instructions
    'BLD'   : (BitInst,4+2+1,int('F800',16),'1'),
    'BST'   : (BitInst,4+2+1,int('FA00',16),'1'),

Anyway I think you'll get the idea if you study the CrossInit() function of some of the available Cross Overlays.

dec.Asm.Timing_Length is a variable which holds the maximum string length of the timing field in the instruction dictionary. Simply find the longest string in your dictionary table, count the number of characters and assign this number to this variable.
Your assembly listings may look a bit messy if you set a wrong value here.
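
Instead of counting by hand you could let Python find the longest timing string for you. The sketch below uses a plain dictionary as a stand-in for dec.Asm.Instructions, with the handler functions replaced by strings, and it assumes that the timing string is the last element of every tuple, as it is in the AVR examples above. The helper name longest_timing is hypothetical.

```python
# Stand-in for dec.Asm.Instructions; handler names are plain strings here
instructions = {
    'ADC':  ('RegReg', 4+2+1, int('1C00', 16), '1'),
    'BRBC': ('RelJmp', 4+2+1, int('F400', 16), '1/2'),
    'CALL': ('CallJmp', 4+0+0, int('940E', 16), '4/5'),
    'JMP':  ('CallJmp', 4+0+0, int('940C', 16), '3'),
}


def longest_timing(table):
    """Return the length of the longest timing string in the table."""
    return max(len(entry[-1]) for entry in table.values())


# In a real overlay this value would go into dec.Asm.Timing_Length
timing_length = longest_timing(instructions)
print(timing_length)
```

If your overlay stores a tuple of timing strings per instruction, the helper would have to look inside that tuple as well.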

dec.Asm.Memory is normally set to 0, it selects output to Program memory which is what you normally want.

dec.Asm.Max_Address holds the maximum memory size of the processor. Our SC/MP can address up to 64kB, which is the same as a 16 bit address range.

dec.Asm.PP_TA_Factor holds the number of bytes per opcode. Older 8 bit processors usually store one byte per opcode, just like our SC/MP does. Modern micro controllers, like the PIC and AVR store multiple bytes per opcode.
This variable actually determines the PC to Target Address ratio.

dec.Flags.BigEndian = False holds the endian mode of the processor. We're a bit in the dark here because the SC/MP doesn't really specify how 16 bit words are stored. Therefore I opted for the most commonly used Little Endian mode, which means that low order bytes are stored before high order bytes in 16 bit words.
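
To make the difference between the two endian modes concrete, here is a small stand-alone sketch. It doesn't use any SB-Assembler code; word_bytes is a made-up helper which only shows in which order the two bytes of a 16 bit word end up in memory.

```python
def word_bytes(value, big_endian=False):
    """Split a 16 bit word into two bytes, in the order they are stored."""
    order = 'big' if big_endian else 'little'
    return list((value & 0xFFFF).to_bytes(2, order))


# Little endian: the low order byte 0x34 comes first
print([hex(b) for b in word_bytes(0x1234)])
# Big endian: the high order byte 0x12 comes first
print([hex(b) for b in word_bytes(0x1234, big_endian=True)])
```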

errors.Error_List is an array which can hold error messages. You can add extra error messages, specifically for your Cross Overlay. Below you see an example of three extra error messages added for the SC/MP processor.

errors.Error_List[dec.Cross.Name + 'pagex'] = 'Instruction crossed a page boundary'
errors.Error_List[dec.Cross.Name + 'pagebeg'] = 'Instruction starts at page boundary'
errors.Error_List[dec.Cross.Name + 'offset'] = 'Offset is -128, E register conflict'

You can name the error anything you want, but I would recommend that you prefix the name with the Cross Overlay name to avoid conflicts with other error messages.

You may want to set some other variables which hold values which have to be used per default for your Cross Overlay. Remember that the SB-Assembler is a two pass assembler. So you must make sure it enters the second pass in exactly the same state as the first pass.

def CrossDirective()

Every time the SB-Assembler finds a dot as the first character in the opcode field it knows it has to decode a directive. The two characters following the dot form the name of the directive. The SB-Assembler knows a whole bunch of directives already. But sometimes a Cross Overlay may want to add some more of its own. At other times the behaviour of a directive has to change from the default behaviour.

Therefore the assembler will transfer control to this function first.
If the Cross Overlay isn't interested in changing the behaviour of the given directive the function simply returns False, indicating that the directive has not been parsed yet.
But if the Cross Overlay does want to handle this directive it should do so, whether it's a new directive or an existing one. At the end the function should return True, indicating to the SB-Assembler that the directive has already been parsed.

Here's a piece of code taken from the 8048 Cross Overlay, which demonstrates how to add the extra directives .OT and .CT.

    global Asm

    if len(dec.Asm.Mnemonic) > 1:
        directive = dec.Asm.Mnemonic[1:3].upper()
    else:
        directive = dec.Asm.Mnemonic

    if directive == 'CT':
        DirCT()
        return True

    if directive == 'OT':
        DirOT()
        return True

    return False
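
The True/False protocol described above can be pictured with the following stand-alone sketch. It is not the assembler's real code. The three functions are hypothetical stand-ins which only show the order of events: the overlay gets the first chance, and the built in directive handling is used only when the overlay returns False.

```python
def builtin_directive(name):
    """Stand-in for the assembler's own directive handling."""
    return 'built-in .' + name


def cross_directive(name):
    """Hypothetical overlay hook: it only claims .CT and .OT."""
    if name in ('CT', 'OT'):
        return True     # Directive has been parsed by the overlay
    return False        # Not ours, let the assembler handle it


def handle_directive(name):
    """Ask the overlay first, fall back to the built in directives."""
    if cross_directive(name):
        return 'overlay .' + name
    return builtin_directive(name)


print(handle_directive('CT'))   # Handled by the overlay
print(handle_directive('DA'))   # Falls through to the assembler
```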

def CrossCleanUp()

This one is easy. This function is called just before ending the assembler, or when you switch to a different Cross Overlay. The purpose of this function is to give the Cross Overlay a last chance of cleaning up.

An example is given in the cravr.py Cross Overlay. An AVR stores 2 bytes per instruction. However it can store single bytes of data in Flash memory. Therefore you may have stored just half a word of data at the end of the program. That's where the clean up function comes in, it will store a second padding byte to make the word complete so it can be written to Flash memory. Otherwise the last byte of data would have been lost.

Another example is the 8048 assembler. If you open a table, just before ending the program, you might forget to close the table. Thus you would never get a warning when the table crosses a page boundary. I have added one line in this function, which takes care of closing the open table for you.

Normally this function may remain empty though.

def CrossMnemonic()

This is the work horse of the Cross Overlay. There's no need to change things here though.

Every non empty assembly line which doesn't contain a directive should contain a mnemonic. The SB-Assembler calls this function to parse this mnemonic. The function tries to find the mnemonic in the dec.Asm.Instructions dictionary which is defined in the CrossInit function.
If it finds the mnemonic control is transferred to the function which belongs to this particular mnemonic. That function will then parse the remainder of the assembly line to find all the expected operands.
If the mnemonic wasn't found in the dictionary an error is raised informing the programmer that he/she has used an unknown mnemonic.
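
The lookup and hand over can be sketched as follows. This is a simplified stand-alone model, not the code from the template. The dictionary stands in for dec.Asm.Instructions, the opcode and timing values are placeholders which haven't been checked against a data sheet, and a plain ValueError is used where the real overlay would report the error through the errors module.

```python
def implied(entry):
    """Handler for instructions without operands: the opcode is complete."""
    return [entry[1]]


# Stand-in for dec.Asm.Instructions: (handler, opcode, timing)
# The opcode and timing values are placeholders for this illustration
instructions = {
    'NOP':  (implied, 0x08, '5'),
    'HALT': (implied, 0x00, '8'),
}


def cross_mnemonic(mnemonic):
    """Find the mnemonic and transfer control to its handler function."""
    entry = instructions.get(mnemonic.upper())
    if entry is None:
        raise ValueError('unknown mnemonic: ' + mnemonic)
    handler = entry[0]
    return handler(entry)


print(cross_mnemonic('nop'))
```

Because the handler function is the first element of each tuple, adding a new group of instructions only takes a new handler and some new dictionary entries. The dispatcher itself doesn't change.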

def MissingOperand()

This function checks to see if a parameter is given in the operand field. If not it will raise an error and return True. If an operand is present, it returns False, indicating that there's no error, yet.

def NoMore()

Call this function if you don't expect any more operands on your assembly line. A warning is given if this function finds additional operands anyway. This is not necessarily an error as we can simply ignore unexpected operands. However it is advised to check your code again in order to make a clean program.

Important Variables And Functions

So far I have only described the contents of the Cross Overlay template. However the main SB-Assembler core contains quite a number of interesting variables and functions. As you can see in the beginning of the Cross Overlay template these variables and functions are all defined in the files assem.py, dec.py, errors.py and target.py.
Below I will describe some of the most important variables and functions, which can be used to create your own Cross Overlays.

dec.py

This is where all globally used variables are declared. I highly recommend you have a look at that file, so you'll know what variables are at your disposal already. I will only present a small excerpt here.

Default values

These are some default values, which declare the main SB-Assembler's version number, the default environment variable name and the exit codes for the various error levels. None of these are of much interest for you as Cross Overlay programmer though.

Some constants

These constants may be interesting for you. I guess that the comments behind the constants adequately explain the purpose of each one of them.

Global variables

This is where it gets interesting. I deliberately joined all the global variables into a single struct. This makes it a lot easier for you to import them in your name space. There are plenty of examples in the source files which explain how to do that.

Again there are ample comments behind all the variable names. When in doubt you can always find some examples in the existing Cross Overlays.

Cross overlay functions

You can ignore this block. It holds the dynamically assigned function names and is filled automagically when a new Cross Overlay is selected by the .CR directive. That's why you shouldn't rename the functions which start with "Cross" in the template.

Flags

Again an interesting block of variables for you. They are also contained in a struct, which can be imported into your name space in one blow.

assem.py

First of all I think this file can be educational in case you want to know how the SB-Assembler works. Furthermore the file contains some interesting functions which can come in handy when you are creating your own Cross Overlay.

FindNextNonSpace():

This function searches the current assembly line for the next non white space character. Normally it will return a pointer which points to the next non white space character in the current assembly line. Unless there isn't one, in which case it will return -1.

IncParsePointer():

This simple function will increment the parse pointer by one character, unless we're already at the end of the line.

NowChar():

This function will return the character currently pointed to by the Parse_Pointer. If you pass it a True it will also increment the parse pointer after fetching the character.
Remember that the returned character can be either upper or lower case. You can force the output to upper case by using assem.NowChar(True).upper().

MoreParameters():

Call this function if you want to know whether more parameters follow the one you have already parsed. This is when the current parse pointer points to a comma, or a comma followed by a space.
It will generate an error if the current character is neither a space nor a comma.

GetWord():

This function is a bit more complicated, but therefore quite versatile. It returns a word of text from the parse line. An empty string is returned if the word starts with one of the comment markers. A typical use case for this function is parsing register names.
With the optional parameters legal1 and legal2 you can control what characters are legal at the beginning and the rest of the word. Collecting characters for the word is stopped when a character is found which is not included as a legal2 character.
If you don't specify the optional parameter endchars the word ends normally when a space or a comma is encountered.
Remember that the returned word can contain either upper or lower case characters. You can force the output to upper case by using assem.GetWord().upper().

EvalExpr():

This function evaluates an expression into a value. Every time you need a number, a mask, a value, an offset, or an address you can call this function. It doesn't matter whether the user provided a number in any radix, a label, predefined values, or more or less complex expressions, they are all reduced to a single 64 bit value.
The function returns a tuple. The first element of the tuple is the 64 bit result of the expression found on the current parse position of the assembly line. The second element is a flag, indicating whether a forward referenced label was used in the expression. The third and final element is the memory mode which is used (normally it is 0 for program memory).

errors.py

Inevitably there comes a time when the SB-Assembler encounters an error in the assembly program it is currently processing. Two functions in the errors.py file can be used to raise errors and warnings. The file also contains a dictionary which contains all the predefined error and warning texts.

Error_List

This is the dictionary which holds all the possible error texts. As described above you can expand this dictionary if you need specific texts which are not yet defined.

DoError():

Whenever you encounter a programming error in the source code call this function and pass it the name of the error and a flag. You can choose from any of the error names in the Error_List dictionary.
The flag you provide tells the assembler whether it is a fatal error or not. When the flag is True the error is fatal, causing the assembler to quit immediately.

DoWarning():

Sometimes an error is too severe, and you can do with a warning instead. Use this function to raise a warning instead of an error. You can choose from any of the error names in the Error_List dictionary.
The flag you provide tells the assembler whether it is a fatal warning or not. When the flag is True the warning is fatal, causing the assembler to quit immediately.

target.py

And because the whole purpose of the Cross Overlay is to generate code we'll need a way to store this code to our target file. That's what we need the target.py file for.

CodeByte():

This is a wrapper function around the SaveByte() function. It takes the same parameters as the SaveByte() function. Its purpose is to write a single byte to program memory and to give a warning (only once) if the currently selected memory is not program memory.
This is usually the function to use if you want to save a single byte, for instance an instruction or a data byte, to the target file.

CodeWord():

Again this is a wrapper function around the SaveWord() function. It takes the same parameters as the SaveWord() function. Its purpose is to write a 2 byte word to program memory and to give a warning (only once) if the currently selected memory is not program memory. Writing the 2 byte word respects the endian model selected by the Cross Overlay.
This is usually the function to use if you want to save a data word or address to the target file. Saving 2 byte instructions is best done by calling CodeByte() twice to avoid endian confusion.
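
Here is a small sketch of what calling CodeByte() twice amounts to. The function code_byte below is only a stand-in which appends to a list, it is not the real target.CodeByte(). The helper save_instruction and its high_first parameter are made up for this example. The point is that the overlay decides the byte order of the instruction itself, independent of the endian flag used for data words.

```python
target = []   # Stand-in for the target file


def code_byte(byte):
    """Stand-in for target.CodeByte(): append one byte to the output."""
    target.append(byte & 0xFF)


def save_instruction(opcode, high_first=True):
    """Save a 2 byte opcode in an explicitly chosen byte order."""
    high = (opcode >> 8) & 0xFF
    low = opcode & 0xFF
    if high_first:
        code_byte(high)
        code_byte(low)
    else:
        code_byte(low)
        code_byte(high)


save_instruction(0x940C)
print([hex(b) for b in target])
```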

CodeLong():

Uhm, well, I guess you can figure out by now what this function does. Hint, it saves 4 bytes of code to program memory.

SaveByte():

This function saves one byte to target memory, whether it is program memory, EEPROM memory or RAM. It takes one or two parameters. The first parameter is mandatory and holds the byte to be saved.
The second parameter is optional and determines if the byte has to be listed in the listing which is created during assembly. If the parameter is omitted the byte is listed. So if you don't want the byte to be listed you should pass False as second parameter.
The purpose of this list flag is to give you the opportunity not to list long series of bytes if you don't want to. It has nothing to do with the listing control offered by the .LI directive though.

SaveWord():

This function does the same as the SaveByte() function, with the exception that it stores a 2 byte word instead of a single byte. It accepts the same parameters. Bytes are stored in the endian model selected by the Cross Overlay.

SaveLong():

This function does the same as the SaveByte() function, with the exception that it stores a 4 byte long word instead of a single byte. It accepts the same parameters. Bytes are stored in the endian model selected by the Cross Overlay.

BoundarySync():

When you are creating a Cross Overlay for a processor which stores more than 1 byte per instruction, like the AVR or PIC micro controllers, it is important to start saving every instruction on the right boundary. Suppose you have saved an odd number of data bytes, prior to storing the double byte instructions. Then the instruction words would not be aligned properly.
By invoking this function, prior to saving the next instruction, you can make sure that the address pointer is aligned properly again. It stores some stuffing bytes only if the address pointer is misaligned. If the address pointer was already aligned it does nothing.
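
The idea behind the alignment can be shown with a few lines of stand-alone Python. This is not the real BoundarySync() function. The list stands in for target memory, and both the word size of 2 bytes and the stuffing value 0x00 are assumptions made for this sketch.

```python
def boundary_sync(memory, bytes_per_word=2, filler=0x00):
    """Pad memory (a list of bytes) up to the next word boundary."""
    while len(memory) % bytes_per_word:
        memory.append(filler)
    return memory


# Three data bytes leave the pointer on an odd address, so one byte is added
print(boundary_sync([0x01, 0x02, 0x03]))
# An already aligned memory image is left untouched
print(boundary_sync([0x01, 0x02]))
```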

Some Final Remarks

Range Checking

Sometimes you may want to check whether the value of an expression falls within a certain range. This is mainly true for branch instructions, which usually can't branch further than +127 or -128 bytes from the current position. But what I am about to say about range checking is true for any sort of range checking, no matter what its purpose is.
Obviously you can't check a range if an undefined label is used in the expression which evaluates to the value to be checked. If at least one of the labels is unknown, the final answer will be unknown.

In order to understand the implications of this you need to know how a 2 pass assembler works. In the first pass the assembler simply counts the number of bytes which will be generated by the program. At the same time all labels gradually get their values. Finally, at the end of pass 1, all labels should have received their values. Labels won't get assigned new values during pass 2. Their value will only be compared to the value they've received during pass 1 to make sure there are no errors in the assembler. It goes without saying that the assembler must generate exactly the same amount of code during both passes.

Whenever the assembler wants to use a label's value it will know whether the label is already known to the assembler. If the value is not known yet it will receive a forward reference flag, telling the assembler that the value was not known yet in pass 1. The label should have a value in pass 2 though, otherwise the assembler will report an undefined label error.

You can expect 2 problems with labels which are forward referenced:

The label's value is unknown in pass 1, so you can't check whether it falls within any legal range. So before you check the value of an expression in pass 1, check whether it makes sense to check its value by examining the forward referenced flag first (contained in the result tuple).
Another way to attack this issue is to check the range in pass 2 only, regardless whether there is a forward referenced label or not. The only down side to this approach is that the assembler will report a range error only in pass 2, even for non forward referenced labels. But that is not a problem.
Because the range can not be checked, the assembler can not make the decision on how many address bits it needs for processors which support variable length addressing modes. The short form of such an addressing mode is usually called Zero Page or Direct addressing mode.
So because the assembler can't decide yet, it must assume the worst case scenario and use the longest addressing mode available. This should be done both in pass 1 and pass 2 of course, otherwise the assembler will generate different amounts of code in the two passes.
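
The first approach can be sketched like this. The tuple has the same layout as the one returned by EvalExpr() as described above (value, forward reference flag, memory mode), but the function check_branch itself is hypothetical. It assumes the offset is simply the target address minus the program counter, which differs per processor, and it raises a ValueError where a real overlay would report a range error.

```python
def check_branch(value_tuple, pc, pass_number):
    """Return the 8 bit offset, or None when it can't be checked yet."""
    target, forward_ref, _memory = value_tuple

    if pass_number == 1 and forward_ref:
        # The label has no value yet, so a range check would be meaningless
        return None

    offset = target - pc
    if not -128 <= offset <= 127:
        raise ValueError('branch out of range')

    return offset & 0xFF    # Two's complement byte


# Forward reference in pass 1: nothing to check yet
print(check_branch((0, True, 0), 0x1000, 1))
# Same label in pass 2, now it has its real value
print(hex(check_branch((0x1010, False, 0), 0x1000, 2)))
```

Note that the instruction takes up the same number of bytes in both cases, so the byte count stays identical in pass 1 and pass 2.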

Optional Parameters

Sometimes an instruction or a directive may have an optional parameter. It may be there, or it may be absent. Because comment fields at the end of each line are automatic in the SB-Assembler we have to set a limit to the maximum distance of such an optional parameter. When will it be the optional parameter, or when will it be a comment field?

I have stated that an optional parameter should start within 10 spaces from the end of the mnemonic or directive. If there are more than 10 spaces any following text will be regarded as comments. A comment field may start sooner if you like, but the programmer would have to use a comment separator as first character of the comment field to do so.

So if you intend to allow for an optional parameter, you can simply check the dec.Asm.Optional boolean variable. If this boolean flag is false, the optional parameter may be considered empty. If it's true you can start parsing the parameter.
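
The 10 spaces rule itself can be illustrated with a toy function. This is not how the assembler sets dec.Asm.Optional internally, it is only a stand-alone sketch of the rule as stated above. It ignores comment separators, and the name has_optional_parameter is made up.

```python
def has_optional_parameter(rest_of_line, max_gap=10):
    """rest_of_line is everything following the mnemonic or directive."""
    stripped = rest_of_line.lstrip(' ')
    if not stripped:
        return False                # Nothing follows at all
    gap = len(rest_of_line) - len(stripped)
    return gap <= max_gap           # Further away means comment field


print(has_optional_parameter('  5'))
print(has_optional_parameter(' ' * 11 + 'Just a comment'))
```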

Storing Bytes To The Target File

All data generating standard directives save their bytes through the vector dec.Asm.SaveByte. You can change this vector in order to modify the data before saving it to the target file if your Cross Overlay needs it.
One obvious application for this would be some PIC Cross Overlays. PIC processors tend to save bytes as RETLW instructions. Changing the vector to go through your code allows you to modify all generated data bytes for CODE memory to represent the appropriate RETLW instructions.

You don't have to bother about restoring the original vector as a Cross Overlay cleaning action. The SB-Assembler will ensure that the vector is reset to its default value every time another Cross Overlay is loaded.
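
Redirecting such a vector could look like the sketch below. Everything in it is a stand-in: asm plays the role of dec.Asm, plain_save plays the role of the default save routine, and data_directive mimics a data generating directive. The RETLW opcode 0x3400 is the one used by the 14 bit PIC cores, other PIC families use a different value.

```python
from types import SimpleNamespace

output = []   # Stand-in for the target file, one entry per stored item


def plain_save(byte):
    """Stand-in for the default save routine: store the byte as it is."""
    output.append(byte & 0xFF)


asm = SimpleNamespace(SaveByte=plain_save)   # Stand-in for dec.Asm


def retlw_save(byte):
    """Wrap every data byte into a RETLW instruction (14 bit core assumed)."""
    output.append(0x3400 | (byte & 0xFF))


def data_directive(values):
    """Mimics a data directive: it always saves through the vector."""
    for value in values:
        asm.SaveByte(value)


data_directive([0x41])        # Saved as a plain byte
asm.SaveByte = retlw_save     # The overlay changes the vector
data_directive([0x42])        # Now saved as RETLW 0x42
print([hex(item) for item in output])
```

The directive code doesn't need to know about the change. It keeps calling the vector, and whatever function the vector points to decides how the byte is stored.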

End of Parsing

If you have parsed your entire line and found no errors on it you're done with this line and you go on to the next. Wait a minute. There were no errors on the assembly line so far. But are you sure the programmer didn't make a mistake at the end?
If you study the next two lines you'll know what I mean:

        LD  A,B     This one is correct
        LD  A,B,C   Too many parameters given

If your Cross Overlay doesn't look beyond the second operand, it doesn't notice that there is an unexpected 3rd operand. You can easily check that no unexpected operands follow by calling the function NoMore(). This function will automatically give a warning if unexpected extra parameters do follow. Otherwise the function does nothing at all.

Testing

Testing is probably the most boring, difficult and tedious task for a programmer. Unfortunately it won't be any different for programming Cross Overlays. What I usually do is write an assembly source code which handles all possible instructions in combination with all possible addressing modes. But doing just that is not enough to thoroughly test your code.
You'll also have to test if the assembler reacts in the right way when you feed it unsupported code. It must raise errors when values exceed predefined ranges. And you'll have to find the pain points which might give different results depending on the range of specific values.

Usually I do all the tests, one by one. Finally I end up with an assembly program which does everything right for all possible instruction/addressing mode combinations. The final file won't do anything it is not allowed to. In other words it won't generate errors. But that doesn't mean that those tests haven't been performed.

So if you are writing your own Cross Overlay, please also spend the time to cover all possible code decisions in your program.
