03 TAJGA FASM Tutorial

CHAPTER 3 - Labels & Addresses & Variables

Okay, let's get to variables. In the previous chapter I wrote that "variable" is a general term for a space which stores some value. Registers are variables, for example. But there is only a limited number of registers (VERY limited: some 8 plus a few special ones), and this is nearly always not enough. For this reason memory (RAM, random access memory) is used.

NOTE: when someone says "variable", they almost always mean a memory variable.

3.1. Labels
The problem is that you have to know WHERE in memory a value is stored. A position in memory (called an "address") is given by a number. But it is quite hard to remember this number (address) for every variable.

term: address
number which gives position in memory

Another problem with addresses is that when you change your program, the address can change too, and you would have to correct this number everywhere it is used. For this reason addresses are represented by "labels". A label is just a word (not a string, it is not enclosed in apostrophes) which, in your program, represents an address in memory. When you compile your program, the compiler replaces the label with the proper address. A label consists of letters ("a" to "z", "A" to "Z"), digits ("0" to "9"), underscores ("_") and dots ("."). But the first character of a label can't be a digit or a dot. A label also can't have the same name as a directive or an instruction (instruction mnemonic). Labels are case sensitive in FASM ('a' is NOT the same as 'A'). Examples of labels:

`name`	is label
`a`	is label
`A`	is label, different from "a"
`name2`	is label
`name.NAME2`	is label
`name._NAME2`	is label
`_name`	is label
`_`	is label
`.name`	is not label, because it starts with dot (labels starting with dot have special meaning in FASM, which you will learn later)
`1`	is not label because it starts with number
`1st_name`	is not label for same reason
`name1 name2`	is not label, because it contains space
`mov`	is not label, because "mov" is instruction mnemonic

term: label
Placeholder for some address, i.e. placeholder for some number, because an address is a number. In FASM you can use a label the same way as any other number (not really, but it doesn't matter too much for you now).

You can define a label using the directive "label". This directive should be followed by the label itself (the label name). For example:

`label name`	is label definition, it defines label "name"
`label _name`	is label definition, it defines label "_name"
`label label`	is not label definition, because "label" can't be name of label, as described in the paragraph above

This defines a label that represents the address of the data defined behind it.

directive: label

label definition
directive label followed by label name

Shorter way to define label is just writing label name followed by colon (:)

name:
_name:

but we won't use this way for some time.
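As a small sketch, here both ways of defining a label are used in one .COM program. The names long_form and short_form are made up for this example, and the text after each semicolon is a comment that FASM ignores. Since nothing is defined between the two labels, both should stand for the same address, that of the byte defined by db.

```asm
org 256
int 20h                 ; end the program right away
label long_form         ; defined with the label directive
short_form:             ; defined with the colon form
db 1                    ; both labels stand for the address of this byte
```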

3.2. Variable definition
Now we can return to the problem with variables: how to define a variable in memory. The program you create (the compiled program, in machine code) is loaded into memory at execution time, where the processor executes it instruction by instruction. Look at this program:

org 256
mov al,10
db 'this is a string'
int 20h

This program will probably crash, because after the processor executes

mov al,10

it reaches the string. But in a program there is no difference between a string and instructions in machine code. Both are translated into an array of numeric values (bytes). There is no way the processor can tell whether a numeric value is the translation of a string or the translation of an instruction. In this example, the processor will execute instructions whose numeric representation (in machine code) is the same as the ASCII representation of the string "this is a string". Now look at this:

org 256
mov al,10
int 20h
db 'this is a string'

This program will not crash, because before reaching the bytes defined by the string the processor reaches the instruction int 20h, which ends execution of the program. So the bytes defined by the string will not be executed, they will just take up some space. This is the way to define a variable: define some data at a place where the processor won't try to execute it (behind int 20h in this case).

So code with a byte-sized variable of value 105 looks like this:
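To see that data and instructions really are the same kind of bytes, here is a sketch of the two instructions of the program above written with data definitions only. It assumes the usual x86 encodings: mov al,10 is the bytes B0h 0Ah, and int 20h is the bytes CDh 20h. If that assumption holds, FASM should produce exactly the same file as for the two instructions themselves.

```asm
org 256
db 0B0h,10              ; the same two bytes as mov al,10
db 0CDh,20h             ; the same two bytes as int 20h
```

The processor executes these bytes as instructions simply because it reaches them. Nothing in the file marks them as code or as data.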

org 256
mov al,10
int 20h
db 105

The last line defines a byte variable containing 105.

Now how to access the variable? First we must know the address of the variable. For this we can use a label (described above, reread it if you have forgotten):

org 256
mov al,10
int 20h
label my_first_variable
db 105

So we already know the address of the variable. It is represented by the label my_first_variable. Now how to access it? You may think it is, for example

mov al,my_first_variable

but no! Remember, I told you that a label (my_first_variable in this case) stands for the address of the variable. So this instruction asks for the address of the variable to be moved to the al register, not the variable's contents. To access the contents of a variable (or the contents of any memory location) you must enclose its address in brackets ([ and ]). So to access the contents of our variable and copy its value to al we use

mov al,[my_first_variable]

Now we will define two variables:

org 256
<some instructions>
int 20h
label variable1
db 100
label variable2
db 200

So to copy value of variable1 to al we use

mov al,[variable1]

To copy al to variable1 use

mov [variable1],al

To set the value of variable1 (to be exact: to set the value of the variable which is stored at the address represented by variable1) to 10 we could try

mov [variable1],10

but this will cause an error (try it if you want). The problem is this: you know that you are changing the variable at address variable1 to 10. But what is the size of the variable? In the previous two cases the byte size could be determined because you used the al register, which is byte sized, so the compiler decided that the variable at variable1 is byte sized too, because you can't move between operands with different sizes. But in this case the value 10 can be of any size, so the compiler can't decide the size of the memory variable. To solve this we use "size operators". We will talk about two size operators for now: byte and word. You can put a size operator before an instruction operand to let the compiler know what the variable size is:

mov byte [variable1],10

Another way to do this is

mov [variable1], byte 10

In this case the compiler knows that the moved value 10 is byte sized, so it decides that the variable is byte sized too (because we can move a byte-sized value only to a byte-sized variable).

But it would be hard to always remember and always write the size of a variable when you access it. For this reason you can assign the size of the variable to the label when you define it. Just write the size operator after the label name in the definition:
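Put together, a complete program using both size operators could look like the following sketch. The names small_var and large_var are made up for this example. Because the labels are defined without a size, each store needs its own size operator.

```asm
org 256
mov byte [small_var],10     ; store 10 into one byte
mov word [large_var],1000   ; store 1000 into two bytes
int 20h
label small_var
db 0
label large_var
dw 0
```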

label variable1 byte
db 100

or

label variable1 word
dw 1000

Now every time you use [variable1] it will have the same meaning as byte [variable1] (or word [variable1] in the second example). So mov [variable1],10 will work: in the first case it stores the value 10 to the byte at address variable1, in the second case it stores it to the word.

size operator

NOTE: You can't move a value between operands of different sizes. Both of these are wrong:

mov byte [variable1], word 10

mov [variable1],al
...
label variable1 word
dw 0

NOTE: You can't access two memory locations in one instruction (except for some special instructions). This is wrong, it won't be compiled:

mov [variable1],[variable2]

use this:

mov al,[variable2]
mov [variable1],al

This will cause you some problems in the beginning, but it will force you to write faster code, and that is the biggest reason to code in assembly.
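As a complete sketch, here is the copy through the al register inside a whole program, reusing variable1 and variable2 with the values 100 and 200 from the earlier example.

```asm
org 256
mov al,[variable2]      ; al = 200
mov [variable1],al      ; variable1 now holds 200 too
int 20h
label variable1
db 100
label variable2
db 200
```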

NOTE: a size operator assigned to a label at definition has lower priority than a size operator written before the access to the variable in an instruction, so:

mov byte [variable],10
label variable word
dw 0

will access BYTE, while

mov [variable],10

will access WORD.
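The override is handy when you want to reach a single byte of a word variable. In this sketch (the name my_word is made up for the example) the first store uses the word size taken from the label, while the load names byte explicitly, so it reads only the one byte at address my_word. Without the explicit byte, FASM should refuse the second instruction, because al is byte sized and the label is word sized.

```asm
org 256
mov [my_word],1234h     ; word store, size comes from the label definition
mov al,byte [my_word]   ; explicit byte overrides the word size of the label
int 20h
label my_word word
dw 0
```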

I think you noticed that having two lines to define one variable is a little too much. There is a shorter way to define a variable:

variable1 db 100

is same as

label variable1 byte
db 100

Notice that the size of the variable is defined too. In general, if a data definition (using the db or dw directive) is preceded by a label, then it defines this label too, and assigns the size of the defined data as the size of the label. It can be used with words too:

variable2 dw 100
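A sketch of a whole program using the short form follows. The names byte_var and word_var are made up for this example. Because each label gets its size from db or dw, none of the instructions needs a size operator.

```asm
org 256
mov [byte_var],10       ; byte store, size comes from db
mov [word_var],1000     ; word store, size comes from dw
mov al,[byte_var]       ; al = 10
int 20h
byte_var db 0
word_var dw 0
```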

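An example of using variables:

org 256
mov ah,2                        ; DOS function 2: write character in dl
mov dl,[character_to_write]     ; load the character from the variable
int 21h                         ; call DOS, prints 'a'
int 20h                         ; end program
character_to_write db 'a'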

3.3. Addresses and basics of segmentation
Now we will discuss addresses a little more. I have told you that an address is a number (!) which gives some position in memory. You have learnt how to represent this number with labels, so the numeric addresses were maintained by the compiler. But you still don't know anything about the format of this number. I will try to explain it a little in this section.

As you probably know, data in memory is stored in "bits", each of which can have value 0 or 1. You can consider memory as a (one dimensional) array of bits. 8 consecutive bits make one byte. An address is the number (index, position in array) of a byte. For example, address "0" is the address of the first byte of memory (which starts at the first bit), address "1" is the address of the second byte (which starts at the ninth bit), etc. The easiest way to comprehend it is to take memory as a (one dimensional) array of bytes.

Address in .COM files is word-sized number, so

label var1
<some data>
mov al,var1

is wrong. It may work if var1 is less than 256, so that it fits into a byte-sized register, but in general, store addresses in word-sized variables; we will talk about them a little later.

Now some examples on addresses. Check this file:
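For comparison, here is a sketch that puts the address into the word-sized ax register instead. It assumes a .COM program with org 256. The mov takes 3 bytes and int 20h takes 2, so var1 should stand for 256+5 = 261, a number that does not fit into a byte. With al in place of ax, FASM would be expected to report an error rather than silently cut the value.

```asm
org 256
mov ax,var1             ; ax = address of var1 (261 here), not its contents
int 20h
label var1
db 0
```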

label variable1
db 10
label variable2
db 20
label variable3
db 30

Here the address represented by variable1 is 0, variable2 stands for 1, and variable3 is 2.

OK, this looks nice but it is not true at all. The problem is that there are usually more programs loaded in memory at the same time (operating system, mouse driver, your program etc.). This way, a program would have to know WHERE in memory it will be loaded so it can access its variables. For this reason addresses are "relative". It means that for every program that is loaded, some region in memory, called a "segment", is reserved. All addresses in memory accessed by this program are then relative to the beginning of this region. So [0] doesn't mean the first byte of memory, but the first byte of the segment.
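One way to check these numbers yourself is to let FASM write the values of the labels into the output file. This sketch is a pure data file, not a program to run. It relies on a label being usable as a number, so dw can store it. Compiled without any org directive, the file should contain the bytes 10, 20, 30 followed by the words 0, 1 and 2, which you can see in a hex viewer.

```asm
label variable1
db 10
label variable2
db 20
label variable3
db 30
dw variable1,variable2,variable3   ; stores the three addresses: 0, 1, 2
```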

segment
consecutive region of memory reserved for one program

How does this work? The processor has a few special registers (segment registers) which hold the address of the segment (the address of the first byte of the segment). Every time you access memory in your program, the contents of a segment register are added to the address given by you, so mov al,[0] accesses the first byte of your segment.

NOTE: I have told you that memory addresses in .COM programs are words. That means they can be in the range 0 to 65535. So the maximal size of one segment is 65536 bytes. This can be "tricked" by changing the contents of segment registers, but don't care about this now.

NOTE: Segment is region in memory. But term "segment" is often used for address of beginning of this region. Sad but true.

So an absolute address in memory has two parts: the segment (to be exact: the address of the beginning of the segment) and a word-sized value called the "offset", which is an address relative to the segment (to the address of the beginning of the segment).

offset
address relative to segment, or address "inside" segment. (first definition is more exact, but second is easier to comprehend)

NOTE: (important) I said labels represent the address of a variable. In fact, labels in FASM represent the offset of a variable. That is why it is called "flat" (you will comprehend this later (much, much later :))).

I won't get deeper into segment registers and how the address of the beginning of a segment is stored in them (there IS a difference). Take segment registers as some kind of black box for now: it works and we can ignore it.

3.4. 'org' directive explained
When your program is loaded, it often needs some external info from the program that ran it. The best example is command line arguments, or it may need to know WHO ran it, etc. This info must, of course, be stored in the same segment as the program. For .COM programs this data (passed to your program by the program that ran it) is stored in the first 256 bytes of the segment. So your program is loaded from offset 256.

NOTE: The 256 byte structure at the beginning of the segment of a .COM program is called "PSP", which stands for "program segment prefix".

Now imagine this .COM program:

mov al,[variable1]
int 20h
variable1 db 0

(notice: no org 256 directive). Instruction

mov al,[variable1]

takes 3 bytes, int 20h takes 2 bytes, so variable1 will stand for offset 5. So instruction

mov al,[variable1]

is mov al,[5]. So this instruction accesses the 6th byte of the segment (the first byte is at offset 0). But I already told you that the first 256 bytes of the segment hold some information, and your program is loaded behind them, from offset 256. So you don't want variable1 to be 5, you want it to be 256+5. And this is what the org directive does. It sets the "origin" of addresses in the file. org 256 tells FASM to assume that the code and data following the directive start at offset 256, so when it is the first line of the file, every label defined behind it (before the next org directive) holds its position in the file plus 256. And this is exactly what we want in .COM files.

So the code above won't access the variable you want, it will access something in the PSP (the first 256 bytes of the segment). To make it work properly use:

org 256
mov al,[variable1]
int 20h
variable1 db 0
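If you want to see the effect of org with your own eyes, this sketch adds one data line to the program. The extra dw stores the value of the label variable1 into the file. It sits behind int 20h, so it is never executed. With org 256 the stored word should be 261 (256+5). If you remove the org line, it should be 5.

```asm
org 256
mov al,[variable1]      ; 3 bytes
int 20h                 ; 2 bytes
variable1 db 0          ; position 5 in file, so offset 256+5 = 261
dw variable1            ; stores the word 261 (bytes 05h 01h) into the file
```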

NOTE: org affects labels at the time of definition (for example at label variable byte or variable db 0), not when they are used (like at mov ax,[variable]). That means that if you change the address "origin" with an org directive after defining some label, the label will still hold the same value before and behind the org directive.

I won't tell you about the data contained in the PSP, you don't have to care about it now.
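A small sketch of this rule follows, again as a pure data file rather than a program to run. The names early and late are made up for the example. The label early is defined before org 256, so it should keep the value 0 even when it is used behind the directive, while late is defined after it and should be 256.

```asm
label early             ; defined before org, stands for offset 0
db 0
org 256
label late              ; defined after org 256, stands for offset 256
db 0
dw early,late           ; stores the words 0 and 256 into the file
```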