CHAPTER 3 - Labels & Addresses & Variables
Okay, let's get to variables. In the previous chapter I wrote that "variable" is a
general term for a space which stores some value. Registers are variables, for
example. But there is a limited number of registers (VERY limited, some 8 plus a few
special ones), and this is nearly always not enough. For this reason memory (RAM -
random access memory) is used.
NOTE: when someone says "variable", they almost always mean a memory
variable.
3.1. Labels
The problem is that you have to know WHERE in memory a value is stored.
A position in memory (called an "address") is given by a number. But it is quite hard
to remember this number (address) for every variable.
term: address
number which gives position in memory
Another problem with addresses is that when you change your program, an address
can change too, and you would have to correct this number everywhere
it is used. For this reason addresses are represented by "labels".
A label is just a word (not a string, it is not enclosed in apostrophes)
which, in your program, represents an address in memory. When you compile your
program, the compiler replaces the label with the proper address. A label consists of
letters ("a" to "z", "A" to "Z"), digits ("0" to "9"), underscores
("_") and dots ("."). But the first character of a label can't be a digit or a dot.
A label also can't have the same name as a directive or an instruction (instruction
mnemonic). Labels are case sensitive in FASM ('a' is NOT the same as 'A').
Example of labels:
name | is label |
a | is label |
A | is label, different from "a" |
name2 | is label |
name.NAME2 | is label |
name._NAME2 | is label |
_name | is label |
_ | is label |
.name | is not label, because it starts with dot (labels starting with
dot have special meaning in FASM, which you will learn later) |
1 | is not label because it starts with number |
1st_name | is not label for same reason |
name1 name2 | is not label, because it contains space |
mov | is not label, because "mov" is an instruction mnemonic |
term: label
Placeholder for some address, i.e. a placeholder for some number, because an address is a number. In FASM you can
use a label the same way as any other number (not really, but it doesn't matter too much for you now).
You can define a label using the directive "label". This directive should be
followed by the label itself (the label name). For example:
label name | is label definition, it defines label "name" |
label _name | is label definition, it defines label "_name" |
label label | is not label definition, because "label" can't be the name of a label, as described in the previous paragraph |
A label defined this way represents the address of the data defined behind it.
directive: label
label definition
directive label followed by label name
A shorter way to define a label is to write just the label name followed by a colon (:)
name:
_name:
but we won't use this way for some time.
3.2. Variable definition
Now we can return to the problem with variables: how to define a variable in
memory. The program you create (the compiled program, in machine code) is loaded into
memory at execution time, where the processor executes it instruction by
instruction. Look at this program:
org 256
mov al,10
db 'this is a string'
int 20h
This program will probably crash, because after the processor executes mov al,10
it reaches the string. But in the program there is no difference between
a string and instructions in machine code. Both are translated into an array of numeric
values (bytes). There is no way the processor can tell whether a numeric value is
the translation of a string or the translation of an instruction. In this example,
the processor will execute the instructions whose numeric representation (in machine
code) is the same as the ASCII representation of the string "this is a string".
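To see that instructions really are just bytes, here is a small sketch (my own illustration, not one of the original examples). It contains no instruction mnemonics at all, only db directives, yet it should assemble to the same four bytes as mov al,10 followed by int 20h. The byte values used are the standard 8086 encodings of these two instructions.

```asm
org 256
db 0B0h, 10     ; same two bytes as: mov al,10
db 0CDh, 20h    ; same two bytes as: int 20h
```

The processor can't see how the bytes were written in the source; it simply executes whatever bytes it reaches.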
Now look at this:
org 256
mov al,10
int 20h
db 'this is a string'
This program will not crash, because before reaching the bytes defined by the string
the processor reaches the instruction int 20h, which ends execution of the program. So
the bytes defined by the string will not be executed, they will just take some space.
This is how you can define a variable: define some data at a place where
the processor won't try to execute it (behind int 20h in this case).
So here is code with a byte-sized variable of value 105:
org 256
mov al,10
int 20h
db 105
The last line defines a byte variable containing 105.
Now how to access the variable? First we must know the address of the variable. For this
we can use a label (described above, reread it if you have forgotten):
org 256
mov al,10
int 20h
label my_first_variable
db 105
So we already know the address of the variable. It is represented by the label
my_first_variable. Now how to access it? You may think it is, for example
mov al,my_first_variable
but no! Remember, I told you that a label (my_first_variable in this
case) stands for the address of the variable. So this instruction would move the address of
the variable to the al register, not the variable's contents. To access
the contents of a variable (or the contents of any memory location) you must enclose its
address in brackets ([ and ]). So to access the contents
of our variable, and copy its value to al, we use
mov al,[my_first_variable]
Now we will define two variables:
org 256
<some instructions>
int 20h
label variable1
db 100
label variable2
db 200
So to copy the value of variable1 to al we use
mov al,[variable1]
To copy al to variable1 use
mov [variable1],al
To set the value of variable1 (exactly: to set the value of the variable which is stored
at the address represented by variable1) to 10 we could try
mov [variable1],10
but this will cause an error (try it if you want). The problem is that you know that
you are changing the variable at address variable1 to 10.
But what is the size of the variable? In the previous two cases the byte size could be
determined because you used the al register, which is byte sized, so
the compiler decided that the variable at variable1 is byte sized too,
because you can't move between operands of different sizes. But in this case
the value 10 can be of any size, so the compiler can't decide the size of the memory variable. To
solve this we use "size operators". We will talk about two size operators for
now: byte and word. You can put a size operator before
an instruction operand to let the compiler know what the variable
size is:
mov byte [variable1],10
Another way to do this is
mov [variable1], byte 10
In this case the compiler knows that the moved value 10 is byte sized, so it decides
that the variable is byte sized too (because we can move a byte-sized value only to
a byte-sized variable).
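As a minimal sketch of both size operators side by side (the label names counter and total are made up for this illustration), the program below stores a constant into a byte variable and into a word variable. Without the size operators, neither mov would compile, because the constants alone don't say how many bytes to write.

```asm
org 256
mov byte [counter],10     ; writes one byte at address counter
mov word [total],1000     ; writes two bytes at address total
int 20h
label counter
db 0                      ; one byte reserved for counter
label total
dw 0                      ; one word (two bytes) reserved for total
```

Note that 1000 doesn't fit into a byte, so the second variable has to be defined with dw and accessed as a word.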
But it would be hard to always remember and always write the size of a variable when
you access it. For this reason you can assign the size of the variable to the label when
you define it. Just write the size operator behind the label name in the definition:
label variable1 byte
db 100
or
label variable1 word
dw 1000
Now every time you use [variable1] it will have the same meaning as
byte [variable1] (or word [variable1] in the second
example). So mov [variable1],10 will work: in the first case it will
store the value 10 to the byte at address variable1, in the second case it
will store it to the word.
size operator
NOTE: You can't move a value between operands of different sizes. Both of these are wrong:
mov byte [variable1], word 10
or
mov [variable1],al
...
label variable1 word
dw 0
NOTE: You can't access two memory locations in one instruction (except for
some special instructions). This is wrong, it won't compile:
mov [variable1],[variable2]
use this:
mov al,[variable2]
mov [variable1],al
This will cause you some problems in the beginning, but it will force you
to write faster code, and that is the biggest reason to code in assembly.
NOTE: a size operator assigned to a label at definition has lower priority than
a size operator written before the variable access in an instruction, so:
mov byte [variable],10
label variable word
dw 0
will access a BYTE, while
mov [variable],10
will access a WORD.
I think you noticed that needing two lines to define one variable is a little too
much. There is a shorter way to define a variable:
variable1 db 100
is same as
label variable1 byte
db 100
Notice that the size of the variable is defined too. In general, if a data definition
(using the db or dw directive) is preceded by a label, then
it defines this label too, and assigns the size of the defined data as the size of the
label.
It can be used with words too:
variable2 dw 100
An example of using variables:
org 256
mov ah,2                       ; DOS function 2: write the character in dl
mov dl,[character_to_write]
int 21h
int 20h
character_to_write db 'a'
3.3. Addresses and basics of segmentation
Now we will discuss addresses a little more. I have told you that an address is a number
(!) which gives some position in memory. You have learnt how to represent this
number with labels, so the numeric addresses were maintained by the compiler. But you
still don't know anything about the format of this number. I will try to explain
it a little in this chapter.
As you probably know, data in memory is stored in "bits", which can have the value
0 or 1. You can consider memory as a (one-dimensional) array of bits. 8
consecutive bits make one byte. An address is the number (index, position in the array)
of a byte. For example, address "0" is the address of the first byte of memory (the byte
that starts at the first bit), address "1" is the address of the second byte (which
starts at the ninth bit), etc. The easiest way to comprehend this is to take memory as a (one-dimensional)
array of bytes.
An address in .COM files is a word-sized number, so
label var1
<some data>
mov al,var1
is wrong. It may work if var1 is less than 256, so that it fits into
a byte-sized register, but in general, store addresses in word-sized variables;
we will talk about them a little later.
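A minimal sketch of the difference between an address and the contents stored at it, assuming a hypothetical variable my_variable and assuming you already know the word-sized register bx. The first mov loads the address that the label stands for, the second loads the byte stored at that address.

```asm
org 256
mov bx,my_variable      ; bx = address (offset) represented by the label
mov al,[my_variable]    ; al = contents of the byte at that address (105)
int 20h
my_variable db 105
```

Because bx is word sized, the address always fits into it, which is not guaranteed for a byte-sized register like al.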
Now some examples of addresses. Look at this file:
label variable1
db 10
label variable2
db 20
label variable3
db 30
Here the address represented by variable1 is 0, variable2
stands for 1, and variable3 is 2.
OK, this looks nice, but it is not true at all. The problem is that there are
usually more programs loaded in memory at the same time (operating system, mouse
driver, your program etc.). This way, a program would have to know
WHERE in memory it will be loaded so that it can access its variables. For this
reason addresses are "relative". It means that for every program that is
loaded, some region in memory, called a "segment", is reserved. All addresses in
memory accessed by this program are then relative to the beginning of this region.
So [0] doesn't mean the first byte of memory, but the first byte of the segment.
term: segment
consecutive region of memory reserved for one program
How does this work? The processor has a few special registers (segment registers) which
hold the address of a segment (the address of the first byte of the segment). Every time you
access memory in your program, the contents of a segment register are added
to the address given by you, so mov al,[0] accesses the first byte of your
segment.
NOTE: I have told you that memory addresses in .COM programs are words. That
means they can be in the range 0 to 65535. So the maximal size of one segment is
65536 bytes. This can be "tricked" by changing the contents of segment registers,
but don't care about this now.
NOTE: A segment is a region in memory. But the term "segment" is often also used for
the address of the beginning of this region. Sad but true.
So an absolute address in memory has two parts: the segment (exactly: the address of the
beginning of the segment) and a word-sized value called the "offset", which
is the address relative to the segment (to the address of the beginning of the segment).
term: offset
address relative to segment, or address "inside" segment (the first definition is more
exact, but the second is easier to comprehend)
NOTE: (important) I said labels represent the address of a variable. In fact,
labels in FASM represent the offset of a variable. That is why it is called "flat"
(you will comprehend this later, much much later :)
I won't go deeper into segment registers or how the address of the beginning of a
segment is stored in them (there IS a difference). Take segment registers as some
kind of black box for now; it works and we can ignore it.
3.4. 'org' directive explained
When your program is loaded, it often needs some external info from the program that
ran it. The best example is command line arguments, or it may need to know WHO
ran it etc. This data must, of course, be stored in the same segment as the
program. For .COM files this data (passed to your program by the program that
ran it) is stored in the first 256 bytes of the segment. So your program is loaded
from offset 256.
NOTE: the 256-byte structure at the beginning of the segment of a .COM program is called the "PSP", which
stands for "program segment prefix".
Now imagine this .COM program:
mov al,[variable1]
int 20h
variable1 db 0
(notice: no org 256 directive). The instruction mov al,[variable1]
takes 3 bytes and int 20h takes 2 bytes, so
variable1 will stand for offset 5. So the instruction mov al,[variable1]
is mov al,[5]. This instruction accesses the
6th byte of the segment (the first byte is at offset 0). But I already told you that
some information is stored in the first 256 bytes of the segment, and your program
is loaded behind it, from offset 256. So you don't want variable1
to be 5, you want it to be 256+5. And this is what the org directive
does. It sets the "origin" of file addresses. org 256 at the beginning of the file tells FASM
to add 256 to the offset held by every label defined behind this directive (before the
next org directive). And this is exactly what we want in .COM
files.
So the code above won't access the variable you want, it will access something in the
PSP (the first 256 bytes of the segment). To make it work properly use:
org 256
mov al,[variable1]
int 20h
variable1 db 0
NOTE: org affects labels at the time of definition (for example
at label variable byte or variable db 0), not when
they are used (like at mov ax,[variable]). That means that if you
change the address "origin" with an org directive after defining some
label, the label will still hold the same value before and behind that org
directive.
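A small sketch of what this note means, with made-up label names first and second and an arbitrary second origin of 1000. The label first is defined while org 256 is in effect, so it keeps its value no matter what org directives come later.

```asm
org 256
mov al,[first]    ; assembled as mov al,[261]
int 20h
first db 1        ; defined under org 256: 256 + 5 = 261
org 1000
second db 2       ; defined under org 1000; first still stands for 261
```

This is only an illustration of how label values are computed. A real .COM file has no reason to switch its origin like this.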
I won't tell you about the data contained in the PSP, you don't have to care about
it now.