CHAPTER 4 - Endian encodings & word registers


4.1. Endian encodings
You should already have a fairly exact idea of byte variables. You know they are 8 bits in size (not so important now) and that they can hold a numeric value from 0 to 255. About word variables you know that they are 16 bits in size and hold a value from 0 to 65535.

You may have noticed that a word is the same size as two bytes. Now let's think about how a value is stored in two bytes. Each byte can hold a value from 0 to 255, so together they give 256*256 = 65536 combinations. But how is the value stored in these bytes? Let's say one of the bytes (byte #1) contains 0. Then the other byte (byte #2) can hold a value from 0 to 255, so we can store values 0 to 255 in our word. When byte #1 contains 1, we can store another 256 numbers, 256 to 511. When byte #1 contains 2, we can store another 256 numbers, 512 to 767, and so on. In total that is 256*256 = 65536 different values, 0 to 65535. It is like decimal numbers: every digit has a value from 0 to 9, and the "true" value of a digit depends on its position. The last digit holds 0 to 9, the digit before it holds 10*(0 to 9), the one before that holds 100*(0 to 9) etc. It is the same in words: one of the bytes holds a value 0 to 255, the other holds 256*(0 to 255). The one which holds 0..255 is called the "low order byte", the other (which holds 256*(0..255)) is called the "high order byte".
terms: low order byte, high order byte
Examples (word value = high order byte : low order byte)

0     = 0 : 0
1     = 0 : 1
255   = 0 : 255
256   = 1 : 0
257   = 1 : 1
511   = 1 : 255
512   = 2 : 0
513   = 2 : 1       (513 / 256 = 2, 513 mod 256 = 1)
65535 = 255 : 255   (65535 / 256 = 255, 65535 mod 256 = 255)

One last problem remains: the order of these bytes (i.e. which comes first in memory, the low order byte or the high order byte?). This differs between computers. On IBM PCs (and compatibles) the low order byte comes first and the high order byte follows. For example:
label variable
dw 0
then byte [variable] is the low order byte and byte [variable + 1] is the high order byte. (The addition of 1 to the offset in variable is done by the compiler: the value of variable is constant, so variable + 1 is constant as well. It simply means the next byte after the offset in variable; I think this is clear enough not to need any more explanation.)
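To see the byte order for yourself, here is a small sketch. It is not taken from any real program; the value 16961 is chosen only because both of its bytes happen to be codes of printable characters, and it uses int 21h/ah=2 (character output) as described in the earlier chapters.

```asm
org 256
mov dl,byte [variable]      ; first byte in memory = low order byte = 65 ('A')
mov ah,2
int 21h                     ; write character in dl
mov dl,byte [variable + 1]  ; second byte in memory = high order byte = 66 ('B')
mov ah,2
int 21h                     ; write character in dl
int 20h                     ; exit program
label variable
dw 16961                    ; 16961 = 66*256 + 65
```

Because the PC stores the low order byte first, this should write "AB". On a machine that stores the high order byte first, the same two bytes would come out in the opposite order.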

NOTE: When the low order byte comes first, it is called "little endian encoding"; when the high order byte comes first, it is called "big endian encoding". These terms are not important, especially for a beginner asm coder.


4.2. Word registers
Besides byte registers (like al, ah, dl...), the processor of course has some word registers too. You know that a word is a combination of two bytes, and the same holds for registers: word registers are combinations of byte registers. The first word registers we'll learn are ax, bx, cx and dx. ax is the combination of al and ah: al is the low order byte, ah is the high order byte. The same goes for bx = bh:bl, cx = ch:cl, dx = dh:dl. If you wanted to "emulate" some imaginary register ex in memory, it would be:
label ex word
el db 0
eh db 0
el would be low order byte, so it is first.
terms: word register
word registers: ax, bx, cx, dx
NOTE: the letters a, b, c, d stand for "accumulator", "base", "counter" and "data"; they have nothing to do with alphabetical order. The real order of these registers (the way the processor numbers them) is ax, cx, dx, bx, but this is not important until you want to generate or change machine code yourself.

Now, if you want to set the value in register ax to 52, you use
mov ax,52
but you could also use
mov al,52
mov ah,0
Or, to set dx to 12345:
mov dx,12345
but it could also be (there is no reason to do it this way in real coding, this is just to demonstrate the word to byte:byte relation)
mov dh,48
mov dl,57
because 48 is equal to 12345 / 256 (whole part only) and 57 is 12345 modulo 256 (modulo is the remainder from division). Check: 48*256 + 57 = 12288 + 57 = 12345.

NOTE: You know that an instruction operand can be a number (numeric constant), like "0", "256", "12345" etc. But every assembler I know also allows you to put an expression as an operand. During compilation the expression is evaluated and "replaced" by its result. So mov dx,(1 + 5) is the same as mov dx,6. Or better, the code above can be written as
mov dh,12345 / 256
mov dl,12345 mod 256
(/ is the operator for division, mod is the operator which returns the remainder from division (modulo). You don't have to know these operators now, but you should already know something about expressions.)
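A quick way to check the 48 : 57 split, as a sketch only: character codes 48 and 57 are the ASCII codes of "0" and "9", so after loading 12345 into dx we can simply write both halves as characters. This assumes that int 21h/ah=2 leaves dx unchanged (DOS does so in practice), and it sets ah again before the second call just to be safe.

```asm
org 256
mov dx,12345    ; dh = 12345 / 256 = 48, dl = 12345 mod 256 = 57
mov ah,2
int 21h         ; writes character in dl: code 57 is '9'
mov dl,dh       ; copy high order byte of dx into dl
mov ah,2
int 21h         ; writes character in dl: code 48 is '0'
int 20h         ; exit program
```

It should write "90": first the low order byte (dl), then the high order byte (dh).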

The processor also has other word registers: sp, bp, si and di. But you can't directly access the byte parts of these registers, you must access the whole word. This is a limitation of the processor and there is nothing you can do about it. For example, if you want to set the high order byte of si to 17, one way to do it is:
mov ax,si
mov ah,17
mov si,ax
So first you copy the value of si to ax. The high order byte of ax can be directly accessed (it is the ah register), so you set it; the low order byte (al) stays as it was. Then you copy the value back from ax to si. The high order byte of si is changed to 17, the low order byte remains unchanged. Note that the old value of ax is lost in the process.
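The same trick works for the low order byte. As a sketch (the register di and the value 5 are picked only for illustration), this sets the low order byte of di to 5 and keeps its high order byte:

```asm
mov ax,di       ; copy whole word from di to ax (old value of ax is lost)
mov al,5        ; change only the low order byte
mov di,ax       ; copy back: low order byte of di is now 5, high order byte unchanged
```

Any of ax, bx, cx or dx could serve as the temporary register here, since all four have directly accessible byte parts.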

NOTE: register sp always has a special function, and bp usually has a special function (in code generated by most (all?) non-assembly compilers). Registers si and di can be used whenever you want. This means you shouldn't change sp and bp unless you know what you are doing.


4.3. String output using int 21h/ah=9
This section really belongs to chapter 3 about addresses, but it needs the dx register, which is explained here.

Here we will talk about another use of int 21h. You should already know that when ah contains 2, int 21h writes the character in dl. But if we want to display some longer text, we must set dl for every character, and that is a bad method. Wouldn't it be better to just store the string we want to display somewhere in the file (like we did in chapter 1) and then display it from there?

For this we can use int 21h with value 9 in ah and address of string in dx register. Something like:
mov ah,9
mov dx,address_of_string
int 21h
But another problem comes up: how to determine the length of the string, i.e. the number of characters to display from the given address. There are several methods for this; we will talk about the simplest one, which is used by int 21h/ah=9. There is just one special character reserved as the end-of-string marker. For int 21h/ah=9 it is the character "$". So to store the string "Hello World", you define "Hello World$", where "$" means end of string. An example of displaying a string:
org 256
mov ah,9
mov dx,text_to_display
int 21h
int 20h
label text_to_display
db 'Hello World$'
This program will display "Hello World".

This method of marking the end of a string has a limitation: you can't display the character "$" itself. For example:
org 256
mov ah,9
mov dx,text_to_display
int 21h
int 20h
label text_to_display
db 'It costed 50$, maybe more$'
will of course display only "It costed 50". This case can be solved this way:
org 256
mov ah,9
mov dx,text1
int 21h
mov ah,2
mov dl,'$'
int 21h
mov ah,9
mov dx,text2
int 21h
int 20h
label text1
db 'It costed 50$'
label text2
db ', maybe more$'
The first part (first int 21h) will write "It costed 50", then int 21h/ah=2 will write "$", and the second int 21h/ah=9 will write ", maybe more". We won't care about this limitation anymore for now; this was just to make the explanation clearer.

A closer look at int 21h/ah=9: as you may have already realized, it displays every character (more exactly: for every byte, the character whose ASCII code is in that byte) starting at the address in dx, up to but not including the first "$" character at or after that address.
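Since output simply starts at whatever address is in dx, nothing forces that address to be the beginning of a defined string. A small sketch to illustrate this (the label name text and the offset 6 are chosen only for this example; the expression text + 6 is evaluated by the assembler, as explained in the NOTE about expressions above):

```asm
org 256
mov ah,9
mov dx,text + 6     ; skip the 6 characters of 'Hello '
int 21h             ; display from there up to '$'
int 20h             ; exit program
label text
db 'Hello World$'
```

This should write only "World", because the 6 bytes before the address in dx are never looked at.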

NOTE: Some ASCII codes in the range 0 to 31 have special meaning for int 21h/ah=9. These codes have characters assigned to them (smiling faces, diamonds etc.), but for the special ones int 21h/ah=9 doesn't display the character and does something else instead. For example, the character with ASCII code 7 will cause a short beep. Try this:
org 256
mov ah,9
mov dx,text
int 21h
int 20h
label text
db 'Beep',7,'$'
It should write "Beep" and then beep.

Other common values are 13 and 10. 13 causes the cursor to return to the first column of the current row. 10 causes the cursor to move one row down (if the bottom of the screen is reached, the screen is scrolled). So the combination of the two moves the cursor to the first column of the next row. These two should (but don't always) work in either order, but you should always put 13 first. Together these two characters are often called EOL (end of line). Try this example:
org 256
mov ah,9
mov dx,text
int 21h
int 20h
label text
db 'Line 1',13,10,'Line 2$'
it should write:
Line 1
Line 2
NOTE: ASCII code 13 is called CR (carriage return) and code 10 is called LF (line feed).

Another example on addresses (previous chapter), but this time with word registers. Use it to check whether you understood chapter 3:
org 256
mov ah,9
mov dx,[address_of_text]
int 21h
int 20h
text db 'Hello World$'
address_of_text dw text
Here we load the dx register with the contents of the address_of_text variable, which holds the value text, and as we know, text is a placeholder for the offset of the 'Hello World$' string. So the word-sized variable address_of_text holds the offset of that string, and loading dx with the contents of address_of_text loads it with the offset of the string we want to write.