CHAPTER 4 - Endian encodings & word registers

4.1. Endian encodings
You should already have a fairly exact idea of byte variables. You know they are 8 bits in size (not so important now) and that they can hold a numeric value from 0 to 255. About word variables you know that they are 16 bits in size and hold a value from 0 to 65535.

Whether you noticed it or not, a word is the same size as two bytes. Now let's think about how to store a value in two bytes. Each byte can hold a value from 0 to 255, so together they give 256*256 = 65536 combinations. But how is the value stored in these bytes? Let's say one of the bytes (byte #1) contains 0. The other byte (byte #2) can then hold a value from 0 to 255, so the word holds values 0 to 255. When byte #1 contains 1, we get another 256 numbers, 256 to 511. When byte #1 contains 2, we get another 256 numbers, 512 to 767, and so on. In total that is 256*256 = 65536 values, as I said.

It is like decimal numbers: every digit has a value from 0 to 9, and the "true" value of a digit depends on its position. The last digit holds 0 to 9, the digit before it holds 10*(0 to 9), the one before that 100*(0 to 9), etc. It is the same in words: one of the bytes holds a value 0 to 255, the other holds 256*(0 to 255). The one which holds 0..255 is called the "low order byte", the other (which holds 256*(0..255)) is called the "high order byte".
terms: low order byte, high order byte
Examples (word value = high order byte : low order byte)

```0     = 0 : 0
1     = 0 : 1
255   = 0 : 255
256   = 1 : 0
257   = 1 : 1
511   = 1 : 255
512   = 2 : 0
513   = 2 : 1       (513 / 256 = 2, 513 mod 256 = 1)
65535 = 255 : 255   (65535 / 256 = 255, 65535 mod 256 = 255)
```

One last problem remains: the order of these bytes (i.e. which comes first, the low order byte or the high order byte?). This differs between computers. On IBM PCs (and compatibles) the low order byte is first and the high order byte follows. For example:
```label variable
dw 0
```
then `byte [variable]` is the low order byte and `byte [variable + 1]` is the high order byte. (The addition of 1 to the offset in `variable` is done by the compiler: the value of `variable` is constant, so `variable + 1` is constant as well. It means the next byte after the offset in `variable`; I think this is clear enough not to need any more explanation.)
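
To see the byte order for yourself, here is a small sketch using `int 21h/ah=2` from the earlier chapters. The value 66*256 + 65 is my own choice for illustration: 65 and 66 are the ASCII codes of "A" and "B", so the low order byte prints as "A" and the high order byte as "B". I set `ah` again before the second call only to be on the safe side.
```org 256
mov ah,2
mov dl,byte [variable]        ; low order byte is stored first: 65 = 'A'
int 21h
mov ah,2
mov dl,byte [variable + 1]    ; high order byte is stored second: 66 = 'B'
int 21h
int 20h
label variable
dw 66*256 + 65
```
On a little endian machine like the PC this should write "AB".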

NOTE: When the low order byte is first, it is called "little endian encoding"; when the high order byte is first, it is called "big endian encoding". These terms are not important, though, especially for a beginner asm coder.

4.2. Word registers
Besides byte registers (like `al`, `ah`, `dl`...) the processor has some word registers too, of course. As you know, a word is a combination of two bytes, and the same holds for registers: word registers are combinations of byte registers. The first word registers we'll learn are `ax`, `bx`, `cx` and `dx`. `ax` is the combination of `al` and `ah`: `al` is the low order byte, `ah` is the high order byte. The same goes for bx = bh:bl, cx = ch:cl, dx = dh:dl. If you wanted to "emulate" a register `ex` in memory, it would be:
```label ex word
el db 0
eh db 0
```
`el` would be low order byte, so it is first.
terms: word register
word registers: ax, bx, cx, dx
NOTE: the letters a, b, c, d stand for "accumulator", "base", "counter" and "data"; they have nothing to do with alphabetical order. The real order of these registers is ax, cx, dx, bx, but that is not important until you want to generate or change machine code yourself.

Now, if you want to set value in register `ax` to 52 you use
```mov ax,52
```
but you also could use
```mov al,52
mov ah,0
```
or setting `dx` to 12345
```mov dx,12345
```
but it could be (no reason to do it this way in real coding, this is just to demonstrate word to byte:byte relations)
```mov dh,48
mov dl,57
```
because 48 is equal to 12345 / 256 (rounded down) and 57 is 12345 modulo 256 (modulo is the remainder from division).
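
The same relation can be checked in the other direction. The following sketch is my own illustration, not something you would write in real code: it loads `bx` with 12345 and prints its two byte halves as characters with `int 21h/ah=2`. It relies on the coincidence that 48 and 57 are the ASCII codes of "0" and "9", and it assumes that `int 21h/ah=2` leaves `bx` unchanged.
```org 256
mov bx,12345
mov ah,2
mov dl,bh        ; high order byte: 12345 / 256 = 48, ASCII code of '0'
int 21h
mov ah,2
mov dl,bl        ; low order byte: 12345 mod 256 = 57, ASCII code of '9'
int 21h
int 20h
```
Under those assumptions it should write "09".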

NOTE: You know that an instruction operand can be a number (numeric constant), like "0", "256", "12345" etc. But every assembler I know allows you to put an expression as an operand. During compilation the expression is evaluated and "replaced" by its result. So `mov dx,(1 + 5)` is the same as `mov dx,6`. Or better, the code above can be written as
```mov dh,12345 / 256
mov dl,12345 mod 256
```
(`/` is the operator for division, `mod` is the operator which returns the remainder from division (modulo). You don't have to know these operators now, but you should already know something about expressions.)

The processor also has other word registers: `sp`, `bp`, `si`, `di`. But you can't directly access the byte parts of these registers; you must access the whole word. This is a limitation of the processor and there is nothing to be done about it. For example, if you want to set the high order byte of `si` to 17, you can do it like this:
```mov ax,si
mov ah,17
mov si,ax
```
So first you copy the value of `si` to `ax`. The high order byte of `ax` can be directly accessed (it is the `ah` register), so you set it; the low order byte stays as it was. Then you copy the value back from `ax` to `si`. The high order byte of `si` is now 17 and the low order byte is unchanged.
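
The same trick works for the low order byte, and with any of the registers `ax`, `bx`, `cx`, `dx` as the helper. A minimal sketch (the choice of `di`, `bx` and the value 5 is only for illustration); keep in mind that whatever was in the helper register before is overwritten:
```mov bx,di      ; copy whole word from di to bx (old value of bx is lost)
mov bl,5       ; change only the low order byte
mov di,bx      ; copy whole word back, high order byte of di is unchanged
```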

NOTE: register `sp` always has a special function, and `bp` usually has a special function (in code generated by most (all?) non-assembly compilers). Registers `si` and `di` can be used however you want. This means you shouldn't change `sp` and `bp` unless you know what you are doing.

4.3. String output using int 21h/ah=9
This should be part of chapter 3 about addresses, but you need to know the `dx` register, which is explained here.

Here we will talk about another use of `int 21h`. You should already know that when `ah` contains 2, `int 21h` writes the character in `dl`. But if we want to display a longer text this way, we must set `dl` for every character, which is a bad method. Wouldn't it be better to store the string we want to display somewhere in the file (like we did in chapter 1) and then display it from there?

For this we can use `int 21h` with value 9 in `ah` and address of string in `dx` register. Something like:
```mov ah,9
mov dx,address_of_string
int 21h
```
But another problem arises: how to determine the length of the string, i.e. the number of characters to display from the given address. There are several methods for this; we will talk about the simplest one, used by int 21h/ah=9. A special character is reserved as the end-of-string marker. For int 21h/ah=9 it is the character "$". So to store the string "Hello World", you define "Hello World$", where "$" means end of string. Example of displaying a string:
```org 256
mov ah,9
mov dx,text_to_display
int 21h
int 20h
label text_to_display
db 'Hello World$'
```
This program will display "Hello World".

This method of marking the end of a string has a limitation: you can't display the character "$". For example:
```org 256
mov ah,9
mov dx,text_to_display
int 21h
int 20h
label text_to_display
db 'It costed 50$, maybe more$'
```
will of course display only "It costed 50". This case can be solved this way:
```org 256
mov ah,9
mov dx,text1
int 21h
mov ah,2
mov dl,'$'
int 21h
mov ah,9
mov dx,text2
int 21h
int 20h
label text1
db 'It costed 50$'
label text2
db ', maybe more$'
```
The first part (first `int 21h`) will write "It costed 50", then `int 21h/ah=2` will write "$", and the second `int 21h/ah=9` will write ", maybe more". We won't care about this limitation anymore for now; this was just to improve the explanation.

A closer look at `int 21h/ah=9`. As you may have already realized, it displays every character (more exactly: every character whose ASCII code is stored in a byte) from the address in `dx` up to the first character "$" after that address.
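
Because the output simply starts at the address in `dx`, nothing forces that address to be the beginning of the string. A small sketch, assuming the same "Hello World" string as before: adding 6 to the label skips the six characters of "Hello " (the addition is again done by the compiler).
```org 256
mov ah,9
mov dx,text_to_display + 6    ; offset of 'W', the 7th character
int 21h
int 20h
label text_to_display
db 'Hello World$'
```
This should display just "World".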

NOTE: Some of the ASCII codes 0 to 31 have a special meaning for `int 21h/ah=9`. These codes have characters assigned to them (smiling faces, diamonds etc.), but int 21h/ah=9 doesn't display them and does something else instead. For example, the character with ASCII code 7 will cause it to beep for a short while. Try this:
```org 256
mov ah,9
mov dx,text
int 21h
int 20h
label text
db 'Beep',7,'$'
```
It should write "Beep" and then beep.

Other common values are 13 and 10. 13 causes the cursor to return to the first column of the current row. 10 causes the cursor to move one row down (if the bottom of the screen is reached, the screen is scrolled). So the combination of the two moves the cursor to the first column of the next row. These two should (but don't always) work in either order, but you should always put 13 first. Together these two characters are often called EOL (end of line). Try this example:
```org 256
mov ah,9
mov dx,text
int 21h
int 20h
label text
db 'Line 1',13,10,'Line 2$'
```
it should write:
```Line 1
Line 2
```
NOTE: ASCII code 13 is called CR (carriage return) and code 10 is called LF (line feed).
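
To see what CR does on its own, you can leave out the LF. A small sketch (the "XX" text is just my example): after "Line 1" is written, code 13 moves the cursor back to the first column of the same row, so the following characters are written over the beginning of the line.
```org 256
mov ah,9
mov dx,text
int 21h
int 20h
label text
db 'Line 1',13,'XX$'
```
On a normal DOS text screen this should leave "XXne 1" on the line.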

Another example on addresses (previous chapter), but with word registers. Check for yourself whether you understood chapter 3:
```org 256
mov ah,9
mov dx,[address_of_text]
int 21h
int 20h
text db 'Hello World$'
address_of_text dw text
```
Here we load the `dx` register with the contents of the `address_of_text` variable, which holds the value `text`, and as we know, `text` is a placeholder for the offset of the 'Hello World$' string. So the word-sized variable `address_of_text` holds the offset of that string, and loading `dx` with the contents of `address_of_text` loads it with the offset of the string we want to write. I hope you got it.
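
A variation on the same idea, as a sketch with names of my own (`text1`, `text2`): the code stays the same, and which string gets written depends only on the value stored in `address_of_text`.
```org 256
mov ah,9
mov dx,[address_of_text]
int 21h
int 20h
text1 db 'First$'
text2 db 'Second$'
address_of_text dw text2
```
This should write "Second"; change the last line to `dw text1` and it should write "First".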