CHAPTER 2 - First program


You may have asked why I am fooling around with creating text files when you want to learn assembly. But text files are just "arrays" of bytes. You didn't just learn how to create a text file, you learnt how to define a file containing any data you want. And this is what a runnable program is: a special "data" file, an array of numeric values called "machine code". You only have to know the meaning of these values :). Of course it is very hard to remember all the values and their meanings, and this is what an assembler is for. It translates programs from a human-readable language to machine code. So you only have to learn this human-readable language :)

machine code
array of numeric values that represents instructions for the processor (CPU)
For now we will deal with DOS .COM programs (occasionally called "memory images"; you will learn why later, once you get into things). These are the simplest executable (runnable) files under DOS and Windows.

So let's create our first .COM file, which won't do anything.
  org 256
  int 20h
Compile this to a .COM file and run it. Nothing should happen. Now let's look at what those two lines mean. (This will be funny...)
  org 256
I won't explain what this directive does now. Just put this line at the beginning of every .COM file! It doesn't define any data, and it does nothing you can notice now. We will get to this later.
  int 20h
This is an "instruction". An instruction is a command for the processor, stored in the created file as one or more bytes. When you run a .COM executable file, the processor walks through it, decodes the instructions from machine code and does what they tell it to do. The instruction int 20h says that this is the end of execution of the file. So the first instruction in this file tells the processor to stop execution, and the executable file does nothing, as you saw.

instruction
a single command for the processor
(By the way, int 20h is NOT an instruction which tells the processor to end execution of a .COM program. It instructs the processor to call a system procedure. The system procedure is chosen by the number following int, in this case the number 20h (it IS a sort of number), which means the procedure that ends a .COM program. int can be followed by another number, and then another system procedure will be called. But for now we can abstract from this, forget about it and take int 20h as the instruction to stop the program.)

So "machine code" is a set of "instructions". Distinguish between directives and instructions. A directive is a command for the compiler, telling it how to define data and what data to define. Instructions are defined data which encode what the processor will do when you execute the program. For example db 0,0 is a directive which defines two zero bytes, but those bytes act as an instruction if they are executed, because two zero bytes have a special meaning for the processor (don't worry about what that meaning is). org 256 is a directive, not an instruction, because it doesn't define any data. You will get the hang of this with practice.
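
To see that an instruction really is just defined data, here is a small sketch of the do-nothing program written with db only. It assumes the usual machine code of int 20h, which is the two bytes 0CDh and 20h, so it should compile to exactly the same file as the version with int 20h:
  org 256
  db 0CDh,20h
Compile both versions and compare the resulting files if you want to check this yourself.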

The instruction int 20h is simple; it doesn't need any arguments (= parameters, or values which change its effect). But what if an instruction DOES need arguments? For this purpose the processor has its own "variables" (variable is a general term for a space which stores some value). These variables are called "registers". The first registers we'll learn are al, ah, bl, bh, cl, ch, dl and dh, which are byte sized (they contain a value in the range 0 to 255).

register
"internal" processor's variable
(By the way, the value 20h in int 20h is, in fact, an instruction argument too, but we abstracted from this before. This is what I was talking about when I wrote that "it will be funny".)

Now how do we set the value of a register? There is an instruction which does this, for example:
  mov al,10
This instruction sets the value of the al register to 10. mov stands for "move". The destination of the "move" follows mov (separated by spaces); in this case it is the al register. Then comes the source of the move, separated by a comma (,); in this case it is the number 10. So this instruction "moves" the value 10 to register al. Another example of moving:
  mov al,bl
This copies the value in the bl register to the al register. It won't change the value in bl. The source of a mov always stays unchanged.

NOTE: You will often (always?) see people talking about the "mov instruction". But mov is not an instruction, and int is not an instruction either. mov al,bl or int 20h is an instruction, for example. mov or int is called an "instruction mnemonic". But accept this: everyone calls it an "instruction" and you probably will too after some time (and I probably will too, sorry :). The arguments of an instruction (the part of the instruction without the mnemonic, like al and 10 in mov al,10) are called "instruction operands" (or "instruction arguments" :)
instruction mnemonic (this term is not so important now)
instruction operand
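Here is a small sketch that puts several mov instructions together. The text after each semicolon is a comment, which the compiler ignores; the comments show what the registers should contain after each line. The program has no visible effect when you run it:
  org 256
  mov al,10     ; al = 10
  mov bl,al     ; bl = 10, al is still 10
  mov al,20     ; al = 20, bl is still 10
  int 20h       ; end of program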
Now let's get to using registers. We will use the int 21h instruction, which can do MANY things depending on the value in the ah register. We won't learn the meaning of all the values; for now we will talk only about value 2. If value 2 is in the ah register when int 21h is executed, then the character in dl (more exactly: the character whose ASCII code is in dl) is written to the screen (console).

NOTE: if you are using Windows and a file manager (like Total Commander), you will see a window appear for a very short time and then disappear. Our character is displayed in this window, but you probably won't be able to notice it. You must run a shell (cmd on XP, command on older Windows versions) and run your program from it. Anyway, if you can't handle this, forget about assembly for a while and learn to use your operating system. And then, don't forget to return to assembly!

Okay, so let's look at a program which writes the character "a":
  org 256
  mov ah,2
  mov dl,'a'
  int 21h
  int 20h
So, the analysis:
  mov ah,2
This sets the value of the "ah" register to 2; this should be clear.
  mov dl,'a'
This moves the "a" character into the dl register. (In fact, there is nothing like "character a" in assembly. You may have noticed I wrote that registers can contain numeric values. Nothing about characters. The way this works is that the compiler translates a character enclosed in apostrophes into its numeric (ASCII) code, which is recognized by int 21h as the code for this character. In assembly, character "a" means the ASCII code for character "a".)
  int 21h
In this case, when "ah" contains value 2, this writes the character in "dl".
  int 20h
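And we must not forget to stop execution. Otherwise the processor would go on executing whatever bytes happen to follow our program in memory as if they were instructions, and the program would most probably crash.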

NOTE: in assembly, a character enclosed in apostrophes is the same as the ASCII code of this character
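
To illustrate this, here is a sketch of the same program with the ASCII code written as a plain number. The ASCII code of "a" is 97, so this version should also write "a" (the text after the semicolon is only a comment for the reader):
  org 256
  mov ah,2
  mov dl,97     ; 97 is the ASCII code of 'a', same as mov dl,'a'
  int 21h
  int 20h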

So writing multiple characters ("ab") looks like this:
  org 256
  mov ah,2
  mov dl,'a'
  int 21h
  mov dl,'b'
  int 21h
  int 20h
We don't have to set ah to 2 again for the second int 21h, because the value 2 remains in ah from the previous setting. The value in dl remains too, so the code
  org 256
  mov ah,2
  mov dl,'a'
  int 21h
  int 21h
  int 21h
  int 20h
will write "aaa".
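
As one more sketch, a character can be kept in another register and copied into dl whenever it is needed. This assumes that int 21h with value 2 in ah leaves the bl register unchanged (an assumption, not something explained in this chapter); if it does, the following should write "aba". Text after a semicolon is a comment:
  org 256
  mov ah,2
  mov bl,'a'    ; keep 'a' in bl (assumed to survive int 21h)
  mov dl,bl     ; dl = 'a'
  int 21h
  mov dl,'b'    ; dl = 'b', bl still holds 'a'
  int 21h
  mov dl,bl     ; dl = 'a' again
  int 21h
  int 20h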