I have been learning Assembly and I have a question. The textbook presents the following example:
Assume that the printer data port is memory-mapped to address 0FFE0h
and the printer status port is bit zero of memory-mapped port 0FFE2h.
The following code waits until the printer is ready to accept a byte
of data and then it writes the byte in the L.O. byte of ax to the
printer port:
0000: mov bx, [0FFE2h]
0003: and bx, 1
0006: cmp bx, 0
0009: je 0000
000C: mov [0FFE0h], ax
...
The first instruction fetches the data at the status input port. The
second instruction logically ands this value with one to clear bits
one through fifteen and set bit zero to the current status of the
printer port. Note that this produces the value zero in bx if the printer is busy, and the value one if the printer is ready to accept additional data. The third instruction checks bx to
see if it contains zero (i.e., the printer is busy). If the printer is
busy, this program jumps back to location zero and repeats this
process over and over again until the printer status bit is one.
Why must we perform the second instruction, and bx, 1? Can't we just go straight to cmp bx, 0?
Also, can you please clarify or reword "The second instruction logically ands this value with one to clear bits one through fifteen and set bit zero to the current status of the printer port"? I don't understand what it means right now because English isn't my first language.
Thank you for your help.
The status byte may contain other flags in its other bits. You're only interested in bit 0 (the least significant bit) in this case, so you ignore the rest of the bits by ANDing the value with 1 and then testing the result against 0.
Let's say that memory address 0xFFE2 contains a byte with 8 bits, for instance something like this: 00010100.
Only the last bit contains information about printer status. All other bits don't matter for this purpose. How would you extract the last bit from this byte?
The solution given by the book (and the one used universally) is to zero out all the bits that don't matter with the bitwise AND operator:
    00010100 # content of the memory cell (0x14)
and 00000001 # 0x1
------------
    00000000
...or...
    00010101 # content of the memory cell (0x15)
and 00000001 # 0x1
------------
    00000001
You see where this is going, don't you? By comparing the result of the operation with 0, you get a definite answer as to whether the last bit was 0, and hence whether the printer is ready. Thus, in this case, the AND operator is just a way of extracting a single bit from a byte, nothing more.
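If it helps to see the same trick outside of assembly, here is a minimal C sketch of the book's polling loop. The two port addresses come from the example itself; everything else is illustrative, not code for a real machine:

#include <stdint.h>

#define PRINTER_STATUS ((volatile uint16_t *)0xFFE2)  /* status port, bit 0 = ready */
#define PRINTER_DATA   ((volatile uint16_t *)0xFFE0)  /* data port */

void send_byte(uint16_t value)
{
    /* "& 1" plays the role of "and bx, 1": it zeroes bits 1..15 so
       that only the ready bit decides the comparison against 0. */
    while ((*PRINTER_STATUS & 1) == 0)
        ;                       /* printer busy: keep polling */
    *PRINTER_DATA = value;      /* printer ready: write the byte */
}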
Because, as the problem states, "the printer status port is bit zero". If you don't clear away the other bits with that AND instruction, those bits could cause you not to take the jump even when the bit of interest is zero.
Since you're a student, you probably need more than just a quick answer:
BX is a 16-bit register, and only bit 0 is of interest.
As an example, suppose bx holds the value 10101111 00101000 (AF28h).
cmp bx, 0 would not report equality, even though bit 0 is zero, because cmp compares the value of the whole register, not just bit 0.
In other words, cmp bx, 0 finds that AF28h <> 0.
The line and bx, 1 changes the value of bx to 0 if bit 0 is 0, or to 1 if bit 0 is 1.
In this example, and bx, 1 sets bx = 0 because bit 0 is zero.
I've just started learning Assembly and now I'm stuck...
%include 'io.inc'
global main
section .text
main:
; read a
mov eax, str_a
call io_writestr
call io_readint
mov [nb_array], eax
call io_writeln
; read b
mov eax, str_b
call io_writestr
call io_readint
mov [nb_array + 2], eax
call io_writeln
mov eax, [nb_array]
call io_writeint
call io_writeln
mov eax, [nb_array + 2]
call io_writeint
section .data
nb_array dw 0, 0
str_a db 'a = ', 0
str_b db 'b = ', 0
So, I have a two-element array, and when I try to print the first element, it doesn't print the right value, yet when I print the second element, it prints the right value. Could someone help me understand why this is happening?
The best answer is probably "because there are no arrays in Assembly". You have computer memory available, which is addressable by bytes, and you have several instructions to manipulate those bytes, either one at a time or in groups forming a "word" (two bytes), a "dword" (four bytes), or even more (depending on the platform and the extended instructions you use).
Using memory in any "structured" way in Assembly is up to you: you write that piece of code yourself, and it takes some practice to be accurate enough and to spot all the bugs in a debugger. Just running the code and seeing correct output doesn't mean much: here, the second value reads back correctly, yet its store has already destroyed the start of the "a = " string. With every new piece of code, you should rather walk instruction by instruction in a debugger and verify that everything works as expected.
Bugs in code like this were so common that people preferred the much worse machine code produced by a C compiler: C structs and arrays were much easier to use, with no need to guard the by-size multiplication of every index or to allocate the correct amount of memory for every element.
What you see as the result is exactly what you did to the memory and to particular bytes. The fix depends on whether you want it to work for 16-bit or 32-bit input numbers: either fix the instructions storing/reading the array to work with 16 bits only, or fix the array allocation and offsets to accommodate two 32-bit values.
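To see the overlap concretely, here is a small C sketch of what your two 32-bit stores do to a 4-byte array (the values 7 and 9 are hypothetical stand-ins for the numbers read in):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint8_t bytes[8] = {0};        /* nb_array dw 0, 0 gives only 4 bytes */
    uint32_t a = 7, b = 9, first;
    memcpy(bytes + 0, &a, 4);      /* like mov [nb_array], eax */
    memcpy(bytes + 2, &b, 4);      /* like mov [nb_array + 2], eax: overlaps! */
    memcpy(&first, bytes + 0, 4);  /* like mov eax, [nb_array] */
    printf("%u\n", first);         /* prints 589831 (0x00090007), not 7 */
    return 0;
}

The second store overwrites the upper two bytes of the first value, and in your real layout it also spills two bytes past the array into str_a.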
Here's a small snippet of assembly code (TASM) where I simply try to increment the value at the current index of the array. The idea is that the "freq" array will store a number (DWord size) that represents how many times that ASCII character was seen in the file. To keep the code short, "b" stores the current byte being read.
Declared in data segment
freq DD 256 DUP (0)
b DB ?
___________
Assume b contains current byte
mov bl, b
sub bh, bh
add bx, bx
inc freq[bx]
I receive this error at compilation time at the line containing "inc freq[bx]": ERROR Argument to operation or instruction has illegal size.
Any insight is greatly appreciated.
There is no inc that can increment a dword in 16-bit mode. You will have to synthesize it from add/adc, such as:
add freq[bx], 1
adc freq[bx + 2], 0
You might need to add a size override such as word ptr, or change your array definition to freq DW 512 DUP (0).
Also note that you have to scale the index by 4, not 2.
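In C terms, the bookkeeping you are doing by hand looks like this (a hypothetical equivalent, not your TASM code; the compiler scales the index by sizeof(uint32_t) == 4 for you):

#include <stdint.h>

uint32_t freq[256];           /* like: freq DD 256 DUP (0) */

void count_byte(uint8_t b)    /* b is the current byte read */
{
    freq[b]++;                /* index scaled by 4 behind the scenes */
}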
I recently got arrays working in masm32, but I'm running into a very confusing snag. I have a procedure (AddValue) that accepts one argument and adds that argument to an element in an array called bfmem. Which element to affect is determined by a variable called index. However, index appears to be changing its value where I would not expect it to.
If index is greater than 0, the program behaves normally. However, if index equals 0, its value will be changed to whatever value was passed to the procedure. This is utterly baffling to me, especially as this only happens when index is set to zero. I don't know much MASM so forgive me if this is a really simple problem.
.386
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\kernel32.lib
.data
bfmem dd 0 dup(30000)
index dd 0
_output db 10 dup(0)
.code
AddValue proc val ; adds val to bfmem[index]
invoke dwtoa, index, addr _output
invoke StdOut, addr _output ;prints '0' (as expected)
mov ecx, index ;move index to a register for arithmetic
mov eax, val ;move val to a register for arithmetic
add [ecx * 4 + bfmem], eax ;multiply index by 4 for dword
;then add to array to get the element's
;memory address, and add value
invoke dwtoa, index, addr _output
invoke StdOut, addr _output ;prints '72' (not as expected)
ret
AddValue endp
main:
mov index, 0
invoke AddValue, 72
invoke StdIn, addr _output, 1
invoke ExitProcess, 0
end main
The only thing I can think of is that the assembler is doing some kind of arithmetic optimization (noticing ecx is zero and simplifying the [ecx * 4 + bfmem] expression in some way that changes the output). If so, how can I fix this?
Any help would be appreciated.
The problem is that your declaration:
bfmem dd 0 dup(30000)
Says to allocate 0 dwords initialized with the value 30000. So when index is 0, you are overwriting the value of index (the addresses of index and bfmem coincide). With larger indexes you don't see the problem because you're overwriting other memory, like your output buffer. If you want to test to see that this is what's happening, try this:
bfmem dd 0 dup(30000)
index dd 0
_messg db "Here is an output message", 13, 10, 0
Run your program with an index value of 1, 2, 3 and then display the message (_messg) using invoke StdOut.... You'll see that it overwrites parts of the message.
I assume you meant:
bfmem dd 30000 dup(0)
Which is 30000 dwords initialized to 0.
I'm trying to learn emulation programming. I've done a CHIP-8 emulator (under 40 instructions) and lived to tell about it. I'm now hoping to do something a bit more complex, like an SNES. The problem I'm encountering is the sheer number of CPU instructions. Looking through the wiki.SuperFamicom.org 65c816 instruction listing, it looks like a pain in the rear. And I've seen notes here and there on various internet pages that the CPU is the easiest part of an emulator to implement.
Under the assumption that it was so hard because I was doing it wrong, I looked around and found a simple implementation: SNES Emulator in 15 minutes, which is about 900 lines of code. Easy enough to work through.
So then, from the SNES Emulator in 15 minutes source, I found where the CPU instructions are. It looks a lot simpler than what I was thinking. I don't really understand it, but it's a few lines of code as opposed to a large mass of code. The first thing I notice is that the instructions only have one implementation each. If you look at the table on SuperFamicom, you see that it has
ADC #const
ADC (_dp_),X
ADC (_dp_,X)
ADC addr
ADC long
...
And the emulator source for (I think) all of those is:
// Note: op 0x100 means "NMI", 0x101 means "Reset", 0x102 means "IRQ". They are implemented in terms of "BRK".
// User is responsible for ensuring that WB() will not store into memory while Reset is being processed.
unsigned addr=0, d=0, t=0xFF, c=0, sb=0, pbits = op<0x100 ? 0x30 : 0x20;
// Define the opcode decoding matrix, which decides which micro-operations constitute
// any particular opcode. (Note: The PLA of 6502 works on a slightly different principle.)
const unsigned o8 = op / 32, o8m = 1u << (op%32);
// Fetch op'th item from a bitstring encoded in a data-specific variant of base64,
// where each character transmits 8 bits of information rather than 6.
// This peculiar encoding was chosen to reduce the source code size.
// Enum temporaries are used in order to ensure compile-time evaluation.
#define t(w8,w7,w6,w5,w4,w3,w2,w1,w0) if( \
(o8<1?w0##u : o8<2?w1##u : o8<3?w2##u : o8<4?w3##u : \
o8<5?w4##u : o8<6?w5##u : o8<7?w6##u : o8<8?w7##u : w8##u) & o8m)
t(0,0xAAAAAAAA,0x00000000,0x00000000,0x00000000,0xAAAAA2AA,0x00000000,0x00000000,0x00000000) { c = t; t += A + P.C; P.V = (c^t) & (A^t) & 0x80; P.C = t & 0x100; }
In short, my general question: how do you condense the phenomenal cosmic power of CPU instructions into an itty bitty piece of code?
Questions specific to the SNES emulator in 15 minutes source (portion posted above):
How does t(0, 0xAAAAAAAA, 0x00000000, ....) parse the instruction? I see the if statement, but I don't know where the numbers for any of the arguments come from, or what they mean to the overall code.
Why o8 = op / 32 and o8m = 1u << (op%32)?
The opcodes for ADC include ADC #const, which has a 2-byte operand, and ADC addr, which has a 3-byte operand. And the code t(0, 0xAAAAAAAA, ...) implements both cases?
While I'm asking:
What do the dp, _dp_ and sr that appear in ADC dp, ADC (_dp_) and ADC sr,S mean?
What is the difference between ADC (_dp_,X) and ADC dp,X? (Probably redundant given the question above.)
I can't answer all of this, but dp stands for Direct Page, meaning that the instruction takes a single-byte operand which is a memory address within the Direct Page. Direct Page addressing is an extension of the Zero Page addressing mode of the 6502, where the single-byte addresses referred to memory locations $00 through $FF. The 16-bit derivatives of the 6502 have a configuration register which basically relocates the Zero Page to an alternate location.
In the wiki page you linked to, some of the dp in the table have underscores on them, and the others are in italics. I assume that they are all intended to be italic, and the wiki markup isn't working. A quick check of the Edit link supports this assumption (in the wiki source, they all have underscores). So don't read anything into that.
In 6502 assembly and derivatives of it, ADC dp,X means... let's take a concrete example instead... ADC $10,X means to add $10 to the value in register X to obtain an address, then load a value from that address and add it to the accumulator. ADC ($10,X) adds an extra level of indirection: add $10 to X to obtain an address, load a value from that address, interpret the loaded value as another address, and load the value from that address and add it to the accumulator. Parenthesized operands always add a level of indirection.
Note that the available modes include (dp,X) and (dp),Y and the placement of the parentheses relative to the comma and register is significant. With (dp),Y the value of Y is added to the first loaded value to get the address to use in the second load.
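As a sketch of how an emulator might fetch the operand for these modes, here is some C using plain 6502 zero-page semantics (mem, X and Y are assumed emulator state; on the 65c816, the Direct Page register just relocates the base on top of this):

#include <stdint.h>

static uint8_t mem[65536];    /* assumed emulated address space */
static uint8_t X, Y;          /* assumed index registers */

uint8_t operand_dp_x(uint8_t dp)    /* ADC dp,X : one load, no indirection */
{
    return mem[(uint8_t)(dp + X)];
}

uint8_t operand_ind_x(uint8_t dp)   /* ADC (dp,X) : index first, then indirect */
{
    uint8_t zp = (uint8_t)(dp + X);                        /* pointer lives in page zero */
    uint16_t addr = mem[zp] | (mem[(uint8_t)(zp + 1)] << 8);
    return mem[addr];
}

uint8_t operand_ind_y(uint8_t dp)   /* ADC (dp),Y : indirect first, then index */
{
    uint16_t base = mem[dp] | (mem[(uint8_t)(dp + 1)] << 8);
    return mem[(uint16_t)(base + Y)];
}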
As for that emulator... code golf doesn't lead to enhanced readability! I don't think the portion you've posted is actually understandable by itself, and I don't feel like tracking down and reading the rest of it. But the key concept in the t macro is bitstring. Its arguments are a series of 9 bitmasks, each 32 bits long, for a total of 288 bits. Every possible opcode (256 of them), plus the 3 pseudo-opcodes mentioned in the first comment, is therefore represented by a single bit in this 288-bit-long bitstring, with 29 bits left over.
That explains the construction of o8 and o8m. The 8-bit value is split into a 3-bit portion (to select an argument from the 9 arguments supplied to t) and a 5-bit portion (to select a single bit from the selected argument). The big ?: chain does the first selection, and the combination of & and 1 << ... does the second selection.
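As a standalone sketch, the selection amounts to this (the words array holds the nine arguments of the ADC/SBC line quoted above, listed last-argument-first so that words[o8] matches the ?: chain):

#include <stdio.h>

static const unsigned words[9] = {
    0x00000000, 0x00000000, 0x00000000, 0xAAAAA2AA,   /* opcodes $00-$7F, 32 per word */
    0x00000000, 0x00000000, 0x00000000, 0xAAAAAAAA,   /* opcodes $80-$FF */
    0x00000000                                        /* pseudo-ops $100 and up */
};

int main(void)
{
    unsigned op = 0x69;                /* ADC #imm on the 6502 */
    unsigned o8  = op / 32;            /* which 32-bit word */
    unsigned o8m = 1u << (op % 32);    /* which bit within that word */
    if (words[o8] & o8m)
        printf("opcode $%02X triggers the ADC/SBC micro-op\n", op);
    return 0;
}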
And then, oh look we have a variable called t too. It's not related to the macro. Giving them the same name was just cruel.
Maybe I can figure out what that bitstring is doing. When the opcode is a low number, o8 (the high bits) will be 0, so the ?: chain will use w0, which is the last argument to the macro. As the opcode increases, the selected argument moves leftward through the argument list to w1, then w2... and the o8m selector likewise starts at the right and moves left (& (1<<0) is the rightmost bit, & (1<<1) is the next one, etc.) and the if condition will be true when the selected bit is 1. Values are:
0, # opcodes $100 and up
0xAAAAAAAA, # opcodes $E0 to $FF
0x00000000, # opcodes $C0 to $DF
0x00000000, # opcodes $A0 to $BF
0x00000000, # opcodes $80 to $9F
0xAAAAA2AA, # opcodes $60 to $7F
0x00000000, # opcodes $40 to $5F
0x00000000, # opcodes $20 to $3F
0x00000000 # opcodes $00 to $1F
or in binary
0, # opcodes $100 and up
0b10101010101010101010101010101010, # opcodes $E0 to $FF
0b00000000000000000000000000000000, # opcodes $C0 to $DF
0b00000000000000000000000000000000, # opcodes $A0 to $BF
0b00000000000000000000000000000000, # opcodes $80 to $9F
0b10101010101010101010001010101010, # opcodes $60 to $7F
0b00000000000000000000000000000000, # opcodes $40 to $5F
0b00000000000000000000000000000000, # opcodes $20 to $3F
0b00000000000000000000000000000000 # opcodes $00 to $1F
Reading each line from right to left, the 1's are in positions corresponding to these opcodes: $61 $63 $65 $67 $69 $6D $6F $71 $73 $75 $77 $79 $7B $7D $7F $E1 $E3 $E5 $E7 $E9 $EB $ED $EF $F1 $F3 $F5 $F7 $F9 $FB $FD $FF
Hmm... that sort of resembles the list of ADC and SBC opcodes, but some of them are wrong.
Oh (I finally gave up and looked at some more of the emulator code) that's a NES emulator, not a SNES emulator, so it only has 6502 opcodes.
In x86 assembly, the overflow flag is set when an add or sub operation on a signed integer overflows, and the carry flag is set when an operation on an unsigned integer overflows.
However, when it comes to the inc and dec instructions, the situation seems to be somewhat different. According to this website, the inc instruction does not affect the carry flag at all.
But I can't find any information about how inc and dec affect the overflow flag, if at all.
Do inc or dec set the overflow flag when an integer overflow occurs? And is this behavior the same for both signed and unsigned integers?
============================= EDIT =============================
Okay, so essentially the consensus here is that INC and DEC should behave the same as ADD and SUB, in terms of setting flags, with the exception of the carry flag. This is also what it says in the Intel manual.
The problem is I can't actually reproduce this behavior in practice, when it comes to unsigned integers.
Consider the following assembly code (using GCC inline assembly to make it easier to print out results.)
int8_t ovf = 0;
__asm__
(
"movb $-128, %%bh;"
"decb %%bh;"
"seto %b0;"
: "=g"(ovf)
:
: "%bh"
);
printf("Overflow flag: %d\n", ovf);
Here we decrement a signed 8-bit value of -128. Since -128 is the smallest possible value, an overflow is inevitable. As expected, this prints out: Overflow flag: 1
But when we do the same with an unsigned value, the behavior isn't as I expect:
int8_t ovf = 0;
__asm__
(
"movb $255, %%bh;"
"incb %%bh;"
"seto %b0;"
: "=g"(ovf)
:
: "%bh"
);
printf("Overflow flag: %d\n", ovf);
Here I increment an unsigned 8-bit value of 255. Since 255 is the largest possible value, an overflow is inevitable. However, this prints out: Overflow flag: 0.
Huh? Why didn't it set the overflow flag in this case?
The overflow flag is set when an operation produces a result with the wrong sign, viewed as signed arithmetic. Your code is very close. I was able to set the OF flag with the following (VC++) code:
char ovf = 0;
_asm {
mov bh, 127
inc bh
seto ovf
}
cout << "ovf: " << int(ovf) << endl;
When BH is incremented the MSB changes from a 0 to a 1, causing the OF to be set.
This also sets the OF:
char ovf = 0;
_asm {
mov bh, 128
dec bh
seto ovf
}
cout << "ovf: " << int(ovf) << endl;
Keep in mind that the processor does not distinguish between signed and unsigned numbers. When you use 2's complement arithmetic, you can have one set of instructions that handle both. If you want to test for unsigned overflow, you need to use the carry flag. Since INC/DEC don't affect the carry flag, you need to use ADD/SUB for that case.
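For example, rewriting your test with add in place of inc and reading CF instead of OF gives the unsigned answer you were after (a sketch in the same GCC inline-asm style as your snippet):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t carry = 0;
    __asm__
    (
        "movb $255, %%bh;"
        "addb $1, %%bh;"   /* unlike incb, addb updates CF */
        "setc %0;"
        : "=r"(carry)
        :
        : "%bh"
    );
    printf("Carry flag: %d\n", carry);   /* prints 1: unsigned overflow */
    return 0;
}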
Intel® 64 and IA-32 Architectures Software Developer's Manuals
Look at the appropriate manual Instruction Set Reference, A-M. Every instruction is precisely documented.
Here is the INC section on affected flags:
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.
Try changing your test to pass in the number rather than hard-coding it, then have a loop that tries all 256 numbers to find the one, if any, that affects the flag. Or have the asm perform the loop and exit when it hits the flag and/or when it wraps around to the number it started with (start with something other than 0x00, 0x7F, 0x80, or 0xFF).
EDIT
.globl inc
inc:
    mov $33, %eax      # arbitrary starting value
top:
    inc %al
    jo done            # leave the loop when INC sets OF
    jmp top
done:
    ret

.globl dec
dec:
    mov $33, %eax      # arbitrary starting value
topx:
    dec %al
    jo donex           # leave the loop when DEC sets OF
    jmp topx
donex:
    ret
inc overflows when it goes from 0x7F to 0x80, and dec overflows when it goes from 0x80 to 0x7F. I suspect the problem is in the way you are using the inline assembler.
As many of the other answers have pointed out, INC and DEC do not affect the CF, whereas ADD and SUB do.
What has not been said yet, however, is that this might make a performance difference. Not that you'd usually be bothered by that unless you are trying to optimise the hell out of a routine, but essentially not setting the CF means that INC/DEC only write to part of the flags register, which can cause a partial flag register stall, see Intel 64 and IA-32 Architectures Optimization Reference Manual or Agner Fog's optimisation manuals.
Except for the carry flag, inc sets the flags the same way as an add of 1 would.
The fact that inc does not affect the carry flag is very important.
http://oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_6/CH06-2.html#HEADING2-117
The CPU/ALU is only capable of handling unsigned binary numbers; it then uses OF, CF, AF, SF, ZF, etc., to allow you to decide whether to treat the result as a signed number (OF), an unsigned number (CF) or a BCD number (AF).
About your problem, remember to consider the binary numbers themselves as unsigned.
Also, overflow and the OF involve three numbers: the first input number, a second number to use in the arithmetic, and the result number.
Overflow is activated only if the first and second numbers have the same value for the sign bit (the most significant bit) and the result has a different sign. As in, adding 2 negative numbers resulted in a positive number, or adding 2 positive numbers resulted in a negative number:
if( (Sign_Num1==Sign_Num2) && (Sign_Result!=Sign_Num1) ) OF=1;
else OF=0;
For your first problem, you are using -128 as the first number. The second number is implicitly -1, used by the DEC instruction. So we really have the binary numbers 0x80 and 0xFF. Both of them have the sign bit set to 1. The result is 0x7F, a number with the sign bit set to 0. We had two initial numbers with the same sign and a result with a different sign, so we indicate an overflow: -128 - 1 resulted in 127, and thus the overflow flag is set to signal a wrong signed result.
For your second problem, you are using 255 as the first number. The second number is implicitly 1, used by the INC instruction. So we really have the binary numbers 0xFF and 0x01. They have different sign bits, so it is not possible to get an overflow (overflow can only happen when adding two numbers of the same sign; two numbers of different signs can never carry the result beyond the representable signed range). The result is 0x00, and it doesn't set the overflow flag because 255 + 1, or more exactly -1 + 1, gives 0, which is obviously correct for signed arithmetic.
Remember that for the overflow flag to be set, the 2 numbers being added/subtracted need to have the sign bit with the same value, and then the result must have a sign bit with a value different from them.
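The same rule for 8-bit addition can be sketched in C (add_overflows is a hypothetical helper name; the XOR form is just a compact way of writing the sign comparison above):

#include <stdint.h>

/* Returns 1 if adding a and b overflows as signed 8-bit arithmetic.
   The sign bit (0x80) of the result differs from both inputs' sign
   bits exactly when the two inputs share a sign and the result does not. */
int add_overflows(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);   /* wraps modulo 256, like the ALU */
    return ((a ^ sum) & (b ^ sum) & 0x80) != 0;
}

With this, add_overflows(0x80, 0xFF) returns 1 (the DEC case above) and add_overflows(0xFF, 0x01) returns 0 (the INC case).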
What the processor does is set the appropriate flags for the results of these instructions (add, adc, dec, inc, sbb, sub) for both the signed and unsigned cases, i.e., two different flag results for every op. The alternative would be having two sets of instructions, where one sets the signed-related flags and the other the unsigned-related ones. If the issuing compiler is using unsigned variables in the operation, it will test carry and zero (jc, jnc, jb, jbe, etc.); if signed, it tests overflow, sign and zero (jo, jno, jg, jng, jl, jle, etc.).
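A quick sketch of that split (GCC inline asm): one and the same cmp sets all the flags, and only the condition you read afterwards decides whether the comparison was unsigned (CF, via setb) or signed (SF/OF, via setl):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t below = 0, less = 0;
    __asm__
    (
        "movb $0x80, %%al;"   /* 128 unsigned, -128 signed */
        "cmpb $1, %%al;"
        "setb %0;"            /* unsigned: 128 < 1 ?  no -> 0 */
        "setl %1;"            /* signed:  -128 < 1 ? yes -> 1 */
        : "=r"(below), "=r"(less)
        :
        : "%al"
    );
    printf("below (unsigned): %d, less (signed): %d\n", below, less);
    return 0;
}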