Just started to learn C, and I feel a little bit confused.
I have some questions:
If i have the following code:
signed char x = 56;
// In the RAM, I will see 00111000 yes/no?
signed char z = -56;
// In the RAM, I will see 11001000 yes/no?
unsigned char y = 200;
// In the RAM, I will see 11001000 yes/no?
I have the following code:
if (z<0){
printf("0 is bigger then z ");
}
After compiling, how do the assembly instructions know whether z is -56 and not 200? (Are there special ASM instructions for signed and unsigned?)
As I mentioned in question 1, the value of both z and y is 11001000, and there is no indication of whether it is signed or unsigned.
Apologies if I didn't phrase my question well; I hope you understand me.
Thanks
Numbers are stored in binary. Negative numbers are usually stored in two's complement form, but the C language allows other representations. So this one:
signed char z = -56;
// In the RAM, I will see 11001000 yes/no?
Usually yes, but maybe not on some exotic platforms.
The second question is very implementation-specific. For example, a comparison against zero on x86 may be performed as a self-comparison that sets the flags register; for an unsigned comparison the sign flag (SF) is ignored.
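Here is a minimal sketch you can compile and run to see the point above: the same bit pattern prints differently depending only on the declared type (assuming 8-bit chars and two's complement, which virtually every current platform uses):
#include <stdio.h>
int main(void) {
    signed char z = -56;    /* stored as 11001000 (0xC8) on a two's-complement machine */
    unsigned char y = 200;  /* also stored as 11001000 (0xC8) */
    printf("z = %d\n", z);                      /* prints -56 */
    printf("y = %u\n", (unsigned)y);            /* prints 200 */
    printf("same bits: %d\n", (unsigned char)z == y);  /* prints 1 */
    return 0;
}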
The compiler will generate the appropriate instructions for the signed and unsigned cases. I think it might be better to see an example. The following code
void foobar();
void foo(unsigned char a)
{
if (a < 10)
foobar();
}
void bar(char a)
{
if (a < 10)
foobar();
}
will translate to this MIPS code with GCC 5.4 using the -O3 flag.
foo:
andi $4,$4,0x00ff
sltu $4,$4,10
bne $4,$0,$L4
nop
j $31
nop
$L4:
j foobar
nop
bar:
sll $4,$4,24
sra $4,$4,24
slt $4,$4,10
bne $4,$0,$L7
nop
j $31
nop
$L7:
j foobar
nop
This is the interesting part of the foo function (which uses the unsigned char type):
foo:
andi $4,$4,0x00ff
sltu $4,$4,10
As you can see, the sltu instruction is used, which is the unsigned version of slt. (You don't really have to know what it does.)
Whereas if we look at the relevant part of the function bar:
bar:
sll $4,$4,24
sra $4,$4,24
slt $4,$4,10
You can see that slt is used, which treats its register operands as signed. The sll and sra pair performs sign extension; here the operand a is signed, so it is needed, while in the unsigned case it is not.
So you can see that different instructions are generated depending on the signedness of the operands.
The compiler will generate different instructions depending on whether it is an unsigned or signed type. And that is what tells the processor which way to treat it. So yes, there are separate instructions for signed and unsigned. With Intel processors, there are also separate instructions depending on the width (char, short, int).
there is a special ASM instructions for signed and unsigned?
Yes, hardware generally has machine code instructions (or instruction sequences) that can
sign extend a byte to word size
zero extend a byte to word size
compare signed quantities for the various relations <, <=, >, >=
compare unsigned quantities for the various relations <, <=, >, >=
how the assembly instructions know if z is -56 and not 200?
In high level languages we associate a type with a variable. From then on the compiler knows the default way to interpret code that uses the variable. (We can override or change that default interpretation using a cast at usages of the variable.)
In machine code, there are only bytes, either in memory or in CPU registers. So it is not how it is stored that matters (for signed vs. unsigned), but what instructions are used to access the storage. The compiler will use the right set of machine code instructions every time the variable is accessed.
While we store lots of things in memory, the processor has no concept of variable declarations. The processor only sees machine code instructions, and interprets all data types through the eyes of the instruction it is being told to execute.
As an assembly programmer, it is your job to apply the proper instructions (here signed vs. unsigned) to the same variable each time it is used. Using a byte as a signed variable and later as an unsigned variable is a logic bug that is easy to introduce in assembly language.
Some assemblers will help if you use the wrong size to access a variable, but none that I know help if you use the proper size but incorrect signed-ness.
Computers do not know or care about such things. Signed versus unsigned is only relevant to the programmer. The value 0xFF can at the same time be -1, 255, an address or a portion of an address, part of a floating-point number, and so on. The computer does not care; how the programmer conveys their interpretation of the bits is through the program. Addition and subtraction also do not care about signed vs unsigned, because two's complement makes them the same logic. But for instructions like multiply, where the result is larger than the inputs, or divide, where the result is smaller than the inputs, there are separate unsigned and signed versions of the instructions; or your processor may have only one and you have to synthesize the other (or none, and you have to synthesize both).
int fun0 ( void )
{
return(5);
}
unsigned int fun1 ( void )
{
return(5);
}
00000000 <fun0>:
0: e3a00005 mov r0, #5
4: e12fff1e bx lr
00000008 <fun1>:
8: e3a00005 mov r0, #5
c: e12fff1e bx lr
No special bits nor nomenclature... bits is bits.
Driven by the programmer's choice of signed or unsigned in the high-level language, the compiler selects instructions and data values that cause the ALU to output flags indicating greater than, less than, and equal (as single flags or combinations of them); a conditional branch can then be taken based on those flags. Often, but not always, the compiler will generate the opposite test: for "if z < 0 then do something", it will emit "if z >= 0 then jump over the something".
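To connect this back to the original question, a small sketch (assuming a typical two's-complement machine): the stored byte is identical, yet the compiler emits a signed compare for z, and for y it can see the test can never be true (many compilers warn that the comparison is always false):
#include <stdio.h>
int main(void) {
    signed char z = -56;    /* bit pattern 0xC8 */
    unsigned char y = 200;  /* same bit pattern 0xC8 */
    if (z < 0)
        printf("z < 0: the compiler emitted a signed comparison\n");  /* taken */
    if (y < 0)
        printf("y < 0\n");  /* never taken: an unsigned value is never negative */
    return 0;
}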
Related
I have an assignment of explaining some seemingly strange behaviors of C code (running on x86). I can easily complete everything else, but this one has really confused me.
Code snippet 1 outputs -2147483648
int a = 0x80000000;
int b = a / -1;
printf("%d\n", b);
Code snippet 2 outputs nothing, and gives a Floating point exception
int a = 0x80000000;
int b = -1;
int c = a / b;
printf("%d\n", c);
I know the reason for the result of Code Snippet 1 (1 + ~INT_MIN == INT_MIN), but I can't quite understand how integer division by -1 can generate an FPE, nor can I reproduce it on my Android phone (AArch64, GCC 7.2.0). Code 2 just outputs the same as Code 1 without any exceptions. Is it a hidden bug or feature of the x86 processor?
The assignment didn't say anything else (including CPU architecture), but since the whole course is based on a desktop Linux distro, you can safely assume it's modern x86.
Edit: I contacted my friend and he tested the code on Ubuntu 16.04 (Intel Kaby Lake, GCC 6.3.0). The result was consistent with what the assignment stated (Code 1 output the said value and Code 2 crashed with an FPE).
There are four things going on here:
gcc -O0 behaviour explains the difference between your two versions: idiv vs. neg. (While clang -O0 happens to compile them both with idiv). And why you get this even with compile-time-constant operands.
x86 idiv faulting behaviour vs. behaviour of the division instruction on ARM
If integer math results in a signal being delivered, POSIX requires it to be SIGFPE: On which platforms does integer divide by zero trigger a floating point exception? But POSIX doesn't require trapping for any particular integer operation. (This is why x86 and ARM are allowed to be different.)
The Single Unix Specification defines SIGFPE as "Erroneous arithmetic operation". It's confusingly named after floating point, but in a normal system with the FPU in its default state, only integer math will raise it. On x86, only integer division. On MIPS, a compiler could use add instead of addu for signed math, so you could get traps on signed add overflow. (gcc uses addu even for signed, but an undefined-behaviour detector might use add.)
C Undefined Behaviour rules (signed overflow, and division specifically) which let gcc emit code which can trap in that case.
gcc with no options is the same as gcc -O0.
-O0
Reduce compilation time and make debugging produce the expected results. This is the default.
This explains the difference between your two versions:
Not only does gcc -O0 not try to optimize, it actively de-optimizes to make asm that independently implements each C statement within a function. This allows gdb's jump command to work safely, letting you jump to a different line within the function and act like you're really jumping around in the C source. Why does clang produce inefficient asm with -O0 (for this simple floating point sum)? explains more about how and why -O0 compiles the way it does.
It also can't assume anything about variable values between statements, because you can change variables with set b = 4. This is obviously catastrophically bad for performance, which is why -O0 code runs several times slower than normal code, and why optimizing for -O0 specifically is total nonsense. It also makes -O0 asm output really noisy and hard for a human to read, because of all the storing/reloading, and lack of even the most obvious optimizations.
int a = 0x80000000;
int b = -1;
// debugger can stop here on a breakpoint and modify b.
int c = a / b; // a and b have to be treated as runtime variables, not constants.
printf("%d\n", c);
I put your code inside functions on the Godbolt compiler explorer to get the asm for those statements.
To evaluate a/b, gcc -O0 has to emit code to reload a and b from memory, and not make any assumptions about their value.
But with int c = a / -1;, you can't change the -1 with a debugger, so gcc can and does implement that statement the same way it would implement int c = -a;, with an x86 neg eax or AArch64 neg w0, w0 instruction, surrounded by a load(a)/store(c). On ARM32, it's a rsb r3, r3, #0 (reverse-subtract: r3 = 0 - r3).
However, clang5.0 -O0 doesn't do that optimization. It still uses idiv for a / -1, so both versions will fault on x86 with clang. Why does gcc "optimize" at all? See Disable all optimization options in GCC. gcc always transforms through an internal representation, and -O0 is just the minimum amount of work needed to produce a binary. It doesn't have a "dumb and literal" mode that tries to make the asm as much like the source as possible.
x86 idiv vs. AArch64 sdiv:
x86-64:
# int c = a / b from x86_fault()
mov eax, DWORD PTR [rbp-4]
cdq # dividend sign-extended into edx:eax
idiv DWORD PTR [rbp-8] # divisor from memory
mov DWORD PTR [rbp-12], eax # store quotient
Unlike imul r32,r32, there's no 2-operand idiv that doesn't have a dividend upper-half input. Anyway, not that it matters; gcc is only using it with edx = copies of the sign bit in eax, so it's really doing a 32b / 32b => 32b quotient + remainder. As documented in Intel's manual, idiv raises #DE on:
divisor = 0
The signed result (quotient) is too large for the destination.
Overflow can easily happen if you use the full range of divisors, e.g. for int result = long long / int with a single 64b / 32b => 32b division. But gcc can't do that optimization because it's not allowed to make code that would fault instead of following the C integer promotion rules and doing a 64-bit division and then truncating to int. It also doesn't optimize even in cases where the divisor is known to be large enough that it couldn't #DE.
When doing 32b / 32b division (with cdq), the only input that can overflow is INT_MIN / -1. The "correct" quotient is a 33-bit signed integer, i.e. positive 0x80000000 with a leading-zero sign bit to make it a positive 2's complement signed integer. Since this doesn't fit in eax, idiv raises a #DE exception. The kernel then delivers SIGFPE.
AArch64:
# int c = a / b from x86_fault() (which doesn't fault on AArch64)
ldr w1, [sp, 12]
ldr w0, [sp, 8] # 32-bit loads into 32-bit registers
sdiv w0, w1, w0 # 32 / 32 => 32 bit signed division
str w0, [sp, 4]
ARM hardware division instructions don't raise exceptions for divide by zero or for INT_MIN/-1 overflow. Nate Eldredge commented:
The full ARM architecture reference manual states that UDIV or SDIV, when dividing by zero, simply return zero as the result, "without any indication that the division by zero occurred" (C3.4.8 in the Armv8-A version). No exceptions and no flags - if you want to catch divide by zero, you have to write an explicit test. Likewise, signed divide of INT_MIN by -1 returns INT_MIN with no indication of the overflow.
AArch64 sdiv documentation doesn't mention any exceptions.
However, software implementations of integer division may raise: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4061.html. (gcc uses a library call for division on ARM32 by default, unless you set a -mcpu that has HW division.)
C Undefined Behaviour.
As PSkocik explains, INT_MIN / -1 is undefined behaviour in C, like all signed integer overflow. This allows compilers to use hardware division instructions on machines like x86 without checking for that special case. If it had to not fault, unknown inputs would require run-time compare-and-branch checks, and nobody wants C to require that.
More about the consequences of UB:
With optimization enabled, the compiler can assume that a and b still have their set values when a/b runs. It can then see the program has undefined behaviour, and thus can do whatever it wants. gcc chooses to produce INT_MIN like it would from -INT_MIN.
On a 2's complement system, the most-negative number is its own negative. This is a nasty corner-case for 2's complement, because it means abs(x) can still be negative.
https://en.wikipedia.org/wiki/Two%27s_complement#Most_negative_number
int x86_fault() {
int a = 0x80000000;
int b = -1;
int c = a / b;
return c;
}
compiles to this with gcc6.3 -O3 for x86-64:
x86_fault:
mov eax, -2147483648
ret
but clang5.0 -O3 compiles to (with no warning even with -Wall -Wextra):
x86_fault:
ret
Undefined Behaviour really is totally undefined. Compilers can do whatever they feel like, including returning whatever garbage was in eax on function entry, or loading a NULL pointer and an illegal instruction. e.g. with gcc6.3 -O3 for x86-64:
int *local_address(int a) {
return &a;
}
local_address:
xor eax, eax # return 0
ret
void foo() {
int *p = local_address(4);
*p = 2;
}
foo:
mov DWORD PTR ds:0, 0 # store immediate 0 into absolute address 0
ud2 # illegal instruction
Your case with -O0 didn't let the compilers see the UB at compile time, so you got the "expected" asm output.
See also What Every C Programmer Should Know About Undefined Behavior (the same LLVM blog post that Basile linked).
Signed int division in two's complement is undefined if:
the divisor is zero, OR
the dividend is INT_MIN (==0x80000000 if int is int32_t) and the divisor is -1 (in two's complement,
-INT_MIN > INT_MAX, which causes integer overflow, which is undefined behavior in C)
(https://www.securecoding.cert.org recommends wrapping integer operations in functions that check for such edge cases)
Since you're invoking undefined behavior by breaking rule 2, anything can happen, and as it happens, this particular anything on your platform happens to be an FPE signal being generated by your processor.
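A minimal sketch of the kind of checked wrapper the CERT recommendation above describes (the function name here is just illustrative):
#include <limits.h>
#include <stdbool.h>
/* Returns false instead of invoking undefined behaviour when the division
 * would be erroneous: divide by zero, or INT_MIN / -1 (whose true quotient
 * does not fit in an int). */
bool checked_div(int dividend, int divisor, int *quotient) {
    if (divisor == 0 || (dividend == INT_MIN && divisor == -1))
        return false;
    *quotient = dividend / divisor;
    return true;
}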
With undefined behavior very bad things could happen, and sometimes they do happen.
Your question makes no sense in C (read Lattner on UB). But you could get the assembler code (e.g. produced by gcc -O -fverbose-asm -S) and care about machine code behavior.
On x86-64 with Linux, integer overflow (and also integer division by zero, IIRC) gives a SIGFPE signal. See signal(7)
BTW, on PowerPC integer division by zero is rumored to give -1 at the machine level (but some C compilers generate extra code to test that case).
The code in your question is undefined behavior in C. The generated assembler code has some defined behavior (depends upon the ISA and processor).
(The assignment is done to make you read more about UB, notably Lattner's blog, which you should absolutely read.)
On x86 if you divide by actually using the idiv operation (which is not really necessary for constant arguments, not even for variables-known-to-be-constant, but it happened anyway), INT_MIN / -1 is one of the cases that results in #DE (divide error). It's really a special case of the quotient being out of range, in general that is possible because idiv divides an extra-wide dividend by the divisor, so many combinations cause overflow - but INT_MIN / -1 is the only case that isn't a div-by-0 that you can normally access from higher level languages since they typically do not expose the extra-wide-dividend capabilities.
Linux annoyingly maps the #DE to SIGFPE, which has probably confused everyone who dealt with it the first time.
Both cases are weird: the first consists of dividing -2147483648 by -1 and should give 2147483648, not the result you are getting. Division by -1 (like multiplication by -1) should change the sign of the dividend and produce a positive number, but there is no such positive number in int (this is what raises the undefined behaviour).
0x80000000 is not a valid int value on a 32-bit architecture (as stated in the standard) that represents numbers in two's complement. If you calculate its negative value, you get back to the same number, as it has no opposite around zero.
When you do arithmetic with signed integers, it works well for integer addition and subtraction (always with care, as it is quite easy to overflow when you add the largest value to some int), but you cannot safely use it to multiply or divide. So in this case you are invoking undefined behaviour. You always invoke undefined behaviour (or implementation-defined behaviour, which is similar but not the same) on overflow with signed integers, as implementations vary widely in handling it.
I'll try to explain what may be happening (without any guarantee), as the compiler is free to do anything, or nothing at all.
Concretely, 0x80000000 as represented in two's complement is
1000_0000_0000_0000_0000_0000_0000_0000
if we complement this number, we get (first complement all bits, then add one)
0111_1111_1111_1111_1111_1111_1111_1111 + 1 =>
1000_0000_0000_0000_0000_0000_0000_0000 !!! the same original number.
Surprisingly, the same number... You had an overflow (there is no counterpart positive value to this number, as we overflowed when changing sign); then you take out the sign bit, masking with
1000_0000_0000_0000_0000_0000_0000_0000 &
0111_1111_1111_1111_1111_1111_1111_1111 =>
0000_0000_0000_0000_0000_0000_0000_0000
which is the number you use as dividend.
But as I said before, this is only what may be happening on your system; I'm not sure, as the standard says this is undefined behaviour and so you can get entirely different behaviour from your computer/compiler.
The different results you are obtaining are probably due to the first operation being done by the compiler, while the second one is done by the program itself. In the first case you are assigning 0x8000_0000 to the variable, while in the second you are calculating the value in the program. Both cases are undefined behaviour and you are seeing it happen in front of your eyes.
#NOTE 1
As far as the compiler is concerned, the standard doesn't say anything about the valid ranges of int that must be implemented (the standard normally doesn't include 0x8000...000 on two's-complement architectures), so the correct behaviour of 0x800...000 on two's-complement architectures should arguably be, since it has the largest absolute value for an integer of that type, to give a result of 0 when dividing a number by it. But hardware implementations normally don't allow dividing by such a number (many of them don't even implement signed integer division, but simulate it from unsigned division, so many simply extract the signs and do an unsigned division). That would require a check before division, and since the standard says undefined behaviour, implementations are free to avoid such a check and to disallow dividing by that number. They simply select the integer range to go from 0x8000...001 to 0xffff...ffff, and then from 0x000...0000 to 0x7fff...ffff, disallowing the value 0x8000...0000 as invalid.
I have just a little theoretical curiosity. The == operator in C returns 1 in case of positive equality, 0 otherwise. My knowledge of assembly is very limited. However I was wondering if it could be possible, theoretically, to implement a new operator that returns ~0 in case of positive equality, 0 otherwise – but at one condition: it must produce the same number of assembly instructions as the == operator. It's really just a theoretical curiosity, I have no practical uses in mind.
EDIT
My question targets x86 CPUs, however I am very curious to know if there are architectures that natively do that.
SECOND EDIT
As Sneftel has pointed out, nothing similar to the SETcc instructions [1] – but able to convert flag register bits into 0/~0 values (instead of the classical 0/1) – exists. So the answer to my question seems to be no.
THIRD EDIT
A little note. I am not trying to represent a logical true as ~0; I am trying to understand whether a logical true can also optionally be represented as ~0 when needed, without further effort, within a language that already normally represents true as 1. And for this I had hypothesized a new operator that "returns" numbers, not booleans (the natural logical true "returned" by == remains represented as 1); otherwise I would have asked whether == could be re-designed to "return" ~0 instead of 1. You can think of this new operator as half-belonging to the family of bitwise operators, which "return" numbers, not booleans (and by booleans I don't mean boolean data types, I mean anything outside of the number pair 0/1, which is what a boolean is understood to be in C as the result of a logical operation).
I know that all of this might sound futile, but I had warned: it is a theoretical question.
However here my question seems to be addressed explicitly:
Some languages represent a logical one as an integer with all bits set. This representation can be obtained by choosing the logically opposite condition for the SETcc instruction, then decrementing the result. For example, to test for overflow, use the SETNO instruction, then decrement the result.
So it seems there is no direct instruction, since using SETNE and then decrementing means adding one more instruction.
EDIT: as other people are pointing out, there are some flavors of "conditionally assign 0/1" out there. Kind of undermines my point :) Apparently, the 0/1 boolean type admits a slightly deeper optimization than a 0/~0 boolean.
The "operator returns a value" notion is a high level one, it's not preserved down to the assembly level. That 1/0 may only exist as a bit in the flags register, or not even that.
In other words, assigning the C-defined value of the equality operator to an int sized variable is not a primitive on the assembly level. If you write x = (a == b), the compiler might implement it as
cmp a, b        ; set the Z flag
mov x, 0        ; assume not equal
mov tmp, 1      ; value for the "equal" case (cmov takes a register source, not an immediate)
cmovz x, tmp    ; if equal, assign 1
Or it can be done with conditional jumps. As you can see, assigning a ~0 as the value for TRUE will take the same commands, just with a different operand.
None of the architectures that I'm familiar with implement equality comparison as "assign a 1 or 0 to a general purpose register".
There is no assembly implementation of a C operator. For instance, there is no x86 instruction which compares two arguments and results in a 0 or 1, only one which compares two arguments and puts the result in a bit in the flag register. And that's not usually what happens when you use ==.
Example:
void foo(int a, int b) {
if(a == b) { blah(); }
}
produces the following assembly, more or less:
foo(int, int):
cmp %edi, %esi
je .L12
rep ret
.L12:
jmp blah()
Note that nothing in there involves a 0/1 value. If you want that, you have to really ask for it:
int bar(int a, int b) {
return a == b;
}
which becomes:
bar(int, int):
xor %eax, %eax
cmp %edi, %esi
sete %al
ret
I suspect the existence of the SETcc instructions is what prompted your question, since they convert flag register bits into 0/1 values. There is no corresponding instruction which converts them into 0/~0: GCC instead does a clever little DEC to map them. But in general, the result of == exists only as an abstract and optimizer-determined difference in machine state between the two.
Incidentally, I would not be surprised at all if some x86 implementations chose to fuse SETcc and a following DEC into a single micro-op; I know this is done with other common instruction pairs. There is no simple relationship between a stream of instructions and a number of cycles.
For just 1 extra cycle you can just negate the /output/.
Internally in 8086, the comparison operations only exist in the flags. Getting the value of the flags into a variable takes extra code. It is pretty much the same code whether you want true as 1 or -1. Generally a compiler doesn't actually generate the value 0 or 1 when evaluating an if statement, but uses the Jcc instructions directly on the flags generated by comparison operations. https://pdos.csail.mit.edu/6.828/2006/readings/i386/Jcc.htm
With 80386, SETcc was added, which only ever sets 0 or 1 as the answer, so that is the preferred arrangement if the code insists on storing the answer. https://pdos.csail.mit.edu/6.828/2006/readings/i386/SETcc.htm
And there are lots of new compare instructions that save results to registers going forward. The flags have been seen as a bottleneck for instruction pipeline stalls in modern processors, and are very much disfavoured by code optimisation.
Of course there are all sorts of tricks you can do to get 0, 1, or -1 given a particular set of values to compare. Needless to say the compiler has been optimised to generate 1 for true when applying these tricks, and wherever possible, it doesn't actually store the value at all, but just reorganises your code to avoid it.
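If you do want the stored 0/~0 form, you can already express it in plain C and let the compiler pick a SETcc-based sequence (typically cmp / sete / neg, or the setne / dec trick quoted from the manual above). A hedged sketch; the helper name is mine:
#include <stdint.h>
/* 0xFFFFFFFF if a == b, 0 otherwise. */
static inline uint32_t eq_mask(uint32_t a, uint32_t b) {
    return (uint32_t)-(a == b);   /* (a == b) is 0 or 1; negating gives 0 or all-ones */
}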
SIMD vector comparisons do produce vectors of 0 / -1 results. This is the case on x86 MMX/SSE/AVX, ARM NEON, PowerPC Altivec, etc. (They're 2's complement machines, so I like to write -1 instead of ~0 to represent the elements of all-zero / all-one bits).
e.g. pcmpeqd xmm0, xmm1 replaces each element of xmm0 with xmm0[i] == xmm1[i] ? -1 : 0;
This lets you use them as AND masks, because SIMD code can't branch separately on each vector element without unpacking to scalar and back. It has to be branchless. How to use if condition in intrinsics
e.g. to blend 2 vectors based on a condition, without SSE4.1 pblendvb / blendvps, you'd compare and then AND / ANDNOT / OR. e.g. from Substitute a byte with another one
__m128i mask = _mm_cmpeq_epi8(inp, val); // movdqa xmm1, xmm0 / PCMPEQB xmm1, xmm2
// zero elements in the original where there was a match (that we want to replace)
inp = _mm_andnot_si128(mask, inp); // inp &= ~mask; // PANDN xmm0, xmm1
// zero elements where we keep the original
__m128i tmp = _mm_and_si128(newvals, mask); // newvals & mask; // PAND xmm3, xmm1
inp = _mm_or_si128(inp, tmp); // POR xmm0, xmm1
But if you want to count matches, you can subtract the compare result. total -= -1 avoids having to negate the vector elements. How to count character occurrences using SIMD
Or to conditionally add something, instead of actually blending, just do total += (x & mask), because 0 is the identity element for operations like ADD (and some others like XOR and OR).
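A short SSE2 sketch of the subtract-the-compare-result idea mentioned above (illustrative, not from the linked answers; the 8-bit per-lane counters must be widened or drained at most every 255 iterations before they wrap):
#include <emmintrin.h>  /* SSE2 */
/* One counting step: lanes where data == val hold -1, so counts - (-1)
 * increments that lane's counter by one. */
static inline __m128i count_matches_step(__m128i counts, __m128i data, __m128i val) {
    __m128i mask = _mm_cmpeq_epi8(data, val);  /* -1 (all ones) where equal, 0 elsewhere */
    return _mm_sub_epi8(counts, mask);         /* counts += 1 in matching lanes */
}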
See How to access a char array and change lower case letters to upper case, and vice versa and Convert a String In C++ To Upper Case for examples in C with intrinsics and x86 asm.
All of this has nothing to do with C operators and implicit conversion from boolean to integer.
In C and C++, operators return a boolean true/false condition, which in asm for most machines for scalar code (not auto-vectorized) maps to a bit in a flag register.
Converting that to an integer in a register is a totally separate thing.
But fun fact: MIPS doesn't have a flags register: it has some compare-and-branch instructions for simple conditions like reg == reg or reg != reg (beq and bne). And branch on less-than-zero (branch on the sign bit of one register): bltz $reg, target.
(And an architectural $zero register that always reads as zero, so you can use that to implement branch if reg != 0 or reg == 0.)
For more complex conditions, you use slt (set on less-than) or sltu (set on less-than-unsigned) to compare into an integer register. Like slt $t4, $t1, $t0 implements t4 = t1 < t0, producing a 0 or 1. Then you can branch on that being 0 or not, or combine multiple conditions with boolean AND / OR before branching on that. If one of your inputs is an actual bool that's already 0 or 1, it can be optimized into this without an slt.
Incomplete instruction listing of classic MIPS instructions (not including pseudo-instructions like blt that assemble to slt into $at + bne): http://www.mrc.uidaho.edu/mrc/people/jff/digital/MIPSir.html
But MIPS32r6 / MIPS64r6 changed this: instructions generating truth values now generate all zeroes or all ones instead of just clearing/setting the 0-bit, according to https://en.wikipedia.org/wiki/MIPS_architecture#MIPS32/MIPS64_Release_6. MIPS32/64 r6 is not binary compatible with previous MIPS ISAs; it also rearranged some opcodes, and because of this change it is not even asm-source compatible! But it's a definite change for the better.
Fun fact, there is an undocumented 8086 SALC instruction (set AL from carry) that's still supported in 16/32-bit mode by modern Intel (and AMD?) CPUs.
It's basically like sbb al,al without setting flags: AL = CF ? -1 : 0. http://os2museum.com/wp/undocumented-8086-opcodes.
Subtract-with-borrow with the same input twice does x-x - CF on x86, where CF is a borrow for subtraction. And x-x is of course always zero. (On some other ISAs, like ARM, the carry flag meaning is opposite for subtraction, C set means "no borrow".)
In general, you can do sbb edx,edx (or any register you want) to convert CF into a 0 / -1 integer. But this only works for CF; the carry flag is special and there's nothing equivalent for other flags.
Some AMD CPUs even recognize sbb same,same as independent of the old value of the register, only dependent on CF, like xor-zeroing. On other CPUs it still has the same architectural effect, but with a microarchitectural false dependency on the old value of EDX.
I took a computer organization course a year ago, and now I have a follow-up to it, 'Computer Architecture'. I am using the 3rd edition of John Hennessy's book 'Quantitative Approach to Computer Architecture'. I went through the MIPS ISA but still need some help; can you explain this piece of code in greater detail?
Source code:
for(i=1000; i>0; i--)
x[i] = x[i] + s;
Assembly code:
Loop: L.D F0, 0(R1) ; F0 = array element
ADD.D F4, F0, F2 ; add scalar
S.D F4, 0(R1) ; store result
DADDUI R1, R1, #-8 ; decrement address pointer
BNE R1, R2, Loop ; branch if R1 != R2
This is given as an example of loop unrolling to exploit ILP, and I have a few doubts. I get that the array starts at Mem[0+R1] and goes backwards till Mem[R+8] (as given in the text); is there any reason for this, or did they just pick this location arbitrarily?
Also, why use DADDUI (unsigned) when we are adding a signed number (-8)?
Please give a detailed overview of this so that I can follow along with the rest of the topics.
Thanks
The memory accesses are performed to the addresses, and in the order, specified by the loop in the source code.
The daddiu instruction is sufficient to perform such address arithmetic. The "negative" value accomplishes subtraction in two's complement. Addresses are neither negative nor positive; they are just bit patterns. Refer to an ISA reference to learn more about MIPS and its instructions.
The 16-bit signed immediate is added to the 64-bit value in GPR rs and
the 64-bit arithmetic result is placed into GPR rt . No Integer
Overflow exception occurs under any circumstances.
…
The term “unsigned” in the instruction name is a misnomer; this operation is 64-bit modulo arithmetic that does not
trap on overflow. It is appropriate for unsigned arithmetic such as address arithmetic, or integer arithmetic environments that ignore overflow, such as C language arithmetic.
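The same modulo behaviour is easy to see in C (a hedged sketch; the names are mine, and uint64_t stands in for a 64-bit GPR):
#include <stdint.h>
#include <stdio.h>
int main(void) {
    uint64_t r1 = 0x1000;                   /* pretend this is an address in R1 */
    uint64_t imm = (uint64_t)(int64_t)-8;   /* sign-extended immediate: 0xFFFFFFFFFFFFFFF8 */
    printf("%#llx\n", (unsigned long long)(r1 + imm));  /* prints 0xff8: same as subtracting 8 */
    return 0;
}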
The example is not optimized or unrolled. It's just a literal translation of the source.
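For contrast, here is roughly what the source loop looks like after a 4x unroll at the C level (a sketch, assuming the trip count is a multiple of the unroll factor, which is how unrolling examples are usually presented):
for (i = 1000; i > 0; i -= 4) {
    x[i]     = x[i]     + s;
    x[i - 1] = x[i - 1] + s;
    x[i - 2] = x[i - 2] + s;
    x[i - 3] = x[i - 3] + s;
}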
In assembler I can use the MUL instruction and get a 64-bit result in EDX:EAX;
how can I do the same in C? http://siyobik.info/index.php?module=x86&id=210
My approach of using a uint64_t and shifting the result doesn't work.
Thank you for your help (=
Me
Any decent compiler will just do it when asked.
For example using VC++ 2010, the following code:
unsigned long long result ;
unsigned long a = 0x12345678 ;
unsigned long b = 0x87654321 ;
result = (unsigned long long)a * b ;
generates the following assembler:
mov eax,dword ptr [b]
mov ecx,dword ptr [a]
mul eax,ecx
mov dword ptr [result],eax
mov dword ptr [a],edx
Post some code. This works for me:
#include <inttypes.h>
#include <stdio.h>
int main(void) {
uint32_t x, y;
uint64_t z;
x = 0x10203040;
y = 0x3000;
z = (uint64_t)x * y;
printf("%016" PRIX64 "\n", z);
return 0;
}
See if you can get the equivalent of __emul or __emulu for your compiler (or just use these if you've got an MS compiler), though a 64-bit multiply should automatically work unless you're sitting behind some restriction or other funny problem (like _aulmul).
You mean to multiply two 32 bit quantities to obtain a 64 bit result?
This is not foreseen in C by itself: either you have two 32-bit ints, such as uint32_t, and then the result is of the same width, or you cast one operand to uint64_t beforehand, but then you lose the advantage of that special (and fast) multiply.
The only way I see is to use inline assembler extensions. gcc is quite good at this; you can produce quite optimal code. But it isn't portable between different compilers. (Many public-domain compilers adopt the gcc syntax, though, I think.)
#include <stdint.h>
/* The name says it all. Multiply two 32 bit unsigned ints and get
* one 64 bit unsigned int.
*/
uint64_t mul_U32xU32_u64(uint32_t a, uint32_t b) {
return a * (uint64_t)b; /* Note about the cast below. */
}
This produces:
mul_U32xU32_u64:
movl 8(%esp), %eax
mull 4(%esp)
popl %ebp
ret
When compiled with:
gcc -m32 -O3 -fomit-frame-pointer -S mul.c
Which uses the mul instruction (called mull here for multiply long, which is how the gnu assembler for x86 likes it) in the way that you want.
In this case one of the parameters was pulled directly from the stack rather than placed in a register (the 4(%esp) thing means 4 bytes above the stack pointer, and the 4 bytes being skipped over are the return address) because the numbers were passed into the function and would have been pushed onto the stack (as per the x86 ABI (application binary interface) ).
If you inlined the function or just did the math in it in your code it would most likely result in using the mul instruction in many cases, though optimizing compilers may also replace some multiplications with simpler code if they can tell that it would work (for instance it could turn this into a shift or even a constant if the one or more of the arguments were known).
In the C code, at least one of the arguments had to be cast to a 64-bit value so that the compiler would produce a 64-bit result. Even if the compiler had to use code that produced a 64-bit result when multiplying 32-bit values, it might not have considered the top half important, because according to the rules of C, operations usually result in a value with the same type as the operand with the largest range (though you can sometimes argue that is not exactly what it does).
You cannot do exactly that in C, i.e. you cannot multiply two N-bit values and obtain a 2N-bit value as the result. The semantics of C multiplication are different from those of your machine's multiplication. In C the multiplication operator is always applied to values of the same type T (the so-called usual arithmetic conversions take care of that) and produces a result of the same type T.
If you run into overflow on multiplication, you have to use a bigger type for the operands. If there's no bigger type, you are out of luck (i.e. you have no other choice but to use library-level implementation of large multiplication).
For example, if the largest integer type of your platform is a 64-bit type, then at assembly level on your machine you have access to mul operation producing the correct 128-bit result. At the language level you have no access to such multiplication.
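As a hedged aside: on compilers that provide the non-standard __int128 extension (GCC and Clang on 64-bit targets), there effectively is a bigger type, so the full product of two 64-bit values can be obtained at the language level:
#include <stdint.h>
/* Illustrative only: relies on the non-standard unsigned __int128 extension. */
static void mul_64x64_to_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 p = (unsigned __int128)a * b;  /* one widening multiply */
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
}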