I tried to set an external variable and get its value afterwards, but the value I got was not correct. Do external variables always need to be volatile when compiled with gcc?
The code is as follows (updated the complete code, the previous access to the memory address 0x00100000 is changed to the another variable):
main.c
extern unsigned int ex_var;
extern unsigned int another;
int main ()
{
ex_var = 56326;
unsigned int *pos=&ex_var+16;
for (int i = 0; i < 6; i++ )
{
*pos++ = 1;
}
another = ex_var;
}
another.c
unsigned int ex_var; // the address of this variable is set to right before valid_address
unsigned int valid_address[1024]; // to indicate valid memory address
unsigned int another;
And the value set to another is not 56326.
Note: another.c seems to be strange to indicate that the memory region after ex_var is valid. For the actual running example on bear metal, please refer to this post.
Here is the disassembly of main.c. It is compiled with x86-64 gcc 12.2 with -O1 option:
main:
mov eax, OFFSET FLAT:ex_var+64
.L2:
add rax, 4
mov DWORD PTR [rax-4], 1
cmp rax, OFFSET FLAT:ex_var+88
jne .L2
mov eax, DWORD PTR ex_var[rip]
mov DWORD PTR another[rip], eax
mov eax, 0
ret
It can be found that the code for setting the external variable ex_var is optimized out.
I tried several versions of gcc, including x86-64 gcc, x86 gcc, arm64 gcc, and arm gcc, and it seems that all tested gcc versions above 8.x have such issue. Note that optimization option -O1 or above is needed to reproduce this issue.
The code can be found at this link at Compiler Explorer.
Update:
This bug in the above code is not related to external references.
Here is another example code that has the same bug. Note that it should be compiled with -O1 or above. You can try it at Compiler Explorer.
#include <stdio.h>
unsigned int var;
// In embedded environment, LD files can be used to make valid_address stores right after var
volatile unsigned int valid_address[1024];
int main ()
{
var = 56326;
unsigned int *ttb=&var;
ttb += 16;
for (int i = 0; i < 8; i++ )
{
*ttb++ = 1;
}
valid_address[0] = var;
printf("Value is: %d", valid_address[0]);
}
If you compile this code with gcc like
gcc -O1 main1.c
and execute this code, you might get the following output:
Value is: 0
Which is not correct.
The calculation &ex_var+16 is not defined by the C standard (because it only defines pointer arithmetic within an object, including to the address just beyond its end) and the assignment *pos++ = 1 is not defined by the C standard (because, for the purposes of the standard, pos does not point to an object). When there is behavior not defined by the C standard on a code path, the standard does not define any behavior on the code path.
You can make the behavior defined, to the extent the compiler can see, by declaring ex_var as an array of unknown size, so that the address calculation and the assignments would be defined if this translation unit were linked with another that defined ex_var to be an array of sufficient size:
extern unsigned int ex_var[];
int main ()
{
ex_var[0] = 56326;
unsigned int *pos = ex_var+16;
for (int i = 0; i < 6; i++ )
{
*pos++ = 1;
}
*(volatile unsigned int*)(0x00100000) = ex_var[0];
}
(Note that *(volatile unsigned int*)(0x00100000) = remains not defined by the C standard, but GCC is intended for some use in bare-metal environments and appears to work with this. Additional compilation switches might be necessary to ensure it is defined for GCC’s purposes.)
This yields assembly that sets ex_var[0] and uses it in the assignment to 0x00100000:
main:
mov DWORD PTR ex_var[rip], 56326
…
mov eax, DWORD PTR ex_var[rip]
mov DWORD PTR ds:1048576, eax
mov eax, 0
ret
Related
Recently I read the code of one public C library and found below function definition:
void* block_alloc(void** block, size_t* len, size_t type_size)
{
return malloc(type_size);
(void)block;
(void)len;
}
I wonder whether it will arrive at the statements after return. If not, what's the purpose of these 2 statements that convert some data to void ?
As Basil notes, the (void) statements are likely intended to silence compiler warnings about the unused parameters. But - you can move the (void) statements before the return to make them less confusing, and with the same effect.
In fact, there's yet another way to achieve the same effect, without resorting to any extra statements. It's supported by many compilers already today, although it's not officially in the C standard before C2X:
void* block_alloc(void**, size_t*, size_t type_size)
{
return malloc(type_size);
}
if you don't name the parameters, typical compilers don't expect you to be using them.
First, these statements appearing in the block after a return will never be executed.
Check by reading some C standard like n1570.
Second, on some compilers (perhaps GCC 10 invoked as gcc -Wall -Wextra) the useless statements might avoid some warnings.
In my opinion, coding these statements before the return won't change the machine code emitted by an optimizing compiler (use gcc -Wall -Wextra -O2 -fverbose-asm -S to check and emit the assembler code) and makes the C source code more understandable.
GCC provides, as an extension, the variable __attribute__ named unused.
Perhaps in your software your block_alloc is assigned to some important function pointer (whose signature is requested)
It is used to silence the warnings. Some programming standards required all the parameters to be used in the function body, and their static analyzers will not pass the code without it.
It is added after the return to prevent a generation of the code in some circumstances:
int foo(volatile unsigned x)
{
(void)x;
return 0;
}
int foo1(volatile unsigned x)
{
return 0;
(void)x;
}
foo:
mov DWORD PTR [rsp-4], edi
mov eax, DWORD PTR [rsp-4]
xor eax, eax
ret
foo1:
mov DWORD PTR [rsp-4], edi
xor eax, eax
ret
In a C program, there is a swap function and this function takes a parameter called x.I expect it to return it by changing the x value in the swap function inside the main function.
When I value the parameter as a variable, I want it, but when I set an integer value directly for the parameter, the program produces random outputs.
#include <stdio.h>
int swap (int x) {
x = 20;
}
int main(void){
int y = 100;
int a = swap(y);
printf ("Value: %d", a);
return 0;
}
Output of this code: 100 (As I wanted)
But this code:
#include <stdio.h>
int swap (int x) {
x = 20;
}
int main(void){
int a = swap(100);
printf ("Value: %d", a);
return 0;
}
Return randomly values such as Value: 779964766 or Value:1727975774.
Actually, in two codes, I give an integer type value into the function, even the same values, but why are the outputs different?
First of all, C functions are call-by-value: the int x arg in the function is a copy. Modifying it doesn't modify the caller's copy of whatever they passed, so your swap makes zero sense.
Second, you're using the return value of the function, but you don't have a return statement. In C (unlike C++), it's not undefined behaviour for execution to fall off the end of a non-void function (for historical reasons, before void existed, and function returns types defaulted to int). But it is still undefined behaviour for the caller to use a return value when the function didn't return one.
In this case, returning 100 was the effect of the undefined behaviour (of using the return value of a function where execution falls off the end without a return statement). This is a coincidence of how GCC compiles in debug mode (-O0):
GCC -O0 likes to evaluate non-constant expressions in the return-value register, e.g. EAX/RAX on x86-64. (This is actually true for GCC across architectures, not just x86-64). This actually gets abused on codegolf.SE answers; apparently some people would rather golf in gcc -O0 as a language than ANSI C. See this "C golfing tips" answer and the comments on it, and this SO Q&A about why i=j inside a function putting a value in RAX. Note that it only works when GCC has to load a value into registers, not just do a memory-destination increment like add dword ptr [rbp-4], 1 for x++ or whatever.
In your case (with your code compiled by GCC10.2 on the Godbolt compiler explorer)
int y=100; stores 100 directly to stack memory (the way GCC compiles your code).
int a = swap(y); loads y into EAX (for no apparent reason), then copies to EDI to pass as an arg to swap. Since GCC's asm for swap doesn't touch EAX, after the call, EAX=y, so effectively the function returns y.
But if you call it with swap(100), GCC doesn't end up putting 100 into EAX while setting up the args.
The way GCC compiles your swap, the asm doesn't touch EAX, so whatever main left there is treated as the return value.
main:
...
mov DWORD PTR [rbp-4], 100 # y=100
mov eax, DWORD PTR [rbp-4] # load y into EAX
mov edi, eax # copy it to EDI (first arg-passing reg)
call swap # swap(y)
mov DWORD PTR [rbp-8], eax # a = EAX as the retval = y
...
But with your other main:
main:
... # nothing that touches EAX
mov edi, 100
call swap
mov DWORD PTR [rbp-4], eax # a = whatever garbage was there on entry to main
...
(The later ... reloads a as an arg for printf, matching the ISO C semantics because GCC -O0 compiles each C statement to a separate block of asm; thus the later ones aren't affected by the earlier UB (unlike in the general case with optimization enabled), so do just print whatever's in a's memory location.)
The swap function compiles like this (again, GCC10.2 -O0):
swap:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-4], 20
nop
pop rbp
ret
Keep in mind none of this has anything to do with valid portable C. This (using garbage left in memory or registers) one of the kinds of things you see in practice from C that invokes undefined behaviour, but certainly not the only thing. See also What Every C Programmer Should Know About Undefined Behavior from the LLVM blog.
This answer is just answering the literal question of what exactly happened in asm. (I'm assuming un-optimized GCC because that easily explains the result, and x86-64 because that's a common ISA, especially when people forget to mention any ISA.)
Other compilers are different, and GCC will be different if you enable optimization.
You need to use return or use pointer.
Using return function.
#include <stdio.h>
int swap () {
return 20;
}
int main(void){
int a = swap(100);
printf ("Value: %d", a);
return 0;
}
Using pointer function.
#include <stdio.h>
int swap (int* x) {
(*x) = 20;
}
int main(void){
int a;
swap(&a);
printf ("Value: %d", a);
return 0;
}
Example:
a : ++i;
b : i++;
c : i += 1;
d : i = i + 1;
Assuming each of them abcd are called completely simultaneous, which one of them will be performed first ?
Using gcc 5.2 to compile this program:
#include<stdio.h>
int main()
{
int i = 0;
++i;
i++;
i += 1;
i = i + 1;
return 0;
}
It gives this ASM:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0
add DWORD PTR [rbp-4], 1 #++i
add DWORD PTR [rbp-4], 1 #i++
add DWORD PTR [rbp-4], 1 #i += 1
add DWORD PTR [rbp-4], 1 #i = i + 1
mov eax, 0
pop rbp
ret
Which means that with gcc 5.2 it's the exact same speed of execution.
It's seems to be the very same for version from 4.4.7 to 5.2.
In this particular example all four expressions have the exact same externally observable result so a competent compiler should generate the exact same code for them.
The compiler doesn't slavishly read the code and generate a few instructions for each statement, the compiler reasons about what the result of the code should be according to the standard and generates the code needed for a whole program to behave as required. Therefore asking performance questions about single statements is almost always meaningless. Let me show an example:
void foo(unsigned int a, unsigned int b) { unsigned int i = a * b; }
void bar(unsigned int a, unsigned int b) { unsigned int i = a + b; }
Which one is faster? Function foo or bar? Many would say "of course multiplication is slower", but most likely the answer is: both are equally fast because a very simple dead store optimization will see that nothing uses i, so there's no need to compute it, so the compiler can optimize the functions down to nothing. Let's try it:
$ cat > foo.c
void foo(unsigned int a, unsigned int b) { unsigned int i = a * b; }
void bar(unsigned int a, unsigned int b) { unsigned int i = a + b; }
$ cc -S -fomit-frame-pointer -O2 foo.c
$ cat foo.s
[... I edited out irrelevant spam to make this more readable ...]
_foo: ## #foo
retq
_bar: ## #bar
retq
The only instruction in both functions is retq which just returns from the function.
Modern compilers are smart enough to optimize all of the four cases to improve the performance.
You should note that in the last expression i = i+1, i will be evaluated twice.
In Programming, Unary operators are having higher priority than the other operators. Unary Operators are executed before the execution of the other operators. Pre and Post Increment operators are the examples of the Unary operators while c and d are binary operators, hence executed later.Also c is just the short-hand notation for d, hence both take the same time and from a and b, a is executed earlier than b as post increment is faster than pre increment.
I hope this answer helps.
I was reading some answers and questions on here and kept coming up with this suggestion but I noticed no one ever actually explained "exactly" what you need to do to do it, On Windows using Intel and GCC compiler. Commented below is exactly what I am trying to do.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
//assembly code begin
/*
push x into stack; < Need Help
x=y; < With This
pop stack into y; < Please
*/
//assembly code end
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
You can't just push/pop safely from inline asm, if it's going to be portable to systems with a red-zone. That includes every non-Windows x86-64 platform. (There's no way to tell gcc you want to clobber it). Well, you could add rsp, -128 first to skip past the red-zone before pushing/popping anything, then restore it later. But then you can't use an "m" constraints, because the compiler might use RSP-relative addressing with offsets that assume RSP hasn't been modified.
But really this is a ridiculous thing to be doing in inline asm.
Here's how you use inline-asm to swap two C variables:
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm("" // no actual instructions.
: "=r"(y), "=r"(x) // request both outputs in the compiler's choice of register
: "0"(x), "1"(y) // matching constraints: request each input in the same register as the other output
);
// apparently "=m" doesn't compile: you can't use a matching constraint on a memory operand
printf("x=%d,y=%d\n",x,y);
// getchar(); // Set up your terminal not to close after the program exits if you want similar behaviour: don't embed it into your programs
return 0;
}
gcc -O3 output (targeting the x86-64 System V ABI, not Windows) from the Godbolt compiler explorer:
.section .rodata
.LC0:
.string "x=%d,y=%d"
.section .text
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov edx, 1
mov esi, 2
#APP
# 8 "/tmp/gcc-explorer-compiler116814-16347-5i3lz1/example.cpp" 1
# I used "\n" instead of just "" so we could see exactly where our inline-asm code ended up.
# 0 "" 2
#NO_APP
call printf
xor eax, eax
add rsp, 8
ret
C variables are a high level concept; it doesn't cost anything to decide that the same registers now logically hold different named variables, instead of swapping the register contents without changing the varname->register mapping.
When hand-writing asm, use comments to keep track of the current logical meaning of different registers, or parts of a vector register.
The inline-asm didn't lead to any extra instructions outside the inline-asm block either, so it's perfectly efficient in this case. Still, the compiler can't see through it, and doesn't know that the values are still 1 and 2, so further constant-propagation would be defeated. https://gcc.gnu.org/wiki/DontUseInlineAsm
#include <stdio.h>
int main()
{
int x=1;
int y=2;
printf("x::%d,y::%d\n",x,y);
__asm__( "movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y)
:"r"(x)
:"%eax"
);
printf("x::%d,y::%d\n",x,y);
return 0;
}
/* Load x to eax
Load eax to y */
If you want to exchange the values, it can also be done using this way. Please note that this instructs GCC to take care of the clobbered EAX register. For educational purposes, it is okay, but I find it more suitable to leave micro-optimizations to the compiler.
You can use extended inline assembly. It is a compiler feature whicg allows you to write assembly instructions within your C code. A good reference for inline gcc assembly is available here.
The following code copies the value of x into y using pop and push instructions.
( compiled and tested using gcc on x86_64 )
This is only safe if compiled with -mno-red-zone, or if you subtract 128 from RSP before pushing anything. It will happen to work without problems in some functions: testing with one set of surrounding code is not sufficient to verify the correctness of something you did with GNU C inline asm.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm volatile (
"pushq %%rax\n" /* Push x into the stack */
"movq %%rbx, %%rax\n" /* Copy y into x */
"popq %%rbx\n" /* Pop x into y */
: "=b"(y), "=a"(x) /* OUTPUT values */
: "a"(x), "b"(y) /* INPUT values */
: /*No need for the clobber list, since the compiler knows
which registers have been modified */
);
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
Result x=2 y=1, as you expected.
The intel compiler works in a similar way, I think you have just to change the keyword asm to __asm__. You can find info about inline assembly for the INTEL compiler here.
For the sake of curiosity I'm trying to read the flag register and print it out in a nice way.
I've tried reading it using gcc's asm keyword, but i can't get it to work. Any hints how to do it? I'm running a Intel Core 2 Duo and Mac OS X. The following code is what I have. I hoped it would tell me if an overflow happened:
#include <stdio.h>
int main (void){
int a=10, b=0, bold=0;
printf("%d\n",b);
while(1){
a++;
__asm__ ("pushf\n\t"
"movl 4(%%esp), %%eax\n\t"
"movl %%eax , %0\n\t"
:"=r"(b)
:
:"%eax"
);
if(b!=bold){
printf("register changed \n %d\t to\t %d",bold , b);
}
bold = b;
}
}
This gives a segmentation fault. When I run gdb on it I get this:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000005fbfee5c
0x0000000100000eaf in main () at asm.c:9
9 asm ("pushf \n\t"
You can use the PUSHF/PUSHFD/PUSHFQ instruction (see http://siyobik.info/main/reference/instruction/PUSHF%2FPUSHFD for details) to push the flag register onto the stack. From there on you can interpret it in C. Otherwise you can test directly (against the carry flag for unsigned arithmetic or the overflow flag for signed arithmetic) and branch.
(to be specific, to test for the overflow bit you can use JO (jump if set) and JNO (jump if not set) to branch -- it's bit #11 (0-based) in the register)
About the EFLAGS bit layout: http://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture#EFLAGS_Register
A very crude Visual C syntax test (just wham-bam / some jumps to debug flow), since I don't know about the GCC syntax:
int test2 = 2147483647; // max 32-bit signed int (0x7fffffff)
unsigned int flags_w_overflow, flags_wo_overflow;
__asm
{
mov ebx, test2 // ebx = test value
// test for no overflow
xor eax, eax // eax = 0
add eax, ebx // add ebx
jno no_overflow // jump if no overflow
testoverflow:
// test for overflow
xor ecx, ecx // ecx = 0
inc ecx // ecx = 1
add ecx, ebx // overflow!
pushfd // store flags (32 bits)
jo overflow // jump if overflow
jmp done // jump if not overflown :(
no_overflow:
pushfd // store flags (32 bits)
pop edx // edx = flags w/o overflow
jmp testoverflow // back to next test
overflow:
jmp done // yeah we're done here :)
done:
pop eax // eax = flags w/overflow
mov flags_w_overflow, eax // store
mov flags_wo_overflow, edx // store
}
if (flags_w_overflow & (1 << 11)) __asm int 0x3 // overflow bit set correctly
if (flags_wo_overflow & (1 << 11)) __asm int 0x3 // overflow bit set incorrectly
return 0;
This maybe the case of the XY problem. To check for overflow you do not need to get the hardware overflow flag as you think because the flag can be calculated easily from the sign bits
An illustrative example is what happens if we add 127 and 127 using 8-bit registers. 127+127 is 254, but using 8-bit arithmetic the result would be 1111 1110 binary, which is -2 in two's complement, and thus negative. A negative result out of positive operands (or vice versa) is an overflow. The overflow flag would then be set so the program can be aware of the problem and mitigate this or signal an error. The overflow flag is thus set when the most significant bit (here considered the sign bit) is changed by adding two numbers with the same sign (or subtracting two numbers with opposite signs). Overflow never occurs when the sign of two addition operands are different (or the sign of two subtraction operands are the same).
Internally, the overflow flag is usually generated by an exclusive or of the internal carry into and out of the sign bit. As the sign bit is the same as the most significant bit of a number considered unsigned, the overflow flag is "meaningless" and normally ignored when unsigned numbers are added or subtracted.
https://en.wikipedia.org/wiki/Overflow_flag
So the C implementation is
int add(int a, int b, int* overflowed)
{
// do an unsigned addition since to prevent UB due to signed overflow
unsigned int r = (unsigned int)a + (unsigned int)b;
// if a and b have the same sign and the result's sign is different from a and b
// then the addition was overflowed
*overflowed = !!((~(a ^ b) & (a ^ r)) & 0x80000000);
return (int)r;
}
This way it works portably on any architectures, unlike your solution which only works on x86. Smart compilers may recognize the pattern and change to using the overflow flag if possible. On most RISC architectures like MIPS or RISC-V there is no flag and all signed/unsigned overflow must be checked in software by analyzing the sign bits like that
Some compilers have intrinsics for checking overflow like __builtin_add_overflow in Clang and GCC. And with that intrinsic you can also easily see how the overflow is calculated on non-flag architectures. For example on ARM it's done like this
add w3, w0, w1 # r = a + b
eon w0, w0, w1 # a = a ^ ~b
eor w1, w3, w1 # b = b ^ r
str w3, [x2] # store sum ([x2] = r)
and w0, w1, w0 # a = a & b = (a ^ ~b) & (b ^ r)
lsr w0, w0, 31 # overflowed = a >> 31
ret
which is just a variation of what I've written above
See also
Checking overflow in C
Detecting signed overflow in C/C++
Is it possible to access the overflow flag register in a CPU with C++?
Very detailed explanation of Overflow and Carry flags evaluation techniques
For unsigned int it's much easier
unsigned int a, b, result = a + b;
int overflowed = (result < a);
The compiler can reorder instructions, so you cannot rely on your lahf being next to the increment. In fact, there may not be an increment at all. In your code, you don't use the value of a, so the compiler can completely optimize it out.
So, either write the increment + check in assembler, or write it in C.
Also, lahf loads only ah (8 bits) from eflags, and the Overflow flag is outside of that. Better use pushf; pop %eax.
Some tests:
#include <stdio.h>
int main (void){
int a=2147483640, b=0, bold=0;
printf("%d\n",b);
while(1){
a++;
__asm__ __volatile__ ("pushf \n\t"
"pop %%eax\n\t"
"movl %%eax, %0\n\t"
:"=r"(b)
:
:"%eax"
);
if((b & 0x800) != (bold & 0x800)){
printf("register changed \n %x\t to\t %x\n",bold , b);
}
bold = b;
}
}
$ gcc -Wall -o ex2 ex2.c
$ ./ex2 # Works by sheer luck
0
register changed
200206 to 200a96
register changed
200a96 to 200282
$ gcc -Wall -O -o ex2 ex2.c
$ ./ex2 # Doesn't work, the compiler hasn't even optimized yet!
0
You can't assume anything about how GCC implemented the a++ operation, or whether it even did the computation before your inline asm, or before a function call.
You could make a an (unused) input to your inline asm, but gcc could still have chosen to use lea to copy-and-add instead of inc or add, or constant-propagation after inlining could have turned it into a mov-immediate.
And of course gcc could have done some other computation that writes FLAGS right before your inline asm.
There is no way to make a++; asm(...) safe for this
Stop now, you're on the wrong track. If you insist on using asm, you need to do the add or inc inside the asm so you can read the flags output. If you only care about the overflow flag, use SETCC, specifically seto %0, to create an 8-bit output value. Or better, use GCC6 flag-output syntax to tell the compiler that a boolean output result is in the OF condition in FLAGS at the end of your inline asm.
Also, signed overflow in C is undefined behaviour, so actually causing overflow in a++ is already a bug. It usually won't manifest itself if you somehow detect it after the fact, but if you use a as an array index or something gcc may have widened it to 64-bit to avoid redoing sign-extension.
GCC has builtins for add with overflow detection, since gcc5
There are builtins for signed/unsigned add, sub, and mul, see the GCC manual, that avoid signed-overflow UB and tell you if there was overflow.
bool __builtin_add_overflow (type1 a, type2 b, type3 *res) is the generic version
bool __builtin_sadd_overflow (int a, int b, int *res) is the signed int version
bool __builtin_saddll_overflow (long long int a, long long int b, long long int *res) is the signed 64-bit long long version.
The compiler will attempt to use hardware instructions to implement these built-in functions where possible, like conditional jump on overflow after addition, conditional jump on carry etc.
There's a saddl version in case you want the operation for whatever size long is on the target platform. (For x86-64 gcc, int is always 32-bit, long long is always 64-bit, but long depends on Windows vs. non-Windows. For platforms like AVR, int would be 16-bit, and only long would be 32-bit.)
int checked_add_int(int a, int b, bool *of) {
int result;
*of = __builtin_sadd_overflow(a, b, &result);
return result;
}
compiles with gcc -O3 for x86-64 System V to this asm, on Godbolt
checked_add_int:
mov eax, edi
add eax, esi # can't use the normal lea eax, [rdi+rsi]
seto BYTE PTR [rdx]
and BYTE PTR [rdx], 1 # silly compiler, it's already 0/1
ret
ICC19 uses setcc into an integer register and then stores that, same difference as far as uops, but worse code-size.
After inlining to a caller that did if(of) {} it should just jo or jno instead of actually using setcc to create an integer 0/1; in general this should inline efficiently.
Also, since gcc7, there's a builtin to ask if an addition (after promotion to a given type) would overflow, without returning the value.
#include <stdbool.h>
int overflows(int a, int b) {
bool of = __builtin_add_overflow_p(a, b, (int)0);
return of;
}
compiles with gcc -O3 for x86-64 System V to this asm, also on Godbolt
overflows:
xor eax, eax
add edi, esi
seto al
ret
See also Detecting signed overflow in C/C++
Others have offered good alternate code and reasons why what you're trying to do probably doesn't give the result you want, but the actual bug in your code is that you corrupted the stack state by pushing without popping. I would rewrite the asm as:
pushf
pop %0
Or you could just add $4,%%esp at the end of your asm to fix the stack pointer if you prefer the inefficient way.
The following C program will read the FLAGS register when compiled with GCC and any x86 or x86_64 machine following a calling convention in which integers are returned to %eax. You may need to pass the -zexecstack argument to the compiler.
#include<stdio.h>
#include<stdlib.h>
int(*f)()=(void*)L"\xc3589c";
int main( int argc, char **argv ) {
if( argc < 3 ) {
printf( "Usage: %s <augend> <addend>\n", *argv );
return 0;
}
int a=atoi(argv[1])+atoi(argv[2]);
int b=f();
printf("%d CF %d PF %d AF %d ZF %d SF %d TF %d IF %d DF %d OF %d IOPL %d NT %d RF %d VM %d AC %d VIF %d VIP %d ID %d\n", a, b&1, b/4&1, b>>4&1, b>>6&1, b>>7&1, b>>8&1, b>>9&1, b>>10&1, b>>11&1, b>>12&3, b>>14&1, b>>16&1, b>>17&1, b>>18&1, b>>19&1, b>>20&1, b>>21&1 );
}
Try it online!
The funny looking string literal disassembles to
0x0000000000000000: 9C pushfq
0x0000000000000001: 58 pop rax
0x0000000000000002: C3 ret