Understand assembly code in c

Understand assembly code in c - c

I'm reading some C code embedded with a few assembly code. I understand that __asm__ is a statement to run assembly code, but what does __asm__ do in the following code? According to the output (i.e., r = 16), it seems that __asm__ does not effect the variable r. Isn't it?
#include <stdio.h>
static void foo()
{
static volatile unsigned int r __asm__ ("0x0019");
r |= 1 << 4;
printf("foo: %u\n", r);
}
Platform: Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn) on OSX Yosemite

Strictly speaking, your "asm" snippet simply loads a constant (0x0019).
Here's a 32-bit example:
#include <stdio.h>
static void foo()
{
static volatile unsigned int r __asm__ ("0x0019");
static volatile unsigned int s __asm__ ("0x1122");
static volatile unsigned int t = 0x3344;
printf("foo: %u %u %u\n", r, s, t);
}
gcc -O0 -S x.c
cat x.c
.file "x.c"
.data
.align 4
.type t.1781, #object
.size t.1781, 4
t.1781:
.long 13124 # Note: 13124 decimal == 0x3344 hex
.local 0x1122
.comm 0x1122,4,4
.local 0x0019
.comm 0x0019,4,4
.section .rodata
.LC0:
.string "foo: %u %u %u\n"
.text
.type foo, #function
foo:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl t.1781, %eax
movl 0x1122, %edx
movl 0x0019, %ecx
movl %eax, 12(%esp)
movl %edx, 8(%esp)
movl %ecx, 4(%esp)
movl $.LC0, (%esp)
call printf
leave
ret
PS:
The "asm" syntax is applicable to all gcc-based compilers.
PPS:
I absolutely encourage you to experiment with assembly anywhere you please: embedded systems, Ubuntu, Mac OSX - whatever pleases you.
Here is an excellent book. It's about Linux, but it's also very largely applicable to your OSX:
Programming from the Ground Up, Jonathan Bartlett
Also:
https://www.hackerschool.com/blog/7-understanding-c-by-learning-assembly
http://fabiensanglard.net/macosxassembly/
PPS:
x86 assembly syntax comes in two variants: "Intel" and "ATT" syntax. Gcc uses ATT. The ATT syntax is also applicable for any other architecture supported by GCC (MIPS, PPC, etc etc). I encourage you to start off with ATT syntax ("gcc/gas"), rather than Intel ("nasm").

Related

GCC and the Multiply Instruction

I am using GCC in 32-bit mode on a Windows 7 machine under cygwin. I have the following function:
unsigned f1(unsigned x, unsigned y)
{
return x*y;
}
I want the code to do an unsigned multiply and as such I would expect it to generate the mul instruction, not the imul instruction. I compile the program
with the following command:
gcc -m32 -S t4.c
The generated assembly code is:
.file "t4.c"
.text
.globl _f1
.def _f1; .scl 2; .type 32; .endef
_f1:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
imull 12(%ebp), %eax
popl %ebp
ret
.ident "GCC: (GNU) 4.8.2"
I believe that the generated code has the wrong multiply instruction in it but I find it hard to believe that GCC has such a simple bug. Please comment.

The compiler relies on the "as-if" rule: No standard conforming program can detect a difference between what this program does and what the program should do, since the lowest 32 bits of the result are the same for both instructions.

Editing ASM result of an operation in C when compiling in GCC

me and my friend got a computer architecture project and we don't really know how to get to it. I hope you could at least point us in the right direction so we know what to look for. As our professor isn't really good at explaining what we really need to do and the subject is rather vague we'll start from the beginning.
Our task is to somehow "edit" GCC to treat some operations differently. For example when you add two char arguments in a .c program it uses addb. We need to change it to f.e. 16bit registers(addl), without using unnecessary parameters during compilation(just regular gcc p.c -o p). Why or will it work doesn't really matter at this point.
We'd like to know how we could change something inside GCC, where we can even start looking as I can't find any information about similar tasks besides making plugins/extensions. Is there anything we could read about something like this or anything we could use?

In C 'char' variables are normally added together as integers so the C compiler will already use addl. Except when it can see that it makes no difference to the result to use a smaller or faster form.
For example this C code
unsigned char a, b, c;
int i;
void func1(void) { a = b + c; }
void func2(void) { i = b + c; }
Gives this assembler for GCC.
.file "xq.c"
.text
.p2align 4,,15
.globl func1
.type func1, #function
func1:
movzbl c, %eax
addb b, %al
movb %al, a
ret
.size func1, .-func1
.p2align 4,,15
.globl func2
.type func2, #function
func2:
movzbl b, %edx
movzbl c, %eax
addl %edx, %eax
movl %eax, i
ret
.size func2, .-func2
.comm i,4,4
.comm c,1,4
.comm b,1,4
.comm a,1,4
.ident "GCC: (Debian 4.7.2-5) 4.7.2"
.section .note.GNU-stack,"",#progbits
Note that the first function uses addb but the second uses addl because the high bits of the result will be discarded in the first function when the result is stored.
This version of GCC is generating i686 code so the integers are 32bit (addl) depending on exactly what you want you may need to make the result a short or actually get a compiler version that outputs 16bit 8086 code.

How to tell if a function argument is an immediate?

I have an inline function definition that wraps an inline assembly. I wish to choose
different inline assembly implementation based on the fact whether or not the argument
is known in build time or not.
My question is how to ask in C code or inline assembly whether an address value is an known at build time and therefore fit to be an immediate value. If you are thinking __builtin_constant_p - please do read ahead.
Here is a bit of code that illustrates my intent:. I am trying to find a way to implement "is_immediate".
static char arr[5];
void __attribute__((always_inline)) do_something(char * buf)
{
if(is_immediate(buf) {
// Argument is constant, can use immediate form
asm volatile ("insn1 %0" : : "i"(buf));
} else {
// Argument is computed at runtime, use a register
unsigned long tmp = (unsigned long)buf + 1;
asm volatile("insn2 %0" : : "r"(tmp));
}
int main(void)
{
do_something(&arr);
}
At first impression __builtin_constant_p() seems like it is exactly the right bit of magic needed, except it does not work.
The reason is does not work is that while the address of the array will be known after the linker has placed the array in memory and chosen an address for it (and so it does fit the immediate constraint of the inline assembly), it is not known at compile time before the link.
So, what I am looking for is a way to ask - "is this variable fit to be an immediate value?" rather then "is this a constant expression?".

If I have a look on what is produced by my x86 compiler (where this insn doesn't make sense after all), it is here:
static char arr[5];
void __attribute__((always_inline)) do_something(char * buf)
{
// Argument is constant, can use immediate form
asm volatile ("insn1 %0" : : "r"(buf));
unsigned long tmp = (unsigned long)buf;
asm volatile("insn2 %0" : : "r"(tmp));
}
int main(void)
{
char brr[5];
do_something(arr);
do_something(brr);
}
comes to
.file "inl.c"
.text
.globl do_something
.type do_something, #function
do_something:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
#APP
# 8 "inl.c" 1
insn1 %eax
# 0 "" 2
# 12 "inl.c" 1
insn2 %eax
# 0 "" 2
#NO_APP
popl %ebp
ret
.size do_something, .-do_something
.globl main
.type main, #function
main:
pushl %ebp
movl $arr, %eax
movl %esp, %ebp
subl $16, %esp
#APP
# 8 "inl.c" 1
insn1 %eax
# 0 "" 2
# 12 "inl.c" 1
insn2 %eax
# 0 "" 2
#NO_APP
leal -5(%ebp), %eax
#APP
# 8 "inl.c" 1
insn1 %eax
# 0 "" 2
# 12 "inl.c" 1
insn2 %eax
# 0 "" 2
#NO_APP
leave
ret
.size main, .-main
.local arr
.comm arr,5,4
.ident "GCC: (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585]"
.section .comment.SUSE.OPTs,"MS",#progbits,1
.string "OSpwg"
.section .note.GNU-stack,"",#progbits
so I think it could make sense to be immediate as $arr is probably immediate (while leal -5(%ebp), %eax isn't). Probably the said optimization just isn't supported.
But on the other hand,
movl $arr, %eax
insn1 %eax
is not soo much worse than
insn1 $arr
so that it is still acceptable, IMHO.

Can I list the local variables in a stack by using some system functions in C language

The C function backtrace just returns a series of functions calls for the programn, but i want to list all the locals variables in my programn, just like the info locals in gdb.Any idea if this can be done? Thanks

Generally, no. You should move away from thinking about a "stack" as some sort of god given factum. A call stack is merely a common implementation technique for C. It has no intrinsic meaning or required semantics. Automatic variables ("local variables", as you say) have to behave in a certain way, and sometimes that means that they are written onto the call stack. However, it is entirely conceivable that local variables are never realized in memory at all -- they may instead only ever be stored in a processor register, or eliminated entirely if an equivalent program can be formulated without them.
So, no, there is no language-intrinsic mechanism for enumerating local variables. As you say, the debugger can do so to some extent (depending on debug symbols being present and subject to optimizations); perhaps you can find a library that can process debug symbols from within a running program.

If this is just for occasional debugging, then you can invoke the debugger. However, since the debugger itself will freeze your program, you need an intermediary to capture the output. You can, for example, use system, and redirect the output to a file, then read the file afterwards. In the example below, the file gdbcmds.txt contains the line info locals.
char buf[512];
FILE *gdb;
snprintf(buf, sizeof(buf), "gdb -batch -x gdbcmds.txt -p %d > gdbout.txt",
(int)getpid());
system(buf);
gdb = fopen("gdbout.txt", "r");
while (fgets(buf, sizeof(buf), gdb) != 0) {
printf("%s", buf);
}
fclose(gdb);

First, note that backtrace is not a standard C library function, but a GNU-specific extension.
In general, it's difficult to impossible retrieve local variable information from compiled code, especially if it was compiled without debugging or with optimization enabled. If debugging isn't turned on, variable names and types are generally not preserved in the resulting machine code.
For example, take the following ridiculously simple code:
#include <stdio.h>
#include <math.h>
int main(void)
{
int x = 1, y = 2, z;
z = 2 * y - x;
printf("x = %d, y = %d, z = %d\n", x, y, z);
return 0;
}
Here's the resulting machine code, no debugging or optimization:
.file "varinfo.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "x = %d, y = %d, z = %d\n"
.text
.align 4
.globl main
.type main,#function
main:
pushl %ebp
movl %esp, %ebp
subl $24, %esp
movl $1, -4(%ebp)
movl $2, -8(%ebp)
movl -8(%ebp), %eax
movl %eax, %eax
sall $1, %eax
subl -4(%ebp), %eax
movl %eax, -12(%ebp)
pushl -12(%ebp)
pushl -8(%ebp)
pushl -4(%ebp)
pushl $.LC0
call printf
addl $16, %esp
movl $0, %eax
leave
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)"
x, y, and z are referred to through -4(%ebp), -8(%ebp), and -12(%ebp) respectively. There's nothing to indicate that they're integers other than the instructions used to perform the arithmetic.
It's even better with optimization (-O1) turned on:
.file "varinfo.c"
.version "01.01"
gcc2_compiled.:
.section .rodata.str1.1,"ams",#progbits,1
.LC0:
.string "x = %d, y = %d, z = %d\n"
.text
.align 4
.globl main
.type main,#function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
pushl $3
pushl $2
pushl $1
pushl $.LC0
call printf
movl $0, %eax
leave
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)"
In this case, the compiler was able to do some static analysis and compute the value z at compile time; there's no need to set aside any memory for any of the variables at all, because the compiler already knows what those values have to be.

literal constant vs variable in math library

So, I know that in C you need to link the code to the math library, libm, to be able to use its functions. Today, while I was trying to demonstrate this to a friend, and explain why you need to do this, I came across the following situation that I do not understand.
Consider the following code:
#include <math.h>
#include <stdio.h>
/* #define VARIABLE */
int main(void)
{
#ifdef VARIABLE
double a = 2.0;
double b = sqrt(a);
printf("b = %lf\n",b);
#else
double b = sqrt(2.0);
printf("b = %lf\n",b);
#endif
return 0;
}
If VARIABLE is defined, you need to link against libm as you would normally expect; otherwise you get the usual main.c:(.text+0x29): undefined reference to sqrt linking error indicating that the compiler cannot find the definition for the function sqrt. I was surprised to see that if I comment #define VARIABLE, the code runs fine and the result is correct!
Why is it that I need to link to libm when variables are used but I don't need to do so when literal constants are used? How does the compiler find the definition of sqrt when the library is not linked? I'm using gcc 4.4.5 under linux.

GCC can do constant folding for several standard-library functions. Obviously, if the function is folded at compile-time, there is no need for a run-time function call, so no need to link to libm. You could confirm this by taking a looking at the assembler that the compiler produces (using objdump or similar).
I guess these optimizations are only triggered when the argument is a constant expression.

As everyone mentions, yes it has to do with constant folding.
With optimizations off, GCC only seems to do it when sqrt(2.0) is used. Here's the evidence:
Case 1: With the variable.
.file "main.c"
.section .rodata
.LC1:
.string "b = %lf\n"
.text
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
fldl .LC0
fstpl 24(%esp)
fldl 24(%esp)
fsqrt
fucom %st(0)
fnstsw %ax
sahf
jp .L5
je .L2
fstp %st(0)
jmp .L4
.L5:
fstp %st(0)
.L4:
fldl 24(%esp)
fstpl (%esp)
call sqrt
.L2:
fstpl 16(%esp)
movl $.LC1, %eax
fldl 16(%esp)
fstpl 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long 0
.long 1073741824
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
You can see that it emits a call to the sqrt function. So you'll get a linker error if you don't link the math library.
Case 2: With the Literal.
.file "main.c"
.section .rodata
.LC1:
.string "b = %lf\n"
.text
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
fldl .LC0
fstpl 24(%esp)
movl $.LC1, %eax
fldl 24(%esp)
fstpl 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long 1719614413
.long 1073127582
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
There's no call to sqrt. Hence no linker error.
With optimizations on, GCC will do constant propagation in both cases. So no linker error in either case.
$ gcc main.c -save-temps
main.o: In function `main':
main.c:(.text+0x30): undefined reference to `sqrt'
collect2: ld returned 1 exit status
$ gcc main.c -save-temps -O2
$

I think it GCC uses its builtin. I compiled your code with: -fno-builtin-sqrt and got the expected linker error.
The ISO C90 functions ... sin, sprintf, sqrt ... are all
recognized as built-in functions unless -fno-builtin is specified

That's because gcc is clever enough to figure out that the square root of the constant 2 is also a constant, so it just generates code like:
mov register, whatever-the-square-root-of-2-is
Hence no need to do a square root calculation at run time, gcc has already done it at compile time.
This is akin to a benchmarking program which does bucketloads of calculations then does nothing with the result:
int main (void) {
// do something rather strenuous
return 0;
}
You're likely (at high optimisation levels) to see all the do something rather strenuous code optimised out of existence.
The gcc docs have a whole page dedicated to these built-ins here and the relevant section in that page for sqrt and others is:
The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).
So, quite a lot, really :-)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Understand assembly code in c - c

Related

GCC and the Multiply Instruction

Editing ASM result of an operation in C when compiling in GCC

How to tell if a function argument is an immediate?

Can I list the local variables in a stack by using some system functions in C language

literal constant vs variable in math library

Categories

Resources