My friend and I got a computer architecture project and we don't really know how to approach it. I hope you can at least point us in the right direction so we know what to look for. Since our professor isn't very good at explaining what we actually need to do, and the subject is rather vague, we'll start from the beginning.
Our task is to somehow "edit" GCC so that it treats some operations differently. For example, when you add two char arguments in a .c program it uses addb; we need to change it to use wider registers, e.g. 16-bit (addw) or 32-bit (addl), without using extra parameters during compilation (just a regular gcc p.c -o p). Why, or whether it will work, doesn't really matter at this point.
We'd like to know how we could change something inside GCC, and where we can even start looking, as I can't find any information about similar tasks besides writing plugins/extensions. Is there anything we could read about something like this, or anything we could use?
In C, 'char' variables are normally added together as int (the usual arithmetic promotions), so the compiler will already use addl, except when it can see that using a smaller or faster form makes no difference to the result.
For example this C code
unsigned char a, b, c;
int i;
void func1(void) { a = b + c; }
void func2(void) { i = b + c; }
gives the following assembly with GCC:
.file "xq.c"
.text
.p2align 4,,15
.globl func1
.type func1, @function
func1:
movzbl c, %eax
addb b, %al
movb %al, a
ret
.size func1, .-func1
.p2align 4,,15
.globl func2
.type func2, @function
func2:
movzbl b, %edx
movzbl c, %eax
addl %edx, %eax
movl %eax, i
ret
.size func2, .-func2
.comm i,4,4
.comm c,1,4
.comm b,1,4
.comm a,1,4
.ident "GCC: (Debian 4.7.2-5) 4.7.2"
.section .note.GNU-stack,"",@progbits
Note that the first function uses addb but the second uses addl because the high bits of the result will be discarded in the first function when the result is stored.
This version of GCC is generating i686 code, so ints are 32-bit (addl). Depending on exactly what you want, you may need to make the result a short, or actually get a compiler that outputs 16-bit 8086 code.
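For instance, if what you are really after is a 16-bit operation, one thing to experiment with is storing into a short (a quick sketch of mine, not part of the original question; the names are made up):
unsigned char b2, c2;
unsigned short s;
void func3(void) { s = b2 + c2; } /* result truncated to 16 bits on the store */
Compile with gcc -S and inspect the output; note that GCC is free to do the add in a 32-bit register and only truncate on the 16-bit store, so you may well see addl followed by a movw rather than an addw.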
Related
I have this short hello world program:
#include <stdio.h>
static const char* msg = "Hello world";
int main(){
printf("%s\n", msg);
return 0;
}
I compiled it into the following assembly code with gcc:
.file "hello_world.c"
.section .rodata
.LC0:
.string "Hello world"
.data
.align 4
.type msg, @object
.size msg, 4
msg:
.long .LC0
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
movl msg, %eax
movl %eax, (%esp)
call puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits
My question is: are all parts of this code essential if I were to write this program in assembly (instead of writing it in C and then compiling to assembly)? I understand the assembly instructions but there are certain pieces I don't understand. For instance, I don't know what .cfi* is, and I'm wondering if I would need to include this to write this program in assembly.
The absolute bare minimum that will work on the platform this appears to be is:
.globl main
main:
pushl $.LC0
call puts
addl $4, %esp
xorl %eax, %eax
ret
.LC0:
.string "Hello world"
But this breaks a number of ABI requirements. The minimum for an ABI-compliant program is
.globl main
.type main, @function
main:
subl $24, %esp
pushl $.LC0
call puts
xorl %eax, %eax
addl $28, %esp
ret
.size main, .-main
.section .rodata
.LC0:
.string "Hello world"
Everything else in your object file is either the compiler not optimizing the code down as tightly as possible, or optional annotations to be written to the object file.
The .cfi_* directives, in particular, are optional annotations. They are necessary if and only if the function might be on the call stack when a C++ exception is thrown, but they are useful in any program from which you might want to extract a stack trace. If you are going to write nontrivial code by hand in assembly language, it will probably be worth learning how to write them. Unfortunately, they are very poorly documented; I am not currently finding anything that I think is worth linking to.
The line
.section .note.GNU-stack,"",@progbits
is also important to know about if you are writing assembly language by hand; it is another optional annotation, but a valuable one, because what it means is "nothing in this object file requires the stack to be executable." If all the object files in a program have this annotation, the kernel won't make the stack executable, which improves security a little bit.
(To indicate that you do need the stack to be executable, you put "x" instead of "". GCC may do this if you use its "nested function" extension. (Don't do that.))
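For the curious, here is roughly the kind of code that triggers it (a sketch of mine, not from the answer above): taking the address of a nested function that refers to its enclosing scope makes GCC build a trampoline on the stack at run time, which is why the stack must then be executable.
#include <stdio.h>
void outer(void) {
    int x = 42;
    void inner(void) { printf("%d\n", x); } /* nested function: a GNU C extension */
    void (*fp)(void) = inner; /* taking its address forces a run-time trampoline on the stack */
    fp();
}
If you compile this and look at the generated .s file, the .note.GNU-stack section should be marked "x".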
It is probably worth mentioning that in the "AT&T" assembly syntax used (by default) by GCC and GNU binutils, there are three kinds of lines:
A line with a single token on it, ending in a colon, is a label. (I don't remember the rules for what characters can appear in labels.)
A line whose first token begins with a dot, and does not end in a colon, is some kind of directive to the assembler.
Anything else is an assembly instruction.
related: How to remove "noise" from GCC/clang assembly output? The .cfi directives are not directly useful to you, and the program would work without them. (It's stack-unwind info needed for exception handling and backtraces, so -fomit-frame-pointer can be enabled by default. And yes, gcc emits this even for C.)
As far as the number of asm source lines needed to produce a valid Hello World program goes, obviously we want to use libc functions to do more work for us.
@Zwol's answer has the shortest implementation of your original C code.
Here's what you could do by hand, if you don't care about the exit status of your program, just that it prints your string.
# Hand-optimized asm, not compiler output
.globl main # necessary for the linker to see this symbol
main:
# main gets two args: argc and argv, so we know we can modify 8 bytes above our return address.
movl $.LC0, 4(%esp) # replace our first arg with the string
jmp puts # tail-call puts.
# you would normally put the string in .rodata, not leave it in .text where the linker will mix it with other functions.
.section .rodata
.LC0:
.asciz "Hello world" # asciz zero-terminates
The equivalent C (you just asked for the shortest Hello World, not one that had identical semantics):
int main(int argc, char **argv) {
return puts("Hello world");
}
Its exit status is implementation-defined, but it definitely prints. puts(3) returns "a non-negative number", which could be outside the 0..255 range, so we can't say anything about the program's exit status being 0 / non-zero on Linux (where the process's exit status is the low 8 bits of the integer passed to the exit_group() system call, in this case by the CRT startup code that called main()).
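You can see that low-8-bits truncation directly with a toy program (my sketch, not from the original answer):
#include <stdlib.h>
int main(void) { exit(257); } /* echo $? afterwards prints 1, i.e. 257 & 0xFF */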
Using JMP to implement the tail-call is a standard practice, and commonly used when a function doesn't need to do anything after another function returns. puts() will eventually return to the function that called main(), just like if puts() had returned to main() and then main() had returned. main()'s caller still has to deal with the args it put on the stack for main(), because they're still there (but modified, and we're allowed to do that).
gcc and clang don't generate code that modifies arg-passing space on the stack. It is perfectly safe and ABI-compliant, though: functions "own" their args on the stack, even if they were const. If you call a function, you can't assume that the args you put on the stack are still there. To make another call with the same or similar args, you need to store them all again.
Also note that this calls puts() with the same stack alignment that we had on entry to main(), so again we're ABI-compliant in preserving the 16B alignment required by modern versions of the x86-32 (aka i386) System V ABI (used by Linux).
.string zero-terminates strings, same as .asciz, but I had to look it up to check. I'd recommend just using .ascii or .asciz to make sure you're clear on whether your data has a terminating byte or not. (You don't need one if you use it with explicit-length functions like write())
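For comparison, a minimal sketch of that explicit-length style (my example, not from the answer above):
#include <unistd.h>
int main(void) {
    write(1, "Hello world\n", 12); /* fd 1 = stdout; byte count given, so no terminator needed */
    return 0;
}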
In the x86-64 System V ABI (and Windows), args are passed in registers. This makes tail-call optimization a lot easier, because you can rearrange args or pass more args (as long as you don't run out of registers). This makes compilers willing to do it in practice. (Because as I said, they currently don't like to generate code that modifies the incoming arg space on the stack, even though the ABI is clear that they're allowed to, and compiler generated functions do assume that callees clobber their stack args.)
clang or gcc -O3 will do this optimization for x86-64, as you can see on the Godbolt compiler explorer:
#include <stdio.h>
int main() { return puts("Hello World"); }
# clang -O3 output
main: # #main
movl $.L.str, %edi
jmp puts # TAILCALL
# Godbolt strips out comment-only lines and directives; there's actually a .section .rodata before this
.L.str:
.asciz "Hello World"
Static data addresses always fit in the low 31 bits of the address space, and non-PIE executables don't need position-independent code; otherwise the mov would be lea .LC0(%rip), %rdi. (You'll get the latter from gcc if it was configured with --enable-default-pie to make position-independent executables.)
How to load address of function or label into register in GNU Assembler
Hello World using 32-bit x86 Linux int 0x80 system calls directly, no libc
See Hello, world in assembly language with Linux system calls? My answer there was originally written for SO Docs, then moved here as a place to put it when SO Docs closed down. It didn't really belong here so I moved it to another question.
related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.
I am using GCC in 32-bit mode on a Windows 7 machine under cygwin. I have the following function:
unsigned f1(unsigned x, unsigned y)
{
return x*y;
}
I want the code to do an unsigned multiply and as such I would expect it to generate the mul instruction, not the imul instruction. I compile the program
with the following command:
gcc -m32 -S t4.c
The generated assembly code is:
.file "t4.c"
.text
.globl _f1
.def _f1; .scl 2; .type 32; .endef
_f1:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
imull 12(%ebp), %eax
popl %ebp
ret
.ident "GCC: (GNU) 4.8.2"
I believe that the generated code has the wrong multiply instruction in it, but I find it hard to believe that GCC has such a simple bug. Please comment.
The compiler relies on the "as-if" rule: no standard-conforming program can detect a difference between what this program does and what the program should do, since the lowest 32 bits of the result are the same for both instructions.
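If you want to convince yourself of that, here is a small test of mine (not part of the original answer): signed and unsigned 32x32-bit multiplication differ only in the upper half of the full 64-bit product, which is exactly the part this function discards.
#include <stdint.h>
#include <stdio.h>
int main(void) {
    uint32_t a = 0xFFFFFFFDu, b = 7; /* a reinterpreted as signed is -3 */
    uint64_t u = (uint64_t)a * b;                  /* the full product MUL computes */
    int64_t  s = (int64_t)(int32_t)a * (int32_t)b; /* the full product IMUL computes */
    /* the upper halves differ, but the low 32 bits agree: */
    printf("%08x vs %08x\n", (unsigned)u, (unsigned)s);
    return 0;
}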
I am new to this assembly scene and I was reading a tutorial on creating a simple OS here: http://mikeos.berlios.de/write-your-own-os.html. Since I really do not want to have to learn assembly in depth, is there a way for the assembly code to stop at some point and execute C code instead?
Please give step-by-step directions on how this can be done, since I am a beginner. I am running Ubuntu 13.10 if that helps at all.
Edit
I apologise for not being clear. What I would like to do is, in the middle of an assembly program, have it stop executing the assembly commands and instead start running a C file that has the actual code I would like to use in my OS, such as the scanf and printf functions.
I am running into some trouble. If you could provide a link to a tutorial that just gets me to the point where I have a hello world OS, I can edit the C code from there and be on my way, thanks!
OK, step-by-step instructions; they work on any platform, which is good because you didn't specify what platform you were on:
1) Get a C compiler; in this example I will use gcc.
2) Write a C function that calls the function you want to call:
#include <stdio.h>
void somefunc (void)
{
int i = 0;
printf("%d\n", i);
}
3) Compile with gcc -S. Note: if you don't like AT&T syntax, you can use gcc -S -masm=intel. Without that you end up with the AT&T syntax, which is given below.
4) This outputs assembly code in a file with the same name as your source file and the extension .s; in this case, using 32-bit MinGW:
.file "call_func.c"
.section .rdata,"dr"
LC0:
.ascii "%d\12\0"
.text
.globl _somefunc
.def _somefunc; .scl 2; .type 32; .endef
_somefunc:
pushl %ebp
movl %esp, %ebp
subl $40, %esp
movl $0, -12(%ebp)
movl -12(%ebp), %eax
movl %eax, 4(%esp)
movl $LC0, (%esp)
call _printf
leave
ret
.def _printf; .scl 2; .type 32; .endef
5) Look at what the compiler did, and do the same thing in your assembly code. In this case, the arguments to printf are pushed onto the stack in right-to-left order, because printf uses the cdecl calling convention (a variadic function cannot be stdcall). Also note that C function names get a leading underscore here; this is typical of 32-bit Windows.
6) Get to the point where you don't want to do this anymore, then look up the calling conventions for your platform, and you will simply know how to do it. Assuming you are on x86 or amd64, a reasonable place to start is the Wikipedia page on x86 calling conventions.
So, I know that in C you need to link the code to the math library, libm, to be able to use its functions. Today, while I was trying to demonstrate this to a friend, and explain why you need to do this, I came across the following situation that I do not understand.
Consider the following code:
#include <math.h>
#include <stdio.h>
/* #define VARIABLE */
int main(void)
{
#ifdef VARIABLE
double a = 2.0;
double b = sqrt(a);
printf("b = %lf\n",b);
#else
double b = sqrt(2.0);
printf("b = %lf\n",b);
#endif
return 0;
}
If VARIABLE is defined, you need to link against libm as you would normally expect; otherwise you get the usual main.c:(.text+0x29): undefined reference to sqrt linker error, indicating that the linker cannot find the definition of the function sqrt. I was surprised to see that if I comment out #define VARIABLE, the code builds and runs fine and the result is correct!
Why is it that I need to link to libm when a variable is used, but I don't need to do so when a literal constant is used? How does the compiler find the definition of sqrt when the library is not linked? I'm using gcc 4.4.5 under Linux.
GCC can do constant folding for several standard-library functions. Obviously, if the call is folded at compile time, there is no need for a run-time function call, so no need to link to libm. You can confirm this by taking a look at the assembly the compiler produces (using objdump or similar).
I guess these optimizations are only triggered when the argument is a constant expression.
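A quick way to test that guess (a sketch of mine; compile each with gcc -S and look for "call sqrt" in the output):
#include <math.h>
double with_const(void) { return sqrt(2.0); } /* constant expression: foldable at compile time */
double with_var(double x) { return sqrt(x); } /* value only known at run time: needs the real sqrt */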
As everyone mentions, yes it has to do with constant folding.
With optimizations off, GCC only seems to do it when sqrt(2.0) is used. Here's the evidence:
Case 1: With the variable.
.file "main.c"
.section .rodata
.LC1:
.string "b = %lf\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
fldl .LC0
fstpl 24(%esp)
fldl 24(%esp)
fsqrt
fucom %st(0)
fnstsw %ax
sahf
jp .L5
je .L2
fstp %st(0)
jmp .L4
.L5:
fstp %st(0)
.L4:
fldl 24(%esp)
fstpl (%esp)
call sqrt
.L2:
fstpl 16(%esp)
movl $.LC1, %eax
fldl 16(%esp)
fstpl 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long 0
.long 1073741824
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
You can see that it emits a call to the sqrt function. So you'll get a linker error if you don't link the math library.
Case 2: With the Literal.
.file "main.c"
.section .rodata
.LC1:
.string "b = %lf\n"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $32, %esp
fldl .LC0
fstpl 24(%esp)
movl $.LC1, %eax
fldl 24(%esp)
fstpl 4(%esp)
movl %eax, (%esp)
call printf
movl $0, %eax
leave
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long 1719614413
.long 1073127582
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
There's no call to sqrt, hence no linker error. (Notice that .LC0 now holds the precomputed result: the two .long values encode the double 1.4142135623730951, i.e. sqrt(2.0), whereas in case 1 they encoded 2.0.)
With optimizations on, GCC will do the constant folding in both cases, so there is no linker error either way:
$ gcc main.c -save-temps
main.o: In function `main':
main.c:(.text+0x30): undefined reference to `sqrt'
collect2: ld returned 1 exit status
$ gcc main.c -save-temps -O2
$
I think GCC uses its built-in. I compiled your code with -fno-builtin-sqrt and got the expected linker error.
The ISO C90 functions ... sin, sprintf, sqrt ... are all
recognized as built-in functions unless -fno-builtin is specified
That's because gcc is clever enough to figure out that the square root of the constant 2 is also a constant, so it just generates code like:
mov register, whatever-the-square-root-of-2-is
Hence no need to do a square root calculation at run time, gcc has already done it at compile time.
This is akin to a benchmarking program which does bucketloads of calculations then does nothing with the result:
int main (void) {
// do something rather strenuous
return 0;
}
You're likely (at high optimisation levels) to see all the do something rather strenuous code optimised out of existence.
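If you ever need to prevent that, one common idiom is to make the result observable, e.g. through a volatile sink (a sketch of mine, one option among several):
#include <stdio.h>
int main(void) {
    volatile double sink = 0.0; /* volatile: every store below must really happen */
    for (int i = 0; i < 1000000; i++)
        sink += i * 0.5; /* the loop can no longer be optimised out of existence */
    printf("%f\n", sink);
    return 0;
}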
The gcc docs have a whole page dedicated to these built-ins here, and the relevant section on that page for sqrt and others is:
The ISO C90 functions abort, abs, acos, asin, atan2, atan, calloc, ceil, cosh, cos, exit, exp, fabs, floor, fmod, fprintf, fputs, frexp, fscanf, isalnum, isalpha, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, isxdigit, tolower, toupper, labs, ldexp, log10, log, malloc, memchr, memcmp, memcpy, memset, modf, pow, printf, putchar, puts, scanf, sinh, sin, snprintf, sprintf, sqrt, sscanf, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, tanh, tan, vfprintf, vprintf and vsprintf are all recognized as built-in functions unless -fno-builtin is specified (or -fno-builtin-function is specified for an individual function).
So, quite a lot, really :-)
In the C language, why does n++ execute faster than n = n + 1?
(int n=...; n++;)
(int n=...; n=n+1;)
Our instructor asked that question in today's class. (this is not homework)
That would be true if you are working on a "stone-age" compiler...
In the "stone-age" case:
++n is faster than n++, which is faster than n = n + 1.
Machines usually have an "increment x" instruction as well as an "add const to x" instruction.
In the case of n++, you have only 2 memory accesses (read n, write n); the increment itself happens in a register.
In the case of n = n + 1, you have 3 memory accesses (read n, read the constant, write n) plus the add.
But today's compilers will automatically convert n = n + 1 to ++n, and they will do more than you may imagine!
Also, on today's out-of-order processors, even with a "stone-age" compiler, the runtime may not be affected at all in many cases!
Related
On GCC 4.4.3 for x86, with or without optimizations, they compile to exactly the same assembly code, and thus take the same amount of time to execute. As you can see in the assembly, GCC simply converts n++ into n = n + 1, then optimizes it into the one-instruction add (at -O2).
Your instructor's suggestion that n++ is faster only applies to very old, non-optimizing compilers, which were not smart enough to select the in-place update instructions for n = n + 1. These compilers have been obsolete in the PC world for years, but may still be found for weird proprietary embedded platforms.
C code:
int n;
void nplusplus() {
n++;
}
void nplusone() {
n = n + 1;
}
Output assembly (no optimizations):
.file "test.c"
.comm n,4,4
.text
.globl nplusplus
.type nplusplus, @function
nplusplus:
pushl %ebp
movl %esp, %ebp
movl n, %eax
addl $1, %eax
movl %eax, n
popl %ebp
ret
.size nplusplus, .-nplusplus
.globl nplusone
.type nplusone, @function
nplusone:
pushl %ebp
movl %esp, %ebp
movl n, %eax
addl $1, %eax
movl %eax, n
popl %ebp
ret
.size nplusone, .-nplusone
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
Output assembly (with -O2 optimizations):
.file "test.c"
.text
.p2align 4,,15
.globl nplusplus
.type nplusplus, @function
nplusplus:
pushl %ebp
movl %esp, %ebp
addl $1, n
popl %ebp
ret
.size nplusplus, .-nplusplus
.p2align 4,,15
.globl nplusone
.type nplusone, @function
nplusone:
pushl %ebp
movl %esp, %ebp
addl $1, n
popl %ebp
ret
.size nplusone, .-nplusone
.comm n,4,4
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
The compiler will optimize a bare n + 1 into nothingness, since its value is never used.
Do you mean n = n + 1?
If so, they will compile to identical assembly. (Assuming that optimizations are on and that they're statements, not expressions)
Who says it does? Your compiler optimizes it all away, really, making it a moot point.
Modern compilers should be able to recognize both forms as equivalent and convert them to the format that works best on your target platform. There is one exception to this rule: variable accesses that have side effects. For example, if n is some memory-mapped hardware register, reading from it and writing to it may do more than just transferring a data value (reading might clear an interrupt, for instance). You would use the volatile keyword to let the compiler know that it needs to be careful about optimizing accesses to n, and in that case the compiler might generate different code from n++ (increment operation) and n = n + 1 (read, add, and store operations). However for normal variables, the compiler should optimize both forms to the same thing.
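A sketch of that memory-mapped case (the address and register name here are made up for illustration):
#include <stdint.h>
#define STATUS_REG (*(volatile uint32_t *)0x40001000u) /* hypothetical memory-mapped hardware register */
void bump_status(void) {
    /* Because of volatile, the compiler must emit a real read and a real write
       here, and may not cache the value or merge accesses; whether it uses a
       single read-modify-write instruction or separate load/add/store can
       differ between n++ and n = n + 1, as described above. */
    STATUS_REG = STATUS_REG + 1;
}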
It doesn't really. The compiler will make changes specific to the target architecture. Micro-optimizations like this often have dubious benefits, but importantly, are certainly not worth the programmer's time.
Actually, the reason is that the operator is defined differently for postfix than for prefix. ++n will increment "n" and return a reference to "n", while n++ will increment "n" while returning a const copy of "n". Hence the phrase n = n + 1 will be more efficient. But I have to agree with the posters above: a good compiler should optimize away an unused return object.
In the C language, the side effect of the n++ expression is by definition equivalent to the side effect of the n = n + 1 expression. Since your code relies on the side effects only, it is immediately obvious that the correct answer is that these expressions always have exactly equivalent performance. (Regardless of any optimization settings in the compiler, BTW, since the issue has absolutely nothing to do with any optimizations.)
Any practical divergence in performance of these expressions is only possible if the compiler is intentionally (and maliciously!) trying to introduce that divergence. But in this case it can go either way, of course, i.e. whichever way the compiler's author wanted to skew it.
I think it's more of a hardware question than a software one... If I remember correctly, on older CPUs n = n + 1 required two memory locations, whereas ++n was simply a single instruction... But I doubt this applies to modern architectures...
All those things depend on the compiler, the processor, and the compilation directives, so making any assumptions about "what is faster in general" is not a good idea.