Can a C compiler ever optimize a loop by running it?
For example:
int num[] = {1, 2, 3, 4, 5}, i;
for (i = 0; i < sizeof(num) / sizeof(num[0]); i++) {
    if (num[i] > 6) {
        printf("Error in data\n");
        exit(1);
    }
}
Instead of running this each time the program is executed, can the compiler simply run this and optimize it away?
Let's have a look… (This really is the only way to tell.)
First, I've converted your snippet into something we can actually compile and run, saved in a file named main.c.
#include <stdio.h>

static int
f()
{
    const int num[] = {1, 2, 3, 4, 5};
    int i;
    for (i = 0; i < sizeof(num) / sizeof(num[0]); i++)
    {
        if (num[i] > 6)
        {
            printf("Error in data\n");
            return 1;
        }
    }
    return 0;
}

int
main()
{
    return f();
}
Running gcc -S -O3 main.c produces the following assembly file (in main.s).
	.file	"main.c"
	.section	.text.unlikely,"ax",@progbits
.LCOLDB0:
	.section	.text.startup,"ax",@progbits
.LHOTB0:
	.p2align 4,,15
	.globl	main
	.type	main, @function
main:
.LFB22:
	.cfi_startproc
	xorl	%eax, %eax
	ret
	.cfi_endproc
.LFE22:
	.size	main, .-main
	.section	.text.unlikely
.LCOLDE0:
	.section	.text.startup
.LHOTE0:
	.ident	"GCC: (GNU) 5.1.0"
	.section	.note.GNU-stack,"",@progbits
Even if you don't know assembly, you'll notice that the string "Error in data\n" is not present in the file so, apparently, some kind of optimization must have taken place.
If we look closer at the machine instructions generated for the main function,
	xorl	%eax, %eax
	ret
we can see that all it does is XOR the EAX register with itself (which always yields zero) and then return. EAX holds the function's return value, so main simply returns 0. In other words, the whole f function was optimized away.
Yes, compilers can unroll and even fully evaluate loops at higher optimization levels (GCC's -O3, for example, or -Otime on ARM's compiler).
You didn't specify the compiler, but using gcc with -O3, and with the size calculation hoisted out of the for condition, it may well make this kind of adjustment.
Compilers can do even better than that. Not only can compilers examine the effect of running code "forward", but the Standard even allows them to work code logic in reverse in situations involving potential Undefined Behavior. For example, given:
#include <stdio.h>
int main(void)
{
    int ch = getchar();
    int q;
    if (ch == 'Z')
        q = 5;
    printf("You typed %c and the magic value is %d", ch, q);
    return 0;
}
a compiler would be entitled to assume that the program will never receive any input which would cause the printf to be reached without q having received a value; since the only input character which would cause q to receive a value would be 'Z', a compiler could thus legitimately replace the code with:
int main(void)
{
    getchar();
    printf("You typed Z and the magic value is 5");
}
If the user types Z, the behavior of the original program will be well-defined, and the behavior of the latter will match it. If the user types anything else, the original program will invoke Undefined Behavior and, as a consequence, the Standard will impose no requirements on what the compiler may do. A compiler will be entitled to do anything it likes, including producing the same result as would be produced by typing Z.
Related
I need to compare some char * (which I know the length of) with some string literals. Right now I am doing it like this:
void do_something(char * str, int len) {
    if (len == 2 && str[0] == 'O' && str[1] == 'K' && str[2] == '\0') {
        // do something...
    }
}
The problem is that I have many comparisons like this to make and it's quite tedious to break apart and type each of these comparisons. Also, doing it like this is hard to maintain and easy to introduce bugs.
My question is if there is shorthand to type this (maybe a MACRO).
I know there is strncmp and I have seen that GCC optimizes it. So, if the shorthand is to use strncmp, like this:
void do_something(char * str, int len) {
    if (len == 2 && strncmp(str, "OK", len) == 0) {
        // do something...
    }
}
Then, I would like to know if the second example has the same (or better) performance as the first one.
Yes, it will. However, your code is not comparing a char * to a string literal. It is comparing two string literals. The compiler is smart enough to spot this and optimize all the code away; only the code inside the if block remains.
We can see this by looking at the assembly code generated by the compiler:
cc -S -std=c11 -pedantic -O3 test.c
First with your original code...
#include <stdio.h>
#include <string.h>
int main() {
    unsigned int len = 2;
    char * str = "OK";
    if (len == 2 && strncmp(str, "OK", len) == 0) {
        puts("Match");
    }
}
Then with just the puts.
#include <stdio.h>
#include <string.h>
int main() {
    //unsigned int len = 2;
    //char * str = "OK";
    //if (len == 2 && strncmp(str, "OK", len) == 0) {
        puts("Match");
    //}
}
The two assembly files are practically the same. No trace of the strings remains, only the puts.
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 14	sdk_version 10, 14
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	leaq	L_.str(%rip), %rdi
	callq	_puts
	xorl	%eax, %eax
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"Match"
	.subsections_via_symbols
This is a poor place to focus on optimization. String comparison against small strings is very unlikely to be a performance problem.
Furthermore, your proposed optimization is likely slower. You need to get the length of the input string, and that requires walking the full length of the input string. Maybe you need that length for other reasons, but it's increasingly an edge case.
Whereas strncmp can stop as soon as it sees unequal characters. And it definitely only has to read up to the end of the smallest string.
Your example implies that your strings are always NUL terminated. In that case, don't bother getting their length ahead of time, since that involves searching for the NUL. Instead, you can do
memcmp(str, "OK", 3);
This way, the NULs get compared too. If your length is > 2, the result will be > 0 and if it's shorter, the result will be < 0.
This is a single function call, and memcmp is virtually guaranteed to be better optimized than your hand-written code. At the same time, don't bother optimizing unless you find this code to be a bottleneck. Keep in mind also that any benchmark I run on my machine will not necessarily apply to yours.
The only real reason to make this change is for readability.
Currently studying C. When I define, for example, a vector, such as:
float var1[2023] = {-53.3125};
What would the corresponding X86 Assembly translation look like? I'm looking for the exact portion of code where the variable is defined, where the ".type" and ".size" and alignment values are mentioned.
I've seen on the internet that when dealing with a floating-point number, the x86 assembly conversion will simply be ".long". However, I'm not sure to what extent that is correct.
One easy way to find out is to ask the compiler to show you:
// float.c
float var1[2023] = { -53.3125 };
then compile it:
$ gcc -S float.c
and then study the output:
	.file	"float.c"
	.globl	var1
	.data
	.align	32
	.type	var1, @object
	.size	var1, 8092
var1:
	.long	3260366848
	.zero	8088
	.ident	"GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-39)"
	.section	.note.GNU-stack,"",@progbits
Note that this is just GCC's implementation; clang does it differently:
	.file	"float.c"
	.type	var1,@object            # @var1
	.data
	.globl	var1
	.align	16
var1:
	.long	3260366848              # float -5.331250e+01
	.long	0                       # float 0.000000e+00
	.long	0                       # float 0.000000e+00
	# ... thousands of these ...
	.size	var1, 8092
	.ident	"clang version 3.4.2 (tags/RELEASE_34/dot2-final)"
	.section	".note.GNU-stack","",@progbits
EDIT - To answer the comment below, the use of .long simply lays down a specific bit pattern that encodes the compiler's idea of the floating-point format.
The value 3260366848 is the same as hex 0xC2554000, which is 11000010010101010100000000000000 in binary, and it's the binary value that the CPU cares about. If you care to, you can get out your IEEE floating point spec and decode this, there's the sign, that's the exponent, etc. but all the details of the floating point encoding were handled by the compiler, not the assembler.
I'm no kind of compiler expert, but decades ago I was tracking down a bug in a C compiler's floating-point support, and though I don't remember the details, it strikes me that having the compiler emit these annotations would have been helpful, saving me from having to use a disassembler to find out what bit pattern was actually encoded.
Surely others will weigh in here.
EDIT2 Bits are bits, and this little C program (which relies on int and float having the same size) demonstrates this:
// float2.c
#include <stdio.h>
#include <string.h>

int main()
{
    float f = -53.3125;
    unsigned int i;
    printf("sizeof int = %zu\n", sizeof(i));
    printf("sizeof flt = %zu\n", sizeof(f));
    memcpy(&i, &f, sizeof i);   // copy the float's bits into an int
    printf("float = %f\n", f);
    printf("i = 0x%08x\n", i);
    printf("i = %u\n", i);
    return 0;
}
Running it shows that bits are bits:
sizeof int = 4
sizeof flt = 4
float = -53.312500
i = 0xc2554000
i = 3260366848 <-- there ya go
These are just different displays of the same 32 bits, depending on how you look at them.
Now to answer the question of how would you determine 3260366848 on your own from the floating point value, you'd need to get out your IEEE standard and draw out all the bits manually (recommend strong coffee), then read those 32 bits as an integer.
I am new to C, so forgive me if this query is basic.
I want to call main() from another function, and make the program run infinitely. The code is here:
#include <stdio.h>

void message();

int main()
{
    message();
    return 0;
}

void message()
{
    printf("This is a test message. \n");
    main();
}
I expect to see this program run infinitely. However, it runs for some time and then stops suddenly. Using a counter variable, which I printed alongside the test message, I found that the statement "This is a test message." is printed 174608 times, after which I get an error message
Segmentation fault (core dumped)
and the program terminates. What does this error mean? And why does the program only run 174608 times (why not infinitely)?
You have a stack overflow from infinite recursion. Make an infinite loop in main instead:
int main()
{
    while (1)
    {
        //...
    }
}
The mutual recursion costs stack space. But if you put the recursion in main() itself, the compiler may recognise the tail recursion and replace it with iteration (for fun and education; don't try this at home, children):
#include <stdio.h>

void message();

int main()
{
    message();
    return main();
}

void message()
{
    printf("This is a test message. \n");
}
GCC recognises the tail recursion at optimisation level 2 (-O2) and above.
main.s output for gcc -O2 -S main.c:
	.p2align 4,,15
.globl main
	.type	main, @function
main:
	pushl	%ebp
	movl	%esp, %ebp
	andl	$-16, %esp
	.p2align 4,,7
	.p2align 3
.L4:
	call	message
	jmp	.L4
	.size	main, .-main
	.ident	"GCC: (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3"
	.section	.note.GNU-stack,"",@progbits
Note, though, that this is not guaranteed to be equivalent to while(1) {...} or for(;;) {...}: the C standard does not require tail-call optimisation, so without it each return main() would still consume stack. The loop forms are the reliable way to get an infinite loop.
Every time a function (for example, main() or message()) is called, some values are pushed onto the stack. When functions are called too many times without returning, the stack fills up and finally overflows, giving you a "stack overflow" error.
Note that this error has nothing to do with this site, although the two happen to share a name :)
I wrote some C code that converts an integer to a string, inserting a comma every 3 digits. Can anyone give me a hint on how to convert it to assembly language?
I just want to convert it into assembly directly; I can't use other library calls!
#include <stdio.h>

char *my_itoa(int n, char *buf)
{
    int i, j, k = 0, l = 0;
    char tmp[32] = {0};

    i = n;
    do {
        j = i % 10;
        i = i / 10;
        sprintf(tmp + k, "%d", j);
        k++;
        l++;
        if (i != 0 && l % 3 == 0) {
            sprintf(tmp + k, ",");
            k++;
            l = 0;
        }
    } while (i);
    for (k--, i = 0; i <= k; i++) {
        buf[i] = tmp[k - i];
    }
    buf[i] = '\0';   /* terminate the output string */
    return buf;
}
If your compiler is gcc, this answer suggests a nice way to produce a combined C/assembly listing that is easier to read than plain assembly. They use c++, but just replace c++ with gcc:
# create assembler code:
gcc -S -fverbose-asm -g original.c -o assembly.s
# create asm interlaced with source lines:
as -alhnd assembly.s > listing.lst
C libraries conforming to SUSv2 (the Single UNIX Specification version 2, published in 1997) can do what you want much more simply, via the apostrophe flag in the printf conversion specification. Here's example C code that works with gcc:
#include <stdio.h>
#include <locale.h>

int main()
{
    int num = 12345678;
    setlocale(LC_NUMERIC, "en_US");
    printf("%'d\n", num);
    return 0;
}
This prints 12,345,678. Note, too, that this works correctly with negative numbers while your code does not. Because the code is simply a single library call, it's probably not worth implementing in assembly language, but if you still cared to, you could use this:
In the .rodata section include this string constant:
.fmt:
.string "%'d\n"
The code is extremely simple because all the hard work is done within printf:
# number to be converted is in %esi
movl $.fmt, %edi
movl $0, %eax
call printf
I know in some languages the following:
a += b
is more efficient than:
a = a + b
because it removes the need for creating a temporary variable. Is this the case in C? Is it more efficient to use += (and, therefore also -= *= etc)
So here's a definitive answer...
$ cat junk1.c
#include <stdio.h>
int main()
{
    long a, s = 0;
    for (a = 0; a < 1000000000; a++)
    {
        s = s + a * a;
    }
    printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ cat junk2.c
#include <stdio.h>
int main()
{
    long a, s = 0;
    for (a = 0; a < 1000000000; a++)
    {
        s += a * a;
    }
    printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ for a in *.c ; do gcc -O3 -o ${a%.c} $a ; done
michael@isolde:~/junk$ time ./junk1
Final sum: 3338615082255021824
real 0m2.188s
user 0m2.120s
sys 0m0.000s
michael@isolde:~/junk$ time ./junk2
Final sum: 3338615082255021824
real 0m2.179s
user 0m2.120s
sys 0m0.000s
...for my computer and my compiler running on my operating system. Your results may or may not vary. On my system, however, the time is identical: user time 2.120s.
Now just to show you how impressive modern compilers can be, you'll note that I used the expression a * a in the assignment. This is because of this little problem:
$ cat junk.c
#include <stdio.h>
int main()
{
    long a, s = 0;
    for (a = 0; a < 1000000000; a++)
    {
        s = s + a;
    }
    printf("Final sum: %ld\n", s);
}
michael@isolde:~/junk$ gcc -O3 -S junk.c
michael@isolde:~/junk$ cat junk.s
	.file	"junk.c"
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"Final sum: %ld\n"
	.text
	.p2align 4,,15
.globl main
	.type	main, @function
main:
.LFB22:
	.cfi_startproc
	movabsq	$499999999500000000, %rdx
	movl	$.LC0, %esi
	movl	$1, %edi
	xorl	%eax, %eax
	jmp	__printf_chk
	.cfi_endproc
.LFE22:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
	.section	.note.GNU-stack,"",@progbits
The compiler figured out my loop and evaluated it entirely at compile time, reducing the cumulative sum to a constant which it simply prints, skipping any kind of looping construct entirely. In the face of optimizers that clever, do you really think you're going to find any meaningful edge in distinguishing between s = s + a and s += a?!
This is a compiler specific question really, but I expect all modern compilers would give the same result. Using Visual Studio 2008:
int main() {
    int a = 10;
    int b = 30;
    a = a + b;
    int c = 10;
    int d = 50;
    c += d;
}
The line a = a + b has disassembly
0014139C mov eax,dword ptr [a]
0014139F add eax,dword ptr [b]
001413A2 mov dword ptr [a],eax
The line c += d has disassembly
001413B3 mov eax,dword ptr [c]
001413B6 add eax,dword ptr [d]
001413B9 mov dword ptr [c],eax
This is the same: both are compiled into identical code.
It depends on what a is.
a += b in C is by definition equivalent to a = a + b, except that from the abstract point of view a is evaluated only once in the former variant. If a is a "pure" value, i.e. if evaluating it once vs. evaluating it many times makes no impact on the program's behavior, then a += b is strictly equivalent to a = a + b in all regards, including efficiency.
In other words, in situations when you actually have a free choice between a += b and a = a + b (meaning that you know that they do the same thing) they will generally have exactly the same efficiency. Some compilers might have difficulties when a stands for a function call (for one example; probably not what you meant), but when a is a non-volatile variable the machine code generated for both expressions will be the same.
For another example, if a is a volatile variable, then a += b and a = a + b have different behavior and, therefore, different efficiency. However, since they are not equivalent, your question simply does not apply in such cases.
In the simple cases shown in the question, there is no significant difference. Where the assignment operator scores is when you have an expression such as:
s[i]->m[j1].k = s[i]->m[jl].k + 23; // Was that a typo?
vs:
s[i]->m[j1].k += 23;
Two benefits - and I'm not counting less typing. There's no question about whether there was a typo when the first and second expressions differ; and the compiler doesn't evaluate the complex expression twice. The chances are that won't make much difference these days (optimizing compilers are a lot better than they used to be), but you could have still more complex expressions (evaluating a function defined in another translation unit, for example, as part of the subscripting) where the compiler may not be able to avoid evaluating the expression twice:
s[i]->m[somefunc(j1)].k = s[i]->m[somefunc(j1)].k + 23;
s[i]->m[somefunc(j1)].k += 23;
Also, you can write (if you're brave):
s[i++]->m[j1++].k += 23;
But you cannot write:
s[i++]->m[j1++].k = s[i]->m[j1].k + 23;
s[i]->m[j1].k = s[i++]->m[j1++].k + 23;
(or any other permutation) because the order of evaluation is not defined.
a += b
is more efficient than
a = a + b
because the former takes you 6 keystrokes and the latter takes you 9 keystrokes.
With modern hardware, even if the compiler is stupid and uses slower code for one than the other, the total time saved over the lifetime of the program may possibly be less than the time it takes you to type the three extra key strokes.
However, as others have said, the compiler almost certainly produces exactly the same code so the former is more efficient.
Even if you factor in readability, most C programmers probably mentally parse the former more quickly than the latter because it is such a common pattern.
In virtually all cases, the two produce identical results.
Other than with a truly ancient or incompetently written compiler, there should be no difference as long as a and b are normal variables; the two produce equivalent results.
If you were dealing with C++ rather than C, operator overloading would allow there to be more substantial differences though.