Why do programs run slower on the first run after compiling?

I have this simple hello world program:
#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}
I compiled this program with LLVM Clang (v15.0.1, built from Homebrew, so not Apple's version) as usual, then ran and timed it. To my surprise, the first execution took nearly 10x longer than the second, and the next three executions ran much faster.
$ clang test.c -o test
$ time ./test
Hello, world!
real 0m0.169s
user 0m0.001s
sys 0m0.002s
$ time ./test
Hello, world!
real 0m0.017s
user 0m0.001s
sys 0m0.006s
$ time ./test
Hello, world!
real 0m0.004s
user 0m0.001s
sys 0m0.002s
$ time ./test
Hello, world!
real 0m0.008s
user 0m0.001s
sys 0m0.005s
I'm running this on an Intel Core i5 Mac with macOS Big Sur v11.6.8. The shell is the bash shipped with macOS.
Nothing in my code deals with time, and I don't think there's anything to cache, so I'm not sure why the first execution runs so slowly. I suspect that the OS might be doing some kind of optimization, but I don't know what or how. What is the cause of this large discrepancy in runtimes?

Related

How to run a C project in the console

Hello, I have a project written in C that contains four .c files in different directories. These are ipv4_lib.c, udp_lib.c, projekt_C.c, and programLib.c.
I've written it in Eclipse, where everything works fine and it's easy to run, but now I have to run it from the console. I have already run programs from the console, but they were much simpler and usually contained one or two headers in the same directory, so all I had to do was compile each file and run the main one.
But I have no idea how to build and run this project. Is there some command to do that, or something like that? Thanks
Look into GCC and Clang:
GCC: https://en.wikipedia.org/wiki/GNU_Compiler_Collection
Clang: https://en.wikipedia.org/wiki/Clang, https://clang.llvm.org/
Both are compilers you can use to compile your source code from the terminal. GCC has been around longer and has broader support, but can be a little slow. Clang is newer and therefore less widely used, but it is noticeably faster than GCC at compiling source code.
An example Clang command in your terminal:
clang -o hello hello.c && ./hello
This will compile your hello.c file and give you a hello executable that you can then run. We'll just assume that running the hello program prints Hello, World! to the console.
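Since your four .c files live in different directories, you can pass all of them to the compiler in a single command and add each directory to the header search path with -I. A minimal sketch, assuming hypothetical directory names net/, udp/, and lib/ (adjust the paths to your actual layout):
clang -o projekt \
    net/ipv4_lib.c udp/udp_lib.c lib/programLib.c projekt_C.c \
    -Inet -Iudp -Ilib
./projekt
The same flags work with gcc. For anything larger than a handful of files, a small Makefile saves you from retyping the command.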

How to compile a C program

I am learning C and I have a simple hello world program that I am trying to run on Windows 10. Here is the code:
#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}
I have installed the GCC compiler and tried the following in order to run it in the command prompt:
gcc hello.c
a
I also tried:
gcc hello.c
./a.exe
and:
gcc hello.c
./a
and:
gcc hello.c -o hello
./hello
The program does not run and display hello, world; instead it gives the following error:
bash: a.exe: command not found
What am I doing wrong and how can I run the program after the compilation?
It appears that your compilation succeeded.
See if there is an a.out or a.exe file present, since you didn't specify a non-default executable name.
Note that running a alone typically won't do anything, because it is highly unlikely that your executable is in bash's PATH. This means you need to run ./a.out or ./a.exe (depending on the operating system).
Binary executables under Windows typically must have a .exe extension to be recognized as such.
I am not sure whether gcc under Windows adds the right extension automatically when outputting executables.
I would try:
gcc hello.c -o hello.exe
./hello.exe

Why is there drastic performance degradation when running a command as a different user?

I have a hello world C example, ./a.out.
Now I measured the execution time of the commands below using time:
time ./a.out
Hello World
real 0m0.001s
user 0m0.000s
sys 0m0.002s
time runuser -l root -c './a.out'
real 0m0.017s
user 0m0.004s
sys 0m0.011s
time su -s /bin/bash -c "./a.out" root
Hello World
real 0m0.080s ---> 80 times slower
user 0m0.005s
sys 0m0.071s
Why is the third command 80 times slower than the first command?
Environment: Red Hat 7
With the second and third commands, time also measures the launch of runuser, su, and bash, which takes some time as well.
There shouldn't be as much of a difference if you do:
$ runuser -l root -c 'time ./a.out'
and:
$ su -s /bin/bash -c "time ./a.out" root
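Another way to see the distinction is to time the work from inside the program itself, so the cost of launching runuser, su, and bash cannot be counted at all. A minimal C sketch using POSIX clock_gettime (an illustration of the idea, not code from the original post; on very old glibc you may need to link with -lrt):
#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec start, end;
    /* Start the clock inside the process, after the shell and
       su/runuser setup have already finished. */
    clock_gettime(CLOCK_MONOTONIC, &start);

    printf("Hello World\n");

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    fprintf(stderr, "in-process time: %.6f s\n", elapsed);
    return 0;
}
Whatever gap remains between this number and the outer time output is launcher overhead.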

Why is the gcc math library so inefficient?

When I was porting some Fortran code to C, it surprised me that most of the execution time discrepancy between the Fortran program compiled with ifort (the Intel Fortran compiler) and the C program compiled with gcc comes from the evaluation of trigonometric functions (sin, cos). It surprised me because I used to believe what this answer explains, that functions like sine and cosine are implemented in microcode inside microprocessors.
In order to spot the problem more explicitly, I made a small test program in Fortran:
program ftest
  implicit none
  real(8) :: x
  integer :: i

  x = 0d0
  do i = 1, 10000000
    x = cos (2d0 * x)
  end do
  write (*,*) x
end program ftest
On an Intel Q6600 processor running 3.6.9-1-ARCH x86_64 Linux, with ifort version 12.1.0 I get:
$ ifort -o ftest ftest.f90
$ time ./ftest
-0.211417093282753
real 0m0.280s
user 0m0.273s
sys 0m0.003s
while with gcc version 4.7.2 I get
$ gfortran -o ftest ftest.f90
$ time ./ftest
0.16184945593939115
real 0m2.148s
user 0m2.090s
sys 0m0.003s
This is almost a factor of 10 difference! Can I still believe that the gcc implementation of cos is a wrapper around the microprocessor implementation, in a similar way as is probably done in the Intel implementation? If this is true, where is the bottleneck?
EDIT
According to the comments, enabling optimizations should improve the performance. My opinion was that optimizations do not affect the library functions ... which does not mean that I don't use them in nontrivial programs. However, here are two additional benchmarks (now on my home computer, an Intel Core 2):
$ gfortran -o ftest ftest.f90
$ time ./ftest
0.16184945593939115
real 0m2.993s
user 0m2.986s
sys 0m0.000s
and
$ gfortran -Ofast -march=native -o ftest ftest.f90
$ time ./ftest
0.16184945593939115
real 0m2.967s
user 0m2.960s
sys 0m0.003s
Which particular optimizations did you (commenters) have in mind? And how can the compiler exploit a multi-core processor in this particular example, where each iteration depends on the result of the previous one?
EDIT 2
The benchmark tests of Daniel Fisher and Ilmari Karonen made me think that the problem might be related to the particular version of gcc (4.7.2), and maybe to the particular build of it (Arch x86_64 Linux), that I am using on my computers. So I repeated the test on an Intel Core i7 box with Debian x86_64 Linux, gcc version 4.4.5, and ifort version 12.1.0:
$ gfortran -O3 -o ftest ftest.f90
$ time ./ftest
0.16184945593939115
real 0m0.272s
user 0m0.268s
sys 0m0.004s
and
$ ifort -O3 -o ftest ftest.f90
$ time ./ftest
-0.211417093282753
real 0m0.178s
user 0m0.176s
sys 0m0.004s
For me this is a perfectly acceptable performance difference, which would never have made me ask this question. It seems that I will have to ask about this issue on the Arch Linux forums.
However, an explanation of the whole story is still very welcome.
Most of this is due to differences in the math library. Some points to consider:
Yes, x86 processors with an x87 unit have fsin and fcos instructions. However, they are implemented in microcode, and there is no particular reason why they must be faster than a pure software implementation.
GCC does not have its own math library, but rather uses the one provided by the system. On Linux this is typically glibc.
32-bit x86 glibc uses fsin/fcos.
x86_64 glibc uses software implementations built on the SSE2 unit. For a long time, these were a lot slower than the 32-bit glibc version, which just used the x87 instructions. However, improvements have (somewhat recently) been made, so depending on which glibc version you have, the situation might not be as bad as it used to be.
The Intel compiler suite is blessed with a VERY fast math library (libimf). Additionally, it includes vectorized transcendental math functions, which can often further speed up loops with these functions.
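For anyone who wants to reproduce the measurement without a Fortran compiler, here is a C translation of the benchmark loop (my own sketch, not code from the original post); it exercises the same libm cos code path:
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 0.0;
    int i;
    /* Each iteration depends on the previous result, so the loop
       cannot be parallelized away and mostly measures libm's cos(). */
    for (i = 0; i < 10000000; i++)
        x = cos(2.0 * x);
    printf("%.17g\n", x);
    return 0;
}
Compile with gcc -O2 ctest.c -o ctest -lm and time it the same way as the Fortran version.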

Cross-compile issue with a simple hello program

Background: trying to set up a cross-compiler environment for an ARM target (TQ2440/Mini2440).
On HOST running Red Hat:
Wrote a simple hello program
gcc -o hello hello.c
compiles successfully
./hello
displays the hello world message
rm hello
arm-linux-gcc -o hello hello.c
file hello
It says: 32-bit, compiled for ARM, compatible with Linux 2.0.0.
Transfer the "hello" binary file to TARGET
chmod a+x hello
./hello
The problem:
/bin/sh: ./hello: not found
Can anyone point out my mistake, or tell me what I am missing here?
I executed ldd on the host (ldd hello) and got:
/usr/local/arm/3.3.2/bin/ldd: line 1:
/usr/local/arm/3.3.2/lib/ld-linux.so.2: cannot execute binary file
/usr/local/arm/3.3.2/bin/ldd: line 1:
/usr/local/arm/3.3.2/lib/ld-linux.so.2: cannot execute binary file
ldd: /usr/local/arm/3.3.2/lib/ld-linux.so.2 exited with unknown exit code (126)
Solved.
I was transferring the file through FTP. You need to enter bin to switch to binary transfer mode. It works fine now.
Try running ldd hello and see if it complains about any missing dynamic libraries.
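If ldd does report missing libraries on the target (rather than the binary-mode FTP issue described above), one common workaround on small embedded systems is to link statically, so the binary carries no runtime dependencies:
arm-linux-gcc -static -o hello hello.c
The resulting file is much larger, but it runs without the toolchain's shared libraries having to be present on the target.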
