Are programs compiled with gcc optimised by default? - c

While at university I learned that the compiler optimises our code so that the executable runs faster. For example, when a variable is not used after a certain point, it will not be calculated.
So, as far as I understand, that means that if I have a program that calls a sorting algorithm and the results of the algorithm are printed, then the algorithm will run. However, if nothing is printed (or used anywhere else), then there is no reason for the program to even make that call.
So, my question is:
Do these things (optimisations) happen by default when compiling with gcc, or only when the code is compiled with the -O1, -O2, or -O3 flags?

When you meet a new program for the first time, it is helpful to type man followed by the program name. When I did it for gcc, it showed me this:
Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.
...
-O0 Reduce compilation time and make debugging produce the expected results. This is the default.
To summarize, with -O0, all code that is in the execution path that is taken will actually execute. (Program text that can never be in any execution path, such as if (false) { /* ... */ }, may not generate any machine code, but that is unobservable.) The executed code will feel "as expected", i.e. it'll do what you wrote. That's the goal, at least.
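A quick way to see the difference for yourself is to compare the assembly GCC generates with and without optimisation for a call whose result is never used. A minimal sketch (the file and function names are placeholders of mine, not from the question):
/* unused_call.c - hypothetical example */
#include <stddef.h>

static int sum(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

int main(void)
{
    int data[] = { 3, 1, 2 };
    sum(data, 3);   /* result is never printed or otherwise used */
    return 0;
}
With gcc -S -O0 unused_call.c the call to sum typically still appears in the generated assembly, while gcc -S -O2 unused_call.c is usually able to drop it entirely, since the result is unused and the function has no side effects.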

Related

Can I edit lines of code using gdb, and is it also possible to save to the actual source file and header file while in the same debug session? (Linux)

I have this program called parser that I compiled with the -g flag. This is my makefile:
parser: header.h parser.c
	gcc -g header.h parser.c -o parser

clean:
	rm -f parser a.out
The code for one function in parser.c is:
int _find(char *html, struct html_tag **obj)
{
    char temp[strlen("<end") + 1];
    memcpy(temp, "<end", strlen("<end") + 1);
    ...
    ...
    return 0;
}
What I would like to see when I debug the parser is this: can I also change lines of code after hitting a breakpoint, while stepping with n through the code of the above function? If that is not the job of gdb, is there any open-source solution for actually changing the code (and possibly saving it), so that when I step to the next statement the changed statement (for example, a different index into an array) executes instead? Is there an open-source tool for this, or can it be done in gdb, perhaps with some compile options?
I know I can assign values to variables at runtime in gdb, but is that it? Is there anything like actually being able to change the source as well?
Most C implementations are compiled. The source code is analyzed and translated to processor instructions. This translation would be difficult to do on a piecewise basis. That is, given some small change in the source code, it would be practically impossible to update the executable file to represent those changes. As part of the translation, the compiler transforms and intertwines statements, assigns processor registers to be used for computing parts of expressions, designates places in memory to hold data, and more. When source code is changed slightly, this may result in a new compilation happening to use a different register in one place or needing more or less memory in a particular function, which results in data moving back or forth. Merging these changes into the running program would require figuring out all the differences, moving things in memory, rearranging what is in what processor register, and so on. For practical purposes, these changes are impossible.
GDB does not support this.
(Apple's developer tools may have some feature like this. I saw it demonstrated for the Swift programming language but have not used it.)
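What gdb does let you do, as you already noted, is change data rather than code while the program is stopped at a breakpoint. A rough sketch of such a session with your parser binary (the particular variable and value modified here are only illustrative):
gdb ./parser
(gdb) break _find
(gdb) run
(gdb) print html
(gdb) set var html[0] = 'X'
(gdb) next
Here set var changes the value the running program sees, but the compiled machine code and the source files on disk stay exactly as they were; to run modified source you have to edit the file, recompile, and start a new debug session.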

IOCCC 1988/isaak.c - why no output even after ANSIfication?

The carefully crafted, self-including code in this IOCCC winning entry from 1988:
http://www.ioccc.org/years.html#1988_isaak
...was still too much for certain systems back then. Also, ANSI C was finally emerging as a stable alternative to the chaotic K&R ecosystem. As a result, the IOCCC judges also provided an ANSI version of this entry:
http://www.ioccc.org/1988/isaak.ansi.c
Its main attraction is its gimmick of including <stdio.h> in the last line (!) with well-thought-out #defines, both inside the source and at compile time, to only allow certain parts of the code into the right level. This is what allows the <stdio.h> header to be ultimately included at the latest stage possible, just before it is necessary, in the source fed to the compiler.
However, this version still fails to produce its output when compiled today, with the provided compiler settings:
gcc -std=c89 -DI=B -DO=- -Dy isaak.ansi.c
tcc -DI=B -DO=- -Dy isaak.ansi.c
Versions used: GCC 9.3.0, TCC 0.9.27
There isn't any evident reliance on the compiled binary filename, hence I left it to the compiler's choice. Even when using -o isaak or -o isaak.ansi, the same result happens: no output.
What is causing this? How are the output functions failing? What can be done to correct this?
NOTE: The IOCCC judges, realising that this entry had portability issues that would detract from its obfuscation value, decided to also include a UUENCODEd version of the code's output:
http://www.ioccc.org/1988/isaak.encode
There is nothing remotely portable about this program. As far as I can see, it tries to overwrite the exit standard library function with its own code, expecting that returning from an empty main() would call that exit(), which is not true. And even then, such behaviour is not standard-conforming - even C89 said it would be undefined behaviour.
You can "fix" the program on modern GCC / Linux by actually calling exit(); inside main - just change the first line to
main(){exit(0);}
I compiled with gcc -std=c89 -DI=B -DO=- -Dy isaak.ansi.c, ran ./a.out, and got sensible output.

Order of arguments for C bitwise operations?

I've gotten a piece of software working, and am now trying to tune it up so it runs faster. I discovered something that struck me as, well, just bizarre. It's no longer relevant, because I switched to using a pointer instead of indexing an array (it's faster with the pointers), but I'd still like to know what is going on.
Here's the code:
short mask_num_vals(short mask)
{
    short count = 0;
    for (short val = 0; val < NUM_VALS; val++)
        if (mask & val_masks[val])
            count++;
    return count;
}
This small piece of code is called many many times. What really surprised me is that this code runs significantly faster than its predecessor, which simply had the two arguments to the "&" operation reversed.
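For reference, the earlier version presumably differed only in the operand order of the &, something like this (reconstructed from the description above, not the original code):
if (val_masks[val] & mask)   /* operands reversed relative to the version above */
    count++;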
Now, I would have thought the two versions would be, for all practical purposes, identical, and they do produce the same result. But the version above is faster - noticeably faster. It makes about a 5% difference in the running time of the overall code that uses it. My attempt to measure the amount of time spent in the function above failed completely - measuring the time used up far more time than actually executing the rest of the code. (A version of Heisenberg's principle for software, I guess.)
So my picture here is, the compiled code evaluates the two arguments, and then does a bitwise "and" on them. Who cares which order the arguments are in? Apparently the compiler or the computer does.
My completely unsupported conjecture is that the compiled code must be evaluating "val_masks[val]" for each bit. If "val_masks[val]" comes first, it evaluates it for every bit, if "mask" comes first, it doesn't bother with "val_masks[val]" if that particular bit in "mask" is zero. I have no evidence whatsoever to support this conjecture; I just can't think of anything else that might cause this behaviour.
Does this seem likely? This behaviour just seemed weird to me, and I think it points to some difference between my picture of how the compiled code works and how it actually works. Again, it's not all that relevant any more, as I've since evolved the code further (using pointers instead of arrays). But I'd still be interested in knowing what is causing this.
Hardware is an Apple MacBook Pro 15-inch 2018, MacOS 10.15.5. Software is gcc compiler, and "gcc --version" produces the following output.
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.3 (clang-1103.0.32.62)
Target: x86_64-apple-darwin19.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Compiled with the command "gcc -c -Wall 'C filename'", linked with "gcc -o -Wall 'object filenames'".
Code optimizers are often unpredictable. Their output can change after small meaningless tweaks in code, or after changing command-line options, or after upgrading the compiler. You cannot always explain why the compiler does some optimization in one case but not in another; you can guess all you want, but only experience can show.
One powerful technique in determining what is going on: convert your two versions of code to assembly language and compare.
GCC could be invoked with the command-line switch -S for that.
gcc -S -Wall -O -fverbose-asm your-c-source.c
which produces a textual assembler file your-c-source.s from the C file your-c-source.c (you could glance at it using a pager like less or a source code editor like GNU Emacs).
The Clang compiler has similar options.
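As a concrete sketch (the two file names below are placeholders for your two variants of the function, not anything from the question), you could generate the assembler for both versions and compare them:
gcc -S -O -fverbose-asm mask_first.c -o mask_first.s
gcc -S -O -fverbose-asm array_first.c -o array_first.s
diff -u mask_first.s array_first.s
Any real difference in the generated instructions should show up directly in the diff, rather than having to be inferred from timing runs.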

GCC optimization flag problems

I am having a problem with some C code built with the gcc compiler. The code in question has an enum whose values are used as cases in a switch statement to configure an object. Fairly standard stuff.
When I compile the code using the -O0 option flag, everything builds and runs correctly no problem. However, when the flag is set to -O2 the code no longer works as expected.
When I step through the code and put a watch on the local variables, the enum, which should only be one of three enum values, is actually -104! This causes the program to fail to configure the object.
Has anyone encountered this before who could provide some guidance? I haven't come across this before and would appreciate it if someone could explain why the compiler does this, so I can make any necessary changes.
Snippet of code in question:
value = 0u;

switch (test_config) {
case DISABLE:
    break;
case INTERNAL:
    value = 1u;
    break;
case EXTERNAL:
    value = 2u;
    break;
default:
    valid = FALSE;
    break;
}

if (valid) {
    configure_test(value);
}
Enum in question:
typedef enum {
    DISABLE,
    INTERNAL,
    EXTERNAL
} test_config_t;
This is the code that is causing the problem. I initially didn't include it because I didn't want the question to be "please fix my code"; rather, I have been googling for reasons why gcc optimisation flags would produce different results for the same piece of code and haven't found anything particularly helpful. Also, I am not at my computer and had to type this on my phone, which doesn't help. So I came here because there are experts here who know way more than me and could point me in the right direction.
Some more info that I probably should have included: the code runs on hardware, which might also be the problem, and I am looking into that as well. When run from the FSBL the code works with -O0, but not with -O2. So it may be the hardware, but then I don't know why it works one way and not the other.
You don't give enough details (your question originally didn't show any actual code; it should include an MCVE), but you very probably have some undefined behavior, and you should be scared.
Remember that C11 or C99 (like most programming languages) is defined by an explicit specification written in English (not only by the concrete behaviour observed for your code), which partly defines the runtime behaviour of a valid C program. Read n1570.
I strongly recommend reading Lattner's blog What Every C programmer should know about Undefined Behavior before even touching or compiling your source code.
I recommend at least compiling with (nearly) all warnings and debug info, e.g. with gcc -Wall -Wextra -g, then improve the code to get no warnings, and run it under the gdb debugger and valgrind. Read more about Invoking GCC. You may also use (temporarily) some sanitizer instrumentation options, notably -fsanitize=undefined and -fsanitize=address. You could also add -std=gnu99 and -pedantic to your compiler flags. Notice that gdb watchpoints are a very useful debugger feature to find why a value has changed or is unexpected.
When you compile for release or for benchmarking with optimizations enabled, keep also the warning flags (so compile with gcc -O2 -Wall -Wextra); optimizations might give extra warnings which you should also correct. BTW, GCC accepts both -O2 and -g at the same time.
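Putting those flags together, a typical debugging build might look like the following (prog.c stands for your own source file):
gcc -Wall -Wextra -g -O2 -fsanitize=undefined -fsanitize=address prog.c -o prog
./prog
If the program has undefined behaviour such as an out-of-range or uninitialised value, the sanitizers will often report it at the point where it happens instead of letting the optimized build silently misbehave.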
When you observe such issues, question first your own code before suspecting the compiler (because compilers are very well tested; I found only one compiler bug in almost 40 years of programming).

confusion between compiler and interpreter?

I read the following documentation about compilers and interpreters somewhere:
A compiler searches all the errors of a program and lists them. If the program is error free then it converts the code of program into machine code and then the program can be executed by separate commands.
An interpreter checks the errors of a program statement by statement. After checking one statement, it converts that statement into machine code and then executes that statement. The process continues until the last statement of program occurs.
My doubt came from the following code:
int main()
{
    printf("hello")
    scanf("%d",&j);
    return 0;
}
I am using the MinGW GCC compiler. When I compile the above code, the following things happen:
First, I get the error
error: expected ';' before 'scanf()'
After I correct that error, I then get a second error:
error: 'j' undeclared (first use in this function)
So I wanted to know: why are both errors not listed at the same time?
Compilers and interpreters are technically two different things, though the boundary can be pretty fluid sometimes.
A compiler is basically nothing more than a language translator. It takes a source language as input and generates a destination language as output.
An interpreter takes a language (be it high-level or low-level) and executes the code described by the language.
The confusion arises mostly because most modern scripting languages contain both a compiler and an interpreter, where the compiler takes the script language and creates a lower-level equivalent (similar to binary machine language) that the interpreter then reads and executes.
As for your problems with the compiler errors, it's most likely because the compiler can't continue parsing the scanf call due to the first error, and simply skips it (including the undeclared variable).
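For illustration, once the missing semicolon is added, j is declared, and <stdio.h> is included, the snippet from the question compiles without either error:
#include <stdio.h>

int main(void)
{
    int j;
    printf("hello");
    scanf("%d", &j);
    return 0;
}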
You should also know that in C some errors can actually cause further errors in code that is otherwise correct. For example:
int j
printf("Enter something: ");
scanf("%d", &j);
You will get an error because of the missing semicolon after the declaration of the variable j, but you will also get an error with the scanf line as the compiler can't find the variable j, even though the scanf call is otherwise correct.
Another typical example of errors that will give follow-up errors in unrelated code, is to forget the terminating semicolon of a structure in a header file. If it's the last structure you might not even get any error in the header file, just unrelated errors in the source file you include the header file in.
The documentation you are quoting is a bit misleading.
Both compilers and interpreters aim to report as many errors as possible, but finding "all the errors of a program" is impossible (cf. the halting problem).
So, a compiler doesn't "search for errors", rather, it parses your source into a tree representation (AST) and then tries to transform that tree into another "tree" for another language (say, machine code).
An interpreter also parses your code but the transformation is done in parts at runtime.
So in your example, the missing semicolon causes the parser to fail, so the compiler doesn't even get to the stage that would report the second error.
As others have said, the distinction between compilers and interpreters is not that clear anymore. Similar techniques are used, interpreters often compile to machine code, etc.
The compiler definition you are quoting is not the best one. It would make you think that the most important characteristic of a compiler is that it finds errors. Though that is of course a very important part of the compiler's job, the main one is to translate the source code into some other form - not even necessarily machine code. In the old days some compilers did not bother listing all the errors found - in at least one case the entire message was that the compiler had found an error somewhere in the source and stopped. And even now it is sometimes not possible to find all errors in one go.
A common compiler behaviour when an error is detected is to try to recover from the error and continue parsing, in order to check for other errors.
When the compiler detects the missing semicolon, it usually tries to recover by skipping input until the next semicolon; for that reason the scanf("%d",&j) statement is not parsed and the missing declaration of j is not reported.
The text you are quoting is problematic. While it is in general true, the compiler usually doesn't have a separate "error check" phase.
What it really does is try to read your code right away, and if your code has errors in it, it will fail while trying to read it.
The difference between an interpreter and a compiler isn't when it checks for errors, it is when it actually runs the code. A compiler tries to read the program fully and then run it; an interpreter reads roughly one statement (or even just one sub-expression), runs it, reads another, runs it, and so on.
Differences between a compiler and an interpreter are given below:
A compiler takes the entire program as input, whereas an interpreter takes a single statement at a time.
A compiler generates intermediate object code, whereas an interpreter does not generate any intermediate object code.
A compiler translates the program as a whole, whereas an interpreter executes the program line by line.
Compiled programs are usually faster, whereas interpreted programs are usually slower.
A compiler typically gives less helpful error diagnostics than an interpreter.
A compiler translates the source code directly into machine language, whereas an interpreter produces an intermediate code and then executes that code to produce something the machine can understand.
A compiler reads the entire program for compilation; an interpreter reads a single statement at a time.
A compiler displays all errors and warnings together; an interpreter displays one error at a time, since it reads one instruction at a time.
A compiler requires more memory because of object code generation: every time the program is compiled, intermediate code is generated. An interpreter needs less memory, as it does not generate any intermediate code.
With a compiler, debugging is comparatively difficult; with an interpreter, debugging is easier.
A compiler takes a large amount of time to analyse the source code, but the overall execution time is comparatively faster. An interpreter takes less time to analyse the source code, but the overall execution time is slower.
Once a program is compiled, its source code is not needed to run it. For interpreted programs, the source code is needed every time the program is run.
With a compiler, compilation is done before execution. With an interpreter, compilation and execution take place simultaneously.
A compiler does not allow a program to run until it is completely error-free. An interpreter runs the program from the first line and stops execution only when it encounters an error.
Examples of programming languages that typically use compilers: C, C++, COBOL. Examples of programming languages that typically use interpreters: BASIC, Visual Basic, Python, Ruby, PHP, Perl, MATLAB, Lisp.

Resources