Questions about C as an intermediate language [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm currently writing a language that compiles to C. When I say IL, I mean that C is the language my compiler emits, which another C compiler (e.g. GCC or Clang) then compiles to assembly.
For the C code I generate, which would be more beneficial?
If I do some simple optimization passes (constant propagation, dead-code removal, ...), will this reduce the amount of work the C compiler has to do, or make it harder because the output isn't really human-written C?
If I were to compile to, say, three-address code or SSA or some other form and then emit that as a C program with functions, labels, and variables, would that make it easier or harder for the C compiler to optimize?
These link together to form the following question:
What is the best way to produce good C code from a language that compiles to C?
Is it worth doing any optimisations at all, or should I leave that to the C compiler?

Generally there's not much point doing peephole type optimisations because the C compiler will simply do those for you. What is expensive is a) wasted or unnecessary "gift-wrapping" operations, b) memory accesses, c) branch mispredictions.
For a), make sure you're not passing data about too much, because whilst the C compiler will do constant propagation, there's a limit to how far it can detect that two buffers are in fact aliases of the same underlying data. For b), try to keep functions short and operations on the same data together, and limit heap memory use to improve cache performance. For c), the compiler understands for loops, but it doesn't understand goto loops. So it will figure that
for(i=0;i<N;i++)
will usually take the loop body, but it won't figure that
if(++i < N) goto do_loop_again;
will usually take the jump.
So really the rule is to make your generated code as human-like as possible. Though if it's too human-like, that raises the question of what your language has to offer that C doesn't: the whole point of a non-C language is that a nice structure in the input script may become a spaghetti of gotos in the C source.

Related

How is the assembler compiler programmed [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I learned not long ago that most assemblers are written in C or other languages, yet we say assembly is the fastest language. But if an assembler is written in C, how can it be faster than C itself? What does the assembler do, then? Are there assemblers written in assembly? I don't really understand how it all works. I searched the internet, but I didn't find a clear explanation of what I was looking for.
Could you explain, or point me to some links that would help me better understand, how assemblers work?
There are three concepts getting tossed around here:
Speed of a compiler
Speed of a processor
Speed of an executable
First, to get it out of the way: the time it takes to compile an executable has very little relationship to the time it takes that executable to run. (A compiler may deliberately take longer in order to do careful analysis and apply optimizations.)
The speed at which your processor can operate is another thing. Assembly language is the closest to machine language, which is what your processor understands. Any given machine instruction executes at whatever speed the processor runs that instruction, regardless of which language or tool produced it.
Everything that executes on your processor must, by definition, be at some point converted to machine language so that your processor can understand and execute it.
That’s where things get tricky. An assembler will translate code you write directly to machine language, but there is more to a program than just knowing how to convert to machine language. Suppose you have a complex value, such as a collection of options. These options must be maintained as strings, integers, floats, etc. How are they stored? How are they accessed?
The way in which all this is done can vary. The way you organize your program can vary. These variations make a difference in executable time.
So you can write a very slow program using assembly language and a very fast program using an interpreted language. And, frankly, compilers are often better at organizing the final machine code than you are, even if you are using an assembler directly.
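So to bring it to a point: the compiler's (or, for assembly, the assembler's) job is to transform your textual source code (C, or assembly, or whatever) into machine code, which is what your processor understands. Once that is done, the compiler is no longer necessary, which is why the language a compiler or assembler is itself written in has no bearing on how fast the resulting program runs.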
There is significantly more to it than that, but that is the general idea.

Assembly and Execution of Programs - Two pass assembler [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
While going through a book on machine instructions and programs, I came across a point saying that an assembler scans the entire source program twice. It builds a symbol table during the first pass and uses it to resolve the symbols throughout the program during the second pass. The assembler needs to provide an address for a function in a similar way.
Now, since the assembler passes through the program twice, why is it necessary to declare a function before it can be used? Wouldn't the assembler get the function's address in the first pass and then resolve references to it during the second pass?
I am considering C programming in this case.
The simple answer is that C requires functions to be declared before they are used because the C language was designed to be processed by a compiler in a single pass. It has nothing to do with assemblers and addresses of functions. The compiler needs to know the type of a symbol, whether it's a function, a variable, or something else, before it can use it.
Consider this simple example:
int foo() { return bar(); }
int (*bar)();
In order to generate the correct code, the compiler needs to know that bar isn't a function but a pointer to a function. The code only works if you put extern int (*bar)(); before the definition of foo, so that the compiler knows what type bar is.
While the language could in theory have been designed to require the compiler to use two passes, this would have required some significant changes in the design of the language. Requiring two passes would also increase the required complexity of the compiler, decreasing the number of platforms that could host a C compiler. This was a very important consideration back when C was first being developed, when 64K (65,536) bytes of RAM was a lot of memory. Even today it would have a noticeable impact on the compile times of large programs.
Note that the C language does sort of allow what you want anyway, by supporting implicit function declarations. (In my example above, this is what happens in foo when bar hasn't been declared previously.) However, this feature is obsolete (it was removed from the language in C99), limited, and considered dangerous.

C's coverage of assembly [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
In an argument with a friend, I remarked that it is impossible to write, in any language other than C, a program that is faster than every C program that does the same thing. My argument was based on an affirmative answer to the question below. Is it true?
If we think of "compiling" as a map from [C programs] to [assembly programs], then is this map surjective?
Caveat: Of course, you can include assembly in C programs, but pretend that isn't possible (makes for a more interesting question!).
The answer to the question "If we think of 'compiling' as a map from [C programs] to [assembly programs], then is this map surjective?" is obviously no.
It can be proven trivially:
* There could be assembly language instructions that the compiler will not generate, such as int 10, hlt, jmp *eax, iret, sub esp,esp...
* You might be fiddling with registers in assembly that the C compiler never touches, such as segment registers.
There is just a world of creativity in assembly that the C language cannot express.
Regarding the other question, I'm not sure what you mean by
it is impossible to write, in any language besides C, a program that is faster than all variants in C, that do the same thing.
If you mean that a skilled programmer can always write a C program that is faster at a given task than any program written in any other language, I think you are probably wrong too, because the compiler itself is a fixed and imperfect factor.
Imagine for example that the C compiler is very dumb and generates unoptimized code. It is obvious that an assembly program can be written that will beat the best C variation at the given task: all that is needed is to optimize the unoptimized code. Since the C compiler is imperfect, you can always find a task for which even the best C variation can be further optimized.

Generally Ada seems to compile code slower than similar C code, why is this? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
When I compile programs in Ada, I typically notice a longer compile time for code of similar length and of similar content to programs written in C or C++.
While it is true that compile time ultimately depends on the compiler and the system, Ada compilation generally takes longer. Is this process radically different from the compile/link process of C or C++? Does it consist of different stages?
What about the Ada compilation process makes it take longer than C or C++ compilation?
It is all about the amount of time and effort put into making the compiler fast.
Compilers that are more widely used tend to have more money invested in making them fast; however, sometimes other factors are at stake. For example, a compiler might perform static type checking, various "extra" correctness checks, and other work (programming contract compliance, code quality, etc.) that adds to the compile time.
Ada has tended to have less money thrown at its compiler, and it is likely a somewhat more complex language to parse than C. Both of these factors make it likely that its compiler will be slower.
Note that speed of compilation has little to do with the "quality" of the language. While C might have a larger footprint, Ada has made its mark on the programming world in other ways.

How does one obfuscate code in C? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I want to obfuscate code just for fun. I'm looking at code from the International Obfuscated C Code Contest (http://www.ioccc.org/), and I have no idea how to even start reverse engineering some of this code into anything that makes sense.
What are some common obfuscation techniques and how do you make sense of obfuscated code?
There are a lot of different techniques for obfuscating code; here is a small, very incomplete list:
1. Identifier mangling. Either you will find people using names like a, b, c exclusively, or you will find identifiers that have absolutely nothing to do with the actual purpose of the variable or function. Deobfuscation means assigning sensible names.
2. Heavy use of the conditional operator ? : to replace all occurrences of if() else. In most cases that's a lot harder to read; deobfuscation would reinsert if().
3. Heavy use of the comma operator instead of ;. In combination with 2 and 4, this basically allows the entire program to be one single statement in main().
4. Recursive calls of main(). You can fold any function into main by giving it an argument that main uses to decide what to do. Combine this with replacing loops by recursion, and you end up with the entire program being the main function.
5. You can go in the exact opposite direction to 3 and 4 and hack everything into pieces by creating an insane number of functions that all do virtually nothing.
6. You can obfuscate the storage of an array by storing the values on the stack. Should you need to walk the data twice, there's always the fork() call handy to make a convenient copy of your stack.
As I said, this is a very incomplete list, but generally, obfuscation is the heavy, systematic abuse of any valid programming technique. If the IOCCC allowed C++ entries, I would bet on a lot of template code, heavy use of thrown exceptions as an if replacement, structure hidden behind polymorphism, etc.
