Compiling OpenMP code to C code - c

Is there a way I can compile code with OpenMP to C code (With OpenMP part translated to plain C), so that I can know what kind of code is being generated by OpenMP. I am using gcc 4.4 compiler.

Like the other commenters pointed out gcc doesn't translate OpenMP directives to plain C.
But if you want to get an idea of what gcc does with your directives you can compile with the
-fdump-tree-optimized option. This will generate the intermediate representation of the program which is at least C-like.
There are several stages that can be dumped (check the man page for gcc), at the optimized
stage the OpenMP directives have been replaced with calls to the GOMP runtime library.
Looking at that representation might give you some insights into what's happening.

There is at least http://www2.cs.uh.edu/~openuh/ OpenUH (even listed at http://openmp.org/wp/openmp-compilers/), which can
emit optimized C or Fortran 77 code that may be compiled by a native compiler on other platforms.
Also there are: http://www.cs.uoi.gr/~ompi/ OMPi:
The OMPi compiler takes C source code with OpenMP #pragmas and produces transformed multithreaded C code, ready to be compiled by the native compiler of the system.
and OdinMP:
OdinMP/CCp was written in Java for
portability reasons and takes a C-program with OpenMP
directives and produces a C-program for POSIX threads.
I should say that it is almost impossible to translate openmp code into Plain C; because old plain C has no support of threads and I think OpenMP source-to-source translators are not aware of C11 yet. OpenMP program can be translated either to C with POSIX threads or to C with some runtime library (like libgomp) calls. For example, OpenUH has its own library which uses pthreads itself.

Related

Do any C-targeting compilers allow inline C?

Some C compilers emit assembly language and allow snippets of assembly to be placed inline in the source code to be copied verbatim to the output, e.g. https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html
Some compilers for higher-level languages emit C, ranging from Nim which was to some extent designed for that, to Scheme which very definitely was not, and takes heroic effort to compile to efficient code that way.
Do any such compilers, similarly allow snippets of C to be placed inline in the source code, to be copied verbatim to the output?
I'm not sure I understand what you mean by "be copied verbatim to the output," but all C compilers (msvc, gcc, clang, etc...) have preprocessor directives that essentially allow snippets of code to be added to the source files for compilation. For example, the #include directive will pull in the contents the specified file to be included in compilation. An "effect" of this is that you can do weird things such as:
printf("My code: \n%s\n",
#include "/tmp/somefile.c"
);
Alternatively, creating macros with the #define directive allows you to supplant snippets of code by calling a macro name. This all happens at the preprocessor stage before turning into the compile "output."
Other languages, like c# with roslyn, allows runtime compilation of code. Of course, you can also implement the same within c by calling your compiler as via something like system() and then loading the resulting library with dlopen.
Edit:
Now that I come back and think about this question, I should also note that python is one of those C-targeting "compilers" (I guess technically a interpreter on top of the python runtime). Python let's you use native C compiled code with some either some py API code to export functions or directly with some dlopen-like helpers. Take a look at the inlinec module that does what I described above (call the compiler then load the compiled code). I suppose you should have the ability to do similar functionality with any language that can call c compiled code (c#, java, etc...).

Why does glibc library use assembly

I am looking at this page: https://sys.readthedocs.io/en/latest/doc/01_introduction.html
that goes into explanation about how glibc does system calls. In one of the examples the code is examined and it is shown, that the last instruction glibc does to actually do a system call (meaning the interrupt to the cpu) is written in assembly.... So why is part of glibc in assembly? Is there some sort of advantage by writing that small part in assembly?
Also, the shared libraries during runtime are already compiled to machine code correct?
So why would there be any advantage using two different languages before compilation? Thank you.
The answer is super simple - since C doesn't cover system calls (because it doesn't cover any physical hardware in general, and prefers to express itself in terms of abstract machine), there is no C construct glibc can use to perform system call.
One could argue that compiler could provide a sort of intrinsic to do that, but since in Linux glibc is actually part of the compiler suit of tools (in contains CRT as well) there is really no need for it, glibc can do the job.
Also, last, but not the least, in modern CPUs syscall is usually not an interrupt. Instead, it's a specific instruction (syscall in x86_64).
I want to address this piece of your question:
Also, the shared libraries during runtime are already compiled to machine code correct?
So why would there be any advantage using two different languages before compilation?
SergeyA correctly points out that there isn't any C construct (even with all of GCC's extensions) that will cause the compiler to emit a syscall instruction. That's not the only thing that the C library is supposed to do that simply can't be written purely in C: the implementations of setjmp and longjmp, makecontext and setcontext, the "entry point" code that calls main, the "trampoline" that you return to when you return from a signal handler, and several other low-level bits all require a little bit of hand-written assembly. (Exercise: what do they all have in common?)
But there's another reason to mix assembly language into a program mostly written in C. This is one of the several implementations of memcpy for x86-64 in glibc. It is 3100 lines of hand-written assembly language and preprocessor macros. What it does could be expressed in four lines of C. Why would anyone go to that much trouble? Speed. Compilers are always getting closer, but they haven't yet quite managed to beat the human brain when it comes to squeezing every last possible cycle out of a critical innermost loop. (It is worth mentioning that in early 2018 the glibc devs spent a bunch of time replacing hand-written assembly implementations of math.h functions with C, because the compilers have caught up on those, and C is ever so much more maintainable.)
And yet a third answer, which isn't particularly relevant to glibc but comes up a bunch elsewhere, is that maybe you have two different languages in your program because each of them is better at part of your problem. The statistical language R is mostly implemented in C, but a bunch of its mathematical primitives are (or were, I haven't checked in a while) written in FORTRAN, because FORTRAN is still the language that numerical computation wizards think in. Both C and FORTRAN get compiled to machine code, and in principle you could rewrite all the FORTRAN in C, but nobody wants to.

LLVM IR limitations

I am looking to generate LLVM-IR code from C code and was wondering how well is the IR generation for functions in:
stdio.h, string.h, stdlib.h and generally the standard memory based functions such as malloc, calloc, since I have not been able to find most of the common functions in:
http://llvm.org/docs/LangRef.html and was wondering about the limitations of this representation and whether I might be required to add my own intrinsics just to deal with standard/most popular c functions.
I am looking to change the code at runtime, so was wondering which kind of approach will give me the most flexibility eg: Manipulate the code at AST level instead.
Thanks
Emitting LLVM IR from C is exactly what the industrial-strength compiler Clang does. I suggest running Clang on small snippets of C code with -emit-llvm (details in this document: http://clang.llvm.org/get_started.html) and observing the resulting IR.
You can even do this in your browser: http://ellcc.org/demo/index.cgi
That will allow you to see how builtins like memcpy are handled and any other similar doubts.
Note that neither LLVM nor Clang carry a full C library with them, but they can be used to compile an existing one. newlib is a popular portable C library designed specifically for being built on various new platforms. PNaCl, for example, uses it to build C/C++ code into portable executables - it compiles newlib with the user's code together into a single LLVM IR module.

"C or gcc" is like "Chicken or the egg" ? :( [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How are gcc/g++ bootstrapped?
I would like to know how gcc is compiled as we all know it is written in C.
Did they used some other compiler to come up with gcc?
If so, can I use the same compile to compile my C program?
There is no chicken and egg here. glibc is compiled with the compiler you are using.
That compiler was first compiled with a previous version of the same compiler. Then it can compile itself as well.
The real chicken-and-egg problem was solved in the 1950's when someone had to write the world's first compiler. After that, you can use one compiler to compile the next one.
There are two basic ways to build a new compiler:
If you're writing a new compiler for an established language like C, use an existing compiler from a different vendor to build your new compiler. For example, you could use the C compiler shipped with HP-UX to build gcc.
If you're writing a compiler for a new language, start by implementing a very simple compiler in a different language (the first C compiler was written in PDP-11 assembler). This initial compiler will only recognize a small subset of the target language; basically enough to do some file I/O and some simple statements. Write a new compiler in the target language subset and build it with your first compiler. Now write a slightly more capable compiler that can recognize a larger subset of the target language, and build it with the second compiler. Repeat the process until you have a compiler capable of recognizing the full target language.
They did not use some other compiler. You can write a C program that doesn't use glibc by simply telling the compiler not to use it. So something like this:
gcc main.c -nostdlib
This is an interesting question. I think you are wondering in what language is written a compiler of a new language, aren't you? Well, if we had only Assembly language (for instance,x86), the only way to write a C compiler would be in Assembly language. Later, we could write a better, yet more powerful compiler written in C by using our assembly-written compiler, and so on...
An the question arises: how did the early programmers write the first assembly compiler? My father told me: by manually entering the 1's and 0's! :-)

How does C code call assembly code (e.g. optimized strlen)?

I always read things about how certain functions within the C programming language are optimized by being written in assembly. Let me apologize if that sentence sounds a little misguided.
So, I'll put it clearly: How is it that when you call some functions like strlen on UNIX/C systems, the actual function you're calling is written in assembly? Can you write assembly right into C programs somehow or is it an external call situation? Is it part of the C standard to be able to do this, or is it an operating system specific thing?
The C standard dictates what each library function must do rather than how it is implemented.
Almost all known implementations of C are compiled into machine language. It is up to the implementers of the C compiler/library how they choose to implement functions like strlen. They could choose to implement it in C and compile it to an object, or they could choose to write it in assembly and assemble it to an object. Or they could implement it some other way. It doesn't matter so long as you get the right effect and result when you call strlen.
Now, as it happens, many C toolsets do allow you to write inline assembly, but that is absolutely not part of the standard. Any such facilties have to be included as extensions to the C standard.
At the end of the road compiled programs and programs in assembly are all machine language, so they can call each other. The way this is done is by having the assembly code use the same calling conventions (way to prepare for a call, prepare parameters and such) as the program written in C. An overview of popular calling conventions for x86 processors can be found here.
Many (most?) C compilers do happen to support inline assembly, though it's not part of the standard. That said, there's no strict need for a compiler to support any such thing.
First, recognize that assembly is mostly just human (semi-)readable machine code, and that C ends up as machine code anyway.
"Calling" a C function just generates a set of instructions that prepare registers, the stack, and/or some other machine-dependent mechanism according to some established calling convention, and then jumps to the start of the called function.
A block of assembly code can conform to the appropriate calling convention, and thus generate a blob of machine code that another blob of machine code that was originally written in C is able to call. The reverse is, of course, also possible.
The details of the calling convention, the assembly process, and the linking process (to link the assembly-generated object file with the C-generated object file) may all vary wildly between platforms, compilers, and linkers. A good assembly tutorial for your platform of choice will probably cover such details.
I happen to like the x86-centric PC Assembly Tutorial, which specifically addresses interfacing assembly and C code.
When C code is compiled by gcc, it's first compiled to assembler instructions, which are then again compiled to a binary, machine-executable file. You can see the generated assembler instructions by specifying -S, as in gcc file.c -S.
Assembler code just passes the first stage of C-to-assembler compilation and is then indistinguishable from code compiled from C.
One way to do it is to use inline assembler. That means you can write assembler code directly into your C code. The specific syntax is compiler-specific. For example, see GCC syntax and MS Visual C++ syntax.
You can write inline assembly in your C code. The syntax for this is highly compiler specific but the asm keyword is ususally used. Look into inline assembly for more information.

Resources