How to write pragmas in C

I know that #pragmas are compiler directives used to provide additional information to the compiler. My question is: I need to write some #pragmas for my project, i.e. I need to invoke some particular code when a particular pattern appears in the code. Can someone shed some light on this?
Thanks in advance!

You can't write your own #pragmas. You must look in your compiler's manual to see which #pragmas are supported.
Alternatively, if your compiler allows you to modify its source code (license- and source-code-wise), you might hack some new ones in. Don't expect it to be a trivial task; there is usually no end-user-friendly, plug-in, write-your-own-pragmas system.

#pragma is a way for compiler vendors to legally implement proprietary extensions. They are hard-coded into the compiler. (And the C standard requires compilers to ignore pragmas they do not recognize.)
Unless you write your own compiler, you cannot create your own pragmas.
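To make that concrete, here is a small sketch (my own illustration, not from the answers above) using two pragmas that many compilers already define, #pragma message and #pragma pack; a compiler that does not recognize one of them simply ignores it:

    /* Sketch: existing, compiler-defined pragmas. Unrecognized pragmas are
     * ignored, so this still compiles on compilers lacking one of them. */
    #include <stdio.h>

    #pragma message("Compiling the wire_header example")  /* GCC, Clang and MSVC print this */

    #pragma pack(push, 1)            /* widely supported: pack struct members with no padding */
    struct wire_header {
        unsigned char type;
        unsigned int  length;
    };
    #pragma pack(pop)

    int main(void)
    {
        printf("sizeof(struct wire_header) = %zu\n", sizeof(struct wire_header));  /* 5 with packing */
        return 0;
    }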

Related

Why doesn't (can't) the OS translate C code directly into machine language instead of first translating it into assembly language?

As far as I've understood, when a program (written in C, for example) is compiled, it is first translated into assembly language and then into machine language. Why can't the "assembly language" step be skipped (and why isn't it)?
Your understanding is wrong; compilers do not necessarily translate C code into assembly. They usually perform several phases and have internal representations, but these don't necessarily resemble human-readable assembly.
Here is a nice introduction to LLVM, the compiler toolkit that Clang is built on.
It is easier for the compiler developers.
It is possible to write a compiler that reads C and writes object code. However, this requires the compiler writer to write all the computations that encode instructions. Instruction encodings are intricate on some machines. Additionally, there are fields to fill in that depend on other interactions, such as how far away a branch target is, which depends on what instructions are between the branch and the target.
Additionally, part of the way a compiler is written is with patterns that say things like “To increment an object x, issue an increment instruction.” In order to write object code directly, you have to encode all the instructions you want to write into those patterns. That means your patterns must have some sort of language for describing instructions.
Well, we already have a language for that: assembly language. So it is simply easier to write your patterns in ways like “To increment an object x, issue inc x.”
Modern compilers have many layers. There is a front end that reads C text (or other languages) and turns it into a language internal to the compiler. There is an optimizer that operates on the internal language (or a representation of it) and tries to improve the code. There is a back end that turns the internal language into assembly language. There is an assembler that turns the assembly into object code. And there is a linker that links object code into an executable file.
As with many complex tasks, it is simply easier for human minds to work with a complex task when it is separated into nice pieces. This reduces bugs and improves the time it takes to work with software. It also makes software flexible, because we can change the front end to support a new language (e.g., Java instead of C) or change the back end to support a new processor (change from Intel assembly to PowerPC assembly). And changing one optimizer improves all the compilers, for Java and C and Intel and PowerPC.
The gcc command that we use to compile is actually just a driver that calls other programs that perform the front-end processing, the optimization, the assembly, and the linking. You can also call most of these phases separately, or use a switch to tell gcc to show you the commands it is using.
Additionally, GCC has a feature that allows developers to insert assembly language directly intermixed with the C code. This compels GCC to include an assembler.
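For example, a minimal sketch of that feature (GCC/Clang extended inline assembly, assuming an x86-64 target and AT&T syntax; the function name is made up for the example):

    #include <stdio.h>

    /* Adds two numbers using an x86-64 "addq" instruction written inline. */
    static unsigned long add_two(unsigned long a, unsigned long b)
    {
        unsigned long result;
        __asm__("addq %2, %0"          /* result += b, with result pre-loaded with a */
                : "=r"(result)         /* output: any general-purpose register */
                : "0"(a), "r"(b));     /* inputs: a shares the output register, b in any register */
        return result;
    }

    int main(void)
    {
        printf("%lu\n", add_two(40, 2));   /* prints 42 */
        return 0;
    }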
The operating system does not do anything like that; this is the job of the compiler. In fact, many compilers do emit object files directly, and you have to explicitly ask them to emit assembly code. Others choose not to, because emitting a fully featured object file requires expert knowledge of the various formats that exist for this. Assemblers have various convenience features that make the job easier and can sometimes target multiple object file formats without changes to the assembly code. Being able to emit annotated assembly code is also very useful, so not having a separate code generator just for direct object file emission saves time without imposing any real restrictions (other than needing an assembler), which makes it an attractive option when you have limited resources.
Depends on the compiler; there is no actual need for the assembly code.
Maybe the authors of whatever compiler you are talking about (GNU CC?) considered it slightly easier not to have to resolve certain things, such as branches, themselves.
Assembly code is purely a convenient, somewhat-human-readable representation of the machine code and the symbolic references and relocations needed by the linker when putting together the output of different translation units. Without an intermediate assembly-language step, the compiler would also be responsible for generating the relocations in the form the linker needs, which is doable, but painful. Since an assembler with this capability already exists for processing hand-written assembly code, it makes sense to use it.
There is usually no assembler stage. MSVC (cl.exe) and GCC produce machine code (.obj, .o) right away.
A cross compiler can generate machine code directly, without help from the OS on which the cross compiler is installed.
For example, the Tornado package installed on Windows can generate machine code for VxWorks.

How to create a C compiler for custom CPU?

What would be the easiest way to create a C compiler for a custom CPU, assuming of course I already have an assembler for it?
Since a C compiler generates assembly, is there some way to just define standard bits and pieces of assembly code for the various C idioms, rebuild the compiler, and thereby obtain a cross compiler for the target hardware?
Preferably the compiler itself would be written in C, and built as a native executable for either Linux or Windows.
Please note: I am not asking how to write the compiler itself. I did take that course in college, I know about general compiler-compilers, etc. In this situation, I'd just like to configure some existing framework if at all possible. I don't want to modify the language, I just want to be able to target an arbitrary architecture. If the answer turns out to be "it doesn't work that way", that information will be useful to myself and anyone else who might make similar assumptions.
Quick overview/tutorial on writing a LLVM backend.
This document describes techniques for writing backends for LLVM which convert the LLVM representation to machine assembly code or other languages.
[ . . . ]
To create a static compiler (one that emits text assembly), you need to implement the following:
Describe the register set.
Describe the instruction set.
Describe the target machine.
Implement the assembly printer for the architecture.
Implement an instruction selector for the architecture.
There's the concept of a cross-compiler, i.e. one that runs on one architecture but targets a different one. You can see how GCC does it (for example) and add a new architecture to the set, if that's the compiler you want to extend.
Edit: I just spotted a question a few years ago on a GCC mailing list on how to add a new target and someone pointed to this
vbcc (at www.compilers.de) is a good and simple retargetable C compiler written in C. It's much simpler than GCC/LLVM; so simple that I was able to retarget it to my own CPU with a few weeks of work, without having any prior knowledge of compilers.
The short answer is that it doesn't work that way.
The longer answer is that it does take some effort to write a compiler for a new CPU type. You don't need to create a compiler from scratch, however. Most compilers are structured in several passes; here's a typical architecture (a lot of variations are possible):
Syntactic analysis (lexer and parser), and for C preprocessing, leading to an abstract syntax tree.
Type checking, leading to an annotated abstract syntax tree.
Intermediate code generation, leading to architecture-independent intermediate code. Some optimizations are performed at this stage.
Machine code generation, leading to assembly or directly to machine code. More optimizations are performed at this stage.
In this description, only step 4 is machine-dependent. So you can take a compiler where step 4 is clearly separated and plug in your own step 4. Doing this requires a deep understanding of the CPU and some understanding of the compiler internals, but you don't need to worry about what happens before.
Almost all CPUs that are not very small, very rare or very old have a backend (step 4) for GCC. The main documentation for writing a GCC backend is the GCC internals manual, in particular the chapters on machine descriptions and target descriptions. GCC is free software, so there is no licensing cost in using it.
1) Short answer:
"No. There's no such thing as a "compiler framework" where you can just add water (plug in your own assembly set), stir, and it's done."
2) Longer answer: it's certainly possible. But challenging. And likely expensive.
If you wanted to do it yourself, I'd start by looking at Gnu CC. It's already available for a large variety of CPUs and platforms.
3) Take a look at this link for more ideas (including the idea of "just build a library of functions and macros"), that would be my first suggestion:
http://www.instructables.com/answers/Custom-C-Compiler-for-homemade-instruction-set/
You can modify existing open-source compilers such as GCC or Clang. Other answers have provided links about where to learn more. But these compilers are not designed to be easily retargeted; they are merely "easier" to retarget than compilers wired for specific targets.
But if you want a compiler that is relatively easy to retarget, you want one in which you can specify the machine architecture in explicit terms and have some tool generate the rest of the compiler (GCC does a bit of this; I don't think Clang/LLVM does much, but I could be wrong here).
There's a lot of this in the literature; google "compiler-compiler".
But for a concrete solution for C, you should check out ACE, a compiler vendor that generates compilers on demand for customers. Not free, but I hear they produce very good compilers very quickly. I think it produces standard style binaries (ELF?) so it skips the assembler stage. (I have no experience or relationship with ACE.)
If you don't care about code quality, you can likely write a syntax-directed translation of C to assembler using a C AST. You can get C ASTs from GCC, Clang, maybe ANTLR, and from our DMS Software Reengineering Toolkit.

What does #pragma intrinsic mean?

I just want to know what #pragma intrinsic(_m_prefetchw) means.
As far as I am aware, that looks like someone intended to modify an MSVC++-specific setting. However, that setting is not a valid option for the intrinsic pragma; _m_prefetchw, on the other hand, is a 3DNow! intrinsic function.
Like all compiler intrinsic functions, it exposes (possibly) faster assembly instructions supported by the underlying hardware to your C or C++ application in a manner that is:
A. more consistent with optimizers, and
B. more consistent with the language,
when compared with using inline assembly.
On MSVC on x86_64/x64/amd64 systems, inline assembly is not supported, so one must use such intrinsics to access whizzbang features of the underlying hardware.
Finally, it should be noted that _m_prefetchw is a 3D Now! intrinsic, and 3D Now! is only supported on AMD hardware. It's probably not something you want to use for new code (i.e. you should use SSE instead, which works on both Intel and AMD hardware, and has more features to boot).
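If prefetching is what the original code was after, a sketch of the SSE route suggested above might look like the following (assumes an x86/x86-64 target with SSE support; on 32-bit GCC you may need -msse, and the function name is invented for the example):

    /* Sketch: SSE prefetch intrinsic instead of the 3DNow!-specific _m_prefetchw. */
    #include <stdio.h>
    #include <xmmintrin.h>

    static void warm_cache(const float *data, size_t n)
    {
        for (size_t i = 0; i < n; i += 16)
            _mm_prefetch((const char *)&data[i], _MM_HINT_T0);  /* hint: pull into all cache levels */
    }

    int main(void)
    {
        static float buf[1024];
        warm_cache(buf, 1024);
        printf("prefetch hints issued\n");
        return 0;
    }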
The meaning of "#pragma intrinsic" (note spelling), as with all "#pragma" directives, varies from one compiler to another. Generally, it indicates that a particular thing that looks syntactically like a call to an external function should be replaced with some inline code. In some cases, this may greatly improve performance, especially if the compiler can determine constant values for some or all of the arguments (in the latter situation, the compiler may be able to compute the value of the function and replace it with a constant).
Generally, having functions processed as intrinsics won't pose any particular problem. The biggest danger is that if a user defines, in one module, a function with the same name as one of the compiler's intrinsic functions, and attempts to call that function from another module, the compiler might instead replace the call with its expected instruction sequence. To prevent this, some compilers don't enable intrinsic functions by default (since doing so would cause the above incompatibility with some standard-conforming programs) but provide #pragma directives to enable them. Compilers may also use a command-line option to enable intrinsics (since the standard allows anything there), or may define some functions like __memcpy() as intrinsic and, within string.h, use a #define directive to convert memcpy into __memcpy (since programs that #include string.h are not allowed to use memcpy for any other purpose).
In C, it depends on whether the implementation recognizes (and defines) it.
If the implementation does not recognize the "intrinsic" preprocessing token, the pragma is ignored.
If the implementation recognizes it, whatever is defined will happen (and if another implementation defines it differently, a different thing happens on the other implementation).
So, check the documentation for the implementation you're talking about (edit: and don't use it if you expect to compile your source on different implementations).
I couldn't find any reference to "#pragma intrinsic" in man gcc, on my system.
The intrinsic pragma tells the compiler that a function has known behavior. The compiler may call the function and not replace the function call with inline instructions, if it will result in better performance.
Source: http://msdn.microsoft.com/en-us/library/tzkfha43(VS.80).aspx
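In source code, using the pragma looks something like this minimal MSVC-oriented sketch (not portable; other compilers will warn about or ignore the unknown pragma):

    /* Sketch (MSVC-specific): expand the listed run-time functions inline
     * instead of emitting calls into the C run-time library. */
    #include <stdio.h>
    #include <string.h>

    #pragma intrinsic(memcpy, strlen)

    int main(void)
    {
        char src[] = "hello";
        char dst[sizeof src];
        memcpy(dst, src, sizeof src);   /* likely expanded inline, no call instruction */
        printf("%s has length %zu\n", dst, strlen(dst));
        return 0;
    }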

How to use the __attribute__ keyword in GCC C?

I am not clear on the use of the __attribute__ keyword in C. I have read the relevant GCC docs, but I am still not able to understand it. Can someone help me understand?
__attribute__ is not part of C, but is an extension in GCC that is used to convey special information to the compiler. The syntax of __attribute__ was chosen to be something that the C preprocessor would accept and not alter (by default, anyway), so it looks a lot like a function call. It is not a function call, though.
Like much of the information that a compiler can learn about C code (by reading it), the compiler can make use of the information it learns through __attribute__ data in many different ways -- even using the same piece of data in multiple ways, sometimes.
The pure attribute tells the compiler that a function is actually a mathematical function -- using only its arguments and the rules of the language to arrive at its answer with no other side effects. Knowing this the compiler may be able to optimize better when calling a pure function, but it may also be used when compiling the pure function to warn you if the function does do something that makes it impure.
If you can keep in mind that (even though a few other compilers support them) attributes are a GCC extension and not part of C and their syntax does not fit into C in an elegant way (only enough to fool the preprocessor) then you should be able to understand them better.
You should try playing around with them. Take the ones that are more easily understood for functions and try them out. Do the same thing with data (it may help to look at the assembly output of GCC for this, but sizeof and checking the alignment will often help).
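For instance, a minimal sketch of the pure attribute mentioned above (GCC/Clang assumed; the function is invented for the example):

    #include <stdio.h>

    /* Pure: the result depends only on the argument and readable memory,
     * and there are no side effects, so GCC may fold repeated calls. */
    __attribute__((pure))
    static int count_bits(unsigned int v)
    {
        int n = 0;
        while (v) {
            n += v & 1u;
            v >>= 1;
        }
        return n;
    }

    int main(void)
    {
        printf("%d\n", count_bits(0xF0u));  /* prints 4 */
        return 0;
    }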
Think of it as a way to inject syntax into the source code, which is not standard C, but rather meant for consumption of the GCC compiler only. But, of course, you inject this syntax not for the fun of it, but rather to give the compiler additional information about the elements to which it is attached.
You may want to instruct the compiler to align a certain variable in memory at a certain alignment. Or you may want to declare a function deprecated so that the compiler will automatically generate a deprecated warning when others try to use it in their programs (useful in libraries). Or you may want to declare a symbol as a weak symbol, so that it will be linked in only as a last resort, if any other definitions are not found (useful in providing default definitions).
All of this (and more) can be achieved by attaching the right attributes to elements in your program. You can attach them to variables and functions.
Take a look at this whole bunch of other GCC extensions to C. The attribute mechanism is a part of these extensions.
There are too many attributes for there to be a single answer, but examples help.
For example, __attribute__((aligned(16))) makes the compiler align that struct/variable/function on a 16-byte boundary.
__attribute__((noreturn)) tells the compiler that this function never returns to its caller (e.g. standard functions like exit(int)).
__attribute__((always_inline)) makes the compiler inline that function even if it wouldn't normally choose to (using the inline keyword suggests to the compiler that you'd like it inlined, but it's free to ignore you; this attribute forces it).
Essentially, they're mostly about telling the compiler you know better than it does, or about overriding default compiler behaviour on a function-by-function basis.
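A short sketch putting the three attributes above together (GCC/Clang assumed; the names are invented for the example):

    #include <stdio.h>
    #include <stdlib.h>

    struct __attribute__((aligned(16))) vec4 {   /* the type gets 16-byte alignment */
        float x, y, z, w;
    };

    __attribute__((noreturn))
    static void die(const char *msg)             /* the compiler knows this never returns */
    {
        fprintf(stderr, "%s\n", msg);
        exit(EXIT_FAILURE);
    }

    __attribute__((always_inline))
    static inline int square(int x)              /* inlined even without optimization */
    {
        return x * x;
    }

    int main(void)
    {
        struct vec4 v = { 1.0f, 2.0f, 3.0f, 4.0f };
        if (square(3) != 9)
            die("math is broken");
        printf("alignment of vec4 = %zu, v.x = %g\n", _Alignof(struct vec4), v.x);
        return 0;
    }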
One of the best (but little known) features of GNU C is the attribute mechanism, which allows a developer to attach characteristics to function declarations to allow the compiler to perform more error checking. It was designed in a way to be compatible with non-GNU implementations, and we've been using this for years in highly portable code with very good results.
Note that __attribute__ is spelled with two underscores before and two after, and there are always two sets of parentheses surrounding the contents. There is a good reason for this - see below. GCC needs the -Wall compiler flag to enable the resulting warnings (yes, there is a finer degree of warning control available, but we are very big fans of max warnings anyway).
For more information please go to http://unixwiz.net/techtips/gnu-c-attributes.html

Are these C #ifdefs for portability outdated?

I'm working with some old C code that still has a few dusty corners. I'm finding a lot of #ifdef statements that refer to operating systems, architectures, etc., and alter the code for portability. I don't know how many of these statements are still relevant today.
I know that #ifdef isn't the best idea in certain circumstances, and I'll be sure to fix that, but what I'm interested in here is what's being tested.
I've listed them below. If you could tell me if any of them are definitely useful in this day and age, or if the machines or OSs with which they're associated have long since expired, that would be great. Also, if you know of any central reference for these, I'd love to hear about it.
Thanks in advance,
Ross
BORLANDC
BSD
CGLE
DRYRUN
HUGE
IBMPC
MAIN
M_XENIX
OPTIMIZED
P2C_H_PROTO
sgi
TBFINDADDREXTENDED
UNIX
vms
__GCC__
__GNUC__
__HUGE__
__ID__
__MSDOS__
__TURBOC__
Here you are.
You are coming at this from the wrong direction. Instead of asking what code can be safely deleted, you should ask what code has to stay.
Find out which platforms have to be supported and delete everything that is not defined on any of them. You'll end up with the cleanest code possible that is still guaranteed to work.
In what context is this code being used?
If it's a library that people outside your organization are using, you shouldn't touch this stuff unless you're releasing a new version and explicitly removing support for some OSes. In that case, you should remove all the relevant #ifdef code as part of making the new release, and be explicit about what you are removing.
If it's a library people inside your organization are using, you should ask those people what you can remove, not us.
If it's code being used very narrowly (i.e. you control its use directly), you can, if you wish, safely remove any sort of compiler portability, since you are only using one compiler.
You're asking the wrong people: It's your users (or potential users) who decide what's still useful, not us. Start by finding out what platforms you need to support, and then you can find out what's not needed.
If, for example, you don't need to support 16-bit systems, you can dispense with __HUGE__, __MSDOS__, and __TURBOC__.
Any #ifdef based on arbitrary preprocessor definitions provided by the implementation is outdated - especially those which are in the namespace reserved for the application, not the implementation, as most of those are! The correct modern way to achieve this kind of portability is to either detect the presence of different interfaces/features/behavior with a configure script and #define HAVE_FOO etc. based on that, directly test standard preprocessor defines (like UINT_MAX to determine integer size), and/or provide prebuilt header files for each platform you want to support with the appropriate HAVE_FOO definitions.
The old-style "portability" #ifdefs closely coupled knowledge of every single platform all over your source, making for a nightmare when platforms changed and adopted new features, behaviors, or defaults. (Just imagine the mess of old code that assumes Windows is 16bit or Linux has SysV-style signal()!) The modern style isolates knowledge of the platform and allows the conditional compilation in your source files to depend only on the presence/absence/behavior of the feature it wants to use.
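A minimal sketch of that feature-test style, with a hypothetical HAVE_STRLCPY macro standing in for whatever a configure script or per-platform header would define:

    #include <stdio.h>
    #include <string.h>

    #ifdef HAVE_STRLCPY
    #  define copy_string(dst, src, size) strlcpy((dst), (src), (size))
    #else
    /* Fallback for platforms whose libc lacks strlcpy. */
    static size_t copy_string(char *dst, const char *src, size_t size)
    {
        size_t len = strlen(src);
        if (size) {
            size_t n = (len < size - 1) ? len : size - 1;
            memcpy(dst, src, n);
            dst[n] = '\0';
        }
        return len;
    }
    #endif

    int main(void)
    {
        char buf[8];
        copy_string(buf, "feature-test example", sizeof buf);
        puts(buf);   /* prints "feature" */
        return 0;
    }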
Code that is annotated like that can in fact be quite difficult to maintain. You could consider looking into something like autotools to configure your sources for a particular architecture.
