What is the scope of a #define? - c

What is the scope of a #define?
I have a question regarding the scope of a #define for C/C++ and am trying to bet understand the preprocessor.
Let's say I have a project containing multiple source and header files. Let's say I have a header file that has the following:
// header_file.h
#ifndef __HEADER_FILE
#define __HEADER_FILE
#define CONSTANT_1 1
#define CONSTANT_2 2
#endif
Let's then say I have two source files that are compiled in the following order:
// source1.c
#include header_file.h
void funct1(void)
{
int var = CONSTANT_1;
}
// source2.c
#include header_file.h
void funct2(void)
{
int var = CONSTANT_2;
}
Assuming I have included all the other necessary overhead, this code should compile fine. However, I'm curious as to what #defines are remembered between compilations. When I compile the above code, are the contents of each #include actually included, or are the include guards actually implemented?
TLDR: Do #defines carry over from one compilation unit to the next? Or do #define only exist within a single compilation unit?
As I type this out, I believe I'm answering my own question and I will state my believed answer. #defines are constrained to a single compilation unit (.c). The preprocessor essentially forgets any #defines when it goes from one compilation unit to the next. Thus in the above example I listed, the include guards do not come into play. Am I correct in this belief?

source1.c is compiled separately from source2.c therefore your defines are processed for source1 as it is compiled and then as an independent action they are processed for source2 as it is compiled.
Hopefully this is a clear explanation.

Preprocessor macros do not have "scope" as such, they just define a piece of text that should replace the macro in the code.
This means that the compiler never sees the strings CONSTANT_1 and CONSTANT_2 but instead gets the source in a preprocessed form with these macros replaced with their expansions (1 and 2 respectively).
You may inspect this preprocessed source by calling gcc with the -E flag, or with whatever flag only does preprocessing on your particular compiler.

Yes, you are right!!
Compilation of a file, in it self, is merely, just a process under execution. One process can not interfare with another unless explicitly done. The c pre-processors are just literal substitution mechanism, performed in a dumb way. Whatever conditional checking are performed, are confined to ongoing instance of pre-processor only, nothing gets carry forward once execution (compilation) comes to end. Pre-processors do not "configure" compiler, their scope is limited till "their own compilation"

Related

What happens when preprocessor lines are processed by the preprocessor? - the '.i' file

I am using Gnu cc compiler of Gcc to compile my C programs. Consider a program,
#include <stdio.h>
int main(){
return 0;
}
Now, when I pre-process the above code, using
cpp sample.c > sample.i
I get a lot of contents in sample.i which I haven't included. Say, 'stdio.h' file is preprocessed. If that is the case,
Question 1:
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Question 2:
Can anyone explain what exactly happens when the preprocessor proccess the C file.(The contents that I got in my '*.i' file)
Compiler: gcc
OS: Ubuntu
Thanks
Why are there so many lines in my preprocessed file? I haven't used any of the standard library functions nor Macros.
Preprocessing is just one part of the compilation process. It's more or less a simple textual replacement and nothing more complex is involved at the preprocessing stage. The preprocessor does not know or care whether you have used any standard functions in your code program or not. An optimizer (as part of the compilation process) might
"remove" parts that are not needed. But the preprocessor doesn't do that.
It'll do preprocessing of all the header files you have included and other header files included via your header files and so on.
Can anyone explain what exactly happens when the preprocessor process the C file.(The contents that I got in my '*.i' file)
The preprocessing involves quite a few tasks: macro replacement, conditional compilation, stringification, string concatenation etc.
You can read more about cpp in detail here: https://gcc.gnu.org/onlinedocs/cpp/
the preprocessor command #include "aFile.h" will put the hole content from aFile.h into your cpp file. And that exactly to the place, where the preprocessor directives stands. That is the reason why you can use the in aFile.h defined functions.
if you are interest to learn more about the preprocessor, there is a very good (and short) guidance on cplusplus.com
The preprocessor does text substitution. The net effect of #include <stdio.h> is to replace the #include <stdio.h> line with the contents of <stdio.h>.
Practically, <stdio.h> contains several declarations of various functions (e.g. fprintf(), fscanf()), declarations of variables (e.g. stdout, stdin), and some macro definitions (which, when used in later code, cause text substitution).
The preprocessor is specified as a phase of compilation, which takes source code as input, substitutes text as required (e.g. the #include as I have described, macro expansions, etc), and outputs the resultant source code. That output is what you are directing into sample.i
The output of the preprocessor is then input to a later phase of compilation, which actually understands declarations, definitions, statements, etc.
The phases of compilation are sequential - they occur one after the other, not all at once. So the later phase of compilation feeds no information whatsoever back to the preprocessor. It is the later phase of compilation that detects if declarations etc are used. But, since it cannot feed such information back to the preprocessor (and the preprocessor is an ignorant program that couldn't use such information anyway) the preprocessor cannot know that declarations are unused, and filter them out.
1) You may not use them, but you have included them in line 1
#include <stdio.h>
That's where what you see come from. Try to remove it to see the difference.
2) The preprocessor read your C file and processed all preprocessor directives that you have declared. All Preprocessor directives start with a '#' symbol. The '#include' will replace this line by the content of the given file. You also have the classical '#ifndef' and '#define' directive. The latter is equal to 'if' statement which allow you to activate a part of a code only if a symbol is defined
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
#ifndef WIN32
#include <some_file.h>
#else
#include <some_other_file.h>
#endif
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
#ifndef _SOME_SYMBOL_
#define _SOME_SYMBOL_
// this second function is ignored
int main() { return 0;}
#endif //endof _SOME_SYMBOL_
When the preprocessor reads the above file, the symbol "_SOME_SYMBOL_" is unknown, so the preprocessor initializes it. Next it includes the file whether or not it knows of WIN32. Usually this kind of symbol is passed trough command line. So part of your code is dynamically activated or deactivated.
The preprocessor will output this
void some_other_function_from_some_other_file(){}
int main() { return 0;}

How to understand a #include inside a enum definition?

How do I interpret this C code:
typedef enum {
#include <test.h>
enum1,
enum2,
…
} test_enum;
test.h includes many macros. How to understand this?
Does means that the definitions of the enum needs the macros defined inside the header file?
Can #include appear anywhere?
An #include statement may appear on any line. It is most often used to include entire declarations. However, it can be used to insert any text.
It may be that test.h contains a list of names to be declared inside the enum. It may also contain preprocessor statements, such as macro definitions or #if … #endif statements.
You would have to show the contents of test.h for further help understanding it.
#include and #define are pre processor directives not actual code.
You can put them anywhere (except as part of a literal string) - some compilers are more fussy than others (i.e. the # has to be in column 0).
The Preprocessor expands these out as required, and that is what the compiler sees. As to what it means in your case, depends on the content of test.h
There is normally a compiler option to see your code with all the preprocessor stuff expanded (used to be -e or -E on gcc I think)
The #include directive causes the contents of the included file to be placed exactly at the point of the #include directive. The resulting code is what it is once that expansion has taken place, and can be any valid language construct.
If the included file contains:
enum_a,
enum_b,
enum_c,
Then after inclusion, your code would look like:
typedef enum {
enum_a,
enum_b,
enum_c,
enum1,
enum2,
…
} test_enum;
Which is a valid construct.
A #include directive can appear anywhere. See this.
Pre-processor statements can occur anywhere and are simple textual substitutions. Whether or not the processed code is valid C code is checked by the compiler, not the pre-processor.
Depending on your compiler you can review the changes done by the pre-processor.
For gcc, this would be the -E flag, so by compiling your source code with
gcc -E in.c
you can see which changes code is contained in the enum declaration after inserting test.h and
processing it.

C: Are #define directives global?

I'm somewhat confused by #define statements. In libraries, they seem to be a means of communication across different files, with lots of #ifdefs and #ifndefs.
Having said that, I now have two files file1.c and file2.c compiled together, with #define TEST 10 inside file2.c. Yet, when I use TEST inside file2.c the compiler gives the following error message:
'TEST' undeclared (first use in this function)
Are #define directives global?
#defines are not global, they are just a substitution where ever they are used (if declared in the same compile unit)
They are not globals, they are not symbols, they are irrelevant at linkage, they are only relevant at pre-compilation.
#defined macros are global in that they do not follow normal C scoping rules. The textual substitution from the macro will be applied (almost) anywhere the macro name appears after its #define. (Notable exceptions are if the macro name is part of a comment or part of a string literal.)
If you define a macro in a header file, any file that #includes that header file will inherit that macro (whether desired or not), unless the file explicitly undefines it afterward with #undef.
In your example, file2.c does not know about the TEST macro. How would it know to pick up the #define from file1.c? By magic? Since macros perform textual substitution on the source code, there is no representation of them in the generated object files. file2.c therefore needs to know that substitution rule itself, and if you want that shared across multiple files, that #define needs to live in a common header file that your .c files #include.
If you're asking specifically about how many of the #ifdefs that you see in libraries work, many of them are likely checking against pre-defined macro names provided by the compilation environment. For example, a C99 compiler defines a __STDC_VERSION__ macro that specifies the language version; a Microsoft compiler defines an _MSC_VER macro. (Often these predefined macros start with leading underscores since those names are reserved for the compiler.)
Additionally, most compilers allow defining simple macros as command-line arguments. For example, you might compile your code via gcc -DNDEBUG file1.c to compile file.c with NDEBUG defined to disable asserts.
In case somebody reads this later, and to add some practical information:
Some environments like atmel, vs, or iar, allow you to define global #define directives. They basically pass these defined values to the precompiler in some commandline format.
You can do the same in batch commands or makefiles, etc.
Arduino always adds a board variant (usually located at hardware\arduino\variants) to all compilations. At that point you can create a new board that contains your global define directives, and use it that way. For example, you can define a mega2560(debug) board out of the original mega2560 that contains some debug directives. You will add a reference to that variant in "boards.txt", by copy pasting some text, and properly modifying it.
At the end of the day, you will have to give that global hfile or global directive to the compiler in one way or another.
you should make a file1.h and put your defines there. Then in file2.c
#include "file1.h"
easy as a pie :)

Should variable definition be in header files?

My very basic knowledge of C and compilation process has gone rusty lately. I was trying to figure out answer to the following question but I could not connect compilation, link and pre-processing phase basics. A quick search on the Google did not help much either. So, I decided to come to the ultimate source of knowledge :)
I know: Variables should not be defined in the .h files. Its ok to declare them there.
Why: Because a header file might get included from multiple places, thus redefining the variable more than one time (Linker gives the error).
Possible work-around: Use header-guards in header files and define variable in that.
Is it really a solution: No. Because header-guards are for preprocessing phase. That is to tell compiler that this part has been already included and do not include it once again. But our multiple definition error comes in the linker part - much after the compilation.
This whole thing has got me confused about how preprocessing & linking work. I thought that preprocessing will just not include the code, if the header guard symbol has been defined. In that case, shouldn't multiple definition of a variable problem also get solved?
What happens that these preprocessing directives save the compilation process from redefining symbols under header guards, but the linker still gets multiple definitions of the symbol?
One thing that I've used in the past (when global variables were in vogue):
var.h file:
...
#ifdef DEFINE_GLOBALS
#define EXTERN
#else
#define EXTERN extern
#endif
EXTERN int global1;
EXTERN int global2;
...
Then in one .c file (usually the one containing main()):
#define DEFINE_GLOBALS
#include "var.h"
The rest of the source files just include "var.h" normally.
Notice that DEFINE_GLOBALS is not a header guard, but rather allows declaring/defining the variables depending on whether it is defined. This technique allows one copy of the declarations/definitions.
Header guard protects you from multiple inclusions in a single source file, not from multiple source files. I guess your problem stems from not understanding this concept.
It is not that pre-processor guards are saving during the compile time from this problem. Actually during compile time, one only source file gets compiled into an obj, symbol definitions are not resolved. But, in case of linking when the linker tries to resolve the symbol definitons, it gets confused seeing more than one definition casuing it to flag the error.
You have two .c files. They get compiled separately. Each one includes your header file. Once. Each one gets a definition. They conflict at link time.
The conventional solution is:
#ifdef DEFINE_SOMETHING
int something = 0;
#endif
Then you #define DEFINE_SOMETHING in only one .c file.
Header guards stop a header file being included multiple times in the same translation unit (i.e. in the same .c source file). They have no effect if you include the file in two or more translation units.

#include <> files in different files

If I have a several header files :lets say 1.h, 2.h, 3.h.
Let's say the all three of the header files have #include <stdlib.h> and one of the include files in them.
When I have to use all 3 header files in a C file main.c,
it will have 3 copies of #include <stdlib.h> after the preprocessor.
How does the compiler handle this kind of conflict?
Is this an error or does this create any overhead?
If there are no header guards, what will happen?
Most C headers include are wrapped as follows:
#ifndef FOO_H
#define FOO_H
/* Header contents here */
#endif
The first time the preprocessor scans this, it will include the contents of the header because FOO_H is undefined; however, it also defines FOO_H preventing the header contents from being added a second time.
There is a small performance impact of having a header included multiple times: the preprocessor has to go to disk and read the header each time. This can be mitigated by adding guards in your C file to include:
#ifndef FOO_H
#include <foo.h>
#endif
This stuff is discussed in great detail in Large-Scale C++ Software Design (an excellent book).
This is usually solved with preprocessor statements:
#ifndef __STDLIB_H
#include <stdlib.h>
#define __STDLIB_H
#endif
Although I never saw it for common header files like stdlib.h, so it might just be necessary for your own header files.
The preprocessor will include all three copies, but header guards will prevent all but the first copy from being parsed.
Header guards will tell the preprocessor to convert subsequent copies of that header file to effectively nothing.
Response to edit:
Standard library headers will have the header guards. It would be very unusual and incorrect for them to not have the guards.
Similarly, it is your responsibility to use header guards on your own headers.
If header guards are missing, hypothetically, you will get a variety of errors relating to duplicate definitions.
Another point: You can redeclare a function (or extern variable) a bazillion times and the compiler will accept it:
int printf(const char*, ...);
int printf(const char*, ...);
is perfectly legal and has a small compilation overhead but no runtime overhead.
That's what happens when an unguarded include file is included more than once.
Note that it is not true for everything in an include file. You can't redeclare an enum, for example.
This is done by one of the two popular techniques, both of which are under stdlib's responsibility.
One is defining a unique constant and checking for it, to #ifdef out all the contents of the file if it is already defined.
Another is microsoft-specific #pragma once, that has an advantage of not having to even read the from the hard drive if it was already included (by remembering the exact path)
You must also do the same in all header files you produce. Or, headers that include yours will have a problem.
As far a I know regular include simply throws in the contents of another file. The standard library stdlib.h urely utilizes the code guards: http://en.wikipedia.org/wiki/Include_guard, so you end up including only one copy. However, you can break it (do try it!) if you do: #include A, #undef A_GUARD, #include A again.
Now ... why do you include a .h inside another .h? This can be ok, at least in C++, but it is best avoided. You can use forward declarations for that: http://en.wikipedia.org/wiki/Forward_declaration
Using those works for as long as your code does not need to know the size of an imported structure right in the header. You might want to turn some function arguments by value into the ones by reference / pointer to solve this issue.
Also, always utilize the include guards or #pragma once for your own header files!
As others have said, for standard library headers, the system must ensure that the effect of a header being included more than once is the same as the header being included once (they must be idempotent). An exception to that rule is assert.h, the effect of which can change depending upon whether NDEBUG is defined or not. To quote the C standard:
Standard headers may be included in any order; each may be included more than once in
a given scope, with no effect different from being included only once, except that the
effect of including <assert.h> depends on the definition of NDEBUG.
How this is done depends upon the compiler/library. A compiler system may know the names of all the standard headers, and thus not process them a second time (except assert.h as mentioned above). Or, a standard header may include compiler-specific magic (mostly #pragma statements), or "include guards".
But the effect of including any other header more than once need not be same, and then it is up to the header-writer to make sure there is no conflict.
For example, given a header:
int a;
including it twice will result in two definitions of a. This is a Bad Thing.
The easiest way to avoid conflict like this is to use include guards as defined above:
#ifndef H_HEADER_NAME_
#define H_HEADER_NAME_
/* header contents */
#endif
This works for all the compilers, and doesn't rely of compiler-specific #pragmas. (Even with the above, it is a bad idea to define variables in a header file.)
Of course, in your code, you should ensure that the macro name for include guard satisfies this:
It doesn't start with E followed by an uppercase character,
It doesn't start with PRI followed by a lowercase character or X,
It doesn't start with LC_ followed by an uppercase character,
It doesn't start with SIG/SIG_ followed by an uppercase character,
..etc. (That is why I prefer the form H_NAME_.)
As a perverse example, if you want your users guessing about certain buffer sizes, you can have a header like this (warning: don't do this, it's supposed to be a joke).
#ifndef SZ
#define SZ 1024
#else
#if SZ == 1024
#undef SZ
#define SZ 128
#else
#error "You can include me no more than two times!"
#endif
#endif

Resources