I'm curious if there's any way to use #, $, or ? in a C function or variable name. I know that linkers allow them (because of C++ name mangling).
Is there any kind of escape code that could allow this (I don't care how ugly it looks)? Or, in standard C, is this completely impossible?
It is not possible in purely standard C.
But if you are using GCC (or probably Clang/LLVM), you can have $ in identifiers, and you can set the linker-level name using asm labels.
You could perhaps also play GNU ld tricks with ld scripts.
Standard C does not allow any of these (and it can't really allow ? since it's an operator and thus a separate token). GCC (and possibly compatible compilers) allow $ but not the others. However, you could use the GNU C (GCC) extension for making a declaration for an external-linkage name that references a different underlying symbol name; this may achieve what you want, e.g. if you're trying to reference C++ symbols. I believe the syntax is something like adding __asm__("symbol_name") to the end of the function declaration. There are some examples in the glibc headers on most Linux systems.
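As a rough sketch of the asm-label approach, assuming GCC or Clang on a target whose assembler accepts $ inside symbol names (x86-64 Linux, for instance). The names add_one and add$one are made up for illustration:

```c
/* GNU C extension, not standard C. The C-level name stays an ordinary
   identifier; the asm label sets the symbol name that the linker sees.
   Both names are hypothetical. */
int add_one(int x) __asm__("add$one");

/* The definition uses the C name; the object file exports add$one. */
int add_one(int x)
{
    return x + 1;
}
```

C code keeps calling add_one(41) as usual, while running nm on the resulting object file should list the symbol as add$one.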
Alternatively, if you have dlsym, you could use it to look up the names at runtime.
I am working on a static analysis tool for C. I need to pass the code being analysed through the C preprocessor so that the tool can see the library function prototypes, type definitions, etc. Unfortunately, both with clang on Mac OS X and gcc on Linux distros, some of the standard header files refer to compiler built-in types like __builtin_va_list that my tool doesn't know about. Does anyone have any suggestions for how to work around this? One possibility, if it's available somewhere, would be a vanilla-flavoured set of header files that produce C that conforms strictly to the standard. The header files don't have to map to any ABI, as the tool doesn't need to compile and run the code: they just have to give the API promised by the C standard. Any suggestions will be gratefully received.
Instead of finding a set of standard header files, you can just use a set of empty files with the expected names and pass the source code through the compiler's preprocessor with a -Idirectory option. Your syntax analysis tool should be able to deal with the remaining symbols.
It would be useful to have a preprocessor option in addition to -dI to preserve #include lines instead of handling them.
In the meantime, you can try using the include files from my nolibc repository.
In GCC 10, gcc defaults to -fno-common. That means tentatively defined symbols are no longer emitted as common symbols. I think gcc conforms to the C specification, but it seems there are no common symbols in a native C program. Are common symbols only for extension syntax?
Does native C have common symbols?
Read the C11 standard n1570. Its index doesn't even mention common symbols.
Read carefully also the documentation of GCC and this draft report.
Perhaps you are referring to the ELF file format used on Linux for object files and executables. There you can find a mention of common symbols, which tend to be deprecated. Read the Linux ABI specification, etc. here.
My recommendation is to declare all your public symbols as extern in some header file (#include-d in most of your *.c files), and define them once (without extern) in a single translation unit. You could use simple preprocessor tricks (such as X-macros).
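A minimal sketch of that layout, shown in one block for brevity. The file names shared.h and shared.c and the identifiers request_count and record_request are hypothetical; in a real project the two parts would live in separate files:

```c
/* --- shared.h (hypothetical): declarations only, included everywhere --- */
extern int request_count;      /* declares the variable, allocates nothing */
void record_request(void);

/* --- shared.c (hypothetical): the single defining translation unit --- */
int request_count = 0;         /* the one and only definition */

void record_request(void)
{
    request_count++;
}
```

Because every other file only ever sees the extern declaration, this builds the same way under -fcommon and -fno-common.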
You might be interested in using C code generators such as lemon or SWIG, or in developing your own script (with GNU awk, Guile, Python, GPP, etc.) for simple metaprogramming techniques (autoconf could be inspirational) that generate some C code. Configure your build automation tool (GNU make, ninja, ...) suitably.
You might be interested in using the static analyzer options and precompiled headers of recent GCC. Look also into the Clang static analyzer, clang-tidy, and Frama-C.
You surely want to pass -Wall -Wextra -g -H to gcc and read How to debug small programs and Modern C.
No, it has nothing to do with "extension syntax", and it has nothing to do with "common symbols" as a language construct. It simply refers to the behavior of variable declarations at file scope.
C says that if you place a declaration like int i; in a file, and don't elaborate on it anywhere else, then it will have external linkage and it will be considered to be defined to have a value of 0. This is called a "tentative definition". Declarations with the same name in different files, if they have external linkage, all refer to the same variable. Generally the way to use external linkage is to define a variable in one file, and use an extern declaration in any other files that make use of it.
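To make the term concrete, here is a small sketch of tentative definitions within a single file. The names counter and read_counter are illustrative only:

```c
int counter;   /* tentative definition: file scope, no initializer */
int counter;   /* repeating it in the same file is allowed in C */

/* No definition with an initializer follows, so at the end of the
   translation unit the tentative definitions behave like:
       int counter = 0;                                           */

int read_counter(void)
{
    return counter;
}
```

The difference between -fcommon and -fno-common only shows up when a second file contains its own int counter; line.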
In GCC with -fcommon, tentative definitions for the same variable can appear in more than one file. GCC will resolve this at link time, and allocate storage (initialized to zero) for the variable once.
In GCC with -fno-common, tentative definitions are resolved to definitions ASAP when the file is compiled. If more than one file contains a tentative definition for a variable, then this will cause a multiple definition error at link time.
As far as I can tell, the C standard doesn't require or prohibit either behavior. In particular, C does not have C++'s "one definition rule". However, the -fno-common behavior is generally less surprising, catches a forgotten extern sooner, and allows the compiler to optimize better (because it knows exactly where the variable lives when compiling, instead of waiting to find out later). For these reasons the default was changed in GCC.
Declaring a global variable with the same name as a standard function produces an error in clang (but not gcc). It is not due to a previous declaration in a header file. I can get the error by compiling the following one-line file:
extern void *memcpy[];
Clang says
foo.c:1:14: error: redefinition of 'memcpy' as different kind of symbol
foo.c:1:14: note: previous definition is here
Apparently this only happens for a few standard functions. printf produces an error, fprintf produces a warning, fseek just works.
Why is this an error? Is there a way to work around it?
Motivation. I am using the C compiler as a compiler backend. C code is programmatically generated. The generated code relies on byte-level address arithmetic and pointer type casting. All external symbols are declared as extern void *variablename[];.
According to the C standard (ISO 9899:1999 section 7.1.3), "all external identifiers defined by the library are reserved in a hosted environment. This means, in effect, that no user-supplied external names may match library names."
Your problem can be easily solved by adding a unique prefix to all your identifiers, e.g. "mylang_".
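A sketch of what the generated code might look like with such a prefix. The helper mylang_memcpy_byte and the array size are assumptions made for illustration, not part of the original generator:

```c
/* Hypothetical generator output: the source-language symbol "memcpy"
   is emitted as mylang_memcpy, so it no longer collides with the
   reserved library name. */
extern void *mylang_memcpy[];

/* Defined here only so that the sketch links on its own; a real backend
   would emit the definition wherever the symbol's storage belongs. */
void *mylang_memcpy[4];

/* Byte-level address arithmetic on the symbol, as described above. */
char *mylang_memcpy_byte(unsigned long offset)
{
    return (char *)mylang_memcpy + offset;
}
```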
As an alternative, you can avoid the problem by using the LLVM or GCC -ffreestanding flag, which will compile your code for a non-hosted environment. (The C standard specifies that the restriction only applies to a hosted environment.) In this case you can use all the names you want (apart from main, which is still your program's entry point), but you must make your own arrangements for your library. This is how operating system kernels can legally define their own versions of the C library functions.
The reason is explained here and a relevant extract is given below. http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
I get an error in gcc as well.
The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them. There are several reasons for these restrictions:
Other people reading your code could get very confused if you were using a function named exit to do something completely different from what the standard exit function does, for example. Preventing this situation helps to make your programs easier to understand and contributes to modularity and maintainability.
It avoids the possibility of a user accidentally redefining a library function that is called by other library functions. If redefinition were allowed, those other functions would not work properly.
It allows the compiler to do whatever special optimizations it pleases on calls to these functions, without the possibility that they may have been redefined by the user. Some library facilities, such as those for dealing with variadic arguments (see Variadic Functions) and non-local exits (see Non-Local Exits), actually require a considerable amount of cooperation on the part of the C compiler, and with respect to the implementation, it might be easier for the compiler to treat these as built-in parts of the language.
The page also describes other restricted names.
I am compiling one program called nauty. This program uses a canonical function name getline which is also part of the standard GNU C library.
Is it possible to tell GCC at compile time to use this program defined function?
One solution:
Right now you have a declaration of the function in some application .h file, something like:
int getline(...); // the custom getline
Change that to:
int application_getline(...); // the custom getline
#define getline application_getline
I think that should do it. It will also fix the .c file where the function is defined, assuming it includes that .h file.
Also, use grep or your editor's "find in files" to make sure that the macro causes no trouble in any place where it takes effect.
Important: in every file, make sure that the .h file is included after any standard headers which may use the getline symbol. You do not want the macro to take effect in those.
Note: this is an ugly hack. Then again, almost everything involving C preprocessor macros can be considered an ugly hack, by some criteria ;). Then again, getting existing incompatible code bases to co-operate and work together is often a case where a hack is acceptable, especially if long-term maintenance is not a concern.
Note 2: As per this answer and as pointed out in a comment, this is undefined behavior according to the C standard. Keep this in mind if the intention is to maintain the software for longer than it takes to get a working executable once. I have also added a better solution.
Note that you may trigger undefined behavior if the GCC header where standard getline is defined is actually used in your code. These are the relevant information sources (emphasis mine):
The libc manual:
1.3.3 Reserved Names
The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them. There are several reasons for these restrictions:
[...]
and the C99 draft standard (N1256):
7.1.3 Reserved identifiers
1 Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.
[...]
2 No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined.
3 If the program removes (with #undef) any macro definition of an identifier in the first group listed above, the behavior is undefined.
Thus even the macro trick suggested in another post will invoke undefined behavior if you include the header of getline in your code.
Unfortunately, in this case the only safe bet is to manually rename all getline invocations.
C demands unique function names, but you can use the -fno-builtin or -ffreestanding gcc flags.
See the description of these flags in the gcc man page.
A common approach is to use prefixes which form some sort of namespace. Sometimes you can see macros used for this to make changing the namespace name easier, e.g.
#define MYAPP(f) myapp_##f
Which is then used like
int MYAPP(add)(int a, int b) {
    return a + b;
}
This defines a function myapp_add which you can also invoke like
MYAPP(add)(3, 5);
This standards compliance issue started to bug me, so I did a bit of experimenting. Here's a second answer, which is possibly better than the currently accepted answer of mine.
First, solution:
Just define the macro _XOPEN_SOURCE with the value 699 by adding this to the compiler's command line options:
-D_XOPEN_SOURCE=699
How exactly to do that depends on the application's build system, but one way that will probably work is to set the CFLAGS environment variable and see if it takes effect when rebuilding:
export CFLAGS="-D_XOPEN_SOURCE=699"
The other alternative is to add #define _XOPEN_SOURCE 699 before the includes in every .c file of the application, in case it uses some esoteric build system and you can't get the macro added to the compile options. Doing it from the command line is by far preferable, though.
Then some explanation:
The man page of getline specifies that getline is declared only under certain standards, such as when _XOPEN_SOURCE >= 700. So, by defining a smaller value before including the relevant header, we exclude the library declaration. More information about these feature-test macros can be found in the GNU libc manual.
I expected there to be some linker issues too, but there weren't, and my investigation resulted in this question here. To summarize, the linker prefers symbols from the linked object files (at least with gcc) and only looks at dynamic libraries if it has not found the symbol otherwise. So, since getline is not an ISO C symbol, the GNU libc documentation quoted in this answer seems to imply that, after using the _XOPEN_SOURCE trick of this answer, it's OK to use the name in an application. Still, beware of other libraries that use the POSIX getline and end up calling the application's function (probably with different parameters, resulting in undefined behaviour, probably a crash).
Here is a neat solution to your problem. The trick is LD_PRELOAD.
I did something similar in one of my own questions. See the following:
Hack the standard function in library and call the native library function afterwards
You can define getline() in a separate file, which also keeps the design clean. Now compile that C file:
$ gcc -c -g -fPIC file.c
-g adds debugging information.
-fPIC generates position-independent code, which a shared library needs. It also lets the text segment be shared between processes, which saves RAM.
This creates file.o. Now make a shared object from it:
$ gcc -shared -o libfile.so file.o
Now, link your main file with this shared object.
$ gcc -g main.c -o main.out -L. -lfile
When executing, set LD_PRELOAD so that your library is used instead of the native API:
$ LD_PRELOAD=<path to libfile.so>/libfile.so ./main.out
In gcc, how can I check what C preprocessor definitions are in place during the compilation of a C program, in particular what standard or platform-specific macro definitions are defined?
Predefined macros depend on the standard and the way the compiler implements it.
For GCC: http://gcc.gnu.org/onlinedocs/cpp/Predefined-Macros.html
For Microsoft Visual Studio 8: http://msdn.microsoft.com/en-us/library/b0084kay(VS.80).aspx
This Wikipedia page http://en.wikipedia.org/wiki/C_preprocessor#Compiler-specific_predefined_macros shows how to dump some of the predefined macros.
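Besides dumping the whole list (with GCC, gcc -dM -E - < /dev/null prints it), a program can test individual predefined macros itself. A minimal sketch, where compiler_name is a hypothetical helper and __clang__, __GNUC__ and _MSC_VER are the macros documented by the respective compilers:

```c
/* Reports which compiler built this file, based on predefined macros.
   __clang__ is tested first because Clang also defines __GNUC__. */
const char *compiler_name(void)
{
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "unknown";
#endif
}
```

The same #if defined(...) pattern works for platform macros such as __linux__ or _WIN32.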
A likely source of the predefined macros for a specific combination of compiler and platform is the Predef project at Sourceforge. They are attempting to maintain a catalog of all predefined macros in all C and C++ compilers on all platforms. In practice, they have coverage of a fair number of platforms for GCC, and a smattering of other compilers.
They achieved this through a combination of careful reading of documentation, as well as a shell script that figures out what macros are predefined the hard way: it tries them. My understanding is that it actually tries every string it can find in the executable image of the compiler and/or preprocessor to see if it has a predefined meaning.
They will happily add any info they don't have yet to their database.
A program may define a macro at one point, remove that definition later, and then provide a different definition after that. Thus, at different points in the program, a macro may have different definitions, or have no definition at all.
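A short sketch of that situation, using a made-up macro BUFFER_SIZE and two made-up functions that capture its value at different points in the file:

```c
#define BUFFER_SIZE 16          /* first definition */
int size_before(void) { return BUFFER_SIZE; }

#undef BUFFER_SIZE              /* from here on the macro has no definition */
#ifdef BUFFER_SIZE
#error "BUFFER_SIZE should be undefined at this point"
#endif

#define BUFFER_SIZE 64          /* a different definition */
int size_after(void) { return BUFFER_SIZE; }
```

size_before() returns 16 and size_after() returns 64, because each function body was expanded with the definition in effect at that point. This is why a macro dump taken at the start of compilation only describes the initial state.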