What is the role of .LANCHOR0 in detecting a multiple definition error?

I got a multiple definition error and fixed it by declaring Var1 as static in the header file that is shared by pet.c and bet.c.
This is the error log I obtained:
libcdr.a(pet.o): In function `.LANCHOR0':
pet.c:(.bss+0x0): multiple definition of `Var1'
build/obj/bet.o:bet.c:(.bss+0x0): first defined here
collect2.exe: error: ld returned 1 exit status
When I searched online, I found that .LANCHOR0 is of type .word in the linker script. I can't work out why it is called a function, or what role it plays in reporting the multiple definition error.

.LANCHOR0 isn't a real function, it's just how GCC groups things so it can reference multiple static locations from one reference point.
Constructing a 32-bit address in a register takes multiple instructions, or a PC-relative load of a pointer from a nearby literal pool. The compiler wants to avoid having the address of each individual static (or global) variable in literal pools near code; that would bloat things.
.LANCHOR0, .LANCHOR1, etc. are the names gcc uses for such pointers.
But the result of all this is that apparently variables with static storage look to the assembler like they're defined after a .LANCHOR0 "function".
There's nothing special / useful / interesting going on here as far as debugging your multiple-definitions bug. It's just a consequence of compiling for ARM.
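As for the underlying bug, a minimal sketch of the usual arrangement may help. It assumes the shared header originally contained a plain definition such as int Var1; (the question does not show it). The header name common.h and the helpers bump_var1 and read_var1 are hypothetical. The three "files" are shown in one listing for brevity.

```c
/* common.h (hypothetical name): declaration only, no storage allocated. */
extern int Var1;

/* pet.c: the single definition that the linker will see. */
int Var1 = 0;

/* bet.c: includes common.h and uses the same object. */
void bump_var1(void)
{
    Var1++;
}

int read_var1(void)
{
    return Var1;
}
```

Note that marking the variable static in the header does silence the linker, but it gives pet.c and bet.c each a private copy of Var1, which may not be what was intended.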

Related

Gcc Force global variable to a given address using linker only

I'm trying to force a global variable to a specific address without modifying the source code.
I'm well aware of solutions such as:
// C source code
MyStruct globalVariable __attribute__((section(".myLinkerSection")));
// Linker script
. = 0x400000;
.myLinkerSection :
{
*(.myLinkerSection)
}
But in my case I would like to do the same thing without the __attribute__((section(".myLinkerSection"))) attribute.
Is it doable?
EDIT:
I cannot modify the source code at all.
The variable is defined as follows:
file.h:
extern MyStruct globalVariable;
file.c:
MyStruct globalVariable;
I assume from the mentions of __attribute__ that you are using gcc / clang or something compatible. You can use the -fdata-sections option to make the compiler put every variable into its own section. With that option, your globalVariable, assuming it would otherwise go in .bss, will be placed in a section called .bss.globalVariable (the exact name might be platform-dependent). Then you can use your linker script to place this section at the desired address.
Note that this option will inhibit certain compiler optimizations. There is a guarantee that objects defined in the same section within the same assembler module are assembled in strict order, and that their addresses do not change after that. In some cases the compiler can take advantage of this; e.g. if it defines int variables foo and bar consecutively in the same section, then it knows their addresses are consecutive, and it can safely generate code that "hardcodes" their relative position. For instance, on some platforms such as ARM64, it takes multiple instructions to materialize the address of a global or static object. So if some function accesses both foo and bar, the compiler can materialize the address of foo, then add the fixed constant 4 to get the address of bar. But if foo and bar are in different sections, this can't be done, and you will pay the (small but nonzero) cost of materializing both addresses separately.
As such, you may want to use -fdata-sections only on the particular source files that define the particular variables of concern.
This also illustrates why you have to get the variable in its own section in order to set its address; you can't move just one variable from a section, since the compiler may have been relying on its relative position to some other variable in that section.
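A rough sketch of the setup described above, assuming GCC on a target where uninitialized globals land in .bss. The fields of MyStruct, the variables foo and bar, and the function sum_foo_bar are hypothetical. The section name in the comments follows the .bss.globalVariable form mentioned above, so check it against your own object file (for example with objdump -h) before relying on it.

```c
/* file.c, compiled with something like: gcc -fdata-sections -c file.c
 *
 * Linker script fragment, in the style of the one in the question:
 *   . = 0x400000;
 *   .myLinkerSection : { *(.bss.globalVariable) }
 */

/* Hypothetical stand-in for the real type. */
typedef struct {
    int a;
    int b;
} MyStruct;

/* With -fdata-sections this is expected to get its own section. */
MyStruct globalVariable;

/* Without -fdata-sections these two may share one section, so the
 * compiler can derive the address of bar from the address of foo.
 * With the option, each address is materialized separately. */
int foo;
int bar;

int sum_foo_bar(void)
{
    return foo + bar;
}
```

Comparing the output of gcc -S with and without the option on a file like this should show whether the relative-addressing optimization matters on your target.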
You can define this variable in a separate translation unit. Then list its object file in the appropriate section.

Getting symbol names into C

I have successfully produced an assembler macro which I use to instantiate 32 independent routines in an assembly source file. The routines follow the target system ABI. Their exported symbol names are practically identical, except for a trailing number suffix. Here is a symbol extract from the assembled object file (ellipsis indicating a continuing pattern).
$ nm default_handler.o | sort
...
00000058 T exception_default_handler_5
0000005f T exception_default_handler_6
00000066 T exception_default_handler_7
0000006d T exception_default_handler_8
00000072 T exception_default_handler_9
00000079 T exception_default_handler_10
0000007e T exception_default_handler_11
00000083 T exception_default_handler_12
00000088 T exception_default_handler_13
...
I also have a C program in which I need to reference each of these individual routines. In some parts of the C program, I need to reference all of the assembly routines at once, to store a pointer to each in an array. Here is the code needed to understand my problem (with ellipsis to indicate a continuing pattern). This code performs the task stated above.
{
...
extern void exception_default_handler_5(void);
extern void exception_default_handler_6(void);
extern void exception_default_handler_7(void);
...
...
array[5] = exception_default_handler_5;
array[6] = exception_default_handler_6;
array[7] = exception_default_handler_7;
...
}
With 64 lines of this approach, the golden rule of coding, to always write readable and maintainable code, has obviously been broken. What I would like is a way to automate the process of writing the extern forward declaration and storing a pointer to the routine in the array, to minimize the errors that are bound to creep in when code is duplicated.
I am thinking that perhaps it's a job for C macros, but I cannot figure out a way to do it with them.
Any thoughts?
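One possible approach is the "X-macro" idiom, where a single list macro drives both the declarations and the array assignments. The sketch below is only an illustration: HANDLER_LIST, DECLARE_HANDLER, DEFINE_STUB, STORE_HANDLER, handler_fn, install_handlers and last_called are hypothetical names, the array size of 32 is taken from the number of routines mentioned above, and the C stubs stand in for the assembly routines so that the example links on its own.

```c
/* Hypothetical list macro: one X(n) per handler number. Only three entries
 * are shown; the real list would name all 32 routines. */
#define HANDLER_LIST(X) X(5) X(6) X(7)

/* Expands to one extern declaration per entry. */
#define DECLARE_HANDLER(n) extern void exception_default_handler_##n(void);
HANDLER_LIST(DECLARE_HANDLER)

/* Stand-in C definitions so the sketch links on its own. In the real
 * project these symbols come from default_handler.o instead. */
static int last_called = -1;
#define DEFINE_STUB(n) \
    void exception_default_handler_##n(void) { last_called = (n); }
HANDLER_LIST(DEFINE_STUB)

typedef void (*handler_fn)(void);
static handler_fn array[32];

/* Expands to one assignment per entry. */
#define STORE_HANDLER(n) array[(n)] = exception_default_handler_##n;

void install_handlers(void)
{
    HANDLER_LIST(STORE_HANDLER)
}
```

Adding a routine then means adding one X(n) entry to the list, and the declaration and the assignment follow automatically.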

Generating correct .DEF files to export non-static functions AND GLOBALS

Following on from a question about detecting bad linkage to globals across DLL boundaries, it turns out that I need to modify a .DEF file generator tool used by the PostgreSQL project so that it correctly emits DATA tags for .DEF entries for global variables.
Problem
I can't seem to find a way, using Microsoft's tools, to get a symbol table listing that differentiates between global variables and functions, and that includes globals that aren't initialized at their definition site.
Ideas?
Broken current approach
The tool loops over dumpbin /symbols output to generate the .DEF file. Unlike nm, which I'm used to, dumpbin /symbols does not appear to emit an entry for each symbol to indicate the symbol type - function, initialized variable, uninitialized variable. It only shows whether the symbol is locally defined or not.
With each dumpbin output line followed by the corresponding definition in the .c file, we have first an initialized global:
00B 00000000 SECT3 notype External | _DefaultXactIsoLevel
int DefaultXactIsoLevel = XACT_READ_COMMITTED;
vs a function with non-static linkage:
022 00000030 SECT5 notype () External | _IsAbortedTransactionBlockState
bool IsAbortedTransactionBlockState(void) {...}
... and for bonus fun, un-initialized globals appear to be shown as UNDEF, just like references to symbols from other compilation units, e.g:
007 00000004 UNDEF notype External | _XactIsoLevel
int XactIsoLevel;
even though this is pre-declared in the header during compilation (with project specific macro hand expanded for readability) as:
extern __declspec(dllexport) int XactIsoLevel;
So... it looks like dumpbin output doesn't contain enough information to generate a correct .DEF file.
Right now gendefs.pl is merrily spitting out a .DEF file that omits globals that aren't initialized, and declares everything else as code (by failing to specify CONSTANT or DATA in the .DEF). For something so broken, it's worked remarkably well.
Fixing it
To produce correct .DEF files, I need a way to determine which symbols are variables.
I looked at using cl.exe's /Fm option, but it's just a passthrough to the linker's /MAP option, and does nothing when you're just generating an object file, not linking it.
I could use a symbol dump tool that produces more useful information like gcc's nm.exe, but that adds extra tool dependencies and seems fragile.
At this point I am not able to simply annotate every exported function with PGDLLIMPORT (the __declspec(dllimport) / __declspec(dllexport) macro used by the project) and stop using a DEF file.
Even if I could I need to find an approach that will cause clear linker errors when PGDLLIMPORT is omitted on an exposed variable.
So. Windows linker/compiler experts. Any ideas?
Well, I must say I was wrong to claim that Microsoft's tools don't use the symbol type field at all.
1) cl doesn't use it to record actual type information, but it does store the information you need:
0x20 means function
0x00 means not a function
See the PE/COFF specification, p. 46-47.
You can check for the presence or absence of () after the symbol type (notype in your case) in dumpbin's output to tell whether a symbol is code or data.
2) cl also emits a special section in .obj files for the linker, which contains an export switch of the form /export:symbol[,type] for every __declspec(dllexport) symbol.
3) Finally, you can specify C++ external linkage and recover the symbols' types from the name mangling.
Just adding to the other post:
dumpbin has a specific option, namely /headers, that clearly points out the Type as code or data, together with a list of other attributes.
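The "look for () after the symbol type" check suggested above could be sketched as follows. The function name is_function_symbol_line is hypothetical, and the sketch assumes lines shaped like the dumpbin /symbols samples in the question, with the symbol name following a | character. It does not try to decide whether an UNDEF entry is a definition or a reference.

```c
#include <string.h>

/* Returns 1 if a `dumpbin /symbols` line marks its symbol as a function,
 * that is, if "()" appears before the '|' that precedes the symbol name.
 * Returns 0 for anything else, including lines without a '|'. */
int is_function_symbol_line(const char *line)
{
    const char *bar = strchr(line, '|');
    const char *parens = strstr(line, "()");

    return bar != NULL && parens != NULL && parens < bar;
}
```

A generator could emit the DATA keyword for External symbols where this returns 0 and leave the entry untagged where it returns 1.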

How is scope of variable implemented in compiler at machine level or memory level

How is the scope of a variable implemented by compilers?
I mean, when we say a variable is static, its scope is limited to the block or the functions defined in the same file as the static variable.
How is this achieved at the machine level or the memory level?
How is this restriction actually enforced?
How is this scoping resolved at program run time?
It is not achieved at all at the machine level. The compiler checks for scopes before machine code is actually generated. The rules of C are implemented by the compiler, not by the machine. The compiler must check those rules, the machine does not and cannot.
A very simplistic explanation of how the compiler checks this:
Whenever a scope is introduced, the compiler gives it a name and puts it in a structure (a tree) that makes it easy to determine the position of that scope in relation to other scopes, and marks it as the current scope. When a variable is declared, it's assigned to the current scope. When a variable is accessed, the compiler looks for it in the current scope. If it is not found there, the compiler walks up the tree to the enclosing scope and looks again. This continues until it reaches the topmost scope. If the variable is still not found, then we have a scope violation.
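The lookup described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular compiler does it: struct scope and find_declaring_scope are hypothetical names, and each scope simply holds a flat array of declared names plus a link to its enclosing scope.

```c
#include <stddef.h>
#include <string.h>

/* One node of the scope tree: its declared names and its enclosing scope. */
struct scope {
    const struct scope *parent;   /* NULL for the topmost scope */
    const char *const *names;     /* names declared directly in this scope */
    size_t count;
};

/* Walks from the current scope towards the topmost one and returns the
 * first scope that declares `name`, or NULL if no enclosing scope does
 * (the "scope violation" case). */
const struct scope *find_declaring_scope(const struct scope *current,
                                         const char *name)
{
    for (const struct scope *s = current; s != NULL; s = s->parent) {
        for (size_t i = 0; i < s->count; i++) {
            if (strcmp(s->names[i], name) == 0) {
                return s;
            }
        }
    }
    return NULL;
}
```

All of this happens at compile time; nothing resembling this structure needs to exist in the generated program.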
Inside compilers, it's implementation-defined. For example, if I were writing a compiler, I would use a tree to define 'scope', and it would definitely be a symbol table inside a binary tree.
Some would use an arbitrary-depth hash table. It's all implementation-defined.
I'm not 100% sure I understand what you are asking, but if you mean "how are static variables and functions stored in the final program", that is implementation-defined.
That said, a common way of storing such variables and functions is in the same place as any other global symbols (and some non-global ones) -- the difference is that these are not "exported", and thus not visible in any outside code trying to link to our software.
In other words, a program which has the following in it:
int var;
static int svar;
int func() { static int func_static; ... }
static int sfunc() { ... }
... might have the following layout in memory (let's say our data starts at 0xF000 and functions at 0xFF00):
0xF000: var
0xF004: svar
0xF008: func.func_static
...
0xFF00: func's code
0xFF40: sfunc's code /* assuming we needed 0x40 bytes for `func`! */
The list of exports, however, would only contain the non-static symbols, aka the exported ones:
var v 0xF000
func f 0xFF00
Again -- note how, while the static data is still written into the files (it has to be stored somewhere!), it is not exported; in layman's terms, our program does not tell anyone that it contains svar, sfunc and similar.
In Unices, you can list the symbols that a library or a program exports with the nm tool: http://unixhelp.ed.ac.uk/CGI/man-cgi?nm ; there do exist similar tools for Windows (GnuWin32 might have something similar).
In practice, executable code is often stored separately from the data (so that it can be protected from writes, for example), and both may get reordered to minimize memory use and cache misses, but the idea remains the same.
Of course, optimizations can be applied -- for example, a static function could be inlined at every call site, meaning that no code is generated for the function itself at all, and thus it does not exist on its own anywhere.
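A compilable variant of the snippet above, as a small sketch: the function bodies are made up for illustration, and only the linkage of each name matters. If you compile it to an object file and run nm on it, local symbols are usually shown with a lowercase type letter and exported ones with an uppercase letter.

```c
int var;                 /* external linkage: appears in the export list */
static int svar;         /* internal linkage: stored, but not exported */

static int sfunc(void)   /* internal linkage: callable only from this file */
{
    return ++svar;
}

int func(void)           /* external linkage */
{
    /* Lives in static storage like svar, but is nameable only in here. */
    static int func_static;

    func_static += sfunc();
    return func_static;
}
```

Both svar and func_static keep their values between calls, which is why they need storage in the program image even though no other file can name them.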

How is the static function/variable protected

I want to know how a static variable or function is restricted so that it can only be used in the file in which it is defined. I know that such variables and functions are declared in the data section (heap area to be precise), but are they tagged with the file name? Suppose I make a fool of the compiler by assigning such a static function (defined in foo.c) to a global function pointer, and call that function pointer in some other file (bar.c). Obviously my code won't give any compilation warning, but incidentally, it gives a segmentation fault. Obviously, it is a protection fault, but I am interested in knowing how it is implemented inside the system.
Thanks. MS
The linker takes care of restricting the scope of mapping the function name to the function.
There is no protection for static functions called by function pointer - it's not that uncommon an idiom. For example, the recommended way of implementing GObject methods is to expose a pointer to a static function (see the virtual public methods section in this GObject how-to).
It is 'protected' simply by not having its symbol/location made known to the linker. So you cannot write code in another module that explicitly references the static object by its symbol name, because the linker has no such symbol. There is no run-time protection.
If you pass an address to a static object to some other module at runtime, then you will then be able to access it through such a pointer. That is not "making a fool of the compiler" (or linker in fact), such action may be entirely legitimate.
The fact that you got a seg-fault is probably for an entirely different reason (an invalid pointer, for example). The compiler may choose to inline the code, in which case a pointer to it would not be possible, but if you explicitly take the address of an object, the compiler should instantiate it, so this seems unlikely.
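To illustrate that handing out the address of a static function is legitimate, here is a minimal sketch of the scenario from the question. The names hidden_add, exported_add and call_through_pointer are hypothetical, and the two "files" are shown in one listing for brevity.

```c
/* foo.c: the function name is private to this file... */
static int hidden_add(int a, int b)
{
    return a + b;
}

/* ...but its address is published through a global function pointer. */
int (*exported_add)(int, int) = hidden_add;

/* bar.c: cannot name hidden_add, but can call it through the pointer. */
extern int (*exported_add)(int, int);

int call_through_pointer(void)
{
    return exported_add(2, 3);
}
```

The linker never sees the name hidden_add from bar.c, yet the call works because only the address is needed at run time.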
The purpose of static is not to 'protect' the variable/function but to protect the namespace and protect the rest of your program from having its behavior messed up by symbols with conflicting names. It also allows a good bit more optimization in that the compiler knows it doesn't have to facilitate access to the symbol name by outside modules.
You "may" run into a problem if foo.c and bar.c are compiled into different dynamically loaded libraries.
