C compiling: relocation truncated to fit R_X86_64_PC32 against symbol - c

/* my program
author/date: me/now
*/
# include <stdio.h>
# define XX 1000
# define YY 20000 /* value of 1000 is ok */
# define ZZ 6000
/* global variable declaration */
int some_variable_this;
int some_variable_that;
double data[XX][YY][ZZ];
static void some_procedure_this ( void )
{
}
static void some_procedure_that ( void )
{
}
int main ( int argc, char *argv[] )
{
}
I am writing a quick C program to reformat some data.
When compiling via gcc myprogram.c, if I make the global data array too large I get the following error:
relocation truncated to fit R_X86_64_PC32 against symbol 'some_variable_this'
relocation truncated to fit R_X86_64_PC32 against symbol 'some_variable_that'
My goal is to write some quick C code to reformat some data.
What does this R_X86_64_PC32 mean?
Is there a compiler flag I can use to get around this?
Is there a better way to code this, in C, while still maintaining quickness of writing the code and simplicity for human readability?
This is on gcc 4.3.4 on Linux, if it matters.

What does this R_X86_64_PC32 mean?
It is an ELF relocation type for x86_64. This particular type expresses that the location of the referenced data is computed from a 32-bit offset relative to an address related to the program counter. I interpret the diagnostics to indicate that the needed offsets are too large to fit in the provided 32 bits, which means that the compiler generated code that, in practice, the linker wasn't able to link correctly.
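To make the size problem concrete, here is a rough sketch of the arithmetic for the array in the question. The helper names are made up for illustration, and comparing the total size against a signed 32-bit field is a simplification: the real constraint is the distance between each referencing instruction and its symbol.

```c
#include <stdint.h>

/* Hypothetical helper: total bytes of an x*y*z array of elem_size-byte elements. */
unsigned long long array_bytes(unsigned long long x, unsigned long long y,
                               unsigned long long z, unsigned long long elem_size)
{
    return x * y * z * elem_size;
}

/* Simplified check: can an offset of this many bytes be stored in a
   signed 32-bit field? */
int fits_in_pc32(unsigned long long bytes)
{
    return bytes <= (unsigned long long)INT32_MAX;
}
```

With the question's dimensions and an 8-byte double (an assumption that holds on x86_64), array_bytes(1000, 20000, 6000, 8) is 960,000,000,000 bytes, far beyond the roughly 2 GB that a signed 32-bit offset can span.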
Is there a compiler flag I can use to get around this?
Maybe. The basic problem is that you've defined an object that is larger than the authors of your compiler imagined (at the time) that anyone would be able to accommodate. You're going beyond its design limits.
There may be options that would mitigate this effect. In particular, you could try twiddling the -fPIC and / or -fPIE options, and there may be others. For x86-64, GCC documents -mcmodel=medium and -mcmodel=large for programs whose data does not fit within 2 GB, so those are worth trying first.
Is there a better way to code this, in C, while still maintaining quickness of writing the code and simplicity for human readability?
It's not a C problem per se, it's an implementation problem. Yet GCC is not wrong or faulty here as far as the language standard is concerned -- the standard does not obligate implementations to accept every possible program that is technically valid.
With that said, you could also try moving the declarations of some_variable_this and some_variable_that after the declaration of data. Conceivably, it might also help to declare those variables static, or to move them (or data) to a separate shared library.
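Another way to sidestep the limit, not mentioned above and offered only as a sketch, is to allocate the array on the heap so that no huge object ends up in the program's static data. The dimensions below are deliberately tiny stand-ins and the function names are hypothetical. Whether the machine can actually supply the memory the real dimensions need is a separate question.

```c
#include <stdlib.h>

/* Small illustrative dimensions (assumption); not the question's sizes. */
#define XX 4
#define YY 5
#define ZZ 6

/* One YY-by-ZZ plane; a pointer to it still allows data[x][y][z] indexing. */
typedef double plane_t[YY][ZZ];

/* Allocate the whole XX*YY*ZZ block on the heap, zero-filled like a global. */
plane_t *alloc_data(void)
{
    return calloc(XX, sizeof(plane_t));
}

/* Fill and sum the block, using the same indexing syntax as a global array. */
double fill_and_sum(plane_t *data)
{
    double sum = 0.0;
    for (int x = 0; x < XX; x++)
        for (int y = 0; y < YY; y++)
            for (int z = 0; z < ZZ; z++) {
                data[x][y][z] = 1.0;
                sum += data[x][y][z];
            }
    return sum;
}
```

The caller checks the result of alloc_data() for NULL, uses it like the global array, and passes it to free() when done.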

Related

Why doesn't linking a function of the wrong type return garbage?

I'm going through the K&R C book and chapter 4.2 says the following:
If atof itself and the call to it in main have inconsistent types in the same source file, the error will be detected by the compiler. But if (as is more likely) atof were compiled separately, the mismatch would not be detected, atof would return a double that main would treat as an int, and meaningless answers would result.
I wanted to see for myself what would happen if I created and inconsistently used a function as they describe:
main.c:
#include <stdio.h>
int floatFunction();
int main() {
int result = floatFunction();
printf("result: %d\n", result);
}
floatFunction.c:
float floatFunction() {
return 10;
}
bash:
$ gcc main.c floatFunction.c -o main
$ ./main
result: 0
$
Regardless of what value I had floatFunction return, I always got 0 as a result. I also tried compiling with -O3, same result. I was expecting to see the bit pattern of 10.0 interpreted as an int as a result. Why is this not happening?
Also, why doesn't the compilation process notify me that the types don't match and how can I protect myself from making a mistake like this in real code?
There are different registers used for passing (and returning) floating-point vs. integral values (pointers and integers) on x86 at least.
Why you consistently had integral zero in the return-register for integral values, while the return-register for floating-point values was used for the true return-value, I don't know. Just luck probably.
See for example What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64 for more about Linux calling conventions.
Anyway, remember that whatever you observe, at least on the language level it is simply undefined behavior, anything goes.
I was expecting to see the bit pattern of 10.0 interpreted as an int as a result. Why is this not happening?
The most likely reason is that on your platform (which you didn't specify) floating point values are returned in a different register from the one in which integer values are returned.
For example, on x86_64 an integer result is returned in the rax register, while double and float results are returned in the xmm0 register.
Here is an article on various x86 calling conventions.
If you are not on x86_64, you'll need to look up the calling conventions for your platform.
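For comparison, this is a small sketch of how to see the bit pattern the question was expecting. The function name is made up, and the code assumes float is a 32-bit IEEE 754 value, which is the case on x86_64 but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a 32-bit integer.
   Assumes float is 32 bits wide (IEEE 754 binary32). */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under that assumption float_bits(10.0f) is 0x41200000. That is the value the mismatched call would have produced if the float had come back in the integer return register.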
Also, why doesn't the compilation process notify me that the types don't match
The compilation process only sees one source file (compilation unit) at a time, so it can't warn you.
The linker could, but only the AIX linker actually does (of the ones I know).
how can I protect myself from making a mistake like this in real code?
By declaring your functions in a header file and including that header in both compilation units.
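A minimal sketch of that layout follows, with the three files shown in one listing so that it compiles as a single unit. The header name and the call_float_function wrapper (standing in for main) are illustrative choices, not something from the question.

```c
/* ---- floatFunction.h: the single source of truth for the prototype ---- */
#ifndef FLOATFUNCTION_H
#define FLOATFUNCTION_H
float floatFunction(void);
#endif

/* ---- floatFunction.c: would start with #include "floatFunction.h" ---- */
float floatFunction(void)
{
    return 10;
}

/* ---- main.c: would also start with #include "floatFunction.h" ---- */
int call_float_function(void)
{
    /* The compiler now knows the result is a float and converts it to int. */
    int result = floatFunction();
    return result;
}
```

Because floatFunction.c includes the same header, a definition whose type disagrees with the prototype becomes a compile-time error. GCC's -Wmissing-prototypes warning can additionally flag a global function that is defined without any prior prototype.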

Making a function that defaults to aliasing an externally defined symbol in gcc/ld

I have a header-only library that's currently calling malloc and free
This header is included in a lot of different static libraries, which are used to build differently configured programs.
I would like to be able to replace those calls with calls into another allocator, at link time -- based on whether that allocator library is included in the link step, without affecting other calls to malloc and free.
My idea is to have the library call customizable_malloc and customizable_free and have those symbols resolve to malloc and free "by default" -- then the allocator library can provide alternate definitions for customizable_malloc and customizable_free
However, I messed around with weak/alias/weakref attributes and I can't seem to get anything to work. Is there a way to do this?
Note: I know I can create an extra layer of indirection: customizable_malloc could be a weak alias to a function that calls malloc. But that adds a level of indirection that seems unnecessary.
Ideally, here are the steps I want the linker to take when it comes across a call to customizable_malloc:
1. Check whether a definition for customizable_malloc exists.
2. If it does, call it.
3. If it does not, behave as if the call was to regular malloc.
Clarifying note: In a single-target scenario, this could be done with #define. The library could create macros customizable_malloc and customizable_free that default to malloc and free. However, this doesn't work in this case since things are being built into static libraries without knowledge of whether there's an override.
The extra level of indirection is the only way to do it. ELF (and other real-world binary formats') symbol definition syntax (including for weak symbols) does not provide any way to provide a definition in terms of a reference to an external definition from somewhere else.
Just do the wrapper approach you're considering. It's simple, clean, and relative to the cost of malloc/free it's not going to make any big difference in performance.
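A sketch of that wrapper approach, using the GCC/Clang weak attribute and the function names from the question:

```c
#include <stdlib.h>

/* Default definitions. They are weak, so a strong definition of the same
   symbol in another linked object file replaces them at link time. */
__attribute__((weak)) void *customizable_malloc(size_t size)
{
    return malloc(size);
}

__attribute__((weak)) void customizable_free(void *ptr)
{
    free(ptr);
}
```

One caveat to check in your own build: a strong definition only wins if the object file containing it is actually linked in. As far as I know, linkers do not pull a member out of a static archive just to override a weak definition that is already present.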
You can achieve the desired outcome using the GNU ld --defsym option.
Example:
#include <stdlib.h>
#include <stdio.h>
void *custom_malloc(size_t sz);
int main()
{
void *p = custom_malloc(1);
void *q = malloc(42); // important: malloc needs to be referenced somewhere
printf("p = %p, q = %p\n", p, q);
return 0;
}
Compiling this with gcc -c t.c works, but linking the resulting object will (naturally) fail with an unresolved reference to custom_malloc (if the library providing custom_malloc is not used):
$ gcc t.o
/usr/bin/ld: t.o: in function `main':
t.c:(.text+0xe): undefined reference to `custom_malloc'
collect2: error: ld returned 1 exit status
Adding --defsym=custom_malloc=malloc solves this:
$ gcc t.o -Wl,--defsym=custom_malloc=malloc && ./a.out
p = 0x558ca4dc22a0, q = 0x558ca4dc22c0
P.S. If malloc is not linked into the program (i.e. if I comment out the // important line), then --defsym fails:
$ gcc t.c -Wl,--defsym=custom_malloc=malloc && ./a.ou
/usr/bin/ld:--defsym:1: unresolvable symbol `malloc' referenced in expression
...
But that is (I believe) not very relevant to your scenario.
P.P.S. As R correctly stated, the "extra level of indirection" could be a single unconditional JMP malloc instruction, and the overhead of such indirection is unlikely to be measurable.

Getting symbol names into C

I have successfully produced an assembler macro which I use to instantiate 32 independent routines in an assembly source file. The routines follow the target system ABI. Their exported symbol names are all practically identical, except for a trailing number suffix. Here is a symbol extract from the assembled object file (ellipsis indicating continuing pattern).
$ nm default_handler.o | sort
...
00000058 T exception_default_handler_5
0000005f T exception_default_handler_6
00000066 T exception_default_handler_7
0000006d T exception_default_handler_8
00000072 T exception_default_handler_9
00000079 T exception_default_handler_10
0000007e T exception_default_handler_11
00000083 T exception_default_handler_12
00000088 T exception_default_handler_13
...
I also have a C program in which I need to reference each of these individual routines. In some parts of the C program, I need to reference all of the assembly routines at once, to store a pointer to each in an array. Here is the code needed to understand my problem (with ellipsis to indicate a continuing pattern). This code performs the task stated above.
{
...
extern void exception_default_handler_5(void);
extern void exception_default_handler_6(void);
extern void exception_default_handler_7(void);
...
...
array[5] = exception_default_handler_5;
array[6] = exception_default_handler_6;
array[7] = exception_default_handler_7;
...
}
With 64 lines of this approach, the coding golden rule, to always write readable and maintainable code, has obviously been broken. What I would like is a way to automate this process of making an extern forward declaration and putting an instance of it in the array, to minimize the errors that are bound to happen when code is duplicated.
I am thinking that perhaps it's a job for the C-macros, but I cannot figure out a way to do it with them.
Any thoughts?
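One possible direction, offered only as a sketch, is the "X-macro" pattern: list the suffixes once and let the preprocessor generate both the declarations and the array assignments. The macro names, the install_default_handlers function and the handler_t typedef are hypothetical, and the stub definitions exist only so the sketch links without the assembly object file.

```c
/* One entry per assembly routine; extend the list to cover all 32. */
#define HANDLER_LIST(X) X(5) X(6) X(7) X(8)

/* Generate the extern forward declarations. */
#define DECLARE_HANDLER(n) extern void exception_default_handler_##n(void);
HANDLER_LIST(DECLARE_HANDLER)

/* Stand-in definitions (assumption): in the real project these symbols
   come from default_handler.o and this part is dropped. */
#define DEFINE_STUB(n) void exception_default_handler_##n(void) {}
HANDLER_LIST(DEFINE_STUB)

typedef void (*handler_t)(void);
handler_t array[32];

/* Store a pointer to each listed routine at its own index. */
void install_default_handlers(void)
{
#define STORE_HANDLER(n) array[n] = exception_default_handler_##n;
    HANDLER_LIST(STORE_HANDLER)
#undef STORE_HANDLER
}
```

Adding a routine then means adding one X(n) entry to HANDLER_LIST, so the declaration and the array slot cannot drift apart.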

Is there a safe way to specify the value of an object may be uninitialized because it is never used?

Disclaimer: The following is a purely academic question; I keep this code at least 100 m away from any production system. The problem posed here is something that cannot be measured in any “real life” case.
Consider the following code (godbolt link):
#include <stdlib.h>
typedef int (*func_t)(int *ptr); // functions must conform to this interface
extern int uses_the_ptr(int *ptr);
extern int doesnt_use_the_ptr(int *ptr);
int foo() {
// actual selection is complex, there are multiple functions,
// but I know `func` will point to a function that doesn't use the argument
func_t func = doesnt_use_the_ptr;
int *unused_ptr_arg = NULL; // I pay a zeroing (e.g. `xor reg reg`) in every compiler
int *unused_ptr_arg; // UB, gcc zeroes (thanks for saving me from myself, gcc), clang doesn't
int *unused_ptr_arg __attribute__((__unused__)); // Neither zeroing, nor UB, this is what I want
return (*func)(unused_ptr_arg);
}
The compiler has no reasonable way to know that unused_ptr_arg is unneeded (and so the zeroing is wasted time), but I do, so I want to inform the compiler that unused_ptr_arg may have any value, such as whatever happens to be in the register that would be used for passing it to func.
Is there a way to do this? I know I’m way outside the standard, so I’ll be fine with compiler-specific extensions (especially for gcc & clang).
Using GCC/Clang `asm` Construct
In GCC and Clang, and other compilers that support GCC’s extended assembly syntax, you can do this:
int *unused_ptr_arg;
__asm__("" : "=x" (unused_ptr_arg));
return (*func)(unused_ptr_arg);
That __asm__ construct says “Here is some assembly code to insert into the program at this point. It writes a result to unused_ptr_arg in whatever location you choose for it.” (The x constraint means the compiler may choose memory, a processor register, or anything else the machine supports.) But the actual assembly code is empty (""). So no assembly code is generated, but the compiler believes that unused_ptr_arg has been initialized. In Clang 6.0.0 and GCC 7.3 (latest versions currently at Compiler Explorer) for x86-64, this generates a jmp with no xor.
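For reference, here is a self-contained variant of the same idea that can be compiled with GCC or Clang. Two things in it are my own substitutions rather than part of the answer: the body of doesnt_use_the_ptr is a made-up stand-in, and the constraint is "=r" (a general-purpose register), because GCC's machine constraint documentation lists lowercase x on x86 as an SSE register.

```c
typedef int (*func_t)(int *ptr);

/* Hypothetical callee that ignores its argument. */
int doesnt_use_the_ptr(int *ptr)
{
    (void)ptr;
    return 42;
}

int foo(void)
{
    func_t func = doesnt_use_the_ptr;
    int *unused_ptr_arg;
    /* The empty asm claims to write unused_ptr_arg, so the compiler
       treats it as initialized without emitting any instruction. */
    __asm__("" : "=r"(unused_ptr_arg));
    return func(unused_ptr_arg);
}
```

Whether this actually saves the zeroing instruction depends on the compiler version and flags, so check the generated assembly.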
Using Standard C
Consider this:
int *unused_ptr_arg;
(void) &unused_ptr_arg;
return (*func)(unused_ptr_arg);
The purpose of (void) &unused_ptr_arg; is to take the address of unused_ptr_arg, even though the address is not used. This disables the rule in C 2011 [N1570] 6.3.2.1 2 that says behavior is undefined if a program uses the value of an uninitialized object of automatic storage duration that could have been declared with register. Because its address is taken, it could not have been declared with register, and therefore using the value is no longer undefined behavior according to this rule.
In consequence, the object has an indeterminate value. Then there is an issue of whether pointers may have a trap representation. If pointers do not have trap representations in the C implementation being used, then no trap will occur due to merely referring to the value, as when passing it as an argument.
The result with Clang 6.0.0 at Compiler Explorer is a jmp instruction with no setting of the parameter register, even if -Wall -Werror is added to the compiler options. In contrast, if the (void) line is removed, a compiler error results.
int *unused_ptr_arg = NULL;
This is what you should be doing. You don't pay for anything. Zeroing a register is a no-op. OK, technically it's not, but practically it is. You will never see the time of this operation in your program. And I don't mean that it's so small that you won't notice it. I mean that it's so small that many other factors and operations that are orders of magnitude longer will "swallow" it.
This is not actually possible across all architectures for a very good reason.
A call to a function may need to spill its arguments to the stack, and in IA64, spilling uninitialized registers to the stack can crash because the previous contents of the register was a speculative load that loaded an address that wasn't mapped.
To prevent the possibility of zero-ing with each run of int foo(), simply make unused_ptr_arg static.
int foo() {
func_t func = doesnt_use_the_ptr;
static int *unused_ptr_arg;
return (*func)(unused_ptr_arg);
}

C program to get variable name as input and print value

I have a program test.c
#include <stdio.h>
int global_var=10;
int main(void) { printf("Done"); return 0; }
I compiled it with
gcc -g test.c -o test
My question is:
Is there a way I can take a variable name as an argument (say "global_var") and print its value?
Thanks
No, C doesn't have introspection. Once the compiler has generated code, the program can not look up variable names.
The way these things are usually solved is by having a collection of all special variables that need to be looked up by name, containing both the actual name as a string and the variable itself.
Usually it's an array of structures, something like
struct
{
const char *name;
int value;
} variables[] = {
{ "global_var", 10 }
};
The program can then look through the array variables to search for "global_var" and use (or change) the value in the structure.
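A small sketch of such a lookup follows. It differs from the structure above in one respect, which is my own choice: each entry stores the address of the variable rather than a copy of its value, so changes made through the table affect the real global. The names named_int and find_int are hypothetical.

```c
#include <stddef.h>
#include <string.h>

int global_var = 10;

/* Each entry pairs a name with the address of the variable it refers to. */
struct named_int {
    const char *name;
    int *address;
};

struct named_int variables[] = {
    { "global_var", &global_var },
};

/* Return the address of the int registered under name, or NULL if absent. */
int *find_int(const char *name)
{
    for (size_t i = 0; i < sizeof variables / sizeof variables[0]; i++) {
        if (strcmp(variables[i].name, name) == 0)
            return variables[i].address;
    }
    return NULL;
}
```

A caller would take the name from argv[1], pass it to find_int, and print the pointed-to value if the result is not NULL.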
General answer: No. There is no connection between a variable name and its string representation (you can get the string representation of a variable name at compile time with the preprocessor, though).
For identifiers with external linkage, there are (platform-dependent) ways: See e.g. dlsym for POSIX systems.
You can compile with debugging information and access (most) variables by names from input. Unless you really write something like a debugger, this would be a horrible design, however (and even then, you don’t access the variables used in the debugger itself but of the programme being debugged).
Finally, you could implement your own lookup table mapping from string representations to values.
No.
We only have variable names so humans don't get confused.
After your program gets turned into assembly and eventually machine code, the computer doesn't care what you name your variables.
Alternatively you could use a structure in which you would store the value and the name as a string:
struct tag_name {
char *member1;
int member2;
};
In general, it is not possible to access at runtime global variables by name. Sometimes, it might depend upon the operating system, and how the compiler is invoked. I still assume you want to dereference a global variable, and you know its type.
Then on Linux and some other systems, you could use dlopen(3) with a NULL path (to get a handle for the executable), then use dlsym on the global variable name to get its address; you can then cast that void* pointer to a pointer of the appropriate type and dereference it. Notice that you need to know the type (or at least have a convention to encode the type of the variable in its name; C++ is doing that with name mangling). If you compiled and linked with debug information (i.e. with gcc -g) the type information is in its DWARF sections of your ELF executable, so there is some way to get it.
This works if you link your executable using -rdynamic and with -ldl
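A minimal sketch of that dlopen/dlsym lookup is below. The function name global_int_by_name is made up. The lookup only succeeds when the symbol is exported in the dynamic symbol table, which is what linking with -rdynamic arranges; otherwise dlsym returns NULL.

```c
#include <dlfcn.h>
#include <stddef.h>

int global_var = 10;

/* Look up a global int by name in the running executable.
   Returns NULL when the symbol is not in the dynamic symbol table,
   e.g. when the program was linked without -rdynamic. */
int *global_int_by_name(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW);   /* handle for the main program */
    if (self == NULL)
        return NULL;
    /* The caller must already know that the variable really is an int. */
    void *address = dlsym(self, name);
    dlclose(self);
    return (int *)address;
}
```

Built with something like gcc -g -rdynamic test.c -o test -ldl, a call such as global_int_by_name("global_var") should then give the address of the variable.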
Another possibility might be to customize your recent GCC with your own MELT extension which would remember and later re-use some of the compiler internal representations (i.e. the GCC Tree-s related to global variables). Use MELT register_finish_decl_first function to register a handler on declarations. But this will require some work (in coding your MELT extension).
Using preprocessor tricks
You could use (portable) preprocessor tricks to achieve your goals (accessing variable by name at runtime).
The simplest way might be to define and follow your own conventions. For example you could have your own globvar.def header file containing just lines like
/* file globvar.def */
MY_GLOBAL_VARIABLE(globalint,int)
MY_GLOBAL_VARIABLE(globalint2,int)
MY_GLOBAL_VARIABLE(globalstr,char*)
#undef MY_GLOBAL_VARIABLE
And you adopt the convention that all global variables are in the above globvar.def file. Then you would #include "globvar.def" several times. For instance, in your global header, expand MY_GLOBAL_VARIABLE to some extern declaration:
/* in yourheader.h */
#define MY_GLOBAL_VARIABLE(Nam,Typ) extern Typ Nam;
#include "globvar.def"
In your main.c you'll need a similar trick to define your globals.
Elsewhere you might define a function to get integer variables by name:
/* return the address of global int variable or else NULL */
#include <string.h> /* strcmp */
int* global_int_var_by_name (const char*name) {
    /* each line of globvar.def expands to one name/type test */
#define MY_GLOBAL_VARIABLE(Nam,Typ) \
    if (!strcmp(#Typ,"int") && !strcmp(name,#Nam)) return (int*)&Nam;
#include "globvar.def"
    return NULL;
}
etc etc... I'm using stringification of macro arguments.
Such preprocessor tricks are purely standard C and would work with any C99 compliant compiler.
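To try the idea without creating several files, the same trick can be sketched in a single file by replacing globvar.def with a list macro. MY_GLOBALS, DEFINE_GLOBAL and MATCH_INT are names invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Single-file stand-in for globvar.def: one X(name, type) entry per global. */
#define MY_GLOBALS(X) \
    X(globalint, int) \
    X(globalint2, int) \
    X(globalstr, char *)

/* Define the variables themselves. */
#define DEFINE_GLOBAL(Nam, Typ) Typ Nam;
MY_GLOBALS(DEFINE_GLOBAL)

/* Return the address of the global int called name, or NULL. */
int *global_int_var_by_name(const char *name)
{
#define MATCH_INT(Nam, Typ) \
    if (!strcmp(#Typ, "int") && !strcmp(name, #Nam)) return (int *)&Nam;
    MY_GLOBALS(MATCH_INT)
#undef MATCH_INT
    return NULL;
}
```

The type test compares the stringified type text, so an entry written as "signed int" or via a typedef would not match "int". The convention for spelling types in the list has to be kept consistent.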
