I have an ARM project, where I would like to keep certain unused variables and their data, until the time they are used.
I have seen prevent gcc from removing an unused variable :
__attribute__((used)) did not work for me with a global variable (the documentation does imply it only works on functions) (arm-none-eabi gcc 7), but putting the symbol in a different section via __attribute__((section(".data"))) did work. This is presumably because the linker is only able to strip symbols when they are given their own section via -fdata-sections. I do not like it, but it worked.
So, I tried this approach, but the variables were not kept - and I think this is because something in that project enables -Wl,--gc-sections during linking. Here is a minimal example showing what I've tried to do (basically the main file only refers to the header where the variables to be "kept" are declared as extern - and other than that, the main program does not use these variables; and then those same variables are defined in a separate .c file):
test_opt.c
#include <stdio.h>
#include "test_opt.h"
const char greeting[] = "Hello World - am used";
int main(void) {
printf("%s!\n", greeting);
return 0;
}
test_opt.h
#include <stdint.h>
extern const char mystring[];
struct MyStruct {
uint16_t param_one;
uint8_t param_two;
unsigned char param_three[32];
};
typedef struct MyStruct MyStruct_t;
extern const MyStruct_t mystruct;
mystruct.c
#include "test_opt.h"
const char __attribute__((section(".MYSTRING"))) mystring[] = "Me, mystring, I am not being used";
const MyStruct_t __attribute__((section(".MYSTRUCT"))) mystruct = {
.param_one = 65535,
.param_two = 42,
.param_three = "myStructer here",
};
Test with usual MINGW64 gcc
Let's first try without -Wl,--gc-sections:
$ gcc -Wall -g mystruct.c test_opt.c -o test_opt.exe
$ strings ./test_opt.exe | grep -i 'mystring\|mystruct'
Me, mystring, I am not being used
*myStructer here
mystring
MyStruct
MyStruct_t
mystruct
mystruct.c
mystruct.c
mystruct.c
mystruct.c
mystring
mystruct
.MYSTRING
.MYSTRUCT
.MYSTRING
.MYSTRUCT
Clearly, variables and content are visible here.
Now let's try -Wl,--gc-sections:
$ gcc -Wall -g -Wl,--gc-sections mystruct.c test_opt.c -o test_opt.exe
$ strings ./test_opt.exe | grep -i 'mystring\|mystruct'
mystring
MyStruct
MyStruct_t
mystruct
mystruct.c
mystruct.c
mystruct.c
mystruct.c
mystring
mystruct
Apparently, here we still have some symbol debugging info left - but there are no sections, nor data being reported.
Test with ARM gcc
Let's re-do same experiment with ARM gcc - first without -Wl,--gc-sections:
$ arm-none-eabi-gcc -Wall -g test_opt.c mystruct.c -o test_opt.elf -lc -lnosys
$ arm-none-eabi-strings ./test_opt.elf | grep -i 'mystring\|mystruct'
Me, mystring, I am not being used
*myStructer here
mystruct.c
MyStruct_t
MyStruct
mystruct
mystruct.c
mystring
mystruct.c
mystring
mystruct
.MYSTRING
.MYSTRUCT
Same as before, variables, content and section names are visible.
Now let's try with -Wl,--gc-sections:
$ arm-none-eabi-gcc -Wall -g -Wl,--gc-sections test_opt.c mystruct.c -o test_opt.elf -lc -lnosys
$ arm-none-eabi-strings ./test_opt.elf | grep -i 'mystring\|mystruct'
Note that, unlike the previous case, here there is neither any data content left, nor any debugging info/symbol names!
So, my question is: assuming that -Wl,--gc-sections is enabled in the project, and I do not want to remove it (because I like the functionality otherwise), can I somehow specify in code for some special variables, "keep these variables even if they are unused/unreferenced", in such a way that they are kept even with -Wl,--gc-sections enabled?
Note that adding keep to attributes, say:
const char __attribute__((keep,section(".MYSTRING"))) mystring[] = "Me, mystring, I am not being used";
... and compiling with (or without) -Wl,--gc-sections results in a compiler warning:
mystruct.c:3:1: warning: 'keep' attribute directive ignored [-Wattributes]
3 | const char __attribute__((keep,section(".MYSTRING"))) mystring[] = "Me, mystring, I am not being used";
| ^~~~~
... I guess, because the variables are already declared const if I read that arrow correctly (or maybe because a section is already assumed to be "kept")? So attribute keep is definitely not the answer here ...
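To tell the linker that a symbol must be preserved, pass it with the -Wl,--undefined=XXX option. The linker then treats the symbol as referenced, so the section that defines it survives --gc-sections:
gcc ... -Wl,--undefined=mystring -Wl,--undefined=mystruct
Note that __attribute__((used)) only affects the compiler: it makes the compiler emit the definition into the object file even when nothing appears to reference it, but it does not stop the linker from discarding the section.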
OK - I found something; not ideal, but at least it's just a "syntax hack", and I don't have to come up with stupid stuff to do with the structs just so they show up in the executable (and usually even the code I come up with in that case gets optimized away :)).
I first tried the (void) varname; hack used for How can I suppress "unused parameter" warnings in C? - I left it below just to show it doesn't work.
What ended up working is: basically, just have a static const void* where the main() is, and assign a pointer to the struct to it (EDIT: in the main()!); I guess because of "static const", the compiler will not remove the variable and its section, even with -Wl,--gc-sections. So test_opt.c now becomes:
#include <stdio.h>
#include "test_opt.h"
const char greeting[] = "Hello World - am used";
static const void *fake; //, *fakeB;
int main(void) {
fake = &mystruct;
(void) &mystring; //fakeB = &mystring;
printf("%s!\n", greeting);
return 0;
}
... and we can test with:
$ arm-none-eabi-gcc -Wall -g -Wl,--gc-sections test_opt.c mystruct.c -o test_opt.elf -lc -lnosys
$ arm-none-eabi-readelf -a ./test_opt.elf | grep -i 'mystring\|mystruct'
[ 5] .MYSTRUCT PROGBITS 00013780 013780 000024 00 A 0 0 4
01 .init .text .fini .rodata .MYSTRUCT .ARM.exidx .eh_frame
5: 00013780 0 SECTION LOCAL DEFAULT 5 .MYSTRUCT
379: 00000000 0 FILE LOCAL DEFAULT ABS mystruct.c
535: 00013780 36 OBJECT GLOBAL DEFAULT 5 mystruct
$ arm-none-eabi-strings ./test_opt.elf | grep -i 'mystring\|mystruct'
*myStructer here
mystruct.c
MyStruct_t
mystruct
MyStruct
mystruct.c
mystring
mystruct.c
mystruct
.MYSTRUCT
Note that only mystruct in above example ended up being preserved - mystring still got optimized away.
EDIT: note that if you try to cheat and move the assignment outside of main:
static const void *fake = &mystruct, *fakeB = &mystring;
int main(void) {
...
... then the compiler will see through your shenanigans, and greet you with:
test_opt.c:6:39: warning: 'fakeB' defined but not used [-Wunused-variable]
6 | static const void *fake = &mystruct, *fakeB = &mystring;
| ^~~~~
test_opt.c:6:20: warning: 'fake' defined but not used [-Wunused-variable]
6 | static const void *fake = &mystruct, *fakeB = &mystring;
| ^~~~
... and you're none the better off still.
I have a binary file (ELF) that I did not write, but I want to use one function from it. I know the address/offset of the function, but it is not exported from the binary.
My goal is to call this function from my C code that I write and compile this function statically in my binary (I compile with gcc).
How can I do that please?
I am going to answer the
call to this function from my c code that I write
part.
The below works under certain assumptions, like dynamic linking and position independent code. I haven't thought for too long about what happens if they are broken (let's experiment/discuss, if there's interest).
$ cat lib.c
int data = 42;
static int foo () { return data; }
$ gcc -fpic -shared lib.c -o lib.so
$ nm lib.so | grep foo
00000000000010e9 t foo
The above reproduces having the address that you know. The address we know now is 0x10e9. It is the virtual address of foo before relocation. We'll model the relocation the dynamic loader does by hand by simply adding the base address at which lib.so gets loaded.
$ cat 1.c
#define _GNU_SOURCE
#include <stdio.h>
#include <link.h>
#include <string.h>
#include <elf.h>
#include <dlfcn.h> /* dlopen, dlclose; glibc's link.h pulls this in, other libcs may not */
#define FOO_VADDR 0x10e9
typedef int(*func_t)();
int callback(struct dl_phdr_info *info, size_t size, void *data)
{
if (!(strstr(info->dlpi_name, "lib.so")))
return 0;
Elf64_Addr addr = info->dlpi_addr + FOO_VADDR;
func_t f = (func_t)addr;
int res = f();
printf("res = %d\n", res);
return 0;
}
int main()
{
void *handle = dlopen("./lib.so", RTLD_LAZY);
if (!handle) {
puts("failed to load");
return 1;
}
dl_iterate_phdr(&callback, NULL);
dlclose(handle);
return 0;
}
And now...
$ gcc 1.c -ldl && ./a.out
res = 42
Voila -- it worked! That was fun.
Credit: this was helpful.
If you have questions, feel free to read the man and ask in the comments.
As for
compile this function statically in my binary
I don't know off the bat. This would be trickier. Why do you want that? Also, do you know whether the function depends on some data (or maybe it calls other functions) in the original ELF file, like in the example above?
I'm having trouble while writing my garbage collector in C. I give you a minimal and verifiable example for it.
The first file is in charge of dealing with the virtual machine
#include <stdlib.h>
#include <stdint.h>
typedef int32_t value_t;
typedef enum {
Lb, Lb1, Lb2, Lb3, Lb4, Lb5,
Ib, Ob
} reg_bank_t;
static value_t* memory_start;
static value_t* R[8];
value_t* engine_get_Lb(void) { return R[Lb]; }
value_t engine_run() {
memory_start = memory_get_start();
for (reg_bank_t pseudo_bank = Lb; pseudo_bank <= Lb5; ++pseudo_bank)
R[pseudo_bank] = memory_start + (pseudo_bank - Lb) * 32;
value_t* block = memory_allocate();
}
Then I have the actual garbage collector, the minimized code is:
#include <stdlib.h>
#include <stdint.h>
typedef int32_t value_t;
static value_t* memory_start = NULL;
void memory_setup(size_t total_byte_size) {
memory_start = calloc(total_byte_size, 1);
}
void* memory_get_start() { return memory_start; }
void mark(value_t* base){
value_t vbase = 0;
}
value_t* memory_allocate() {
mark(engine_get_Lb());
return engine_get_Lb();
}
Finally, minimal main is:
int main(int argc, char* argv[]) {
memory_setup(1000000);
engine_run();
return 0;
}
The problem I'm getting with gdb is that if I print engine_get_Lb() I get the address (value_t *) 0x7ffff490a800 while when printing base inside of the function mark I get the address (value_t *) 0xfffffffff490a800.
Any idea why this is happening?
Complementary files that may help
The makefile
SHELL=/bin/bash
SRCS=src/engine.c \
src/main.c \
src/memory_mark_n_sweep.c
CFLAGS_COMMON=-std=c11 -fwrapv
CLANG_SAN_FLAGS=-fsanitize=address
# Clang warning flags
CLANG_WARNING_FLAGS=-Weverything \
-Wno-format-nonliteral \
-Wno-c++98-compat \
-Wno-gnu-label-as-value
# Flags for debugging:
CFLAGS_DEBUG=${CFLAGS_COMMON} -g ${CLANG_SAN_FLAGS} ${CLANG_WARNING_FLAGS}
# Flags for maximum performance:
CFLAGS_RELEASE=${CFLAGS_COMMON} -O3 -DNDEBUG
CFLAGS=${CFLAGS_DEBUG}
all: vm
vm: ${SRCS}
mkdir -p bin
clang ${CFLAGS} ${LDFLAGS} ${SRCS} -o bin/vm
File with instructions .asm
5c190000 RALO(Lb,25)
value_t* memory_allocate() {
mark(engine_get_Lb());
return engine_get_Lb();
}
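engine_get_Lb is not declared before use in this file, so the compiler assumes it returns int, per an antiquated and dangerous rule of the C language. Implicit function declarations were removed from the standard in C99, although many compilers still accept them with only a warning. On a 64-bit target the returned pointer is therefore truncated to a 32-bit int and then sign-extended when it is converted back to value_t*: the low 32 bits of 0x7ffff490a800 are 0xf490a800, which has its top bit set, so it widens to 0xfffffffff490a800.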
Create a header file with declarations of all your global functions, and #include it in all your source files.
Your compiler should have at least warned you about this error at its default settings. If it did, you should have read and completely understood the warnings before continuing. If it didn't, consider an upgrade. If you cannot upgrade, permanently add -Wall -Wextra -Werror to your compilation flags. Consider also -Wpedantic and -std=c11.
It looks like GCC with -O2 and __attribute__((weak)) produces different results depending on how you reference your weak symbols. Consider this:
$ cat weak.c
#include <stdio.h>
extern const int weaksym1;
const int weaksym1 __attribute__(( weak )) = 0;
extern const int weaksym2;
const int weaksym2 __attribute__(( weak )) = 0;
extern int weaksym3;
int weaksym3 __attribute__(( weak )) = 0;
void testweak(void)
{
if ( weaksym1 == 0 )
{
printf( "0\n" );
}
else
{
printf( "1\n" );
}
printf( "%d\n", weaksym2 );
if ( weaksym3 == 0 )
{
printf( "0\n" );
}
else
{
printf( "1\n" );
}
}
$ cat test.c
extern const int weaksym1;
const int weaksym1 = 1;
extern const int weaksym2;
const int weaksym2 = 1;
extern int weaksym3;
int weaksym3 = 1;
extern void testweak(void);
void main(void)
{
testweak();
}
$ make
gcc -c weak.c
gcc -c test.c
gcc -o test test.o weak.o
$ ./test
1
1
1
$ make ADD_FLAGS="-O2"
gcc -O2 -c weak.c
gcc -O2 -c test.c
gcc -O2 -o test test.o weak.o
$ ./test
0
1
1
The question is, why the last "./test" produces "0 1 1", not "1 1 1"?
gcc version 5.4.0 (GCC)
Looks like when doing optimizations, the compiler is having trouble with symbols declared const and having the weak definition within the same compilation unit.
You can create a separate C file and move the const weak definitions there; that works around the problem:
weak_def.c
const int weaksym1 __attribute__(( weak )) = 0;
const int weaksym2 __attribute__(( weak )) = 0;
Same issue described in this question: GCC weak attribute on constant variables
Summary:
Weak symbols only work correctly if you do not initialize them to a value. The linker takes care of the initialization (and it always initializes them to zero if no normal symbol of the same name exists).
If you try to initialize a weak symbol to any value, even to zero as OP did, the C compiler is free to make weird assumptions about its value. The compiler has no distinction between weak and normal symbols; it is all (dynamic) linker magic.
To fix, remove the initialization (= 0) from any symbol you declare weak:
extern const int weaksym1;
const int weaksym1 __attribute__((__weak__));
extern const int weaksym2;
const int weaksym2 __attribute__((__weak__));
extern int weaksym3;
int weaksym3 __attribute__((__weak__));
Detailed description:
The C language has no concept of a "weak symbol". It is a functionality provided by the ELF file format, and (dynamic) linkers that use the ELF file format.
As the man 1 nm man page describes at the "V" section,
When a weak defined symbol is linked with a normal defined symbol, the
normal defined symbol is used with no error. When a weak undefined symbol
is linked and the symbol is not defined, the value of the weak symbol
becomes zero with no error.
the weak symbol declaration should not be initialized to any value, because it will have value zero if the process is not linked with a normal symbol of the same name. ("defined" in the man 1 nm page refers to a symbol existing in the ELF symbol table.)
The "weak symbol" feature was designed to work with existing C compilers. Remember, the C compilers do not have any distinction between "weak" and "normal" symbols.
To ensure this would work without running afoul of the C compiler behaviour, the "weak" symbol must be uninitialized, so that the C compiler cannot make any assumptions as to its value. Instead, it will generate code that obtains the address of the symbol as usual -- and that's where the normal/weak symbol lookup magic happens.
This also means that weak symbols can only be "auto-initialized" to zero, and not any other value, unless "overridden" by a normal, initialized symbol of the same name.
I'm using GDB to return the address of a local static variable in my C code (pressureResult2 in this case). This works fine for the ARM output (the .elf file).
However, if I use a build configuration for windows, creating a .exe, the variable I'm asking for can't be found.
What I'm using for returning the address of the variable:
sensorOffset::pressureResult2
Code:
static void sensorOffset (uint8_t axle)
{
static int16_t pressureResult2 = 400;
int16_t pressureResult = 0;
if (axle == AXLE_FRONT)
{
/* Always copy the actual value first to the value with offset */
CtisMach.front.tPressureSensorOffset = CtisMach.front.tPressureAct;
.... and so on
Is anyone familiar with this issue? Is the command different for a Windows executable? Or am I just doing something wrong?
To get the most obvious ones out:
Can you read a global static?
Yes, no problem
Does GDB notice anything about debug symbol?
No, the usual "Reading symbol from file.exe .. done" appears.
Does it work with .elf?
Yes, it does.
To answer the comments:
The code is compiled with the following:
cflags := \
-O0 \
-g3 \
-Wall \
-c \
-MD \
-fmessage-length=0 \
-fpermissive \
-I/mingw/include \
-I/usr/include \
-I/local/include \
-D WINDOWS \
$(CONFIGFLAGS) \
$(INCLUDES)
lnkflags := \
-Wl,--enable-stdcall-fixup \
-static-libgcc \
-static-libstdc++ \
$(CONFIGFLAGS) \
$(EXT_LIBDIR)
od_flags := \
--dwarf
Since, as I mentioned, GDB doesn't complain about missing debug symbols and I can read the global statics as well, this doesn't appear to be the issue, or am I wrong? It should complain about missing debug symbols without -g, right? Edit: Andreas reproduced this situation, but I still can't seem to fix it.
To do anything useful with the variable:
static int16_t pressureResult2 = 0;
if (pressureResult2 < 100)
{
pressureResult2++;
}
else
{
pressureResult2 = 0;
}
NOTE: This is just an example; the same problem applies to all local statics in the code (which is too large to dump on SO).
In GDB's response to "info variables", my variable "pressureResult2" is listed under the category Non-debugging symbols. Might this be the issue?
To see if the -g flag is actually doing something, without -g:
p& randomvar
$1 = (<data variable, no debug info> *) 0x4eade2 <randomvar>
with -g
p& randomvar
$1 = (uint16_t *) 0x4eade2 <randomvar>
So it's active for sure, but it's still not possible to return local statics.
The only remarkable thing so far is that the variable I'm looking for is categorized under Non-debugging symbols.
Compiling Andreas's code snippet works, including returning the address of the variable; my own code, however, does not.
Most likely, you need to add the -g flag to the compiler invocation to add debugging information, and remove optimization flags like -O2. Given the following .c source file, using a cygwin environment on MS Windows:
#include <stdio.h>
#include <stdint.h> /* uint8_t, int16_t */
static int globalstatic = 512;
static void sensorOffset (uint8_t axle) {
static int16_t pressureResult2 = 400;
pressureResult2++;
printf("%d %d\n", globalstatic, pressureResult2);
}
int main() {
int i = 0;
for (i = 0; i < 10; i++) {
sensorOffset(42);
}
return 0;
}
When compiled to an .exe file with -O2, your observation is reproducible - gdb recognizes the global static variable, but not the local one (even though -g was specified):
C:> gcc -g -O2 -Wall -pedantic -o static static.c
C:> gdb static.exe
(gdb) break main
(gdb) run
Breakpoint 1, main () at static.c:14
14 int main() { Breakpoint 1, 0x0000000100401128 in main ()
(gdb) print globalstatic
$1 = 512
(gdb) print sensorOffset::pressureResult2
No symbol "sensorOffset" in current context.
When removing the -O2 flag, gdb does recognize the local static variable:
C:> gcc -g -Wall -pedantic -o static static.c
C:> gdb static.exe
(gdb) break main
(gdb) run
Breakpoint 1, main () at static.c:12
12 int i = 0;
(gdb) print sensorOffset::pressureResult2
$1 = 400
I have a Linux C program that handles requests sent to a TCP socket (bound to a particular port). I want to be able to query the internal state of the C program via a request to that port, but I don't want to hard code which global variables can be queried. Thus I want the query to contain the string name of a global and the C code to look that string up in the symbol table to find its address and then send its value back over the TCP socket. Of course the symbol table must not have been stripped. So can the C program even locate its own symbol table, and is there a library interface for looking up symbols given their name? This is an ELF executable C program built with gcc.
This is actually fairly easy. You use dlopen / dlsym to access symbols. In order for this to work, the symbols have to be present in the dynamic symbol table. There are multiple symbol tables!
#include <dlfcn.h>
#include <stdio.h>
__attribute__((visibility("default")))
const char A[] = "Value of A";
__attribute__((visibility("hidden")))
const char B[] = "Value of B";
const char C[] = "Value of C";
int main(int argc, char *argv[])
{
void *hdl;
const char *ptr;
int i;
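hdl = dlopen(NULL, RTLD_LAZY); /* the mode must include RTLD_LAZY or RTLD_NOW; 0 is not valid */
for (i = 1; i < argc; ++i) {
ptr = dlsym(hdl, argv[i]);
printf("%s = %s\n", argv[i], ptr ? ptr : "(null)"); /* passing NULL to %s is undefined */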
}
return 0;
}
In order to add all symbols to the dynamic symbol table, use -Wl,--export-dynamic. If you want to remove most symbols from the symbol table (recommended), set -fvisibility=hidden and then explicitly add the symbols you want with __attribute__((visibility("default"))) or one of the other methods.
~ $ gcc dlopentest.c -Wall -Wextra -ldl
~ $ ./a.out A B C
A = (null)
B = (null)
C = (null)
~ $ gcc dlopentest.c -Wall -Wextra -ldl -Wl,--export-dynamic
~ $ ./a.out A B C
A = Value of A
B = (null)
C = Value of C
~ $ gcc dlopentest.c -Wall -Wextra -ldl -Wl,--export-dynamic -fvisibility=hidden
~ $ ./a.out A B C
A = Value of A
B = (null)
C = (null)
Safety
Notice that there is a lot of room for bad behavior.
$ ./a.out printf
printf = ▯▯▯▯ (garbage)
If you want this to be safe, you should create a whitelist of permissible symbols.
file: reflect.c
#include <stdio.h>
#include <string.h> /* strcmp */
#include "reflect.h"
struct sym_table_t gbl_sym_table[1] __attribute__((weak)) = {{NULL, NULL}};
void * reflect_query_symbol(const char *name)
{
struct sym_table_t *p = &gbl_sym_table[0];
for(; p->name; p++) {
if(strcmp(p->name, name) == 0) {
return p->addr;
}
}
return NULL;
}
file: reflect.h
#include <stdio.h>
struct sym_table_t {
char *name;
void *addr;
};
void * reflect_query_symbol(const char *name);
file: main.c
just #include "reflect.h" and call reflect_query_symbol
example:
#include <stdio.h>
#include "reflect.h"
void foo(void)
{
printf("bar test\n");
}
int uninited_data;
int inited_data = 3;
int main(int argc, char *argv[])
{
int i;
void *addr;
for(i=1; i<argc; i++) {
addr = reflect_query_symbol(argv[i]);
if(addr) {
printf("%s lay at: %p\n", argv[i], addr);
} else {
printf("%s NOT found\n", argv[i]);
}
}
return 0;
}
file:Makefile
objs = main.o reflect.o
main: $(objs)
gcc -o $@ $^
nm $@ | awk 'BEGIN{ print "#include <stdio.h>"; print "#include \"reflect.h\""; print "struct sym_table_t gbl_sym_table[]={" } { if(NF==3){print "{\"" $$3 "\", (void*)0x" $$1 "},"}} END{print "{NULL,NULL} };"}' > .reflect.real.c
gcc -c .reflect.real.c -o .reflect.real.o
gcc -o $@ $^ .reflect.real.o
nm $@ | awk 'BEGIN{ print "#include <stdio.h>"; print "#include \"reflect.h\""; print "struct sym_table_t gbl_sym_table[]={" } { if(NF==3){print "{\"" $$3 "\", (void*)0x" $$1 "},"}} END{print "{NULL,NULL} };"}' > .reflect.real.c
gcc -c .reflect.real.c -o .reflect.real.o
gcc -o $@ $^ .reflect.real.o
The general term for this sort of feature is "reflection", and it is not part of C.
If this is for debugging purposes, and you want to be able to inspect the entire state of a C program remotely, examine any variable, start and stop its execution, and so on, you might consider GDB remote debugging:
GDB offers a 'remote' mode often used when debugging embedded systems.
Remote operation is when GDB runs on one machine and the program being
debugged runs on another. GDB can communicate to the remote 'stub'
which understands GDB protocol via Serial or TCP/IP. A stub program
can be created by linking to the appropriate stub files provided with
GDB, which implement the target side of the communication
protocol. Alternatively, gdbserver can be used to remotely debug
the program without needing to change it in any way.