I have a Linux C program that handles requests sent to a TCP socket (bound to a particular port). I want to be able to query the internal state of the C program via a request to that port, but I don't want to hard-code which global variables can be queried. Thus I want the query to contain the string name of a global, and the C code to look that string up in the symbol table to find its address and then send its value back over the TCP socket. Of course the symbol table must not have been stripped. So can the C program even locate its own symbol table, and is there a library interface for looking up symbols given their name? This is an ELF executable C program built with gcc.
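This is actually fairly easy. You use dlopen / dlsym to access symbols. In order for this to work, the symbols have to be present in the dynamic symbol table. There are multiple symbol tables: the regular one (.symtab), which strip removes, and the dynamic one (.dynsym), which the dynamic linker uses and which dlsym searches.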
#include <dlfcn.h>
#include <stdio.h>
__attribute__((visibility("default")))
const char A[] = "Value of A";
__attribute__((visibility("hidden")))
const char B[] = "Value of B";
const char C[] = "Value of C";
int main(int argc, char *argv[])
{
void *hdl;
const char *ptr;
int i;
hdl = dlopen(NULL, RTLD_LAZY); /* flags must include RTLD_LAZY or RTLD_NOW */
for (i = 1; i < argc; ++i) {
ptr = dlsym(hdl, argv[i]);
printf("%s = %s\n", argv[i], ptr);
}
return 0;
}
In order to add all symbols to the dynamic symbol table, use -Wl,--export-dynamic. If you want to remove most symbols from the symbol table (recommended), set -fvisibility=hidden and then explicitly add the symbols you want with __attribute__((visibility("default"))) or one of the other methods.
~ $ gcc dlopentest.c -Wall -Wextra -ldl
~ $ ./a.out A B C
A = (null)
B = (null)
C = (null)
~ $ gcc dlopentest.c -Wall -Wextra -ldl -Wl,--export-dynamic
~ $ ./a.out A B C
A = Value of A
B = (null)
C = Value of C
~ $ gcc dlopentest.c -Wall -Wextra -ldl -Wl,--export-dynamic -fvisibility=hidden
~ $ ./a.out A B C
A = Value of A
B = (null)
C = (null)
Safety
Notice that there is a lot of room for bad behavior.
$ ./a.out printf
printf = ▯▯▯▯ (garbage)
If you want this to be safe, you should create a whitelist of permissible symbols.
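A whitelist can be as simple as a static table that maps each permitted name to the variable's address and size. The sketch below assumes two hypothetical globals, request_count and server_name; neither name comes from the original program, and the lookup deliberately never calls dlsym.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical globals that the server agrees to expose. */
int request_count = 7;
char server_name[] = "demo";

struct exported_var {
    const char *name;   /* name a client may ask for */
    const void *addr;   /* where the variable lives */
    size_t size;        /* number of bytes that may be sent back */
};

/* Only names listed here can ever be queried. */
static const struct exported_var whitelist[] = {
    { "request_count", &request_count, sizeof request_count },
    { "server_name",   server_name,    sizeof server_name },
};

/* Return the whitelist entry for name, or NULL if it is not permitted. */
const struct exported_var *lookup_exported(const char *name)
{
    size_t n = sizeof whitelist / sizeof whitelist[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(whitelist[i].name, name) == 0)
            return &whitelist[i];
    }
    return NULL;
}
```

A request handler would call lookup_exported() with the name taken from the request and, on a non-NULL result, send size bytes starting at addr. A name such as printf simply yields NULL.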
file: reflect.c
#include <stdio.h>
#include <string.h>
#include "reflect.h"
struct sym_table_t gbl_sym_table[1] __attribute__((weak)) = {{NULL, NULL}};
void * reflect_query_symbol(const char *name)
{
struct sym_table_t *p = &gbl_sym_table[0];
for(; p->name; p++) {
if(strcmp(p->name, name) == 0) {
return p->addr;
}
}
return NULL;
}
file: reflect.h
#include <stdio.h>
struct sym_table_t {
char *name;
void *addr;
};
void * reflect_query_symbol(const char *name);
file: main.c
Just #include "reflect.h" and call reflect_query_symbol, for example:
#include <stdio.h>
#include "reflect.h"
void foo(void)
{
printf("bar test\n");
}
int uninited_data;
int inited_data = 3;
int main(int argc, char *argv[])
{
int i;
void *addr;
for(i=1; i<argc; i++) {
addr = reflect_query_symbol(argv[i]);
if(addr) {
printf("%s lay at: %p\n", argv[i], addr);
} else {
printf("%s NOT found\n", argv[i]);
}
}
return 0;
}
file: Makefile
objs = main.o reflect.o
main: $(objs)
gcc -o $@ $^
nm $@ | awk 'BEGIN{ print "#include <stdio.h>"; print "#include \"reflect.h\""; print "struct sym_table_t gbl_sym_table[]={" } { if(NF==3){print "{\"" $$3 "\", (void*)0x" $$1 "},"}} END{print "{NULL,NULL} };"}' > .reflect.real.c
gcc -c .reflect.real.c -o .reflect.real.o
gcc -o $@ $^ .reflect.real.o
nm $@ | awk 'BEGIN{ print "#include <stdio.h>"; print "#include \"reflect.h\""; print "struct sym_table_t gbl_sym_table[]={" } { if(NF==3){print "{\"" $$3 "\", (void*)0x" $$1 "},"}} END{print "{NULL,NULL} };"}' > .reflect.real.c
gcc -c .reflect.real.c -o .reflect.real.o
gcc -o $@ $^ .reflect.real.o
The general term for this sort of feature is "reflection", and it is not part of C.
If this is for debugging purposes, and you want to be able to inspect the entire state of a C program remotely, examine any variable, start and stop its execution, and so on, you might consider GDB remote debugging:
GDB offers a 'remote' mode often used when debugging embedded systems.
Remote operation is when GDB runs on one machine and the program being
debugged runs on another. GDB can communicate to the remote 'stub'
which understands GDB protocol via Serial or TCP/IP. A stub program
can be created by linking to the appropriate stub files provided with
GDB, which implement the target side of the communication
protocol. Alternatively, gdbserver can be used to remotely debug
the program without needing to change it in any way.
Related
For example, if I have the following function
void printText(char text [100]){
printf("%s", text);
}
could I then do this in the command line
printText(Hello World)
and then get my expected output as
Hello World
It depends on your shell. Some shells do support functions. In bash, the POSIX shell, and probably others, the following is the correct syntax:
printText() {
printf '%s\n' "$1"
}
printText 'Hello World'
If you meant your question literally, then no, it's not possible to call a function without even mentioning the file in which it's located. The language used to write the function is irrelevant.
But is it possible to compile a C function and have it called from the shell somehow? Yes. If you created a shared library (a shared object on Unix-like systems or a DLL on Windows) from the function, you could. It would require a tool to do so, but such a tool could exist. (Windows also supports COM objects and a number of derived technologies. Some of these might even make the task easier.)
(I can't tell if such tools actually do exist or what they are because software recommendations are off-topic on StackOverflow. I will say that such a tool could be built around a library such as libffi.)
One solution would be to rely on dlopen()/LoadLibrary() and
dlsym()/GetProcAddress() but you cannot ensure the function
prototype conforms to your expectation.
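A minimal sketch of the dlopen()/dlsym() route follows, assuming every callable function has the signature int f(const char *). The helper name call_by_name and the typedef are hypothetical. On older glibc versions the program has to be linked with -ldl, and functions defined in the executable itself are only found if it was linked with -Wl,--export-dynamic.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed signature; dlsym() cannot verify that the symbol really has it. */
typedef int (*int_from_text_fn)(const char *);

/* Look up name among the symbols visible to the running program and call it
   with arg. Returns 0 and stores the result on success, -1 if not found. */
int call_by_name(const char *name, const char *arg, int *result)
{
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the main program */
    if (self == NULL)
        return -1;

    int_from_text_fn fn;
    /* POSIX-style conversion from the object pointer dlsym() returns. */
    *(void **)(&fn) = dlsym(self, name);
    if (fn == NULL) {
        dlclose(self);
        return -1;
    }

    *result = fn(arg);
    dlclose(self);
    return 0;
}
```

With this helper, call_by_name("atoi", "42", &r) finds the C library's atoi and stores 42 in r. Nothing stops a caller from passing the name of a function with a different prototype, in which case the call has undefined behavior.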
A more robust solution consists in providing a lookup table filled
with functions that you know are compliant with the intended usage.
/**
gcc -std=c99 -o prog_c prog_c.c \
-pedantic -Wall -Wextra -Wconversion \
-Wc++-compat -Wwrite-strings -Wold-style-definition -Wvla \
-g -O0 -UNDEBUG -fsanitize=address,undefined
$ ./prog_c printText "Hello world"
printText --> <Hello world>
$ ./prog_c textLen "Hello world"
textLen --> 11
$ ./prog_c what "Hello world"
cannot find function 'what'
**/
#include <stdio.h>
#include <stdbool.h>
#include <string.h>
void
printText(const char *text)
{
printf("printText --> <%s>\n", text);
}
void
textLen(const char *text)
{
printf("textLen --> %d\n", (int)strlen(text));
}
typedef struct
{
const char *name;
void (*fnct)(const char *);
} TextFunction;
bool // success
call_text_function(const char *name,
const char *arg)
{
static TextFunction table[]={ {"printText", printText},
{"textLen", textLen},
{NULL, NULL} };
for(int i=0; table[i].name!=NULL; ++i)
{
if(strcmp(table[i].name, name)==0)
{
table[i].fnct(arg);
return true;
}
}
return false;
}
int
main(int argc,
char **argv)
{
if(argc!=3)
{
fprintf(stderr, "usage: %s function arg\n", argv[0]);
return 1;
}
if(!call_text_function(argv[1], argv[2]))
{
fprintf(stderr, "cannot find function '%s'\n", argv[1]);
return 1;
}
return 0;
}
Consider the following code:
#include <stdio.h>
#include <stdlib.h>
int main() {
printf("main\n");
int a;
scanf("%d", &a);
printf("a = %d\n", a);
return 0;
}
int main1() {
printf("main1\n");
int a;
scanf("%d", &a);
printf("a = %d\n", a);
exit(0);
return 0;
}
int main2() {
printf("main2\n");
int a = getchar() - '0';
int b = getchar() - '0';
int c = getchar() - '0';
printf("a = %d\n", 100 * a + 10 * b + c);
exit(0);
return 0;
}
Assuming that the code resides in a file called test.c, the following works fine (it prints "a = 123"):
gcc -o test test.c
echo 123 | ./test
If, however, I run the program with a custom entry point, I get the dreaded Segmentation fault:
gcc -o test test.c -e"main1"
echo 123 | ./test
But if I replace the scanf with three getchars, the program runs fine again despite being run with a custom entry point:
gcc -o test test.c -e"main2"
echo 123 | ./test
To make things even more interesting, these problems occur with gcc 7.4.0 but not with gcc 4.8.4.
Any ideas?
The -e command line flag redefines the actual entry point of your program, not the “user” entry point. By default, using GCC with the GNU C standard library (glibc) this entry point is called _start, and it performs further setup before invoking the user-provided main function.
If you want to replace this entry point and continue using glibc you’ll need to perform further setup yourself. But alternatively you can use the following method to replace the main entry point, which is much simpler:
gcc -c test.c
objcopy --redefine-sym main1=main test.o
gcc -o test test.o
Note, this will only work if you don’t define main in your code, otherwise you’ll get a “multiple definition of `main'” error from the linker.
I am currently writing a shared library that takes a UNIX username and returns a string with all of the groups that user belongs to in [group1, group2, group3...] format.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <utmp.h>
#include <sys/types.h>
#include <grp.h>
#include <pwd.h>
int num_groups = 0;
struct passwd *pwd;
gid_t *groups;
struct group *grp;
FILE *stream;
char *buff;
size_t length;
char *printGroups(char *arg)
{
stream = open_memstream(&buff, &length);
pwd = getpwnam(arg);
getgrouplist(arg, pwd->pw_gid, groups, &num_groups);
groups = malloc(num_groups * sizeof(gid_t));
if (groups == NULL){
perror("malloc");
exit(EXIT_FAILURE);
}
getgrouplist(arg, pwd->pw_gid, groups, &num_groups);
fprintf(stream, " [");
for (int i = 0; i < num_groups; ++i){
grp = getgrgid(groups[i]);
if (i == num_groups - 1)
fprintf(stream, "%s", grp->gr_name);
else
fprintf(stream, "%s ", grp->gr_name);
}
free(groups);
fprintf(stream, "]");
fclose(stream);
return buff;
}
This is the main function in my shared library that returns the string. I verified that the function is indeed correct: the same logic works in a standalone program that uses printf instead of the open_memstream stream.
The library however segfaults and I can't pinpoint why. Valgrind does not output anything useful:
gcc -shared -fpic -g -Wall lib.c
valgrind ./a.out
==9916== Process terminating with default action of signal 11 (SIGSEGV)
==9916== Access not within mapped region at address 0x0
==9916== at 0x1: ???
==9916== by 0xFFF000672: ???
Same goes for gdb backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()
(gdb) backtrace
#0 0x0000000000000001 in ?? ()
#1 0x00007fffffffe6e9 in ?? ()
#2 0x0000000000000000 in ?? ()
I am out of ideas. Could somebody point me to a solution, either an error in the .so source or the reason why both Valgrind and gdb print ??? despite using the -g flag when compiling?
It looks like you're attempting to run the shared library directly. That's not how shared libraries work. They're referenced by other programs that use them.
For example, this code would use your library:
#include <stdio.h>
#include <stdlib.h>
char *printGroups(char *);
int main()
{
char *groups = printGroups("root");
printf("groups: %s\n", groups);
free(groups);
return 0;
}
If you first compile your library like this:
gcc -shared -fpic -g -Wall lib.c -o libmylib.so
Then, assuming this library lives in the same directory as the above test code, you compile the test code like this:
gcc -g -Wall -Wextra -L. -o mytest mytest.c -lmylib
Then set an environment variable to find your library:
export LD_LIBRARY_PATH=.
You can then run the test program which will use your library.
When I use a shared library via dlopen, can the library code "see" memory of my process that calls dlopen? For example, I would like to pass a pointer to memory allocated by my application to the library API.
I'm on Linux/x86 if it is important.
The answer is yes, it can. Here is a simple quick example for illustration purposes.
The library code (in file myso.c):
void setInt( int * i )
{
*i = 12345;
}
The library can be built as follows:
gcc -c -fPIC myso.c
gcc -shared -Wl,-soname,libmy.so -o libmy.so myso.o -lc
Here is the client code (main.c):
#include <stdio.h>
#include <dlfcn.h>
typedef void (*setint_t)( int * );
int main()
{
void * h = dlopen("./libmy.so", RTLD_NOW);
if (h)
{
puts("Loaded library.");
setint_t setInt = dlsym( h, "setInt" );
if (setInt) {
puts("Symbol found");
int k;
setInt(&k);
printf("The int is %d\n", k);
}
}
return 0;
}
Now build and run the code. Make sure main.c and the library are in the same directory, in which we execute the following:
[user@fedora-21 ~]$ gcc main.c -ldl
[user@fedora-21 ~]$ ./a.out
Loaded library.
Symbol found
The int is 12345
As one can see, the library was able to write to the memory of the integer k.
I am trying to use the 'environ' variable, but it keeps giving me an error. It seems to be a makefile/build error and I can't seem to fix it. I have searched for answers, but I am still lost.
Here is my C file:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <dirent.h>
#include "cmd.h"
int cmdExec() {
...
extern char **environ;
...
printf("Enter a command\n");
//gets (input);
scanf("%s%*[^\n]", input);
if (...) {
...
}
else if (strcmp(input, "environ") == 0) {
int i;
for (i = 0; environ[i] != NULL; i++) {
printf("%s\n", environ[i]);
}
exit(0);
}
else
...
return 0;
}
and here is the makefile:
CC = gcc
CFLAGS = -c
CFLAGS-y = -std=c99
all: cmd
cmd.o: cmd.c cmd.h
$(CC) $(CFLAGS) $(CFLAGS-y) cmd.c
cmd.exe: cmd.o
$(CC) -o cmd.exe cmd.o
clean:
rm -rf *.o cmd.exe a.out
This is the output:
make all
gcc -c -std=c99 cmd.c
gcc cmd.o -o cmd
cmd.o:cmd.c:(.text+0x105): undefined reference to `environ'
cmd.o:cmd.c:(.text+0x127): undefined reference to `environ'
collect2: ld returned 1 exit status
make: *** [cmd] Error 1
From what I've searched this deals with linking libraries, but I don't know how to apply that to my specific situation. If someone could give me a hand I'd appreciate it.
Not all compilers on Windows (if any) provide access to environment variables through a global symbol named environ.
You can use e.g. getenv() to access environment variables.
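As a small sketch of the getenv() route: the helper below (env_or_default is a hypothetical name) returns a variable's value, or a caller-supplied fallback when the variable is not set. getenv() is part of standard C, so this does not depend on environ being available.

```c
#include <stdlib.h>

/* Return the value of the environment variable name,
   or fallback if the variable is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, env_or_default("HOME", "(unset)") yields the home directory where HOME is defined and "(unset)" elsewhere. Note that getenv() looks up one name at a time; it cannot enumerate the whole environment.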
The win32 API provides GetEnvironmentStrings() to access all the variables.
Some platforms allow you to access the environment through an additional argument to main(), you'd declare your main function as:
int main(int argc, char *argv[], char *environ[])
The environ global variable is defined by POSIX, and is not supported by Windows (unless you're using Cygwin, which is a POSIX-like layer implemented on top of Windows).
As far as I know, the non-standard definition
int main(int argc, char **argv, char **envp) { /* ... */ }
is also not supported on Windows.
But a quick Google search turned up this answer, which points to the documentation for the Windows-specific GetEnvironmentStrings function:
LPTCH WINAPI GetEnvironmentStrings(void);
If the function succeeds, the return value is a pointer to the
environment block of the current process.
If the function fails, the return value is NULL.
The result points to a long string with the environment variables separated by '\0' null characters, with the environment terminated by two consecutive null characters.
LPTCH is Microsoft's typedef for a pointer to TCHAR, which is either char or a 16-bit wchar_t depending on whether UNICODE is defined. See the referenced documentation for more information.
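The double-null-terminated layout described above can be walked with a short loop. The sketch below works on a plain char buffer so that it runs anywhere; count_env_entries is a hypothetical helper name. On Windows the pointer would instead come from GetEnvironmentStrings() (the char variant when UNICODE is not defined) and should be released with FreeEnvironmentStrings() afterwards.

```c
#include <stddef.h>
#include <string.h>

/* Count the entries in a block laid out as "A=1\0B=2\0\0". */
size_t count_env_entries(const char *block)
{
    size_t count = 0;
    /* An empty string marks the end of the block. */
    while (*block != '\0') {
        ++count;
        block += strlen(block) + 1;   /* skip this entry and its terminator */
    }
    return count;
}
```

Replacing the counter with a printf("%s\n", block) call inside the loop would list every variable, which is the closest equivalent to iterating over environ.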