Obtain a pointer to a C char array in Swift - c

I have a structure like this (defined in bson.h of the MongoDB C driver):
typedef struct
{
uint32_t domain;
uint32_t code;
char message[504];
} bson_error_t;
In Swift I have a pointer to this structure like this:
err: UnsafePointer<bson_error_t> = ...
Now whatever I do I cannot convert message[504] (which Swift sees as a tuple of (Int8, Int8, Int8, ...504 times)) to char* to use it in String.fromCString().
Is it even possible to do that in Swift? As a temporary solution I created a helper C function in a separate .c file which takes a bson_error_t * and returns char*, but it seems odd that Swift cannot do this by itself.
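For reference, a helper like the one described above might look like this in C. This is only a sketch: the struct is re-declared here just to keep the example self-contained (real code would include bson.h instead), and bson_error_message and bson_error_copy_message are hypothetical names. Returning err->message works because the array decays to a pointer to its first element. The Swift answers below obtain the same pointer via &self.message.0 or withUnsafePointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the driver's struct as quoted above; include bson.h in real code. */
typedef struct {
    uint32_t domain;
    uint32_t code;
    char message[504];
} bson_error_t;

/* The array decays to a pointer to its first element, i.e. a plain char*. */
const char *bson_error_message(const bson_error_t *err)
{
    return err->message;
}

/* Defensive variant: never reads past the 504-byte array, even if the message
   is not NUL-terminated, and always terminates out. Returns the message length. */
size_t bson_error_copy_message(const bson_error_t *err, char *out, size_t out_size)
{
    const char *end = memchr(err->message, '\0', sizeof err->message);
    size_t len = end ? (size_t)(end - err->message) : sizeof err->message;
    if (out_size == 0)
        return len;
    size_t n = len < out_size - 1 ? len : out_size - 1;
    memcpy(out, err->message, n);
    out[n] = '\0';
    return len;
}
```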

It's not pretty, not intuitive, but it's doable. Purely in Swift, no C glue code needed. A minimal demo:
b.h
typedef struct {
int n;
char s[8];
} Bridged;
Bridged *make_b(void);
b.c
#include <stdlib.h>
#include <string.h>
#include "b.h"
Bridged *make_b(void)
{
Bridged *p = calloc(1, sizeof(*p));
memcpy(p->s, "foobarz", 8);
return p;
}
b.swift:
// half compile-time, half run-time black magic
func toCharArray<T>(t: T) -> [CChar] {
var a: [CChar] = []
let mirror = reflect(t)
for i in 0 ..< mirror.count {
a.append(mirror[i].1.value as CChar)
}
return a
}
let b = make_b().memory.s // bridged tuple of 8 chars
let a = toCharArray(b) // Swift array of (8) CChars
let s = String.fromCString(a) // proper Swift string
println(s)
Compile:
$ xcrun swiftc -O -c b.swift -import-objc-header b.h
$ clang -O2 -c b.c -o b.c.o
$ xcrun swiftc b.o b.c.o -o b
Run:
$ ./b
Optional("foobarz")

Here is my suggestion (similar to rintaro's approach, perhaps slightly simpler):
var err: UnsafeMutablePointer<bson_error_t> = ...
var msg = err.memory.message
let msgString = withUnsafePointer(&msg) { String.fromCString(UnsafePointer($0)) }
println(msgString)

Quick hack to retrieve message String from bson_error_t:
extension bson_error_t {
mutating func messageString() -> String? {
return String.fromCString(
{ (p:UnsafePointer<Void>) in UnsafePointer<CChar>(p) }(&self.message.0)
)
}
}
// Usage:
var err: UnsafeMutablePointer<bson_error_t> = ...
...
let errMessage = err.memory.messageString()


how to stub fgets in C while using Google Unit Test

I have been assigned to write unit tests for some problems I solved during an introductory bootcamp, and I'm having trouble understanding the concept of a 'stub' or 'mock'.
I'm using Google Unit Test, and the problems from the bootcamp are solved in C.
int validate_input(uint32_t * input_value) {
char input_buffer[1024] = {0};
char * endptr = NULL;
int was_read_correctly = 1;
printf("Give the value for which to print the bits: ");
/*
* Presuming wrong input from user, it does not signal:
* - number that exceeds the range of uint_32 (remains to be fixed)
* For example: 4294967295 is the max value of uint_32 ( and this can be also confirmed by the output )
* If bigger numbers are entered the actual value seems to reset ( go back to 0 and upwards.)
*/
if (NULL == fgets(input_buffer, 1024, stdin)) {
was_read_correctly = 0;
} else {
if ('-' == input_buffer[0]) {
fprintf(stderr, "Negative number not allowed.\n");
was_read_correctly = 0;
}
}
errno = 0;
if (1 == was_read_correctly) {
* input_value = strtol(input_buffer, & endptr, 10);
if (ERANGE == errno) {
fprintf(stderr, "Sorry, this number is too small or too large.\n");
was_read_correctly = 0;
} else if (endptr == input_buffer) {
fprintf(stderr, "Incorrect input.\n(Entered characters or characters and digits.)\n");
was_read_correctly = 0;
} else if ( * endptr && '\n' != * endptr) {
fprintf(stderr, "Input didn't get wholely converted.\n(Entered digits and characters)\n");
was_read_correctly = 0;
}
} else {
fprintf(stderr, "Input was not read correctly.\n");
was_read_correctly = 0;
}
return was_read_correctly;
}
How should I think about and plan stubbing a function like fgets/malloc in C? And, if it isn't too much to ask, how should I approach testing a function like this?
Disclaimer: This is just one way to mock C functions for GoogleTest. There are other methods for sure.
The difficulty of mocking C functions lies in the way GoogleTest works. All its cool functionality is based on deriving a C++ class to mock and overriding its methods. These methods must be virtual, too. But C functions are not members of any class, let alone virtual ones.
The approach we found, and use successfully, is to provide a kind of wrapper class with methods that have the same prototypes as the C functions. Additionally, this class holds a pointer to an instance of itself as a static class variable. In some sense this resembles the Singleton pattern, with all its characteristics, good or bad.
Each test instantiates an object of this class and uses this object for the common checks.
Finally the C functions are implemented as stubs that call the single instance's method of the same kind.
Let's say we have these C functions:
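// cfunction.h
#ifndef C_FUNCTION_H
#define C_FUNCTION_H
// extern "C" only when compiled as C++, so C sources can include this header too
#ifdef __cplusplus
extern "C" {
#endif
void cf1(int p1, void* p2);
int cf2(void);
#ifdef __cplusplus
}
#endif
#endif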
Then the header file for the mocking class is:
// CFunctionMock.h
#ifndef C_FUNCTION_MOCK_H
#define C_FUNCTION_MOCK_H
#include "gmock/gmock.h"
#include "gtest/gtest.h"
#include "cfunction.h"
class CFunctionMock
{
public:
static CFunctionMock* instance;
CFunctionMock() {
instance = this;
}
~CFunctionMock() {
instance = nullptr;
}
MOCK_METHOD(void, cf1, (int p1, void* p2));
MOCK_METHOD(int, cf2, (void));
};
#endif
And this is the implementation of the mocking class, including the replacement C functions. All the functions check that the single instance exists.
// CFunctionMock.cpp
#include "CFunctionMock.h"
CFunctionMock* CFunctionMock::instance = nullptr;
extern "C" void cf1(int p1, void* p2) {
ASSERT_NE(CFunctionMock::instance, nullptr);
CFunctionMock::instance->cf1(p1, p2);
}
extern "C" int cf2(void) {
if (CFunctionMock::instance == nullptr) {
ADD_FAILURE() << "CFunctionMock::instance == nullptr";
return 0;
}
return CFunctionMock::instance->cf2();
}
In a non-void function you can't use ASSERT_NE, because on failure it leaves the function with a plain return; statement, which does not compile when a value must be returned. Therefore the check for an existing instance is a bit more elaborate. You should also think of a good default value to return.
Now we can write some tests.
// SomeTest.cpp
#include "gmock/gmock.h"
#include "gtest/gtest.h"
using ::testing::_;
using ::testing::Return;
#include "CFunctionMock.h"
#include "module_to_test.h"
TEST(AGoodTestSuiteName, AndAGoodTestName) {
CFunctionMock mock;
EXPECT_CALL(mock, cf1(_, _))
.Times(0);
EXPECT_CALL(mock, cf2())
.WillRepeatedly(Return(23));
// any call of module_to_test that calls (or not) the C functions
// any EXPECT_...
}
EDIT
I reread the question and concluded that a more direct example is necessary. So here we go! I like to use as much of Googletest's magic as possible, because it makes extensions much easier. Working around it feels like working against it.
Oh, my system is Windows 10 with MinGW64.
I'm a fan of Makefiles:
TESTS := Test
WARNINGLEVEL := -Wall -Wextra
CC := gcc
CFLAGS := $(WARNINGLEVEL) -g -O3
CXX := g++
CXXFLAGS := $(WARNINGLEVEL) -std=c++11 -g -O3 -pthread
LD := g++
LDFLAGS := $(WARNINGLEVEL) -g -pthread
LIBRARIES := -lgmock_main -lgtest -lgmock
GTESTFLAGS := --gtest_color=no --gtest_print_time=0
all: $(TESTS:%=%.exe)
run: all $(TESTS:%=%.log)
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
%.o: %.cpp
	$(CXX) $(CXXFLAGS) -I./include -c $< -o $@
%.exe: %.o
	$(LD) $(LDFLAGS) $^ -L./lib $(LIBRARIES) -o $@
%.log: %.exe
	$< $(GTESTFLAGS) > $@ || type $@
Test.exe: module_to_test.o FgetsMock.o
This Makefile makes it easy to add more tests, modules, or anything else, and it documents all options. Extend it to your liking.
Module to Test
To avoid warnings, I had to extend the provided source:
// module_to_test.c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "module_to_test.h"
// all the rest is as in the OP's source...
And of course we need a header file:
// module_to_test.h
#include <stdint.h>
int validate_input(uint32_t *input_value);
The Mock Class
The mock class is modelled after the example above. To enable "feeding" the string, I added a parameterized action.
// FgetsMock.h
#ifndef FGETS_MOCK_H
#define FGETS_MOCK_H
#include <cstring>
#include "gmock/gmock.h"
#include "gtest/gtest.h"
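// Emulates fgets(): copies at most arg1 - 1 characters of source into the
// buffer arg0 and always NUL-terminates it (a plain memcpy of arg1 bytes
// would read past the end of a short source string).
ACTION_P(CopyFromSource, source)
{
std::strncpy(arg0, source, static_cast<std::size_t>(arg1) - 1);
arg0[arg1 - 1] = '\0';
}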
class FgetsMock
{
public:
static FgetsMock* instance;
FgetsMock()
{
instance = this;
}
~FgetsMock()
{
instance = nullptr;
}
MOCK_METHOD(char*, fgets, (char*, int, FILE*));
};
#endif
Its implementation file is straightforward and provides the mocked C function.
// FgetsMock.cpp
#include <stdio.h>
#include "FgetsMock.h"
FgetsMock* FgetsMock::instance = nullptr;
extern "C" char* fgets(char* str, int num, FILE* stream)
{
if (FgetsMock::instance == nullptr)
{
ADD_FAILURE() << "FgetsMock::instance == nullptr";
return 0;
}
return FgetsMock::instance->fgets(str, num, stream);
}
Implementing Some Tests
Here are some example tests. Unfortunately, the module under test writes to stdout and stderr, which are not so simple to capture and check. You might like to read about "death tests" or provide your own method of redirection. At its core, the function's design is not ideal, because it did not take testing into account.
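One way to provide your own redirection is to point file descriptor 2 at a temporary file while the function runs. This is a POSIX sketch in C. capture_stderr and print_negative_error are hypothetical names, and the latter only stands in for one of validate_input()'s error paths. It has not been tried on MinGW.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno(), dup(), dup2() */
#endif
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for an error path of validate_input(). */
int print_negative_error(void)
{
    fprintf(stderr, "Negative number not allowed.\n");
    return 0;
}

/* Runs fn() with stderr redirected into a temporary file, then copies what was
   written into buf (NUL-terminated, truncated to size - 1 bytes).
   Returns fn()'s result, or -1 if the redirection could not be set up. */
int capture_stderr(int (*fn)(void), char *buf, size_t size)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL || size == 0) {
        if (tmp != NULL)
            fclose(tmp);
        return -1;
    }
    fflush(stderr);
    int saved = dup(STDERR_FILENO);           /* keep the real stderr */
    if (saved == -1 || dup2(fileno(tmp), STDERR_FILENO) == -1) {
        if (saved != -1)
            close(saved);
        fclose(tmp);
        return -1;
    }
    int result = fn();
    fflush(stderr);                           /* stderr is unbuffered by default; be safe */
    dup2(saved, STDERR_FILENO);               /* restore the real stderr */
    close(saved);
    rewind(tmp);                              /* read back what fn() wrote */
    size_t n = fread(buf, 1, size - 1, tmp);
    buf[n] = '\0';
    fclose(tmp);
    return result;
}
```

In a test, the captured text can then be compared against the expected message instead of letting it leak into the console.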
// Test.cpp
#include "gmock/gmock.h"
#include "gtest/gtest.h"
using ::testing::_;
using ::testing::DoAll;
using ::testing::Ge;
using ::testing::NotNull;
using ::testing::Return;
using ::testing::ReturnArg;
#include "FgetsMock.h"
extern "C"
{
#include "module_to_test.h"
}
TEST(ValidateInput, CorrectInput)
{
const char input[] = "42";
const int input_length = sizeof input;
FgetsMock mock;
uint32_t number;
EXPECT_CALL(mock, fgets(NotNull(), Ge(input_length), stdin))
.WillOnce(DoAll(
CopyFromSource(input),
ReturnArg<0>()
));
int result = validate_input(&number);
EXPECT_EQ(result, 1);
EXPECT_EQ(number, 42U);
}
TEST(ValidateInput, InputOutputError)
{
FgetsMock mock;
uint32_t dummy;
EXPECT_CALL(mock, fgets(_, _, _))
.WillOnce(Return(nullptr));
int result = validate_input(&dummy);
EXPECT_EQ(result, 0);
}
TEST(ValidateInput, NegativeInput)
{
const char input[] = "-23";
const int input_length = sizeof input;
FgetsMock mock;
uint32_t dummy;
EXPECT_CALL(mock, fgets(NotNull(), Ge(input_length), stdin))
.WillOnce(DoAll(
CopyFromSource(input),
ReturnArg<0>()
));
int result = validate_input(&dummy);
EXPECT_EQ(result, 0);
}
TEST(ValidateInput, RangeError)
{
const char input[] = "12345678901";
const int input_length = sizeof input;
FgetsMock mock;
uint32_t dummy;
EXPECT_CALL(mock, fgets(NotNull(), Ge(input_length), stdin))
.WillOnce(DoAll(
CopyFromSource(input),
ReturnArg<0>()
));
int result = validate_input(&dummy);
EXPECT_EQ(result, 0);
}
TEST(ValidateInput, CharacterError)
{
const char input[] = "23fortytwo";
const int input_length = sizeof input;
FgetsMock mock;
uint32_t dummy;
EXPECT_CALL(mock, fgets(NotNull(), Ge(input_length), stdin))
.WillOnce(DoAll(
CopyFromSource(input),
ReturnArg<0>()
));
int result = validate_input(&dummy);
EXPECT_EQ(result, 0);
}
Building and Running the Tests
This is the output of my (Windows) console when building freshly and testing:
> make run
gcc -Wall -Wextra -g -O3 -c module_to_test.c -o module_to_test.o
g++ -Wall -Wextra -std=c++11 -g -O3 -pthread -I./include -c FgetsMock.cpp -o FgetsMock.o
g++ -Wall -Wextra -std=c++11 -g -O3 -pthread -I./include -c Test.cpp -o Test.o
g++ -Wall -Wextra -g -pthread Test.o module_to_test.o FgetsMock.o -L./lib -lgmock_main -lgtest -lgmock -o Test.exe
Test.exe --gtest_color=no --gtest_print_time=0 > Test.log || type Test.log
Input was not read correctly.
Negative number not allowed.
Input was not read correctly.
Sorry, this number is too small or too large.
Input didn't get wholely converted.
(Entered digits and characters)
rm Test.o
You can see the C function's output on stderr.
And this is the recorded log; see the Makefile for how it is produced.
Running main() from gmock_main.cc
[==========] Running 5 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 5 tests from ValidateInput
[ RUN ] ValidateInput.CorrectInput
Give the value for which to print the bits: [ OK ] ValidateInput.CorrectInput
[ RUN ] ValidateInput.InputOutputError
Give the value for which to print the bits: [ OK ] ValidateInput.InputOutputError
[ RUN ] ValidateInput.NegativeInput
Give the value for which to print the bits: [ OK ] ValidateInput.NegativeInput
[ RUN ] ValidateInput.RangeError
Give the value for which to print the bits: [ OK ] ValidateInput.RangeError
[ RUN ] ValidateInput.CharacterError
Give the value for which to print the bits: [ OK ] ValidateInput.CharacterError
[----------] Global test environment tear-down
[==========] 5 tests from 1 test suite ran.
[ PASSED ] 5 tests.
Because the function prints its prompt to stdout, that prompt is mixed up with Googletest's output.
I have managed to solve this issue in the following way:
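Header file for the stub function:
#ifndef STUBS_H_
#define STUBS_H_
#include <stdio.h>
#include "../src/p1.h"
// Set by each test; NULL makes the stub report a read failure.
extern char* fgets_RET;
char* fgets_stub(char *s, int size, FILE *stream);
// Every fgets() call in p1.c below is redirected to fgets_stub().
#define fgets fgets_stub
#include "../src/p1.c"
#undef fgets
#endif
Implementation of the stub function, in its own source file compiled as C++ like test.cpp. It must not include stubs.h, otherwise p1.c would be compiled (and linked) twice:
#include <stdio.h>
#include <string.h>
char* fgets_RET = NULL;
char* fgets_stub(char *s, int size, FILE *stream)
{
(void)stream;
if (NULL == fgets_RET)
{
return NULL;
}
// Behave like the real fgets: copy at most size - 1 characters and terminate.
strncpy(s, fgets_RET, (size_t)(size - 1));
s[size - 1] = '\0';
return s;
}
How to test in test.cpp:
TEST(ValidateInput, CorrectionTest)
{
uint32_t tester = 0;
char input[] = "39131";
fgets_RET = input;
ASSERT_EQ(1, validate_input(&tester));
ASSERT_EQ(39131U, tester);
}
If the tester wants to force fgets to return NULL:
TEST(ValidateInput, NullReturnTest)
{
uint32_t tester = 0;
fgets_RET = NULL;
ASSERT_EQ(0, validate_input(&tester));
}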
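The macro-redirection idea from this answer can be condensed into one self-contained C file for experimenting. In this sketch fake_input, fgets_stub and read_line_length are hypothetical names, and read_line_length only stands in for validate_input(). The key point is that the #define must come after <stdio.h> (so the real declaration is untouched) and before the code under test.

```c
#include <stdio.h>
#include <string.h>

/* Canned input for the stub; NULL makes it behave like fgets() at EOF/error. */
const char *fake_input = NULL;

char *fgets_stub(char *s, int size, FILE *stream)
{
    (void)stream;                              /* never touches the real stream */
    if (fake_input == NULL || size <= 0)
        return NULL;
    strncpy(s, fake_input, (size_t)size - 1);  /* like fgets: at most size - 1 chars */
    s[size - 1] = '\0';
    return s;
}

/* From here on, calls to fgets() in this translation unit go to the stub. */
#define fgets fgets_stub

/* Hypothetical code under test, standing in for validate_input(). */
int read_line_length(void)
{
    char buf[8];
    if (fgets(buf, sizeof buf, stdin) == NULL)
        return -1;
    return (int)strlen(buf);
}

#undef fgets
```

Each test sets fake_input first, exactly like fgets_RET above, and then calls the function under test.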

Use of unresolved identifier 'function' when accessing Rust code using the Swift C Bridge

I'm trying to add a C bridge from a Swift macOS app to a Rust program via cargo lipo. I can get a simple function that takes no input and prints "Hello" to work, but my Rust program's input is a struct of strings and arrays of strings. I followed fluffyemily's Mozilla post on setting up cargo lipo in a Swift macOS app and hoped I could simply pass in a struct similar to what we see in @Shepmaster's helpful article on Rust FFI or Rust bridges to other languages.
I can't get past this error when I build the Xcode Swift app:
Use of unresolved identifier 'init_loop_volumes'
Here's what I'm working with:
rust_bindings.h
// #include <stdio.h>
#include <stdint.h>
// #include <inttypes.h>
//const char* init_loop_volumes(const char* to);
void simple();
void init_loop_volumes(const struct args);
rust_bindings_header.h
#ifndef rust_bindings_header_h
#define rust_bindings_header_h
#import "rust_bindings.h"
#endif /* rust_bindings_header_h */
lib.rs
#[no_mangle]
pub extern "C" fn init_loop_volumes(args: fv_structopt::VacuumMac) {
let check_process_struct = init(args);
loop_volumes(check_process_struct.c, false);
println!("test")
}
structs.rs
#[repr(C)]
pub struct VacuumMac {
pub deposit_path: std::path::PathBuf,
pub exclude_paths: Vec<std::path::PathBuf>,
pub volumes_path: std::path::PathBuf,
pub log_path: std::path::PathBuf,
pub file_types: Vec<String>,
pub options: Vec<String>,
}
Operation.swift
override func main() {
init_loop_volumes(self.vacuumCli)
// simple() <- this works
}
Structs.swift
public struct VacuumCli {
var deposit_path = String();
var exclude_paths = [String]();
var volume_path = String();
var log_path = String();
var file_types = [String]();
var options = [String]();
}
This is a little tricky to debug, as cargo lipo doesn't report what it put into the lib.a archive file. Sometimes I'll turn init_loop_volumes into a function that just prints "Hello" and I still get that error even after cleaning the build in Xcode, which makes me think Xcode is caching my rust_bindings.h somewhere other than what it shows in the file tree/editor.
Since I'm working with C bridge strings, do I have to do the CStr conversions?
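For context on the last question: #[repr(C)] only fixes the field order and padding of VacuumMac itself. Its String, Vec and PathBuf fields still have Rust-only layouts, so the struct cannot be shared with C or Swift as-is. Separately, the header line void init_loop_volumes(const struct args); passes an incomplete struct type by value, which the Swift importer most likely cannot import, and that would explain the unresolved identifier. Below is a sketch of a C-compatible declaration using only pointers and lengths; VacuumArgs and its field names are hypothetical. A matching #[repr(C)] Rust struct would hold *const c_char fields and convert them with std::ffi::CStr::from_ptr, so some CStr conversion would most likely be needed.

```c
#include <stddef.h>

/* Hypothetical C-compatible argument struct: only pointers and lengths cross
   the boundary, never Rust's String, Vec or PathBuf. */
typedef struct {
    const char *deposit_path;            /* NUL-terminated UTF-8 */
    const char *const *exclude_paths;    /* array of NUL-terminated strings */
    size_t exclude_paths_len;
    const char *volumes_path;
    const char *log_path;
    const char *const *file_types;
    size_t file_types_len;
    const char *const *options;
    size_t options_len;
} VacuumArgs;

/* Passed by pointer; the Rust side would take *const VacuumArgs. */
void init_loop_volumes(const VacuumArgs *args);
```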

How call and compile function from elf to my binary?

I have an ELF binary that I didn't write, and I want to use one function from it (I know the address/offset of the function). That function is not exported from the binary.
My goal is to call this function from C code that I write and to compile this function statically into my binary (I compile with gcc).
How can I do that please?
I am going to answer the "call to this function from my c code that I write" part.
The below works under certain assumptions, like dynamic linking and position independent code. I haven't thought for too long about what happens if they are broken (let's experiment/discuss, if there's interest).
$ cat lib.c
int data = 42;
static int foo () { return data; }
$ gcc -fpic -shared lib.c -o lib.so
$ nm lib.so | grep foo
00000000000010e9 t foo
The above reproduces having the address that you know. The address we know now is 0x10e9. It is the virtual address of foo before relocation. We'll model the relocation the dynamic loader does by hand by simply adding the base address at which lib.so gets loaded.
$ cat 1.c
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <link.h>
#include <string.h>
#include <elf.h>
#define FOO_VADDR 0x10e9
typedef int(*func_t)();
int callback(struct dl_phdr_info *info, size_t size, void *data)
{
if (!(strstr(info->dlpi_name, "lib.so")))
return 0;
// runtime address = load base of lib.so + link-time address reported by nm
ElfW(Addr) addr = info->dlpi_addr + FOO_VADDR;
func_t f = (func_t)addr;
int res = f();
printf("res = %d\n", res);
return 0;
}
int main()
{
void *handle = dlopen("./lib.so", RTLD_LAZY);
if (!handle) {
puts("failed to load");
return 1;
}
dl_iterate_phdr(&callback, NULL);
dlclose(handle);
return 0;
}
And now...
$ gcc 1.c -ldl && ./a.out
res = 42
Voila -- it worked! That was fun.
Credit: this was helpful.
If you have questions, feel free to read the man pages and ask in the comments.
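The same arithmetic can be tried without a separate library. This self-contained sketch, assuming Linux with glibc, applies it to a static function in the program itself. dl_iterate_phdr() visits the main program first. foo_link_time_vaddr and call_by_vaddr are hypothetical names, and the link-time address is derived at run time only so the demo needs no hard-coded constant. In the real scenario that value comes from nm, as above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                /* for dl_iterate_phdr() */
#endif
#include <link.h>
#include <stdint.h>

static int data = 42;
static int foo(void) { return data; }   /* static, so not exported */

typedef int (*func_t)(void);

/* The main program is visited first: grab its load base and stop iterating. */
static int main_program_base(struct dl_phdr_info *info, size_t size, void *out)
{
    (void)size;
    *(ElfW(Addr) *)out = info->dlpi_addr;
    return 1;                      /* non-zero ends the iteration */
}

/* Link-time address of foo, i.e. roughly what `nm ./a.out | grep foo` prints. */
ElfW(Addr) foo_link_time_vaddr(void)
{
    ElfW(Addr) base = 0;
    dl_iterate_phdr(main_program_base, &base);
    return (ElfW(Addr))(uintptr_t)&foo - base;
}

/* Relocate by hand: runtime address = load base + link-time address. */
int call_by_vaddr(ElfW(Addr) vaddr)
{
    ElfW(Addr) base = 0;
    dl_iterate_phdr(main_program_base, &base);
    func_t f = (func_t)(uintptr_t)(base + vaddr);
    return f();
}
```

For a non-PIE executable the base is 0; for a PIE executable it is wherever the loader placed the program, which is exactly the part the dl_iterate_phdr() callback supplies.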
As for "compile this function statically in my binary": I don't know offhand. This would be trickier. Why do you want that? Also, do you know whether the function depends on some data (or maybe calls other functions) in the original ELF file, like in the example above?

Weird pointer conversion in C

I'm having trouble writing my garbage collector in C. Here is a minimal, verifiable example.
The first file is in charge of dealing with the virtual machine:
#include <stdlib.h>
#include <stdint.h>
typedef int32_t value_t;
typedef enum {
Lb, Lb1, Lb2, Lb3, Lb4, Lb5,
Ib, Ob
} reg_bank_t;
static value_t* memory_start;
static value_t* R[8];
value_t* engine_get_Lb(void) { return R[Lb]; }
value_t engine_run() {
memory_start = memory_get_start();
for (reg_bank_t pseudo_bank = Lb; pseudo_bank <= Lb5; ++pseudo_bank)
R[pseudo_bank] = memory_start + (pseudo_bank - Lb) * 32;
value_t* block = memory_allocate();
}
Then I have the actual garbage collector, the minimized code is:
#include <stdlib.h>
#include <stdint.h>
typedef int32_t value_t;
static value_t* memory_start = NULL;
void memory_setup(size_t total_byte_size) {
memory_start = calloc(total_byte_size, 1);
}
void* memory_get_start() { return memory_start; }
void mark(value_t* base){
value_t vbase = 0;
}
value_t* memory_allocate() {
mark(engine_get_Lb());
return engine_get_Lb();
}
Finally, the minimal main is:
int main(int argc, char* argv[]) {
memory_setup(1000000);
engine_run();
return 0;
}
The problem I'm getting with gdb is that if I print engine_get_Lb() I get the address (value_t *) 0x7ffff490a800 while when printing base inside of the function mark I get the address (value_t *) 0xfffffffff490a800.
Any idea why this is happening?
Supplementary files that may help:
The makefile
SHELL=/bin/bash
SRCS=src/engine.c \
src/main.c \
src/memory_mark_n_sweep.c
CFLAGS_COMMON=-std=c11 -fwrapv
CLANG_SAN_FLAGS=-fsanitize=address
# Clang warning flags
CLANG_WARNING_FLAGS=-Weverything \
-Wno-format-nonliteral \
-Wno-c++98-compat \
-Wno-gnu-label-as-value
# Flags for debugging:
CFLAGS_DEBUG=${CFLAGS_COMMON} -g ${CLANG_SAN_FLAGS} ${CLANG_WARNING_FLAGS}
# Flags for maximum performance:
CFLAGS_RELEASE=${CFLAGS_COMMON} -O3 -DNDEBUG
CFLAGS=${CFLAGS_DEBUG}
all: vm
vm: ${SRCS}
mkdir -p bin
clang ${CFLAGS} ${LDFLAGS} ${SRCS} -o bin/vm
File with instructions .asm
5c190000 RALO(Lb,25)
value_t* memory_allocate() {
mark(engine_get_Lb());
return engine_get_Lb();
}
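engine_get_Lb is not declared before use, so the compiler assumes it returns int, per an antiquated and dangerous rule of the C language (implicit function declaration). That rule was removed from the standard in C99, but many compilers still accept such code, often with only a warning. On x86-64 the 64-bit pointer 0x7ffff490a800 therefore comes back as the 32-bit int 0xf490a800; that value is negative, so converting it back to value_t * sign-extends it to 0xfffffffff490a800, which is exactly the address you see in mark.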
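To see where 0xfffffffff490a800 comes from, this sketch replays the conversion by hand. It assumes x86-64 with a 32-bit int; through_implicit_int is a hypothetical name. It truncates the pointer to 32 bits as an int return value would, then widens it back with sign extension.

```c
#include <stdint.h>

/* Replays what happens to a pointer returned through an implicitly declared
   function: it comes back as a 32-bit int and is then converted to a 64-bit
   pointer again. Converting an out-of-range value to int32_t is
   implementation-defined; GCC and Clang keep the low 32 bits. */
uint64_t through_implicit_int(uint64_t address)
{
    int32_t as_int = (int32_t)(uint32_t)address;  /* upper 32 bits are lost */
    return (uint64_t)(int64_t)as_int;             /* sign-extended back to 64 bits */
}
```

Addresses below 0x80000000 survive the round trip unchanged, which is why this kind of bug can go unnoticed until the heap lands at a high address.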
Create a header file with declarations of all your global functions, and #include it in all your source files.
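A sketch of such a header, assuming the file layout from the question's Makefile (src/engine.c and src/memory_mark_n_sweep.c). The name vm.h is hypothetical. Each source file would #include "vm.h", so every call is checked against a real prototype. Note the explicit (void) parameter lists, which also make the declarations proper prototypes.

```c
/* vm.h: hypothetical shared header; the declarations mirror the question's code. */
#ifndef VM_H
#define VM_H

#include <stddef.h>
#include <stdint.h>

typedef int32_t value_t;

/* memory_mark_n_sweep.c */
void memory_setup(size_t total_byte_size);
void *memory_get_start(void);
value_t *memory_allocate(void);

/* engine.c */
value_t *engine_get_Lb(void);
value_t engine_run(void);

#endif /* VM_H */
```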
Your compiler should have at least warned you about this error at its default settings. If it did, you should have read and completely understood the warnings before continuing. If it didn't, consider an upgrade. If you cannot upgrade, permanently add -Wall -Wextra -Werror to your compilation flags. Consider also -Wpedantic and -std=c11.

Get names and addresses of exported functions in linux

I am able to get a list of exported function names and pointers from an executable in Windows by using the PIMAGE_DOS_HEADER API (example).
What is the equivalent API for Linux?
For context I am creating unit test executables and I am exporting functions starting with the name "test_" and I want the executable to just spin through and execute all of the test functions when run.
Example pseudo code:
int main(int argc, char** argv)
{
auto run = new_trun();
auto module = dlopen(NULL);
auto exports = get_exports(module); // <- how do I do this on unix?
for( auto i = 0; i < exports->length; i++)
{
auto export = exports[i];
if(strncmp("test_", export->name, strlen("test_")) == 0)
{
tcase_add(run, export->name, export->func);
}
}
return trun_run(run);
}
EDIT:
I was able to find what I was after using the top answer from this question:
List all the functions/symbols on the fly in C?
Additionally I had to use the gnu_hashtab_symbol_count function from Nominal Animal's answer below to handle the DT_GNU_HASH instead of the DT_HASH.
My final test main function looks like this:
int main(int argc, char** argv)
{
vector<string> symbols;
dl_iterate_phdr(retrieve_symbolnames, &symbols);
TRun run;
auto handle = dlopen(NULL, RTLD_LOCAL | RTLD_LAZY);
for(auto i = symbols.begin(); i != symbols.end(); i++)
{
auto name = *i;
auto func = (testfunc)dlsym(handle, name.c_str());
TCase tcase;
tcase.name = string(name);
tcase.func = func;
run.test_cases.push_back(tcase);
}
return trun_run(&run);
}
I then define tests in the assembly like this:
// test.h
#define START_TEST(name) extern "C" EXPORT TResult test_##name () {
#define END_TEST return tresult_success(); }
// foo.cc
START_TEST(foo_bar)
{
assert_pending();
}
END_TEST
Which produces output that looks like this:
test_foo_bar: pending
1 pending
0 succeeded
1 total
I do get quite annoyed when I see questions asking how to do something in operating system X that you do in Y.
In most cases, it is not a useful approach, because each operating system (family) tends to have its own approach to issues, so trying to apply something that works in X to Y is like stuffing a cube into a round hole.
Please note: the text here is intended as harsh, not condescending; my command of the English language is not as good as I'd like. In my experience, harshness combined with actual help and pointers to known working solutions seems to work best in overcoming nontechnical limitations.
In Linux, a test environment should use something like
LC_ALL=C LANG=C readelf -s FILE
to list all the symbols in FILE. readelf is part of the binutils package, and is installed if you intend to build new binaries on the system. This leads to portable, robust code. Do not forget that Linux encompasses multiple hardware architectures that do have real differences.
To build binaries in Linux, you normally use some of the tools provided in binutils. If binutils provided a library, or there was an ELF library based on the code used in binutils, it would be much better to use that, rather than parse the output of the human utilities. However, there is no such library (the libbfd library binutils uses internally is not ELF-specific). The libelf library (http://www.mr511.de/software/english.html) is good, but it is completely separate work by chiefly a single author. Bugs in it have been reported to binutils, which is unproductive, as the two are not related. Simply put, there are no guarantees that it handles the ELF files on a given architecture the same way binutils does. Therefore, for robustness and reliability, you'll definitely want to use binutils.
If you have a test application, it should use a script, say /usr/lib/yourapp/list-test-functions, to list the test-related functions:
#!/bin/bash
export LC_ALL=C LANG=C
for file in "$@" ; do
readelf -s "$file" | while read -r num value size type bind vis index name dummy ; do
num="${num%:}"   # readelf prints the entry number as "123:"
[ "$type" = "FUNC" ] || continue
[ "$bind" = "GLOBAL" ] || continue
[ "$num" = "$((num))" ] || continue       # skip non-numeric entry numbers
[ "$index" = "$((index))" ] || continue   # skip UND, ABS, ...
case "$name" in
test_*) printf '%s\n' "$name"
;;
esac
done
done
This way, if there is an architecture that has quirks (in the binutils' readelf output format in particular), you only need to modify the script. Modifying such a simple script is not difficult, and it is easy to verify the script works correctly -- just compare the raw readelf output to the script output; anybody can do that.
A subroutine that constructs a pipe, fork()s a child process, executes the script in the child process, and uses e.g. getline() in the parent process to read the list of names, is quite simple and extremely robust. Since this is also the one fragile spot, we've made it very easy to fix any quirks or problems here by using that external script (that is customizable/extensible to cover those quirks, and easy to debug).
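A sketch of that subroutine, assuming a POSIX system. The names for_each_output_line and count_lines are hypothetical. argv would be something like {"/usr/lib/yourapp/list-test-functions", "./tests", NULL}, and the callback receives one function name per call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for getline(), fdopen() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up in PATH) and calls callback once per output line,
   with the trailing newline removed. Returns the child's exit status, or -1. */
int for_each_output_line(char *const argv[],
                         void (*callback)(const char *line, void *custom),
                         void *custom)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: stdout -> write end of the pipe */
        close(fd[0]);
        if (dup2(fd[1], STDOUT_FILENO) == -1)
            _exit(127);
        close(fd[1]);
        execvp(argv[0], argv);
        _exit(127);                       /* only reached if exec failed */
    }
    close(fd[1]);                         /* parent keeps only the read end */
    FILE *in = fdopen(fd[0], "r");
    if (in == NULL) {
        close(fd[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, in)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';
        callback(line, custom);
    }
    free(line);
    fclose(in);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example callback: counts the names it receives. */
void count_lines(const char *line, void *custom)
{
    (void)line;
    ++*(size_t *)custom;
}
```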
Remember, if binutils itself has bugs (other than output formatting bugs), any binaries built will almost certainly exhibit those same bugs also.
Being a Microsoft-oriented person, you probably will have trouble grasping the benefits of such a modular approach. (It is not specific to Microsoft, but specific to a single-vendor controlled ecosystem where the vendor-pushed approach is via overarching frameworks, and black boxes with clean but very limited interfaces. I think of it as the framework limitation, or a vendor-enforced walled garden, or a prison garden. It looks good, but getting out is difficult. For a description and history of the modular approach I'm trying to describe, see for example the Unix philosophy article at Wikipedia.)
The following shows that your approach is indeed possible in Linux, too -- although clunky and fragile; this stuff is intended to be done using the standard tools instead. It's just not the right approach in general.
The interface, symbols.h, is easiest to implement using a callback function that gets called for each symbol found:
#ifndef SYMBOLS_H
#ifndef _GNU_SOURCE
#error You must define _GNU_SOURCE!
#endif
#define SYMBOLS_H
#include <stdlib.h>
typedef enum {
LOCAL_SYMBOL = 1,
GLOBAL_SYMBOL = 2,
WEAK_SYMBOL = 3,
} symbol_bind;
typedef enum {
FUNC_SYMBOL = 4,
OBJECT_SYMBOL = 5,
COMMON_SYMBOL = 6,
THREAD_SYMBOL = 7,
} symbol_type;
int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom),
void *custom);
#endif /* SYMBOLS_H */
The ELF symbol binding and type macros are word-size specific, so to avoid the hassle, I declared the enum types above. I omitted some uninteresting types (STT_NOTYPE, STT_SECTION, STT_FILE), however.
The implementation, symbols.c:
#define _GNU_SOURCE
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include <stdio.h>
#include <fnmatch.h>
#include <dlfcn.h>
#include <link.h>
#include <errno.h>
#include "symbols.h"
#define UINTS_PER_WORD (__WORDSIZE / (CHAR_BIT * sizeof (unsigned int)))
static ElfW(Word) gnu_hashtab_symbol_count(const unsigned int *const table)
{
const unsigned int *const bucket = table + 4 + table[2] * (unsigned int)(UINTS_PER_WORD);
unsigned int b = table[0];
unsigned int max = 0U;
while (b-->0U)
if (bucket[b] > max)
max = bucket[b];
return (ElfW(Word))max;
}
static symbol_bind elf_symbol_binding(const unsigned char st_info)
{
#if __WORDSIZE == 32
switch (ELF32_ST_BIND(st_info)) {
#elif __WORDSIZE == 64
switch (ELF64_ST_BIND(st_info)) {
#else
switch (ELF_ST_BIND(st_info)) {
#endif
case STB_LOCAL: return LOCAL_SYMBOL;
case STB_GLOBAL: return GLOBAL_SYMBOL;
case STB_WEAK: return WEAK_SYMBOL;
default: return 0;
}
}
static symbol_type elf_symbol_type(const unsigned char st_info)
{
#if __WORDSIZE == 32
switch (ELF32_ST_TYPE(st_info)) {
#elif __WORDSIZE == 64
switch (ELF64_ST_TYPE(st_info)) {
#else
switch (ELF_ST_TYPE(st_info)) {
#endif
case STT_OBJECT: return OBJECT_SYMBOL;
case STT_FUNC: return FUNC_SYMBOL;
case STT_COMMON: return COMMON_SYMBOL;
case STT_TLS: return THREAD_SYMBOL;
default: return 0;
}
}
static void *dynamic_pointer(const ElfW(Addr) addr,
const ElfW(Addr) base, const ElfW(Phdr) *const header, const ElfW(Half) headers)
{
if (addr) {
ElfW(Half) h;
for (h = 0; h < headers; h++)
if (header[h].p_type == PT_LOAD)
if (addr >= base + header[h].p_vaddr &&
addr < base + header[h].p_vaddr + header[h].p_memsz)
return (void *)addr;
}
return NULL;
}
struct phdr_iterator_data {
int (*callback)(const char *libpath, const char *libname,
const char *objname, const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom);
void *custom;
};
static int iterate_phdr(struct dl_phdr_info *info, size_t size, void *dataref)
{
struct phdr_iterator_data *const data = dataref;
const ElfW(Addr) base = info->dlpi_addr;
const ElfW(Phdr) *const header = info->dlpi_phdr;
const ElfW(Half) headers = info->dlpi_phnum;
const char *libpath, *libname;
ElfW(Half) h;
if (!data->callback)
return 0;
if (info->dlpi_name && info->dlpi_name[0])
libpath = info->dlpi_name;
else
libpath = "";
libname = strrchr(libpath, '/');
if (libname && libname[0] == '/' && libname[1])
libname++;
else
libname = libpath;
for (h = 0; h < headers; h++)
if (header[h].p_type == PT_DYNAMIC) {
const ElfW(Dyn) *entry = (const ElfW(Dyn) *)(base + header[h].p_vaddr);
const ElfW(Word) *hashtab;
const ElfW(Sym) *symtab = NULL;
const char *strtab = NULL;
ElfW(Word) symbol_count = 0;
for (; entry->d_tag != DT_NULL; entry++)
switch (entry->d_tag) {
case DT_HASH:
hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
if (hashtab)
symbol_count = hashtab[1];
break;
case DT_GNU_HASH:
hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
if (hashtab) {
ElfW(Word) count = gnu_hashtab_symbol_count(hashtab);
if (count > symbol_count)
symbol_count = count;
}
break;
case DT_STRTAB:
strtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
break;
case DT_SYMTAB:
symtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
break;
}
if (symtab && strtab && symbol_count > 0) {
ElfW(Word) s;
for (s = 0; s < symbol_count; s++) {
const char *name;
void *const ptr = dynamic_pointer(base + symtab[s].st_value, base, header, headers);
symbol_bind bind;
symbol_type type;
int result;
if (!ptr)
continue;
type = elf_symbol_type(symtab[s].st_info);
bind = elf_symbol_binding(symtab[s].st_info);
if (symtab[s].st_name)
name = strtab + symtab[s].st_name;
else
name = "";
result = data->callback(libpath, libname, name, ptr, symtab[s].st_size, bind, type, data->custom);
if (result)
return result;
}
}
}
return 0;
}
int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom),
void *custom)
{
struct phdr_iterator_data data;
if (!callback)
return errno = EINVAL;
data.callback = callback;
data.custom = custom;
return errno = dl_iterate_phdr(iterate_phdr, &data);
}
When compiling the above, remember to link against the dl library (-ldl).
You may find the gnu_hashtab_symbol_count() function above interesting; the format of the DT_GNU_HASH table is not well documented anywhere that I can find. I have tested it on both i386 and x86-64, but it should be vetted against the GNU sources before you rely on it in production code. Again, the better option is to just use those tools directly via a helper script, as they are installed on practically every development machine.
Technically, a DT_GNU_HASH table tells us the first dynamic symbol, and the highest index in any hash bucket tells us the last dynamic symbol, but since the entries in the DT_SYMTAB symbol table always begin at 0 (actually, the 0 entry is "none"), I only consider the upper limit.
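To make the "upper limit" idea concrete, here is a minimal sketch of counting symbols from a DT_GNU_HASH table. It is not the gnu_hashtab_symbol_count() above; gnu_hash_symbol_count() is a hypothetical name. It assumes the commonly described layout: four 32-bit words (nbuckets, symoffset, bloom_size, bloom_shift), then bloom_size Bloom filter words of bloom_word_size bytes (4 for ELFCLASS32, 8 for ELFCLASS64), then the buckets, then the chain array, where the low bit of a chain entry marks the end of a chain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the dynamic symbols described by a
 * DT_GNU_HASH table (including the "none" entry at index 0).
 * bloom_word_size is 4 for 32-bit ELF and 8 for 64-bit ELF. */
static uint32_t gnu_hash_symbol_count(const uint32_t *table, size_t bloom_word_size)
{
    const uint32_t nbuckets   = table[0];
    const uint32_t symoffset  = table[1];
    const uint32_t bloom_size = table[2];
    /* table[3] is bloom_shift; it is not needed for counting. */
    const uint32_t *buckets = table + 4 + bloom_size * (bloom_word_size / sizeof(uint32_t));
    const uint32_t *chain   = buckets + nbuckets;
    uint32_t last = 0;
    uint32_t b;

    /* The highest symbol index stored in any bucket starts the last chain. */
    for (b = 0; b < nbuckets; b++)
        if (buckets[b] > last)
            last = buckets[b];

    /* No hashed symbols: only the unhashed ones below symoffset exist. */
    if (last < symoffset)
        return symoffset;

    /* Walk the last chain until an entry has its end-of-chain bit set. */
    while (!(chain[last - symoffset] & 1u))
        last++;

    return last + 1;
}
```

Only the upper bound matters here because, as noted above, the symbol table itself always starts at index 0.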
To match library and function names, I recommend using strncmp() for a prefix match on libraries (match the start of the library name, up to the first '.'). Of course, you can use fnmatch() if you prefer glob patterns, or regcomp()+regexec() if you prefer regular expressions (they are built into the GNU C library, so no external libraries are needed).
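As a rough sketch of those three matching styles, the hypothetical helpers below could be called from a symbols() callback with its libname and objname arguments. The helper names are illustrative only; the library match requires the prefix to end at a '.' (or the end of the name) so that "libc" does not also match "libcrypt.so.1".

```c
#include <fnmatch.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: prefix match, e.g. "libc" matches "libc.so.6". */
static int library_matches(const char *libname, const char *base)
{
    const size_t len = strlen(base);
    return strncmp(libname, base, len) == 0 &&
           (libname[len] == '.' || libname[len] == '\0');
}

/* Hypothetical helper: glob match on a symbol name, e.g. "str*". */
static int symbol_matches_glob(const char *objname, const char *pattern)
{
    return fnmatch(pattern, objname, 0) == 0;
}

/* Hypothetical helper: POSIX extended regex match on a symbol name.
 * Compiling per call is simple but slow; cache the regex_t in real code. */
static int symbol_matches_regex(const char *objname, const char *regex)
{
    regex_t re;
    int matched;

    if (regcomp(&re, regex, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    matched = regexec(&re, objname, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```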
Here is an example program, example.c, that just prints out all the symbols:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
#include "symbols.h"
static int my_func(const char *libpath, const char *libname, const char *objname,
const void *addr, const size_t size,
const symbol_bind binding, const symbol_type type,
void *custom __attribute__((unused)))
{
printf("%s (%s):", libpath, libname);
if (*objname)
printf(" %s:", objname);
else
printf(" unnamed");
if (size > 0)
printf(" %zu-byte", size);
if (binding == LOCAL_SYMBOL)
printf(" local");
else
if (binding == GLOBAL_SYMBOL)
printf(" global");
else
if (binding == WEAK_SYMBOL)
printf(" weak");
if (type == FUNC_SYMBOL)
printf(" function");
else
if (type == OBJECT_SYMBOL || type == COMMON_SYMBOL)
printf(" variable");
else
if (type == THREAD_SYMBOL)
printf(" thread-local variable");
printf(" at %p\n", addr);
fflush(stdout);
return 0;
}
int main(int argc, char *argv[])
{
int arg;
for (arg = 1; arg < argc; arg++) {
void *handle = dlopen(argv[arg], RTLD_NOW);
if (!handle) {
fprintf(stderr, "%s: %s.\n", argv[arg], dlerror());
return EXIT_FAILURE;
}
fprintf(stderr, "%s: Loaded.\n", argv[arg]);
}
fflush(stderr);
if (symbols(my_func, NULL))
return EXIT_FAILURE;
return EXIT_SUCCESS;
}
To compile and run the above, use, for example:
gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 example.o symbols.o -ldl -o example
./example | less
To see the symbols in the program itself, use the -rdynamic flag at link time to add all symbols to the dynamic symbol table:
gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 -rdynamic example.o symbols.o -ldl -o example
./example | less
On my system, the latter prints:
(): stdout: 8-byte global variable at 0x602080
(): _edata: global at 0x602078
(): __data_start: global at 0x602068
(): data_start: weak at 0x602068
(): symbols: 70-byte global function at 0x401080
(): _IO_stdin_used: 4-byte global variable at 0x401150
(): __libc_csu_init: 101-byte global function at 0x4010d0
(): _start: global function at 0x400a57
(): __bss_start: global at 0x602078
(): main: 167-byte global function at 0x4009b0
(): _init: global function at 0x4008d8
(): stderr: 8-byte global variable at 0x602088
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097da0
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): __asprintf: global function at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): free: global function at 0x7fc652097000
...
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): dlvsym: 118-byte weak function at 0x7fc6520981f0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cf14a0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc65208c740
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __libc_enable_secure: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __tls_get_addr: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global_ro: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_find_dso_for_object: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_starting_up: weak at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_argv: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): putwchar: 292-byte global function at 0x7fc651d4a210
...
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): vwarn: 224-byte global function at 0x7fc651dc8ef0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): wcpcpy: 39-byte weak function at 0x7fc651d75900
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229bae0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_get_tls_static_info: 21-byte global function at 0x7fc6522adaa0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_PRIVATE: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.3: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.4: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): free: 42-byte weak function at 0x7fc6522b2c40
...
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): malloc: 13-byte weak function at 0x7fc6522b2bf0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_allocate_tls_init: 557-byte global function at 0x7fc6522adc00
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _rtld_global_ro: 304-byte global variable at 0x7fc6524bdcc0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): __libc_enable_secure: 4-byte global variable at 0x7fc6524bde68
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_rtld_di_serinfo: 1620-byte global function at 0x7fc6522a4710
I used ... to mark where I removed lots of lines.
Questions?
To get a list of exported symbols from a shared library (a .so) under Linux, there are two ways: the easy one and a slightly harder one.
The easy one is to use the command-line tools that are already available, such as objdump (part of GNU binutils):
$ objdump -T /usr/lib/libid3tag.so.0
00009c15 g DF .text 0000012e Base id3_tag_findframe
00003fac g DF .text 00000053 Base id3_ucs4_utf16duplicate
00008288 g DF .text 000001f2 Base id3_frame_new
00007b73 g DF .text 000003c5 Base id3_compat_fixup
...
The slightly harder way is to use libelf and write a C/C++ program to list the symbols yourself. Have a look at the elfutils package, which is built from the same source as libelf. It contains a program called eu-readelf (the elfutils version of readelf, not to be confused with the binutils readelf). eu-readelf -s $LIB lists the symbols using libelf, so you should be able to use its source as a starting point.
