get size of target array in systemtap - arrays

In an answer on a sister site, I'm trying to dump information from the Linux kernel array unix_socket_table@net/unix/af_unix.c, which is defined as:
struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
For the moment, I'm hard-coding the size of the array in my stp script:
for (i = 0; i < 512; i++)
How could I avoid that? That information (the size of the array) is stored in the debug information. gdb can tell me it:
$ gdb --batch --ex 'whatis unix_socket_table' "/usr/lib/debug/boot/vmlinux-$(uname -r)"
type = struct hlist_head [512]
$ gdb --batch --ex 'p sizeof(unix_socket_table)/sizeof(*unix_socket_table)' "/usr/lib/debug/boot/vmlinux-$(uname -r)"
$1 = 512
But how would I do it in systemtap? AFAICT, systemtap has no sizeof() operator.

If it were a type, the @cast operator could be used:
size=&@cast(0,"$TYPENAME")[1]
but alas, unix_socket_table isn't a type. So, plan B, use symdata on the variable (in scope of any old kernel function in the vicinity).
probe begin /* kernel.function("*@net/unix/af_unix.c") */ {
println(symdata(& @var("unix_socket_table")))
exit()
}
results here:
unix_socket_table+0x0/0x1000 [kernel]
The second hex number is the symbol size, as computed from the ELF symbol tables at script processing time, equivalent to the 4096 figure here:
% readelf -s /usr/lib/debug/lib/modules/`uname -r`/vmlinux | grep unix_socket_table
71901: ffffffff82023dc0 4096 OBJECT GLOBAL DEFAULT 28 unix_socket_table
You can get the number with for instance:
probe begin {
tokenize(symdata(@var("unix_socket_table@net/unix/af_unix.c")),"/");
printf("%d\n", strtol(tokenize("",""), 16));
exit()
}

Many thanks to @fche for pointing me in the right direction. As he says, systemtap's symdata() function can be used to retrieve symbol information at a given address, including the size. So we can write our own sizeof() function that parses it to extract the size:
function sizeof(address:long) {
tokenize(symdata(address), "/");
return strtol(tokenize("",""),16);
}
If we look at the definition of that symdata() function, we see that it is itself a systemtap function that makes use of the _stp_snprint_addr() C function, itself calling _stp_kallsyms_lookup() to retrieve data. That means we can also define our own sizeof() using _stp_kallsyms_lookup() directly:
function sizeof:long (addr:long) %{ /* pure */ /* pragma:symbols */
STAP_RETVALUE = -1;
_stp_kallsyms_lookup(STAP_ARG_addr, (unsigned long*)&(STAP_RETVALUE), NULL, NULL, NULL);
%}
(note that we need -g (guru) here as we're using embedded C).
Now, to get the array size, we need the size of the elements of the arrays. One approach can be to use the address offset between 2 elements of the array. So we could define our array_size() function as:
function array_size(first:long, second:long) {
return sizeof(first) / (second - first);
}
(where sizeof() is one or the other of the functions defined above).
And call it as:
probe begin {
printf("%d\n", array_size(
&@var("unix_socket_table@net/unix/af_unix.c")[0],
&@var("unix_socket_table@net/unix/af_unix.c")[1]));
exit();
}
Which gives us 512 as expected.
For sizeof(), another approach could be to use the C sizeof() operator:
$ sudo stap -ge '
%{ #include <net/af_unix.h> %}
probe begin {
printf("%d\n", %{ sizeof(unix_socket_table)/sizeof(unix_socket_table[0]) %} );
exit();
}'
512
(also needs -g) but then the information is retrieved from the kernel source code (header files), not debug information, so while that would work for kernel arrays that are defined in header files, that approach won't necessarily work for all arrays.


Function address in a Position Independent Executable

I created a small unit test library in C.
Its main feature is the fact that you don't need to register your test functions, they are identified as test functions because they have a predefined prefix (test_).
For example, if you want to create a test function, you can write something like this:
int test_abc(void *t)
{
...
}
Yes, just like in Go.
To find the test functions, the runner:
takes the name of the executable from argv[0];
parses the ELF sections to find the symbol table;
from the symbol table, takes all the functions named test_*;
treats the addresses from the symbol table as function pointers;
invokes the test functions.
For PIE binaries, there is one additional step. To find the load address of the test functions, I assume there is a common offset that applies to all functions. To figure out the offset, I subtract the address of main read from the symbol table from the runtime address of main (obtained as a function pointer).
All the things described above are working fine: https://github.com/rodrigo-dc/testprefix
However, as far as I understood, function pointer arithmetic is not allowed by the C99 standard.
Given that I have the address from the symbol table - Is there a reliable way to get the runtime address of functions (in case of PIE binaries)?
I was hoping for some linker variable, some base address, or anything like that.
Is there a reliable way to get the runtime address of functions (in case of PIE binaries)?
Yes: see this answer, and also the comment about using dladdr().
P.S. Note that taking address of main in C++ is not allowed.
Because you have an ELF executable, this probably precludes "funny" architectures (e.g. Intel 8051, PIC, etc.) that might have segmented or non-linear, non-contiguous address spaces.
So, you [probably] can use the method you've described with main to get the actual address. You just need to convert to/from either char * or uintptr_t types so you are using byte offsets/differences.
But you can also create a unified table of pointers to the various functions by creating descriptor structs that are placed in a special linker section of your choosing, using (e.g.) __attribute__((section("mysection")))
Here is some code that shows what I mean:
#include <stdio.h>
typedef struct {
int (*test_func)(void *); // pointer to test function
const char *test_name; // name of the test
int test_retval; // test return value
// more data ...
int test_xtra;
} testctl_t;
// define a struct instance for a given test
#define ATTACH_TEST(_func) \
testctl_t _func##_ctl __attribute__((section("testctl"))) = { \
.test_func = _func, \
.test_name = #_func \
}
// advance to next struct (must be 16 byte aligned)
#define TESTNEXT(_test) \
(testctl_t *) (((char *) _test) + asiz)
int
test_abc(void *t)
{
printf("test_abc: hello\n");
return 1;
}
ATTACH_TEST(test_abc);
int
test_def(void *t)
{
printf("test_def: hello\n");
return 2;
}
ATTACH_TEST(test_def);
int
main(void)
{
// these are special symbols defined by the linker for our special linker
// section that denote the start/end of the section (similar to
// _etext/_edata)
extern testctl_t __start_testctl;
extern testctl_t __stop_testctl;
size_t rsiz = sizeof(testctl_t);
size_t asiz;
testctl_t *test;
// align the size to a 16 byte boundary
asiz = rsiz;
asiz += 15;
asiz /= 16;
asiz *= 16;
// show the struct sizes
printf("main: sizeof(testctl_t)=%zx/%zx\n",rsiz,asiz);
// section start and stop symbol addresses
printf("main: start=%p stop=%p\n",&__start_testctl,&__stop_testctl);
// cross check of expected pointer values
printf("main: test_abc=%p test_abc_ctl=%p\n",test_abc,&test_abc_ctl);
printf("main: test_def=%p test_def_ctl=%p\n",test_def,&test_def_ctl);
for (test = &__start_testctl; test < &__stop_testctl;
test = TESTNEXT(test)) {
printf("\n");
// show the address of our test descriptor struct and the pointer to
// the function
printf("main: test=%p test_func=%p\n",test,test->test_func);
printf("main: calling %s ...\n",test->test_name);
test->test_retval = test->test_func(test);
printf("main: return is %d\n",test->test_retval);
}
return 0;
}
Here is the program output:
main: sizeof(testctl_t)=18/20
main: start=0x404040 stop=0x404078
main: test_abc=0x401146 test_abc_ctl=0x404040
main: test_def=0x401163 test_def_ctl=0x404060
main: test=0x404040 test_func=0x401146
main: calling test_abc ...
test_abc: hello
main: return is 1
main: test=0x404060 test_func=0x401163
main: calling test_def ...
test_def: hello
main: return is 2

How do I print IP addresses with bpf_trace_printk()?

I want to print the IP addresses of packets parsed by an XDP program I am testing with. I am using bpf_trace_printk() to print details about packets parsed by my program.
How can I print IP addresses with bpf_trace_printk()?
I tried using this suggestion to print the IP, but I get this error when trying to use bpf_trace_printk()
/virtual/main.c:99:52: warning: cannot use more than 3 conversion specifiers
bpf_trace_printk("\n- src_ip: %d.%d.%d.%d\n", src_ipaddr[3],src_ipaddr[2],src_ipaddr[1],src_ipaddr[0]);
^
6 warnings generated.
error: /virtual/main.c:111:59: in function filter i32 (%struct.xdp_md*): too many args to 0x5b6de28: i64 = Constant<6>
It's not clear to me why I am getting this error.
bpf_trace_printk is meant for debugging only. It will print a large warning in your system logs when you use it. If you're at the stage where you want to pretty-print IP addresses, then you're probably not debugging anymore.
The proper alternative is to use the bpf_perf_event_output BPF helper. See https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md#lesson-7-hello_perf_outputpy for an example with bcc. That will allow you to send arbitrary data to userspace, where you can pretty-print the IP addresses with Python.
The Linux kernel provides a BPF helper, bpf_trace_printk(), with the following definition:
long bpf_trace_printk(const char *fmt, __u32 fmt_size, ...);
So you need to pass the size of the format string before your arguments.
A hard limitation is that bpf_trace_printk() accepts at most 5 arguments in total. Two of them are fmt and fmt_size, which leaves only 3 for values, so a format string with 4 conversion specifiers is rejected. This is often quite limiting, and you might need several bpf_trace_printk() invocations to log all the data:
for (int i = 0; i < 4; i++) {
char fmt[] = "IP section %d [%d]\n";
bpf_trace_printk(fmt, sizeof(fmt), i, src_ipaddr[i]);
}
or use @craig-estey's method:
bpf_trace_printk("\n- src_ip:");
for (int idx = 3; idx >= 0; --idx)
bpf_trace_printk("%c%d",(idx == 0) ? ' ' : '.',src_ipaddr[idx]);
bpf_trace_printk("\n");
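I'm not sure, but you might be able to use sprintf (keep in mind that C library functions are generally not available inside BPF programs, so this may fail to compile):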
char part1Ip[32] = {0};
char part2Ip[32] = {0};
char wholeIp[32] = {0};
sprintf(part1Ip, "%d.%d", src_ipaddr[3],src_ipaddr[2]);
sprintf(part2Ip, "%d.%d", src_ipaddr[1],src_ipaddr[0]);
sprintf(wholeIp, "%s.%s", part1Ip,part2Ip);
bpf_trace_printk("\n- src_ip: %s\n", wholeIp);
For more information see BPF tips & tricks.

How to share variables between two functions without declaring it a global variable in C language

I've been reading through a lot of answers, and there are a lot of opinions about this but I wasn't able to find a code that answers my question (I found a lot of code that answers "how to share variables by declaring")
Here's the situation:
Working with embedded systems
Using IAR workbench systems
STM32F4xx HAL drivers
Declaring global variables is not an option (Edit: Something to do with keeping the memory size small, so local variables disappear at the end of scope but the global variable stay around. The local variables were sent out as outputs, so we discard them right away as we don't need them)
C language
in case this is important: 2 .c files, and 1 .h is included in both
Now that's out of the way, let me write an example.
file1.c - Monitoring
void function(){
uint8_t varFlag[10]; // 10 devices
for (uint8_t i = 0; i < 10; i++)
{
while (timeout <= 0){
varFlag[i] = 1;
// wait for response. We'll know by the ack() function
// if response back from RX,
// then varFlag[i] = 0;
}
}
}
file2.c - RX side
// listening... once indicated, this function is called
// ack is not called in function(), it is called when
// there's notification that there is a received message
// (otherwise I would be able to use a pointer to change
// the value of varFlag[]
void ack(uint8_t indexDevice)
{
// indexDevice = which device was acknowledged? we have 10 devices
// goal here is to somehow do varFlag[indexDevice] = 0
// where varFlag[] is declared in the function()
}
You share values or data, not variables. Stricto sensu, variables do not exist at runtime; only the compiler knows them (at most, with -g, it might put some metadata such as offset & type of locals in the debugging section of the executable, which is usually stripped in production code). The linker symbol table (for global variables) can be, and often is, stripped in an embedded released ELF binary. At runtime you have some data segment, and probably a call stack made of call frames (which hold some local variables, i.e. their values, in some slots). At runtime only locations are relevant.
(some embedded processors have severe restrictions on their call stack; other have limited RAM, or scratchpad memory; so it would be helpful to know what actual processor & ISA you are targeting, and have an idea of how much RAM you have)
So have some global variables keeping these shared values (perhaps indirectly thru some pointers and data structures), or pass these values (perhaps indirectly, likewise...) thru arguments.
So if you want to share the ten bytes varFlag[10] array:
it looks like you don't want to declare uint8_t varFlag[10]; as a global (or static) variable. Are you sure you really should not (these ten bytes have to sit somewhere, and they do consume some RAM anyway, perhaps in your call stack....)?
pass the varFlag (array, decayed to pointer when passed as argument) as an argument, so perhaps declare:
void ack(uint8_t indexDevice, uint8_t*flags);
and call ack(3,varFlag) from function...
or declare a global pointer:
uint8_t*globflags;
and set it (using globflags = varFlag;) at the start of the function declaring varFlag as a local variable, and clear it (using globflags = NULL;) at the end of that function.
I would suggest that you look at the assembler code produced by your compiler (with GCC you might compile with gcc -S -Os -fverbose-asm -fstack-usage ....). I also strongly suggest that you get your code reviewed by a colleague...
PS. Perhaps you should use GCC or Clang/LLVM as a cross-compiler, and perhaps your IAR is actually using such a compiler...
Your argument for not using global variables:
Something to do with keeping the memory size small, so local variables disappear at the end of scope but the global variable stay around. The local variables were sent out as outputs, so we discard them right away as we don't need them
confuses lifetime with scope. Variables with static lifetime occupy memory permanently regardless of scope (or visibility). A variable with global scope happens to also be statically allocated, but then so is any other static variable.
In order to share a variable across contexts it must necessarily be static, so there is no memory saving by avoiding global variables. There are however plenty of other stronger arguments for avoiding global variables and you should read A Pox on Globals by Jack Ganssle.
C supports three levels of scope:
function (inside a function)
translation-unit (static linkage, outside a function)
global (external linkage)
The second of these allows a variable to be directly visible amongst functions in the same source file, while external linkage allows direct visibility between multiple source files. However, you want to avoid direct access in most cases, since that is the root of the fundamental problem with global variables. You can do this using accessor functions; to use your example, you might add a file3.c containing:
#include "file3.h"
static uint8_t varFlag[10];
void setFlag( size_t n )
{
if( n < sizeof(varFlag) )
{
varFlag[n] = 1 ;
}
}
void clrFlag( size_t n )
{
if( n < sizeof(varFlag) )
{
varFlag[n] = 0 ;
}
}
uint8_t getFlag( size_t n )
{
// out-of-range indices read as 0 instead of accessing past the array
return ( n < sizeof(varFlag) && varFlag[n] != 0 ) ? 1 : 0 ;
}
With an associated header file3.h
#if !defined FILE3_INCLUDE
#define FILE3_INCLUDE
#include <stddef.h> // size_t
#include <stdint.h> // uint8_t
void setFlag( size_t n ) ;
void clrFlag( size_t n ) ;
uint8_t getFlag( size_t n ) ;
#endif
which file1.c and file2.c include so they can access varFlag[] via the accessor functions. The benefits include:
varFlag[] is not directly accessible
the functions can enforce valid values
in a debugger you can set a breakpoint to catch specifically set, clear or read access from anywhere in the code.
the internal data representation is hidden
Critically, avoiding a global variable does not save you memory: the data is still statically allocated. You cannot get something for nothing; varFlag[] has to exist, even if it is not visible. That said, the last point about internal representation does provide a potential for storage efficiency, because you could change your flag representation from uint8_t to single bit-flags without having to change the interface to the data or the accessing code:
#include <limits.h>
#include "file3.h"
static uint16_t varFlags ;
void setFlag( size_t n )
{
if( n < sizeof(varFlags) * CHAR_BIT )
{
varFlags |= 0x0001 << n ;
}
}
void clrFlag( size_t n )
{
if( n < sizeof(varFlags) * CHAR_BIT )
{
varFlags &= ~(0x0001 << n) ;
}
}
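uint8_t getFlag( size_t n )
{
// out-of-range bit numbers read as 0 instead of shifting past the word
return ( n < sizeof(varFlags) * CHAR_BIT && (varFlags & (0x0001 << n)) != 0 ) ? 1 : 0 ;
}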
There are further opportunities to produce robust code; for example, you might make only the read accessor (getter) publicly visible and hide the write accessors so that all but one translation unit has read-only access.
Put the functions into a separate translation unit and use a static variable:
static type var_to_share = ...;
void function() {
...
}
void ack() {
...
}
Note that I said translation unit, not file. You can do some #include magic (in the cleanest way possible) to keep both function definitions apart.
Unfortunately, you can't in C.
The only way to do such a thing is with assembly.

Strange load instructions produced by mipsel-gcc when compiling glibc

I'm trying to get a small hello-world MIPS program running in the gem5 simulator. The program was compiled with gcc 4.9.2 and glibc 2.19 (built by crosstool-ng) and runs well in qemu, but it crashes with a page fault (trying to access address 0) in gem5.
Code is rather simple:
#include <stdio.h>
int main()
{
printf("hello, world\n");
return 0;
}
file ./test result:
./test: ELF 32-bit LSB executable, MIPS, MIPS-I version 1, statically
linked, for GNU/Linux 3.15.4, not stripped
After some debugging with gdb, I figured out that the page fault is triggered by _dl_setup_stack_chk_guard function in glibc. It accepts a void pointer called _dl_random passed by __libc_start_main function, which happens to be NULL. However, as far as I know, these functions never dereference the pointer, but instructions were generated to load values from the memory _dl_random pointer points to. Some code pieces might help understanding:
in function __libc_start_main (macro THREAD_SET_STACK_GUARD is not set):
/* Initialize the thread library at least a bit since the libgcc
functions are using thread functions if these are available and
we need to setup errno. */
__pthread_initialize_minimal ();
/* Set up the stack checker's canary. */
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
__stack_chk_guard = stack_chk_guard;
# endif
in function _dl_setup_stack_chk_guard (always inlined):
static inline uintptr_t __attribute__ ((always_inline))
_dl_setup_stack_chk_guard (void *dl_random)
{
union
{
uintptr_t num;
unsigned char bytes[sizeof (uintptr_t)];
} ret = { 0 };
if (dl_random == NULL)
{
ret.bytes[sizeof (ret) - 1] = 255;
ret.bytes[sizeof (ret) - 2] = '\n';
}
else
{
memcpy (ret.bytes, dl_random, sizeof (ret));
#if BYTE_ORDER == LITTLE_ENDIAN
ret.num &= ~(uintptr_t) 0xff;
#elif BYTE_ORDER == BIG_ENDIAN
ret.num &= ~((uintptr_t) 0xff << (8 * (sizeof (ret) - 1)));
#else
# error "BYTE_ORDER unknown"
#endif
}
return ret.num;
}
disassembly code:
0x00400ea4 <+228>: jal 0x4014b4 <__pthread_initialize_minimal>
0x00400ea8 <+232>: nop
0x00400eac <+236>: lui v0,0x4a
0x00400eb0 <+240>: lw v0,6232(v0)
0x00400eb4 <+244>: li a0,-256
0x00400eb8 <+248>: lwl v1,3(v0)
0x00400ebc <+252>: lwr v1,0(v0)
0x00400ec0 <+256>: addiu v0,v0,4
0x00400ec4 <+260>: and v1,v1,a0
0x00400ec8 <+264>: lui a0,0x4a
0x00400ecc <+268>: sw v1,6228(a0)
0x4a1858 (0x4a0000 + 6232) is the address of _dl_random
0x4a1854 (0x4a0000 + 6228) is the address of __stack_chk_guard
The page fault occurs at 0x00400eb8. I don't quite understand how the instructions at 0x00400eb8 and 0x00400ebc are generated. Could someone shed some light on it please? Thanks.
Here is how I found the root of this problem, along with my suggestion for a solution.
I think it is helpful to dive into the glibc source code to see what really happens. Starting from either _dl_random or __libc_start_main works.
As the value of _dl_random is unexpectedly NULL, we need to find how this variable is initialized and where it is assigned. With the help of code analysis tools, we can find that _dl_random in glibc is only assigned a meaningful value in the function _dl_aux_init, and this function is called by __libc_start_main.
_dl_aux_init iterates over its parameter, auxvec, and acts according to auxvec[i].a_type. AT_RANDOM is the case in which _dl_random is assigned. So the problem is that there is no AT_RANDOM element, and _dl_random is therefore never assigned.
As the program runs well in user-mode qemu, the root of this problem resides in the system environment provider, here gem5, which is responsible for constructing auxvec. With that keyword, we can find that the auxv is constructed in gem5/src/arch/<arch-name>/process.cc.
The current auxv for MIPS is constructed as below:
// Set the system page size
auxv.push_back(auxv_t(M5_AT_PAGESZ, MipsISA::PageBytes));
// Set the frequency at which time() increments
auxv.push_back(auxv_t(M5_AT_CLKTCK, 100));
// For statically linked executables, this is the virtual
// address of the program header tables if they appear in the
// executable image.
auxv.push_back(auxv_t(M5_AT_PHDR, elfObject->programHeaderTable()));
DPRINTF(Loader, "auxv at PHDR %08p\n", elfObject->programHeaderTable());
// This is the size of a program header entry from the elf file.
auxv.push_back(auxv_t(M5_AT_PHENT, elfObject->programHeaderSize()));
// This is the number of program headers from the original elf file.
auxv.push_back(auxv_t(M5_AT_PHNUM, elfObject->programHeaderCount()));
//The entry point to the program
auxv.push_back(auxv_t(M5_AT_ENTRY, objFile->entryPoint()));
//Different user and group IDs
auxv.push_back(auxv_t(M5_AT_UID, uid()));
auxv.push_back(auxv_t(M5_AT_EUID, euid()));
auxv.push_back(auxv_t(M5_AT_GID, gid()));
auxv.push_back(auxv_t(M5_AT_EGID, egid()));
Now we know what to do. We just need to provide an accessible address to _dl_random through an auxv entry tagged M5_AT_RANDOM. gem5's ARM arch implements this already (code). Maybe we can take it as an example.

How to view Linux memory map info in C?

I'm dynamically loading some Linux libraries in C.
I can get the start addresses of the libraries using dlinfo (see 1).
I can't find any information to get the size of a library, however.
The only thing that I've found is that one must read the /proc/[pid]/maps file and parse it for the relevant information (see 2).
Is there a more elegant method?
(This answer is LINUX/GLIBC specific)
According to http://s.eresi-project.org/inc/articles/elf-rtld.txt
there are link_map *map; map->l_map_start & map->l_map_end
/*
** Start and finish of memory map for this object.
** l_map_start need not be the same as l_addr.
*/
ElfW(Addr) l_map_start, l_map_end;
This is not entirely exact, as noted here: http://www.cygwin.com/ml/libc-hacker/2007-06/msg00014.html
Some libraries are not contiguous in memory; the linked message has some examples. For instance, this is the rtld-internal function that detects whether a given address lies inside a library's address space, based on link_map and working directly with the ELF segments:
/* Return non-zero if ADDR lies within one of L's segments. */
int
internal_function
_dl_addr_inside_object (struct link_map *l, const ElfW(Addr) addr)
{
int n = l->l_phnum;
const ElfW(Addr) reladdr = addr - l->l_addr;
while (--n >= 0)
if (l->l_phdr[n].p_type == PT_LOAD
&& reladdr - l->l_phdr[n].p_vaddr >= 0
&& reladdr - l->l_phdr[n].p_vaddr < l->l_phdr[n].p_memsz)
return 1;
return 0;
}
That function illustrates the other alternative, which is to find the program headers or section headers of the loaded ELF object (link_map contains pointers to such information).
The easiest option is to call stat on map->l_name to read the file size from disk (inexact, notably when the library has a large bss section).
Parsing /proc/self/maps (or perhaps popen-ing a pmap command) still seems the easiest thing to me. There is also the dladdr function (provided you have some address to start with).
