I am currently trying to understand the translation of some simple C-Code into assembly by the clang compiler. However the following behaviour is confusing to me:
int a(void);
int b(void);
int a() {
return 1;
}
int b() {
return 2;
}
int c(){
return 3;
}
int main(int argc, char **argv) {
int (*procs[])(void) = {a,b};
int (*procs2[])(void) = {c,b};
...
gets translated to:
I figured out that the values at the addresses 0x4006XX hold the respective addresses of functions a, b and c. However, I wonder why this extra step of using the 0x4006XX addresses is necessary (why not just use the literal address?). I am even more curious as to why it uses two different addresses for the address of b.
I know this is probably an obscure question but any help is appreciated :)
It appears that your compiler generates position independent code. Position independent code can be loaded to an arbitrary address at runtime, making the addresses of functions and static variables unpredictable at compile time. The one thing that is predictable is the distance from the variable or function to the current instruction. The compiler uses the lea instruction to add the content of rip, the instruction pointer, to this distance to get the actual address. That's what you are seeing.
I have a function called:
doSomething()
The way I understand this is: In assembly, this will jump to the function location and store the function's return address in some register so that after the function is done, the program counter can return to the main program.
How do I get this function's return address and put this inside a variable so that I can use it?
__builtin_return_address doesn't seem to be working. When I translate it to assembly, it doesn't know what to do. I believe I can't use GCC. I'm not even using printf since it's not a part of the standard C library.
Currently, I'm trying to guess what its return address is by putting various numbers inside this variable.
With gcc you can use __builtin_return_address, see https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html
Another user here posted about using gcc, which I agree with. Storing the result in a variable is simply a matter of using a void pointer, like this:
void *addr = __builtin_extract_return_addr (__builtin_return_address (0));
C and C++ support function pointers:
C Example:
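#include <stdio.h>
/* Ordinary function whose address will be stored in a pointer. */
int sum (int num1, int num2) {
    return num1 + num2;
}
/* f2p can point to any function taking two ints and returning int. */
int (*f2p) (int, int);
int main (int argc, char *argv[]) {
    f2p = sum;
    printf ("sum=%d\n", f2p(10, 13));
    return 0;
}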
In C++, you can also use std::function.
Here is a good tutorial:
https://www.learncpp.com/cpp-tutorial/78-function-pointers/
You do NOT need to worry about the binary memory address of the function in order to use function pointers. If, for whatever reason, you want it, just cast the pointer (e.g. "f2p" above) to an integer type.
I'm suggesting "function pointers" to parameterize the callee's address.
As Persixty suggested, you can use setjmp()/longjmp() to parameterize the caller's address.
Of course, you can also use inline assembly to accomplish either.
Q: Does this help answer your question?
I want to know the length of C function (written by me) at runtime. Any method to get it? It seems sizeof doesn't work here.
There is a way to determine the size of a function. The command is:
nm -S <object_file_name>
This prints the size of each function inside the object file. Consult the GNU manual page with 'man nm' for more information.
You can get this information from the linker if you are using a custom linker script. Add a linker section just for the given function, with linker symbols on either side:
mysec_start = .;
*(.mysection)
mysec_end = .;
Then you can specifically assign the function to that section. The difference between the symbols is the length of the function:
#include <stdio.h>
int i;
__attribute__((noinline, section(".mysection"))) void test_func (void)
{
i++;
}
int main (void)
{
extern unsigned char mysec_start[];
extern unsigned char mysec_end[];
printf ("Func len: %lu\n", (unsigned long) (mysec_end - mysec_start));
test_func ();
return 0;
}
This example is for GCC, but any C toolchain should have a way to specify which section to assign a function to. I would check the results against the assembly listing to verify that it's working the way you want it to.
There is no way in standard C to get the amount of memory occupied by a function.
I have just come up with a solution for the exact same problem, but the code I have written is platform dependent.
The idea is to put known opcodes at the end of the function and search for them from the start, counting the bytes skipped along the way.
Here is the Medium post in which I explain it with some code:
https://medium.com/#gurhanpolat/calculate-c-function-size-x64-x86-c1f49921aa1a
Executables (at least ones which have debug info stripped) don't store function lengths in any way, so a program cannot parse this information about itself at runtime. If you have to manipulate functions, you should do something with your objects in the linking phase or access them as files from your executable. For example, you may tell the linker to link symbol tables as an ordinary data section into the executable, assign them some name, and parse them when the program runs. But remember, this would be specific to your linker and object format.
Also note that function layout is platform specific, and there are some things that make the term "function length" unclear:
Functions may store the constants they use in the code section directly after the function code and access them using PC-relative addressing (ARM compilers do this).
Functions may have "prologs" and "epilogs" which may be common to several functions and thus lie outside the main body.
Function code may inline other function code.
Each of these may or may not count toward the function length.
Also, a function may be completely inlined by the compiler, so it loses its body.
A fully worked out solution without linker or dirty platform dependent tricks:
#include <stdio.h>
int i;
__attribute__((noinline, section("mysec"))) void test_func (void)
{
i++;
}
int main (void)
{
extern char __start_mysec[];
extern char __stop_mysec[];
printf ("Func len: %lu\n", (unsigned long) (__stop_mysec - __start_mysec));
test_func ();
return 0;
}
That's what you get when you read FazJaxton's answer with jakobbotsch's comment
In e.g. Codewarrior, you can place labels around a function, e.g.
label1:
void someFunc()
{
/* code goes here. */
}
label2:
and then calculate the size like (int)(label2-label1), but this is obviously very compiler dependent. Depending on your system and compiler, you may have to hack linker scripts, etc.
The start of the function is the function pointer, you already know that.
The problem is to find the end, but that can be done this way:
#include <time.h>
int foo(void)
{
int i = 0;
++i + time(0); // time(0) is to prevent optimizer from just doing: return 1;
return i;
}
int main(int argc, char *argv[])
{
return (int)((long)main - (long)foo);
}
It works here because the program has ONLY TWO functions, so if the code is re-ordered (main implemented before foo) you will get an irrelevant (negative) result, letting you know that it did not work this way but that it WOULD work if you move the foo() code into main() - just subtract the main() size you got with the initial negative reply.
If the result is positive, then it will be correct -if no padding is done (yes, some compilers happily inflate the code, either for alignment or for other, less obvious reasons).
The ending (int)(long) cast is for portability between 32-bit and 64-bit code (function pointers will be longer on a 64-bit platform).
This is fairly portable and should work reasonably well.
There's no facility defined within the C language itself to return the length of a function; there are simply too many variables involved (compiler, target instruction set, object file/executable file format, optimization settings, debug settings, etc.). The very same source code may result in functions of different sizes for different systems.
C simply doesn't provide any sort of reflection capability to support this kind of information (although individual compilers may supply extensions, such as the Codewarrior example cited by sskuce). If you need to know how many bytes your function takes up in memory, then you'll have to examine the generated object or executable file directly.
sizeof func won't work: standard C does not allow sizeof to be applied to an expression of function type (it is a constraint violation; GCC accepts it as an extension and yields 1), and sizeof &func gives you the size of a pointer value, not of the function itself.
Just subtract the address of your function from the address of the next function. But note it may not work on your system, so use it only if you are 100% sure:
#include <stdint.h>
int function() {
return 0;
}
int function_end() {
return 0;
}
int main(void) {
intptr_t size = (intptr_t) function_end - (intptr_t) function;
}
There is no standard way of doing it in either C or C++. There might naturally exist implementation/platform-specific ways of doing it, but I am not aware of any.
size_t try_get_func_size_x86(void* pfn, bool check_prev_opcode = true, size_t max_opcodes_runout = 10000)
{
const unsigned char* op = (const unsigned char*)pfn;
for(size_t i = 0; i < max_opcodes_runout; i++, op++)
{
size_t sz_at = (size_t)(op - (const unsigned char*)pfn) + 1;
switch(*op)
{
case 0xC3: // ret Opcode
case 0xC2: // ret x Opcode
if(!check_prev_opcode)
return sz_at;
switch(*(op-1)) // Checking Previous Opcode
{
case 0x5D: // pop ebp
case 0x5B: // pop ebx
case 0x5E: // pop esi
case 0x5F: // pop edi
case 0xC9: // leave
return sz_at;
}
}
}
return 0;
}
You can find the length of your C function by subtracting the addresses of functions.
Let me provide you an example
#include <stdio.h>
int function1(void)
{
    return 0;
}
int function2(void)
{
    int a = 1, b = 2; /* extra locals and a call make this function larger */
    printf("This function takes more memory than function1\n");
    return a + b;
}
int main(void)
{
    /* %p expects void *, so cast the function pointers explicitly */
    printf("Address of function1: %p\n", (void *)function1);
    printf("Address of function2: %p\n", (void *)function2);
    printf("Address of main: %p\n", (void *)main);
    return 0;
}
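Hope you get your answer after compiling it. After compiling and running it, you will be able to see the
difference in size between function1 and function2.
Note: the gap between two consecutive functions is often a multiple of 16 bytes (for example on x86-64, where compilers commonly align functions to 16-byte boundaries), so the difference may include padding.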
I'd like to ask if it's possible to make a function pointer that can later be assigned any function I have.
typedef struct
{
char *name ;
void (*func0)(void) ;
}option;
int test(int i)
{
return i;
}
How to cast either the option parameter or the function so later I'd be able to call the option parameter and use it as a function?
I tried:
op.func0= test ;
or:
(int)op.func0= test ;
failed.
In C you can store your function as
void * (*func)();
func = test;
But when you invoke it you have to cast it to the appropriate type
int ret = ((int (*)())func)(a);
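Converting a function pointer to another function pointer type and back is well defined in standard C. What is undefined is calling the function through a pointer whose type is not compatible with the function's definition, so this is only safe if you always cast back to the right type before the call, as above.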
Edit:
If you wish to avoid the typecast warning then make the assignment as
func = (void * (*)())test;
With this typecast it compiles with gcc -pedantic
It is okay to cast your function to a different function pointer type, you just have to make sure to cast it back before using it. Here is an example:
#include <stdio.h>
typedef struct
{
char *name ;
void (*func0)(void) ;
} option;
int test(int i)
{
return i;
}
int main(int argc,char **argv)
{
option a;
int i;
a.func0 = (void(*)(void))test;
i = ((int(*)(int))a.func0)(5);
printf("%d\n",i);
return 0;
}
It outputs 5.
If you have a limited (i.e., practical) number of call signatures, you may be able to use a union of pointers of different signatures:
typedef union call_sigs_union
{
void *void_void; // pure void pointer
int (*int_sig_void)(void); // returns integer with no arguments
int (*int_sig_int)(int); // returns integer with single int arg
. . .
} sigs_t;
The advantage of this is that you get better warnings out of the compiler when you are doing something wrong than if you cast everything.
Caveat: this is not guaranteed across many architectures as, in theory, different return types, and perhaps signatures, can have different pointer types. While I have read about a few systems this actually happens on, I have not heard about the major players in popular use having such behavior. Intel x86, Itanium, Sun SPARC, and IBM RISC are some of the well-behaved systems I have used. Nothing I have read on the ARM architecture leads me to believe these would be trouble.
The odd behavior is usually limited to special-purpose processors. Can anyone refresh my memory on systems that this would break on?
I have the following piece of C code which prints the rip register and the address of a function foo. Running the executable multiple times results in the same values of rip and &foo being printed.
#include <stdio.h>
#include <inttypes.h>
void foo(int x) {
printf("foo sees %d\n", x);
}
int main(int argc, char *argv[]) {
uint64_t ip;
asm("leaq (%%rip), %0;": "=r"(ip));
printf("rip is 0x%016" PRIx64 "\n", ip);
void (*fp)(int) = &foo;
printf("foo is at offset %p\n", fp);
(*fp)(10);
return 0;
}
Q1: Why does rip remain the same?
Q2: Will &foo remain the same, provided the binary and machine remain the same?
Q3: When can &foo change?
Background: I am trying to store the execution times of functions in a history table. I am thinking of using the function address to index into the table and calculate deviations from previous executions.
Q1:
Depends on your platform. Some platforms load your program into a virtual address space, so the exact same code will have the exact same virtual address for foo (assuming the program and the OS's loader don't change between runs, and the loader isn't one that randomizes the load address per the comments). On other platforms that do not load your executable into a virtual address space, you may or may not get the same address depending on whether other programs have executed and/or terminated between runs.
Q2:
Don't count on it. If nothing changes at all, you will have deterministic behavior (same address). But there are many, many things that can change (again, dependent on the platform).
Q3:
They can change at any time on a platform that doesn't allocate a virtual address (as other processes start/continue doing work/terminate). On a platform that does allocate a virtual address, the addresses can change if your program or related libraries change at all, if there is an OS patch that changes loader behavior, or probably due to other circumstances I'm not thinking of at the moment.
Bottom Line
Storing the address may work for your very specific case, but it's a fragile solution.
Nothing is guaranteed.
The solution is to index using the function name, not its address (The C99 standard provides the __func__ identifier). That way your index is guaranteed to remain the same across all changes in OS, compiler, options, and phase of the moon. Until you refactor the function name, of course :-)
Since you're using Linux you can use dladdr() to ask about symbols near places in memory. For example:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
void foo() {
}
int main() {
Dl_info info;
void *test = foo; // Note: not standard C
dladdr(test, &info);
printf("closest symbol: %s in %s\n", info.dli_sname, info.dli_fname);
return 0;
}
When compiled with:
gcc -Wall -Wextra test.c -ldl -rdynamic
it correctly identifies the void* as foo, which will be correct no matter where foo gets loaded.
We are currently developing an application for a msp430 MCU, and are running into some weird problems. We discovered that declaring arrays within a scope after declaration of "normal" variables sometimes causes what seems to be undefined behavior. Like this:
foo(int a, int *b);
int main(void)
{
int x = 2;
int arr[5];
foo(x, arr);
return 0;
}
foo is passed a pointer as the second variable, that sometimes does not point to the arr array. We verify this by single stepping through the program, and see that the value of the arr array-as-a-pointer variable in the main scope is not the same as the value of the b pointer variable in the foo scope. And no, this is not really reproducible, we have just observed this behavior once in a while.
This is observable even before a single line of the foo function is executed, the passed pointer parameter (b) is simply not pointing to the address that arr is.
Changing the example seems to solve the problem, like this:
foo(int a, int *b);
int main(void)
{
int arr[5];
int x = 2;
foo(x, arr);
return 0;
}
Does anybody have any input or hints as to why we experience this behavior? Or similar experiences? The MSP430 programming guide specifies that code should conform to the ANSI C89 spec, and so I was wondering if it says that arrays have to be declared before non-array variables?
Any input on this would be appreciated.
Update
@Adam Shiemke and tomlogic:
I'm wondering what C89 specifies about different ways of initializing values within declarations. Are you allowed to write something like:
int bar(void)
{
int x = 2;
int y;
foo(x);
}
And if so, what about:
int bar(int z)
{
int x = z;
int y;
foo(x);
}
Is that allowed? I assume the following must be illegal C89:
int bar(void)
{
int x = baz();
int y;
foo(x);
}
Thanks in advance.
Update 2
Problem solved. Basically we were disabling interrupts before calling the function (foo) and after declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.
If anybody is interested I can post the complete example reproducing the problem, and the fix?
Thanks for all the input on this.
That looks like a compiler bug.
If you use your first example (the problematic one) and write your function call as foo(x, &arr[0]);, do you see the same results? What about if you initialize the array like int arr[5] = {0};? Neither of these should change anything, but if they do it would hint at a compiler bug.
In your updated question:
Basically we were disabling interrupts before calling the function (foo) and after declarations of variables. We were able to reproduce the problem in a simple example, and the solution seems to be to add a _NOP() statement after the disable interrupt call.
It sounds as if the interrupt disabling intrinsic/function/macro (or however interrupts are disabled) might be causing an instruction to be 'skipped' or something. I'd investigate whether it is coded/working correctly.
You should be able to determine if it is a compiler bug based on the assembly code that is produced. Is the assembly different when you change the order of the variable declarations? If your debugger allows you, try single stepping through the assembly.
If you do find a compiler bug, also, check your optimization. I have seen bugs like this introduced by the optimizer.
Both examples look to be conforming C89 to me. There should be no observable difference in behaviour assuming that foo isn't accessing beyond the bounds of the array.
For C89, the variables need to be declared in a list at the start of the scope prior to any assignment. C99 allows you to mix assignment and declaration. So:
{
int x;
int arr[5];
x=5;
...
is legal c89 style. I'm surprised your compiler didn't throw some sort of error on that if it doesn't support c99.
Assuming the real code is much more complex, here are some things I would check; keep in mind they are guesses:
Could you be overflowing the stack on occasion? If so, could this be some artifact of "stack defense" by the compiler/uC? Does the incorrect value of &foo fall inside a predictable memory range? If so, does that range have any significance (inside the stack, etc.)?
Does the MSP430 have different ranges for RAM and ROM addressing? That is, is the address space for RAM 16-bit while the program address space is 24-bit? PICs have such an architecture, for example. If so, it would be feasible that arr is getting allocated as ROM (24-bit) while the function expects a pointer to RAM (16-bit); the code would work when arr was allocated in the first 16 bits of address space but break if it's above that range.
Maybe you have an illegal memory write at some place in your program which corrupts your stack.
Did you have a look at the disassembly?