getting address of particular instruction in a function [duplicate] - c

I want to know the length of a C function (written by me) at runtime. Is there any method to get it? It seems sizeof doesn't work here.

There is a way to determine the size of a function, although not from inside the running program. The command is:
nm -S <object_file_name>
This prints the size of each symbol, including each function, in the object file. Consult the GNU manual page ('man nm') for more information.

You can get this information from the linker if you are using a custom linker script. Add a linker section just for the given function, with linker symbols on either side:
/* inside the SECTIONS { ... } command of the linker script */
.mysection : {
    mysec_start = .;
    *(.mysection)
    mysec_end = .;
}
Then you can specifically assign the function to that section. The difference between the symbols is the length of the function:
#include <stdio.h>

int i;

/* noinline keeps a real body; section() places it in .mysection */
__attribute__((noinline, section(".mysection"))) void test_func (void)
{
    i++;
}

int main (void)
{
    /* Symbols defined by the linker script around .mysection */
    extern unsigned char mysec_start[];
    extern unsigned char mysec_end[];
    printf ("Func len: %td\n", mysec_end - mysec_start);
    test_func ();
    return 0;
}
This example is for GCC, but any C toolchain should have a way to specify which section to assign a function to. I would check the results against the assembly listing to verify that it's working the way you want it to.

There is no way in standard C to get the amount of memory occupied by a function.

I have just come up with a solution for exactly the same problem, but the code I have written is platform dependent.
The idea is to put known opcodes at the end of the function and search for them from the start, counting the bytes skipped.
Here is the Medium article in which I explain it with some code:
https://medium.com/#gurhanpolat/calculate-c-function-size-x64-x86-c1f49921aa1a

Executables (at least ones with debug info stripped) don't store function lengths in any way, so there is no way for a program to parse this information at runtime by itself. If you have to manipulate functions, you should do something with your objects at link time, or access them as files from your executable. For example, you may tell the linker to link symbol tables into the executable as an ordinary data section, give them some name, and parse them when the program runs. But remember, this would be specific to your linker and object format.
Also note that function layout is platform specific, and several things make the term "function length" unclear:
Functions may store the constants they use in the code section directly after the function code and access them using PC-relative addressing (ARM compilers do this).
Functions may have "prologs" and "epilogs" which may be shared by several functions and thus lie outside the main body.
Function code may inline the code of other functions.
Each of these may or may not count toward the function length.
A function may also be completely inlined by the compiler, so that it loses its own body.

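A fully worked-out solution without a custom linker script or dirty platform-dependent tricks (it relies on GNU ld, which automatically defines __start_SECNAME and __stop_SECNAME for any section whose name is a valid C identifier):
#include <stdio.h>

int i;

__attribute__((noinline, section("mysec"))) void test_func (void)
{
    i++;
}

int main (void)
{
    /* Defined automatically by the linker around the "mysec" section */
    extern char __start_mysec[];
    extern char __stop_mysec[];
    printf ("Func len: %td\n", __stop_mysec - __start_mysec);
    test_func ();
    return 0;
}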
That's what you get when you combine FazJaxton's answer with jakobbotsch's comment.

In CodeWarrior, for example, you can place labels around a function:
label1:
void someFunc()
{
/* code goes here. */
}
label2:
and then calculate the size as (int)(label2 - label1), but this is obviously very compiler dependent. Depending on your system and compiler, you may have to hack linker scripts, etc.

The start of the function is the function pointer, you already know that.
The problem is to find the end, but that can be done this way:
#include <time.h>
int foo(void)
{
int i = 0;
++i + time(0); // time(0) is to prevent optimizer from just doing: return 1;
return i;
}
int main(int argc, char *argv[])
{
return (int)((long)main - (long)foo);
}
It works here because the program has ONLY TWO functions. If the code is reordered (main placed before foo), you will get a meaningless (negative) result, which tells you that it did not work this way; it WOULD work, however, if you move the foo() code into main() and subtract the main() size you got from the initial negative result.
If the result is positive, it will be correct, provided no padding is inserted (yes, some compilers happily inflate the code, either for alignment or for other, less obvious reasons).
The (int)(long) casts are there for portability between 32-bit and 64-bit code (function pointers are wider on a 64-bit platform).
This is fairly portable and should work reasonably well.

There's no facility defined within the C language itself to return the length of a function; there are simply too many variables involved (compiler, target instruction set, object file/executable file format, optimization settings, debug settings, etc.). The very same source code may result in functions of different sizes for different systems.
C simply doesn't provide any sort of reflection capability to support this kind of information (although individual compilers may supply extensions, such as the Codewarrior example cited by sskuce). If you need to know how many bytes your function takes up in memory, then you'll have to examine the generated object or executable file directly.
sizeof func won't work: applying sizeof to a function designator is not allowed in standard C (GCC accepts it as an extension and yields 1), and sizeof &func only gives you the size of a pointer to the function, not the size of the function itself.

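Just subtract the address of your function from the address of the next function. But note that it may not work on your system, so use it only if you are 100% sure:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int function(void) {
    return 0;
}

/* Assumed to be placed directly after function(); compilers are not required to do this */
int function_end(void) {
    return 0;
}

int main(void) {
    /* Converting function pointers to intptr_t is implementation-defined */
    intptr_t size = (intptr_t) function_end - (intptr_t) function;
    printf("size = %" PRIdPTR "\n", size);
    return 0;
}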

There is no standard way of doing it in either C or C++. There may be implementation- or platform-specific ways of doing it, but I am not aware of any.

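#include <stdbool.h>
#include <stddef.h>

/* Heuristic for 32-bit x86: scan forward for a RET opcode, optionally
   preceded by a typical epilogue opcode. A 0xC3/0xC2 byte can also occur
   inside another instruction's operands, so the result may be wrong.
   Typical arguments: check_prev_opcode = true, max_opcodes_runout = 10000. */
size_t try_get_func_size_x86(const void *pfn, bool check_prev_opcode, size_t max_opcodes_runout)
{
    const unsigned char *start = (const unsigned char *)pfn;
    for (size_t i = 0; i < max_opcodes_runout; i++)
    {
        const unsigned char *op = start + i;
        size_t sz_at = i + 1;           /* bytes up to and including this opcode */
        if (*op != 0xC3 && *op != 0xC2) /* ret / ret imm16 */
            continue;
        if (*op == 0xC2)
            sz_at += 2;                 /* ret imm16 carries a 2-byte operand */
        if (!check_prev_opcode)
            return sz_at;
        if (i == 0)
            continue;                   /* no previous byte to check */
        switch (op[-1])                 /* check the previous opcode */
        {
        case 0x5D: /* pop ebp */
        case 0x5B: /* pop ebx */
        case 0x5E: /* pop esi */
        case 0x5F: /* pop edi */
        case 0xC9: /* leave */
            return sz_at;
        }
    }
    return 0;
}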

You can find the length of your C function by subtracting the addresses of functions.
Let me provide you with an example:
#include <stdio.h>

int function1(void)
{
    return 0;
}

int function2(void)
{
    int a, b; // just defining some variables to increase the memory size
    printf("This function would take more memory than the earlier function, i.e. function1\n");
    return 0;
}

int main(void)
{
    /* %p expects a void *; converting a function pointer to void * is a common extension (required by POSIX) */
    printf("Printing the address of function1 %p\n", (void *)function1);
    printf("Printing the address of function2 %p\n", (void *)function2);
    printf("Printing the address of main %p\n", (void *)main);
    return 0;
}
Hopefully you will get your answer after compiling and running it: you will be able to see the difference in size between function1 and function2.
Note: there is often a 16-byte difference between one function and the next.

Related

How to get the return address of a function in C?

I have a function called:
doSomething()
The way I understand this is: In assembly, this will jump to the function location and store the function's return address in some register so that after the function is done, the program counter can return to the main program.
How do I get this function's return address and put this inside a variable so that I can use it?
__builtin_return_address doesn't seem to be working. When I translate it to assembly, it doesn't know what to do. I believe I can't use GCC. I'm not even using printf, since it's not a part of the standard C library.
Currently, I'm trying to guess what its return address is by putting various numbers inside this variable.
With GCC you can use __builtin_return_address; see https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html
Another user here posted about using GCC, which I agree with; storing the address in a variable is simply a matter of using a void pointer, like this:
void *addr = __builtin_extract_return_addr (__builtin_return_address (0));
C and C++ support function pointers:
C Example:
#include <stdio.h>

int sum (int num1, int num2) {
    return num1 + num2;
}

int (*f2p) (int, int);

int main (int argc, char *argv[]) {
    f2p = sum;
    printf ("sum=%d\n", f2p(10, 13));
    return 0;
}
In C++, you'd use std::function.
Here is a good tutorial:
https://www.learncpp.com/cpp-tutorial/78-function-pointers/
You do NOT need to worry about the binary memory address of the function in order to use function pointers. If, for whatever reason, you want it, just cast the pointer (e.g. f2p above) to an integer type.
I'm suggesting "function pointers" to parameterize the callee's address.
As Persixty suggested, you can use setjmp()/longjmp() to parameterize the caller's address.
Of course, you can also use inline assembly to accomplish either.
Q: Does this help answer your question?

frama-c slicing plugin appears to discard used stack values

Problem description
I'm developing a frama-c plugin that uses the slicing plugin as a library to remove unused bits of automatically generated code. Unfortunately, the slicing plugin drops a bunch of stack values which are actually used. They are used insofar as their addresses are contained in structures that are handed off to abstract external functions.
Simple example
This is a simpler example that models the same general structure I have.
/* Abstract external function */
void some_function(int* ints[]);
int main() {
int i;
int *p = &i;
int *a[] = { &p };
some_function(a);
return 0;
}
When slicing this example with frama-c-gui -slice-calls some_function experiment_slicing.c (I haven't figured out how to see the slicing output when invoking the command line without the GUI), it drops everything but the declaration int *a[]; and the call to some_function.
Attempted fixes
I tried fixing it by adding ACSL annotations.
However, what I believed to be the sensible specification (see below) did not work:
/*@ requires \valid(ints) && \valid(ints[0]);
*/
void some_function(int* ints[]);
I then tried an assigns clause (see below), which does have the desired behaviour. However, it is not a correct specification, since the function never actually writes through the pointer but needs to read it for correct functionality. I am worried that if I move ahead with such an incorrect specification, it will lead to weird problems down the line.
/*@ requires \valid(ints) && \valid(ints[0]);
assigns *ints;
*/
void some_function(int* ints[]);
You are on the right track: it is the assigns clause that you should use here, since it indicates which parts of the memory state are concerned by a call to an undefined function. However, you need to provide a complete assigns clause, with its \from part (which indicates which memory locations are read to compute the new value of the memory locations written to).
I have added an int variable to your example, as your function isn't returning a result (void return type). For a function that is returning something, you should also have a clause assigns \result \from ...;:
int x;
/*@ assigns x \from indirect:ints[..], *(ints[..]); */
void some_function(int* ints[]);
int main() {
int i;
int*p = &i;
int *a[] = { &p };
some_function(a);
return 0;
}
The assigns clause indicates that some_function might change the value of x, and that the new value will be computed from the addresses stored in ints[..] (the indirect label tells that we're not using their value directly; this is described in more detail in section 8.2 of Eva's manual) and from their content.
Running frama-c -slice-calls some_function file.c -then-last -print (-then-last makes the following options operate on the last Frama-C project created, in this case the one resulting from the slicing, and -print prints the C code of that project on the standard output; you may also use -ocode output.c to write the pretty-printed code to output.c instead) gives the following result:
/* Generated by Frama-C */
void some_function(int **ints);
void main(void)
{
int i;
int *p = & i;
int *a[1] = {(int *)(& p)};
some_function(a);
return;
}
Note in addition that your example is not well-typed: &p is a pointer to pointer to int, and should thus be stored in an int** array, not an int* array. But I assume that this only stems from reducing your original example and does not matter much for slicing itself.

Preinitialized function pointers in compiled binary?

I am currently trying to understand the translation of some simple C-Code into assembly by the clang compiler. However the following behaviour is confusing to me:
int a(void);
int b(void);
int a() {
return 1;
}
int b() {
return 2;
}
int c(){
return 3;
}
int main(int argc, char **argv) {
int (*procs[])(void) = {a,b};
int (*procs2[])(void) = {c,b};
...
gets translated to:
I figured out that the values at the addresses 0x4006XX hold the respective addresses of functions a, b and c. However I wonder why this extra step of using the 0x4006XX addresses is necessary (why not just use the literal address?). And even more curious as to why it uses two different addresses for the address of b.
I know this is probably an obscure question but any help is appreciated :)
It appears that your compiler generates position independent code. Position independent code can be loaded to an arbitrary address at runtime, making the addresses of functions and static variables unpredictable at compile time. The one thing that is predictable is the distance from the variable or function to the current instruction. The compiler uses the lea instruction to add the content of rip, the instruction pointer, to this distance to get the actual address. That's what you are seeing.

Direct access to the function stack

I previously asked a question about C functions which take an unspecified number of parameters e.g. void foo() { /* code here */ } and which can be called with an unspecified number of arguments of unspecified type.
When I asked whether it is possible for a function like void foo() { /* code here */ } to get the parameters with which it was called e.g. foo(42, "random") somebody said that:
The only you can do is to use the calling conventions and knowledge of the architecture you are running at and get parameters directly from the stack. source
My question is:
If I have this function
void foo()
{
// get the parameters here
}
And I call it: foo("dummy1", "dummy2") is it possible to get the 2 parameters inside the foo function directly from the stack?
If yes, how? Is it possible to have access to the full stack? For example if I call a function recursively, is it possible to have access to each function state somehow?
If not, what's the point of functions with an unspecified number of parameters? Is this a bug in the C programming language? In which cases would anyone want foo("dummy1", "dummy2") to compile and run fine for a function whose header is void foo()?
Lots of 'if's:
You stick to one version of a compiler.
One set of compiler options.
Somehow manage to convince your compiler to never pass arguments in registers.
Convince your compiler not to treat two calls f(5, "foo") and f(&i, 3.14) with different arguments to the same function as an error. (This used to be a feature of, for example, the early DeSmet C compilers.)
Then the activation record of a function is predictable (i.e. you look at the generated assembly and assume it will always be the same): the return address will be in there somewhere, as will the saved bp (base pointer, if your architecture has one), and the arguments will always appear in the same order. So how would you know what actual parameters were passed? You will have to encode them (their size, offset), presumably in the first argument, sort of like what printf does.
Recursion makes no difference: each instance has its own activation record (did I say you have to convince your compiler never to optimise tail calls?), but in C, unlike in Pascal, there is no link back to the caller's activation record (i.e. its local variables), since there are no nested function declarations. Getting access to the full stack, i.e. all the activation records before the current instance, is pretty tedious, error prone and mostly of interest to writers of malicious code who would like to manipulate the return address.
So that's a lot of hassle and assumptions for essentially nothing.
Yes, you can access passed parameters directly via the stack. But no, you can't use an old-style function definition to create a function with a variable number and type of parameters. The following code shows how to access a parameter via the stack pointer. It is totally platform dependent, so I have no clue whether it is going to work on your machine, but you can get the idea:
#include <stdio.h>

long foo();

int main(void)
{
    printf("%ld\n", foo(7L)); /* 7L: an unprototyped call must pass a long */
}

long foo(x)
long x;
{
    /* GCC extension: bind sp to the rsp register (x86-64 only) */
    register void* sp asm("rsp");
    printf("rsp = %p rsp_ value = %lx\n", (void*)((char*)sp + 8), *((long*)((char*)sp + 8)));
    return *((long*)((char*)sp + 8)) + 12;
}
Get the stack pointer (the rsp register on my machine).
Add the offset of the passed parameter to rsp; you get a pointer to long x on the stack.
Dereference the pointer, add 12 (or do whatever you need) and return the value.
The offset is the issue, since it depends on the compiler, the OS, and who knows what else.
For this example I simply checked it in a debugger, but if it is really important for you, I think you can come up with a more "general" solution for your machine.
If you declare void foo() and compile as C++ (or as C23, where an empty parameter list means no parameters), you will get a compilation error for foo("dummy1", "dummy2").
You can declare a function that takes an unspecified number of arguments as follows (for example):
int func(char x,...);
As you can see, at least one argument must be specified. This is so that inside the function, you will be able to access all the arguments that follow the last specified argument.
Suppose you have the following call:
short y = 1000;
int sum = func(1,y,5000,"abc");
Here is how you can implement func and access each of the unspecified arguments:
int func(char x,...)
{
    /* Assumes every argument is pushed on the stack in 4-byte slots right after x
       (32-bit x86, cdecl). This is undefined behavior in standard C and breaks on
       ABIs that pass arguments in registers, such as x86-64. */
    short y = (short)((int*)&x+1)[0]; // y = 1000
    int z = (int )((int*)&x+2)[0]; // z = 5000
    char* s = (char*)((int*)&x+3)[0]; // s[0...2] = "abc"
    return x+y+z+s[0]; // 1+1000+5000+'a' = 6098
}
The problem here, as you can see, is that the type of each argument and the total number of arguments are unknown. So any call to func with an "inappropriate" list of arguments may (and probably will) crash or produce garbage at runtime.
Hence, typically, the first argument is a string (const char*) which indicates the type of each of the following arguments, as well as the total number of arguments. In addition, there are standard macros for extracting the unspecified arguments: va_start, va_arg and va_end.
For example, here is how you can implement a function similar in behavior to printf:
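#include <stdarg.h>
#include <stdio.h>

extern FILE* global_fp; /* log file, opened elsewhere */

void log_printf(const char* data,...)
{
    static char str[256] = {0};
    va_list args;
    va_start(args,data);
    vsnprintf(str,sizeof(str),data,args);
    va_end(args);
    /* fputs rather than fprintf(global_fp, str): a '%' in the text must not be reinterpreted */
    fputs(str, global_fp);
    fputs(str, stdout);
}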
P.S.: the example above is not thread-safe, and is only given here as an example...

How can I call (not define) a function with a variable number of arguments in C?

Is there any way to make this code shorter?
long call_f(int argc, long *argv) {
switch (argc) {
case 0:
return f();
break;
case 1:
return f(argv[0]);
break;
case 2:
return f(argv[0], argv[1]);
break;
case 3:
return f(argv[0], argv[1], argv[2]);
break;
case 4:
return f(argv[0], argv[1], argv[2], argv[3]);
break;
// ...
}
return -1;
}
No, there isn't any good way to do this. See here:
http://c-faq.com/varargs/handoff.html
You can write a macro with token pasting to hide this behavior but that macro will be no simpler than this code, thus it's only worth writing if you have multiple functions like f() where you would otherwise have to duplicate this case statement.
I don't know how you can make your code shorter, but I saw this line in your code:
return f();
From the subsequent calls to f, it seems that f is a function that takes a variable number of arguments.
You can read in wikipedia that:
Variadic functions must have at least
one named parameter, so, for instance,
char *wrong(...);
is not allowed in C.
Based on that, maybe the return f(); statement is causing you trouble?
There's actually a method to call a function at run time if you know its calling convention and which parameters it receives. However, this lies outside the scope of standard C/C++.
For x86 assembler:
Assuming the following:
You know how to prepare all the parameters for your function in a contiguous buffer, exactly as they would be packed on the stack.
Your function doesn't take/return C++ objects by value.
You may then use the following function:
int CallAnyFunc(PVOID pfn, PVOID pParams, size_t nSizeParams)
{
// Reserve the space on the stack
// This is equivalent (in some sense) to 'push' all the parameters into the stack.
// NOTE: Don't just subtract the stack pointer, better to call _alloca, because it also takes
// care of ensuring all the consumed memory pages are accessible
_alloca(nSizeParams);
// Obtain the stack top pointer
char* pStack;
_asm {
mov pStack, esp
};
// Copy all the parameters into the stack
// NOTE: Don't use the memcpy function. Because the call to it
// will overwrite the stack (which we're currently building)
for (size_t i = 0; i < nSizeParams; i++)
pStack[i] = ((char*) pParams)[i];
// Call your function
int retVal;
_asm {
call pfn
// Most of the calling conventions return the value of the function (if anything is returned)
// in EAX register
mov retVal, eax
};
return retVal;
}
You may need to adjust this function, depending on the calling convention used.
I'll post here the same answer as I posted at the duplicated question, but you should take a look at the discussion there:
What is libffi?
Some programs may not know at the time of compilation what arguments are to be passed to a function. For instance, an interpreter may be told at run-time about the number and types of arguments used to call a given function. ‘libffi’ can be used in such programs to provide a bridge from the interpreter program to compiled code.
The ‘libffi’ library provides a portable, high level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.
FFI stands for Foreign Function Interface. A foreign function interface is the popular name for the interface that allows code written in one language to call code written in another language. The ‘libffi’ library really only provides the lowest, machine dependent layer of a fully featured foreign function interface. A layer must exist above ‘libffi’ that handles type conversions for values passed between the two languages.
‘libffi’ assumes that you have a pointer to the function you wish to call and that you know the number and types of arguments to pass it, as well as the return type of the function.
Historic background
libffi, originally developed by Anthony Green (SO user: anthony-green), was inspired by the Gencall library from Silicon Graphics. Gencall was developed by Gianni Mariani, then employed by SGI, for the purpose of allowing calls to functions by address and creating a call frame for the particular calling convention. Anthony Green refined the idea, extended it to other architectures and calling conventions, and open-sourced libffi.
Calling pow with libffi
#include <stdio.h>
#include <math.h>
#include <ffi.h>
int main()
{
ffi_cif call_interface;
ffi_type *ret_type;
ffi_type *arg_types[2];
/* pow signature */
ret_type = &ffi_type_double;
arg_types[0] = &ffi_type_double;
arg_types[1] = &ffi_type_double;
/* prepare pow function call interface */
if (ffi_prep_cif(&call_interface, FFI_DEFAULT_ABI, 2, ret_type, arg_types) == FFI_OK)
{
void *arg_values[2];
double x, y, z;
/* z stores the return */
z = 0;
/* arg_values elements point to actual arguments */
arg_values[0] = &x;
arg_values[1] = &y;
x = 2;
y = 3;
/* call pow */
ffi_call(&call_interface, FFI_FN(pow), &z, arg_values);
/* 2^3=8 */
printf("%.0f^%.0f=%.0f\n", x, y, z);
}
return 0;
}
I think I can assert that libffi is a portable way to do what I asked, contrary to Antti Haapala's assertion that there isn't such a way. If we can't call libffi a portable technology, given how widely it has been ported across compilers and architectures, and given that its interface complies with the C standard, then we can't call C, or anything, portable either.
Information and history extracted from:
https://github.com/atgreen/libffi/blob/master/doc/libffi.info
http://en.wikipedia.org/wiki/Libffi
You can check out my answer to:
Best Way to Store a va_list for Later Use in C/C++
It seems to work, yet it scares people. It's not guaranteed to be cross-platform or portable, but it seems to be workable on at least a couple of platforms. ;)
Does f have to accept a variable number of pointers to long? Can you rewrite it to accept an array and a count?
