Instruction pointer and function pointer constantness - c

I have the following piece of C code which prints the rip register and the address of a function foo. Running the executable multiple times results in the same values of rip and &foo being printed.
#include <stdio.h>
#include <inttypes.h>
void foo(int x) {
printf("foo sees %d\n", x);
}
int main(int argc, char *argv[]) {
uint64_t ip;
asm("leaq (%%rip), %0;": "=r"(ip));
printf("rip is 0x%016" PRIx64 "\n", ip);
void (*fp)(int) = &foo;
printf("foo is at offset %p\n", fp);
(*fp)(10);
return 0;
}
Q1: Why does rip remain the same?
Q2: Will &foo remain the same, provided the binary and machine remain the same?
Q3: When can &foo change?
Background: I am trying to store the execution times of functions in a history table. I am thinking of using the function address to index into the table and calculate deviations from previous executions.

Q1:
Depends on your platform. Some platforms load your program into a virtual address space, so the exact same code will have the exact same virtual address for foo (assuming the program and the OS's loader don't change between runs, and the loader isn't one that randomizes the load address per the comments). On other platforms that do not load your executable into a virtual address space, you may or may not get the same address depending on whether other programs have executed and/or terminated between runs.
Q2:
Don't count on it. If nothing changes at all, you will have deterministic behavior (same address). But there are many, many things that can change (again, dependent on the platform).
Q3:
They can change at any time on a platform that doesn't allocate a virtual address space (as other processes start, continue doing work, or terminate). On a platform that does, the addresses can change if your program or related libraries change at all, if an OS patch changes loader behavior, or probably due to other circumstances I'm not thinking of at the moment.
Bottom Line
Storing the address may work for your very specific case, but it's a fragile solution.

Nothing is guaranteed.
The solution is to index using the function name, not its address (The C99 standard provides the __func__ identifier). That way your index is guaranteed to remain the same across all changes in OS, compiler, options, and phase of the moon. Until you refactor the function name, of course :-)
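To make the name-based indexing concrete, here is a minimal sketch of a history table keyed by function name. The names history_record, history_lookup, timing_entry and MAX_ENTRIES are made up for illustration, and the fixed-size linear table is an assumption chosen for brevity, not something taken from the question.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64 /* hypothetical fixed capacity */

/* One row of the history table: a function name and its last recorded time. */
struct timing_entry {
    const char *name;    /* e.g. the __func__ string of the timed function */
    double last_seconds; /* most recent execution time, 0.0 before the first record */
};

static struct timing_entry history[MAX_ENTRIES];
static size_t history_count;

/* Find the row for `name`, creating it if needed; returns NULL when the table is full. */
static struct timing_entry *history_lookup(const char *name)
{
    for (size_t i = 0; i < history_count; i++) {
        if (strcmp(history[i].name, name) == 0)
            return &history[i];
    }
    if (history_count == MAX_ENTRIES)
        return NULL;
    history[history_count].name = name; /* the caller's string must outlive the table */
    history[history_count].last_seconds = 0.0;
    return &history[history_count++];
}

/* Store a new execution time and return its deviation from the previous one. */
double history_record(const char *name, double seconds)
{
    struct timing_entry *entry = history_lookup(name);
    if (entry == NULL)
        return 0.0; /* table full: nothing recorded */
    double deviation = seconds - entry->last_seconds;
    entry->last_seconds = seconds;
    return deviation;
}
```

Inside the function being timed you would call something like history_record(__func__, elapsed). Since __func__ has static storage duration, storing the pointer rather than copying the string is safe here. The lookup compares names with strcmp, so it does not depend on where the function happens to be loaded.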

Since you're using Linux you can use dladdr() to ask about symbols near places in memory. For example:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
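void foo(void) {
}
int main(void) {
    Dl_info info;
    void *test = foo; // Note: not standard C
    // dladdr() returns 0 on failure; dli_sname can also be NULL if no symbol matched
    if (dladdr(test, &info) == 0 || info.dli_sname == NULL) {
        fprintf(stderr, "no symbol found for %p\n", test);
        return 1;
    }
    printf("closest symbol: %s in %s\n", info.dli_sname, info.dli_fname);
    return 0;
}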
When compiled with:
gcc -Wall -Wextra test.c -ldl -rdynamic
this correctly identifies the void * as foo, and it stays correct no matter where foo gets loaded.

Related

The placement of static global and global variables with the same identifier

I'm learning some basics about linking and encountered the following code.
file: f1.c
#include <stdio.h>
static int foo;
int main() {
int *bar();
printf("%ld\n", bar() - &foo);
return 0;
}
file: f2.c
int foo = 0;
int *bar() {
return &foo;
}
Then a problem asks me whether this statement is correct: No matter how the program is compiled, linked or run, the output of it must be a constant (with respect to multiple runs) and it is non-zero.
I think this is correct. Although there are two definitions of foo, one of them is declared with static, so it shadows the global foo, thus the linker will not pick only one foo. Since the relative position of variables should be fixed when run (although the absolute addresses can vary), the output must be a constant.
I experimented with the code and on gcc 7.5.0 with gcc f1.c f2.c -o test && ./test it would always output 1 (but if I remove the static, it would output 0). But the answer says that the statement above is wrong. I wonder why. Are there any mistakes in my understanding?
A result of objdump follows. Both foos go to .bss.
Context. This is a problem related to the linking chapter of Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron. But it does not come from the book.
Update. OK now I've found out the reason. If we swap the order and compile as gcc f2.c f1.c -o test && ./test, it will output -1. Quite a boring problem...
Indeed the static variable foo in the f1.c module is a different object from the global foo in the f2.c module referred to by the bar() function. Hence the output should be non zero.
Note however that subtracting two pointers that do not point into the same array (or one past its end) has undefined behavior, hence the difference might be 0 even for different objects. This may happen even though &foo == bar() would evaluate to 0 because the objects are different. This behavior was commonplace on 16-bit segmented systems using the large model, where subtracting pointers only affected the offset portion of the pointers whereas comparing them for equality compared both the segment and the offset parts. Modern systems have a more regular architecture where everything is in the same address space. Just be aware that not every system is a Linux PC.
Furthermore, the printf conversion format %ld expects a value of type long whereas you pass a value of type ptrdiff_t which may be a different type (namely 64-bit long long on Windows 64-bit targets for example, which is different from 32-bit long there). Either use the correct format %td or cast the argument as (long)(bar() - &foo).
Finally, nothing in the C language guarantees that the difference between the addresses of global objects be constant across different runs of the same program. Many modern systems perform address space randomisation to lessen the risk of successful attacks, leading to different addresses for stack objects and/or static data in successive runs of the same executable.
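If you only want to look at how far apart two unrelated objects are, one way to sidestep the pointer-subtraction problem is to convert both addresses to integers first. The sketch below assumes a flat address space and an implementation that provides the optional uintptr_t and intptr_t types. byte_distance is a hypothetical helper name, and the result is still implementation-defined rather than guaranteed by the standard.

```c
#include <stdint.h>

/* Signed distance in bytes from `from` to `to`, computed on the integer
   values of the addresses instead of by subtracting the pointers. */
intptr_t byte_distance(const void *from, const void *to)
{
    /* Unsigned subtraction wraps instead of overflowing; converting the
       result to intptr_t is implementation-defined when it is out of range. */
    return (intptr_t)((uintptr_t)to - (uintptr_t)from);
}
```

With <inttypes.h> included, the result can be printed with printf("%" PRIdPTR "\n", byte_distance(&foo, bar())). As explained above, the value may still differ between builds, link orders and runs.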
Setting aside the wrong printf format and the pointer arithmetic problems, a static global variable in one compilation unit is a different object from static and non-static variables of the same name in other compilation units.
To correctly see the difference in chars, you should cast both pointers to char * and use the %td format, which prints a ptrdiff_t. If your platform does not support it, cast the result to long long int:
int main() {
int *bar();
printf("%td\n", (char *)bar() - (char *)&foo);
return 0;
}
or
printf("%lld\n", (long long)((char *)bar() - (char *)&foo));
If you want to store this difference in the variable use ptrdiff_t type:
ptrdiff_t diff = (char *)bar() - (char *)&foo;

Preinitialized function pointers in compiled binary?

I am currently trying to understand the translation of some simple C-Code into assembly by the clang compiler. However the following behaviour is confusing to me:
int a(void);
int b(void);
int a() {
return 1;
}
int b() {
return 2;
}
int c(){
return 3;
}
int main(int argc, char **argv) {
int (*procs[])(void) = {a,b};
int (*procs2[])(void) = {c,b};
...
gets translated to:
I figured out that the values at the addresses 0x4006XX hold the respective addresses of functions a, b and c. However I wonder why this extra step of using the 0x4006XX addresses is necessary (why not just use the literal address?). And even more curious as to why it uses two different addresses for the address of b.
I know this is probably an obscure question but any help is appreciated :)
It appears that your compiler generates position independent code. Position independent code can be loaded to an arbitrary address at runtime, making the addresses of functions and static variables unpredictable at compile time. The one thing that is predictable is the distance from the variable or function to the current instruction. The compiler uses the lea instruction to add the content of rip, the instruction pointer, to this distance to get the actual address. That's what you are seeing.

getting address of particular instruction in a function [duplicate]

I want to know the length of C function (written by me) at runtime. Any method to get it? It seems sizeof doesn't work here.
There is a way to determine the size of a function. The command is:
nm -S <object_file_name>
This prints the size of each function in the object file. See the nm manual page (man nm) for more information.
You can get this information from the linker if you are using a custom linker script. Add a linker section just for the given function, with linker symbols on either side:
mysec_start = .;
*(.mysection)
mysec_end = .;
Then you can specifically assign the function to that section. The difference between the symbols is the length of the function:
#include <stdio.h>
int i;
__attribute__((noinline, section(".mysection"))) void test_func (void)
{
i++;
}
int main (void)
{
extern unsigned char mysec_start[];
extern unsigned char mysec_end[];
printf ("Func len: %td\n", mysec_end - mysec_start); /* the difference is a ptrdiff_t, so use %td */
test_func ();
return 0;
}
This example is for GCC, but any C toolchain should have a way to specify which section to assign a function to. I would check the results against the assembly listing to verify that it's working the way you want it to.
There is no way in standard C to get the amount of memory occupied by a function.
I have just come up with a solution for the exact same problem, but the code I have written is platform dependent.
The idea is to put known opcodes at the end of the function and search for them from the start, counting the bytes skipped along the way.
Here is the Medium post in which I explain it with some code:
https://medium.com/#gurhanpolat/calculate-c-function-size-x64-x86-c1f49921aa1a
Executables (at least ones that have had debug info stripped) don't store function lengths in any way, so there is no way for a program to look this information up about itself at run time. If you have to work with functions this way, you need to do something with your objects at link time, or access them as files from your executable. For example, you may tell the linker to link the symbol tables into the executable as an ordinary data section, give them a name, and parse them when the program runs. But remember, this would be specific to your linker and object format.
Also note that function layout is platform specific, and there are some things that make the term "function length" unclear:
Functions may store the constants they use in the code section directly after the function code and access them using PC-relative addressing (ARM compilers do this).
Functions may have "prologs" and "epilogs" which may be common to several functions and thus lie outside the main body.
Function code may inline other function code.
Each of these may or may not count towards the function length.
Also, a function may be completely inlined by the compiler, so that it loses its body.
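A fully worked out solution without a custom linker script (it relies on the __start_<section> and __stop_<section> symbols that GNU ld generates for sections whose names are valid C identifiers):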
#include <stdio.h>
int i;
__attribute__((noinline, section("mysec"))) void test_func (void)
{
i++;
}
int main (void)
{
extern char __start_mysec[];
extern char __stop_mysec[];
printf ("Func len: %td\n", __stop_mysec - __start_mysec); /* the difference is a ptrdiff_t, so use %td */
test_func ();
return 0;
}
That's what you get when you read FazJaxton's answer with jakobbotsch's comment
In e.g. Codewarrior, you can place labels around a function, e.g.
label1:
void someFunc()
{
/* code goes here. */
}
label2:
and then calculate the size like (int)(label2-label1), but this is obviously very compiler dependent. Depending on your system and compiler, you may have to hack linker scripts, etc.
The start of the function is the function pointer, you already know that.
The problem is to find the end, but that can be done this way:
#include <time.h>
int foo(void)
{
int i = 0;
++i + time(0); // time(0) is to prevent optimizer from just doing: return 1;
return i;
}
int main(int argc, char *argv[])
{
return (int)((long)main - (long)foo);
}
It works here because the program has ONLY TWO functions, so if the code is re-ordered (main implemented before foo) then you will get an irrelevant (negative) result, letting you know that it did not work this way but that it WOULD work if you moved the foo() code into main(): just subtract the main() size you got with the initial negative result.
If the result is positive, then it will be correct, provided no padding is added (yes, some compilers happily inflate the code, either for alignment or for other, less obvious reasons).
The ending (int)(long) cast is for portability between 32-bit and 64-bit code (function pointers will be longer on a 64-bit platform).
This is fairly portable and should work reasonably well.
There's no facility defined within the C language itself to return the length of a function; there are simply too many variables involved (compiler, target instruction set, object file/executable file format, optimization settings, debug settings, etc.). The very same source code may result in functions of different sizes for different systems.
C simply doesn't provide any sort of reflection capability to support this kind of information (although individual compilers may supply extensions, such as the Codewarrior example cited by sskuce). If you need to know how many bytes your function takes up in memory, then you'll have to examine the generated object or executable file directly.
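sizeof func won't work because standard C does not allow sizeof to be applied to an expression of function type (GCC accepts it as an extension and evaluates it to 1). Applying sizeof to a pointer to the function, as in sizeof &func, only gives you the size of a pointer value, not of the function itself.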
Just subtract the address of your function from the address of the next function. But note it may not work on your system, so use it only if you are 100% sure:
#include <stdint.h>
int function() {
return 0;
}
int function_end() {
return 0;
}
int main(void) {
intptr_t size = (intptr_t) function_end - (intptr_t) function;
}
There is no standard way of doing it in either C or C++. There might naturally exist implementation- or platform-specific ways of doing it, but I am not aware of any.
size_t try_get_func_size_x86(void* pfn, bool check_prev_opcode = true, size_t max_opcodes_runout = 10000)
{
const unsigned char* op = (const unsigned char*)pfn;
for(int i = 0; i < max_opcodes_runout; i++, op++)
{
size_t sz_at = (size_t)(op - (const unsigned char*)pfn) + 1;
switch(*op)
{
case 0xC3: // ret Opcode
case 0xC2: // ret x Opcode
if(!check_prev_opcode)
return sz_at;
switch(*(op-1)) // Checking Previous Opcode
{
case 0x5D: // pop ebp
case 0x5B: // pop ebx
case 0x5E: // pop esi
case 0x5F: // pop edi
case 0xC9: // leave
return sz_at;
}
}
}
return 0;
}
You can find the length of your C function by subtracting the addresses of functions.
Let me provide you an example
#include <stdio.h>
int function1(void)
{
    return 0;
}
int function2(void)
{
    int a = 0, b = 0; // just defining some variables to increase the code size
    printf("This function would take more memory than the earlier function, i.e. function1\n");
    return a + b;
}
int main(void)
{
    // %p expects a void *; converting a function pointer to void * is a common extension, not standard C
    printf("Printing the address of function1 %p\n", (void *)function1);
    printf("Printing the address of function2 %p\n", (void *)function2);
    printf("Printing the address of main %p\n", (void *)main);
    return 0;
}
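Hope you get your answer after compiling it. After compiling and running it you will be able to see the difference between the addresses of function1 and function2.
Note: the difference is often 16 bytes or a multiple of it, since optimizing compilers for x86 commonly align function entry points to 16-byte boundaries.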

Changing stdout (putch() function) on the fly in C

I'm using the XC8 compiler. For that, you have to define your own void putch(char data) function in order for functions like printf() to work, as is described here. Basically, putch() is the function which is used to write characters to stdout.
I now want to change this function on the fly. I have two different functions, putch_a() and putch_b() and want to be able to change which one is used for putch() itself, on the fly.
I thought of this:
unsigned use_a_not_b;
void putch(char data) {
if (use_a_not_b) {
putch_a(data);
} else {
putch_b(data);
}
}
However, this reduces execution speed. Would there be a way to use pointers for this? I have read this answer, and made the following code:
void putch_a(char data);
void putch_b(char data);
void (*putch)(char) = putch_a; // to switch to putch_a
void (*putch)(char) = putch_b; // to switch to putch_b
Would that work? Is there a faster or better-practice way?
No, and why not
To answer your question: no, you can't, at least not in the way that you are thinking (i.e. with a function pointer). A function pointer is a variable that holds the address of a function. To illustrate, consider how this works when you have a function pointer foo pointing to function bar.
int bar() {
    return 0;
}
void baz(int (*foo)()) {
    int x = foo(); // Calls the function pointed to by foo
    (void)x;
}
int main() {
    int (*foo)();
    foo = &bar;
    baz(foo); // Call baz() passing it foo, which points to bar()
}
What foo holds is the address of bar. When you pass foo to some function that expects a function pointer parameter (in this case baz()), the function dereferences the pointer, i.e. looks at the memory associated with foo, gets the address stored in it (in our case the address of bar), and then calls the function at that address (in our case, bar). To be very careful about this: in the above example baz() says
Let me look at the memory associated with foo, it has another address in it
Load that address from memory, and call a function at that address. That function returns an int and takes no parameters.
Let's contrast this with a function that calls bar() directly:
void qux() {
int x = bar(); // Call bar()
}
In this case there is no function pointer. What there is, is an address, supplied by the linker. The linker lays out all the functions in your program, and it knows, for instance, that bar() is at address 0xDEADBEEF. So in qux() there is just a call to 0xDEADBEEF. In contrast, in baz() there is something like (pseudo-assembly):
pop bar off the stack into register A
read memory address pointed to by register A into register B
jump to memory location pointed to by register B
The way putch() gets called from printf(), for instance, is exactly like the way qux() calls bar(), and not like the way baz() does: putch() gets statically linked into your program, so the address of putch() is hardcoded in there, simply because printf() doesn't take a function pointer to call as a parameter.
Why #define is not the answer
#define is a preprocessor directive, that is, "symbols" defined with #define are replaced with their values before the compiler even sees your code. This means that #define makes your program less dynamically modifiable not more. This is desirable in some cases, but in your case it will not help you. To illustrate if you define a symbol like this:
#define Pi 3.14
Then everywhere you use Pi it is as if you typed 3.14. Because Pi does not exist, as far as the compiler is concerned, you cannot even take its address to make a pointer to it.
Closest you can get to a dynamic putch
Like the others have said, you can have some sort of case statement, conditional, or a global pointer, but the putch function itself has to be there in the same form.
Global function pointer solution:
void (*myPutch)(char);
void putch(char ch) {
    myPutch(ch);
}
int main() {
    myPutch = putch_Type_A; // assign the function itself; no parentheses, so it is not called
    ...
    myPutch = putch_Type_B;
}
If/then/else solution has been provided in other answers
goto solution: This would be an ugly (but fun!) hack, and only possible on von Neumann-type machines, but in these conditions you could have your putch look like this:
void putch(char ch) {
goto PutchTypeB;
PutchTypeA:
// Code goes here
return;
PutchTypeB:
// Code goes here
return;
}
You would then overwrite the goto instruction with a goto to some other memory address. You'd have to figure out the opcodes for doing this (from disassembly, probably), and this isn't possible on Harvard architecture machines, so it is out on AVR processors, but it would be fun, if kludgy.
No. That isn't guaranteed to work due to the way code is generated and linked. However...
void (*output_function)(char) = putch_a;
void putch(char c) {
output_function(c);
}
Now you can change output_function whenever you like...
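For completeness, here is a slightly fuller sketch of that wrapper. The two buffer-backed putch_a()/putch_b() bodies and the select_output() helper are hypothetical stand-ins, used so that the effect of switching can be observed on a PC. I have not tried this on XC8, where function pointers may carry extra cost or restrictions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two real output routines: each one appends
   to its own buffer so that switching between them is observable. */
static char log_a[32], log_b[32];
static size_t len_a, len_b;

void putch_a(char data) { if (len_a < sizeof log_a - 1) log_a[len_a++] = data; }
void putch_b(char data) { if (len_b < sizeof log_b - 1) log_b[len_b++] = data; }

/* The pointer that decides where characters currently go. */
static void (*output_function)(char) = putch_a;

/* The one and only putch(): the library links against this fixed symbol. */
void putch(char data)
{
    output_function(data);
}

/* Switch the target at run time; a null pointer is ignored. */
void select_output(void (*fn)(char))
{
    if (fn != NULL)
        output_function = fn;
}
```

Calling select_output(putch_b) redirects every later putch() call, including those made internally by printf(), while the symbol putch itself never changes.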
There is no concept of "speed" in C. That's an attribute introduced by implementations. There are fast implementations (or rather, implementations that produce fast code, in the case of "compilers") and slow implementations (or implementations that produce slow code).
Either way, this is unlikely to be a significant bottleneck. Produce a program that solves a useful problem, profile it to determine the most significant bottlenecks and work on optimising those.
Before you optimize this, make sure what you have really does reduce execution speed noticeably. In an i/o function there's usually a lot of other stuff going on (checking if buffer space is free, calculating buffer offsets, informing hardware data is available, being interrupted while data is actually transmitted to hardware, etc.) that would make a single extra if/else inconsequential.
In most cases your first block should be fine.
In comments you mention maybe needing to extend this structure to multiple putch() functions.
Maybe try
enum PUTCH { sel_putch_a, sel_putch_b, ... };
enum PUTCH putch_select;
void putch(char c) {
switch(putch_select) {
case sel_putch_a : putch_a(c); break;
case sel_putch_b : putch_b(c); break;
/* ... */
}
}
The compiler should be able to optimize the switch statement to a simple computation and a goto. If the putch_<n> functions are inlineable, this doesn't even cost an extra call/return.
The solution using a pointer-to-function in another answer is more flexible in terms of being able to change the available putch functions on the fly, or define them in other files (for example, if you're writing a library or framework to be used by others), but it does require an extra call/return overhead (compared to the simple case of just defining a single putch function).
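A middle ground between the switch and a freely assignable pointer is a constant table of function pointers indexed by the enum. The sketch below is only an illustration: the counters stand in for real I/O, and putch_set_target() and the SEL_PUTCH_* names are invented here. It uses C99 designated initializers.

```c
enum putch_target { SEL_PUTCH_A, SEL_PUTCH_B, SEL_PUTCH_COUNT };

/* Hypothetical stand-ins for the real output routines: they only count calls. */
static unsigned calls_a, calls_b;
static void putch_a(char c) { (void)c; calls_a++; }
static void putch_b(char c) { (void)c; calls_b++; }

/* One table entry per selectable target; const, so it can live in ROM. */
static void (*const putch_table[SEL_PUTCH_COUNT])(char) = {
    [SEL_PUTCH_A] = putch_a,
    [SEL_PUTCH_B] = putch_b,
};

static enum putch_target putch_select = SEL_PUTCH_A;

/* Select a target; returns 0 on success, -1 if `target` is out of range. */
int putch_set_target(int target)
{
    if (target < 0 || target >= SEL_PUTCH_COUNT)
        return -1;
    putch_select = (enum putch_target)target;
    return 0;
}

/* Dispatch through the table instead of a switch. */
void putch(char c)
{
    putch_table[putch_select](c);
}
```

Adding another output routine then means adding one enumerator and one table entry. The range check in putch_set_target() keeps an invalid selector from indexing past the table.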

Extern and global variable

I am unable to understand the weird behaviour of this program. It throws a warning but the output is surprising. I have 3 files: file1.c, file2.c and file1.h.
file1.c :
#include "file1.h"
extern void func(void);
int main(void)
{
func();
printf("%d\n",var);
return 0;
}
file2.c:
int var;
void func(void)
{
var = 1;
}
and file1.h is :
#include<stdio.h>
extern double var;
When I compile and run, it shows a warning as expected but on printing prints 1. But how is it possible when I am changing the var declared in file2.c
You seem to have undefined behavior -- but one that is probably innocuous on many machines much of the time.
Since your extern declaration says var is a double, when it's actually an int, attempting to access it as a double causes undefined behavior.
In reality, however, in a typical case, a double will be eight bytes, where an int is four bytes. This means when you pass var to printf, 8 bytes of data will be pushed onto the stack (but starting from the four bytes that var actually occupies).
printf will then attempt to convert the first four bytes of that (which does correspond to the bytes of var) as an int, which does match the actual type of var. This is why the behavior will frequently be innocuous.
At the same time, if your compiler was doing bounds checking, your attempt at loading 8 bytes of a four-byte variable could easily lead to some sort of "out of bounds" exception, so your program wouldn't work right. It's just your misfortune that most C compilers don't do any bounds-checking.
It's also possible that (probably on a big-endian machine) instead of the bytes corresponding to var happening to be in the right place on the stack for printf to access them as an int, that the other four bytes that would make up a double will end up in that place, and it would print garbage.
Yet another possibility would be that the bit pattern of the 8 bytes would happen to correspond to some floating point value that would cause an exception as soon as you tried to access it as a floating point number. Given the (small) percentage of bit patterns that correspond to things like signaling NaN's, chances of that arising are fairly small though (and even the right bit pattern might not trigger anything, if the compiler just pushed 8 bytes on the stack and left it at that).
Just setting the type on an extern declaration doesn't change the value stored in memory - it only changes how the value is interpreted. So a value of 0x0001 is what is stored in memory no matter how you try to interpret it.
Once you've called func() you've stored the value 0x0001 in the location where var is stored. Then in main, when var is passed to printf, the value stored at the location of var plus the next 4 bytes (assuming 32 bits) is pushed onto the stack. That is, the value 0x0001xxxx is pushed onto the stack. But the %d treats the value passed in as an int and will read the first 4 bytes on the stack, which is 0x0001, so it will print 1.
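The "same bytes, different interpretation" point can be shown without any undefined behaviour by copying bytes explicitly. This is only a sketch: it assumes a 4-byte int and an 8-byte double, and it pads the extra four bytes with zeros, whereas the real program reads whatever happens to follow var. The helper names are made up, and _Static_assert needs C11.

```c
#include <string.h>

/* Assumption of this sketch: 4-byte int and 8-byte double. */
_Static_assert(sizeof(int) == 4 && sizeof(double) == 8,
               "example assumes a 4-byte int and an 8-byte double");

/* Fill 8 bytes of storage: the first four hold `value`, the rest are zero. */
static void store_int(unsigned char storage[8], int value)
{
    memset(storage, 0, 8);
    memcpy(storage, &value, sizeof value);
}

/* Read the first four bytes back as an int: the stored value is unchanged. */
int read_as_int(int value)
{
    unsigned char storage[8];
    int out;
    store_int(storage, value);
    memcpy(&out, storage, sizeof out);
    return out;
}

/* Read all eight bytes as a double: same memory, a very different number. */
double read_as_double(int value)
{
    unsigned char storage[8];
    double out;
    store_int(storage, value);
    memcpy(&out, storage, sizeof out);
    return out;
}
```

Here read_as_int(1) gives back 1, while read_as_double(1) yields a tiny denormal rather than 1.0, because the bit pattern was never converted. The robust fix for the original program is to declare var with its real type, extern int var;, in a header that both source files include, so the compiler can diagnose a mismatch.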
Try changing your code as follows:
file1.c :
#include "file1.h"
extern void func(void);
int main(void)
{
func();
printf("%d %d\n",var);
return 0;
}
file2.c:
int var;
int var2;
void func(void)
{
var = 1;
var2 = 2;
}
and file1.h is :
#include<stdio.h>
extern double var;
You will probably get the output 1 2 (although this depends on how the compiler stores the variables, but it's likely it will store them in adjacent locations)
