Why do compilers create one variable "twice"? - c

I know this is a more "heavy" question, but I think it's interesting too. It was part of my previous question about compiler functions, but back then I explained it very badly, and many answered only my first question, so here it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and to ensure that each task has an appropriate place in memory. So, every process gets its own address space starting from 0.
When multitasking takes effect, the kernel has to save all important registers to the task's stack (I believe), then save the current stack pointer, change the page table entries to switch to another process's physical address space, load the new process's stack pointer, pop the saved registers and continue by jumping to the popped instruction pointer address.
Because of this nice feature (paging), every process thinks it has a nice flat memory within reach. So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
But if there is no more segmentation for the process, why do compilers still create variables on the stack (or, when global, in a separate memory region) rather than directly in the program code?
Let me give an example. I have this C code:
int a = 10;
which gets translated into (Intel syntax):
mov [position of a], 10
But then you actually occupy more bytes in RAM than needed, because the first few bytes are taken by the instruction itself, and after that instruction has run there is another byte somewhere containing the value 10.
Why, when there is no need to switch any segment (which would slow the process down), isn't the value 10 instead coded directly into the program, like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
Because the compiler knows the exact position of every instruction, when operating with that variable it would just use its address.
I know that const variables do this, but they are not really variables if you cannot change them.
I hope I explained my question well, but I am still learning English, so forgive my syntactic and even semantic errors.
EDIT:
I have read your answers, and it seems that based on those I can modify my question:
Someone here said that a global variable is actually a piece of data attached directly to the program. I mean, when a variable is global, is it attached to the end of the program, or is it created at execution time just like a local one, only directly on the heap instead of on the stack?
If it is the first case - attached to the program itself - why do local variables even exist? I know you will tell me it is because of recursion, but that is not the case: when you call a function, you can push any memory region onto the stack, so there is no program code there.
I hope you do understand me. There is always an inefficient use of memory when some value (even 0) is created on the stack by some instruction, because you need space in the program for that instruction and then for the actual variable. Like so:
push 5 //instruction that says to create a local variable with the integer 5
And then this instruction just makes the number 5 appear on the stack. Please help me, I really want to know why it's this way. Thanks.

Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive descent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if Windows does this, however)
Your proposal doesn't allow for either of these cases.
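As an illustration of the first point (a minimal sketch, not from the original answer): each active call of fact() below needs its own copy of n, so a single fixed address baked into the program code could not hold all the values at once.

#include <stdio.h>

/* Each recursion level's n must survive the deeper calls, so every
   activation gets its own slot on the stack. */
unsigned fact(unsigned n) {
    if (n <= 1)
        return 1;
    return n * fact(n - 1); /* the outer n is still needed after this call */
}

int main(void) {
    printf("%u\n", fact(5)); /* prints 120 */
    return 0;
}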

So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
Yes and no. Different program segments have different purposes, despite the fact that they all reside within flat virtual memory. E.g. the data segment is readable and writable, but you can't execute data. The code segment is readable and executable, but you can't write into it.
why do compilers still create variables on the stack, [...] rather than directly in program code?
Simple.
The code segment isn't writable. For safety reasons, first. Second, most CPUs do not like having the code segment written into, as it breaks many of the optimizations used to accelerate execution.
The state of a function has to be private to the function, due to things like recursion and multi-threading.
isn't the value 10 just coded directly into the program like this
Modern CPUs prefetch instructions to allow things like parallel execution and out-of-order execution. Putting garbage (to the CPU, that is garbage) into the code segment would simply diminish (or flat out cancel) the effect of those techniques. And they are responsible for the lion's share of the performance gains CPUs have shown in the past decade.
when there is no need to switch any segment
So if there is no overhead in switching segments, why then put that into the code segment? There is no problem keeping it in the data segment.
Especially in the case of a read-only data segment, it makes sense to put all the read-only data of the program into one place, since it can then be shared by all instances of the running application, saving physical RAM.
Because the compiler knows the exact position of every instruction, when operating with that variable it would just use its address.
No, not really. Most code is relocatable or position-independent. The code is patched with real memory addresses when the OS loads it into memory. Actually, special techniques are used to avoid patching the code at all, so that the code segment too can be shared by all running application instances.
The ABI is responsible for defining how and what the compiler and linker are supposed to do for a program to be executable on the complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.

What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you see the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using Visual Studio, their address is One Microsoft Way, Redmond, WA. Good luck knocking on doors there. :-)
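A quick way to observe this (a hypothetical experiment, and the exact result depends on compiler and flags): compile the snippet below with something like gcc -O2 -S and inspect the assembly. Typically a never gets a memory slot at all, and 10 shows up as an immediate operand, exactly as the question proposed.

#include <stdio.h>

int main(void) {
    int a = 10;        /* never modified, address never taken */
    printf("%d\n", a); /* at -O2 this is usually compiled as if it were printf("%d\n", 10) */
    return 0;
}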

Why isn't the value 10 just coded directly into the program like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (on Windows and Linux, at least), in what's called the .data section.
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and there must still be code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page, the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable were global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
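To make the trade-off concrete (a small sketch; the names are invented for illustration): g below occupies space in the .data section, and in the executable file, for the program's entire run, while loc exists only while demo() is executing.

#include <stdio.h>

int g = 10; /* global: lives in .data for the whole run of the program */

void demo(void) {
    int loc = 10; /* local: lives in demo()'s stack frame, gone after return */
    printf("global at %p, local at %p\n", (void *)&g, (void *)&loc);
}

int main(void) {
    demo();
    return 0;
}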

Not quite sure what your confusion is. int a = 10; means: make a spot in memory, and put the value 10 at that memory address.
If you just want a to mean 10:
#define a 10
though more typically
#define TEN 10

Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a = 10, or even const int a = 10, the compiler cannot convert code which references a to use the constant 10 directly, because it has no way of knowing whether a may be changed behind its back (even const variables can be changed). For example, one way a can be changed without the compiler knowing is if you have a pointer which points to a. Pointers are not fixed at compile time, so the compiler cannot determine at compile time whether there will be a pointer which will point to and modify a.
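A minimal sketch of that aliasing argument (hypothetical code): once a pointer to a escapes to a function the compiler cannot see into, references to a can no longer be replaced by the literal 10.

#include <stdio.h>

/* Pretend modify() is defined in another source file, so the compiler
   cannot see what it does while compiling main(). */
void modify(int *p) {
    *p = 42;
}

int main(void) {
    int a = 10;
    modify(&a);        /* a is changed "behind the compiler's back" */
    printf("%d\n", a); /* prints 42 - substituting the literal 10 would be wrong */
    return 0;
}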


How to undeclare (delete) variable in C?

Like we do with macros:
#undef SOMEMACRO
Can we also undeclare or delete variables in C, so that we can save a lot of memory?
I know about malloc() and free(), but I want to delete a variable completely, so that if I use printf("%d", a); I get the error
test.c:4:14: error: ‘a’ undeclared (first use in this function)
No, but you can create small minimal scopes to achieve this, since all scope-local variables are destroyed when the scope is exited. Something like this:
void foo() {
    // some codes
    // ...
    { // create an extra minimum scope where a is needed
        int a;
    }
    // a doesn't exist here
}
It's not a direct answer to the question, but it might bring some order and understanding on why this question has no proper answer and why "deleting" variables is impossible in C.
Point #1 What are variables?
Variables are a way for a programmer to assign a name to a memory space. This is important because it means that a variable doesn't have to occupy any actual space! As long as the compiler has a way to keep track of the memory in question, a defined variable can be translated in many ways so as to occupy no space at all.
Consider: const int i = 10; A compiler could easily choose to substitute all instances of i with an immediate value. i would occupy 0 data memory in this case (though depending on the architecture it could increase the code size). Alternatively, the compiler could store the value in a register, and again no stack or heap space would be used. There's no point in "undefining" a label that exists mostly in the code and not necessarily at runtime.
Point #2 Where are variables stored?
After point #1 you already understand that this is not an easy question to answer as the compiler could do anything it wants without breaking your logic, but generally speaking, variables are stored on the stack. How the stack works is quite important for your question.
When a function is called, the machine takes the current location of the CPU's instruction pointer and the current stack pointer and pushes them onto the stack, moving the stack pointer to the next location on the stack. It then jumps into the code of the function being called.
That function knows how many variables it has and how much space they need, so it moves the frame pointer to claim a frame that can hold all the function's variables, and then just uses the stack. To simplify things, the function claims enough space for all its variables right from the start, and each variable has a well-defined offset from the beginning of the function's stack frame*. The variables are also stored one after the other.
While you could manipulate the frame pointer after this action, it would be too costly and mostly pointless. The running code only uses the last stack frame, and it can occupy all the remaining stack if needed (the stack is allocated at thread start), so "releasing" variables gives little benefit. Releasing a variable from the middle of the stack frame would require a defrag operation, which would be very CPU-costly, and pointless just to recover a few bytes of memory.
Point #3: Let the compiler do its job
The last issue here is the simple fact that a compiler can do a much better job of optimizing your program than you probably could. Given the need, the compiler can detect variable scopes and overlap the memory of variables that can't be accessed simultaneously, to reduce the program's memory consumption (e.g. with the -O3 compile flag).
There's no need for you to "release" variables since the compiler could do that without your knowledge anyway.
This is to complement all that was said before me about the variables being too small to matter, and the fact that there's no mechanism to achieve what you asked.
* Languages that support dynamic-sized arrays could alter the stack frame to allocate space for that array only after the size of the array was calculated.
There is no way to do that in C, nor in the vast majority of programming languages - certainly not in any programming language that I know.
And you would not save "a lot of memory". The amount of memory you would save if you did such a thing would be minuscule. Tiny. Not worth talking about.
The mechanism that would facilitate the purging of variables in such a way would probably occupy more memory than the variables you would purge.
The invocation of the code that would reclaim the memory of individual variables would also occupy more space than the variables themselves.
So if there were a magic method purge() that purges variables, not only would the implementation of purge() be larger than any amount of memory you could ever hope to reclaim by purging variables in your program, but also, in int a; purge(a); the call to purge() would occupy more space than a itself.
That's because the variables that you are talking about are very small. The printf("%d", a); example that you provided shows that you are thinking of somehow reclaiming the memory occupied by individual int variables. Even if there was a way to do that, you would be saving something of the order of 4 bytes. The total amount of memory occupied by such variables is extremely small, because it is a direct function of how many variables you, as a programmer, declare by hand-typing their declarations. It would take years of typing on a keyboard doing nothing but mindlessly declaring variables before you would declare a number of int variables occupying an amount of memory worth speaking of.
Well, you can use blocks ({ }) and define a variable as late as possible to limit the scope where it exists.
But unless the variable's address is taken, doing so has no influence on the generated code at all, as the compiler's determination of the range over which it has to keep the variable's value is not significantly impacted.
If the variable's address is taken, failure of escape analysis - mostly due to inlining barriers like separate compilation or allowing semantic interposition - can make the compiler assume it has to keep the variable alive until later in the block than strictly necessary. That's rarely significant (don't worry about a handful of ints, and keeping a value alive for a few extra lines of code is most often insignificant), but it's best kept in mind for the rare case where it might matter.
If you are that concerned about the tiny amount of memory that is on the stack, then you're probably going to be interested in understanding the specifics of your compiler as well. You'll need to find out what it does when it compiles. The actual shape of the stack frame is not specified by the C language; it is left to the compiler to figure out. To take an example from the currently accepted answer:
void foo() {
    // some codes
    // ...
    { // create an extra minimum scope where a is needed
        int a;
    }
    // a doesn't exist here
}
This may or may not affect the memory usage of the function. If you were to do this in a mainstream compiler like gcc or Visual Studio, you would find that they optimize for speed rather than stack size, so they pre-allocate all of the stack space they need at the start of the function. They do analysis to figure out the minimum pre-allocation needed, using your scoping and variable-usage patterns, but those algorithms literally won't be affected by extra scoping. They're already smarter than that.
Other compilers, especially those for embedded platforms, may allocate the stack frame differently. On these platforms, such scoping may be the trick you needed. How do you tell the difference? The only options are:
Read the documentation
Try it, and see what works
Also, make sure you understand the exact nature of your problem. I worked on a particular embedded project which eschewed the stack for everything except return values and a few ints. When I pressed the senior developers about this silliness, they explained that on this particular application, stack space was at more of a premium than space for globally allocated variables. They had a process they had to go through to prove that the system would operate as intended, and this process was much easier for them if they allocated everything up front and avoided recursion. I guarantee you would never arrive at such a convoluted solution unless you first knew the exact nature of what you were solving.
As another solution you could look at, you could always build your own stack frames. Make a union of structs, where each struct contains the variables for one stack frame, then keep track of them yourself. You could also look at functions like alloca, which can allow growing the stack frame during the function call, if your compiler supports it.
Would a union of structs work? Try it. The answer is compiler dependent. If all variables are stored in memory on your particular device, then this approach will likely minimize stack usage. However, it could also substantially confuse register coloring algorithms, and result in an increase in stack usage! Try and see how it goes for you!
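A minimal sketch of that union-of-structs idea (the layout and names are made up for illustration): the two structs below overlap in memory, so the "locals" of phase 2 reuse the space of phase 1 instead of adding to the frame.

#include <stdio.h>

union frames {
    struct { int i, sum; } phase1;        /* locals for the first phase */
    struct { double avg; int n; } phase2; /* locals for the second phase, same bytes */
};

int main(void) {
    union frames f;

    f.phase1.sum = 0;
    for (f.phase1.i = 1; f.phase1.i <= 4; f.phase1.i++)
        f.phase1.sum += f.phase1.i;
    int total = f.phase1.sum; /* save the result before reusing the union */

    f.phase2.n = 4;           /* from here on, phase 1's space is recycled */
    f.phase2.avg = (double)total / f.phase2.n;
    printf("%f\n", f.phase2.avg);
    return 0;
}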

How to stop a RAM test destroying its own variables (embedded C)?

During startup tests I am required to test all RAM locations using a GALPAT test. I have written the function to do this, but I have run into a problem: the function's variables exist in RAM and therefore get mashed as part of the test.
What would be the best way around this?
A possible approach could be the following. You especially have to take care of the processor stack; touching .data and .bss is something you can avoid, but there is no easy way to make C work without a proper stack:
Write your code so that it exclusively uses ROM code and stack variables.
Have your startup code (which would normally be written in assembler anyhow) allocate the stack in the upper half of memory, and test the lower half.
Move (copy) the stack from upper memory to the already-tested area (can be done in C).
Reset the stack pointer to point to the copied stack (involves assembler coding).
Do the rest of the memtest (can be done in C again).
(This assumes your code runs from ROM, which it normally would at such an early point in the start-up.) In case there is any memory failure in the area where you allocate the stack, your code will simply crash (what it does before that - reactor meltdown... - depends on the application).
When moving the stack around outside C's control, you should be very careful about what you actually store there: pointers to stack variables will become invalid - or rather undefined - once you've moved the stack. Simple scalar variables and pointers to objects outside the stack, as typically used in a RAM test, should work, however.
You could try declaring your variables as register to keep RAM usage as low as possible, but you can't force C to put certain variables into registers, and a good C compiler will put them there anyhow.
Whether this is any better than writing the whole memtest in assembler (you'd need to do the stack adjustments in assembler anyhow, as there is no means of moving the processor stack around in C), I dare to challenge. I don't see much point in using C at this low level, especially as assembler could run a memtest routine entirely from registers, without using any RAM at all. That makes it much more immune to any RAM problem: a RAM-testing routine shouldn't rely on working memory.
The bottom line is that you cannot. You have two choices: either you test only part of the RAM at a time, which means you are not doing a complete address test, or you don't run from the RAM you are testing, which is basically the rule if you really want to test the RAM. So you either run from ROM without using a stack in the RAM under test, or you use another RAM; perhaps there is a cache somewhere that can be used in direct-access mode to give you a little RAM.
Testing half, or some other fraction, at a time is not a complete test, but it is better than not testing part of the RAM at all. It can be done with either a position-independent module or multiple compiles of a position-dependent test. There is no reason for the stack to be an issue: the ROM-based code that copies and jumps to the code under test can set the stack pointer according to which fraction is under test, and then repeat. Treat the module like a function, not like an entire program, and the problem of preserving the stack or "moving the test" goes away; control returns to the ROM-based code, which can relaunch further tests.
One crazy way to do it would be to attempt to turn on the I-cache, get the test code into the cache (before it hits itself), and then blast away at the RAM, including the RAM behind the code (you can't have a stack). I would only try this as a fun experiment, not for anything real. There are lots of problems to solve with an approach like this.
My approach would be this:
Do the test before anything is initialized (variables, stack). In gcc you can write: void RAM_test(void) __attribute__ ((section (".init0"))); (see the sketch after this list).
Write it in assembly. Ensure that the test does not use/store any variables in the RAM, only uses processor registers.
Store the result somewhere so you can use it later in the normal program.
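A hedged sketch of that declaration in context (gcc-specific, AVR-style .initN sections; the section name, the naked attribute and the register conventions all depend on the target, and the real body would be inline assembly, since any C local could be spilled to the RAM under test):

/* Runs before the C runtime initializes .data/.bss or sets up the stack. */
void RAM_test(void) __attribute__((naked, section(".init0")));

void RAM_test(void) {
    asm volatile(
        "" /* register-only GALPAT loop goes here */
    );
}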
If you do have the ROM space and can afford the time at boot, I would implement the test in assembly, or in a naked C function using directives to keep your variables in registers, so that no RAM is consumed as part of the test. This is going to be pretty architecture- and compiler-specific, though, and neither of those was mentioned.

What's inside the stack?

If I run a program like this:
#include <stdio.h>

int main(int argc, char *argv[], char *env[]) {
    printf("My references are at %p, %p, %p\n", (void *)&argc, (void *)&argv, (void *)&env);
}
We can see that those regions are actually in the stack.
But what else is there? If we run a loop through all the values above it (on Linux 3.5.3, for example, until a segfault), we can see some weird numbers, and what look like two regions separated by a bunch of zeros, maybe to try to prevent accidentally overwriting the environment variables.
Anyway, in the first region there must be a lot of numbers, such as all the frames for each function call.
How could we distinguish the end of each frame - where the parameters are, where the canary is (if the compiler added one), the return address, the CPU status and so on?
Without some knowledge of the overlay, you only see bits, or numbers. While some of the regions are subject to machine specifics, a large number of the details are pretty standard.
If you didn't move too far outside of a nested routine, you are probably looking at the call stack portion of memory. With some generally considered "unsafe" C, you can write up fun functions that access function variables a few "calls" above, even if those variables were not "passed" to the function as written in the source code.
The call stack is a good place to start, as 3rd party libraries must be callable by programs that aren't even written yet. As such, it is fairly standardized.
Stepping outside of your process memory boundaries will give you the dreaded segmentation violation, as memory fencing detects an attempt by the process to access non-authorized memory. malloc does a little more than "just" return a pointer: on systems with memory segmentation features, it also "marks" the memory as accessible to that process, and the system checks all memory accesses to verify that the process's assignments are not being violated.
If you keep following this path, sooner or later, you'll get an interest in either the kernel or the object format. It's much easier to investigate one way of how things are done with Linux, where the source code is available. Having the source code allows you to not reverse-engineer the data structures by looking at their binaries. When starting out, the hard part will be learning how to find the right headers. Later it will be learning how to poke around and possibly change stuff that under non-tinkering conditions you probably shouldn't be changing.
PS. You might consider this memory "the stack" but after a while, you'll see that really it's just a large slab of accessible memory, with one portion of it being considered the stack...
The contents of the stack are basically:
Whatever the OS passes to the program.
Call frames (also called stack frames, activation areas, ...)
What does the OS pass to the program? A typical *nix will pass the environment, arguments to the program, possibly some auxiliary information, and pointers to them to be passed to main().
In Linux, you'll see:
a NULL
the filename for the program.
environment strings
argument strings (including argv[0])
padding full of zeros
the auxv array, used to pass information from the kernel to the program
pointers to environment strings, ended by a NULL pointer
pointers to argument strings, ended by a NULL pointer
argc
Then, below that are stack frames, which contain:
arguments
the return address
possibly the old value of the frame pointer
possibly a canary
local variables
some padding, for alignment purposes
How do you know which is which in each stack frame? The compiler knows, so it simply treats each location in the stack frame appropriately. Debuggers can use annotations for each function in the form of debug info, if available. Otherwise, if there is a frame pointer, you can identify things relative to it: local variables are below the frame pointer, arguments are above the frame pointer. Otherwise, you must use heuristics: things that look like code addresses are probably code addresses, but sometimes this results in incorrect and annoying stack traces.
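For example (a gcc-specific sketch, only meaningful on builds that keep a frame pointer, e.g. x86-64 with -O0 or -fno-omit-frame-pointer; the frame layout assumed in the comments is the conventional one, not a guarantee):

#include <stdio.h>

/* Follow the chain of saved frame pointers: with the conventional x86-64
   layout, each frame starts with the caller's rbp, and the word right
   above it is the return address. */
void show_frames(void) {
    void **fp = __builtin_frame_address(0); /* this function's frame base */
    for (int i = 0; i < 4 && fp != NULL; i++) {
        printf("frame %d at %p, return address %p\n", i, (void *)fp, fp[1]);
        fp = (void **)fp[0]; /* hop to the caller's frame */
    }
}

int main(void) {
    show_frames();
    return 0;
}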
The content of the stack will vary depending on the architecture ABI, the compiler, and probably various compiler settings and options.
A good place to start is the published ABI for your target architecture, then check that your particular compiler conforms to that standard. Ultimately you could analyse the assembler output of the compiler or observe the instruction level operation in your debugger.
Remember also that a compiler need not initialise the stack, and will certainly not "clear it down" when it has finished with it, so when stack memory is allocated to a process or thread it might contain any value. Even at power-on, SDRAM for example will not contain any specific or predictable value; and if the physical RAM address has previously been used by another process since power-on, or even by an earlier function call in the same process, the content will be whatever that code left in it. So just looking at the raw stack does not tell you much.
Commonly a generic stack frame may contain the address that control will jump to when the function returns, the values of all the parameters passed, and the values of all the auto local variables in the function. However, the ARM ABI, for example, passes the first four arguments to a function in registers R0 to R3, and holds the return address of a leaf function in the LR register, so it is not as simple in all cases as the "typical" implementation I have suggested.
The details are very dependent on your environment. The operating system generally defines an ABI, but that's in fact only enforced for syscalls.
Each language (and each compiler even if they compile the same language) in fact may do some things differently.
However there is some sort of system-wide convention, at least in the sense of interfacing with dynamically loaded libraries.
Yet, details vary a lot.
A very simple "primer" could be http://kernelnewbies.org/ABI
A very detailed and complete specification you could look at to get an idea of the level of complexity and details that are involved in defining an ABI is "System V Application Binary Interface AMD64 Architecture Processor Supplement" http://www.x86-64.org/documentation/abi.pdf

Are programming languages and methods inefficient? (assembler and C knowledge needed)

For a long time I have been thinking about and studying the output of the C compiler in assembler form, as well as CPU architecture. I know this may seem silly to you, but it seems to me that something is very ineffective. Please don't be angry if I am wrong and there is some reason I do not see for all these principles; I will be very glad if you tell me why it is designed this way. I actually truly believe I am wrong - I know the genius minds of the people who put PCs together had a reason to do so. What exactly, you ask? I'll tell you right away, using C as an example:
1: Stack local scope memory allocation:
So, typical local memory allocation uses the stack: just copy esp to ebp and then allocate all the memory via ebp. OK, I would understand this if you explicitly needed to allocate RAM at the default stack addresses, but if I understand it correctly, modern OSs use paging as a translation layer between the application and physical RAM, where the address you request is translated further before reaching an actual RAM byte. So why not just say 0x00000000 is int a, 0x00000004 is int b, and so on? And access them just by mov [0x00000000], 10? Because you won't actually access memory blocks 0x00000000 and 0x00000004, but those the OS's paging tables point them to. Actually, since memory allocation via ebp and esp uses indirect addressing, "my" way would be even faster.
2: Variable allocation duplication:
When you run an application, the loader loads its code into RAM. When you create a variable or a string, the compiler generates code that pushes these values onto the top of the stack when they are created in main. So there is an actual instruction for doing so, and the actual number in memory - 2 copies of the same value in RAM, one in the form of an instruction and a second in the form of the actual bytes in RAM. But why? Why not just work out, when declaring the variable, at which memory block it will live, and then, whenever it is used, just refer to that memory location?
How would you implement recursive functions? What you are describing is equivalent to using global variables everywhere.
That's just one problem. How can you link to a precompiled object file and be sure it won't corrupt the memory of your procedures?
Because C (and most other languages) supports recursion, so a function can call itself, and each call of the function needs separate copies of any local variables. Also, on most current processors, your way would actually be slower: indirect addressing is so common that processors are optimized for it.
You seem to want the behavior that C has (or at least allows) for string literals. There are good and bad points to this, such as the fact that even though you've defined a "variable", you can't actually modify its contents (without affecting other variables that point at the same location).
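String literals show what that model feels like in practice (a small sketch): the bytes live in the program image, and two "variables" may silently share them.

#include <stdio.h>

int main(void) {
    char *p = "hi"; /* points into the (read-only) program image */
    char *q = "hi"; /* many compilers make this the very same bytes */
    printf("%p %p\n", (void *)p, (void *)q);
    /* p[0] = 'H'; would be undefined behavior: the "variable" isn't writable */
    return 0;
}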
The answers to your questions are mostly wrapped up in the different semantics of different storage classes.
Google "data segment".
Think about the difference in behavior between global and local variables.
Think about how constant and non-constant variables have different requirements when functions are called repeatedly (or, as Mehrdad says, recursively).
Think about the difference between static and non-static automatic variables, again in the context of multiple or recursive calls.
Since you are comparing assembler and C (which are very close together from an architectural standpoint), I'm inclined to say that you're describing micro-optimization, which is meaningless unless you profile the code to see whether it performs better.
In general, programming languages are evolving towards a more declarative style (i.e. telling the computer what you want done, rather than how you want it done). When you program in an imperative language (like assembly or C), you specify in extreme detail how you want the problem solved. This gives the compiler little room to make optimization decisions on your behalf.
However, as the languages become more declarative, the compilers are getting smarter, because we are giving them the room they need to make more intelligent performance optimizations.
If every function put its first variable at offset 0 and so on, then you would have to change the memory mapping each time you entered a function (you could not allocate all variables at unique addresses if you want recursion). This is doable, but with current hardware it's very slow. Furthermore, the address translation performed by the virtual memory is not free either; it's actually quite complicated to implement efficiently.
Addressing off ebp (or any other register) costs having a mux (to select the register) and an adder (to add the offset to the register). The time taken for this can often be overlapped with other operations.
If you want to be able to modify the static value, you have to copy it to the stack. If you don't (say it's const), then a good C compiler will not copy it to the stack.
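A concrete case of that copy (a small sketch): an initialized local array is duplicated from the program image onto the stack at every call, while a pointer to a string literal is not.

#include <stdio.h>

void demo(void) {
    char buf[] = "hello";      /* initializer bytes copied onto the stack, writable */
    const char *msg = "hello"; /* just points at the read-only copy in the image */
    buf[0] = 'H';              /* fine: buf is our own stack copy */
    printf("%s %s\n", buf, msg);
}

int main(void) {
    demo();
    return 0;
}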

C memcpy() a function

Is there any method to calculate the size of a function? I have a pointer to a function and I have to copy the entire function using memcpy. I have to malloc some space and know the 3rd parameter of memcpy: the size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first-class objects in C, which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer, though, satisfies all of these criteria and is a first-class object. A function pointer is just a memory address, and it usually has the same size as any other pointer on your machine.
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier. Pass data, not code, back and forth through a well-defined protocol over a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like an .so-based plugin system.
On a side note: at first I misread this as wanting to pass memcpy() itself to the kernel. You have to remember that it is a very special function. It is defined in the C standard, is quite simple, and has quite a broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
Even if there were a way to get the sizeof() a function, it might still fail when you try to call a version that has been copied to another area of memory. What if the compiler emitted local or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that, but it has all the information needed to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If a linker was involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume you have a Harvard-architecture microcontroller, where code (in other words, "functions") is located in ROM. In that case you cannot do it at all.
Also, I know several compilers and linkers that optimize at the file level (not only at the function level). This results in opcode where parts of C functions are interleaved with each other.
The only way I consider possible is:
Generate opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into a C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
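A rough sketch of that idea on a POSIX system (heavily platform-dependent: the bytes below are hand-encoded x86-64 machine code for a function returning 42, casting a data pointer to a function pointer is outside the C standard, and W^X policies may forbid this entirely):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* x86-64: mov eax, 42 ; ret */
static const unsigned char opcode[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

int main(void) {
    /* The bytes must live in executable memory, so mmap a page for them. */
    void *buf = mmap(NULL, sizeof opcode, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memcpy(buf, opcode, sizeof opcode);

    int (*fn)(void) = (int (*)(void))buf; /* point a function pointer at the array */
    printf("%d\n", fn());                 /* prints 42 */
    return 0;
}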
But apart from this: did you consider a redesign of your software, so that you do not need to copy a function's contents at all?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and your function doesn't do anything fancy - no calls to other functions, no access to data outside the function - you might even get away with it. I'd say the safest possibility is to limit the maximum size of a supported function to, say, 1 kilobyte, just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can bomb if the searched-for sequence pops up too early, e.g. in data embedded in the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything I wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture, please: WHY are you trying to do this? Then we can see whether we can figure out an entirely different approach.
A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function right after the function to be copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for this to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked unless your code is compiled to be relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though, since they don't move).
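In sketch form (fragile by design: it assumes the toolchain emits the two functions adjacently and in source order, and subtracting function pointers through char * is a common compiler extension rather than standard C; the function names are invented):

#include <stdio.h>

static int work(int x) { return x * 2; } /* the function to measure */
static void work_end(void) { }           /* marker placed right after it */

int main(void) {
    /* Only trust this after checking the map file or a disassembly:
       optimizers may reorder, merge or pad functions. */
    size_t size = (size_t)((char *)work_end - (char *)work);
    printf("approx %zu bytes\n", size);
    return 0;
}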
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor would almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read; it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, that can be made available to the userspace process when it calls read on the readable file descriptor.
A function isn't just an object you can copy. What about cross-references / symbols and so on? Of course you can take something like the standard Linux "binutils" package and torture your binaries, but is that what you want?
By the way, if you are simply trying to replace the memcpy() implementation, look at the LD_PRELOAD mechanism.
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler- and linker-dependent, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this: it's a common requirement in embedded systems that need to update the running code.
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA, where I copied some low-level render functions from flash (16-bit access, slowish memory) to the high-speed work RAM (32-bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy: size = (int)(NextFuncPtr - SourceFuncPtr). This worked well, but obviously can't be guaranteed on all platforms (it does not work on Windows, for sure).
I think one solution can be as below.
For example, if you want to know the size of func() in program a.c, put indicators before and after the function.
Try writing a Perl script which compiles this file into object format (cc -c); make sure that the pre-processor indicators are not removed, as you need them later to calculate the size from the object file.
Now search for your two indicators and find out the code size in between.
