struct epoll_event memset or no memset? - c

When browsing through code on the Internet, I often see snippets like these:
struct epoll_event event;
memset(&event, 0, sizeof(event));
This pattern seems needless to me, if event is filled out in full, but it is widespread. Perhaps to take into account possible future changes of the struct?

This is surely just bad copy-and-paste coding. The man page for epoll does not document any need to zero-initialize the epoll_event structure, and does not do so in the examples. Future changes to the struct do not even seem to be possible (ABI), but if they were, the contract would clearly be that any parts of the structure not related to the events you requested would be ignored (and not even read, since the caller may be passing a pointer to storage that does not extend past the original definition).
Also, in general it's at best pointless and at worst incorrect/nonportable to use memset when a structure is supposed to be zero-initialized, since the zero representation need not be the zero value (for pointer and floating point types). Nowadays this generality is mostly a historical curiosity, and not relevant to a Linux-specific interface like epoll anyway, but it comes up as well with mbstate_t which exists in fully general C, and where zero initialization is required to correctly use the associated interfaces. The correct way to zero-initialize things that need zero values, rather than all-zero-bytes representations, is with the universal zero initializer, { 0 }.

Using memset like this can help you locate bugs faster. Consider it a defensive (even secure) style of programming.
Let's say you didn't use memset, and instead attempted to diligently fill in each member as documented by the API. But if you ever forget to fill in a field (or a later API change leads to the addition of a new field), then the value that field takes at run-time is indeterminate; in practice it will be whatever the memory previously held.
What are the consequences?
If you are lucky, your code will immediately fail in a nice way that can be debugged, for example, if the unset field needs a highly specific value.
If you are unlucky, your code may still work, and it may work for years. Perhaps on your current operating system the program memory somehow already held the correct value expected by the API. But as you move your code across systems and compilers, expect confusing behavior: "it works on my machine, but I don't understand why it doesn't work on yours".
So in this case, memset is helping you avoid this nondeterministic behavior.
Of course, you should still profile your code, check for uninitialized memory, write unit tests, etc. Using memset is not a replacement for those; it's just another technique for getting to safe software.

Related

Is it better to redeclare a struct inside a function or declare it static and set it to 0 every time?

Basically, if I have a struct like:
struct header {
char ptr[512];
};
and I have a function like so:
void some_function() {
struct header header = { 0 };
// do something with struct
}
Would it actually benefit performance-wise to do it like this:
void some_function() {
static struct header header;
memset((char *)&header, 0, sizeof(header));
// do something with struct
}
I know memset doesn't always work if the struct contains pointers since NULL might not be located at address 0x0000, but for this case when this doesn't matter, what is the better way to do it?
If the C program specified an actual static object or an object automatically allocated on the stack, the performance of these two pieces of code would be nearly identical. There might be minuscule performance differences based on how one was addressed or some performance differences based on where they happened to be allocated with respect to other data and cache properties. (In particular, the automatic version might have better properties since the memory is not reserved exclusively for the structure. It would be shared with other data while other functions are executing instead of some_function, and therefore it might reside in cache more often and result in fewer memory accesses. Additionally, since it would be shared with other functions, the entire program might use less memory overall, which improves performance.)
However, C programs do not directly specify what a computer must do (although some C implementations may be implemented that way or have switches to do it, or something near it). Per the C standard, a C program specifies an imaginary computation in an abstract machine. The job of a C compiler is to translate that computation into a program for a real machine. It is given great latitude to do so.
One thing this means is that if a compiler can see and sufficiently analyze enough of the source code to determine that the two versions of the function behave identically (in terms of observable behavior), it can translate them to identical code. (Observable behavior includes input and output interactions, access to volatile objects, and data written to files.) In that case, there is no performance difference.
If anything, the automatic version is easier for the compiler to analyze. It knows the automatic object will vanish (in the abstract machine) when the function ends. In both cases you clear the object at the start of the function, so the compiler (assuming knowledge about memset has been built into it) knows the object begins anew in this regard each time the function starts; even so, there are other ways the behavior could differ that a compiler writer has to worry about. For example, if the address of the static structure is taken, and especially if it is passed to any other routines, the compiler has to be concerned that the data in it might be used after the function returns, by other code that has retained its address. In contrast, for the automatic structure, the compiler may behave as if the automatic object is never used after the function returns because, in the abstract machine, it ceases to exist when the function returns. (Therefore, if any other code did retain its address, the use of that address is not defined by the C standard, and the compiler does not have to do anything for it.)
So, except in esoteric circumstances or mere happenstance of memory and cache behavior, we can generally expect the automatic version to be at least as good as the static version.
In general, write software to express what you need to—and only what you need to. If an object does not need to persist beyond the function’s lifetime, then leave it as an automatic object and do not make it static.
Note that it is often unnecessary to zero all of such a structure anyway, because:
The part of the structure that is used might be indicated with a length or a sentinel (such as a null character marking the end), and so no software will attempt to read any later portion, so there is no need to initialize it.
Or, if all of the structure will be read, then the software could be designed to fill in the non-zero part and then zero only the remaining part, instead of first zeroing the entire structure.

Filler at the end of a struct for future growth

This document called "Good Practices in Library Design, Implementation, and Maintenance" by Ulrich Drepper says (bottom of page 5):
[...] the type definition should always create at least a minimal
amount of padding to allow future growth
[...]
Second, a structure should contain at the end a certain number of fill bytes.
struct the_struct
{
int foo;
// ...and more fields
uintptr_t filler[8];
};
[...]
If at a later time a field has to be added to the structure the type definition can be changed to this:
struct the_struct
{
int foo;
// ...and more fields
union
{
some_type_t new_field;
uintptr_t filler[8];
} u;
};
I don't see the point of adding this filler at the end of the structure. Yes, it means that when adding a new field (new_field) to the structure, it doesn't actually grow. But isn't the whole point of adding new fields to a structure that you didn't know you were going to need them? In this example, what if you want to add not one field but 20? Should you then use a filler of 1k bytes just in case? Also, why is it important that the size of a struct doesn't change in subsequent versions of a library? If the library provides clean abstractions, that shouldn't matter, right? Finally, using a 64-byte filler (8 uintptr_t (yes, it's not necessarily 64 bytes)) sounds like a waste of memory...
The document doesn't go into the details of this at all. Would you have any explanation as to why this advice of "adding fillers at the end of a struct to plan for future growth" is a good one?
Depending on circumstances, yes, the size of the structure can be important for binary compatibility.
Consider stat(). It's typically called like this:
struct stat stbuf;
int r = stat(filename, &stbuf);
With this setup, if the size of the stat structure ever changes, every caller becomes invalid, and will need to be recompiled. If both the called and the calling code are part of the same project, that may not be a problem. But if (as in the case of stat(), which is a system call into the Unix/Linux kernel) there are lots and lots of callers out there, it's practically impossible to force them all to recompile, so the implication is that the size of the stat structure can never be changed.
This sort of problem mainly arises when the caller allocates (or inspects/manipulates) actual instances of the structure. If, on the other hand, the insides of the structure are only ever allocated and manipulated by library code -- if calling code deals only with pointers to the struct, and doesn't try to interpret the pointed-to structures -- it may not matter if the structure changes.
(Now, with all of that said, there are various other things that can be done to mitigate the issues if a struct has to change size. There are libraries where the caller allocates instances of a structure, but then passes both a pointer to the structure, and the size of the structure as the caller knows it, down into the library code. Newer library code can then detect a mismatch, and avoid setting or using newer fields which an older caller didn't allocate space for. And I believe gcc, at least, implements special hooks so that glibc can implement multiple versions of the same structure, and multiple versions of the library functions that use them, so that the correct library function can be used corresponding to the version of the structure that a particular caller is using. Going back to stat(), for example, under Linux there are at least two different versions of the stat structure, one which allocates 32 bits for the file size and one which allocates 64.)
But isn't the whole point of adding new fields to a structure that you didn't know you were going to need them?
Well yes, if you knew all along that you would need those members, then it would be counter-productive to intentionally omit them. But sometimes you indeed discover only later that you need some additional fields. Drepper's recommendations speak to ways to design your code -- specifically your structure definitions -- so that you can add members with the minimum possible side effects.
In this example, what if you want to add not one field but 20?
You don't start out saying "I'm going to want to add 20 members". Rather, you start out saying "I may later discover a need for some more members." That's a prudent position to take.
Should you then use a filler of 1k bytes just in case?
That's a judgment call. I reckon that a KB of extra space in the structure definition is probably overkill in most cases, but there might be a context where that's reasonable.
Also, why is it important that the size of a struct doesn't change in subsequent versions of a library? If the library provides clean abstractions, that shouldn't matter, right?
How important it is that the size remains constant is a subjective question, but the size is indeed relevant to binary compatibility for shared libraries. Specifically, the question is whether I can drop a new version of the shared lib in place of the old one, and expect existing programs to work with the new one without recompilation.
Technically, if the definition of the structure changes, even without its size changing, then the new definition is incompatible with the old one as far as the C language is concerned. In practice, however, with most C implementations, if the structure size is the same and the layout does not change except possibly within previously-unused space, then existing users will not notice the difference in many operations.
If the size does change, however, then
dynamic allocation of instances of the structure will not allocate the correct amount of space.
arrays of the structure will not be laid out correctly.
copying from one instance to another via memcpy() will not work correctly.
binary I/O involving instances of the structure will not transfer the correct number of bytes.
There are likely other things that could go wrong with a size change that would (again, in practice) be ok under conversion of some trailing padding into meaningful members.
Do note: one thing that might still be a problem if the structure members change without the overall size changing is passing structures to functions by value and (somewhat less so) receiving them as return values. A library making use of this approach to provide for binary compatibility would do well to avoid providing functions that do those things.
Finally, using a 64-byte filler (8 uintptr_t (yes, it's not necessarily 64 bytes)) sounds like a waste of memory...
In a situation in which those 64 bytes per structure is in fact a legitimate concern, then that might trump binary compatibility concerns. That would be the case if you anticipate a very large number of those structures to be in use at the same time, or if you are extremely memory-constrained. In many cases, however, the extra space is inconsequential, whereas the extra scope for binary compatibility afforded by including padding is quite valuable.
The document doesn't go into the details of this at all. Would you have any explanation as to why this advice of "adding fillers at the end of a struct to plan for future growth" is a good one?
Like most things, the recommendation needs to be evaluated relative to your particular context. In the foregoing, I've touched on most of the points you would want to consider in such an evaluation.

Using Structs in Functions

I have a function, and I'm accessing a struct's members many times in it.
What I was wondering about is what is the good practice to go about this?
For example:
struct s
{
int x;
int y;
}
and I have allocated memory for 10 objects of that struct using malloc.
So, whenever I need to use only one of the objects in a function, I usually create a pointer (or have one passed as an argument) and point it to the required object. (My superior told me to avoid array indexing because it adds a calculation when accessing any member of the struct.)
But is this the right way? I understand that dereferencing is not as expensive as creating a copy, but what if I'm dereferencing a number of times (like 20 to 30) in the function.
Would it be better if I created temporary variables for the struct's members (only the ones I need; I certainly don't use all of them), copied over the values, and then set the actual struct's values before returning?
Also, is this unnecessary micro optimization? Please note that this is for embedded devices.
This is for an embedded system. So, I can't make any assumptions about what the compiler will do. I can't make any assumptions about word size, or the number of registers, or the cost of accessing off the stack, because you didn't tell me what the architecture is. I used to do embedded code on 8080s when they were new...
OK, so what to do?
Pick a real section of code and code it up. Code it up each of the different ways you have listed above. Compile it. Find the compiler option that forces it to print out the assembly code that is produced. Compile each piece of code with every different set of optimization options. Grab the reference manual for the processor and count the cycles used by each case.
Now you will have real data on which to base a decision. Real data is much better than the opinions of a million highly experienced expert programmers. Sit down with your lead programmer and show him the code and the data. He may well show you better ways to code it. If so, recode it his way, compile it, and count the cycles used by his code. Show him how his way worked out.
At the very worst you will have spent a weekend learning something very important about the way your compiler works. You will have examined N ways to code things times M different sets of optimization options. You will have learned a lot about the instruction set of the machine. You will have learned how good, or bad, the compiler is. You will have had a chance to get to know your lead programmer better. And, you will have real data.
Real data is the kind of data that you must have to answer this question. Without that data, nothing anyone tells you is anything but an ego-based guess. Data answers the question.
Bob Pendleton
First of all, indexing an array is not very expensive (only like one operation more expensive than a pointer dereference, or sometimes none, depending on the situation).
Secondly, most compilers will perform what is called RVO or return value optimisation when returning structs by value. This is where the caller allocates space for the return value of the function it calls, and secretly passes the address of that memory to the function for it to use, and the effect is that no copies are made. It does this automatically, so
struct mystruct blah = func();
Only constructs one object, passes it to func for it to use transparently to the programmer, and no copying need be done.
What I do not know is if you assign an array index the return value of the function, like this:
someArray[0] = func();
will the compiler pass the address of someArray[0] and do RVO that way, or will it just not do that optimisation? You'll have to get a more experienced programmer to answer that. I would guess that the compiler is smart enough to do it though, but it's just a guess.
And yes, I would call it micro optimisation. But we're C programmers. And that's how we roll.
Generally, the case in which you want to make a copy of a passed struct in C is when you want to manipulate the data without the changes being reflected in the struct itself, only in the return value. As for which is more expensive, it depends on a lot of things, many of which change from implementation to implementation, so I would need more specific information to be more helpful. I would expect, though, that in an embedded environment your memory is at a greater premium than your processing power. Really, this reads like needless micro-optimization; your compiler should handle it.
In this case, creating a temporary variable on the stack will be faster. But if your structure is much bigger, you might be better off dereferencing.

Which functions in the C standard library commonly encourage bad practice? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
This is inspired by this question and the comments on one particular answer, in that I learnt that strncpy is not a very safe string handling function in C and that it pads with zeros until it reaches n, something I was unaware of.
Specifically, to quote R..
strncpy does not null-terminate, and does null-pad the whole remainder of the destination buffer, which is a huge waste of time. You can work around the former by adding your own null padding, but not the latter. It was never intended for use as a "safe string handling" function, but for working with fixed-size fields in Unix directory tables and database files. snprintf(dest, n, "%s", src) is the only correct "safe strcpy" in standard C, but it's likely to be a lot slower.
By the way, truncation in itself can be a major bug and in some cases might lead to privilege elevation or DoS, so throwing "safe" string functions that truncate their output at a problem is not a way to make it "safe" or "secure". Instead, you should ensure that the destination buffer is the right size and simply use strcpy (or better yet, memcpy if you already know the source string length).
And from Jonathan Leffler
Note that strncat() is even more confusing in its interface than strncpy() - what exactly is that length argument, again? It isn't what you'd expect based on what you supply strncpy() etc - so it is more error prone even than strncpy(). For copying strings around, I'm increasingly of the opinion that there is a strong argument that you only need memmove() because you always know all the sizes ahead of time and make sure there's enough space ahead of time. Use memmove() in preference to any of strcpy(), strcat(), strncpy(), strncat(), memcpy().
So, I'm clearly a little rusty on the C standard library. Therefore, I'd like to pose the question:
What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?
In the interests of objectivity, I have a number of criteria for an answer:
Please, if you can, cite design reasons behind the function in question i.e. its intended purpose.
Please highlight the misuse to which the code is currently put.
Please state why that misuse may lead towards a problem. I know that should be obvious but it prevents soft answers.
Please avoid:
Debates over naming conventions of functions (except where this unequivocally causes confusion).
"I prefer x over y" - preference is ok, we all have them but I'm interested in actual unexpected side effects and how to guard against them.
As this is likely to be considered subjective and has no definite answer I'm flagging for community wiki straight away.
I am also working as per C99.
What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies ?
I'm going to go with the obvious:
char *gets(char *s);
With its remarkable particularity that it's simply impossible to use it appropriately.
A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.
Also, strtok() is used by making subsequent calls to it, until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may induce some nasty surprises if strtok() is called from multiple threads at the same time.
The CERT C Secure Coding Standard lists many of these pitfalls you asked about.
In almost all cases, atoi() should not be used (this also applies to atof(), atol() and atoll()).
This is because these functions do not detect out-of-range errors at all - the standard simply says "If the value of the result cannot be represented, the behavior is undefined.". So the only time they can be safely used is if you can prove that the input will certainly be within range (for example, if you pass a string of length 4 or less to atoi(), it cannot be out of range).
Instead, use one of the strtol() family of functions.
Let us extend the question to interfaces in a broader sense.
errno:
technically it is not even clear what it is, a variable, a macro, an implicit function call? In practice on modern systems it is mostly a macro that transforms into a function call to have a thread specific error state. It is evil:
because it may cause overhead for the caller to access the value, to check the "error" (which might just be an exceptional event)
because in some places it even imposes that the caller clear this "variable" before making a library call
because it implements a simple error return by setting global state of the library.
The forthcoming standard gets the definition of errno a bit straighter, but these uglinesses remain.
There is often a strtok_r.
For realloc, if you need to use the old pointer, it's not that hard to use another variable. If your program fails with an allocation error, then cleaning up the old pointer is often not really necessary.
I would put printf and scanf pretty high up on this list. The fact that you have to get the formatting specifiers exactly correct makes these functions tricky to use and extremely easy to get wrong. It's also very hard to avoid buffer overruns when reading data out. Moreover, the "printf format string vulnerability" has probably caused countless security holes when well-intentioned programmers specify client-specified strings as the first argument to printf, only to find the stack smashed and security compromised many years down the line.
Any of the functions that manipulate global state, like gmtime() or localtime(). These functions simply can't be used safely in multiple threads.
EDIT: rand() is in the same category it would seem. At least there are no guarantees of thread-safety, and on my Linux system the man page warns that it is non-reentrant and non-threadsafe.
One of my bêtes noire is strtok(), because it is non-reentrant and because it hacks the string it is processing into pieces, inserting NUL at the end of each token it isolates. The problems with this are legion; it is distressingly often touted as a solution to a problem, but is as often a problem itself. Not always - it can be used safely. But only if you are careful. The same is true of most functions, with the notable exception of gets() which cannot be used safely.
There's already one answer about realloc, but I have a different take on it. A lot of time, I've seen people write realloc when they mean free; malloc - in other words, when they have a buffer full of trash that needs to change size before storing new data. This of course leads to potentially-large, cache-thrashing memcpy of trash that's about to be overwritten.
If used correctly with growing data (in a way that avoids worst-case O(n^2) performance for growing an object to size n, i.e. growing the buffer geometrically instead of linearly when you run out of space), realloc has doubtful benefit over simply doing your own new malloc, memcpy, and free cycle. The only way realloc can ever avoid doing this internally is when you're working with a single object at the top of the heap.
If you like to zero-fill new objects with calloc, it's easy to forget that realloc won't zero-fill the new part.
And finally, one more common use of realloc is to allocate more than you need, then resize the allocated object down to just the required size. But this can actually be harmful (additional allocation and memcpy) on implementations that strictly segregate chunks by size, and in other cases might increase fragmentation (by splitting off part of a large free chunk to store a new small object, instead of using an existing small free chunk).
I'm not sure if I'd say realloc encourages bad practice, but it's a function I'd watch out for.
How about the malloc family in general? The vast majority of large, long-lived programs I've seen use dynamic memory allocation all over the place as if it were free. Of course real-time developers know this is a myth, and careless use of dynamic allocation can lead to catastrophic blow-up of memory usage and/or fragmentation of address space to the point of memory exhaustion.
In some higher-level languages without machine-level pointers, dynamic allocation is not so bad because the implementation can move objects and defragment memory during the program's lifetime, as long as it can keep references to these objects up-to-date. A non-conventional C implementation could do this too, but working out the details is non-trivial and it would incur a very significant cost in all pointer dereferences and make pointers rather large, so for practical purposes, it's not possible in C.
My suspicion is that the correct solution is usually for long-lived programs to perform their small routine allocations as usual with malloc, but to keep large, long-lived data structures in a form where they can be reconstructed and replaced periodically to fight fragmentation, or as large malloc blocks containing a number of structures that make up a single large unit of data in the application (like a whole web page presentation in a browser), or on-disk with a fixed-size in-memory cache or memory-mapped files.
On a wholly different tack, I've never really understood the benefits of atan() when there is atan2(). The difference is that atan2() takes two arguments, and returns an angle anywhere in the range -π..+π. Further, it avoids divide by zero errors and loss of precision errors (dividing a very small number by a very large number, or vice versa). By contrast, the atan() function only returns a value in the range -π/2..+π/2, and you have to do the division beforehand (I don't recall a scenario where atan() could be used without there being a division, short of simply generating a table of arctangents). Providing 1.0 as the divisor for atan2() when given a simple value is not pushing the limits.
Another answer, since these are not really related, rand:
it is of unspecified random quality
it is not re-entrant
Some of these functions modify global state. (On Windows) this state is kept per thread - you can get unexpected results. For example, the first call of rand in every thread will give the same result, and it requires some care to make it pseudorandom but deterministic (for debugging purposes).
basename() and dirname() aren't threadsafe.

C memcpy() a function

Is there any method to calculate size of a function? I have a pointer to a function and I have to copy entire function using memcpy. I have to malloc some space and know 3rd parameter of memcpy - size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first class objects in C. Which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer though can satisfy all of this, and is a first class object. A function pointer is just a memory address and it usually has the same size as any other pointer on your machine.
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier. Pass data, not code, back and forth through a well-defined protocol over a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread your question as wanting to pass memcpy() to the kernel. You have to remember that it is a very special function. It is defined in the C standard, quite simple, and of quite broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
Even if there was a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler has local or long jumps to specific memory locations. You can't just move a function in memory and expect it to run. The OS can do that but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma indicating that the code is relocatable. All memory references need to be relative to its own stack frame (i.e. local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If a linker was involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume you have a Harvard-architecture microcontroller, where code (in other words, "functions") is located in ROM. In that case you cannot do it at all.
I also know several compilers and linkers that optimize at the file level (not only the function level). This results in opcode where parts of different C functions are interleaved.
The only way which I consider as possible may be:
Generate opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into an C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
But apart from this: did you consider redesigning your software so that you do not need to copy a function's content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and your function doesn't do anything fancy (no calls to other functions, no access to data outside the function), you might even get away with it. The safest possibility I can think of is to limit the maximum supported function size to, say, one kilobyte, transfer that much, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can blow up if the searched-for sequence appears too early, e.g. in data embedded within the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything I wrote, since you are apparently approaching the problem in the wrong and inherently unsafe way. Paint us a larger picture, please: WHY are you trying to do this? Then we can see whether an entirely different approach would work.
A similar discussion took place here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though since they don't move).
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, that file descriptor would almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read; it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, that can be made available to the userspace process when it calls read on the readable file descriptor.
A function isn't just an object you can copy. What about cross-references, symbols, and so on? Of course you can take something like the standard Linux binutils package and torture your binaries, but is that what you want?
By the way, if you are simply trying to replace the memcpy() implementation, look into the LD_PRELOAD mechanism.
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be placed in its own section. This is compiler- and linker-dependent, and you'll also need to use relative addressing if you call between the functions being copied. For those asking why you would do this: it's a common requirement in embedded systems that need to update their running code.
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it worked in the target environment (e.g., it could break with changes to compiler options, compiler version, code refactoring, etc.). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA, where I copied some low-level render functions from flash (16-bit access, slowish memory) to the high-speed working RAM (32-bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy: size = (int)(NextFuncPtr - SourceFuncPtr). This worked well, but obviously can't be guaranteed on all platforms (it does not work on Windows, for sure).
I think one solution could be as follows.
For example, if you want to know the size of func() in program a.c, place indicator symbols before and after the function.
Try writing a Perl script that compiles this file into object format (cc -c), making sure the indicators survive into the object file; you need them later to calculate the size.
Now search for your two indicators and find the code size in between.