I'm trying to optimize a C code project.
I would like to count how many times each global variable is used (read or written) so that I can place it in the most suitable type of memory.
For example, commonly used variables could be stored in fast-access memory.
The data cache is disabled to keep timing deterministic.
Is there a way to count how many times a variable is used without inserting counters or adding extra code? For example, by using the assembly code?
The code is written in C.
In my possession:
A) A (.map) file, generated by the GCC toolchain, from which I extract the global variable names, addresses and sizes.
B) The assembly code of the project, generated using GCC's -S flag.
Thanks a lot,
GDB has something called watchpoints: https://sourceware.org/gdb/onlinedocs/gdb/Set-Watchpoints.html
Set a watchpoint for an expression. GDB will break when the expression
expr is written into by the program and its value changes. The
simplest (and the most popular) use of this command is to watch the
value of a single variable:
(gdb) watch foo
awatch [-l|-location] expr [thread thread-id] [mask maskvalue]
Set a watchpoint that will break when expr is either read from or written
into by the program.
Commands can be attached to watchpoints: https://sourceware.org/gdb/onlinedocs/gdb/Break-Commands.html#Break-Commands
You can give any breakpoint (or watchpoint or catchpoint) a series of commands to execute when your program stops due to that breakpoint… For example, here is how you could use breakpoint commands to print
the value of x at entry to foo whenever x is positive.
break foo if x>0
commands
silent
printf "x is %d\n",x
cont
end
The command should typically increment a variable or print "read/write" to a file, but you can really add other things too, such as a backtrace. I am not sure about the best way to get the results out of gdb; maybe it is good enough for you to run it in interactive mode.
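For instance, a minimal sketch of the counting idea in gdb (the variable name foo and the convenience variable $count are placeholders for your own names):
set $count = 0
awatch foo
commands
silent
set $count = $count + 1
cont
end
After the program has run, print $count shows how many read/write accesses gdb observed. Note that this counts accesses at run time, not occurrences in the source.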
You can do this using Visual Studio (or another IDE): search for all the places where your variable is used in the source code, put a conditional breakpoint at each one that logs some information, attach to the process, and exercise the features which use that variable. You can then count the instances in the output window.
I think what you need is automatic instrumentation and/or profiling. GCC can actually do profile-guided optimization for you, as well as other types of instrumentation; the documentation even mentions a hook for implementing your own custom instrumentation.
There are several performance analysis tools out there, such as the perf and gprof profilers.
Also, execution inside a virtual machine could (at least in theory) do what you are after; valgrind comes to mind. I think valgrind actually knows about all memory accesses, so I'd look for ways to obtain this information (and then correlate it with the map files).
I don't know if any of the above tools solves exactly your problem, but you definitely could use, say, perf (if it's available for your platform) to see in what areas of code significant time is spent. Then probably there are either a lot of expensive memory accesses, or just intensive computations, you can figure out which is the case by staring at the code.
Note that the compiler already allocates frequently accessed variables to registers, so the kind of information you are after won't give you an accurate picture. That is, while some variable might be accessed a lot, allocating it to fast memory might not improve things much if its value already lives in a register most of the time.
Also consider that optimization changes your program greatly at the assembly level, so any statistics such as memory access counts will differ with and without optimization, and it is the optimized case that should interest you. On the other hand, working out which location corresponds to which variable is harder with the optimized program, if feasible at all.
Of course, symbol and type information of each variable defined in a C/C++ program is available, otherwise the debuggers could not show them. But how to access this information?
A lot of info about the ELF is available, but that is about linking and seems to cover only global variables, not local ones on the stack.
In a remote real time system (not under unix), I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
The best would be that the dump could be introduced at any time for any variable without the need to add some statements in the code upfront.
But how to access this information?
TL;DR: it's complicated.
You would need to build almost a complete debugger. You can watch this space. When the author gets around to step 9, you'll have an example to follow.
I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
RT systems do not usually lend themselves to easy debugging. The best you could probably do is take a snapshot of the entire (used portion of the) stack, and "fish out" variable values later.
To do that, you'll need to know the current values of the stack pointer and instruction pointer, the contents of the stack, and the load addresses of all ELF objects. And you'll need to re-implement a large part of a debugger (or modify an existing one).
The easiest approach might be to convert (post-process) the above info into an ELF core, and then use an existing debugger of your choice to analyse the values. You can look at Google's user-space coredumper to see what's involved. See also this answer.
In a C program that doesn't use recursion, it should be possible in theory to work out the maximum/worst case stack size needed to call a given function, and anything that it calls. Are there any free, open source tools that can do this, either from the source code or compiled ELF files?
Alternatively, is there a way to extract a function's stack frame size from an ELF file, so I can try to work it out manually?
I'm compiling for the MSP430 using MSPGCC 3.2.3 (I know it's an old version, but I have to use it in this case). The stack space to allocate is set in the source code, and should be as small as possible so that the rest of memory can be used for other things. I have read that you need to take account of the stack space used by interrupts, but the system I'm using already takes account of this - I'm trying to work out how much extra space to add on top of that. Also, I've read that function pointers make this difficult. In the few places where function pointers are used here, I know which functions they can call, so could take account of these cases manually if the stack space needed for the called functions and the calling functions was known.
Static analysis seems like a more robust option than stack painting at runtime, but working it out at runtime is an option if there's no good way to do it statically.
Edit:
I found GCC's -fstack-usage flag, which saves the frame size for each function as it is compiled. Unfortunately, MSPGCC doesn't support it. But it could be useful for anyone who is trying to do something similar on a different platform.
While static analysis is the best method for determining maximum stack usage, you may have to resort to an experimental method. This method cannot guarantee an absolute maximum, but it can give you a very good idea of your stack usage.
You can check your linker script to get the location of __STACK_END and __STACK_SIZE. You can use these to fill the stack space with an easily recognizable pattern like 0xDEAD or 0xAA55. Run your code through a torture test to try to make sure as many interrupts as possible are generated.
After the test you can examine the stack space to see how much of the stack was overwritten.
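A minimal sketch of that idea in C. The linker symbol __STACK_END, the fill pattern, a stack that grows downward toward __STACK_END, and the use of a local variable's address to approximate the current stack pointer are all assumptions to adapt to your linker script and target:

/* Hypothetical linker symbol marking the lowest address of the stack
 * region (name and direction of growth are assumptions). */
extern unsigned char __STACK_END;

#define STACK_FILL 0xAAu

/* Call as early as possible: paint everything between the bottom of the
 * stack region and (a bit below) the current stack pointer. */
void stack_paint(void)
{
    volatile unsigned char marker;               /* its address approximates the SP */
    unsigned char *p = &__STACK_END;
    while (p < (unsigned char *)&marker - 16)    /* small safety margin */
        *p++ = STACK_FILL;
}

/* Call after the torture test: bytes still holding the pattern were
 * never used by the program or its interrupts. */
unsigned long stack_unused_bytes(void)
{
    const unsigned char *p = &__STACK_END;
    unsigned long n = 0;
    while (*p == STACK_FILL) {
        n++;
        p++;
    }
    return n;
}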
Interesting question.
I would expect this information to be statically available in the debugging data included in debug builds.
I had a brief look at the DWARF standard, and it does specify two attributes for functions, DW_AT_frame_base and DW_AT_static_link, which can be used to compute "the frame base of the relevant instance of the subroutine that immediately encloses the subroutine or entry point".
I think that the only way to go is static analysis. You need to account for the space of all non-static local variables, which are often going to be pointers, but pointers that are stored on the stack anyway. You also need to reserve space for the return address within the caller, as it is pushed on the stack by the compiler so control can return to the caller after your function returns, and you need space for all your function parameters.
Based on that, if you have a tool able to count all parameters and auto variables and figure out their sizes, you should be able to calculate the minimum stack frame size you'll need.
Please note that the compiler may also align values on the stack for your particular architecture, which could make the stack space requirements a little bigger than what you'd expect from this calculation.
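As a rough illustration of that tally (the byte counts assume a hypothetical 32-bit ABI where arguments are passed on the stack; real ABIs differ and often pass arguments in registers):

void example(int a, char *p)   /* parameters: 4 + 4 bytes if passed on the stack */
{
    char buf[10];              /* locals: 10 bytes, likely padded to 12 or 16    */
    int i;                     /* + 4 bytes                                      */

    for (i = 0; i < 10; i++)   /* body only so the variables are actually used   */
        buf[i] = (char)(a + i);
    *p = buf[0];
}
/* + return address (4) + saved frame pointer (4) + alignment padding:
   roughly 28-32 bytes for this frame under the assumed ABI             */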
Some embedded IDEs can give information on stack usage during runtime.
I know that IAR Embedded Workbench supports it.
Be aware that you need to take into account that interrupts occur asynchronously, so take the biggest stack usage scenario and add the interrupt context to it. If nested interrupts are supported, as on ARM processors, you need to take this into account as well.
TinyOS has some work done on stack size analysis. It is described here:
http://tinyos.stanford.edu/tinyos-wiki/index.php/Stack_Analysis
They only support AVR, but say that "MSP430 is not difficult to support but this is not super high priority". In any case, the page provides lots of resources.
Is there any method to calculate the size of a function? I have a pointer to a function and I have to copy the entire function using memcpy. I have to malloc some space and know the third parameter of memcpy: the size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first-class objects in C, which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer, though, satisfies all of this and is a first-class object. A function pointer is just a memory address, and it usually has the same size as any other pointer on your machine.
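A small example of the distinction (nothing here is specific to the question; it only shows that the pointer, not the function, is what you can copy around):

#include <stdio.h>

static int add(int a, int b) { return a + b; }

/* the pointer can be passed to and returned from functions... */
static int apply(int (*op)(int, int), int x, int y) { return op(x, y); }

int main(void)
{
    int (*fp)(int, int) = add;        /* ...and copied like any other value */
    printf("%d\n", apply(fp, 2, 3));  /* prints 5 */
    printf("%zu\n", sizeof fp);       /* size of a pointer, not of add()    */
    return 0;
}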
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier: pass data, not code, back and forth through a well-defined protocol over a char device. If you really need to pass code, just wrap it up in a kernel module; you can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you wanted to pass memcpy() to the kernel. Bear in mind that it is a very special function: it is defined in the C standard, is quite simple, and has quite a broad scope, so it is a perfect target to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not impede your ability to take a pointer to it.
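For instance, even when GCC expands memcpy calls inline as a built-in, taking its address still gives you an ordinary function pointer to the library routine (a small sanity check, not specific to any platform):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* the direct call below may be expanded inline by the compiler... */
    char src[] = "hello", dst[sizeof src];
    memcpy(dst, src, sizeof src);

    /* ...but the address-of form always refers to the real function */
    void *(*fp)(void *, const void *, size_t) = memcpy;
    fp(dst, src, sizeof src);

    printf("%s\n", dst);
    return 0;
}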
Even if there were a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler emitted local or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that, but it has all the information needed to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (i.e. local variables) or include some sort of segment+offset structure so that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If a linker was involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume you have a Harvard-architecture microcontroller, where code (in other words, "functions") is located in ROM. In this case you cannot do that at all.
Also, I know several compilers and linkers which optimize at the file level (not only at the function level). This results in opcode where parts of different C functions are mixed into each other.
The only way I consider possible is:
Generate the opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into a C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
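A hedged sketch of those steps on a hosted system (x86/x86-64 Linux is assumed; the single 0xC3 byte, i.e. "ret", is only a placeholder for opcode you would generate separately, and mmap is used because ordinary data memory is usually not executable there):

#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

typedef void (*func_t)(void);

int main(void)
{
    static const uint8_t opcode[] = { 0xC3 };   /* placeholder: just "ret" */

    /* get a writable AND executable buffer for the copied code */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    memcpy(buf, opcode, sizeof opcode);         /* "install" the opcode */
    ((func_t)buf)();                            /* call it through a function pointer */

    puts("returned from the copied code");
    return 0;
}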
But apart from this: did you consider a redesign of your software, so that you do not need to copy a function's content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and your function doesn't do anything fancy (no calls to other functions, no access to data outside the function), you might even get away with it. I'd say the safest possibility is to limit the maximum size of a supported function to, say, one kilobyte, just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
:your_func_epilogue
mov esp, ebp
pop ebp
ret
:end_of_func
;expect a varying length run of NOPs here
:next_func_prologue
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can fail if the searched-for sequence pops up too early, e.g. in data embedded in the function. Searching for the next prologue might also get you into trouble, I think.
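A hedged sketch of that search in C, assuming unoptimized 32-bit x86 code with a frame-pointer epilogue ending in "pop ebp; ret" (0x5D 0xC3); as noted above, the byte pattern can also occur in operands or embedded data, so treat this as a heuristic only:

#include <stddef.h>

/* Scan forward from the start of a function for the first "pop ebp; ret"
 * sequence and treat everything up to and including it as the function. */
size_t guess_function_size(const unsigned char *start)
{
    const unsigned char *p = start;
    while (!(p[0] == 0x5D && p[1] == 0xC3))   /* 0x5D = pop ebp, 0xC3 = ret */
        p++;
    return (size_t)(p - start) + 2;
}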
Now please ignore everything that I wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture, please: why are you trying to do this? Then we can see whether we can figure out an entirely different approach.
A similar discussion was done here:
http://www.motherboardpoint.com/getting-code-size-function-c-t95049.html
They propose creating a dummy function right after the function to be copied, and then getting pointers to both. But you need to switch off compiler optimizations for this to work.
If you have GCC >= 4.4, you could try switching off optimizations for your function in particular using a #pragma:
http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html#Function-Specific-Option-Pragmas
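A hedged sketch combining those two suggestions. It relies entirely on the compiler and linker keeping the two functions adjacent and in source order, which is exactly why optimizations have to be reined in; __attribute__((optimize("O0"))) is the GCC >= 4.4 per-function attribute form of that pragma:

#include <stdio.h>
#include <string.h>

__attribute__((noinline, optimize("O0")))
static void func_to_copy(void)
{
    puts("hello from func_to_copy");
}

__attribute__((noinline, optimize("O0")))
static void end_marker(void)   /* dummy function placed right after it */
{
}

int main(void)
{
    /* assumes the functions end up adjacent and in this order in memory */
    size_t size = (size_t)((char *)end_marker - (char *)func_to_copy);
    printf("approximate size of func_to_copy: %zu bytes\n", size);

    char buf[512];
    if (size <= sizeof buf)
        memcpy(buf, (void *)func_to_copy, size);   /* copying the bytes... */
    /* ...but, as other answers explain, you generally cannot call the copy */
    return 0;
}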
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something which can't be invoked if your code isn't compiled to be relocatable (i.e. all addresses in the code must be relative, for example branches; globals work, though, since they don't move).
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor would almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read; it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, that can be made available to the userspace process when it calls read on the readable file descriptor.
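A minimal userspace sketch of that pattern (the device path /dev/mydriver and the idea that a read returns a small driver-defined event record are placeholders):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/mydriver", O_RDONLY);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    for (;;) {
        if (poll(&pfd, 1, -1) < 0) { perror("poll"); break; }   /* block until ready */
        if (pfd.revents & POLLIN) {
            char event[64];
            ssize_t n = read(fd, event, sizeof event);   /* driver-defined payload */
            if (n <= 0)
                break;
            printf("async job finished, %zd bytes of event data\n", n);
        }
    }
    close(fd);
    return 0;
}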
A function isn't just an object you can copy. What about cross-references, symbols, and so on? Of course you can take something like the standard Linux binutils package and torture your binaries, but is that what you want?
By the way, if you are simply trying to replace the memcpy() implementation, look into the LD_PRELOAD mechanism.
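For example, a hedged sketch of an LD_PRELOAD interposer for memcpy (the instrumentation itself is left as a placeholder):

/* Build (roughly):  gcc -shared -fPIC -o memcpy_wrap.so memcpy_wrap.c -ldl
 * Run:              LD_PRELOAD=./memcpy_wrap.so ./your_program            */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

void *memcpy(void *dst, const void *src, size_t n)
{
    static void *(*real_memcpy)(void *, const void *, size_t);
    if (!real_memcpy)   /* look up the "real" memcpy the first time through */
        real_memcpy = (void *(*)(void *, const void *, size_t))
                          dlsym(RTLD_NEXT, "memcpy");

    /* ...counting or logging could go here... */

    return real_memcpy(dst, src, n);
}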
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be placed in its own section. This is compiler- and linker-dependent, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this, it's a common requirement in embedded systems that need to update the running code.
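A hedged sketch of that arrangement with GCC-style attributes. The section name .ramfunc and the linker-provided symbols are placeholders; the matching linker-script changes and when to run the copy depend entirely on your toolchain:

#include <string.h>

/* Place the function(s) to be copied in their own section. */
__attribute__((section(".ramfunc"), noinline))
void fast_routine(void)
{
    /* time-critical code that will execute from RAM */
}

/* Hypothetical symbols defined by the linker script: the section's load
 * address (in flash), its run address (in RAM), and its size. */
extern unsigned char __ramfunc_load_start__;
extern unsigned char __ramfunc_run_start__;
extern unsigned char __ramfunc_size__;

/* Called once at startup, before fast_routine() is ever used. */
void copy_ramfuncs(void)
{
    memcpy(&__ramfunc_run_start__, &__ramfunc_load_start__,
           (size_t)&__ramfunc_size__);
}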
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA, where I copied some low-level render functions from flash (16-bit access, slowish memory) to the high-speed working RAM (32-bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy: size = (int)(NextFuncPtr - SourceFuncPtr). This worked well, but obviously it can't be guaranteed on all platforms (it does not work on Windows, for sure).
I think one solution could be as below.
For example, if you want to know the size of func() in program a.c, place marker symbols before and after the function.
Try writing a perl script which compiles this file into object format (cc -c), making sure that the marker statements are not removed; you need them later to calculate the size from the object file.
Now search for your two markers and work out the code size in between.
Is there a way to know and output the stack size needed by a function at compile time, in C?
Here is what I would like to know:
Let's take some function:
void foo(int a) {
    char c[5];
    char *s;
    //do something
    return;
}
When compiling this function, I would like to know how much stack space it will consume when it is called. This might be useful to detect the on-stack declaration of a structure hiding a big buffer.
I am looking for something that would print something like this:
file foo.c : function foo stack usage is n bytes
Is there a way to know that without looking at the generated assembly? Or a limit that can be set for the compiler?
Update: I am not trying to avoid runtime stack overflow for a given process. I am looking for a way to find out, before runtime, whether a function's stack usage, as determined by the compiler, is available as an output of the compilation process.
Let's put it another way: is it possible to know the size of all the objects local to a function? I guess compiler optimization won't be my friend, because some variables will disappear, but an upper limit is fine.
Linux kernel code runs on a 4K stack on x86, hence they care. What they use to check that is a perl script they wrote, which you can find as scripts/checkstack.pl in a recent kernel tarball (2.6.25 has it). It runs on the output of objdump; usage documentation is in the initial comment.
I think I already used it for user-space binaries ages ago, and if you know a bit of perl programming, it's easy to fix if it is broken.
Anyway, what it basically does is look automatically at GCC's output. The fact that kernel hackers wrote such a tool means that there is no static way to do it with GCC (or maybe it was added very recently, but I doubt it).
By the way, with objdump from the MinGW project and ActivePerl, or with Cygwin, you should be able to do this on Windows as well, and also on binaries obtained with other compilers.
StackAnalyzer seems to examine the executable code itself plus some debugging info.
What is described by this reply is what I am looking for; StackAnalyzer looks like overkill to me.
Something similar to what exists for Ada would be fine. Look at this section from the GNAT manual:
22.2 Static Stack Usage Analysis
A unit compiled with -fstack-usage will generate an extra file that specifies the maximum amount of stack used, on a per-function basis. The file has the same basename as the target object file with a .su extension. Each line of this file is made up of three fields:
* The name of the function.
* A number of bytes.
* One or more qualifiers: static, dynamic, bounded.
The second field corresponds to the size of the known part of the function frame.
The qualifier static means that the function frame size is purely static. It usually means that all local variables have a static size. In this case, the second field is a reliable measure of the function stack utilization.
The qualifier dynamic means that the function frame size is not static. It happens mainly when some local variables have a dynamic size. When this qualifier appears alone, the second field is not a reliable measure of the function stack analysis. When it is qualified with bounded, it means that the second field is a reliable maximum of the function stack utilization.
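The same -fstack-usage option is available in plain GCC for C, where the target supports it. Purely for illustration, the command and the .su output look roughly like this (file names, numbers, and the exact line format vary by GCC version and target):

$ gcc -c -fstack-usage foo.c
$ cat foo.su
foo.c:1:6:foo	32	static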
I don't see why a static code analysis couldn't give a good enough figure for this.
It's trivial to find all the local variables in any given function, and the size of each variable can be found either from the C standard (for built-in types) or by calculating it (for complex types like structs and unions).
Sure, the answer can't be guaranteed to be 100% accurate, since the compiler can do various sorts of optimizations, such as padding, putting variables in registers, or removing unnecessary variables completely. But any answer it gives should at least be a good estimate.
I did a quick google search and found StackAnalyzer but my guess is that other static code analysis tools have similar capabilities.
If you want a 100% accurate figure, then you'd have to look at the output from the compiler or check it during runtime (like Ralph suggested in his reply)
Only the compiler really knows, since it is the one that puts all your stuff together. You'd have to look at the generated assembly and see how much space is reserved in the prologue, but that doesn't really account for things like alloca, which does its thing at runtime.
Assuming you're on an embedded platform, you might find that your toolchain has a go at this. Good commercial embedded compilers (like, for example, the ARM/Keil compiler) often produce reports of stack usage.
Of course, interrupts and recursion are usually a bit beyond them, but it gives you a rough idea if someone has committed some terrible screw-up with a multi megabyte buffer on the stack somewhere.
Not exactly "compile time", but I would do this as a post-build step:
let the linker create a map file for you
for each function in the map file read the corresponding part of the executable, and analyse the function prologue.
This is similar to what StackAnalyzer does, but a lot simpler. I think analysing the executable or the disassembly is the easiest way to get at the compiler's output. While the compiler knows those things internally, I am afraid you will not be able to get them out of it (you might ask the compiler vendor to implement the functionality, or, if you are using an open source compiler, you could do it yourself or have someone do it for you).
To implement this you need to:
be able to parse the map file
understand the format of the executable
know what a function prologue can look like and be able to "decode" it (a rough sketch follows below)
How easy or difficult this would be depends on your target platform. (Embedded? Which CPU architecture? What compiler?)
All of this can definitely be done on x86/Win32, but if you have never done anything like this and have to create all of it from scratch, it can take a few days before you are done and have something working.
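As a very rough sketch of the prologue-decoding step for 32-bit x86 (the assumed encodings are push ebp = 0x55, mov ebp,esp = 0x8B 0xEC or 0x89 0xE5, sub esp,imm8 = 0x83 0xEC ib, sub esp,imm32 = 0x81 0xEC id; real prologues vary with compiler and options, so treat this only as a starting point):

#include <stdint.h>

/* Decode a typical frame-pointer prologue and return the number of bytes
 * reserved for locals, or -1 if the prologue isn't recognized. */
long prologue_locals_size(const uint8_t *p)
{
    if (*p++ != 0x55)                                   /* push ebp        */
        return -1;
    if ((p[0] == 0x8B && p[1] == 0xEC) ||
        (p[0] == 0x89 && p[1] == 0xE5))                 /* mov ebp, esp    */
        p += 2;
    else
        return -1;
    if (p[0] == 0x83 && p[1] == 0xEC)                   /* sub esp, imm8   */
        return p[2];
    if (p[0] == 0x81 && p[1] == 0xEC)                   /* sub esp, imm32  */
        return (long)(p[2] | ((uint32_t)p[3] << 8) |
                      ((uint32_t)p[4] << 16) | ((uint32_t)p[5] << 24));
    return 0;                                           /* no locals found */
}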
Not in general. The halting problem in theoretical computer science suggests that you can't even predict whether a general program halts on a given input. Calculating the stack used for a program run would, in general, be even more complicated. So: no. Maybe in special cases.
Say you have a recursive function whose recursion depth depends on the input, which can be of arbitrary length; then you are already out of luck.
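For instance, no compile-time bound exists for a function like this one, whose stack consumption depends on its run-time input (at least when the compiler does not manage to turn the recursion into a loop):

/* one stack frame per character in the input string */
unsigned long count_chars(const char *s)
{
    if (*s == '\0')
        return 0;
    return 1 + count_chars(s + 1);
}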