Is there a tool to see where I have spills in my C code?
I mean a way to see which block of code potentially makes a register get moved to memory.
EDIT: what is a spill:
In the process of compiling your code, at some point the compiler has to do register allocation. It builds an interference graph ("variables" are nodes, and two nodes are connected if they are alive at the same time). From this point a graph-coloring pass runs: for each variable it assigns a register that won't interfere with other variables. If there are not enough registers to color the graph, the coloring fails and a variable (register) is spilled (moved to memory).
From a software engineering point of view, this means you should always minimize a variable's live range so you can minimize the chance of having a spill.
When you want to optimize code you should look for those kinds of things, since a spill costs extra time to read/write memory. I was looking for a tool or a compiler flag that could tell me where the spills are so I can optimize.
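To make the description of a spill concrete, here is a minimal sketch of a greedy coloring over an interference graph. It is only an illustration: real compilers use more elaborate allocators, and the names color_graph, MAX_VARS, MAX_REGS and SPILLED are made up for this example.

```c
#include <stdbool.h>

#define MAX_VARS 8      /* hypothetical upper bound on variables */
#define MAX_REGS 8      /* hypothetical upper bound on registers */
#define SPILLED  (-1)   /* marker: variable lives in memory */

/*
 * Greedy coloring of an interference graph.
 * interferes[a][b] is true when variables a and b are live at the same time.
 * reg_of[v] receives a register number, or SPILLED when no register is free.
 * Returns the number of spilled variables.
 * Assumes n <= MAX_VARS and num_regs <= MAX_REGS.
 */
int color_graph(int n, bool interferes[MAX_VARS][MAX_VARS],
                int num_regs, int reg_of[MAX_VARS])
{
    int spills = 0;

    for (int v = 0; v < n; v++) {
        bool taken[MAX_REGS] = { false };

        /* Mark registers held by already-colored neighbors. */
        for (int u = 0; u < v; u++) {
            if (interferes[v][u] && reg_of[u] != SPILLED)
                taken[reg_of[u]] = true;
        }

        /* Pick the lowest free register, if there is one. */
        reg_of[v] = SPILLED;
        for (int r = 0; r < num_regs; r++) {
            if (!taken[r]) {
                reg_of[v] = r;
                break;
            }
        }
        if (reg_of[v] == SPILLED)
            spills++;
    }
    return spills;
}
```

With three variables that are all live at the same time and only two registers, the third variable has nowhere to go and is spilled. Shortening live ranges removes edges from the graph, which is why it reduces spills.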
I'm aware of no such tool.
Because decisions about spills vary from compiler to compiler, from version to version of a compiler, and even with the settings within a given version of a given compiler, any such tool would have to be tightly coupled to a compiler and would likely only support one.
On the other hand, you can always look at the generated assembly yourself and see if a given variable is spilled or not.
Generally either disassemble or compile to assembler instead of an object.
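As a small sketch of that workflow, assuming gcc and a file called many_live.c (both the file name and the function are invented for this example), you could compile something like the following to assembler and read the result. Whether anything actually spills depends on the target, the compiler version and the optimization level.

```c
/*
 * Example function with several values live at the same time.
 * Assumed commands (gcc and binutils):
 *   gcc -O2 -S many_live.c -o many_live.s    compile to assembler
 *   gcc -O2 -c many_live.c && objdump -d many_live.o    or disassemble
 * In the output, stores to and reloads from stack-relative addresses
 * inside the function body are the candidates for spills.
 */
long many_live(const long *p)
{
    long a = p[0], b = p[1], c = p[2], d = p[3];
    long e = p[4], f = p[5], g = p[6], h = p[7];

    /* All eight values are needed by both expressions. */
    long sum  = a + b + c + d + e + f + g + h;
    long prod = (a * h) + (b * g) + (c * f) + (d * e);

    return sum * prod;
}
```

On a target with few registers (32-bit x86, for instance) this kind of code is more likely to show stack traffic than on one with many.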
For specific compilers like gcc and llvm (where you have the source and can easily re-build the compiler), modify the compiler to print some sort of output to indicate how many times it had to spill, as you call it, to memory. Perhaps as you find the register allocation routine, you may find that the compiler already has such output. Personally I just disassemble or compile to assembler.
A generic assembler analysis tool is possible, but is it worth the effort? You would want to know where function/optimization boundaries are. You would want to distinguish volatile variables, or hardware registers where the write to ram was intentional. You could just look for stack based writes only. Or look for cases where there is a write to the stack that is not a push, where the register is destroyed on the next instruction. Actually it would be pretty easy to search for writes to a stack pointer relative address, with the next instruction destroying the register, with that stack based relative address being read back in a relatively nearby execution path where the stack frame has not been cleaned up in that execution path. Do I know of such a tool? Nope.
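The first step of such a search could be sketched roughly like this. It assumes x86-64 assembly in AT&T syntax (as emitted by gcc -S), where the destination is the last operand; is_stack_store and last_operand are hypothetical helper names. It only flags candidate lines and does not do the follow-up checks (register destroyed next, value read back nearby) described above.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return the text after the last comma that is outside parentheses,
 * i.e. the destination operand in AT&T syntax, or NULL if there is none.
 */
static const char *last_operand(const char *s)
{
    const char *last = NULL;
    int depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '(')
            depth++;
        else if (*s == ')')
            depth--;
        else if (*s == ',' && depth == 0)
            last = s + 1;
    }
    return last;
}

/*
 * True when the line is a mov-family instruction whose destination is
 * addressed relative to %rsp or %rbp. Pushes are deliberately ignored.
 */
bool is_stack_store(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;
    if (strncmp(line, "mov", 3) != 0)
        return false;

    const char *dst = last_operand(line);
    if (dst == NULL)
        return false;

    return strstr(dst, "(%rsp") != NULL || strstr(dst, "(%rbp") != NULL;
}
```

Run over unoptimized output this flags nearly every local variable, so it is only meaningful on optimized code, and even there it cannot tell a spill from an ordinary stack variable without the extra analysis.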
During startup tests I am required to test all RAM locations using a galpat test. I have written the function to do this but have run into a problem: the function's variables exist in RAM and therefore get overwritten as part of the test.
What would be the best way around this?
A possible approach is sketched below. .data and .bss are something you can avoid, but you need to take special care of the processor stack, because there is no easy way to have C work without a proper stack:
Write your code so that it exclusively uses ROM code and stack variables.
Have your startup code (which would normally be written in assembler anyhow) allocate the stack in the upper half of memory, and test the lower half.
Move (copy) the stack from upper memory to the already tested area (can be done in C).
Reset the stack pointer to point to the copied stack (involves assembler coding).
Do the rest of the memtest (can be done in C again).
(This assumes your code runs from ROM, which it would normally do at such an early point in the start-up.) In case there is any memory failure in the area where you allocate the stack, your code will simply crash (what it does before that, reactor meltdown and so on, depends on the application).
When moving the stack around outside C's control, you should be very careful what you actually store there - Pointers to stack variables will become invalid - or rather undefined - once you've moved the stack. Simple scalar variables and pointers to outside the stack as typically used in a RAM test should work, however.
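As a rough sketch of the "test one region while the stack lives elsewhere" idea, the routine below tests a range using only scalar locals and pointers into the range under test. Note that this is a simple pattern test, not a galpat test, and test_range is a made-up name; on a real target the range bounds would come from your memory map.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Destructively test the words in [start, end) with a pattern and its
 * complement. Uses only scalar locals, so nothing in the range under
 * test is needed by the routine itself (assuming the stack is elsewhere).
 * Returns NULL on success, or the address of the first failing word.
 */
volatile uint32_t *test_range(volatile uint32_t *start,
                              volatile uint32_t *end)
{
    volatile uint32_t *p;
    uint32_t pattern = 0xAAAAAAAAu;

    for (int pass = 0; pass < 2; pass++) {
        /* Fill the whole range first, then verify it. */
        for (p = start; p < end; p++)
            *p = pattern;
        for (p = start; p < end; p++) {
            if (*p != pattern)
                return p;
        }
        pattern = ~pattern;   /* second pass uses 0x55555555 */
    }
    return NULL;
}
```

Calling it once for the lower half and once for the upper half, with the stack moved in between as described in the steps above, covers the whole RAM without the test overwriting its own stack.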
You could try and declare your variables as register to try and keep RAM usage as low as possible - But you can't force C to put certain variables into registers, and a good C compiler will put them there anyhow.
Whether this is any better than writing the whole memtest in assembler (you'd need to do the stack adjustments in assembler anyhow, as there is no means to move the processor stack around in C) I dare to challenge. I don't see much of a point here using C on this low level, especially as assembler could run a memtest routine completely from registers, without using any RAM. This makes it much more immune to any RAM problem. A RAM testing routine shouldn't rely on working memory.
The bottom line is you cannot. You have two choices. You can test only part of the RAM, or part of the RAM at a time, which means you are not doing a complete address test. Or you don't run from the RAM you are testing, which is basically the rule if you really want to test the RAM. So you have to run from ROM without using a stack in the RAM under test, or you use another RAM; perhaps there is a cache somewhere that can be accessed directly to give you a little RAM.
Testing half or some other fraction at a time, which is not a complete test but better than not testing part of it at all, can be done with either a position-independent module or with multiple position-dependent compiles of the test. There is no reason for the stack to be an issue: the ROM-based code that copies the test code and jumps to it can set the stack pointer based on the fraction under test or not under test, and then repeat. Treat the module like a function, not like an entire program, and the problem of preserving the stack or "moving the test" goes away; it returns to the ROM-based code, which can launch further tests.
One crazy way to do it would be to turn on the I-cache, get the test code into the cache (before it hits itself), and then blast away at the RAM, including the RAM that holds the code itself (you can't have a stack). I would only try this as a fun experiment, not for anything real. There are lots of problems to solve with an approach like this.
My approach would be this:
Do the test before anything is initialized (variables, stack). In gcc you can: void RAM_test(void) __attribute__ ((section (".init0")));
Write it in assembly. Ensure that the test does not use/store any variables in the RAM, only uses processor registers.
Store the result somewhere so you can use it later in normal program.
If you do have ROM space and can afford the time at boot time, I would implement the test in assembly or in a naked C function using directives to put your variables in registers so that no RAM is consumed as part of the test. This is going to be pretty architecture and compiler specific, though, and neither of those were mentioned.
In a C program that doesn't use recursion, it should be possible in theory to work out the maximum/worst case stack size needed to call a given function, and anything that it calls. Are there any free, open source tools that can do this, either from the source code or compiled ELF files?
Alternatively, is there a way to extract a function's stack frame size from an ELF file, so I can try to work it out manually?
I'm compiling for the MSP430 using MSPGCC 3.2.3 (I know it's an old version, but I have to use it in this case). The stack space to allocate is set in the source code, and should be as small as possible so that the rest of memory can be used for other things. I have read that you need to take account of the stack space used by interrupts, but the system I'm using already takes account of this - I'm trying to work out how much extra space to add on top of that. Also, I've read that function pointers make this difficult. In the few places where function pointers are used here, I know which functions they can call, so could take account of these cases manually if the stack space needed for the called functions and the calling functions was known.
Static analysis seems like a more robust option than stack painting at runtime, but working it out at runtime is an option if there's no good way to do it statically.
Edit:
I found GCC's -fstack-usage flag, which saves the frame size for each function as it is compiled. Unfortunately, MSPGCC doesn't support it. But it could be useful for anyone who is trying to do something similar on a different platform.
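Once per-function frame sizes are known (from a flag like that, or read off the disassembly by hand), combining them is a walk over the call graph. Here is a minimal sketch meant to run on the development host, not on the target. The struct layout and the names func_info, worst_case_stack and MAX_CALLEES are invented for this example, and it assumes the call graph has no recursion and that each frame size already includes the return address.

```c
#define MAX_CALLEES 4   /* hypothetical limit for this sketch */

/* One entry per function: its own frame size and whom it calls. */
struct func_info {
    unsigned frame_bytes;          /* stack used by this function alone */
    int      num_callees;
    int      callees[MAX_CALLEES]; /* indices into the same table */
};

/*
 * Worst-case stack needed to call function f: its own frame plus the
 * deepest of its callees. Terminates only if the call graph is acyclic.
 */
unsigned worst_case_stack(const struct func_info *funcs, int f)
{
    unsigned deepest = 0;

    for (int i = 0; i < funcs[f].num_callees; i++) {
        unsigned d = worst_case_stack(funcs, funcs[f].callees[i]);
        if (d > deepest)
            deepest = d;
    }
    return funcs[f].frame_bytes + deepest;
}
```

Calls through function pointers can be handled by listing every possible target as a callee, which matches the manual approach described in the question.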
While static analysis is the best method for determining maximum stack usage you may have to resort to an experimental method. This method cannot guarantee you an absolute maximum but can provide you with a very good idea of your stack usage.
You can check your linker script to get the location of __STACK_END and __STACK_SIZE. You can use these to fill the stack space with an easily recognizable pattern like 0xDEAD or 0xAA55. Run your code through a torture test to try and make sure as many interrupts are generated as possible.
After the test you can examine the stack space to see how much of the stack was overwritten.
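A minimal sketch of the painting and the later check might look like this. It assumes a stack that grows downward, so the words nearest the lowest address are the last to be touched; stack_paint and stack_unused_words are hypothetical names, and on a real target the low address and size would come from the linker symbols mentioned above. Painting has to happen in startup code, or only below the current stack pointer, so that it does not overwrite frames already in use.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xAA55u   /* easily recognizable pattern */

/* Fill the stack area, starting at its lowest address, with the pattern. */
void stack_paint(uint16_t *low, size_t words)
{
    for (size_t i = 0; i < words; i++)
        low[i] = STACK_FILL;
}

/*
 * Count words, from the lowest address upward, that still hold the
 * pattern. With a downward-growing stack these were never reached.
 */
size_t stack_unused_words(const uint16_t *low, size_t words)
{
    size_t n = 0;

    while (n < words && low[n] == STACK_FILL)
        n++;
    return n;
}
```

The high-water mark is then the total size minus the unused count. A stack value that happens to equal the fill pattern can make the result slightly optimistic, which is one more reason this gives an estimate and not a guarantee.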
Interesting question.
I would expect this information to be statically available in the debugging data included in debug builds.
I had a brief look at the DWARF standard, and it does specify two attributes for functions called DW_AT_frame_base and DW_AT_static_link. The standard describes the latter as one that "computes the frame base of the relevant instance of the subroutine that immediately encloses the subroutine or entry point".
I think that the only way to go is static analysis. You need to account for the space taken by all non-static local variables. These will mostly be pointers, but the pointers themselves are still stored on the stack. You also need to reserve space for the return address within the caller, which is stored on the stack so that control can return to the caller after your function returns. Finally, you need space for all your function parameters.
Based on that, if you have a tool able to count all parameters, auto variables and figure out their size, you should be able to calculate the minimum stack frame size you'll need.
Please note that the compiler could also align values on the stack for your particular architecture, which could make the stack space requirements a little bigger than what you'd expect from this calculation.
Some embedded IDEs can give information on stack usage during runtime.
I know that IAR Embedded Workbench supports it.
Be aware that interrupts occur asynchronously, so take the biggest stack usage scenario and add the interrupt context to it. If nested interrupts are supported, as on ARM processors, you need to take this into account as well.
TinyOS has some work done on stack size analysis. It is described here:
http://tinyos.stanford.edu/tinyos-wiki/index.php/Stack_Analysis
They only support AVR, but say that "MSP430 is not difficult to support but this is not super high priority". In any case, the page provides lots of resources.
I just want to get an idea of how register variables are handled in C program executables, i.e. in which location (or register) exactly they get stored, in the case of an embedded system and of an x86 machine (a C program executable on a desktop PC).
What about this view? (correct me if am wrong)
Suppose we have declared and initialized a variable of type 'int' inside a function. Normally it goes to the stack segment, and it is there only at run time, when the caller calls the callee containing the local variable. But if we declare the same local variable as 'register int', it also goes to the stack segment. At run time, however, the processor moves that local variable from the stack into one of its general-purpose registers (because of extra code the compiler inserts for the 'register' keyword) and accesses it quickly from there.
That is, the only difference between them is in run-time access, and there is no difference in how they are loaded into memory.
__Kanu
The register keyword in C (rarely ever seen anymore) is only a hint to the compiler that it may be useful to keep a variable in a register for faster access.
The compiler is free to ignore the hint, and optimize as it sees best.
Since modern compilers are much better than humans at understanding usage and speed, the register keyword is usually ignored by modern compilers, and in some cases, may actually slow down execution speed.
From K&R C:
A register variable advises the compiler that the variable in question will be heavily used. The idea is that register variables are to be placed in machine registers, which may result in smaller & faster programs. But compilers are free to ignore this advice.
It is not possible to take the address of a register variable, regardless of whether the variable is actually placed in a register.
Hence,
register int x;
int *y = &x; // is illegal
So, you must weigh the cost of not being able to take the address of the register variable.
In addition to crypto's answer (that has my vote) just see the name register for the keyword as a historical misnomer. It has not much to do with registers as you learn it in class e.g for the von Neumann processor model, but is just a hint to the compiler that this variable doesn't need an address.
On modern machines an addressless variable can be realized by different means (e.g an immediate assembler operator) or optimized away completely. Tagging a variable as register can be a useful optimization hint for the compiler and also a useful discipline for the programmer.
When a compiler's backend turns its internal code into machine code or assembler for the target processor, it keeps track of the registers it is using as it generates instructions. When it needs a register to load or hold a variable and there is an unused working register, it marks that register as used and generates the instructions using it. But if all the working registers have something in them, it will usually evict the contents of one of those registers somewhere, often to RAM, for example global memory or the stack if that variable had a home there. The compiler may or may not be smart about that decision and may evict a variable that is heavily used. By using the register keyword, depending on the compiler, you may be able to influence that decision: it may choose to keep the register-keyword variables in registers and evict other variables to memory as needed.
which location (or register) it exactly get stored in case of an embedded system and in a X86 machine (C program executable in a desktop PC)?
You don't know without opening up the assembly output, which will be liable to shift based on compiler choices. It's a good idea to check the assembly just for educational purposes.
If you need to read and write particular registers that precisely, you should write inline assembly or link in an assembly module.
Typically when using a standard C compiler for x86/amd64 (gcc, icc, cl), you can reasonably assume that the compiler will optimize sufficiently well for most purposes.
If, however, you are using a non-standard compiler, e.g., one cooked up for a new embedded system, it is a good idea to consider hand optimization. If the architecture is new, it might also be a good idea to consider hand optimization.
For a long time I have been studying the assembler output of C compilers, as well as CPU architecture. I know this may seem silly to you, but it seems to me that something is very inefficient. Please don't be angry if I am wrong and there is some reason for all these principles that I do not see. I will be very glad if you tell me why it is designed this way. I actually truly believe I am wrong; I know the brilliant people who designed PCs had a reason to do so. What exactly am I asking? I'll tell you right away, using C as an example:
1: Stack local scope memory allocation:
So, typical local memory allocation uses the stack. Just copy esp to ebp and then allocate all the memory via ebp. OK, I would understand this if you explicitly needed to allocate RAM at default stack addresses, but if I understand correctly, modern OSes use paging as a translation layer between the application and physical RAM, so the address you ask for is translated before reaching the actual RAM byte. So why not just say 0x00000000 is int a, 0x00000004 is int b, and so on? And access them just by mov 0x00000000,#10? Because you won't actually access memory blocks 0x00000000 and 0x00000004, but those that your OS has set up in the paging tables. Actually, since memory allocation by ebp and esp uses indirect addressing, "my" way would be even faster.
2: Variable allocation duplication:
When you run an application, the loader loads its code into RAM. When you create a variable or a string, the compiler generates code that pushes these values onto the top of the stack when they are created in main. So there is an actual instruction to do so, and then the actual number in memory. So there are two copies of the same value in RAM: one in the form of an instruction, the second in the form of actual bytes in RAM. But why? Why not, when declaring a variable, just work out at which memory block it would be, and then, when it is used, just insert this memory location?
How would you implement recursive functions? What you are describing is equivalent to using global variables everywhere.
That's just one problem. How can you link to a precompiled object file and be sure it won't corrupt the memory of your procedures?
Because C (and most other languages) support recursion, so a function can call itself, and each call of the function needs separate copies of any local variables. Also, on most current processors, your way would actually be slower -- indirect addressing is so common that processors are optimized for it.
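A small sketch of why fixed addresses break recursion: the two functions below are identical except that one keeps its working value in an automatic (stack) variable and the other in a static variable, which is what "int a lives at one fixed address" amounts to. The names fact_auto and fact_static are made up for this example.

```c
/* Each call gets its own copy of 'saved' on the stack. */
unsigned fact_auto(unsigned n)
{
    unsigned saved = n;            /* automatic: one copy per call */

    if (n <= 1)
        return 1;
    unsigned rest = fact_auto(n - 1);
    return saved * rest;
}

/* Every call shares the single 'saved' at one fixed address. */
unsigned fact_static(unsigned n)
{
    static unsigned saved;         /* one copy for all calls */

    saved = n;
    if (n <= 1)
        return 1;
    unsigned rest = fact_static(n - 1);
    /* The inner calls have overwritten 'saved' by now. */
    return saved * rest;
}
```

fact_auto(4) gives 24 as expected, while fact_static(4) gives 1, because by the time each outer call multiplies, the innermost call has already set the shared variable to 1.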
You seem to want the behavior of C (or at least that C allows) for string literals. There are good and bad points to this, such as the fact that even though you've defined a "variable", you can't actually modify its contents (without affecting other variables that are pointing at the same location).
The answers to your questions are mostly wrapped up in the different semantics of different storage classes
Google "data segment"
Think about the difference in behavior between global and local variables.
Think about how constant and non-constant variables have different requirements when functions are called repeatedly (or as Mehrdad says, recursively)
Think about the difference between static and non static automatic variables again in the context of multiple or recursive calls.
Since you are comparing assembler and c (which are very close together from an architectural standpoint), I'm inclined to say that you're describing micro-optimization, which is meaningless unless you profile the code to see if it performs better.
In general, programming languages are evolving towards a more declarative style (i.e. telling the computer what you want done, rather than how you want it done). When you program in an imperative language (like assembly or c), you specify in extreme detail how you want the problem solved. This gives the compiler little room to make optimization decisions on your behalf.
However, as the languages become more declarative, the compilers are getting smarter, because we are giving them the room they need to make more intelligent performance optimizations.
If every function would put its first variable at offset 0 and so on then you would have to change the memory mapping each time you enter a function (you could not allocate all variables to unique addresses if you want recursion). This is doable, but with current hardware it's very slow. Furthermore, the address translation performed by the virtual memory is not free either, it's actually quite complicated to implement this efficiently.
Addressing off ebp (or any other register) costs having a mux (to select the register) and an adder (to add the offset to the register). The time taken for this can often be overlapped with other operations.
If you want to be able to modify the static value you have to copy it to the stack. If you don't (by declaring it 'const') then a good C compiler will not copy it to the stack.
I know this is a "heavier" question, but I think it's interesting too. It was part of my previous questions about compiler functions, but back then I explained it very badly, and many answered just my first question, so here it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and ensure that each task has its proper place in memory. So every process gets its own address space starting from 0.
When multitasking goes into effect, the kernel has to save all important registers to the task's stack, I believe, then save the current stack pointer, change the page table entry to switch to another process's physical address space, load the new process's stack pointer, pop the saved registers, and continue by jumping to the popped instruction pointer address.
Because of this nice feature (paging), every process thinks it has nice flat memory within reach. So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
But when there is no more segmentation for the process, why do compilers still create variables on the stack, or, when global, in another memory space, rather than directly in the program code?
Let me give an example. I have this C code: int a=10;
which gets translated into (Intel syntax): mov [position of a],#10
But then you actually occupy more bytes in RAM than needed, because the first few bytes are taken by the actual instruction, and after that instruction is done, there is a new byte containing the value 10.
Why, when there is no need to switch any segment (thus slowing the process), isn't the value 10 just coded directly into the program, like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
Because the compiler knows the exact position of every instruction, when operating on that variable it would just use its address.
I know that const variables do this, but they are not really variables when you cannot change them.
I hope I explained my question well, but I am still learning English, so forgive my syntactic and even semantic errors.
EDIT:
I have read your answers, and it seems that based on those I can modify my question:
So, someone said here that a global variable is actually a piece of data attached directly to the program. I mean, when a variable is global, is it attached to the end of the program, or is it created like a local one at the time of execution, but on the heap instead of on the stack?
If it is the first case, attached to the program itself, why do local variables exist at all? I know you will tell me because of recursion, but that is not the case. When you call a function, you can push any memory space onto the stack, so there is no program there.
I hope you understand me. There is always inefficient use of memory when some value (even 0) is created on the stack by some instruction, because you need space in the program for that instruction and then for the actual variable. Like so: push #5 //instruction that says to create local variable with integer 5
And then this instruction just puts the number 5 on the stack. Please help me, I really want to know why it's this way. Thanks.
Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive descent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap as non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if windows does this, however)
Your proposal doesn't allow for either of these cases.
So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
Yes and no. Different program segments have different purposes - despite the fact that they reside within flat virtual memory. E.g. data segment is readable and writable, but you can't execute data. Code segment is readable and executable, but you can't write into it.
why do compilers still create variables on the stack, [...] rather than directly in program code?
Simple.
The code segment isn't writable. For safety reasons first. Second, most CPUs do not like having the code segment written into, as it breaks many existing optimizations used to accelerate execution.
The state of the function has to be private to the function due to things like recursion and multi-threading.
isn't just a value of 10 coded directly into program like this
Modern CPUs prefetch instructions to allow things like parallel and out-of-order execution. Putting garbage (to the CPU it is garbage) into the code segment would simply diminish (or flat out cancel) the effect of these techniques. And they are responsible for the lion's share of the performance gains CPUs have shown in the past decade.
when there is no need to switch any segment
So if there is no overhead for switching segments, why then put it into the code segment? There is no problem keeping it in the data segment.
Especially in case of read-only data segment, it makes sense to put all read-only data of the program into one place - since it can be shared by all instances of the running application, saving physical RAM.
Because the compiler knows the exact position of every instruction, when operating on that variable it would just use its address.
No, not really. Most code is relocatable or position independent. The code is patched with real memory addresses when the OS loads it into memory. In fact, special techniques are used to avoid patching the code, so that the code segment too can be shared by all running instances of the application.
The ABI is responsible for defining what the compiler and linker are supposed to do for a program to be executable by a complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.
What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using VisualStudio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)
Why isn't just a value of 10 coded directly into program like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and still must have code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
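The lifetime difference in the last point can be sketched in a few lines. The names a_global, bump_global and bump_local are invented for this example; where exactly the compiler places each variable is up to the compiler and platform.

```c
/* Initialized global: typically stored in the .data section and
 * present for the entire run of the program. */
int a_global = 10;

/* The global keeps its value between calls. */
int bump_global(void)
{
    return ++a_global;
}

/* The local is created (on the stack or in a register) on every call
 * and is gone again when the function returns. */
int bump_local(void)
{
    int a_local = 10;

    return ++a_local;
}
```

Calling bump_global twice gives 11 and then 12, while bump_local gives 11 every time, since each call starts with a fresh variable.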
not quite sure what your confusion is?
int a = 10; means make a spot in memory, and put the value 10 at the memory address
if you want a to be 10
#define a 10
though more typically
#define TEN 10
Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a=10 or even const int a=10, the compiler cannot always convert code which references 'a' to use the constant 10 directly, because it may have no way of knowing whether 'a' is changed behind its back (even const variables can end up being changed, although doing so is undefined behavior). For example, one way 'a' can be changed without the compiler knowing is through a pointer which points to 'a'. Pointers are not fixed at runtime, so the compiler cannot always determine at compile time whether there will be a pointer which points to and modifies 'a'.