Why would I want to add an address to the stack pointer? - c

I'm trying to understand the first microcorruption challenge.
I want to ask about the first line of the main function.
Why would they add that address to the stack pointer?

This looks like a 16-bit ISA [1], otherwise the disassembly makes no sense.
0xff9c is -100 in 16-bit 2's complement, so it looks like this is reserving 100 bytes of stack space for main to use. (Stacks grow downward on most machines). It's not an address, just a small offset.
See MSP430 Assembly Stack Pointer Behavior for a detailed example of MSP430 stack layout and usage.
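If it helps, here is a minimal sketch (my own, not the exact challenge listing) of how such a prologue and its matching epilogue pair up in MSP430 assembly, assuming the line in question has the form add #0xff9c, sp:

main:
    add  #0xff9c, sp    ; sp = sp - 100: reserve 100 bytes for locals
    ...                 ; body addresses its locals as 0x0(sp) .. 0x62(sp)
    add  #0x64, sp      ; sp = sp + 100: release the locals before returning
    ret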
[1] MSP430, possibly? See http://mspgcc.sourceforge.net/manual/x82.html: it's a 16-bit ISA with those register names and mnemonics, and I think its machine code uses variable-length 2- or 4-byte instructions.
It's definitely not ARM; call and jmp are not ARM mnemonics; that would be bl and b. Also, ARM uses op dst, src1, src2 syntax, while this disassembly uses op src, dst.

Related

How to determine offset values when disassembling?

long long int i = 57745158985; // the C code
0000000000100004: li r7,13
0000000000100008: lis r8,29153
000000000010000c: ori r8,r8,0x3349
0000000000100010: stw r7,24(rsp)
0000000000100014: stw r8,28(rsp)
0000000000100018: lfd fp0,24(rsp)
000000000010001c: stfd fp0,8(rsp)
Hello, when I disassemble the CodeWarrior C code, this is what comes up. In it, offsets are applied to a register. I don't understand why the given offsets are 24, 28, and 8. How are they determined? Would it work with any other values?
This depends on the ABI for your platform and architecture.
Assuming your platform is the PowerPC e500 Application Binary Interface.
The register in question (rsp) is the stack pointer. Those offsets are references into the function's stack frame. To understand a function's stack frame you need to consult the ABI.
If you look at Section 2.3, The Stack Frame, of [1] you will see the layout of a function's stack frame. That is what the compiler targets when offsets like 24 and 28 are determined.
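As a side note (my own sketch, not taken from the ABI document): offsets 24 and 28 are simply two adjacent 32-bit slots the compiler chose inside that frame; the stores place the high and low halves of the 64-bit constant there so it can be reloaded as a single 64-bit value. You can check the split in plain C:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t i = 57745158985ULL;
    uint32_t hi = (uint32_t)(i >> 32); /* 13         -> li r7,13              */
    uint32_t lo = (uint32_t)i;         /* 0x71E13349 -> lis r8,29153 ; ori r8 */
    printf("hi=%u lo=0x%X\n", (unsigned)hi, (unsigned)lo);
    return 0;
}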

x86_64 : is stack frame pointer almost useless?

Linux x86_64.
gcc 5.x
I was studying the output of the same code compiled with and without -fomit-frame-pointer (gcc at -O3 enables that option by default).
pushq %rbp
movq %rsp, %rbp
...
popq %rbp
My question is:
If I globally disable that option, even for, at the extreme, compiling an operating system, is there a catch?
I know that interrupts use that information, so is that option good only for user space?
Compilers always generate self-consistent code, so disabling the frame pointer is fine as long as you don't use external/hand-crafted code that makes assumptions about it (e.g. by relying on the value of rbp).
Interrupts don't use the frame pointer information; they may use the current stack pointer for saving a minimal context, but this depends on the type of interrupt and the OS (a hardware interrupt probably uses a Ring 0 stack).
You can look at Intel manuals for more information on this.
About the usefulness of the frame pointer:
Years ago, after compiling a couple of simple routines and looking at the generated 64-bit assembly code, I had the same question.
If you don't mind reading a whole lot of notes I wrote for myself back then, here they are.
Note: asking about the usefulness of something is a little bit relative. Writing assembly code for the current main 64-bit ABIs, I found myself using the stack frame less and less. However, this is just my coding style and opinion.
I like using the frame pointer, writing the prologue and epilogue of a function, but I like direct uncomfortable answers too, so here's how I see it:
Yes, the frame pointer is almost useless in x86_64.
Beware it is not completely useless, especially for humans, but a compiler doesn't need it anymore.
To understand why we have a frame pointer in the first place, it helps to recall some history.
Back in the real mode (16 bit) days
When Intel CPUs supported only "16-bit mode" there were some limitations on how the stack could be accessed; in particular, this instruction was (and still is) illegal:
mov ax, WORD [sp+10h]
because sp cannot be used as a base register. Only a few designated registers could be used for that purpose, for example bx or the more famous bp.
Nowadays it's not a detail many people notice, but bp has the advantage over the other base registers that it implicitly selects ss as the segment/selector register, just as the implicit uses of sp do (by push, pop, etc.), and as esp does on later 32-bit processors.
Even if your program was scattered all across memory, with each segment register pointing to a different area, bp and sp acted the same; after all, that was the designers' intent.
So a stack frame was usually necessary and consequently a frame pointer.
bp effectively partitioned the stack into four parts: the argument area, the return address, the old bp (just a WORD), and the local variable area. Each area is identified by the sign of the offset used to access it: positive for the arguments and return address, zero for the old bp, negative for the local variables.
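A rough sketch of those four areas as seen through bp (my own illustration, 16-bit cdecl-style, not taken from any particular compiler):

my_func:
push bp
mov  bp, sp
sub  sp, 4                ; two words of locals
mov  ax, WORD [bp+04h]    ; first argument    (positive offsets)
add  ax, WORD [bp+06h]    ; second argument
mov  WORD [bp-02h], ax    ; a local variable  (negative offsets)
                          ; [bp] = saved bp, [bp+02h] = return address
mov  sp, bp
pop  bp
ret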
Extended effective addresses
As Intel CPUs evolved, more flexible 32-bit addressing modes were added.
Specifically, any 32-bit general-purpose register could now be used as a base register, including esp.
With instructions like this
mov eax, DWORD [esp+10h]
now valid, the stack frame and the frame pointer seemed doomed.
Yet this was not the case, at least in the beginning.
It is true that esp alone is now enough, but the separation of the stack into the four areas mentioned above is still useful, especially for humans.
Without the frame pointer, every push or pop changes the offsets of the arguments and local variables relative to esp, producing code that looks unintuitive at first sight. Consider how to implement the following C routine with the cdecl calling convention:
int my_add(int a, int b); /* defined elsewhere */

int my_routine(int a, int b)
{
return my_add(a, b);
}
without and with a stack frame:
my_routine:
push DWORD [esp+08h]    ; push b
push DWORD [esp+08h]    ; push a (esp has moved, so +08h now reaches a)
call my_add
add esp, 08h            ; cdecl: the caller removes the arguments
ret
my_routine:
push ebp
mov ebp, esp
push DWORD [ebp+0Ch]    ; push b
push DWORD [ebp+08h]    ; push a
call my_add
add esp, 08h            ; cdecl: the caller removes the arguments
pop ebp
ret
At first sight it seems that the first version pushes the same value twice. It actually pushes the two separate arguments: the first push lowers esp, so the same effective-address calculation in the second push reaches a different argument.
If you add local variables (especially lots of them) then the situation quickly becomes hard to read: Does mov eax, [esp+0CAh] refer to a local variable or to an argument? With a stack frame we have fixed offsets for the arguments and local variables.
Even compilers at first still preferred the fixed offsets given by the frame base pointer. I saw this behavior change first with gcc.
In a debug build the stack frame effectively adds clarity to the code, makes it easy for the (proficient) programmer to follow what is going on and, as pointed out in the comments, makes it easier to recover the call stack.
Modern compilers, however, are good at bookkeeping: they easily keep count of the stack pointer's movements and generate the appropriate offsets from esp, omitting the stack frame for faster execution.
When a CISC requires data alignment
Until the introduction of SSE instructions the Intel processors never asked much from the programmers compared to their RISC brothers.
In particular they never demanded data alignment: we could access 32-bit data at an address that is not a multiple of 4 with no major complaint (depending on the DRAM data width, this may result in increased latency).
SSE introduced 16-byte operands that must be accessed on a 16-byte boundary; as the SIMD paradigm became efficiently implemented in hardware and more popular, 16-byte alignment became important.
The main 64-bit ABIs now require it: the stack must be aligned to paragraphs (i.e., 16 bytes).
Now, we are usually called in such a way that the stack is aligned after the prologue, but suppose we were not blessed with that guarantee; we would need to do one of these:
; alternative 1: align, then reserve
push rbp
mov rbp, rsp
and spl, 0f0h
sub rsp, 10h*k

; alternative 2: reserve, then align
push rbp
mov rbp, rsp
sub rsp, xxx
and spl, 0f0h
One way or another, the stack is aligned after these prologues; however, we can no longer use a negative offset from rbp to access local variables that need alignment, because the frame pointer itself is not aligned.
We need to use rsp instead. We could arrange a prologue that leaves rbp pointing at the top of an aligned area of local variables, but then the arguments would be at unknown offsets from it.
We could arrange a more complex stack frame (maybe with more than one pointer), but the appeal of the old-fashioned frame base pointer was its simplicity.
So we can use the frame pointer to access the arguments on the stack and the stack pointer for the local variables. Fair enough.
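A sketch of that division of labor (my own illustration, not a complete function):

func:
push   rbp
mov    rbp, rsp
sub    rsp, 40h
and    spl, 0f0h             ; rsp is now 16-byte aligned, rbp may not be
movdqa [rsp], xmm0           ; an aligned local, addressed from rsp
mov    rax, QWORD [rbp+10h]  ; first stack-passed argument, addressed from rbp
mov    rsp, rbp              ; the copy saved in rbp undoes the and
pop    rbp
ret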
Alas, the role of the stack in argument passing has been reduced; for a small number of arguments (four register-passed integer arguments on Windows x64, six in the System V ABI) it is not even used, and in the future it will probably be used even less.
So we don't use the frame pointer for the local variables (mostly), nor for the arguments (mostly). What do we use it for?
It saves a copy of the original rsp, so that restoring the stack pointer at function exit takes just a mov. If the stack was aligned with an and, which is not invertible, keeping an original copy is necessary.
Also, some ABIs guarantee that the stack is aligned after the standard prologue, thereby allowing us to use the frame pointer as usual.
Some variables don't need alignment and can be accessed through an unaligned frame pointer; this is usually the case in hand-crafted code.
And some functions take more parameters than fit in registers.
Summary
The frame pointer is a vestigial paradigm from 16-bit programs that proved itself still useful on 32-bit machines because of its simplicity and clarity when accessing local variables and arguments.
On 64-bit machines, however, the strict alignment requirements strip away most of that simplicity and clarity; the frame pointer does remain in use in debug builds.
On the fact that the frame pointer can be used to do fun things: it is true, I guess; I've never seen such code but I can imagine how it would work.
I, however, focused on the housekeeping role of the frame pointer, as that is how I have always seen it used.
All the crazy things can be done with any register set to the same value as the frame pointer; I just give the latter a more "special" role.
VS2013 for example sometimes uses rdi as a "frame pointer", but I don't consider it a real frame pointer if it doesn't use rbp/ebp/bp.
To me the use of rdi means a Frame Pointer Omission optimization :)

Memory alignment today and 20 years ago

In the famous paper "Smashing the Stack for Fun and Profit", its author takes a C function
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
and generates the corresponding assembly code output
pushl %ebp
movl %esp,%ebp
subl $20,%esp
The author explains that since computers address memory in multiples of word size, the compiler reserved 20 bytes on the stack (8 bytes for buffer1, 12 bytes for buffer2).
I tried to recreate this example and got the following
pushl %ebp
movl %esp, %ebp
subl $16, %esp
A different result! I tried various combinations of sizes for buffer1 and buffer2, and it seems that modern gcc does not pad buffer sizes to multiples of the word size anymore. Instead it abides by the -mpreferred-stack-boundary option.
As an illustration: using the paper's arithmetic rules, for buffer1[5] and buffer2[13] I'd expect 8 + 16 = 24 bytes reserved on the stack, but in reality I got 32 bytes.
The paper is quite old and a lot has happened since. I'd like to know what exactly motivated this change of behavior. Is it the move towards 64-bit machines? Or something else?
Edit
The code is compiled on an x86_64 machine using gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) like this:
$ gcc -S -o example1.s example1.c -fno-stack-protector -m32
What has changed is SSE, which requires 16-byte alignment; this is covered in the older gcc documentation for -mpreferred-stack-boundary=num, which says (emphasis mine):
On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.
This is also backed up by the paper Smashing The Modern Stack For Fun And Profit, which covers this and other modern changes that break Smashing the Stack for Fun and Profit.
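If you want to reproduce numbers closer to the paper's, you can ask gcc for the old 4-byte stack alignment (my suggestion, not from either paper); the reserved amount should then track the rounded-up buffer sizes much more closely:
$ gcc -S -o example1.s example1.c -fno-stack-protector -m32 -mpreferred-stack-boundary=2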
Memory alignment, of which stack alignment is just one aspect, depends on the architecture. It is partly defined in the Application Binary Interface of the language and a Procedure Call Standard for the architecture (sometimes both are in a single spec, and it may even vary by platform), and it also depends on the compiler/toolchain wherever those documents leave room for variation.
Those two documents (names may vary) mostly cover the external interface between functions; they may leave internal structure to the toolchain. However, that still has to match the architecture. Normally the hardware requires a minimal alignment but allows a larger one for performance reasons (e.g. byte alignment is the minimum, but reading a 32-bit word that way may take multiple bus cycles, so the compiler uses 32-bit alignment).
Normally the compiler (following the PCS) uses an alignment optimal for the architecture, under the control of optimization settings (optimize for speed or size). It takes into account not only the size of the object (aligned to its natural boundary) but also the widths of internal buses (e.g. a 32-bit x86 has internal 64- or 128-bit buses; ARM CPUs have internal buses from 32 up to 128 bits, possibly even wider), caches, etc. For local variables it may also take access patterns into account, so that two adjacent variables can be loaded in parallel into a register pair instead of with two separate loads; it may even reorder such variables.
The stack pointer might require a higher alignment, for instance so the CPU can push two registers at once in an interrupt frame, or push vector registers that require higher alignment, etc. You could write quite a thick book about this subject (and I bet someone already has).
So, in general, there is no single one-alignment-fits-all rule. However, for struct and array packing, the C standard does define some rules for packing/alignment, mostly to guarantee consistency of e.g. sizeof(type) and addressing within an array (required for malloc() to work correctly).
Even char arrays might be aligned for optimal cache layout. Note it is not only the CPU which might have caches, but also PCIe bridges, not to mention PCIe transfers themselves down to DRAM pages.
I have not tried the specific compiler or distribution version you report. My guess is that the 16 comes from the stack alignment requirement (i.e. all stack adjustments are x-byte aligned, and x is 16 for your invocation).
Note that the variable alignment you seem to have started with is slightly different from the above; in gcc it is controlled by alignment attributes on the variable. Try using those and you should see a difference.
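For example, a hedged illustration of such a per-variable alignment request in gcc (this changes the frame layout independently of -mpreferred-stack-boundary):

void function(int a, int b, int c)
{
    /* force buffer1 onto a 16-byte boundary; compare the resulting
       "subl $..., %esp" with the unattributed version */
    char buffer1[5] __attribute__((aligned(16)));
    char buffer2[10];
    (void)a; (void)b; (void)c;
    (void)buffer1; (void)buffer2;
}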

How to use relative position in c/assembly?

It's said that position-independent code only uses relative positions instead of absolute ones. How is this implemented in C and in assembly, respectively?
Let's take char test[] = "string"; as an example: how is it referenced by a relative address?
In C, position-independent code is a detail of the compiler's implementation. See your compiler manual to determine whether it is supported and how.
In assembly, position-independent code is a detail of the instruction set architecture. See your CPU manual to find out how to read the PC (program counter) register, how efficient that is, and what the recommended best practices are in translating a code address to a data address.
Position-relative data is less popular now that code and data are separated into different pages on most modern operating systems. It is a good way to implement self-contained executable modules, but the most common such things nowadays are viruses.
On x86, position-independent code in principle looks like this:
call 1f
1: popl %ebx
followed by use of ebx as a base pointer with a displacement equal to the distance between the data to be accessed and the address of the popl instruction.
In reality it's often more complicated, and typically a tiny thunk function might be used to load the PIC register like this:
load_ebx:
movl (%esp),%ebx
addl $some_offset,%ebx
ret
where the offset is chosen such that, when the thunk returns, ebx contains a pointer to a designated special point in the program/library (usually the start of the global offset table), and then all subsequent ebx-relative accesses can simply use the distance between the desired data and the designated special point as the offset.
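For instance (my own sketch in AT&T syntax; myvar and localdata are hypothetical symbols), once ebx holds the address of the global offset table, the accesses typically look like this:

movl myvar@GOT(%ebx), %eax          # fetch the address of a global from the GOT
movl (%eax), %eax                   # then dereference it
leal localdata@GOTOFF(%ebx), %eax   # local/static data: a fixed offset from the GOT base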
On other archs everything is similar in principle, but there may be easier ways to load the program counter. Many simply let you use the pc or ip register as an ordinary register in relative addressing modes.
In pseudo code it could look like:
lea str1(pc), r0 ; load address of string relative to the pc (assuming constant strings, maybe)
st r0, test ; save the address in test (test could also be PIC, in which case it could be relative
; to some register)
A lot depends on your compiler and CPU architecture, as the previous answer stated. One way to find out would be to compile with the appropriate flags (-fPIC -S for gcc) and look at the assembly language you get.
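On x86-64, for example, compiling a file containing char test[] = "string"; with gcc -fPIC -S typically shows something like the following for an access to a locally defined symbol (a sketch; a symbol that may be interposed from another shared object would instead go through the GOT, e.g. via test@GOTPCREL(%rip)):

leaq   test(%rip), %rax   # rax = &test, computed relative to the instruction pointer
movzbl (%rax), %eax       # load test[0]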

pointers seen on assembly or registries

What does a pointer look like in assembly? I know that an instruction like 'mov' for, let's say, a PIC is converted to a sequence of bits, and these bits activate the circuits that do the job. But a pointer: how does it make it into assembly and then end up controlling circuits?
A single pointer access is transformed into several assembly instructions; what do they look like?
A pointer is nothing but an unsigned integer indicating a position in memory (in the virtual address space, to be precise).
The instruction
mov eax,[ebp]
moves the value stored at the memory location whose address is held in ebp into eax. Here ebp acts as a pointer.
Coming back to your question: code is also data stored somewhere in memory, and the address of that memory is a pointer. So by using [] to dereference the pointer we can get at that instruction (as done in the statement above), and the CPU can then interpret the code and execute it.
In fact, on 32-bit x86 machines the eip register stores a pointer to the memory holding the instruction currently being executed.
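To make it concrete, a small sketch (my own, not from the question) of how a C-level pointer dereference typically turns into such instructions:

/* a minimal sketch: dereferencing a pointer in C and the rough
   32-bit x86 (cdecl) code a compiler emits for it */
int deref(int *p)
{
    return *p;   /* roughly:  mov eax, [esp+4]   ; load the argument p (an address)     */
                 /*           mov eax, [eax]     ; dereference: load the pointed-to int */
                 /*           ret                                                       */
}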
A key feature of assembly is that a pointer can appear as a label, which the assembler translates into an address encoded in the instruction's opcode.
In fact, any pointer is composed of one or more unsigned integers. On 8-bit PIC16 MCUs you must first select the appropriate memory bank; after that you can access the memory. The reason is that the address field in the opcode is only 7 bits wide when using direct memory addressing. You can also use indirect memory addressing; in that case you use the FSR register, which is composed of two 8-bit registers, FSRL and FSRH. After setting a pointer in FSR you can read the byte at that address through the INDF register.
Pointers are memory offsets. They look like integers with a width appropriate for the addressable size of the memory segment.
