Aligning a stack pointer from 4 bytes to 8 bytes in ARM assembly

How do I align a stack pointer to 8 bytes when it is currently 4-byte aligned on ARM? As I understand it, a stack pointer is 4-byte aligned if it points to an address like 0x4, 0x8, 0xC, 0x10, and so on.
So aligning a stack pointer to 8 bytes means it should point to addresses like 0x8, 0x10, 0x18, 0x20, and so on.
Now, how do I turn a 4-byte aligned stack pointer into an 8-byte aligned one?

Don't try to align sp manually; instead, push one more register to get alignment. For example, instead of
push {r3, r4, lr}
add one more register to the list to get 8-byte alignment easily:
push {r1, r3, r4, lr}
This may feel like an extra memory access, but in general caches work with lines wider than the native word size.
Also note that you don't need to get stack alignment right if you are not making or receiving external calls. So if you have a self-contained assembly routine that neither calls out to the external world nor is called from it, you can live with broken stack alignment as long as it doesn't bite your own loads.

To move a pointer up to the nearest 8 byte boundary, but leave it unmodified if it's already a multiple of 8 (pseudo-code - you'll need to add some casts if doing this in C):
p = (p + 7) & ~7;
or similarly to move it down to the nearest 8 byte boundary:
p = p & ~7;
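As a minimal sketch of the two expressions above with the casts filled in, the following C helpers work on `uintptr_t`. The names `align_up` and `align_down` are illustrative, and the helpers assume the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align; unchanged if already aligned.
   Assumes align is a power of two. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Round addr down to the previous multiple of align; unchanged if already
   aligned. Assumes align is a power of two. */
uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}
```

For a real pointer, cast it to `uintptr_t` first, call the helper, and cast the result back.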

Because the stack grows downward,
bic sp, sp, #7
should suffice. With EABI, you can use r12 or r0-r3 to save and restore the previous value.
All this should be done only in assembly; within C you can rely on a correctly aligned stack pointer, and trying to change it there will probably crash your program.
Compilers take care of correct alignment; misaligned stacks can occur on interrupt entry. Some CPUs (e.g. Cortex-M3) have a configuration bit (STKALIGN in the Configuration and Control Register) that makes exception entry align the stack to 8 bytes.

If you are writing leaf functions (no subroutine calls), don't bother.
You are perfectly fine with a 4-byte aligned SP, since this requirement exists because the ldrd and strd instructions need the address to be a multiple of eight.
Therefore, if the function you are writing doesn't call any subroutine unknown to you, there really is no need for it. (ldrd and strd are rarely used anyway.)
The SP is already 8-byte aligned when your function is called from a higher-level language anyway.
If you want the SP to stay 8-byte aligned, either don't touch it, or push only an even number of registers.

If you are in a situation where you don't control the value of SP that you were given, and you want to align SP on an 8-byte boundary (for example, to call a subroutine), then the following sequence does that without using any other registers:
; Check whether SP is aligned on an 8-byte boundary.
tst sp, #0x7
; If SP is aligned on an 8-byte boundary, then we skip a word on the stack
; and then save SP. This consumes 8 bytes on the stack but keeps SP
; aligned on an 8-byte boundary.
streq sp, [sp, #-8]!
; If SP is only aligned on a 4-byte boundary, then we save SP. This consumes
; 4 bytes on the stack and also aligns SP on an 8-byte boundary.
strne sp, [sp, #-4]!
; Here, SP is aligned on an 8-byte boundary, and the previous value of SP
; is stored at the top of the stack.
; For example, let's call some subroutine...
blx lr
; To restore the original value of SP, just load the value
; at the top of the stack.
ldr sp, [sp]
Notice that the code above assumes that:
You're running in 32-bit ARM state (e.g., ARMv5, ARMv6, ARMv7, AArch32, ...).
SP is aligned on at least a 4-byte boundary, which is usually the case as the stack is viewed as an array of words.
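The arithmetic of the sequence above can be checked with a small C model. This is only a sketch: memory is modeled as an array of 32-bit words indexed by byte address divided by four, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Model of the tst/streq/strne steps: pick the new SP so that it is
   8-byte aligned, then store the old SP at the new top of stack.
   Assumes sp is at least 4-byte aligned and lies inside mem. */
uint32_t align_and_save_sp(uint32_t *mem, uint32_t sp)
{
    /* Already 8-byte aligned: drop by 8. Only 4-byte aligned: drop by 4. */
    uint32_t new_sp = ((sp & 7u) == 0) ? sp - 8u : sp - 4u;
    mem[new_sp / 4u] = sp;      /* old SP now sits at the top of the stack */
    return new_sp;
}

/* Model of `ldr sp, [sp]`: reload the saved SP from the top of the stack. */
uint32_t restore_sp(const uint32_t *mem, uint32_t sp)
{
    return mem[sp / 4u];
}
```

Either branch leaves the new SP on a multiple of 8, and the restore step recovers the original value whichever branch was taken.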

Related

Why stack grows by 16 bytes in this disassembly, when I only have one 4 byte local variable?

I'm having trouble understanding why the compiler chose to lay out the stack space the way it did for the code I wrote.
I was toying with Godbolt's Compiler Explorer in order to study the C calling convention when I came up with some simple code whose output puzzled me.
The code is found in this link. I selected GCC 8.2 x86-64, but I am targeting 32-bit x86 processors, and this is important. Below is the C code and the generated assembly reported by the Compiler Explorer.
// C code
int testing(char a, int b, char c) {
return 42;
}
int main() {
int x = testing('0', 0, '7');
return 0;
}
; Generated assembly
testing(char, int, char):
push ebp
mov ebp, esp
sub esp, 8
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+16]
mov BYTE PTR [ebp-4], dl
mov BYTE PTR [ebp-8], al
mov eax, 42
leave
ret
main:
push ebp
mov ebp, esp
sub esp, 16
push 55
push 0
push 48
call testing(char, int, char)
add esp, 12
mov DWORD PTR [ebp-4], eax
mov eax, 0
leave
ret
Looking at the assembly column from now on, as I understood, line 15 is responsible for reserving space in the stack for the local variables. The problem is that I have only one local int and the offset was by 16 bytes instead of 4. This feels like wasted space.
Is this somewhat related to word alignment? But even if it is, if the sizes of the general purpose registers are 4 bytes, shouldn't this alignment be with regards to 4 bytes?
One other strange thing I see is with respect to the placement of the local chars of the testing function. They seem to be taking 4 bytes each in the stack, as seen in lines 7-8, but only the lower bytes are manipulated. Why not use only 1 byte each?
These choices are probably well intended, and I would really like to understand their purpose (or whether there is no purpose). Or maybe I'm just confused and didn't quite get it.
So, from the comments, I could figure out that the stack growth issue is due to the i386 System V ABI requirements, as stated by @PeterCordes.
The reason the chars are word aligned may be GCC's default behavior to improve speed, as may be inferred from @Ped7g's comment. Although not definitive, this is a good enough answer for me.
It's common today to acquire stack space in multiples of this size, for several reasons:
cache lines favor this behaviour, since aligned data is more likely to sit wholly within one cache line.
space for temporaries is preallocated, so push and pop instructions are not needed when some temporary storage outside the CPU registers is required.
individual push and pop instructions degrade pipeline execution, because each one must update the stack pointer before the next instruction can execute. Preallocating the space removes these data dependencies between consecutive instructions and allows them to run faster.
For these reasons, current ABIs are designed this way and compilers follow them.

Stack Pointer reading incorrect value from register

Why is the stack pointer register not reading the correct value from another register? When I move a value from a register (r0) to the stack pointer (r13), SP reads an incorrect value.
This is what I mean:
MOV R0, #10
MOV R13, R0
In this case, 10 (0xA) should move to R13, but instead it gets 8.
Similarly,
MOV R0, #9
MOV R13, R0
In this case R13 stores 8 instead of 9.
Here's a simple program that demonstrates the problem:
void Init()
{
__asm(
"LDR R0, =0x3FFFFDA7\n"
"MOV R13, R0\n"
);
}
int main(void)
{
Init();
return (1);
}
void SystemInit(void)
{
}
Nothing much is going on here, just a simple function call. Inside the function I moved the address to R0. Then I moved the address to R13 (SP), but instead of the actual address, i.e. 0x3FFFFDA7, SP received 0x3FFFFDA4.
The image shows the disassembly.
So what is going on here? Why is the stack pointer register reading incorrect values?
I am using ARM inline Assembly with C. The IDE is KEIL.
Thanks in Advance.
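For those who might find this helpful:
The stack pointer on ARMv7 must be 4-byte aligned. On the M-profile cores the hardware ignores writes to the bottom two bits of SP, so they always read as zero, which is why 0x3FFFFDA7 became 0x3FFFFDA4. You can write 0, 4, 8, 12, 16, etc. there, but not 9, 10, 0xF, etc.
So if you want to move any value to the stack pointer, make sure it is 4-byte aligned.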
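The values reported in the question are consistent with the bottom two bits simply being cleared. A minimal sketch of that masking in C follows; the function name is illustrative, and the model assumes a core that drops bits [1:0] on a write to SP.

```c
#include <stdint.h>

/* Model of writing a value to SP on a core that ignores the bottom
   two bits (an assumption about the target hardware). */
uint32_t sp_after_write(uint32_t value)
{
    return value & ~UINT32_C(3);
}
```

With this model, writing 10 or 9 yields 8, and writing 0x3FFFFDA7 yields 0x3FFFFDA4, matching what the question observed.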

How does the frame pointer work on the MSP430 and what does the `#llo` macro do?

Using the -S flag in gcc, I created an assembly file from my C code in order to better understand how memory is used. Here is some assembly from the top of the main function:
main:
mov r1, r4 ; FP = SP
add #2, r4 ; FP += 2
add #llo(-14), r1 ; SP -= 14 ?
mov #llo(-16), r15 ; ???
add r4, r15 ; r15 += FP
add #4, r15
Comments were placed by me as I tried to dissect what is happening. My questions are: what does the #llo macro do, how is the memory on the stack being used, and what is going into r15?
For context, I have variables, including a structure, placed on the stack at the beginning of main that take up 14 bytes (seven 16-bit words). I know r4 is the frame pointer and r1 is the stack pointer.
The llo macro returns the lower 16 bits of its argument. I guess it is needed to avoid overflow when using a negative number (or the compiler is lazy).
It looks like the code computes the location of some object in R15. It's hard to tell with only part of the code. Also, if R4 isn't used elsewhere in the function, this code could be optimized a lot.
The line add #llo(-14), r1 allocates space on the stack.
It would be interesting to see what other compilers do with code like this (gcc for the MSP430 isn't really state of the art).
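To see why adding llo(-14) to a 16-bit register subtracts 14, here is a small C sketch. The function `llo` below is a hypothetical stand-in for the assembler macro, not its real implementation, and `add16` just models 16-bit register addition with wraparound.

```c
#include <stdint.h>

/* Hypothetical stand-in for the assembler's llo(): keep the low 16 bits
   of the argument. */
uint16_t llo(int32_t value)
{
    return (uint16_t)((uint32_t)value & 0xFFFFu);
}

/* 16-bit register addition: the result wraps modulo 65536. */
uint16_t add16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a + (uint32_t)b);
}
```

Here llo(-14) is 0xFFF2, and adding 0xFFF2 to a 16-bit stack pointer wraps around to the same result as subtracting 14.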

Confused about an ARM instruction

I can't figure out what this ARM instruction does:
strd.w r0, r1, [r2]
I know that it is a store instruction which stores something at *r2 but I'm not entirely sure what. Why are there two source registers (r0 and r1) and what does the d.w suffix mean?
This instruction stores the 64-bit contents of two 32-bit registers into memory. The d in strd stands for doubleword, and the .w suffix selects the 32-bit (wide) Thumb-2 encoding. The 8-byte chunk is stored starting at the address held in r2. The first four bytes come from r0, the second four bytes from r1.
Roughly equivalent C would be:
int32_t *ptr = (int32_t *) r2;
*(ptr) = r0;
*(ptr + 1) = r1; // 'ptr + 1' is four bytes past 'ptr'
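A compilable version of the same idea, as a sketch: the function name and parameters are illustrative, with `base` playing the role of r2 and `first` and `second` the roles of r0 and r1.

```c
#include <stdint.h>

/* Sketch of what `strd r0, r1, [r2]` does: store two 32-bit words at
   consecutive addresses starting at base. */
void store_double(uint32_t *base, uint32_t first, uint32_t second)
{
    base[0] = first;   /* r0 goes to [r2]     */
    base[1] = second;  /* r1 goes to [r2 + 4] */
}
```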

What does 0x4 do in "movl $0x2d, 0x4(%esp)"?

I am looking into assembly code generated by GCC. But I don't understand:
movl $0x2d, 0x4(%esp)
In the second operand, what does 0x4 stand for? Is it an offset? And what is the use of the EAX register?
movl $0x2d, 0x4(%esp) means to take the current value of the stack pointer (%esp), add 4 (0x4) then store the long (32-bit) value 0x2d into that location.
The eax register is one of the general purpose 32-bit registers. x86 architecture specifies the following 32-bit registers:
eax Accumulator Register
ebx Base Register
ecx Counter Register
edx Data Register
esi Source Index
edi Destination Index
ebp Base Pointer
esp Stack Pointer
and the names and purposes of some of them hark back to the days of the Intel 8080.
This page gives a good overview of the Intel-type registers. The first four in the list above can also be accessed as one 16-bit value or two 8-bit values. For example:
3322222222221111111111
10987654321098765432109876543210
<------------- eax ------------>
                <----- ax ----->
                <- ah -><- al ->
The pointer and index registers do not allow use of 8-bit parts but you can have, for example, the 16-bit bp.
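The overlap in the diagram can be expressed with shifts and masks. The helpers below are a sketch in C with made-up names; they extract the ax, ah and al views from a 32-bit eax value.

```c
#include <stdint.h>

/* ax is the low 16 bits of eax. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* ah is bits 15..8 of eax, i.e. the high byte of ax. */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* al is bits 7..0 of eax, i.e. the low byte of ax. */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
```

For eax = 0x12345678, these give ax = 0x5678, ah = 0x56 and al = 0x78.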
0x4(%esp) means *(%esp + 4), where * means dereferencing.
The statement stores the immediate value 0x2d into the stack slot 4 bytes above the address held in the stack pointer.
(The code you've shown is in AT&T syntax. In Intel syntax it would be mov dword ptr [esp+4], 2dh)
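To make the addressing concrete, here is a rough C model of the store. The stack is modeled as a byte array, `esp` is a pointer into it, and the function names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Model of `movl $value, disp(%esp)`: write a 32-bit value at the
   address esp + disp. */
void store_long(uint8_t *esp, int32_t disp, uint32_t value)
{
    memcpy(esp + disp, &value, sizeof value);
}

/* Read back the 32-bit value at esp + disp. */
uint32_t load_long(const uint8_t *esp, int32_t disp)
{
    uint32_t value;
    memcpy(&value, esp + disp, sizeof value);
    return value;
}
```

Calling store_long(esp, 4, 0x2d) changes only the four bytes at offsets 4 to 7 from esp; the slot at offset 0 is left alone.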
0x4 in the second operand is an offset from the value of the register in the parens. EAX is a general purpose register used for assembly coding (computations, storing temporary values, etc.) formally it's called "Accumulator register" but that's more historic than relevant.
You can read this page about the x86 architecture. Most relevant to your question are the sections on Addressing modes and General purpose registers
In GCC's (AT&T-syntax) assembly, instruction mnemonics carry an operand-size suffix, byte (b), word (w), long (l), and so on, such as:
movb
movw
movl
Registers are prefixed with a percentage sign (%).
Constants are prefixed with a dollar sign ($).
In the example in your question, that means an offset of 4 bytes from the stack pointer (esp).
Hope this helps,
Best regards,
Tom.
You're accessing something four bytes above where the stack pointer points. In GCC output, a store like this usually sets up an argument for an upcoming call (I think; the rule that positive offsets are parameters and negative offsets are local variables applies to offsets from %ebp, if I remember correctly). You're writing, in other words, the value 0x2D into an argument slot. If you gave more context I could probably tell you what was going on in the whole procedure.
