ARM assembly pointers to pointers? - c

Suppose I am given as input to a function foo some pointer *pL that points to a pointer to a struct that has a pointer field next in it. I know this is weird, but all I want to implement in assembly is the line of code with the ** around it:
typedef struct CELL *LIST;
struct CELL {
int element;
LIST next;
};
void foo(LIST *pL){
**(*pL)->next = NULL;**
}
How do I implement this in ARM assembly? The issue comes from having nested statements when I want to store, such as:
.irrelevant header junk
foo:
MOV R1, #0
STR R1, [[R0,#0],#4] # This is gibberish, but [R0,#0] is meant to dereference and the #4 is to offset that.

The sequence would be similar to:
... ; r0 = LIST *pL = CELL **ppC (ptr2ptr2cell)
ldr r0,[r0] ; r0 = CELL *pC (ptr2cell)
mov r1,#0 ; r1 = NULL
str r1,[r0,#4] ; (*pL)->next = pC->next = (*pC).next = NULL

The correct sequence would be (assuming ARM ABI and LIST *pL is in R0),
.global foo
foo:
ldr r0, [r0] # get *pL to R0
mov r1, #0 # set R1 to zero.
str r1, [r0, #4] # set (*pL)->next = NULL;
bx lr # return
You can swap the first two assembler statements, but it is generally better to interleave ALU with load/store for performance. With the ARM ABI, you can use R0-R3 without saving. foo() above should be callable from 'C' with most ARM compilers.
The 'C' might be simplified to,
void foo(struct CELL **pL)
{
(*pL)->next = NULL;
}
if I understand correctly.
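For completeness, here is a minimal test harness (hypothetical, not part of the question or answer) that could be linked against the assembly foo above; it relies on next being at offset 4 within struct CELL, which holds on 32-bit ARM:
#include <stdio.h>

typedef struct CELL *LIST;
struct CELL {
    int element;
    LIST next;
};

void foo(LIST *pL);   /* the assembly routine above */

int main(void)
{
    struct CELL second = { 2, NULL };
    struct CELL first  = { 1, &second };
    LIST head = &first;

    foo(&head);                           /* clears first.next      */
    printf("%p\n", (void *)head->next);   /* prints a null pointer  */
    return 0;
}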

Related

Why does writing to a bitfield-uint union by reference yield wrong assembly instruction?

First, some background:
This issue popped up while writing a driver for a sensor in my embedded system (STM32 ARM Cortex-M4).
Compiler: ARM NONE EABI GCC 7.2.1
The best solution for representing the sensor's internal control register was to use a union with a bitfield, along these lines:
enum FlagA {
kFlagA_OFF,
kFlagA_ON,
};
enum FlagB {
kFlagB_OFF,
kFlagB_ON,
};
enum OptsA {
kOptsA_A,
kOptsA_B,
.
.
.
kOptsA_G // = 7
};
union ControlReg {
struct {
uint16_t RESERVED1 : 1;
FlagA flag_a : 1;
uint16_t RESERVED2 : 7;
OptsA opts_a : 3;
FlagB flag_b : 1;
uint16_t RESERVED3 : 3;
} u;
uint16_t reg;
};
This allows me to address the register's bits individually (e.g. ctrl_reg.u.flag_a = kFlagA_OFF;), and it allows me to set the value of the whole register at once (e.g. ctrl_reg.reg = 0xbeef;).
The problem:
When I attempt to populate the register with a value fetched from the sensor through a function call (passing the union in by pointer), and then update only the opts_a portion of the register before writing it back to the sensor (as shown below), the compiler generates an incorrect bitfield insert assembly instruction.
ControlReg ctrl_reg;
readRegister(&ctrl_reg.reg);
ctrl_reg.u.opts_a = kOptsA_B; // <-- line of interest
writeRegister(ctrl_reg.reg);
yields
ldrb.w r3, [sp, #13]
bfi r3, r8, #1, #3 ;incorrectly writes to bits 1, 2, 3
strb.w r3, [sp, #13]
However, when I use an intermediate variable:
uint16_t reg_val = 0;
readRegister(&reg_val);
ControlReg ctrl_reg;
ctrl_reg.reg = reg_val;
ctrl_reg.u.opts_a = kOptsA_B; // <-- line of interest
writeRegister(ctrl_reg.reg);
It yields the correct instruction:
bfi r7, r8, #9, #3 ;sets the proper bits 9, 10, 11
The readRegister function does nothing funky and simply writes to the memory at the pointer
void readRegister(uint16_t* out) {
uint8_t data_in[3];
...
*out = (data_in[0] << 8) | data_in[1];
}
Why does the compiler improperly set the starting bit of the bitfield insert instruction?
I am not a fan of bitfields, especially if you're aiming for portability. C leaves a lot more unspecified or implementation-defined about them than most people seem to appreciate, and there are some very common misconceptions about what the standard requires of them as opposed to what happens to be the behavior of some implementations. Nevertheless, that's mostly moot if you're writing code for a specific application only, targeting a single, specific C implementation for the target platform.
In any case, C allows no room for a conforming implementation to behave inconsistently for conforming code. In your case, it is equally valid to set ctrl_reg.reg through a pointer, in function readRegister(), as to set it via assignment. Having done so, it is valid to assign to ctrl_reg.u.opts_a, and the result should read back correctly from ctrl_reg.u. It is also permitted to afterward read ctrl_reg.reg, and that will reflect the result of the modification.
However, you are making assumptions about the layout of the bitfields that are not supported by the standard. Your compiler will be consistent, but you need to carefully verify that the layout is actually what you expect, else going back and forth between the two union members will not produce the result you want.
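One way to make that verification concrete is a quick runtime check. This is only a sketch, assuming the union definition from the question (with its named member u), that kOptsA_B == 1, and that the compiler allocates bit-fields starting from the least significant bit, as GCC does on little-endian ARM:
#include <assert.h>
#include <stdint.h>

static void check_ctrl_reg_layout(void)
{
    union ControlReg r;
    r.reg = 0;                     /* clear everything                  */
    r.u.opts_a = kOptsA_B;         /* kOptsA_B == 1                     */
    assert(r.reg == (1u << 9));    /* expect opts_a to sit in bits 9-11 */
}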
Nevertheless, the way you store a value in ctrl_reg.reg is immaterial with respect to the effect that assigning to the bitfield has. Your compiler is not required to generate identical assembly for the two cases, but if there are no other differences between the two programs and they exercise no undefined behavior, then they are required to produce the same observable behavior for the same inputs.
It is 100% correct compiler-generated code:
void foo(ControlReg *reg)
{
reg->opts_a = kOptsA_B;
}
void foo1(ControlReg *reg)
{
volatile ControlReg reg1;
reg1.opts_a = kOptsA_B;
}
foo:
movs r2, #1
ldrb r3, [r0, #1] # zero_extendqisi2
bfi r3, r2, #1, #3
strb r3, [r0, #1]
bx lr
foo1:
movs r2, #1
sub sp, sp, #8
ldrh r3, [sp, #4]
bfi r3, r2, #9, #3
strh r3, [sp, #4] # movhi
add sp, sp, #8
bx lr
As you can see, in the function 'foo' it loads only one byte (the second byte of the union), and the field is stored in bits 1 to 3 of that byte.
As you can see, in the function 'foo1' it loads a halfword (the whole structure), and the field is stored in bits 9 to 11 of that halfword.
Do not try to find errors in the compiler; they are almost always in your own code.
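To see why the byte-wide sequence in 'foo' is equivalent, here is a small sketch (not from the answer, assuming a little-endian target such as Cortex-M): writing bits 1-3 of the upper byte is the same as writing bits 9-11 of the halfword.
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint16_t reg = 0xFFFF;
    uint8_t *bytes = (uint8_t *)&reg;     /* bytes[1] is the upper byte */

    /* What the byte-wide BFI in 'foo' effectively does: insert the
       value 1 into bits 1-3 of the upper byte. */
    bytes[1] = (uint8_t)((bytes[1] & ~(0x7u << 1)) | (1u << 1));

    /* Same result as inserting the value 1 into bits 9-11 of the halfword. */
    uint16_t expected = (uint16_t)((0xFFFFu & ~(0x7u << 9)) | (1u << 9));
    assert(reg == expected);
    return 0;
}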
PS
You do not need to name the struct or the padding bitfields:
typedef union {
struct {
uint16_t : 1;
uint16_t flag_a : 1;
uint16_t : 7;
uint16_t opts_a : 3;
uint16_t flag_b : 1;
uint16_t : 3;
};
uint16_t reg;
} ControlReg;
EDIT
But if you want to make sure that the whole structure (union) is modified, just make the function parameter volatile:
void foo(volatile ControlReg *reg)
{
reg->opts_a = kOptsA_B;
}
foo:
movs r2, #1
ldrh r3, [r0]
bfi r3, r2, #9, #3
strh r3, [r0] # movhi
bx lr

VLA prototype and multidimensional array argument

I created a C99 VLA function as follows:
void create_polygon(int n, int faces[][n]);
I want to call this function in another function where I would allocate my two-dimensional array:
void parse_faces()
{
int faces[3][6];
create_polygon(6, faces);
}
When I pass a two-dimensional array as an argument, it passes a pointer to a 6 integer array, referencing the stack memory in the calling function.
The VLA argument here only acts as a type declaration (not allocating any actual memory), telling the compiler to access the data in row-major order with ((int*)faces)[i * 6 + j] instead of faces[i][j].
What is the difference between declaring the function with a VLA argument and with a fixed size?
faces[i][j] is always equivalent to *(*(faces + i) + j), whether it is a VLA or not.
Now let's compare two variants (not considering that you actually need the outer dimension as well to prevent exceeding array bounds on iterating):
void create_polygon1(int faces[][6]);
void create_polygon2(int n, int faces[][n]);
It doesn't matter whether the array passed in was originally created as a classic array or as a VLA: the first function accepts only arrays whose rows have exactly 6 elements, while the second accepts arrays with an arbitrary row length (assuming this is clear so far...).
faces[i][j] will now be translated to:
*((int*)faces + (i * 6 + j)) // (1)
*((int*)faces + (i * n + j)) // (2)
The difference still looks marginal, but it becomes more obvious at the assembler level (assuming all variables are stored on the stack and sizeof(int) == 4):
// (1) fixed size (int faces[][6]):
LD R1, i;
LD R2, j;
MUL R1, R1, 24; // using a constant! 24: 6 * sizeof(int)!
MUL R2, R2, 4;  // sizeof(int)
ADD R1, R1, R2; // index stored in R1 register

// (2) VLA (int faces[][n]):
LD R1, i;
LD R2, j;
LD R3, n;       // need to load n from the stack
MUL R3, R3, 4;  // need to multiply with sizeof(int) first
MUL R1, R1, R3; // can now use n from register R3
MUL R2, R2, 4;  // ...
ADD R1, R1, R2; // ...
True assembler code might vary, of course, especially if you use a calling convention that allows passing some parameters in registers (then loading n into R3 might be unnecessary).
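To make the contrast concrete, here is a small C99 sketch (the main function and the array sizes are made up for illustration): the fixed-size prototype accepts only rows of exactly 6 ints, while the VLA prototype takes the row length at run time.
/* Rows must have exactly 6 ints; the row size is a compile-time constant. */
void create_polygon1(int faces[][6]) { faces[0][0] = 1; }

/* Row length n is passed at run time; the index arithmetic uses n. */
void create_polygon2(int n, int faces[][n]) { faces[0][n - 1] = 1; }

int main(void)
{
    int a[3][6];
    int b[3][8];

    create_polygon1(a);         /* OK: row type int[6] matches           */
    /* create_polygon1(b); */   /* constraint violation: row type int[8] */
    create_polygon2(6, a);      /* OK                                    */
    create_polygon2(8, b);      /* OK: row length supplied at run time   */
    return 0;
}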
For completeness (added due to comments, unrelated to the original question): there is also the int* array[] case, i.e. representation as an array of pointers to arrays.
*((int*)faces + (i * ??? + j))
doesn't work any more, as faces in this case is not contiguous memory (well, the pointers themselves are in contiguous memory, of course, but not all the faces[i][j]). We are forced to do:
*(*(faces + i) + j)
as we need to dereference the actual pointer in the array before we can apply the next index. Assembler code (for comparison, we first need a more complete variant of the pointer-to-2D-array case):
// pointer to a 2D array (int faces[][n]), full address computation:
LD R1, faces;
LD R2, i;
LD R3, j;
LD R4, n;       // or skip, if no VLA
MUL R4, R4, 4;  // or skip, if no VLA
MUL R2, R2, R4; // constant instead of R4, if no VLA
MUL R3, R3, 4;
ADD R2, R2, R3; // index stored in R2 register
ADD R1, R1, R2; // add offset to base pointer
LD R1, [R1];    // loading value of faces[i][j] into register

// array of pointers to rows (int *faces[]):
LD R1, faces;
LD R2, i;
LD R3, j;
MUL R2, R2, 8; // sizeof(void*) (any pointer)
MUL R3, R3, 4; // sizeof(int)
ADD R1, R1, R2; // address of faces[i]
LD R1, [R1]; // now need to load address - i. e. de-referencing faces[i]
ADD R1, R1, R3; // offset within array
LD R1, [R1]; // loading value of faces[i][j] into register
I disassembled this code:
void create_polygon(int n, int faces[][6])
{
int a = sizeof(faces[0]);
(void)a;
}
With VLA argument:
movl %edi, -4(%rbp) # 6
movq %rsi, -16(%rbp) # faces
movl %edi, %esi
shlq $2, %rsi # 6 << 2 = 24
movl %esi, %edi
With fixed size:
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $24, %edi # 24
As Aconcagua pointed out, in the first example using a VLA, the size is computed at run time by multiplying the size of an int by the size of the second dimension, which is the argument stored in rsi, then moved into edi.
In the second example, the size is computed directly at compile time and placed into edi. The main advantage is that the compiler can detect an argument of incorrect pointer type when an array with a different row size is passed, thus avoiding a crash.

Different gcc assembly when using designated initializers

I was checking some gcc generated assembly for ARM and noticed that I get strange results if I use designated initializers:
E.g. if I have this code:
struct test
{
int x;
int y;
};
__attribute__((noinline))
struct test get_struct_1(void)
{
struct test x;
x.x = 123456780;
x.y = 123456781;
return x;
}
__attribute__((noinline))
struct test get_struct_2(void)
{
return (struct test){ .x = 123456780, .y = 123456781 };
}
I get the following output with gcc -O2 -std=c11 for ARM (ARM GCC 6.3.0):
get_struct_1:
ldr r1, .L2
ldr r2, .L2+4
stm r0, {r1, r2}
bx lr
.L2:
.word 123456780
.word 123456781
get_struct_2: // <--- what is happening here
mov r3, r0
ldr r2, .L5
ldm r2, {r0, r1}
stm r3, {r0, r1}
mov r0, r3
bx lr
.L5:
.word .LANCHOR0
I can see the constants for the first function, but I don't understand how get_struct_2 works.
If I compile for x86, both functions just load the same single 64-bit value in a single instruction.
get_struct_1:
movabs rax, 530242836987890956
ret
get_struct_2:
movabs rax, 530242836987890956
ret
Am I provoking some undefined behavior, or is this .LANCHOR0 somehow related to these constants?
Looks like gcc shoots itself in the foot with an extra level of indirection after merging the loads of the constants into an ldm.
No idea why, but pretty obviously a missed optimization bug.
x86-64 is easy to optimize for; the entire 8-byte constant can go in one immediate. But ARM often uses PC-relative loads for constants that are too big for one immediate.
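For what it's worth, a small harness (a sketch, not part of the original post) compiles both functions and checks that they return identical bytes, which is consistent with this being purely a code-generation difference rather than undefined behavior:
#include <assert.h>
#include <string.h>

struct test { int x; int y; };

__attribute__((noinline)) struct test get_struct_1(void)
{
    struct test x;
    x.x = 123456780;
    x.y = 123456781;
    return x;
}

__attribute__((noinline)) struct test get_struct_2(void)
{
    return (struct test){ .x = 123456780, .y = 123456781 };
}

int main(void)
{
    struct test a = get_struct_1();
    struct test b = get_struct_2();
    assert(memcmp(&a, &b, sizeof a) == 0);   /* same observable result */
    return 0;
}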

Kernel breaks on adding new code (that never runs)

I'm trying to add some logic at boundaries between userspace and kernelspace particularly on the ARM architecture.
One such boundary appears to be the vector_swi routine implemented in arch/arm/kernel/entry-common.S. Right now, I have most of my code written in a C function which I would like to call somewhere at the start of vector_swi.
Thus, I did the following:
ENTRY(vector_swi)
sub sp, sp, #S_FRAME_SIZE
stmia sp, {r0 - r12} # Calling r0 - r12
ARM( add r8, sp, #S_PC )
ARM( stmdb r8, {sp, lr}^ ) # Calling sp, lr
THUMB( mov r8, sp )
THUMB( store_user_sp_lr r8, r10, S_SP ) # calling sp, lr
mrs r8, spsr # called from non-FIQ mode, so ok.
str lr, [sp, #S_PC] # Save calling PC
str r8, [sp, #S_PSR] # Save CPSR
str r0, [sp, #S_OLD_R0] # Save OLD_R0
zero_fp
#ifdef CONFIG_BTM_BOUNDARIES
bl btm_entering_kernelspace # <--- My function
#endif
When the contents of my function are as follows everything works fine:
static int btm_enabled = 0;
asmlinkage inline void btm_entering_kernelspace(void)
{
int cpu;
int freq;
struct acpu_level *level;
if(!btm_enabled) {
return;
}
cpu = smp_processor_id();
freq = acpuclk_krait_get_rate(cpu);
(void) cpu;
(void) freq;
(void) level;
}
However, when I add some additional code, the kernel enters into a crash-reboot loop.
static int btm_enabled = 0;
asmlinkage inline void btm_entering_kernelspace(void)
{
int cpu;
int freq;
struct acpu_level *level;
if(!btm_enabled) {
return;
}
cpu = smp_processor_id();
freq = acpuclk_krait_get_rate(cpu);
(void) cpu;
(void) freq;
(void) level;
// --------- Added code ----------
for (level = drv.acpu_freq_tbl; level->speed.khz != 0; level++) {
if(level->speed.khz == freq) {
break;
}
}
}
Although the first instinct is to blame the logic of the added code, please note that none of it should ever execute since btm_enabled is 0.
I have double-checked and triple-checked to make sure btm_enabled is 0 by adding a sysfs entry to print out the value of the variable (with the added code removed).
Could someone explain what is going on here or what I'm doing wrong?
The first version will probably compile to just a return instruction as it has no side effect. The second needs to load btm_enabled and in the process overwrites one or two system call arguments.
When calling a C function from assembly language you need to ensure that registers that may be modified do not contain needed information.
To solve your specific problem, you could update your code to read:
#ifdef CONFIG_BTM_BOUNDARIES
stmdb sp!, {r0-r3, r12, lr} # <--- New instruction
bl btm_entering_kernelspace # <--- My function
ldmia sp!, {r0-r3, r12, lr} # <--- New instruction
#endif
The new instructions store registers r0-r3, r12 and lr on the stack and restore them after your function call. These are the only registers a C function is allowed to modify; saving r12 is not strictly necessary here since its value is not used, but doing so keeps the stack 8-byte aligned as required by the ABI.

Inline Assembly: Passing pointers to a function and using it in that function in assembly

I'm using ARM/Cortex-A8 processor platform.
I have a simple function where I have to pass two pointers to a function. These pointers are later used in that function, which contains only my inline assembly code. The plan is purely to achieve performance.
void function(unsigned char *input, unsigned char *output)
{
// What are the assembly instructions to use these two pointers here?
// I will inline the assembly instructions here
}
int main(void)
{
unsigned char input[1000], output[1000];
function(input, output);
}
Thanks
Assuming you're using a normal ARM ABI, those two parameters will be passed in R0 and R1. Here is a quick example showing how to copy the bytes from the input buffer to the output buffer (gcc syntax):
.text
.globl _function
_function:
mov r2, #0 // initialize loop counter
loop:
ldrb r3, [r0, r2] // load r3 with input[r2]
strb r3, [r1, r2] // store r3 to output[r2]
add r2, r2, #1 // increment loop counter
cmp r2, #1000 // test loop counter
bne loop
mov pc, lr
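If it helps, here is a hypothetical C caller for the routine above (not from the answer). Note that whether the C symbol function gets a leading underscore (the _function label above) depends on the toolchain; arm-none-eabi-gcc for ELF targets does not add one, so the label would then be just function:
extern void function(unsigned char *input, unsigned char *output);

int main(void)
{
    static unsigned char input[1000], output[1000];
    function(input, output);   /* input arrives in r0, output in r1 */
    return 0;
}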
