I have limited RAM in my microcontroller (using ARM GCC), so I should write my code as efficiently as possible. Consider the function below:
int foo(uint32_t a, float b)
{
    .....
    return 0;
}
So I have two arguments, a and b. Now if I change the arguments to *a and *b, does the function take less RAM than the first one during execution?
int foo(uint32_t *a, float *b)
{
    ....
    return 0;
}
What if the arguments become a string or an array (of int, float, ...)?
Any references would be great. Thanks.
You can actually waste memory when using pointers, for two reasons.
1. Register optimization is lost
Code which calls foo must take the address of a variable to pass it as the parameter. If the passed variable is local, it could live in a register, but because you take its address it must be placed on the stack. In general, using a variable in a register is faster than using a variable on the stack.
2. Value of the variable is unknown after the call
When you pass the address of a variable to a function, the compiler no longer knows whether the variable is modified there, and must reload it from memory if it is read again.
uint32_t u = 1;
float f = 2.0f;
foo(&u, &f); // 1. Address taken, u and f cannot be register variables
             // 2. What is the value of u now? It must be refreshed from memory before the increment happens
u++;
Bottom line: do not take the address of primitive types unless you have to.
Strings and arrays are already passed by address, so there is no other option there.
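For example, here is a minimal sketch of what passing an array looks like: the array parameter decays to a pointer, so only its address travels in a register and the elements themselves are never copied for the call.
#include <stddef.h>
#include <stdint.h>

/* Only the address of data and the count n are passed (two registers);
   the array contents stay where they are. */
uint32_t sum_u32(const uint32_t *data, size_t n)
{
    uint32_t s = 0;

    for (size_t i = 0; i < n; i++)
        s += data[i];
    return s;
}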
ARM's Procedure Call Standard, defined in the Application Binary Interface (ABI), states that the first four word-sized parameters passed to a function are transferred in registers R0-R3.
Since pointers are also 32-bit in size, there is not that much difference between these two signatures.
int foo(uint32_t a, float b)
int foo(uint32_t *a, float *b)
Both will have a in r0, b in r1 and return value in r0 again.
ARM is a RISC architecture with many registers, and with trivial functions you may even get away without touching any memory at all.
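As a small sketch of that (assuming the register convention above): a leaf function like this can typically be compiled with a and b arriving in r0/r1, the result leaving in r0, and no stack or memory access at all.
#include <stdint.h>

/* Trivial leaf function; everything can stay in r0-r3. */
uint32_t scale_add(uint32_t a, uint32_t b)
{
    return a * 3u + b;
}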
If you are working with microcontrollers, it is also a good idea to check whether you really need floating point, since it might not be natively supported by the core you are targeting.
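If the core has no FPU, one common alternative is fixed-point arithmetic, which keeps everything in integer registers instead of pulling in software floating-point routines. A minimal sketch, assuming a Q16.16 format (16 integer bits, 16 fractional bits); conversions like these would normally only happen at the edges of the program:
#include <stdint.h>

typedef int32_t q16_16;   /* Q16.16 fixed point: value = raw / 65536 */

static inline q16_16 q16_from_float(float f) { return (q16_16)(f * 65536.0f); }
static inline float  q16_to_float(q16_16 q)  { return (float)q / 65536.0f; }

static inline q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);   /* widen to avoid overflow */
}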
I would check the ARM Cortex-A Series Programmer's Guide (most of it applies to microcontrollers as well), especially chapters 15 and 17, to learn more about tricks for ARM application development.
Related
Let's say I have the following structure:
typedef struct s_tuple {
    double x;
    double y;
    double z;
    double w;
} t_tuple;
Let's say I have the two following functions:
t_tuple tuple_sub_values(t_tuple a, t_tuple b)
{
    a.x -= b.x;
    a.y -= b.y;
    a.z -= b.z;
    a.w -= b.w;
    return (a);
}

t_tuple tuple_sub_pointers(t_tuple *a, t_tuple *b)
{
    t_tuple c;

    c.x = a->x - b->x;
    c.y = a->y - b->y;
    c.z = a->z - b->z;
    c.w = a->w - b->w;
    return (c);
}
Will there be a performance difference between the functions? Is one of these better than the other?
Basically, what are the pros and cons of passing by value vs. passing by pointer when all of the structure's elements are accessed?
Edit: Completely changed my structure and functions to give a more precise example
I found this post that is related to my question but is for C++: https://stackoverflow.com/questions/40185665/performance-cost-of-passing-by-value-vs-by-reference-or-by-pointer#:~:text=In%20short%3A%20It%20is%20almost,reference%20parameters%20than%20value%20parameters.
Context: My structures are not huge in this example, but I am coding a ray tracer, and some structs of around 100 bytes are passed around millions of times, so I'd like to try to optimize these calls. My structs are nested inside one another, so it would be a mess to copy them here; this is why I tried to ask my question with a kind of general example.
Getting to the core of the question: for optimal argument-passing/value-returning performance, you basically want to follow the ABI of your platform and try to make sure that things are in registers and stay in registers. If they aren't in registers or cannot stay in registers, then passing larger-than-pointer-size data by pointer will likely save some copying (unless the copying would need to be done in the callee anyway: void pass_copy(struct large x){ use(&x); } could actually be slightly better for codegen than void pass_copy2(struct large const *x){ struct large cpy = *x; use(&cpy); }).
The concrete rules for e.g., the sysv x86-64 ABI are a bit complicated (see the chapter on calling conventions).
But a short version might be: arguments and return values go through registers as long as their type is "simple enough" and appropriate argument-passing registers are available (six integer registers and eight SSE registers for floating-point values). Structs of up to two eightbytes can go through registers (as arguments or as a return value) provided they're "simple enough".
Supposing your doubles are already loaded in registers (or aren't aggregated into t_tuples that you could point the callee to), the most efficient way to pass them on the x86-64 SysV ABI would be individually or via structs of two doubles each, but you'd still need to return the result via memory, because the ABI can only accommodate two-double return values in registers, not four-double ones. If you returned a four-double struct, the compiler would stack-allocate memory in the caller, pass a pointer to it as a hidden first argument, and then return a pointer to the allocated memory (under the covers). A more flexible approach is to not return such a large aggregate at all, but instead explicitly pass a pointer to a struct-to-be-filled. That way the struct can be anywhere you want it (rather than auto-allocated on the stack by the compiler).
So something like
void tuple_sub_values(t_tuple *retval,
t_twodoubles a0, t_twodoubles a1,
t_twodoubles b0, t_twodoubles b1);
would be a better API for avoiding memory spillage on the x86-64 SysV ABI (Linux, macOS, the BSDs, ...).
If your measurements show the code-size savings / performance boost to be worth it for you, you could wrap it in an inline function that does the struct-splitting.
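A minimal sketch of such a wrapper, assuming the t_tuple from the question and a made-up two-double helper type t_twodoubles:
typedef struct s_twodoubles { double a; double b; } t_twodoubles;

/* Register-friendly worker with the signature suggested above
   (definition not shown here). */
void tuple_sub_values(t_tuple *retval,
        t_twodoubles a0, t_twodoubles a1,
        t_twodoubles b0, t_twodoubles b1);

/* Inline wrapper that does the struct-splitting for callers that
   already hold whole t_tuples. */
static inline void tuple_sub(t_tuple *retval, const t_tuple *a, const t_tuple *b)
{
    t_twodoubles a0 = { a->x, a->y };
    t_twodoubles a1 = { a->z, a->w };
    t_twodoubles b0 = { b->x, b->y };
    t_twodoubles b1 = { b->z, b->w };

    tuple_sub_values(retval, a0, a1, b0, b1);
}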
When it comes to performance, it will most likely be implementation specific for reasons beyond the scope of this post, but we're most likely talking about microseconds in the worst case. Now for the pros and cons:
Passing by value will only give you a copy of that struct, and modifications will be local only. In other words, your function will receive an entirely new copy of the struct, and it will only be able to modify that copy.
In contrast, passing by reference gives you the ability to modify the given struct directly from the function, and is often seen when multiple values need to be returned from a function.
It's entirely up to you to choose which one works for your case. But to add some extra help:
Passing by reference reduces the function call overhead because you won't have to copy 32 bytes into the new stack frame. It also helps significantly if you want to keep the memory footprint low and plan to call the function many times: instead of creating a new struct for each call, you simply tell every call to reuse the same struct. This is commonly seen in games, where structs may be thousands of bytes large.
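As a sketch of that pattern, using the t_tuple from the question: the caller keeps one result struct around and passes its address to every call, so nothing is copied or returned by value.
/* out is owned by the caller and can be reused across many calls. */
void tuple_sub_into(t_tuple *out, const t_tuple *a, const t_tuple *b)
{
    out->x = a->x - b->x;
    out->y = a->y - b->y;
    out->z = a->z - b->z;
    out->w = a->w - b->w;
}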
I am not clear on how long a variable is guaranteed to remain allocated in C.
For example, if I have:
void foo(void) {
    int x;
    int* y = &x;
    ...
}
Is the space allocated on the stack for x guaranteed to be reserved for this variable exclusively for the entire duration of foo()? Said differently, is y guaranteed to point to a location that will be preserved for the entire duration of foo, or could the compiler decide that, since x isn't being used, the stack space can be reused for something else within foo, so that *y may change without y (or x) being accessed directly?
When you ask questions like this, you should be clear whether you are asking about C semantics or about program implementation.
C semantics are described using a model of an abstract computer in which all operations are performed as the C standard describes them. When a compiler compiles a program, it can change how the program is implemented as long as it gets the same results. (The results that must be correct are the observable behavior of the program: its output, including data written to files, its input/output interactions, and its accesses to volatile objects.)
In the abstract computer, memory for x is reserved from the time an execution of foo starts until that execution of foo ends.1, 2
So, in the abstract computer, it does not matter if x is used or not; memory is reserved for it until foo returns or its execution is ended in some other way (such as a longjmp or program termination).
When the compiler implements this program, it is allowed optimize away x completely (if it and its address are not used in any way that requires the memory to be reserved) or to use the same memory for x that it uses for other things, as long as the uses do not conflict in ways that change the observable behavior. For example, if we have this code:
int x;
int *y = &x;
x = 3;
printf("%d\n", x);
int b = 4;
printf("%d\n", b);
then the compiler may use the same memory for b that it uses for x.
On the other hand, if we have this code:
int x;
int *y = &x;
printf("%p\n", (void *) y);
int b = 4;
printf("%p\n", (void *) &b);
then the program must print different values for the two printf statements. This is because different objects that both exist at the same moment in the abstract computer model must have different addresses. The abstract computer would print different addresses for these, so the compiler must generate a program that is faithful to that model.
Footnotes
1 There can be multiple executions of a function live at one time, due to nested function calls.
2 Sometimes people say the lifetime of x is the scope of the function, but this is incorrect. The function could call another routine and pass it y, which has the address of x. Then the other routine can access x using this address. The memory is still reserved for x even though it is not in the scope of the other routine’s source code. During the subroutine call, the execution of foo is temporarily suspended, but it is not ended, so the lifetime of x has not ended.
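A small sketch of the situation footnote 2 describes (the helper bump is made up for illustration):
static void bump(int *p)
{
    *p += 1;   /* x is not in scope here, but its lifetime continues */
}

void foo(void)
{
    int x = 0;
    int *y = &x;

    bump(y);   /* well defined: modifies x while this execution of foo is suspended */
}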
The lifetime of an automatic variable is the entire duration of the scope in which it is declared; in your case, that scope is the whole of the foo function.
Compilers are allowed to make optimizations (including removing variables completely) that can have no observable effect; however, once you assign the address of x to y, any use of *y will be using x, so the memory allocated for x cannot be used for anything else as long as there is any possibility of accessing or modifying *y.
x is being used: y is being assigned its address! In short, the answer is "yes", as long as the compiler authors are sensible. Most compilers (Visual Studio at least) would at least warn that x is uninitialized, so this isn't a very realistic example.
*y most definitely cannot change by modifying any variable other than x (or y itself); that's 100% certain. When you enter a function, the parameters and then the local variables are pushed onto the stack, and when you leave the function they are popped off. There is no scope for shared memory (unless you are using a union).
What's the reason behind this question? If you really want to know how C is defined, you should read "The C Programming Language" by Kernighan and Ritchie.
I'm a beginner in assembly (Nios II) and I know that a function's parameters are stored in the registers (r4 -> r7).
But I wonder whether these registers contain the actual value of the parameter or its address.
for example the C function :
int add (int x, int y) {}
Does r4 contain 'x' or '&x' ?
Here's the ABI for Nios II:
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/nios2/n2cpu_nii51016.pdf
From the table, we can tell that arguments are indeed passed in registers r4-r7, and each of them holds 32 bits. From the same document we learn that int is 4 bytes. That means x will be passed in r4. &x is not passed here, as this is call-by-value. If you want the address of x, a good compiler will first check whether it is ever actually needed, and only after giving up will it allocate memory for x in the stack frame.
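To illustrate the difference (a sketch, not actual compiler output): in the first function r4 and r5 hold the values of x and y themselves; in the second they hold addresses, and the callee must load through them.
int add(int x, int y)        /* r4 = value of x, r5 = value of y */
{
    return x + y;
}

int add_ptr(int *x, int *y)  /* r4 = address of x, r5 = address of y */
{
    return *x + *y;          /* needs loads through the pointers */
}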
I know how structures work in C, but I don't know how they work internally; I'm still at the beginning of learning assembly. My question is: in the code below I have a structure called P and create two variables of this type, A and B. After assigning A to B (B = A), I can get the data from A in B even without using a pointer. How is this copy of the data from A to B made?
#include <stdio.h>

struct P {
    int x;
    int y;
} A, B;

int main(void) {
    printf("%p\n%p\n\n", &A, &B);
    printf("Member x of a: %p\nMember y of a: %p\n", &A.x, &A.y);
    printf("Member x of b: %p\nMember y of b: %p\n", &B.x, &B.y);

    A.x = 10;
    A.y = 15;

    B = A; // 10
    printf("%d\n%d\n", B.x, B.y);
    return 0;
}
The interesting thing in your sample code, I think, is the line
B = A;
Typically, the compiler implements this in one of two ways.
(1) It copies the members individually, giving more or less exactly the same effect as if you had said
B.x = A.x;
B.y = A.y;
(2) It emits a low-level byte-copy loop (or machine instruction), giving the effect of
memcpy(&B, &A, sizeof(struct P));
(except that typically this is done in-line, with no actual function call involved).
The compiler will choose one or the other of these based on which one is smaller (less emitted code), or which one is more efficient, or whatever the compiler is trying to optimize for.
Your example limits what the compiler can do, basically mandating that the structs exist in memory. First, you're instructing the compiler to create A and B as globals, and second, you are taking the addresses of the structs (and their fields) for your printf statements. Due to either of these, the compiler will choose memory as the placement for these structs.
However, since they are each only two ints in size, a copy between them would take only two mov instructions (some architectures) or two loads and two stores (other architectures).
Yet, if you were working with these structs as local variables and/or parameters, as is commonly done with these kinds of small structs, and provided you did not take their addresses, the compiler would frequently optimize them by placing the entire struct into CPU registers. For example, A.x might get a CPU register, and A.y its own register as well. Then a copy of A, or passing A as a parameter (which is like an assignment), is just a pair of register movs (if even that is required, as the compiler might choose the proper registers in the first place). In other words, unless the program forces the struct into memory, the compiler has the freedom to treat the struct as a pair of rather separate ints. So, by contrast, potentially rather different and more efficient.
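A sketch of that situation, using the struct P from the question: with value parameters and no address-taking, the compiler is free to keep x and y in registers the whole time.
/* Small struct passed and returned by value; the members can live
   entirely in registers if the compiler so chooses. */
struct P add_points(struct P a, struct P b)
{
    a.x += b.x;
    a.y += b.y;
    return a;
}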
The compiler can also do other kinds of optimizations: one involves remembering the constant values that were assigned (and so assigning the constants again to B instead of copying from A's memory), and another involves eliminating A and the assignments to A altogether, assigning directly to B, since A is merely copied into B and not used later. To reiterate from above, having the structs be local variables helps some of these optimizations, as does not taking their addresses.
I am trying to access PMC_PCER0 and enable it for PID14 on an ARM Cortex-M3. I am asked to make a void function that reads a button and "returns" (as my professor insists on calling it) its value. Anyway, here is the problem:
void readButton(unsigned int *button)
{
    do something yo;
}
I have an address for PMC_PCER0; let's suppose for the sake of the question it is 0xfff123da. This PMC_PCER0 has 32 bits, one per peripheral (PIO); the one I need, PID14, happens to sit at bit position 14, so that is the one I need to activate.
In order to activate it, do I need to mask PMC_PCER0 using the OR operator?
I know I can define PMC_PCER0 as follows
#define PMC_PCER0 (0xfff123da)
However, this will just give PMC_PCER0 the value 0xfff123da, whereas what I want is for PMC_PCER0 to actually refer to that address, and later I want to mask it. I'd appreciate it if you explained it in detail.
So how do you load or store something at a specific address in C? You need an array or a pointer, right? How do you assign an address to a pointer?
unsigned int *p;
...
p = (unsigned int *)0x12345678;
And then *p = 10; will write a 10 to that address, right? Or p[0] = 10;. This is elementary C language programming and has nothing to do with microcontrollers or operating systems.
The problem you end up with, though, is the optimizer. If *p or p[0] is not used later, then there is no reason at all to generate code. Even if the compiler does generate code, there is no guarantee that it actually does the store or load:
void myfun ( unsigned int x )
{
    unsigned int *p;

    p = (unsigned int *)0x12345678;
    *p = x;
}
How do you tell the compiler, or rather how do you use the language to ask the compiler, to actually perform the memory operation?
volatile unsigned int *p;
p = (unsigned int *)0x12345678;
So try making your define look something like this:
#define PMC_PCER0 (*((volatile unsigned int *)0xfff123da))
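With that definition, the OR mask the question asks about might look like the following sketch (bit 14 is taken from the question; on parts where PCER0 is a write-only "set" register, the datasheet may call for a plain write of the bit instead of a read-modify-write):
PMC_PCER0 |= (1u << 14);   /* read-modify-write that sets the PID14 bit */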
AND understand that this is still not a guarantee that 1) the compiler will do a bus operation, nor 2) that the bus operation will be the size you desire. If you want to guarantee such things, then instead use
#define PMC_PCER0 (0xfff123da)
and make a small asm file to link into the project, or put this in your bootstrap.
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.thumb_func
.globl GET32
GET32:
ldr r0,[r0]
bx lr
and then use it
void myfun ( unsigned int x )
{
    PUT32(PMC_PCER0, x);
}
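And, as a sketch only (the address is the placeholder from the question, and whether a read-modify-write or a plain write of the bit is correct depends on the datasheet), the PID14 enable could go through the same accessors:
PUT32(PMC_PCER0, GET32(PMC_PCER0) | (1u << 14));   /* set the PID14 bit */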
This has a performance cost, but it has significant benefits as well. First, any remotely compliant compiler must perform the function calls, all of them, and in order, so you are assured you get your load or store and that it is of the right size. Second, all of your accesses to peripherals and other special addresses go through an abstraction layer, so when the code is placed on a chip simulator, on top of an operating system, or against a handmade test bench (emulator) of your own, you already have an abstraction layer at the C function level that is easy to port. And if you want to debug what is going on, you can use this abstraction layer to insert breakpoints or printfs or whatever.
Take it or leave it; it took me years before I caught a compiler generating the wrong instructions with the volatile trick, but it does happen. If you don't learn at least a little assembly language, and don't regularly disassemble the tool-produced code, you will struggle more than necessary when something goes wrong, like the wrong instruction being generated by the compiler or items being placed at the wrong addresses by the linker (the chip doesn't boot, the program hangs, etc.), and then struggle again with how to move the toolchain past the problem.
Yes, I know your desired function is a load, not a store; I assume you can figure it out from here.
This was all elementary C language programming stuff, which is possibly why nobody wanted to jump in and answer. Also, the libraries that come free with the microcontroller you are using, and with all other families and brands, use these kinds of tricks, although some of them are a bit scary, so take them with a grain of salt (or use them as a reference and not necessarily directly, as you may end up owning their issues and maintenance).