C pointer argument vs local copy

Is it more cache-friendly to use a local copy of an argument than to use the pointer argument directly? Will the overhead of copying outweigh the performance gained?
I guess it depends on the size of the copied data.
void foo(struct data *p) {
    /* Do stuff with p ... */
}

// VS

void bar(struct data *p) {
    struct data copy = *p;
    /* Do stuff with copy ... */
    *p = copy;
}
I'm asserting that "Do stuff" pushes a lot of other local variables on the stack.
Edit: Also, the data is supposed to be altered/initialized by the function.
Edit: Is this something the compiler might optimize?

Take a look at your operating system's ABI.
Let's assume for the moment that you're running a 64-bit Linux system. A pointer is an integer-class type, and the first six integer arguments are passed in registers during your function call - no cache involvement so far.
If you go for the copy, you claim extra RAM, and that extra memory traffic competes for cache with the rest of your program's data.
I would go for the first one, as it passes your pointer information directly and does not claim extra RAM.
But if you want to measure your program, go for Valgrind (still assuming you're running Linux).
Code both variants, run valgrind --tool=cachegrind myfile on each, and compare the results. Though I really doubt that there's a difference on modern CPUs...

There are so many variables involved here that it's very difficult to consider them all, but let me try to outline a few important aspects:
Whether you use the pointer or a local copy should not make a difference from the cache memory's point of view: modern cache-based architectures will have the data in the cache by the time you need it (unless your data structure is larger than a cache line - which is very unlikely).
There is a very good chance that the generated code in both cases is 95% the same: when you read or write a field from your data structure, the compiler loads it locally (into a register or onto the stack) anyway. The difference would come from reading/writing all data fields versus just some of them.
Considering a modern parallel CPU architecture, the overhead might not even be there; in some cases, because the compiler is able to group the instructions better, the generated code for the local-copy variant might actually be faster.
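One concrete reason the local-copy variant can come out ahead is aliasing: through the pointer, the compiler may have to assume that other stores can modify *p and reload its fields. A minimal sketch (the field x and both functions are hypothetical, since the question's struct data is not shown):

struct data { int x; };

/* Pointer version: if the compiler cannot prove that out and p
 * never alias, it must reload p->x on every iteration. */
void scale_ptr(struct data *p, int *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = p->x * i;
}

/* Local-copy version: x is provably loop-invariant and can stay
 * in a register for the whole loop. */
void scale_copy(struct data *p, int *out, int n) {
    int x = p->x;
    for (int i = 0; i < n; i++)
        out[i] = x * i;
}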

Related

Serialize a function pointer in C and save it in a file?

I am working on a C file register program that handles arbitrary generic data, so the user needs to supply functions for it to use. These functions are saved as function pointers in the register struct and work nicely. But I need to be able to run these functions again when the program is restarted, ideally without the user needing to supply them again. I serialize the important data about the register structure and write it into a header.
I was wondering how I can save the functions there too. A compiled C function is just raw binary data, right? So there must be a way to store it in a file and load the function pointers from the content of the file, but I am not sure how to do this. Can someone point me in the right direction?
I am assuming it's possible to do this in C, since it allows you to do pretty much anything, but I might be missing something. Can I do this without system calls at all? Or, if not, what would be the simplest way to do it in POSIX?
The functions are supplied when creating the register or creating new secondary indexes:
registerHandler* createAndOpenRecordFile(
    int overwrite, char *filename, int keyPos,
    fn_keyCompare userCompare, fn_serialize userSerialize,
    fn_deserialize userDeserialize, int type, ...)
And saved as function pointers:
typedef void* (*fn_serialize)(void*);
typedef void* (*fn_deserialize)(void*);
typedef int (*fn_keyCompare) (const void *, const void *);
typedef struct {
    ...
    fn_serialize encode;
    fn_deserialize decode;
    fn_keyCompare compare;
} registerHandler;
While your logic makes some sort of sense, things are much, much more complex than that. My answer is going to contain most of the comments already made here, only in answer form...
Let's assume that you have a pointer to a function. If that function has a jump instruction in it, that jump instruction could jump to an absolute address. That means that when you deserialize the function, you have to have a way to force it to be loaded at the same address, so that the absolute jump lands on the correct address.
Which brings us to the next point. Given that your question is tagged posix: there is no POSIX-compliant way to load code at a specific address. There is MAP_FIXED, but it's not going to work unless you write your own dynamic linker. Why does that matter? Because the function's assembly code might reference the function's start address, for various reasons, the most prominent of which is the function passing its own address as an argument to another function.
Which actually brings us to our next point. If the serialized function calls other functions, you'd have to serialize them too. But that's the "easy" part. The hard part is if the function jumps into the middle of another function rather than call the other function, which could happen e.g. as a result of tail-call optimization. That means you have to serialize everything the function jumps into (recursively), but if the function jumps to 0x00000000ff173831, how many bytes will you serialize from that address?
For that matter, how do you know when any function ends in a portable way?
Even worse, are you even guaranteed that the function is contiguous in memory? Sure, all existing, sane OS memory managers and hardware architectures keep it contiguous, but is that guaranteed to still be true one year from now?
Yet another issue: what if the user passes a different function based on something dynamic? E.g., if the environment variable X is set, we want function x(), otherwise we want y()?
We're not even going to think about discussing portability across hardware architectures, operating systems, or even versions of the same hardware architecture.
But we are going to talk about security. Suppose the code the user originally gave you had a bug that they have since fixed in a new version: because you no longer require the user to give you a pointer to their code, you'll continue to use the buggy version until the user remembers to "refresh" your data structures with new code.
And when I say "bug" above, you should read "security vulnerability". If the vulnerable function you're serializing launches a shell, or indeed refers to anything outside the process, it becomes a persistent exploit.
In short, there's no way to do what you want to do in a sane and economic way. What you can do, instead, is to force the user to package these functions for you.
The most obvious way to do it is asking them to pass a filename of a library which you then open with dlopen().
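A minimal sketch of the dlopen() route (the symbol name user_serialize and the helper function are hypothetical; error handling abbreviated):

#include <dlfcn.h>
#include <stdio.h>

typedef void* (*fn_serialize)(void*);

fn_serialize load_serializer(const char *libpath)
{
    /* Load the user's shared library at runtime. */
    void *handle = dlopen(libpath, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    /* Look up an agreed-upon symbol name; the register file then only
     * needs to store the library path and symbol name, not raw code. */
    return (fn_serialize)dlsym(handle, "user_serialize");
}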
Another way to do it is pass something like a Lua or JavaScript string and embed an engine to execute these strings as code.
Yet another way is to pass paths to executables, and execute these when the data needs to be processed. This is what git does.
But what you should probably do is just require that the user always passes these functions. Keep it simple.

RAM Checksum in C language

I need to check the RAM of an MCU at startup using a checkerboard-like algorithm. I do not want to lose whatever data is already in RAM, and I also do not know how to keep the test from affecting the variables I'm using to perform it.
I was thinking something like:
unsigned short *Temporal = (unsigned short *)RAM_START; /* RAM_START: hypothetical base address of the region under test */
int Error = FALSE;
int position;

for (position = 0; position < 4096; position++)
{
    *Temporal = 0x5555;
    if (*Temporal != 0x5555) Error = TRUE;
    *Temporal = 0xAAAA;
    if (*Temporal != 0xAAAA) Error = TRUE;
    Temporal += 1;
}
Should I modify the linker script so I know where Temporal and Error are being placed?
Use the modifier "register" when declaring your pointers (as in "register int *"). Registers are storage inside the processor core that (usually) does not count as part of RAM, so any changes to them don't count as RAM reads/writes. Some exceptions exist; for instance, in AVR microcontrollers, the first 32 bytes of RAM are "faked" into registers.
Your question probably got a downvote for its lack of clarity, and because the question about retaining whatever was in RAM is easily solved by most beginner C programmers (just copy the contents at the pointer into a temp variable before testing, and copy them back after the test ends). Also, you're not computing a checksum: you're performing a memory test. These are very different, separate things.
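A minimal sketch of that save-test-restore idea for a single word (test_word is a hypothetical helper; assumes 16-bit words):

int test_word(register volatile unsigned short *p)
{
    register unsigned short saved = *p; /* keep the old contents in a register */
    int ok = 1;

    *p = 0x5555;
    if (*p != 0x5555) ok = 0;
    *p = 0xAAAA;
    if (*p != 0xAAAA) ok = 0;

    *p = saved;                         /* restore the original value */
    return ok;
}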
You need to ensure the memory that you are testing is in a different location than the memory containing the program you are running. You may be able to do this by running directly out of flash, or whatever other permanent storage you have connected. If not, you need to do something with the link map to ensure proper memory segmentation.
Inside the test function, using register as MVittiS suggests is a good idea. Another alternative is to use a global variable mapped to a different segment than the one under test.
You may want to read this article about memory testing to understand the limitations of the test you propose, how memory can fail, and what you should be testing for.
I don't know (not enough details) but your memory test may be a cache test and might not test any RAM at all.
Your memory test (if it does test memory) is also very badly designed. For example, you could cut (or short out) all of the address lines and all data lines except the two least significant data lines, and the test would still pass even though there'd be many extremely serious faults.
My advice would be to read something like this web page about memory testing, just to get some ideas and background information: http://www.esacademy.com/en/library/technical-articles-and-documents/miscellaneous/software-based-memory-testing.html
In general, for non-destructive tests you'd copy whatever you want to keep somewhere else, then do the test, then copy the data back. It's very important that "somewhere else" is tested first (you don't want to copy everything into faulty RAM and then copy it back).
I would also recommend using assembly (instead of C) to ensure no unwanted memory accesses are made, including to your stack.
If your code is in RAM that needs to be tested then you will probably need 2 copies of your RAM testing code. You'd use the second copy of your code when you're testing the RAM that contains the first copy of your code. If your code is in ROM then it's much easier (but you still need to worry about your stack).

Is there a performance benefit to copying CUDA C __constant__ variables to local memory

For example, if you have a simple constant variable __device__ __constant__ int MY_CONSTANT; that is accessed by the same kernel thread multiple times:
__global__ void move(int* dataA, int* dataB, int* dataC){
    ...
    dataB[threadID] = dataA[threadID] * MY_CONSTANT;
    dataC[threadID] = dataA[threadID] * dataB[threadID] % MY_CONSTANT;
    ...
}
I can see that it would be beneficial to store the value of dataA[threadID] and dataA[threadID] * MY_CONSTANT in local variables/registers to avoid unnecessary global reads. Ignoring that, would it be beneficial to place the value of MY_CONSTANT in a local variable to avoid it being read twice, or would this be handled by the compiler, given that it cannot change, unlike the other global data?
The only way to know, for sure, is to code it up each way and do the following:
Generate and examine the PTX code to determine what the compiler emits for each implementation.
Instrument and profile each implementation and test performance using representative inputs.
What you're asking about is the sort of micro-optimization that warranted the phrase "premature optimization". My recommendation is that you forget about such kinds of optimization unless you find that performance is insufficient for your needs.
That being said, constant memory is stored in global memory space, but when a multiprocessor accesses it, it is cached. Subsequent accesses to that area of memory have much lower latency. When all threads in a warp access the same word of constant memory, there are no conflicts and everything happens rather quickly anyway. (Note that hardware differs a lot, and what I'm saying may not be technically accurate for your device, but the takeaway is this: follow the recommended access patterns for each memory type, and you won't need to worry about the kind of thing you're asking about here.)
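For reference, the hand-cached variant being asked about would look something like this (a sketch; the threadID computation is an assumption, and whether it actually beats the compiler is exactly what the PTX inspection and profiling above would tell you):

__global__ void move(int* dataA, int* dataB, int* dataC){
    int threadID = blockIdx.x * blockDim.x + threadIdx.x; /* assumed indexing */
    int c = MY_CONSTANT;     /* read the constant once */
    int a = dataA[threadID]; /* read global memory once */
    int b = a * c;
    dataB[threadID] = b;
    dataC[threadID] = a * b % c;
}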

Using Structs in Functions

I have a function, and I'm accessing a struct's members many times in it.
What I was wondering is: what is the good practice for going about this?
For example:
struct s
{
    int x;
    int y;
};
and I have allocated memory for 10 objects of that struct using malloc.
So, whenever I need to use only one of the objects in a function, I usually create a pointer (or receive one as an argument) and point it to the required object. (My superior told me to avoid array indexing because it adds a calculation when accessing any member of the struct.)
But is this the right way? I understand that dereferencing is not as expensive as creating a copy, but what if I'm dereferencing many times (like 20 to 30) in the function?
Would it be better if I created temporary variables for the struct members (only the ones I need; I certainly don't use all of them), copied over the values, and then set the actual struct's values before returning?
Also, is this unnecessary micro-optimization? Please note that this is for embedded devices.
This is for an embedded system, so I can't make any assumptions about what the compiler will do. I can't make any assumptions about word size, the number of registers, or the cost of accessing off the stack, because you didn't tell me what the architecture is. I used to do embedded code on 8080s when they were new...
OK, so what to do?
Pick a real section of code and code it up. Code it up each of the different ways you have listed above. Compile it. Find the compiler option that forces it to print out the assembly code that is produced. Compile each piece of code with every different set of optimization options. Grab the reference manual for the processor and count the cycles used by each case.
Now you will have real data on which to base a decision. Real data is much better than the opinions of a million highly experienced expert programmers. Sit down with your lead programmer and show him the code and the data. He may well show you better ways to code it. If so, recode it his way, compile it, and count the cycles used by his code. Show him how his way worked out.
At the very worst you will have spent a weekend learning something very important about the way your compiler works. You will have examined N ways to code things times M different sets of optimization options. You will have learned a lot about the instruction set of the machine. You will have learned how good, or bad, the compiler is. You will have had a chance to get to know your lead programmer better. And, you will have real data.
Real data is the kind of data that you must have to answer this question. Without that data, anything anyone tells you is nothing but an ego-based guess. Data answers the question.
Bob Pendleton
First of all, indexing an array is not very expensive (only about one operation more expensive than a pointer dereference, and sometimes no more expensive at all, depending on the situation).
Secondly, most compilers will perform what is called RVO, or return value optimisation, when returning structs by value. This is where the caller allocates space for the return value of the function it calls and secretly passes the address of that memory to the function for it to use; the effect is that no copies are made. It does this automatically, so
struct mystruct blah = func();
only constructs one object; the address is passed to func for it to use, transparently to the programmer, and no copying need be done.
What I do not know is if you assign an array index the return value of the function, like this:
someArray[0] = func();
will the compiler pass the address of someArray[0] and do RVO that way, or will it just not do that optimisation? You'll have to get a more experienced programmer to answer that. I would guess that the compiler is smart enough to do it though, but it's just a guess.
And yes, I would call it micro optimisation. But we're C programmers. And that's how we roll.
Generally, the case in which you want to make a copy of a passed struct in C is when you want to manipulate the data locally: that is, have your changes reflected only in the return value rather than in the struct itself. As for which is more expensive, it depends on a lot of things, many of which change from implementation to implementation, so I would need more specific information to be more helpful. Though I would expect that in an embedded environment your memory is at a greater premium than your processing power. Really, this reads like needless micro-optimization; your compiler should handle it.
In this case, creating temp variables on the stack will be faster. But if your structure is much bigger, you might be better off with dereferencing.
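For illustration, here is a sketch of the temp-variable pattern being discussed, using the struct s from the question (update is a hypothetical function):

void update(struct s *p)
{
    /* Copy the members we need into locals... */
    int x = p->x;
    int y = p->y;

    /* ...do the 20-30 accesses on the locals, which can live in registers... */
    x += y;
    y = x * 2;

    /* ...and write the results back once before returning. */
    p->x = x;
    p->y = y;
}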

Why do compilers create one variable "twice"?

I know this is a "heavier" question, but I think it's interesting too. It was part of my previous questions about compiler functions, but back then I explained it very badly, and many answered just my first question, so here it is:
So, if my knowledge is correct, modern Windows systems use paging as a way to switch tasks and to make sure that each task has an appropriate place in memory. So every process gets its own address space starting from 0.
When multitasking takes effect, the kernel has to save all important registers to the task's stack, I believe, then save the current stack pointer, change the page table entries to switch to another process's physical address space, load the new process's stack pointer, pop the saved registers, and continue by jumping to the popped instruction pointer address.
Because of this nice feature (paging), every process thinks it has a nice flat memory within reach. So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
But when there is no more segmentation for the process, why do compilers still create variables on the stack (or, when global, in another memory area) rather than directly in the program code?
Let me give an example. I have this C code:
int a = 10;
which gets translated into (Intel syntax):
mov [position of a], 10
But then you actually occupy more bytes in RAM than needed, because the first few bytes are taken by the actual instruction itself, and on top of that there is another byte containing the value 10.
Why, instead of this, when there is no need to switch any segment (which would slow the process down), isn't the value 10 just coded directly into the program, like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
Because the compiler knows the exact position of every instruction, when operating with that variable it would just use its address.
I know that const variables do this, but they are not really variables when you cannot change them.
I hope I explained my question well, but I am still learning English, so forgive my syntactic and even semantic errors.
EDIT:
I have read your answers, and it seems that based on those I can modify my question:
Someone here said that a global variable is actually a value attached directly to the program. I mean: when a variable is global, is it attached to the end of the program, or is it created at execution time like a local one, only placed directly on the heap instead of the stack?
If it's the first case - attached to the program itself - why do local variables even exist? I know you will tell me it's because of recursion, but that is not the case: when you call a function, you can push any memory space onto the stack, so there is no program code there.
I hope you understand me: there is always inefficient use of memory when some value (even 0) is created on the stack by an instruction, because you need space in the program for that instruction and then for the actual variable. Like so:
push 5 //instruction that says to create a local variable with the integer 5
And then this instruction just makes the number 5 appear on the stack. Please help me, I really want to know why it's this way. Thanks.
Consider:
local variables may have more than one simultaneous existence if a routine is called recursively (even indirectly in, say, a recursive descent parser) or from more than one thread, and these cases occur in the same memory context
marking the program memory non-writable and the stack+heap non-executable is a small but useful defense against certain classes of attacks (stack smashing...) and is used by some OSs (I don't know if Windows does this, however)
Your proposal doesn't allow for either of these cases.
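To make the first point concrete, here is a minimal sketch: every active invocation of a recursive function needs its own copy of its locals, which a single fixed slot inside the program image could not provide.

/* Each active call keeps its own 'n' on the stack; one fixed
 * location baked into the code could not hold all of them at once. */
unsigned fact(unsigned n)
{
    if (n <= 1)
        return 1;
    return n * fact(n - 1); /* the inner call must not clobber the outer n */
}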
So there are no far jumps, far pointers, memory segments or data segments. All is nice and linear.
Yes and no. Different program segments have different purposes, despite the fact that they all reside within flat virtual memory. E.g., the data segment is readable and writable, but you can't execute data; the code segment is readable and executable, but you can't write into it.
why do compilers still create variables on the stack, [...] rather than directly in the program code?
Simple:
The code segment isn't writable. For safety reasons, first. Second, most CPUs do not like having the code segment written into, as it breaks many existing optimizations used to accelerate execution.
The state of a function has to be private to each invocation, due to things like recursion and multi-threading.
isn't the value 10 just coded directly into the program like this
Modern CPUs prefetch instructions to allow things like parallel execution and out-of-order execution. Putting garbage (and to the CPU, data sitting in the instruction stream is garbage) into the code segment would simply diminish, or outright cancel, the effect of those techniques. And they are responsible for the lion's share of the performance gains CPUs have shown in the past decade.
when there is no need to switch any segment
So if there is no overhead in switching segments, why then put the value into the code segment? There is no problem keeping it in the data segment.
Especially in the case of a read-only data segment, it makes sense to put all the read-only data of the program into one place, since it can then be shared by all instances of the running application, saving physical RAM.
Because the compiler knows the exact position of every instruction, when operating with that variable it would just use its address.
No, not really. Most code is relocatable or position-independent. The code is patched with real memory addresses when the OS loads it into memory. Actually, special techniques are used to avoid patching the code at all, so that the code segment too can be shared by all running application instances.
The ABI is responsible for defining how and what the compiler and linker are supposed to do for a program to be executable by the complying OS. I haven't seen the Windows ABI, but the ABIs used by Linux are easy to find: search for "AMD64 ABI". Even reading the Linux ABI might answer some of your questions.
What you are talking about is optimization, and that is the compiler's business. If nothing ever changes that value, and the compiler can figure that out, then the compiler is perfectly free to do just what you say (unless a is declared volatile).
Now if you are saying that you are seeing that the compiler isn't doing that, and you think it should, you'd have to talk to your compiler writer. If you are using VisualStudio, their address is One Microsoft Way, Redmond WA. Good luck knocking on doors there. :-)
Why isn't the value 10 just coded directly into the program like this:
xor eax,eax //just some instruction
10 //the value inserted into the program
call end //just some instruction
That is how global variables are stored. However, instead of being stuck in the middle of executable code (which is messy, and not even possible nowadays), they are stored just after the program code in memory (in Windows and Linux, at least), in what's called the .data section.
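For instance, a sketch of where each kind of variable ends up (hypothetical names; sections as described above):

int global_a = 10;    /* lives in the .data section, initialized from the program image */

void f(void)
{
    int local_a = 10; /* created on the stack each time f() is called */
}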
When it can, the compiler will move variables to the .data section to optimize performance. However, there are several reasons it might not:
Some variables cannot be made global, including instance variables for a class, parameters passed into a function (obviously), and variables used in recursive functions.
The variable still exists in memory somewhere, and there still must be code to access it. Thus, memory usage will not change. In fact, on the x86 ("Intel"), according to this page, the instruction to reference a local variable:
mov eax, [esp+8]
and the instruction to reference a global variable:
mov eax, [0xb3a7135]
both take 1 (one!) clock cycle.
The only advantage, then, is that if every local variable is global, you wouldn't have to make room on the stack for local variables.
Adding a variable to the .data segment may actually increase the size of the executable, since the variable is actually contained in the file itself.
As caf mentions in the comments, stack-based variables only exist while the function is running - global variables take up memory during the entire execution of the program.
Not quite sure what your confusion is?
int a = 10; means: make a spot in memory, and put the value 10 at that memory address.
If you want a to be 10:
#define a 10
though more typically:
#define TEN 10
Variables have storage space and can be modified. It makes no sense to stick them in the code segment, where they cannot be modified.
If you have code with int a=10, or even const int a=10, the compiler cannot convert code which references 'a' to use the constant 10 directly, because it has no way of knowing whether 'a' may be changed behind its back (even const variables can be changed). For example, one way 'a' can be changed without the compiler knowing is if you have a pointer which points to 'a'. Pointer values are not fixed until runtime, so the compiler cannot determine at compile time whether there will be a pointer which will point to and modify 'a'.
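A small self-contained illustration of that aliasing problem (a sketch):

#include <stdio.h>

int main(void)
{
    int a = 10;
    int *p = &a;       /* 'a' now has an alias */
    *p = 20;           /* modifies 'a' without mentioning its name */
    printf("%d\n", a); /* prints 20; folding 'a' to 10 would have been wrong */
    return 0;
}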
