Dear all. I was wondering if there are examples of situations where you would purposefully pass an argument by value in C. Let me rephrase. When do you purposefully use C's pass-by-value for large objects? Or, when do you care that the object argument is fully copied in a local variable?
EDIT: Now that I think about it, if you can avoid pointers, then do. Nowadays, "deep" copying is possible for mostly everything in small apps, and shallow copying is more prone to pointer bugs. Maybe.
In C (sans const references), you pass by value for 3 reasons.
You don't want the source to be modified by the receiving function outside of its context. This is (was) the standard reason taught in school as to why to pass by value.
Passing by value is cheaper if the value fits within the architecture's register - or possibly registers if the compiler is very intelligent. Passing by value means no pointer creation and no dereference to get at the value being passed in. A small gain, but it does add up in certain circumstances.
Passing by value takes less typing. A weak reason to be sure, but there it is.
The const keyword negates most of reason 1, but reason 2 still has merit and is the main reason I pass by value.
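Reasons 1 and 2 can be sketched with a small struct. This is only an illustration: `Point`, `manhattan_by_value` and `manhattan_by_pointer` are made-up names, and whether the by-value version really travels in registers depends on the ABI and the compiler.

```c
/* Hypothetical small struct; on many ABIs two ints can travel in registers. */
typedef struct {
    int x;
    int y;
} Point;

/* Receives a copy: changes made here never reach the caller's Point. */
int manhattan_by_value(Point p)
{
    if (p.x < 0) p.x = -p.x;   /* modifies only the local copy */
    if (p.y < 0) p.y = -p.y;
    return p.x + p.y;
}

/* Receives an address: const documents that the pointee is only read. */
int manhattan_by_pointer(const Point *p)
{
    int ax = p->x < 0 ? -p->x : p->x;
    int ay = p->y < 0 ? -p->y : p->y;
    return ax + ay;
}
```

After `Point p = { -3, 4 };`, calling `manhattan_by_value(p)` leaves `p.x` at -3 even though the function negated its own copy.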
Well, for one thing, if you want to change it.
Imagine the following contrived function:
int getNextCharAndCount (char *pCh);
Each time you call it, it returns the next most frequent character from a list by returning the count from the function and setting a character by way of the character pointer.
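A minimal sketch of how such a function might look, assuming the frequency list is a fixed table already sorted by descending count. The table contents, `freq_table` and `next_index` are invented for illustration only.

```c
#include <stddef.h>

/* Hypothetical data: characters already sorted by descending frequency. */
static const struct { char ch; int count; } freq_table[] = {
    { 'e', 12 }, { 't', 9 }, { 'a', 8 }
};
static size_t next_index = 0;

/* Returns the count and stores the character through pCh.
   Returns 0 and leaves *pCh untouched once the list is exhausted. */
int getNextCharAndCount(char *pCh)
{
    if (next_index >= sizeof freq_table / sizeof freq_table[0])
        return 0;
    *pCh = freq_table[next_index].ch;
    return freq_table[next_index++].count;
}
```

The pointer parameter is what lets one call hand back two results: the count as the return value and the character through `*pCh`.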
I'm having a hard time finding another use case which would require the pointer if you only ever wanted to use (but not change) the underlying character. That doesn't mean one doesn't exist of course :-)
In addition, I'm not sure what you're discussing is deep/shallow copy. That tends to apply to structures with pointers where a shallow copy just duplicates the top level while a deep copy makes copies of all levels.
What you're referring to is pass-by-value and pass-by-reference.
Passing by reference is cheaper because you don't have to create a local copy of an object. If the function needs a local copy anyway (for any purpose), that could be a case for passing by value.
I follow as a rule:
pass built-in types by value (int, char, double, float...)
pass classes and structs by (const) reference. There is no pointer handling involved whatsoever.
Never had any problems with this way of work.
If we're going to be pedantic about this, everything in C is pass-by-value. You may pass a pointer by value instead of passing the actual object by value, but it's still pass-by-value.
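The point that a pointer argument is itself a copy can be shown with two tiny functions. `retarget` and `set_through` are hypothetical names used only for this sketch.

```c
/* The pointer parameter is itself a copy of the caller's pointer. */
void retarget(int *p, int *other)
{
    p = other;      /* changes only the local copy of the pointer */
    (void)p;        /* silence "set but not used" warnings */
}

/* Writing through the copied pointer does reach the caller's object. */
void set_through(int *p, int value)
{
    *p = value;
}
```

With `int a = 1, b = 2; int *pa = &a;`, calling `retarget(pa, &b)` leaves `pa` pointing at `a`, while `set_through(pa, 5)` changes `a` to 5.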
Anyway, why pass an entire object instead of a pointer to an object? Well, for one, your compiler may be able to optimize the call such that underneath the covers only an address is copied. Alternatively, once you introduce pointers, your compiler may not be able to do as much optimization of your function because of aliasing. It's also less error prone to not have to remember to dereference. The caller can also be sure that what he passed in is not modified (const doesn't really guarantee this, since it can be dangerously cast away).
I don't think your argument about chars holds water. Even though your char is conceptually 1 byte, each argument to a function call typically translates to a whole (word-sized) register and to the same amount of space on the stack for efficiency.
You can pass a whole struct on the stack as an argument if you really want to (and, I believe, return them as well). It's a way of avoiding both allocating memory and having to worry about pointer hygiene.
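Passing and returning a struct by value is indeed legal C. A minimal sketch, with `Complex` and `complex_add` as made-up names:

```c
/* Hypothetical two-field struct used only for illustration. */
typedef struct {
    double re;
    double im;
} Complex;

/* Both arguments and the result are copied by value: no allocation,
   no pointers for the caller to keep track of. */
Complex complex_add(Complex a, Complex b)
{
    Complex sum = { a.re + b.re, a.im + b.im };
    return sum;
}
```

How the copies are actually made (registers, stack, or a hidden pointer supplied by the caller) is up to the platform's calling convention.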
Depending on how the call stack is built the char and char* may take the same amount of space. It is generally better to have values aligned on word boundaries. The cost of accessing a 32 bit pointer on a word boundary may be significantly lower than accessing it on a non-word boundary.
Passing by value is safer if you don't want the value modified. Passing by reference can be dangerous. Consider passing a constant by reference:
const int ONE = 1;
increment((int *)&ONE); /* the cast discards const; modifying ONE is undefined behavior */
printf("%d\n", ONE);
Output may be 2 if the constant was actually modified.
Although the subject has been discussed many times, I haven't found a satisfying answer so far. When should a function return data as its return value, and when should it take a pointer and change the data at that address? The classic answer is to pass a variable by reference to a function when it becomes large (to avoid stack copying). This looks true for anything like a structure or array. However, returning a pointer from a function is not uncommon. In fact, some functions from the C library do exactly this. For example:
char *strcat(char *dst, const char *src);
Always returns a pointer to destination even in case of an error. In this case we can just use the passed variable and leave the return for what it is (as most do).
When looking at structures I see the same thing happening. I often return pointers when functions only need to be used in variable initialization.
char *p = func(i, s);
Then there is the argument that copying variables on the stack is expensive, so pointers should be used instead. But as mentioned here some compilers are able to decide this themselves (assuming this goes for C as well). Is there a general rule, or at least some unwritten convention when to use one or the other? I value performance above design.
Start by deciding which approach makes the most sense at the logical level, irrespective of what you think the performance implications might be. If returning a struct by value most clearly conveys the intent of the code, then do that.
This isn't the 1980s anymore. Compilers have gotten a lot smarter since then and do a really good job of optimizing code, especially code that's written in a clear, straightforward manner. Similarly, parameter passing and value return conventions have become fairly sophisticated as well. The simplistic stack-based model doesn't really reflect the reality of modern hardware.
If the resulting application doesn't meet your performance criteria, then run it through a profiler to find the bottlenecks. If it turns out that returning that struct by value is causing a problem, then you can experiment with passing by reference to the function.
Unless you're working in a highly constrained, embedded environment, you really don't have to count every byte and CPU cycle. You don't want to be needlessly wasteful, but by that same token you don't want to obsess over how things work at the low level unless a) you have really strict performance requirements and b) you are intimately familiar with the details of your particular platform (meaning that you not only know your platform's function calling conventions inside and out, you know how your compiler uses those conventions as well). Otherwise, you're just guessing. Let the compiler do the hard work for you. That's what it's there for.
Rules of thumb:
If sizeof(return type) is bigger than sizeof(int), you should probably pass it by pointer to avoid the copy overhead. This is a performance issue. There's some penalty for dereferencing the pointer, so there are some exceptions to this rule.
If the return type is complex (containing pointer members), pass it by pointer. Copying the local return value to the stack will not copy dynamic memory, for example.
If you want the function to allocate the memory, it should return a pointer to the newly allocated memory. It's called the factory design pattern.
If you have more than one thing you want to return from a function - return one by value, and pass the rest by pointers.
If you have a complex/big data type which is both input and output, pass it by pointer.
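The "return one by value, pass the rest by pointers" rule might look like this. `divide` is a hypothetical helper, and treating a zero divisor as the only failure case is an assumption of the sketch.

```c
#include <stdbool.h>

/* Returns success by value; quotient and remainder go out through pointers. */
bool divide(int num, int den, int *quot, int *rem)
{
    if (den == 0)
        return false;   /* outputs are left untouched on failure */
    *quot = num / den;
    *rem = num % den;
    return true;
}
```

The caller checks the return value first and reads `*quot` and `*rem` only when it is true.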
I am fairly new to programming languages and wonder if it is possible to pass an argument without a specific type to a function. For instance, I have the following piece of code that defines a function add that will take a block of memory, check whether it is filled via another function, and then add an element to the list related to that block of memory.
This element can be an int, a float or a char. So I would like to write:
add(arrs1,20); //or also
add(arrs2,'b'); //or also
add(arrs3, 4.5);
Where arrs# are defined by struct arrs arrs#, and they refer to arrays of either floats, ints or chars but not mixed. How could I accomplish this?
int add(arrs list, NEW_ELEMENT){//adds NEW_ELEMENT at the end of an arrs
int check_if_resize;
check_if_resize=resize(list, list->size + 1);
list->ptr[list->used++] = NEW_ELEMENT;
return check_if_resize;
}
I appreciate your help.
C does, by design, not allow a single function to accept more than a single type for each argument. There are various ways to do something equivalent in C, though:
First and foremost, you can just write multiple different functions that do the same, but on different types. For instance, instead of add you could have three functions named add_int, add_char and add_float. I would recommend doing this in most cases, as it is by far the easiest and least error-prone.
Secondly, you might have noticed how printf can print both strings and numbers? So-called variadic functions, like printf, can take different types of arguments, but in the case of printf, you must still specify the type of arguments you want in the format string.
Finally, you can use void pointers if you need to work with the memory an object occupies, regardless of its type. Functions like memcpy and memset do this, for instance to copy the contents of one object directly to another. This is a bit harder to manage properly, as it is easy to make mistakes and end up corrupting memory, but it's still doable (and sometimes even the best option).
But if you're a beginner in C, as you state you are, the first option is probably the easiest, especially when dealing with only a few different data types (in this case, three).
You can pass pretty much anything as a void * to a function that adds this content to a linked list by using memcpy (or memmove, which is also safe when the source and destination overlap).
As long as you have a pointer to the next node of your list, you don't have to worry about the type of the stored data.
Just be sure not to dereference a void *, but rather to cast it and use the cast pointer (as a char * if you want to work on this data byte by byte, for instance).
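A rough sketch of the void * approach applied to the question's growable array. The layout of `arrs` below is an assumption (the original definition is not shown), and the doubling growth policy stands in for the asker's `resize` function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: one element size per array, no mixing of types. */
typedef struct {
    void  *ptr;        /* storage */
    size_t elem_size;  /* size of one element in bytes */
    size_t used;       /* elements stored so far */
    size_t size;       /* capacity in elements */
} arrs;

/* Appends a copy of *elem; returns 0 on success, -1 on allocation failure. */
int add(arrs *list, const void *elem)
{
    if (list->used == list->size) {
        size_t new_size = list->size ? list->size * 2 : 4;
        void *grown = realloc(list->ptr, new_size * list->elem_size);
        if (grown == NULL)
            return -1;
        list->ptr = grown;
        list->size = new_size;
    }
    /* Cast to char * for byte arithmetic; a void * cannot be dereferenced. */
    memcpy((char *)list->ptr + list->used * list->elem_size,
           elem, list->elem_size);
    list->used++;
    return 0;
}
```

The caller passes the address of a variable of the right type, for example `int v = 20; add(&ints, &v);`, because a literal such as `20` has no address. Nothing checks that the element type matches `elem_size`, which is the main risk of this design.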
I am confused because I haven't written C in a while. In C++, we would pass them as references, in order not to copy the whole struct. Does this apply to C too? Should we pass them as pointers, even if we don't want to modify them, in order to avoid copying?
In other words, for a function that checks if two structs are equal, we better do
int equal(MyRecord* a, MyRecord* b);
and decrease a bit the readability (because of pointers)
or
int equal(MyRecord a, MyRecord b);
will have the same performance?
Often, passing pointers is faster - and you'll call equal(&r1, &r2) where r1 and r2 are local struct variables. You might declare the formals to be const pointers to a const structure (this could help the optimizing compiler to generate more efficient code). You might also use the restrict keyword (if you are sure you'll never call your equal with two identical pointers, e.g. equal(&r1,&r1), i.e. without pointer aliasing).
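A sketch of the const-pointer version, assuming a made-up layout for `MyRecord` (the real fields are not given in the question). `restrict` is left out here so that `equal(&r1, &r1)` stays unquestionably valid.

```c
#include <string.h>

/* Hypothetical fields; the question does not show the real MyRecord. */
typedef struct {
    int  id;
    char name[16];
} MyRecord;

/* Pointers to const: the function promises not to modify either record. */
int equal(const MyRecord *a, const MyRecord *b)
{
    return a->id == b->id && strcmp(a->name, b->name) == 0;
}
```

Comparing field by field, rather than with memcmp over the whole struct, avoids depending on padding bytes and on whatever follows the terminator in `name`.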
However, some ABIs and calling conventions mandate special handling for certain structures. For example, the x86-64 ABI for Linux (and Unix SVR4) says that a struct with two pointers or integral values will be returned through two registers. This is usually faster than modifying a memory zone with its pointer in a register. YMMV.
So to know what is faster, you really should benchmark. However, passing a large-enough struct (e.g. with at least 4 integral or pointer fields) by value is almost always slower than passing a pointer to it.
BTW, what really matters on current desktop and laptop processors is the CPU cache. Keeping frequently used data inside L1 or L2 cache will increase performance. See also this.
What is faster depends massively on the size of the struct and its use inside the called function.
If your struct is not larger than a pointer, passing by value is the best choice (less or equal amount of data needs to be copied).
If your struct is larger than a pointer, it heavily depends on the kind of access taking place inside the called function (and apparently also on ABI specifics). If many random accesses are made to the struct, it may be faster to pass by value, even though it’s larger than a pointer, because of the pointer indirection taking place inside the function.
All in all, you have to profile to figure out what’s faster, if your struct is larger than a pointer.
Passing pointers is faster, for the reasons you say yourself.
Actually, I find C more readable than C++ in this case: by passing a pointer in the call, you acknowledge that your parameters might get changed by the called function. With C++ references, you can't immediately tell that from the call alone; you also have to check the called function's prototype to see if it uses references.
I am learning C and get confused about something I read online.
At http://www.cs.bu.edu/teaching/c/stack/array/
I could read:
Let's look at the functions that determine emptiness and fullness.
Now, it's not necessary to pass a stack by reference to these
functions, since they do not change the stack. So, we could prototype
them as:
int StackIsEmpty(stackT stack);
int StackIsFull(stackT stack);
However, then some of the stack functions would take pointers (e.g.,
we need them for StackInit(), etc.) and some would not. It is more
consistent to just pass stacks by reference (with a pointer) all the
time
(I am not showing the code for what a stackT is, it is just a dynamic array)
From my (maybe limited) understanding, the disadvantage of passing by value is that the data is duplicated in the stack memory of the function. Since a stackT might be big, passing by value rather than pointer would be time consuming.
Do I have this right, or am I still unclear on the basics?
Correct, if you pass something "large" by value that item is copied onto the stack.
Passing a pointer to the data avoids the copy.
It is doubtful that the performance difference will be meaningful in most real-world applications, unless "large" is actually "huge" (which in turn may overflow the stack).
You are correct. Passing by value causes the program to copy, in the entirety, all of the data in that parameter. If it's only one or two ints, no problem, but copying multiple kilobytes is very expensive. Passing by reference only copies the pointer.
However, you must watch out for changing the data pointed at by the pointer and then expecting to return to it unchanged. C++ has passing by "const reference", which is like a guarantee that the data will not be changed, but C does not.
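C has no references, but a pointer to const expresses a similar promise: the compiler rejects writes through it unless the const is cast away. A sketch for the two query functions, assuming a layout for `stackT` (the tutorial's real definition is not shown here):

```c
/* Hypothetical layout; the tutorial's actual stackT is not reproduced. */
typedef struct {
    int *contents;
    int  maxSize;
    int  top;       /* index of the top element, -1 when empty */
} stackT;

/* Pointer to const: only the address is copied, and writes through
   stackP are rejected by the compiler. */
int StackIsEmpty(const stackT *stackP)
{
    return stackP->top < 0;
}

int StackIsFull(const stackT *stackP)
{
    return stackP->top >= stackP->maxSize - 1;
}
```

This keeps the interface consistent with functions like StackInit() that need a pointer, while still signalling that these two only read the stack.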
Let's say I have a loop which repeats millions of times. Inside of this loop I have a function call.
Inside of this function I need to operate on some temporary variable created at the very beginning. Now, which one is better:
a) Create temporary variable at the beginning of the code, initialize it at the beginning of the loop, and pass it as function parameter
b) Create just local temporary variable at the beginning of the called function?
Is this answerable question? I'd like to know which point is considered better practice, or which one is faster.
Let's throw up some possible definitions for some_function(), the function you will be calling from your loop.
// Method 1
void some_function() {
    int temporary;
    // Use temporary
}
// Method 2
void some_function(int temporary) {
    // Use temporary
}
// Method 3
void some_function(int *temporary) {
    // Use *temporary
}
Method 1 is likely to be the most readable out of these options, and so it's the one I would prefer unless you have a really good reason to do something else. It is also likely to be faster than either of the others, unless your compiler is inlining the function call. If it is, then all three are likely to perform exactly the same (method 3 might still be slower if the compiler isn't able to optimize away the pointer dereferences).
If the compiler is not inlining, then method 2 is likely to be slower than method 1. This is because, in terms of stack-allocated memory, they are the same -- function arguments are going to be stored on the stack the same way locals are. The only difference between a function argument and a local in this context is that the function argument can be given a value by the caller. This step of transferring the value from the caller to the function is (theoretically) going to slow down the invocation.
Method 3 is almost certainly going to be slower, since accesses to the temporary memory will include a level of indirection. Dereferencing a pointer is not a cheap operation compared to accessing a local.
Of course, if performance is absolutely critical then you should benchmark these approaches. I suspect that method 1 will turn out to be the fastest (or at least no slower than the others) and additionally seems more readable to me.
If the variable is not needed outside the function, then it should be inside the function. This allows the compiler to do the best job of optimising the code, as well as making the code most readable and easy to use (this applies generally, "declare variables with the smallest possible scope", although for small functions, declaring a handful of variables at the top of the function each time is the best option).
From a performance perspective, passing a variable to a function is either equivalent, or worse than having a local variable. [And of course, the compiler may inline everything and you end up with exactly the same code in both cases, but that's dependent on the compiler and the code you have].
As others have mentioned, passing a pointer to a local variable will incur a "penalty" for accessing the pointer to get the value. It may not make a huge difference, but it almost certainly makes some difference. This should definitely be the last resort. [Note that if the variable is LARGE, the overhead of passing a copy to the function may still be worse than the overhead of a pointer. But if we assume it's a simple type like int or float, then a pointer has noticeable overhead].
Any time there is a question on performance, you DEFINITELY should benchmark YOUR code. Asking someone else on the internet may be worthwhile if there is a choice between algorithms for sorting or something like that, but if it's a case of "is it better to do this or that" in some more subtle differences, then the differences are often small and what your particular compiler does will have much more influence than "which is theoretically better".
There is a subtle difference between these two approaches if you are passing the variable as a pointer, rather than a value. The pointer will get pushed onto the call stack and will have to be referenced in order to get/set the value.
Conversely, setting it as a local value, or pass by value, will put the value on the stack. It matters not whether it is a local or pass by value in that case... though there is one possible caveat based on how the variable is handled outside of the function in the case of pass by value... if it is stored in a variable (not passing a literal value) then it has to get fetched from memory and pushed on the stack. If it is set from a literal value inside the function, it is just a literal pushed on the stack and saved a memory cycle.
A third option you omit is the use of a global variable.
On the off chance that the value is always constant, the best answer is to use a #define and compile it directly into the code as a literal.