I am dusting off my C skills working on some C libraries of mine. After having put together a first working implementation I am now going over the code to make it more efficient. Currently I am on the topic of passing function parameters by reference or value.
My question is, why would I ever pass any function parameter by value in C? The code might look cleaner, but wouldn't it always be less efficient than passing by reference?
Because it's not as important to code for the computer as it is to code for the next human being. If you are passing references around, then any reader must assume that any called function could change the values of their arguments, and would be obligated to check for that or to copy the arguments before calling.
Your function signature is a contract. It divides your code up so that you don't have to fit the entire code base into your head to comprehend what is going on in one area. By passing references you are making the next person's life worse, and your biggest job as a programmer should be making the next person's life better, because the next person will probably be you.
In C, all arguments are passed by value. A true pass by reference is when you see the effect of a modification without any explicit indirection at all:
void f(int c, int *p) {
    c++; // changes only the local copy: the caller's original argument is unaffected
    p++; // likewise, only the local copy of the pointer moves
}
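To make the contrast concrete, here is a small sketch (the function names are made up for illustration). The first function only changes its own copies, so the caller sees nothing; the second changes the caller's variable, but only because it explicitly dereferences the pointer it was given.

```c
/* Changes only the local copies: the caller sees no effect. */
void bump_copies(int c, int *p)
{
    c++;        /* increments the local copy of c */
    p++;        /* advances the local copy of the pointer */
    (void)c;    /* silence "set but not used" warnings */
    (void)p;
}

/* Writes through the pointer: the caller's int changes, but only
   because of the explicit indirection (*p). */
void bump_target(int *p)
{
    (*p)++;
}
```

Calling bump_copies(x, &x) leaves x unchanged, while bump_target(&x) increments it.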
Using values instead of pointers, though, is frequently desirable:
int sum(int a, int b) {
return a + b;
}
You would not write it like this:
int sum(int *a, int *b) {
return *a + *b;
}
Because it is less safe (the pointers could be NULL or dangling) and less efficient (there is an additional indirection). Moreover, in C, a pointer argument suggests to the caller that the value will be modified through the pointer (especially when the pointed-to type is no larger than the pointer itself).
Please refer to Passing by reference in C. "Pass by reference" is a misnomer in C: it refers to passing the address of a variable instead of the variable itself, but you are still passing a pointer to the variable by value.
That said, passing a large object through a pointer can be marginally more efficient, but the main reason to pass a pointer is to be able to modify the original variable it points to. If you don't want to do this, it is recommended that you take the argument by value to make your intent clear.
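A minimal sketch of that guideline, using hypothetical types: a small struct is simply taken by value, while a large struct that is only read is taken through a pointer to const, which avoids the copy and still tells the caller the data will not be modified.

```c
#include <stddef.h>

/* Hypothetical types for illustration. */
struct point { int x, y; };             /* small: cheap to copy */
struct big { double samples[1024]; };   /* large: typically 8 KB to copy */

/* Small object: take it by value. The caller knows it cannot be changed. */
int manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

/* Large read-only object: a pointer to const avoids the copy and
   still documents that the function will not modify the data. */
double sum_samples(const struct big *b)
{
    double total = 0.0;
    for (size_t i = 0; i < sizeof b->samples / sizeof b->samples[0]; i++)
        total += b->samples[i];
    return total;
}
```

Where exactly "large" begins depends on the platform and calling convention; the const qualifier is what carries the intent.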
Of course, all this is moot for one of C's heavier data structures: arrays are passed as a pointer to their first element whether you like it or not.
Two reasons:
Often you will have to dereference the pointer you've passed in many times (think of a long for loop). You don't want to dereference it every single time you look up the value at that address; direct access can be faster.
Sometimes you want to modify the passed-in value inside your function, but not in the caller. Example:
#include <stdio.h>

void foo(int count) {
    // count is a private copy, so counting it down does not affect the caller
    while (count > 0) {
        printf("%d\n", count);
        count--;
    }
}
If you wanted to do the above with something passed by reference, you would have to copy it into yet another variable inside your function first.
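Here is a sketch of both versions side by side (the names countdown_value and countdown_pointer are made up; each returns how many lines it printed, only so the behaviour is easy to check). The pointer version needs the extra local n, which is exactly the copy that pass by value provides for free.

```c
#include <stdio.h>

/* By value: count is already a private copy, so we can consume it. */
int countdown_value(int count)
{
    int printed = 0;
    while (count > 0) {
        printf("%d\n", count);
        count--;
        printed++;
    }
    return printed;
}

/* By pointer: to avoid clobbering the caller's variable we must
   copy *count into a local first. */
int countdown_pointer(const int *count)
{
    int n = *count;   /* the extra local copy */
    int printed = 0;
    while (n > 0) {
        printf("%d\n", n);
        n--;
        printed++;
    }
    return printed;
}
```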
Related
I understand the premise of pointers, but I find them very annoying, and I don't get why they're considered useful.
I've learned about pointers, and the next thing I know, I start seeing bubbles, asterisks, and ampersands everywhere.
#include <stdio.h>
int main () {
int *ptr, q;
q = 50;
ptr = &q;
printf("%d", *ptr);
return 0;
}
Why is this important or useful?
First, parameters passed to a function can only be primitives (int, char, long, ...), structs, or pointers. So if you need to pass a more complex element, such as an array (including strings) or a function, you have to pass a pointer to it.
The second thing I can quickly think of is that parameters are always passed by "value". This means the called function only gets a copy of your variable, so modifications only affect the copy and the original variable remains unchanged.
If you pass a variable by "reference" with a pointer, the pointer itself is still a copy (reassigning it inside the function does not affect the caller), but since it refers to the original variable, any modification to the pointed-to element also affects the variable in the calling function.
In other words, if you want to create a function that can alter a variable, you have to pass it a pointer to that variable to achieve this.
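The classic example is swapping two variables. This sketch receives two pointers by value and writes through them, which is the only way the function can change the caller's variables:

```c
/* Swaps two ints. Each parameter is a pointer passed by value;
   writing through the pointers changes the caller's variables. */
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

A version taking plain int parameters would only swap its own copies, and the caller would see no change.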
In the following code I try to print all paths from the root to the leaves of a binary tree.
If I write a recursive function as follows:
void printPath(BinaryTreeNode *n, int path[], int pathlen)
{
    // assume base case and initializations taken care of
    path[pathlen++] = n->data;
    printPath(n->left, path, pathlen);
    printPath(n->right, path, pathlen);
}
(I have purposefully removed base cases and edge cases handling to improve readability)
What happens to the path array? Is it just one global copy that gets modified during each recursive call? Does the pathlen variable overwrite some of the path values, giving the feeling that each stack frame has its own local copy of path, since pathlen is local to each stack frame?
Passing around an int[] parameter is effectively the same as passing an int* around. The first invocation of the recursive function receives the address of the array's first element, and that same address is used in every recursive call.
If you place a debug print, e.g.
printf("%p\n", (void *)path);
in your recursive function, you will see that the address is always the same: it neither changes nor gets modified. The only thing pushed onto each stack frame during the invocation is the address of the array, which remains the same throughout.
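Instead of printing, this sketch records the address each recursion level receives (record_addresses and MAX_DEPTH are made-up names), which makes the same point in a checkable way: every level gets the same address, and all of them write into the one shared array.

```c
#define MAX_DEPTH 8

/* Records the address that `path` holds at each recursion depth.
   Because int path[] is really int *path, every call receives the
   same address. */
void record_addresses(int path[], int depth, const void *seen[])
{
    seen[depth] = (const void *)path;   /* what this frame received */
    path[depth] = depth;                /* write into the shared array */
    if (depth + 1 < MAX_DEPTH)
        record_addresses(path, depth + 1, seen);
}
```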
Welcome to array-pointer-decay. When you pass around an array, two distinct things happen:
When you declare a function to take an array parameter, you are really defining the function to take a pointer parameter. I.e., the declaration
void foo(int bar[]);
is perfectly equivalent to
void foo(int* bar);
The declaration of the array has decayed into a declaration of a pointer to its first element.
Whenever you use an array, it also decays into a pointer to its first element, so the code
int baz[7];
foo(baz);
is again perfectly equivalent to
int baz[7];
foo(&baz[0]);
There are two notable exceptions where this array-pointer-decay does not happen: the expressions sizeof(baz) and &baz.
Together, these two effects create the illusion that arrays are passed by reference, even though C only ever passes a pointer by value.
The array-pointer-decay was invented to allow the array subscript operator to be defined in terms of pointer arithmetic: the expression baz[3] is defined to be equivalent to *(baz + 3). Here, baz + 3 looks like an attempt to add 3 to an array, but due to array-pointer-decay, baz decays into an int*, on which pointer arithmetic is defined, and the sum yields a pointer to the fourth element of the array. That pointer can then be dereferenced to get the value at baz[3].
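A short sketch of these rules (third and element_count are made-up names). A function may be declared with array syntax and defined with pointer syntax, because both declare the same parameter type; subscripting is pointer arithmetic; and sizeof applied to the array itself is one of the exceptions, so it sees the whole array.

```c
#include <stddef.h>

/* Declared with array syntax... */
int third(int bar[]);

/* ...and defined with pointer syntax. C accepts this because both
   declarations give the parameter the same type: int *. */
int third(int *bar)
{
    return bar[2];   /* same as *(bar + 2) */
}

/* sizeof is an exception to decay: applied to the array itself,
   it yields the size of the whole array, not of a pointer. */
size_t element_count(void)
{
    int baz[7] = {0};
    return sizeof(baz) / sizeof(baz[0]);
}
```

Note that &baz is the other exception: it has type int (*)[7], although it holds the same address as &baz[0].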
void printPath(BinaryTreeNode * n, int path[],int pathlen);
the compiler really sees it like this:
void printPath(BinaryTreeNode * n, int *path, int pathlen);
What happens to the path array? Is it just one global copy that gets modified during each recursive call?
Nothing happens to it. The same path gets passed around, since in C passing an array is just a pointer copy; and no, it isn't a global copy: it is the array passed to the first call of the function, and it will almost always live on the stack.
And does the pathlen variable overwrite some of the path values, giving the feeling that each stack frame has its own local copy of path, since pathlen is local to each stack frame?
Since you modify the values of the array elements and not the pointer to the beginning of the array, nothing changes what path itself points to (the same array, all the time). Each frame writes at its own pathlen index, so a later call overwrites the entries that an earlier sibling call wrote at the same depth. Like you said, this may give the feeling of separate copies (particularly if you're used to that construct in other languages), but in reality only the same path gets passed around.
Aside: you don't seem to handle the exit condition, and as it stands the recursion will not terminate properly, most probably ending in undefined behaviour when you start modifying elements that are out of the array's bounds.
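For completeness, here is a sketch with the base cases filled in. The BinaryTreeNode layout (data, left, right) is an assumption, since the question does not show it, and the function returns the number of paths printed only so the behaviour is easy to check. The caller must supply a path array at least as long as the tree is deep.

```c
#include <stdio.h>

/* Assumed node layout; the question does not show BinaryTreeNode. */
typedef struct BinaryTreeNode {
    int data;
    struct BinaryTreeNode *left;
    struct BinaryTreeNode *right;
} BinaryTreeNode;

/* Prints every root-to-leaf path and returns how many were printed. */
int printPath(const BinaryTreeNode *n, int path[], int pathlen)
{
    if (n == NULL)                    /* base case: fell off the tree */
        return 0;

    path[pathlen++] = n->data;        /* writes into the caller's array */

    if (n->left == NULL && n->right == NULL) {   /* leaf: emit the path */
        for (int i = 0; i < pathlen; i++)
            printf("%d%s", path[i], i + 1 < pathlen ? " " : "\n");
        return 1;
    }

    /* Both children get this frame's pathlen, so the right subtree
       overwrites whatever the left subtree wrote beyond this depth. */
    return printPath(n->left, path, pathlen)
         + printPath(n->right, path, pathlen);
}
```

For a root 1 with leaves 2 and 3, this prints "1 2" and then "1 3", and afterwards path[1] holds 3 because the right subtree wrote last.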
Consider the two functions:
int add1(int x,int y)
{
return x+y;
}
void add2(int x,int y,int *sum)
{
*sum=x+y;
}
I generally use functions of the form add1, but I found some code using functions of the form add2.
Even if the return value is large (like an array or struct), we can just return a pointer to it.
Is there any reason for using the second form?
Another reason is returning a success status.
There are a lot of functions like:
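#include <stdbool.h>

bool f(int arg1, int arg2, int *ret)
{
    /* ... store the result in *ret, then return true on success or false on failure ... */
}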
Here the bool (or an enum) return value reports whether the function succeeded, instead of the caller having to check whether a returned pointer is NULL, and the same pattern still works when there is more than one output variable.
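A concrete sketch of that pattern (safe_div is a made-up example, not a standard function): the bool reports success, and the actual result travels through the pointer, which is left untouched on failure.

```c
#include <stdbool.h>
#include <limits.h>

/* Integer division that reports failure through its return value
   and delivers the quotient through *ret. */
bool safe_div(int a, int b, int *ret)
{
    if (b == 0 || (a == INT_MIN && b == -1))   /* undefined in C: refuse */
        return false;
    *ret = a / b;
    return true;
}
```

A caller can then write if (safe_div(x, y, &q)) { ... } without needing a sentinel value inside the int range.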
If you want to return two values from your function, C has no direct way to do it unless you use pointers, just like your function add2 (or wrap the values in a struct):
void add2(int *ptr1, int *ptr2)
{
    /* Some Code */
    *ptr1 = Something;
    *ptr2 = Something;
}
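As a sketch of both options, here is a hypothetical quotient-and-remainder function written once with two out-parameters and once returning a small struct by value:

```c
/* Two results delivered through two out-parameters.
   The caller must ensure b != 0. */
void divmod(int a, int b, int *quot, int *rem)
{
    *quot = a / b;
    *rem  = a % b;
}

/* The alternative: bundle the results in a struct and return it by value. */
struct divmod_result { int quot; int rem; };

struct divmod_result divmod_struct(int a, int b)
{
    struct divmod_result r = { a / b, a % b };
    return r;
}
```

Since C99, integer division truncates toward zero, so divmod_struct(-7, 2) gives a quotient of -3 and a remainder of -1.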
Form 2 is very common for "multiple returns" in C. A canonical example is returning the address to a buffer and the length of the buffer:
/* Returns a buffer based on param. Returns -1 on failure, or 0 on success.
Buffer is returned in buf and buflen. */
int get_buffer(void *param, char **buf, int *buflen);
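The comment only specifies the contract, so here is one possible sketch of an implementation. It assumes, purely for illustration, that param is a NUL-terminated string and that the "buffer" is a heap copy of it; the caller owns the buffer and must free() it.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical get_buffer(): treats param as a C string and hands back
   a heap copy through *buf and its length through *buflen. */
int get_buffer(void *param, char **buf, int *buflen)
{
    const char *src = param;
    if (src == NULL || buf == NULL || buflen == NULL)
        return -1;

    size_t len = strlen(src);
    if (len > INT_MAX)
        return -1;                   /* length would not fit in *buflen */

    char *copy = malloc(len + 1);    /* +1 keeps a terminating NUL */
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len + 1);

    *buf = copy;                     /* first "return value" */
    *buflen = (int)len;              /* second "return value" */
    return 0;                        /* status */
}
```

The int return is free to carry the status precisely because the two real results travel through the pointers.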
Functions of the form 2 are not faster than functions of the form 1 when you're dealing with things as small as an int. In fact, in this case the second one is likely slower, because you have to dereference the passed pointer. In this case it's only useful if your aim is to pass in an array of values.
Always use functions of the form 1 unless you want to pass in a very large piece of data to the function. In that case, form 2 would be faster.
The reason we use the second form is that for large objects we want to avoid copying them. Instead of copying them, we can just pass their memory addresses to the function; this is where the pointer comes in. So instead of giving the function all the data, you just tell it where the data is. (I hope this analogy is good enough.)
It is largely a matter of preference and local conventions. The second form might be used alongside a bunch of other similar functions where the third parameter in each of them is always passed as a pointer to a return value.
Personally, I like the first form for almost all purposes: it does not require a pointer to be passed, and it allows some type flexibility in handling the return value.
Returning a value by writing to memory passed via a pointer is reasonable, when the returned object is large, or when the return value of the function is used for other purposes (e.g. signaling error conditions). In the code you have shown, neither of these two is the case, so I'd go for the first implementation.
When you return a pointer from a function, you have to make sure that the pointed-to memory is still valid after the function call. This means the pointer must not point to a local variable of the function; in practice it usually points to the heap, which makes an allocation necessary. This puts a burden on the caller: they have to free memory that they did not explicitly allocate.
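A sketch of the two usual ways around this, with made-up function names: return heap memory and make the caller responsible for freeing it, or let the caller supply the storage so that ownership never changes hands.

```c
#include <stdlib.h>

/* Returns a heap-allocated array of n squares (or NULL on failure);
   the caller must free() it. Returning the address of a local array
   here would leave the caller with a dangling pointer. */
int *make_squares(size_t n)
{
    int *out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = (int)(i * i);
    return out;
}

/* Alternative that avoids the ownership question: the caller provides
   the storage and the function only fills it in. */
void fill_squares(int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int)(i * i);
}
```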
I have a function written in C, which I call like this:
FindBeginKey(KeyListTraverser, BeginPage, BeginKey, key1);
Before the call, BeginKey is a pointer that I haven't made point anywhere; I only set it like this:
BeginKey = NULL;
In the FindBeginKey() function, I assign another pointer's value to BeginKey and print the current address stored in BeginKey; it is correct.
But when the code returns from the function and I print the address of BeginKey again, it shows 0x0.
Why does this happen, and if I want to preserve the address assigned in the function, what should I do?
To pass a value out of a function, you have to pass it by reference rather than by value, as is normally the case with C functions. To do this, make the parameter a pointer to the type you want to pass out, then pass the variable's address into the call with the & (address-of) operator.
e.g.
void FindFoo(FOO **BeginKey);
and call it:
FindFoo(&BeginKey);
and in the function:
*BeginKey = (FOO *)0xDEADC0DE; /* placeholder address, for illustration only */
From what I understand, you are calling the function like:
FindBeginKey(KeyListTraverser, BeginPage, BeginKey, key1);
However, that passes in the value of BeginKey (a null pointer), so when the function assigns to BeginKey it only changes its own copy. Instead, you need to pass a pointer to BeginKey:
FindBeginKey(KeyListTraverser, BeginPage, &BeginKey, key1);
If this isn't what you meant, it would certainly help if you posted a code sample.
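Since the real FindBeginKey() is not shown, here is a self-contained sketch of the same pattern with hypothetical stand-ins (Key and find_key are made up). The out parameter is a pointer to the caller's pointer, so assigning *out changes the caller's variable, whereas assigning out itself would not.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the question's key type. */
typedef struct Key { const char *name; int page; } Key;

/* Looks up `name` and, if found, makes the caller's pointer
   point at the match. Returns 1 if found, 0 otherwise. */
int find_key(Key *keys, size_t count, const char *name, Key **out)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(keys[i].name, name) == 0) {
            *out = &keys[i];   /* visible to the caller */
            return 1;
        }
    }
    return 0;                  /* *out left unchanged */
}
```

Usage mirrors the fix above: Key *BeginKey = NULL; find_key(keys, n, "beta", &BeginKey);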
If you want to modify a parameter in a subroutine, you should pass a pointer to the thing you want to modify.
void subroutine(int* x) {
*x = 5; // will modify the variable which x points to
x = 5; // INVALID! x is a pointer, not an integer
}
I don't know what all the current C parameter-passing rules are, so this answer might be a little dated. In common practice, for applications and the libraries they call, the return value of a C function contains a status, so the caller can make a decision depending on the status code.
If you wanted the function to modify its input parameters, you would pass the address of the variable, &my_val (where my_val is declared as int my_val;). The function declares the parameter as a pointer, e.g. int *p, and must dereference it as *p to get the value.
Also, for performance reasons, passing an address (by reference) might be preferable, so that your application does not copy the parameter's value into a local variable; that prologue code is generated by the compiler. Single parameters such as char, int, and so on are cheap to copy, so this matters mostly for larger objects.
I am so used to C++, where passing by reference does not require explicit dereferencing; the compiler's generated code takes care of that for you.
However, think about passing a pointer to a structure.
struct my_struct
{
int iType;
char szName[100];
} struct1;
struct my_struct *pStruct1 = &struct1;
If the structure contains lookup data that is filled in once on initialization and then referenced throughout your program, then pass a pointer to the structure (pStruct1) by value. If you are writing a function to fill that structure or alter data already present, again pass a pointer to the structure by value: you still get to alter what the structure pointer points to.
If, on the other hand, you are writing a function that allocates memory for the pointer, then pass the address of the pointer (a pointer to the pointer), &pStruct1, so that your pointer ends up pointing to the right memory.
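A sketch of both cases, repeating the my_struct type from above (fill_struct and alloc_struct are made-up names). Filling an existing structure needs only a plain pointer; allocating one and handing it back needs the address of the caller's pointer.

```c
#include <stdio.h>
#include <stdlib.h>

struct my_struct
{
    int iType;
    char szName[100];
};

/* Fills an existing structure: a plain pointer is enough. */
void fill_struct(struct my_struct *p, int type, const char *name)
{
    p->iType = type;
    snprintf(p->szName, sizeof p->szName, "%s", name);   /* truncates safely */
}

/* Allocates a new, zeroed structure and stores its address in the
   caller's pointer, so it needs a pointer to that pointer.
   Returns 0 on success, -1 on allocation failure. */
int alloc_struct(struct my_struct **pp)
{
    struct my_struct *p = calloc(1, sizeof *p);
    if (p == NULL)
        return -1;
    *pp = p;
    return 0;
}
```

With struct my_struct *pStruct1 = NULL; the call alloc_struct(&pStruct1) leaves pStruct1 pointing at the new structure, which the caller later releases with free().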
I have the following problem with a program I wrote in Visual C++, and I hope someone can help me:
typedef struct spielfeld
{
int ** Matrix;
int height;
int width;
Walker walker;
Verlauf history;
} Spielfeld;
void show(Spielfeld fieldToShow); //Prototype of the Function where I have this
//problem
int main(int argc, char *argv[])
{
int eingabe;
Spielfeld field;
//Initialize .. and so on
//Call show-Function and pass the structure with Call by Value
show(field);
//But what's happened? field.Matrix has changed!!
//can anyone tell me why? I don't want it to become changed!
//cause that's the reason why I pass the field as Call by Value!
}
void show(Spielfeld fieldToShow)
{
//Here is the problem: although the parameter fieldToShow has been passed
//by value, "fieldToShow.Matrix[0][0] = 1" changes the field in
//main!!
fieldToShow.Matrix[0][0] = 1;
//Another try: fieldToShow.walker.letter only affects the local fieldToShow,
//not the field in main! That's strange to me! Please help!
fieldToShow.walker.letter = 'v';
}
When you pass the structure in, you are passing it in by value. However, the matrix within it is implemented as a pointer to pointer to int. Those pointers are references, and so when you modify the value referenced by them in your function, the same value is referenced by the original structure in main.
If you want to pass these objects by value, you need to do a deep copy yourself, in which you allocate a new matrix, and copy all of the values from the original matrix into it.
As Drew points out, in C++, the preferred way to implement that deep copy is via a copy constructor. A copy constructor allows you to perform your deep copy any time your object is passed by value, without having to explicitly copy the object yourself.
If you are not ready for classes and constructors yet, you can simply write a function, perhaps Spielfeld copySpielfeld(Spielfeld original), that will perform that deep copy; it will essentially be the same as your initialization code that you elided in your example, except it will take values from the Spielfeld passed in, instead of creating a new Spielfeld. You may call this before passing your field into the show function, or have the show function do it for any argument passed in, depending on how you want your API to work.
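A minimal sketch of such a copySpielfeld() in plain C. It uses a simplified Spielfeld with only Matrix, height, and width, because the definitions of Walker and Verlauf are not shown, and it assumes the matrix is allocated row by row (an array of height row pointers, each pointing to width ints). freeSpielfeld() is likewise a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the question's Spielfeld. */
typedef struct spielfeld {
    int **Matrix;
    int height;
    int width;
} Spielfeld;

/* Frees a matrix allocated row by row and clears the pointer. */
void freeSpielfeld(Spielfeld *f)
{
    if (f->Matrix != NULL) {
        for (int r = 0; r < f->height; r++)
            free(f->Matrix[r]);
        free(f->Matrix);
        f->Matrix = NULL;
    }
}

/* Deep copy: a fresh row array and fresh rows, with the values copied.
   On allocation failure the result has Matrix == NULL. */
Spielfeld copySpielfeld(Spielfeld original)
{
    Spielfeld copy = original;        /* copies height and width */
    copy.Matrix = malloc(original.height * sizeof *copy.Matrix);
    if (copy.Matrix == NULL)
        return copy;
    for (int r = 0; r < original.height; r++) {
        copy.Matrix[r] = malloc(original.width * sizeof **copy.Matrix);
        if (copy.Matrix[r] == NULL) { /* undo the rows allocated so far */
            copy.height = r;
            freeSpielfeld(&copy);
            return copy;
        }
        memcpy(copy.Matrix[r], original.Matrix[r],
               original.width * sizeof **copy.Matrix);
    }
    return copy;
}
```

show() could then work on copySpielfeld(fieldToShow) and call freeSpielfeld() on that copy before returning, leaving the caller's matrix untouched.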
You're copying the pointer when you pass fieldToShow. Pass-by-value does not perform a deep copy, so both the Spielfeld in an invocation of show(...) and main(...) (although distinct) have the same value for Matrix.
Fixing this is non-trivial. Probably the easiest thing to do would be to change show(...) to pass-by-reference (using a Spielfeld* basically) and make an explicit copy at the start of the function.
When your Spielfeld object is copied:
The copy has its own "walker", which is a copy of the original's "walker". Since walker is a struct, that means you have two structs.
The copy has its own "Matrix" member, which is a copy of the original's "Matrix" member. But Matrix is a pointer, which means you have two pointers. A copy of a pointer points to the same thing the original points to.
So, modifications to the contents of the copy's walker don't affect the original, because they have different walkers. Modifications to the contents of the copy's matrix do affect the original, because they share the same matrix.
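The same effect can be reproduced with a tiny hypothetical struct: one member stored inside the struct (like walker) and one reached through a pointer (like Matrix). When the struct is passed by value, only the first is truly private to the function.

```c
/* Hypothetical struct mirroring Spielfeld's two kinds of members. */
struct mixed {
    int inline_vals[2];   /* stored inside the struct: copied with it */
    int *shared;          /* only the pointer is copied, not the data */
};

/* Modifies both members of its by-value parameter. */
void touch(struct mixed m)
{
    m.inline_vals[0] = 99;   /* changes this function's copy only */
    m.shared[0] = 99;        /* changes memory the caller also sees */
}
```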
The structure is being passed by value, but since it contains a pointer (the matrix), what that pointer points to can be changed by anyone who has access to the structure. If you don't want this to happen, you can const-qualify the data the pointer refers to (for example, declare the member as const int *const *Matrix), although then the matrix can no longer be modified through that member at all.
As an interesting bit of trivia: this is how call by value works in Java. Object references are always passed by value, but if you manipulate the objects these references point to, it will feel as if call by reference happened.
This really has nothing to do with your question, but maybe you'll find it interesting.
Happy hacking