Specifics matter. Especially when talking about how something works, and even more so when we consider why something works. Currently, as I understand it, EVERYTHING in C is passed by value. NOTHING is passed by reference. Some programmers mention that arrays in C are passed by reference.
But as per my limited understanding, even if we pass an array to a function declared like this: void traverse(int arr[4]); what the function actually receives is a copy of a pointer holding the memory location of the first element of that array. The pointer is then dereferenced inside the function, but the value being passed is itself a local variable. Since the memory allocated to an array on the program stack is contiguous, the compiler is able to make square bracket notation work as well as pointer arithmetic.
This and passing by reference are not the same thing to me. I would think this is an important distinction.
But on the other hand, we can then just say that everything in computing is passed by value, since something like Java would do the same in a more subtle manner. And it is actually just simulating a pass by reference. Please advise.
At the level of bits in the computer, arguments can only be passed by value. Bits representing some argument are written to a processor register or memory location designated as the place to pass an argument. Passing by reference is a construct built upon passing by value by using an address as the value that is passed. Passing an address can be implemented automatically or manually. Both methods are pass-by-reference.
When we pass some entity by passing its address instead of passing its value directly, that is called pass by reference. This terminology long antedates the creation of “references” in C++. In assembly language, when we load the address of some thing into a register to pass it to a function, that was, and is, called pass by reference. The C standard specifies a pointer provides a reference to an entity (C 2018 6.2.5 20). So, when we have a pointer to an object, we have a reference to an object, and when we pass the pointer to a function, we are passing a reference to the object to the function.
Some languages automated pass-by-reference. FORTRAN passes everything by reference except for some special syntax for calling routines outside FORTRAN. However, whether passing-by-reference is implemented as an automatic feature of the programming language, by a programmer manually loading an address in assembly language, or by a programmer manually requesting an address with a language operator such as C’s &, when a reference to an object is passed, then the object is passed by reference.
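As a minimal sketch of the manual form described above (the names add_one and demo_add_one are made up for illustration), the caller requests an address with & and the callee modifies the caller's object through it. The pointer itself is still copied by value:

```c
#include <assert.h>

/* Receives the address of an int. The pointer p is a by-value copy,
   but it refers to the caller's object. */
void add_one(int *p)
{
    *p += 1;    /* modifies the caller's object through the reference */
}

/* Hypothetical demo: returns the value of x after the call. */
int demo_add_one(void)
{
    int x = 41;
    add_one(&x);        /* the caller manually passes a reference with & */
    assert(x == 42);    /* the change is visible in the caller */
    return x;
}
```

Calling demo_add_one() from main is enough to exercise it.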
C++ created a new type that it called a “reference,” but this was a new use of the word. The C++ meaning of “reference” applies to C++ only. It does not change the existing use of that word outside the context of C++. Outside of C++, “reference” has its ordinary English meaning of providing information on another thing.
Regarding your specific question about passing an array in C: an array argument is automatically converted to the address of its first element, and this address is typically used to access the entire array. So the array is in fact passed by reference. Describing this as an automatic conversion to a pointer is merely documenting the details. The effect is the same: The function is given access to the object the caller designated by providing a reference to it.
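Both views can be seen in one small sketch (clear_first and demo_clear_first are illustrative names, not from the question). Writing through the parameter changes the caller's array, while reassigning the parameter changes only the local pointer copy:

```c
#include <assert.h>
#include <stddef.h>

/* Declared with array syntax, but the parameter type is adjusted to int *. */
void clear_first(int arr[4])
{
    arr[0] = 0;     /* writes into the caller's array */
    arr = NULL;     /* changes only the local copy of the pointer */
}

/* Hypothetical demo: returns the sum of the array after the call. */
int demo_clear_first(void)
{
    int a[4] = {1, 2, 3, 4};
    clear_first(a);         /* a is converted to &a[0] */
    assert(a[0] == 0);      /* the element was modified through the pointer */
    assert(a[1] == 2);      /* the array itself is still intact and reachable */
    return a[0] + a[1] + a[2] + a[3];
}
```

Whether one calls this "pass by reference" or "passing a pointer by value" is the terminology question. The observable behavior is the same either way.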
Further, any dispute over the meaning of “pass by reference” is merely one about terminology, not about the actual mechanisms used in the computer.
Going over some code (written in the C language) I have stumbled upon the following:
//we have a parameter
uint8_t num_8 = 4;
//we have a function
void doSomething(uint32_t num_32) {....}
//we call the function and passing the parameter
doSomething(num_8);
I have trouble understanding whether this is a correct function call or not. Is this a cast or just a bug?
In general, I know that in the C / C++ languages only a copy of the variable is passed into the function; however, I am not sure what is actually happening. Does the compiler allocate 32 bits on the stack (for the passed parameter), zero all the bits, and only then copy the parameter?
Could anyone point me to the resource explaining the mechanics behind the passing of parameter?
As the function declaration includes a parameter type list, the compiler knows the type of the expected parameter. It can then implicitly convert the actual argument to the expected type. In fact, the compiler processes the function call as if it were:
doSomething((uint32_t) num_8);
Old programmers who once used the (not so) good K&R C may remember that pre-ANSI C only had identifier lists, and the types of parameters were not explicitly declared. In those days, we had to explicitly cast the arguments, and when we forgot, we got no compile-time warnings but errors at execution time.
The way the conversion and the call are translated into assembly language or machine instructions is just an implementation detail. What matters is that everything must happen as if the compiler had declared a temporary uint32_t variable, had converted the uint8_t value into the temporary, and had passed the temporary to the function. But optimizing compilers often use a register to pass a single integer parameter. Depending on the processor, it could zero the accumulator, load the single byte into the low bits, and call the function.
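A small sketch can confirm the "as if" rule. Here echo_u32 is a hypothetical stand-in for doSomething that simply returns what it received, so the implicit and explicit forms can be compared:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for doSomething: returns the value it was given. */
uint32_t echo_u32(uint32_t num_32)
{
    return num_32;
}

/* Hypothetical demo: returns the value received through the implicit conversion. */
uint32_t demo_widen(void)
{
    uint8_t num_8 = 200;
    uint32_t implicit_result = echo_u32(num_8);            /* converted via the prototype */
    uint32_t explicit_result = echo_u32((uint32_t)num_8);  /* same conversion, spelled out */
    assert(implicit_result == explicit_result);
    return implicit_result;    /* the value is preserved: 200 */
}
```

Because uint8_t is unsigned, the conversion to uint32_t preserves the value. How the upper bits get zeroed is left to the implementation.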
Here's the signature of pthread_setschedparam:
#include <pthread.h>
int pthread_setschedparam(pthread_t thread, int policy, const struct sched_param *param);
Will this piece of code result in unexpected behavior:
void schedule(const thread &t, int policy, int priority) {
sched_param params;
params.sched_priority = priority;
pthread_setschedparam(t.native_handle(), policy, &params);
}
It is completely unclear if the scope of params needs to be broader than the function call alone. When I see a function that takes in a pointer, it suggests (to me at least) that it's asking for ownership of it. Is this signature just badly designed? Should "sched_params params" live on the heap? Does it need to outlive the thread to stay valid? Can it be deleted?
I have no idea.
Thanks!
pthread_setschedparam sets the scheduling policy for the given thread. The parameters need not be alive after the call.
If the lifetime of the last argument mattered (that is, if pthread_setschedparam took ownership of it, as you put it), that would have been explicitly documented. The POSIX documentation for pthread_setschedparam says no such thing.
The probable reason why it takes a pointer (instead of value) is that it's less expensive to pass a pointer than a struct.
When I see a function that takes in a pointer, it suggests (to me at least) that it's asking for ownership of it.
I don't jump straight there when I see a function that accepts a pointer parameter, and I don't think you should, either. Although it is important to be aware of the possibility, and you do well to look for documentation, there is a variety of reasons for a function to take a pointer parameter, among them:
the function accepts arrays via the parameter. This is surely the most common reason.
the function wants to modify an object specified to it by the caller (via the pointer). This is probably the second most common reason.
the function accepts a pointer to a structure or union of large or potentially-large size to lighten the function-call overhead
the function accepts a pointer to a structure or union because it conforms to interface conventions that accommodate ancient C compilers that did not accept structures and unions as arguments. This was normal for early C compilers, as it's the way the language was originally specified:
[T]he only operations you can perform on a structure are take its address with & and access one of its members. [... Structures] can not be passed to or returned from functions. [...] Pointers to structures do not suffer these limitations[.]
(Kernighan & Ritchie, The C Programming Language, 1st ed., section 6.2)
Standard C does not have those restrictions, but their effect can still be felt in some places.
That the function expects to take (and typically reassign) responsibility for freeing dynamically-allocated space to which the pointer points, or that it otherwise intends to make a copy of the pointer that survives the function's return, are way down the list. If a function intends to do one of those things, then I fully expect its documentation to indicate so in some manner.
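The first three reasons in the list above can be sketched in a few lines. All names here (big_config, sum_ints, divide_checked, scaled_first) are invented for illustration, and none of these functions keeps the pointer after returning:

```c
#include <assert.h>
#include <stddef.h>

/* A deliberately large structure, to make copying it noticeable. */
struct big_config {
    int values[256];
    int scale;
};

/* Reason 1: the function accepts an array via a pointer plus a length. */
int sum_ints(const int *data, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

/* Reason 2: the function writes a result into an object the caller supplies.
   Returns 0 on success, -1 if den is zero. */
int divide_checked(int num, int den, int *quotient)
{
    if (den == 0) {
        return -1;
    }
    *quotient = num / den;
    return 0;
}

/* Reason 3: a pointer to const avoids copying the large structure. */
int scaled_first(const struct big_config *cfg)
{
    return cfg->values[0] * cfg->scale;
}
```

In each case the caller can pass the address of a local variable and let it go out of scope right after the call, which is the situation the question asks about.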
Is this signature just badly designed?
No, I think its design is prompted by one or both of the latter two points from my list.
Should "sched_params params" live on the heap?
I would not expect that to be a requirement.
Does it need to outlive the thread to stay valid? Can it be deleted?
I do not think it needs to outlive the thread whose properties are set. In addition to my general interpretation of the interface, I read (weak) support for that position in the wording of the function's POSIX specification:
The pthread_setschedparam() function shall set the scheduling policy
and associated scheduling parameters for the thread whose thread ID is
given by thread to the policy and associated parameters provided in
policy and param, respectively.
(POSIX specification for pthread_setschedparam(); emphasis added)
The "provided in" language indicates to me (again, weakly) that the function uses the contents of the pointed-to structure, not the structure itself.
Do gcc's function attributes extensions have any effect when used on the type of function pointers?
The function attributes can be used when declaring the type of function pointers but at least some of them seem to have no effect.
https://stackoverflow.com/a/28740576/1128289
The gcc documentation itself does not address this issue.
Very generally, the C standard says that accessing an object through a pointer whose type is not compatible with the type of the object itself results in undefined behaviour. There are many exceptions to this general rule, but none of them seems to apply to your case.
That means we're moving on very unsafe grounds here, in the first place.
First, you need to distinguish function attributes into two classes:
(1) Function attributes that actually change something in the behavior or location of the function itself, like aligned or interrupt. A function that was not compiled with such an attribute will not change its code once you declare a pointer to it as interrupt, for example. The code generated for the function would need to change dynamically (for example, a "return from interrupt" instruction replaced by a "return normally" one) depending on the type of pointer through which it was called. This is obviously not possible.
(2) Function attributes that tell the calling code something about the behaviour that can be expected from the function, like noreturn or malloc, for example. Those attributes sometimes might not modify the function code itself (you'll never know, however...), but rather tell the calling code something about assumptions it can make in order to optimise. These assumptions affect the calling function only and thus can be triggered by tweaking a pointer (you don't even need a pointer to do that; a modified function prototype should suffice, and for the language that would turn out to have the same effect). If you tell the compiler it can make such assumptions and those turn out not to be true, this will, however, lead to all sorts of things going wrong. After all, C makes the programmer responsible for making sure a pointer points to the right type of thing.
There are, however, function attributes that actually restrict the assumptions a calling function is allowed to make (the most obvious would be malloc, which tells calling code it returns as yet untyped and uninitialized memory). Those should be relatively safe to use (I do, however, fail to come up with a use case at the moment).
Without very detailed knowledge of what belongs in (1) or (2) above, and of what exactly a function attribute might affect in both called and calling code (which would be very difficult to obtain, because I don't recall ever having seen that documented in detail), you will not be able to decide whether this pointer tweaking is actually possible and what side effects it might generate.
Also, in my opinion, there is not much of a difference between tweaking a pointer to a function and calling an external function with a (deliberately) wrong prototype. This might give you some insight on what you are actually trying to do...
Arrays can be passed as pointer to a function or even as reference. Passing it as reference gives an alias on which sizeof and count operators will also work. This makes pass by reference look superior.
However, pass by pointer seems to be the norm in books. Why? Is there something I particularly need to know about pass by reference for arrays?
Passing by reference means that your function can only accept fixed-size arrays (that's why it knows their size, because the compiler enforces it). Passing by pointer means otherwise. Also, passing by pointer lets you pass nullptr, for better or worse.
I usually use std::vector and like to pass by const reference. That said, if my API may at some point be called by C code, passing a pointer to const may make sense, though you then also have to pass the size. If the function may be called with a std::array or a std::vector, you could decide to pass a pointer (and size), or a pair of iterators (begin/end).
If we are talking about using std::array, the template argument requires the size of the array. This would mean in a normal function, you'd need a fixed size:
void myfunc( const std::array<int, 5>& mydata ){...}
However, if we do a templated function, templating on size, that is no longer a problem.
template<std::size_t SZ>
void myfunc(const std::array<int, SZ>& mydata) {...}
If we are talking about stack-allocated C-style arrays... Good C++ style is to prefer std::array/std::vector to C-style arrays. I would recommend reading C++ Coding Standards by Herb Sutter; chapter 77 on page 152 covers the subject. When using C-style arrays, passing the pointer and size is the standard way to go.
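For C-style arrays, the pointer-and-size convention can be sketched as follows, written in plain C so that it also applies when the API is called from C code. The names ARRAY_LEN, max_of, and demo_max_of are illustrative only:

```c
#include <assert.h>
#include <stddef.h>

/* Element count. Only valid where arr is still an array, not a pointer. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Returns the largest element. The caller must pass n >= 1. */
int max_of(const int *data, size_t n)
{
    int best = data[0];
    for (size_t i = 1; i < n; ++i) {
        if (data[i] > best) {
            best = data[i];
        }
    }
    return best;
}

/* Hypothetical demo: the size is computed at the call site, where the
   array type is still known, and passed alongside the pointer. */
int demo_max_of(void)
{
    int readings[] = {3, 9, 4};
    return max_of(readings, ARRAY_LEN(readings));
}
```

Inside max_of the size information is gone, so sizeof(data) would give the size of a pointer. That is why the length travels as a separate argument.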
Is it guaranteed safe/portable to use the address of a function parameter on a C89/C99-compliant compiler?
As an example, the AAPCS for 32-bit ARM uses registers r0-r3 for parameter passing if the function parameters meet specific size and alignment requirements. I would assume that using the address of a parameter passed through a register would yield unexpected results, but I ran a test on the ARM compiler I'm using and it appears to relocate these parameters to the stack if the code attempts to reference the addresses of these parameters. While it would appear safe in my particular application, I'm wondering if this is guaranteed across architectures (with an ANSI/ISO-compliant compiler) that can utilize registers directly to pass function parameters.
Do the standards define this behavior?
In C, the only lvalues you cannot take the address of are bitfields (which cannot appear as function parameters) and variables or function parameters of register storage class. It is perfectly safe to take the address of a parameter, but keep in mind that arguments are passed by value, so you must make sure that you don't use the address of a local variable or parameter once its lifetime ends.
Generally, the compiler has a pass where it checks which local variables and parameters are operands to unary & operators. These are then copied to a suitable piece of RAM when appropriate. The calling convention does not affect this.
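A short sketch of the point (bump_copy and demo_bump_copy are made-up names): taking the address of a parameter is legal, and writing through it changes only the callee's copy, whether or not the argument arrived in a register:

```c
#include <assert.h>

/* Takes the address of its own parameter. The parameter is a local copy
   of the argument, so the caller's variable is not affected. */
int bump_copy(int value)
{
    int *p = &value;    /* legal: value is an ordinary lvalue */
    *p += 10;           /* modifies the copy only */
    return value;       /* p must not be used after this function returns */
}

/* Hypothetical demo: returns the callee's result and checks the caller's variable. */
int demo_bump_copy(void)
{
    int original = 5;
    int result = bump_copy(original);
    assert(original == 5);    /* the caller's variable is unchanged */
    return result;
}
```

Returning p itself instead of value would hand the caller a pointer to an object whose lifetime has ended, which is the case the answer warns about.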