Is it guaranteed safe/portable to use the address of a function parameter on a C89/C99-compliant compiler?
As an example, the AAPCS for 32-bit ARM uses registers r0-r3 for parameter passing if the function parameters meet specific size and alignment requirements. I would assume that using the address of a parameter passed through a register would yield unexpected results, but I ran a test on the ARM compiler I'm using and it appears to relocate these parameters to the stack if the code attempts to reference the addresses of these parameters. While it would appear safe in my particular application, I'm wondering if this is guaranteed across architectures (with an ANSI/ISO-compliant compiler) that can utilize registers directly to pass function parameters.
Do the standards define this behavior?
In C, the only lvalues you cannot take addresses of are bitfields (which cannot appear in function parameters) and variables or function parameters of register storage class. It is perfectly safe to take the address of a parameter, but keep in mind that arguments are passed by value, so you must make sure that you don't use the address of a local variable or parameter once its lifetime ends.
Generally, the compiler has a pass where it checks which local variables and parameters are operands to unary & operators. These are then copied to a suitable piece of RAM when appropriate. The calling convention does not affect this.
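As a minimal sketch of the point above (the function names are hypothetical, chosen only for illustration), the following takes the address of a parameter and writes through it. Only the callee's copy changes, and the address is used only while the parameter is still alive:

```c
#include <assert.h>

/* Hypothetical helper: writes through a pointer to an int. */
void set_to_zero(int *p)
{
    *p = 0;
}

/* Takes the address of its own parameter. 'value' is an ordinary
   object for the duration of the call, so &value is valid here no
   matter how the argument arrived (register or stack). */
int clear_copy(int value)
{
    set_to_zero(&value);   /* modifies only this function's copy */
    return value;          /* 0 */
}

/* Shows that the caller's variable is unaffected. */
int caller_value_after_call(void)
{
    int original = 42;
    int result = clear_copy(original);

    assert(result == 0);
    return original;       /* still 42: arguments are passed by value */
}
```

What would not be safe is returning &value from clear_copy and dereferencing it in the caller, because the parameter's lifetime ends when the function returns.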
Specifics matter. Especially when talking about how something works, and even more so when we consider why something works. Currently, as I understand it, EVERYTHING in C is passed by value. NOTHING is passed by reference. Some programmers mention that arrays in C are passed by reference.
But as per my limited understanding,
Even if we pass an array to a function like this void traverse(int arr[4]);, it is actually being taken in as a copy of the pointer variable storing the location in memory of the first element in that array. It is then dereferenced inside the function, but the initial value being passed is actually a local variable. Since memory allocated to arrays in the program stack would be contiguous, the compiler is able to make square bracket notation work as well as pointer arithmetic.
This and passing by reference are not the same thing to me. I would think this is an important distinction.
But on the other hand, we can then just say that everything in computing is passed by value, since something like Java would do the same in a more subtle manner. And it is actually just simulating a pass by reference. Please advise.
At the level of bits in the computer, arguments can only be passed by value. Bits representing some argument are written to a processor register or memory location designated as the place to pass an argument. Passing by reference is a construct built upon passing by value by using an address as the value that is passed. Passing an address can be implemented automatically or manually. Both methods are pass-by-reference.
When we pass some entity by passing its address instead of passing its value directly, that is called pass by reference. This terminology long antedates the creation of “references” in C++. In assembly language, when we load the address of some thing into a register to pass it to a function, that was, and is, called pass by reference. The C standard specifies a pointer provides a reference to an entity (C 2018 6.2.5 20). So, when we have a pointer to an object, we have a reference to an object, and when we pass the pointer to a function, we are passing a reference to the object to the function.
Some languages automate pass-by-reference. FORTRAN passes everything by reference except for some special syntax for calling routines outside FORTRAN. However, whether passing-by-reference is implemented as an automatic feature of the programming language, by a programmer manually loading an address in assembly language, or by a programmer manually requesting an address with a language operator such as C’s &, when a reference to an object is passed, then the object is passed by reference.
C++ created a new type that it called a “reference,” but this was a new use of the word. The C++ meaning of “reference” applies to C++ only. It does not change the existing use of that word outside the context of C++. Outside of C++, “reference” has its ordinary English meaning of providing information on another thing.
Regarding your specific question about passing an array in C,
in C, an array argument is automatically converted to the address of its first element, and this address is typically used to access the entire array. So the array is in fact passed by reference. Describing this as an automatic conversion to a pointer is merely documenting the details. The effect is the same: The function is given access to the object the caller designated by providing a reference to it.
Further, any dispute over the meaning of “pass by reference” is merely one about terminology, not about the actual mechanisms used in the computer.
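Both views in this thread can be seen in one small sketch. The names set_first, reseat, other and demo_first_element are hypothetical and used only for illustration. The function receives a copy of a pointer (a value), and through that pointer it reaches the caller's array (a reference):

```c
#include <assert.h>

int other[4];   /* hypothetical second array, zero-initialized */

/* 'int arr[4]' in a parameter list is adjusted to 'int *arr'. */
void set_first(int arr[4], int value)
{
    arr[0] = value;        /* writes into the caller's array */
}

void reseat(int arr[4])
{
    arr = other;           /* changes only the local copy of the pointer */
    arr[0] = 99;           /* writes into 'other', not the caller's array */
}

int demo_first_element(void)
{
    int data[4] = {1, 2, 3, 4};

    set_first(data, 10);   /* data[0] becomes 10 */
    reseat(data);          /* data is untouched */
    assert(data[1] == 2);
    return data[0];        /* 10 */
}
```

Reassigning arr inside reseat has no effect on the caller, which is the pass-by-value part. The write in set_first is visible to the caller, which is the pass-by-reference part.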
Going over some code (written in the C language) I have stumbled upon the following:
//we have a parameter
uint8_t num_8 = 4;
//we have a function
void doSomething(uint32_t num_32) {....}
//we call the function and passing the parameter
doSomething(num_8);
I have trouble understanding whether this is a correct function call or not. Is this a cast or just a bug?
In general, I know that in the C / C++ languages only a copy of the variable is passed into the function; however, I am not sure what is actually happening. Does the compiler allocate 32 bits on the stack (for the passed parameter), zero all the bits and then copy the parameter?
Could anyone point me to a resource explaining the mechanics behind parameter passing?
As the function declaration includes a parameter type list, the compiler knows the type of the expected parameter. It can then implicitly convert the actual argument to the expected type. In fact the compiler processes the function call as if it were:
doSomething((uint32_t) num_8);
Old programmers who once used the (not so) good K&R C may remember that pre-ANSI C only had identifier lists and the types of parameters were not explicitly declared. In those days, we had to explicitly cast the arguments, and when we forgot, we got no compile-time warnings but errors at execution time.
How the conversion and the call are translated into assembly language or machine instructions is just an implementation detail. What matters is that everything must happen as if the compiler had declared a temporary uint32_t variable, had converted the uint8_t value into the temporary, and had passed the temporary to the function. But optimizing compilers often use a register to pass a single integer parameter. Depending on the processor, it could zero the accumulator, load the single byte in the low bits and call the function.
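A small sketch of the conversion described above. The function names are hypothetical; widen simply hands its argument back so the converted value can be inspected. The second case, a negative int8_t, is an addition not raised in the question and is included only to show that the conversion follows the normal C rules for unsigned types:

```c
#include <assert.h>
#include <stdint.h>

/* Returns its argument unchanged so the converted value can be inspected. */
uint32_t widen(uint32_t num_32)
{
    return num_32;
}

uint32_t call_with_u8(void)
{
    uint8_t num_8 = 200;
    /* Behaves like widen((uint32_t) num_8): the value is preserved. */
    uint32_t result = widen(num_8);

    assert(result == num_8);
    return result;
}

uint32_t call_with_negative(void)
{
    int8_t small = -1;
    /* Conversion to an unsigned type is done modulo 2^32,
       so the function receives UINT32_MAX. */
    return widen(small);
}
```

Whether the widening happens in a register or in memory is not visible from C; only the resulting value is specified.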
Does the C ABI require arguments to always be passed to functions using only the stack?
ABI or Application Binary Interface covers various details in the contract between pieces of binary code.
(A broad definition) - It defines the mechanisms by which functions are invoked, how parameters are passed between caller and callee, how return values are provided to callers, how libraries are implemented, and how programs are loaded into memory.
(Specifically) the calling convention, which controls how functions' arguments are passed and return values retrieved; for example, whether all parameters are passed on the stack or some are passed in registers, which registers are used for which function parameters, and whether the first function parameter passed on the stack is pushed first or last onto the stack.
A live example - refer to the calling convention described in the ARM ABI Procedure Call Standard for ARM Architecture - you may refer to the section on the Stack (end of page 16): "The stack is a contiguous area of memory that may be used for storage of local variables and for passing additional arguments to subroutines when there are insufficient argument registers available."
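No, arguments are not always passed on the stack. The C standard does not define an ABI, and many calling conventions pass the first few arguments in registers and use the stack only for the rest. Large objects passed by value are still copied, often onto the stack, which is why it is advisable to pass a pointer when passing something large, to avoid the copy and the risk of a stack overflow.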
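The advice about large arguments can be sketched as follows. The struct and function names are hypothetical, and where the by-value copy actually lives (registers, stack, or a hidden temporary) is up to the platform's calling convention:

```c
#include <assert.h>
#include <stddef.h>

#define BIG_COUNT 1024

/* Hypothetical large record, used only for illustration. */
struct big {
    int values[BIG_COUNT];
};

/* By value: the whole struct is copied for the call. */
long sum_by_value(struct big b)
{
    long total = 0;
    size_t i;

    for (i = 0; i < BIG_COUNT; i++)
        total += b.values[i];
    return total;
}

/* By pointer: only an address is passed, whatever the struct's size. */
long sum_by_pointer(const struct big *b)
{
    long total = 0;
    size_t i;

    for (i = 0; i < BIG_COUNT; i++)
        total += b->values[i];
    return total;
}

long demo_sums_match(void)
{
    static struct big data;   /* static storage, so not on the stack */
    size_t i;

    for (i = 0; i < BIG_COUNT; i++)
        data.values[i] = 1;

    assert(sum_by_value(data) == sum_by_pointer(&data));
    return sum_by_pointer(&data);
}
```

Both functions compute the same result; the pointer version simply avoids copying 1024 ints on every call.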
I am working on some legacy C code. The original code was written in the mid-90s, targeting Solaris and Sun's C compiler of that era. The current version compiles under GCC 4 (albeit with many warnings), and it seems to work, but I'm trying to tidy it up -- I want to squeeze out as many latent bugs as possible as I determine what may be necessary to adapt it to 64-bit platforms, and to compilers other than the one it was built for.
One of my main activities in this regard has been to ensure that all functions have full prototypes (which many did not have), and in that context I discovered some code that calls a function (previously un-prototyped) with fewer arguments than the function definition declares. The function implementation does use the value of the missing argument.
Example:
impl.c:
int foo(int one, int two) {
    if (two) {
        return one;
    } else {
        return one + 1;
    }
}
client1.c:
extern foo();
int bar() {
    /* only one argument(!): */
    return foo(42);
}
client2.c:
extern int foo();
int (*foop)() = foo;
int baz() {
    /* calls the same function as does bar(), but with two arguments: */
    return (*foop)(17, 23);
}
Questions: is the result of a function call with missing arguments defined? If so, what value will the function receive for the unspecified argument? Otherwise, would the Sun C compiler of ca. 1996 (for Solaris, not VMS) have exhibited a predictable implementation-specific behavior that I can emulate by adding a particular argument value to the affected calls?
EDIT: I found a Stack Overflow thread, "C function with no parameters behavior", which gives a succinct, specific and accurate answer. PMG's comment at the end of the answer talks about UB. Below are my original thoughts, which I think are along the same lines and explain why the behaviour is UB.
Questions: is the result of a function call with missing arguments defined?
I would say no... The reason is that I think the function will operate as if it had the second parameter, but as explained below, that second parameter could just be junk.
If so, what value will the function receive for the unspecified argument?
I think the values received are undefined. This is why you could have UB.
There are two general ways of parameter passing that I'm aware of... (Wikipedia has a good page on calling conventions)
Pass by register. I.e., the ABI (Application Binary Interface) for the platform will say that registers x & y, for example, are for passing in parameters, and any more above that get passed via the stack...
Everything gets passed via stack...
Thus when you give one module a declaration of the function with an "...unspecified (but not variable) number of parameters..." (the extern declaration), the caller places only as many arguments as you give it (in this case 1) in the registers or stack locations that the real function will look in to get its parameter values. Therefore the location for the second parameter, which was left out, essentially contains random junk.
EDIT: Based on the other thread I found, I would amend the above: the extern does not declare a function with no parameters; it declares a function with an "unspecified (but not variable) number of parameters".
When the program jumps to the function, that function assumes the parameter passing mechanism has been correctly obeyed, so it either looks in registers or on the stack and uses whatever values it finds... assuming them to be correct.
Otherwise, would the Sun C compiler of ca. 1996 (for Solaris, not VMS) have exhibited a predictable implementation-specific behavior
You'd have to check your compiler documentation. I doubt it... the extern definition would be trusted completely so I doubt the registers or stack, depending on parameter passing mechanism, would get correctly initialised...
If the number or the types of arguments (after default argument promotions) do not match the ones used in the actual function definition, the behavior is undefined.
What will happen in practice depends on the implementation. The values of missing parameters will not be meaningfully defined (assuming the attempt to access missing arguments will not segfault), i.e. they will hold unpredictable and possibly unstable values.
Whether the program will survive such incorrect calls will also depend on the calling convention. A "classic" C calling convention, in which the caller is responsible for placing the parameters into the stack and removing them from there, will be less crash-prone in presence of such errors. The same can be said about calls that use CPU registers to pass arguments. Meanwhile, a calling convention in which the function itself is responsible for cleaning the stack will crash almost immediately.
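One way to remove this class of bug, sketched under assumptions: give foo a full prototype that every caller sees, so that a call with the wrong number of arguments no longer compiles. The code below folds the question's three files into one translation unit for brevity. The value 0 passed for the previously missing argument is a placeholder assumption only; the value the legacy program actually relied on has to be determined separately.

```c
#include <assert.h>

/* Would normally live in a shared header (hypothetically foo.h). */
int foo(int one, int two);

int foo(int one, int two)
{
    if (two) {
        return one;
    } else {
        return one + 1;
    }
}

int bar(void)
{
    /* With the prototype in scope, foo(42) is rejected at compile time.
       The second argument (0) is a placeholder assumption. */
    return foo(42, 0);
}

int baz(void)
{
    /* The pointer type now carries the parameter list as well. */
    int (*foop)(int, int) = foo;

    return (*foop)(17, 23);
}

void check_calls(void)
{
    assert(bar() == 43);   /* two == 0, so one + 1 */
    assert(baz() == 17);   /* two != 0, so one */
}
```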
It is very unlikely that the bar function ever gave consistent results in the past. The only thing I can imagine is that it is always called on fresh stack space and the stack space was cleared upon startup of the process, in which case the second parameter would be 0. Or the difference between returning one and one+1 didn't matter much in the bigger scope of the application.
If it really is like you depict in your example, then you are looking at a big fat bug. In the distant past there was a coding style where vararg functions were implemented by specifying more parameters than passed, but just as with modern varargs you should not access any parameters not actually passed.
I assume that this code was compiled and run on the Sun SPARC architecture. According to this ancient SPARC web page: "registers %o0-%o5 are used for the first six parameters passed to a procedure."
In your example with a function expecting two parameters, with the second parameter not specified at the call site, it is likely that register %o1 always happened to have a sensible value when the call was made.
If you have access to the original executable and can disassemble the code around the incorrect call site, you might be able to deduce what value %o1 had when the call was made. Or you might try running the original executable on a SPARC emulator, like QEMU. In any case this won't be a trivial task!
While reading a site, I read that you cannot make a global variable of type register. Why is that so?
source:
http://publib.boulder.ibm.com/infocenter/lnxpcomp/v8v101/index.jsp?topic=/com.ibm.xlcpp8l.doc/language/ref/regdef.htm
In theory, you could allocate a processor register to a global scope variable - that register would simply have to remain allocated to that variable for the whole life of the program.
However, C compilers don't generally get to see the entire program during the compile phase - the C standard was written so that each translation unit (roughly corresponding to each .c file) could be compiled independently of the others (with the compiled objects later linked into a program). This is why global scope register variables aren't allowed - when the compiler is compiling b.c, it has no way to know that there was a global variable allocated to a register in a.c (and that therefore functions in b.c must preserve the value in that register).
Actually, GCC allows this. A declaration in global scope in the form:
register int foo asm ("r12");
Allocates the register "r12" (on x86_64) for the global "foo". This has a number of limitations and the corresponding manual page is probably the best reference to all the hassle global register variables would make:
http://gcc.gnu.org/onlinedocs/gcc/Explicit-Reg-Vars.html
Because it would be senseless. Global variables exist all the time the application is working. There surely is no free processor register for such a long time ;)
The register keyword has a different meaning than what its name seems to indicate, nowadays it has not much to do with a register of the processing environment. (Although it probably once was chosen for this.) The only text that constrains the use of a variable that is declared with register is this
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier
So it implements a restriction on automatic variables (those that you declare in a function) such that it is an error to take the address of such a variable. The idea then is that the compiler may represent this variable in whatever way it pleases, as a register or as an immediate assembler value etc. You as a programmer promise that you won't take its address. Usually this makes not much sense for global variables (they have an address, anyhow).
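A minimal sketch of that restriction (the function sum_to is hypothetical). The register specifier is accepted on locals and parameters; the one thing it forbids is the commented-out line:

```c
#include <assert.h>

int sum_to(int n)
{
    register int total = 0;   /* a hint only; the compiler may ignore it */
    register int i;

    assert(n >= 0);

    for (i = 1; i <= n; i++)
        total += i;

    /* int *p = &total; */    /* constraint violation: the address of a
                                 register object cannot be taken */

    return total;
}
```

Uncommenting the marked line makes a conforming compiler issue a diagnostic, whether or not total actually ends up in a processor register.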
To summarize:
No, the register keyword is not ignored.
Yes, it can only be used for stack variables if you want to be standard conformant.
Originally, register variables were meant to be stored in processor registers, but global variables have to be stored in the data or the BSS section to be accessible from every function. Today, compilers don't interpret the register storage class strictly, so it remains largely for compatibility reasons.
The register keyword is used in C/C++ as a request to the compiler to keep a variable in a processor register. A register is a small storage location inside the CPU, much faster to access than memory (RAM). The use of a register is limited by the architecture and by the size of the register itself (this means that some can only hold memory pointers, others are used to load special debug values and so on).
The traditional 32-bit x86 C calling convention (cdecl) doesn't use the general registers (EAX, EBX and so on) to pass parameters (but the returned value is stored in EAX), so you could declare a variable as register to make code faster.
If you ask to make it global, you ask to reserve the register for all the code in all your source files. This is impossible, so the compiler gives you an error, or simply makes it a usual variable stored in memory.
Some compilers provide a means of dedicating a register permanently to a variable. The register keyword, however, is insufficient. A compiler's decision to allocate the local variables for a routine in registers generally does not require coordination with anything in other source modules. While some development systems do register optimization between routines, it's far more common to simply define the calling convention so that all routines are allowed to freely alter certain registers (so a caller is responsible for saving the contents if they're needed after the function call) but must not alter others (so the called routine is responsible for saving and restoring the contents if the registers are needed in the function). Thus, a linker doesn't need to concern itself with register usage.
Such an approach is fine for local register variables, but useless for global ones. For global register variables to be useful, the programmer must generally tell the compiler which register is to be used for what variable, and make sure that such reservations are known to the compiler when compiling all modules--even those that don't use the register otherwise. This can be useful in embedded systems, especially with variables that are used by interrupts, but there's usually a very limited number (e.g. 2 or so) of such variables allowed in a system.
So do we all agree now? Do we all see that making a global variable a register variable would be a really, really bad idea? If the original C definition did not forbid it, it was probably because nobody thought anyone would actually implement it that way -- as they should not have especially back in CISC days.
Besides: modern optimizing compilers do a better job of deciding when to keep variables in registers than humans can do. If yours can't do it, then you really, REALLY need to get a better compiler.
Because they're in registers. It's a contradiction in terms.