I have something like the following C code:
struct MyStruct MyFunction (something here)
{
struct MyStruct data;
//Some code here
return data;
}
Would the returned value be a reference to, or a copy of, the memory block for data?
Should MyFunction return struct MyStruct* (with the corresponding memory allocation) instead of struct MyStruct?
There is no such thing as a reference in C. So semantically speaking, you are returning a copy of the struct. However, the compiler may optimise this.
You cannot return the address of a local variable, as it goes out of scope when the function returns. You could return the address of something that you've just malloc-ed, but you'll need to make it clear that someone will need to free that pointer at some point.
It would return a copy. C is a pass-by-value language. Unless you specify that you are passing pointers around, structures get copied for assignments, return statements, and when used as function parameters.
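A minimal sketch of the copy semantics described above. The names `struct point`, `make_point`, `shift_right` and `example_usage` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical type, used only for this illustration. */
struct point {
    int x;
    int y;
};

/* Returns the local struct by value: the caller receives a copy. */
struct point make_point(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;   /* p's lifetime ends here, but its value is copied out */
}

/* Takes its argument by value, so changes here never reach the caller. */
struct point shift_right(struct point p, int dx)
{
    p.x += dx;  /* modifies this function's own copy only */
    return p;
}

void example_usage(void)
{
    struct point a = make_point(1, 2);
    struct point b = shift_right(a, 10);

    assert(a.x == 1 && a.y == 2);   /* a is untouched */
    assert(b.x == 11 && b.y == 2);  /* b is an independent copy */
}
```

Whether the compiler really copies bytes or builds the result directly in the caller's object is an optimisation detail; the observable behaviour is that of a copy.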
It is returned as a copy. BTW, you should not return a pointer to it, because it has automatic storage duration (i.e., data typically resides on the stack).
I have had problems with this (a function in a DLL returning a struct) and have investigated it. Returning a struct from a DLL to be used by people who might have a different compiler is not good practice, because of the following.
How this works depends on the implementation. Some implementations return small records in registers, but most get an invisible extra argument that points to a result struct in the local frame of the caller. On return, the pointer is used to copy data to the struct in the local frame of the caller. How this pointer is passed depends on the implementation again: as last argument, as first argument or as register.
As others said, returning references is not a good idea, as the struct you return might be in your local frame. I prefer functions that do not return such structs at all, but take a pointer to one and fill it up from inside the function.
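The "take a pointer and fill it in" style from the last answer could look roughly like this. It is only a sketch: `struct reading` and `read_sensor` are made-up names, and the value written is a placeholder rather than a real measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type. */
struct reading {
    int id;
    double value;
};

/* Fills *out and returns 0 on success, or -1 if out is NULL.
 * Nothing is returned by value, so no hidden result pointer or
 * compiler-specific struct-return convention is involved. */
int read_sensor(int id, struct reading *out)
{
    if (out == NULL)
        return -1;
    out->id = id;
    out->value = id * 1.5;  /* placeholder for a real measurement */
    return 0;
}

void example_usage(void)
{
    struct reading r;   /* storage belongs to the caller */

    assert(read_sensor(4, &r) == 0);
    assert(r.id == 4);
    assert(read_sensor(4, NULL) == -1);
}
```

The return value is free to carry an error code, which is one reason this shape is common at library and DLL boundaries.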
[Note: This is reposted from https://softwareengineering.stackexchange.com/q/369604/126197, where for some reason this question got immediately downvoted. Twice. Clearly there's more love over here!]
Here's a bit of code paraphrased from a vendor's example.
I've looked for authoritative documentation on passing stack-allocated structures by value, but haven't found the definitive word. In a nutshell: Does C99 guarantee this to be safe?
typedef struct {
int32_t upper;
int32_t lower;
} boundaries_t;
static boundaries_t calibrate() {
boundaries_t boundaries; // struct allocated on stack
boundaries.upper = getUpper();
boundaries.lower = getLower();
return boundaries; // return struct by value
}
int main() {
boundaries_t b;
b = calibrate();
// do stuff with b
...
}
Note that calibrate() allocates the boundaries struct on the stack and then returns it by value.
If the compiler can guarantee that the stack frame for calibrate() will be intact at the time of the assignment to b, then all is well. Perhaps that's part of the contract in C99's pass-by-value?
(Context: my world is embedded systems where pass-by-value is rarely seen. I do know that returning a pointer from a stack-allocated structure is a recipe for disaster, but this pass-by-value stuff feels alien.)
Yes, it's perfectly safe. When you return by value it copies the members of the structure into the caller's structure. As long as the structure doesn't contain any pointers to local objects, it's valid.
Returning structures tends to be uncommon, because if they're large it requires lots of copying. But sometimes we put arrays into structures to allow them to be passed and returned by value (arrays normally decay to pointers when used as parameters or return values) like other data types.
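To illustrate the point about arrays inside structures, here is a small sketch. `struct samples`, `N_SAMPLES` and `make_ramp` are invented names, not taken from the vendor code in the question.

```c
#include <assert.h>

#define N_SAMPLES 4

/* Wrapping the array in a struct lets it be passed and returned by value. */
struct samples {
    int values[N_SAMPLES];
};

struct samples make_ramp(int start)
{
    struct samples s;
    for (int i = 0; i < N_SAMPLES; i++)
        s.values[i] = start + i;
    return s;   /* the whole array is copied to the caller */
}

void example_usage(void)
{
    struct samples a = make_ramp(10);
    struct samples b = a;       /* struct assignment copies the array too */

    b.values[0] = 99;
    assert(a.values[0] == 10);  /* a is unaffected by the change to b */
    assert(a.values[3] == 13);
    assert(b.values[0] == 99);
}
```

The cost is the copying mentioned above: every pass, return or assignment moves `sizeof(struct samples)` bytes unless the compiler can elide it.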
addendum by original asker
(I trust @Barmar won't mind...)
As @DanielH pointed out, in the case of SysV ABI for amd64, the compiler will make provisions for returning the struct by value. If it's small, the entire struct can be returned in a register (read: fast). If it's larger, the compiler allocates room in the caller's stack frame and passes a pointer to the callee. The callee then copies the value of the struct into that upon return. From the doc:
If the type has class MEMORY, then the caller provides space for the
return value and passes the address of this storage in %rdi as if it
were the first argument to the function. In effect, this address
becomes a “hidden” first argument.
b = calibrate();
// do stuff with b
is well behaved.
boundaries_t contains only integral types as members. Passing it by value and using the object it is assigned to in the function call is perfectly safe.
I don't have a link to a C99 reference, but what caught my eye was the struct assignment.
Assign one struct to another in C
It's basically Barmar's response.
Generally it is preferred to pass pointer to structure to a function in C, in order to avoid copying during function call. This has an unwanted side effect that the called function can modify the elements of the structure inadvertently. What is a good programming practice to avoid such errors without compromising on the efficiency of the function call ?
Passing a pointer-to-const is the obvious answer:
void foo(const struct some_struct *p)
That will prevent you from modifying the immediate members of the struct inadvertently. That's what const is for.
In fact, your question sounds like a copy-paste from some quiz card, with const being the expected answer.
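A short sketch of what "immediate members" means for the pointer-to-const suggestion above. `struct buffer`, `sum` and `zero_contents` are hypothetical names chosen for the example.

```c
#include <assert.h>

/* Hypothetical struct with one plain member and one pointer member. */
struct buffer {
    int length;
    int *data;      /* points to storage owned elsewhere */
};

/* Read-only view: the compiler rejects writes to the struct's own members. */
int sum(const struct buffer *b)
{
    int total = 0;
    for (int i = 0; i < b->length; i++)
        total += b->data[i];
    /* b->length = 0;   would not compile: *b is const */
    return total;
}

/* const covers only the immediate members; the pointed-to ints stay writable. */
void zero_contents(const struct buffer *b)
{
    for (int i = 0; i < b->length; i++)
        b->data[i] = 0; /* compiles: data is `int *const` here, not `const int *` */
}

void example_usage(void)
{
    int storage[3] = {1, 2, 3};
    struct buffer buf = {3, storage};

    assert(sum(&buf) == 6);
    zero_contents(&buf);
    assert(storage[1] == 0);    /* modified through a pointer-to-const struct */
}
```

If the pointed-to data must be protected as well, the member itself would have to be declared `const int *data`.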
In general, when it comes to simple optimizations like what you've described, it is often preferable to use a pointer-to-struct rather than passing a struct itself, as passing a whole struct means more overhead from extra data being copied onto the call stack.
The example below is a fairly common approach:
#include <errno.h>
typedef struct myStruct {
int i;
char c;
} myStruct_t;
int myFunc(myStruct_t* pStruct) {
if (!pStruct) {
return EINVAL;
}
// Do some stuff
return 0;
}
If you want to avoid modifying the data passed to the function, just make sure that the data is immutable by modifying the function prototype.
int myFunc(const myStruct_t* pStruct)
You will also benefit from reading up on "const correctness".
A very common idiom, particularly in unix/posix style system code is to have the caller allocate a struct, and pass a pointer to that struct through the function call.
This is a little different from what I think you're asking about, where you are passing data into a function with a struct (and where, as others have mentioned, you may want the function to treat the struct as const). In these cases, the struct is empty (or only partially filled) before the function call. The caller allocates an empty struct and then passes a pointer to it. Probably different from your general question, but relevant to the discussion, I think.
This accomplishes a couple of handy things. It avoids copying a possibly large structure, and it lets the caller fill in some fields and the callee fill in others (giving an effective shared space for communication).
The most important aspect of this idiom is that the caller has full control over the allocation of the struct. It can put it on the stack or the heap, or reuse the same one repeatedly, but wherever it comes from, the caller is responsible for handling the memory.
This is one of the problems with passing around struct pointers: you can easily lose track of who allocated the struct and whose responsibility it is to free it. This idiom gives you the advantage of not having to copy the struct around, while making it clear who has the job of freeing the memory.
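A sketch of the caller-allocates idiom described above, with some fields filled in by the caller and others by the callee. `struct query` and `run_query` are invented for the example and do not correspond to any real POSIX interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request/response struct shared by caller and callee. */
struct query {
    const char *name;   /* in:  filled in by the caller */
    size_t length;      /* out: filled in by the callee */
    int found;          /* out: filled in by the callee */
};

/* The callee never allocates or frees; it only writes into the caller's struct. */
int run_query(struct query *q)
{
    if (q == NULL || q->name == NULL)
        return -1;
    q->length = strlen(q->name);
    q->found = (q->length > 0);
    return 0;
}

void example_usage(void)
{
    /* The caller may choose automatic storage... */
    struct query on_stack = { .name = "abc" };
    assert(run_query(&on_stack) == 0);
    assert(on_stack.length == 3 && on_stack.found == 1);

    /* ...or heap storage, in which case the caller is also the one who frees it. */
    struct query *on_heap = malloc(sizeof *on_heap);
    if (on_heap != NULL) {
        on_heap->name = "";
        assert(run_query(on_heap) == 0);
        assert(on_heap->found == 0);
        free(on_heap);
    }
}
```

Because `run_query` neither allocates nor frees, ownership never changes hands and the question of who calls `free` does not arise inside the callee.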
Something's wrong with the following function:
typedef struct Data1{
float result;
struct Data1* next;
} Data;
Data* f(Data* info){
Data item;
item.result=info->result;
item.next=info->next;
return &item;
}
I notice two things here:
The returned value is a pointer of local value. However it's still a pointer- the compiler gives a warning: function returns address of local variable. but would it really be a problem? ( I don't return a local value itself)
I believe that the main problem here is that this function is supposed to copy the Data struct. That would be OK for the result value, but regarding the 'next' pointers, I believe that at the end of the call to the function the pointers would not be changed. Am I correct? It's like trying to set two ints equal inside another function. Should *(item.next)=*(info->next); solve the problem?
So what's the main problem here? is it both 1 and 2?
The returned value is a pointer of local value. However it's still a pointer- the compiler gives a warning: function returns address of local variable. but would it really be a problem? ( I don't return a local value itself)
That is the main problem. After the function returns, the local variable doesn't exist anymore. The space it occupied may be overwritten immediately or later, but you can't count on ever reading meaningful data from that address.
If you want to copy things, you have to return a pointer to malloced memory.
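Data* f(Data* info){
Data *item = malloc(sizeof *item); // needs <stdlib.h>; the caller must free() the result
if (item == NULL)
return NULL; // allocation failed
item->result=info->result;
item->next=info->next;
return item;
}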
But that has the drawback that now the caller has to free the memory allocated by f, so
Data* f(Data* info, Data* item){
item->result=info->result;
item->next=info->next;
return item;
}
with a pointer allocated by the caller.
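For what it's worth, here is a sketch of how the caller-allocated version might be used, and of what happens to the next pointer. Two things are my own variation rather than part of the answer: the source parameter is const-qualified, and the two member assignments are collapsed into one struct assignment.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Data1 {
    float result;
    struct Data1 *next;
} Data;

/* The caller supplies the destination; nothing here outlives its owner. */
Data *f(const Data *info, Data *item)
{
    *item = *info;  /* struct assignment copies result and next in one step */
    return item;
}

void example_usage(void)
{
    Data tail = { 2.0f, NULL };
    Data head = { 1.0f, &tail };
    Data copy;

    f(&head, &copy);
    assert(copy.result == 1.0f);
    assert(copy.next == &tail);         /* shallow copy: both share the same tail node */

    copy.next->result = 5.0f;
    assert(head.next->result == 5.0f);  /* the change is visible through the original too */
}
```

So the copy is shallow: the pointer value is duplicated, not the node it points to. Whether that is a problem depends on what the copy is meant to be used for.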
The problem with returning pointers to local variables is that the space the local variable occupies is reclaimed when the function returns, so the pointer no longer points to valid memory, and may even point to memory that functions called later are using.
Yes, it would be a problem, since the returned pointer is useless: it's pointing at an object which no longer exists. Hence the warning.
Not sure I follow your reasoning here ... You are not changing anything in the Data passed in, so that's a problem if you expected it to.
1) Yes, it's a problem, because your pointer now points to what used to be on your stack, but is no longer managed memory, which means another function call (or an interrupt) will, with almost 100% certainty, begin mangling that memory.
2) I have no idea what you're asking here.
The main problem is that you're unclear on how memory works in C programs, which leads to constructs like this; not a ding, just an honest observation: http://www.geeksforgeeks.org/archives/14268 gives a relatively good overview and should serve you well.
I know C pretty well, but I'm confused about how temporary storage works.
For instance, when a function returns, everything allocated inside that function is freed (from the stack, or however the implementation decides to do this).
For example:
void f() {
int a = 5;
} // a's value doesn't exist anymore
However we can use the return keyword to transfer some data to the outside world:
int f() {
int a = 5;
return a;
} // a's value exists because it's transferred to the outside world
Please stop me if any of this is wrong.
Now here's the weird thing, when you do this with arrays, it doesn't work.
int []f() {
int a[1] = {5};
return a;
} // a's value doesn't exist. WHY?
I know arrays are only accessible by pointers, and you can't pass arrays around like another data structure without using pointers. Is this the reason you can't return arrays and use them in the outside world? Because they're only accessible by pointers?
I know I could be using dynamic allocation to keep the data to the outside world, but my question is about temporary allocation.
Thanks!
When you return something, its value is copied. a does not exist outside the function in your second example; its value does. (It exists as an rvalue.)
In your last example, you implicitly convert the array a to an int*, and that copy is returned. a's lifetime ends, and you're pointing at garbage.
No variable lives outside its scope, ever.
In the first example the data is copied and returned to the calling function. The second example returns a pointer, so the pointer is copied and returned, but the data it points to is cleaned up.
In implementations of C I use (primarily for embedded 8/16-bit microcontrollers), space is allocated for the return value in the stack when the function is called.
Before calling the function, assume the stack is this (the lines could represent various lengths, but are all fixed):
[whatever]
...
When the routine is called (e.g. sometype myFunc(arg1,arg2)), C throws the parameters for the function (arguments and space for the return value, which are all of fixed length) on to the stack, followed by the return address to continue code execution from, and possibly backs up some processor registers.
[myFunc local variables...]
[return address after myFunc is done]
[myFunc argument 1]
[myFunc argument 2]
[myFunc return value]
[whatever]
...
By the time the function fully completes and returns to the code it was called from, all of its variables have been deallocated off the stack (they might still be there in theory, but there is no guarantee).
In any case, in order to return the array, you would need to allocate space for it somewhere else, then return the address to the 0th element.
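Some compilers store return values in processor registers rather than on the stack. Among the compilers I use I have only seen that on some AVR compilers, though on mainstream ABIs (the SysV AMD64 one, for instance) small return values normally travel in registers.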
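Two common ways to "allocate space for it somewhere else", as a rough sketch. `make_squares` and `fill_squares` are hypothetical names, and neither is the only way to do this.

```c
#include <assert.h>
#include <stdlib.h>

/* Option 1: heap allocation. The caller must free() the result. */
int *make_squares(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
    return a;   /* address of element 0; the storage outlives this call */
}

/* Option 2: the caller provides the storage, so no allocation is needed. */
void fill_squares(int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int)(i * i);
}

void example_usage(void)
{
    int local[3];               /* lives in the caller's frame, not the callee's */
    fill_squares(local, 3);
    assert(local[2] == 4);

    int *heap = make_squares(4);
    if (heap != NULL) {
        assert(heap[3] == 9);
        free(heap);
    }
}
```

On small embedded targets the second option is often preferred, since it avoids the heap entirely.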
When you attempt to return a locally allocated array like that, the calling function gets a pointer to where the array used to live on the stack. This can make for some spectacularly gruesome crashes when, later on, something else writes to the array and clobbers a stack frame, which may not manifest itself until much later if the corrupted frame is deep in the calling sequence. The maddening thing about debugging this type of error is that the real error (returning a local array) can make some other, absolutely perfect function blow up.
You still return a memory address, and you can try to inspect it, but the contents it points to are not valid beyond the scope of the function, so don't confuse value with reference.
int []f() {
int a[1] = {5};
return a;
} // a's value doesn't exist. WHY?
First, the compiler wouldn't know what size of array to return. I just got syntax errors when I used your code, but with a typedef I was able to get an error that said that functions can't return arrays, which I knew.
typedef int ia[1];
ia h(void) {
ia a = {5};
return a;
}
Secondly, you can't do that anyway. You also can't do
int a[1] = {4};
int b[1];
b = a; // error: a is converted to a pointer, and b is not a modifiable lvalue
You never write that assignment out, and the compiler would not necessarily have to generate any code for it, but semantically this operation would have to happen for an array return to be possible, so that a new variable name could refer to the value the function returned. If you enclosed the array in a struct, the compiler would be just fine with copying the data.
Also, outside of its declaration and the operands of sizeof and unary & (and possibly typeof, if the compiler has that extension), an array name that appears in an expression is converted by the compiler to a pointer to the array's first element. This means that the return statement would end up looking like you were returning the wrong type: a pointer rather than an array.
If you want to know why this can't be done -- it just can't. A compiler could implicitly think about the array as though it were in a struct and make it happen, but that's just not how the C standard says it is to be done.
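A small sketch of the conversion rule and of the struct workaround mentioned above. `struct wrapped` and `first_element` are made-up names.

```c
#include <assert.h>

/* Hypothetical wrapper that makes a 4-element array assignable. */
struct wrapped {
    int v[4];
};

int first_element(const int *p)
{
    return p[0];
}

void example_usage(void)
{
    int a[4] = {1, 2, 3, 4};
    int *p = a;                             /* a converts to &a[0] */

    assert(p == &a[0]);
    assert(first_element(a) == 1);          /* same conversion at the call */
    assert(sizeof a == 4 * sizeof(int));    /* no conversion under sizeof */

    struct wrapped x = { {1, 2, 3, 4} };
    struct wrapped y;
    y = x;                                  /* fine: structs are assignable */
    assert(y.v[3] == 4);
    /* int b[4]; b = a;   would not compile: arrays are not assignable */
}
```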
void my_cool_function()
{
obj_scene_data scene;
obj_scene_data *scene_ptr = &scene;
parse_obj_scene(scene_ptr, "test.txt");
}
Why would I ever create a pointer to a local variable as above if I can just do
void my_cool_function()
{
obj_scene_data scene;
parse_obj_scene(&scene, "test.txt");
}
Just in case it's relevant:
int parse_obj_scene(obj_scene_data *data_out, char *filename);
In the specific code you linked, there isn't really a reason.
It could be functionally necessary if you have a function taking an obj_scene_data **. You can't do &&scene, so you'd have to create a local variable before passing the address on.
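A sketch of the pointer-to-pointer case. Everything here is hypothetical: `scene_t` stands in for obj_scene_data, and `select_scene` is an invented function that may redirect the caller's pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for obj_scene_data. */
typedef struct {
    int vertex_count;
} scene_t;

static scene_t default_scene = { 8 };

/* Hypothetical function: if the caller's scene is empty, point the caller
 * at a shared default scene instead. It needs a scene_t ** to do that. */
void select_scene(scene_t **scene_io)
{
    if ((*scene_io)->vertex_count == 0)
        *scene_io = &default_scene;
}

void example_usage(void)
{
    scene_t scene = { 0 };
    scene_t *scene_ptr = &scene;    /* a named pointer object is required here */

    select_scene(&scene_ptr);       /* &(&scene) is invalid: &scene is not an lvalue */
    assert(scene_ptr == &default_scene);
    assert(scene_ptr->vertex_count == 8);
}
```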
Yes absolutely you can do this for many reasons.
For example if you want to iterate over the members of a stack allocated array via a pointer.
Or in other cases if you want to point sometimes to one memory address and other times to another memory address. You can set up a pointer to point to one or the other via an if statement and then later use your common code all within the same scope.
Typically in these cases your pointer variable goes out of scope at the same time as your stack allocated memory goes out of scope. There is no harm if you use your pointer within the same scope.
In your exact example there is no good reason to do it.
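Both uses mentioned above (walking a stack-allocated array, and pointing at one of two objects) fit in one small sketch. `sum_selected` is a hypothetical function written just for this.

```c
#include <assert.h>

int sum_selected(int use_primary)
{
    int primary[3] = {1, 2, 3};
    int backup[3] = {10, 20, 30};
    const int *src = use_primary ? primary : backup;   /* chosen once */
    int total = 0;

    /* The common code below works on whichever array was selected. */
    for (const int *p = src; p != src + 3; p++)
        total += *p;
    return total;   /* an int is returned by value; no pointer escapes */
}

void example_usage(void)
{
    assert(sum_selected(1) == 6);
    assert(sum_selected(0) == 60);
}
```

The pointers and the arrays they point into go out of scope together, so nothing dangles.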
If the function accepts a NULL pointer as input, and you want to decide whether to pass NULL based on some condition, then a pointer to a stack variable is useful to avoid having to call the same function in separate code paths, especially if the rest of the parameters are the same otherwise. For example, instead of this:
void my_function()
{
obj_data obj = {0};
if( some condition )
other_function(&obj, "test.txt");
else
other_function(NULL, "test.txt");
}
You could do this:
void my_function()
{
obj_data obj = {0};
obj_data *obj_ptr = (condition is true) ? &obj : NULL;
other_function(obj_ptr, "test.txt");
}
If parse_obj_scene() is a function there may be no good reason to create a separate pointer. But if for some unholy reason it is a macro it may be necessary to reassign the value to the pointer to iterate over the subject data.
Not in terms of semantics, and in fact there is a more general point that you can replace all local variables with function calls with no change in semantics, and given suitable compiler optimisations, equal efficiency. (see section 2.3 of "Lambda: The Ultimate Imperative".)
But the point of writing code is to communicate with the next person who maintains it, and in an imperative language without tail call optimisation, it is usual to use local variables for things which are iterated over, for automatic structures, and to simplify expressions. So if it makes the code more readable, then use it.