Preventing GCC from merging variables in braced groups - c

Edit:
Apparently, accessing variables inside braced groups after they end is undefined behaviour. Since I don't want to use dynamic allocation for nodes (as suggested by @dbush and @ikegami), I assume the next best way to keep hidden variables (within a function) is to generate unique variable names for the nodes (with __LINE__) and 'declare' them without a braced group. The code now reads something like
#define PASTE_(x, y) x ## y
#define PASTE(x, y) PASTE_(x, y)
#define LT_mark_(LABEL, NAME, DELETE)\
struct LifeTime LABEL ={\
.delete=DELETE,\
.prev=lt_head,\
.ref=NAME\
};\
\
lt_head = &LABEL;
#define LT_mark(NAME, DELETE) LT_mark_(PASTE(lt_, __LINE__), NAME, DELETE)
/Edit
I'm trying to keep records for memory allocated within a function's scope.
Records are kept in LifeTime structures, which form a linked list. This list is later traversed when returning from said function, in order to automatically free the memory. The lt_head variable is used to keep track of the current head of the list.
struct LifeTime {
void (*delete)(void*);
struct LifeTime *prev;
void *ref;
};
#define LT_mark(NAME, DELETE)\
{\
struct LifeTime _ ={\
.delete=DELETE,\
.prev=lt_head,\
.ref=NAME\
};\
\
lt_head = &_;\
}
int example (){
struct LifeTime *lt_head = NULL;
char *s = malloc(64); LT_mark(s, free);
char *s2 = malloc(64); LT_mark(s2, free);
...
}
Using this code, the temporary variables (named _) within the braced groups created by the LT_mark macro all end up at the same memory address.
I assume the reason for this is, as stated in the answer to this question: In C, do braces act as a stack frame?
that variables with non-overlapping usage lifetimes may be merged if the compiler deems it appropriate.
Is there any way to override this behaviour? I acknowledge it may be impossible (I am using GCC without any optimization flags, so I can't simply remove them), but the actual code I am working with requires that the variables inside these groups are kept afterwards, though hidden from visibility (as braced groups do usually). I considered using __attribute__((used)) but apparently this is only valid for functions and global variables.

The lifetime of a variable is that of its enclosing scope, so when that scope ends the variable no longer exists. Saving the address of that variable and attempting to use it after its lifetime has ended causes undefined behavior.
For example:
int *p;
{
int i=4;
p=&i;
printf("*p=%d\n", *p); // prints *p=4
}
printf("*p=%d\n", *p); // undefined behavior, p points to invalid memory
Inside of the braces, p points to valid memory and can be dereferenced. Outside of the braces, p cannot be safely dereferenced.
You'll need to do some dynamic allocation to create these structures. Also, this isn't a place where you should be using a macro instead of a function:
void LT_mark(void *p, void (*cleanup)(void *))
{
struct LifeTime *l = malloc(sizeof *l);
l->delete = cleanup;
l->prev = lt_head;
l->ref = p;
lt_head = l;
}
And similarly the cleanup function:
void LT_clean(void)
{
struct LifeTime *p;
while (lt_head) {
lt_head->delete(lt_head->ref);
p = lt_head; // remember the current node before advancing
lt_head = lt_head->prev;
free(p);
}
}
Also, the prev field should be renamed to next, as the existing name is misleading.
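A minimal sketch of the dynamic-allocation approach, assuming the list head is passed explicitly rather than shared as a global. The names lt_mark, lt_clean and lt_demo are illustrative and not part of the answer above. Because each record lives on the heap, it survives the end of whatever block or loop iteration registered it.

```c
#include <assert.h>
#include <stdlib.h>

struct LifeTime {
    void (*delete)(void *);
    struct LifeTime *prev;
    void *ref;
};

/* Push a heap-allocated record; returns 0 on success, -1 if malloc fails. */
int lt_mark(struct LifeTime **head, void *ref, void (*cleanup)(void *))
{
    assert(head != NULL && cleanup != NULL);
    struct LifeTime *l = malloc(sizeof *l);
    if (l == NULL)
        return -1;
    l->delete = cleanup;
    l->prev = *head;
    l->ref = ref;
    *head = l;
    return 0;
}

/* Run every cleanup, newest first, and free the records.
   Returns the number of records released. */
int lt_clean(struct LifeTime **head)
{
    int released = 0;
    while (*head != NULL) {
        struct LifeTime *node = *head;
        *head = node->prev;
        node->delete(node->ref);
        free(node);
        ++released;
    }
    return released;
}

/* Hypothetical caller: registering inside a loop body is safe here,
   because the records do not live in the loop's block. */
int lt_demo(int n)
{
    struct LifeTime *lt_head = NULL;
    for (int i = 0; i < n; ++i) {
        char *s = malloc(64);
        if (s == NULL)
            break;
        if (lt_mark(&lt_head, s, free) != 0) {
            free(s);
            break;
        }
    }
    return lt_clean(&lt_head);
}
```

Passing the head by address keeps the list local to the calling function, at the cost of one extra argument per call.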

Under most circumstances, you'll want to use @dbush's dynamic allocation solution. Since you're presumably using this with dynamic memory allocations of some kind anyway, dynamically allocating the descriptor blocks shouldn't be a huge overhead.
However, under some really restricted circumstances which you will have to police yourself, and assuming that you're not using an antediluvian version of the C compiler, it is possible to do this fairly simply with compound literals. Aside from the C compiler version limitation (C99 or better, which shouldn't be a huge burden), this will work in exactly the same circumstances as your edit#1 using token concatenation to generate a unique name: that is, if no use of the LT_mark macro is inside a braced-block subordinate to the function.
The reason for this restriction -- which, as I said, applies also to your solution with token concatenation -- is that the lifetime of automatic allocations terminates when control exits from the block in which they were declared. This is an essential aspect of C (and many other programming languages), so it's important to be clear about how it works.
Here's a simple example:
int example (){
struct LifeTime *lt_head = NULL;
char *s = malloc(64); LT_mark(s, free);
for (int i = 0; i < 4; ++i) {
/* InnerBlock */
char *s2 = malloc(64); LT_mark(s2, free);
....
}
/* Lifetime of all variables declared in InnerBlock expires */
....
/* If lt_head points to a struct automatically allocated inside
* InnerBlock, it is now a dangling pointer and cannot be used.
* The next statement is Undefined Behaviour.
*/
freeTheMallocs(lt_head);
}
Note that the problem is not that the inner block is executed more than once (although that will probably guarantee that you notice the problem). The same thing would happen had I written it as a conditional:
int example (int flag){
struct LifeTime *lt_head = NULL;
char *s = malloc(64); LT_mark(s, free);
if (flag) {
/* InnerBlock */
char *s2 = malloc(64); LT_mark(s2, free);
....
}
/* Lifetime of all variables declared in InnerBlock expires */
....
freeTheMallocs(lt_head); /* Dangling pointer */
}
The above cannot work with automatic allocation of descriptor blocks (but it will work fine with dynamic allocation).
OK, so what happens if you absolutely promise to only use LT_mark in the outermost block of your function, as with your original example:
int example (){
struct LifeTime *lt_head = NULL;
char *s = malloc(64); LT_mark(s, free);
char *s2 = malloc(64); LT_mark(s2, free);
freeTheMallocs(lt_head);
}
That will work. Your only problem is how to enforce the restriction, including on all the maintenance programmers who will modify the code after you leave the project, and may not have the foggiest idea of why they're not allowed to nest LT_mark inside a block (or even know that they're not allowed to do that).
But if you like playing with fire, you can do it like this:
#define LT_mark(NAME, DELETE) \
lt_head = &(struct LifeTime){ \
.delete=DELETE, \
.prev=lt_head, \
.ref=NAME \
}
This will work, in the limited set of cases in which it does work, because the compound literal created by the macro "has automatic storage duration associated with the enclosing block." (§6.5.2.5/5).
Honestly, I sincerely hope you don't use the above code. I contribute this answer mostly in the hopes that it provides some kind of explanation of the importance of understanding lifetimes.

Related

how to allocate a struct outside the main()?

I have built a book struct that looks like this:
typedef struct _book{
char name[NAME_LENGTH];
char authors[AUTHORS_NAME_LENGTH];
char publisher[PUBLISHER_NAME_LENGTH];
char genre[GENRE_LENGTH];
int year;
int num_pages;
int copies;
}book;
I'm trying to define a library, which is an array of books, so that later on I can deposit books in the library with another function.
I had problems with memory writes/reads when I defined the library like this: library[BOOK_NUM], so I decided to allocate it dynamically.
The thing is, it only lets me allocate inside the main function.
When I write this line:
book *library = (book*)malloc(BOOK_NUM*sizeof(book));
outside the main() it gives me an error:
IntelliSense: function call is not allowed in a constant expression
error C2099: initializer is not a constant
But if I move the above line inside main(), it works. Why is that?
Also, what is the better way to define the array so that I can change it later from other functions?
You might declare a global or static variable, assuming BOOK_NUM is some #define-d constant (e.g. #define BOOK_NUM 100 somewhere before in your code):
book library[BOOK_NUM];
However, heap allocation is generally preferable, because the resource usage is limited at runtime, not at compile-time or start of execution time.
If BOOK_NUM was extremely big (eg a billion) you could have an issue (program won't be runnable because of lack of memory).
If BOOK_NUM was slightly small (e.g. a dozen) you could have an issue in running some cases (not enough space for books).
If you (wrongly!) declared book library[BOOK_NUM]; as some local variable (e.g. in main), the call frame should be small enough (because the entire call stack is limited to a few megabytes, so individual call frames should not exceed a few kilobytes) so BOOK_NUM should be kept small (a few dozen at most).
To quote the GNU coding standards:
4.2 Writing Robust Programs
Avoid arbitrary limits on the length or number of any data structure, including file names, lines, files, and symbols, by allocating all data structures dynamically
So a better way could be to have:
typedef struct book_st {
char* name;
char* authors;
char* publisher;
char* genre;
int year;
int num_pages;
int copies;
} book;
then a "making function" (or "constructing" function) like
/* returns a freshly allocated book to be deleted by delete_book;
the strings arguments should be not null and are duplicated. */
book* make_book(const char*n, const char*a, const char*p,
const char*g, int y, int np, int c) {
assert (n != NULL);
assert (a != NULL);
assert (p != NULL);
assert (g != NULL);
book* b = malloc(sizeof(book));
if (!b) { perror("malloc book"); exit(EXIT_FAILURE); };
memset (b, 0, sizeof(book)); // useless, but safe
char* pname = strdup(n);
if (!pname) { perror("strdup name"); exit(EXIT_FAILURE); };
char* pauth = strdup(a);
if (!pauth) { perror("strdup author"); exit(EXIT_FAILURE); };
char *ppub = strdup(p);
if (!ppub) { perror("strdup publisher"); exit(EXIT_FAILURE); };
char *pgenre = strdup(g);
if (!pgenre) { perror("strdup genre"); exit(EXIT_FAILURE); };
b->name = pname;
b->authors = pauth;
b->publisher = ppub;
b->genre = pgenre;
b->year = y;
b->num_pages = np;
b->copies = c;
return b;
}
Notice that every call to malloc should be tested, because malloc could fail. Here I just exit with some error message; in some cases you would want to recover from malloc failure (e.g. a server might want to continue processing future requests), but that is boringly difficult (you might need to free any useless malloc-ed pointer so far, etc...).
Of course, you need a destroying or deleting function to release memory, like:
/* destroy and free a book obtained by make_book */
void delete_book(book*b) {
if (!b) return;
free (b->name), b->name = NULL;
free (b->authors), b->authors = NULL;
free (b->publisher), b->publisher = NULL;
free (b->genre), b->genre = NULL;
free (b);
}
Notice my defensive programming style. I am clearing the malloc-ed book pointer before filling it; I am setting to NULL every pointer field in book just after free-ing it. In principle both are useless.
BTW, you could make your library a struct ending with a flexible array member:
struct library_st {
unsigned size; // allocate size
unsigned nbbooks; // actual number of books
book* books[]; // actually, size slots
};
and have functions like struct library_st*make_library(unsigned s); and struct library_st*add_book(struct library_st*lib, book*book); which would return perhaps an updated and reallocated library.
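A minimal sketch of those two functions, under some assumptions of mine: the growth policy (doubling, starting at 4 slots) is arbitrary, book is left as an incomplete type because only pointers are stored, and the library does not take ownership of the books it holds.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct book_st book;   /* incomplete here; only pointers are stored */

struct library_st {
    unsigned size;      /* allocated slots */
    unsigned nbbooks;   /* slots in use */
    book *books[];      /* flexible array member */
};

/* Allocate an empty library with room for s books; NULL on failure. */
struct library_st *make_library(unsigned s)
{
    struct library_st *lib = malloc(sizeof *lib + s * sizeof lib->books[0]);
    if (lib == NULL)
        return NULL;
    lib->size = s;
    lib->nbbooks = 0;
    return lib;
}

/* Append b, growing the array when it is full. Returns the (possibly moved)
   library, or NULL on failure, in which case the old library is still valid. */
struct library_st *add_book(struct library_st *lib, book *b)
{
    assert(lib != NULL);
    if (lib->nbbooks == lib->size) {
        unsigned newsize = lib->size ? 2 * lib->size : 4;
        struct library_st *grown =
            realloc(lib, sizeof *grown + newsize * sizeof grown->books[0]);
        if (grown == NULL)
            return NULL;
        grown->size = newsize;
        lib = grown;
    }
    lib->books[lib->nbbooks++] = b;
    return lib;
}
```

Callers have to use the returned pointer (lib = add_book(lib, b);) because realloc may move the whole structure.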
The main thing in C is to document the memory allocation discipline. Every function should say (at least in a comment) who is in charge of freeing pointers and how.
Read much more (at least for concepts and terminology) about virtual address space, C dynamic memory allocation, memory leaks, garbage collection. Notice that reference counting is not a silver bullet.
Consider using Linux as your primary development environment on your laptop. It has good tools (gcc -Wall -g -fsanitize=address with a recent GCC, gdb, valgrind, Boehm's conservative GC ...) and lots of free software whose source code is worth studying to learn more about C programming.
BTW, to store your library on the disk, consider serialization techniques (and textual formats à la JSON), or perhaps sqlite or some real database (PostgreSQL, MongoDB, ...)
You can only call malloc inside a function. main () is a function. You can write other functions. You can't just declare a global variable and initialise it by calling a function.
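That last point can be sketched as follows: the file-scope pointer gets a constant initializer (NULL), and the allocation moves into a function that runs at program start. The helper names library_init, set_copies and library_free are hypothetical, the constants are assumed values, and the struct is a trimmed-down version of the one in the question.

```c
#include <assert.h>
#include <stdlib.h>

#define BOOK_NUM 100           /* assumed value for illustration */
#define NAME_LENGTH 50         /* assumed value for illustration */

typedef struct {
    char name[NAME_LENGTH];
    int year;
    int copies;
} book;

/* A file-scope initializer must be a constant expression:
   NULL qualifies, a call to malloc does not. */
book *library = NULL;

/* Call once, e.g. at the top of main. Returns 0 on success, -1 on failure. */
int library_init(void)
{
    library = calloc(BOOK_NUM, sizeof *library);
    return library ? 0 : -1;
}

/* Any other function can modify the shared array through the global pointer. */
void set_copies(int index, int copies)
{
    assert(library != NULL && index >= 0 && index < BOOK_NUM);
    library[index].copies = copies;
}

void library_free(void)
{
    free(library);
    library = NULL;
}
```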

Smart Pointers in a language that compiles to C

I'm writing a simple language that compiles to C, and I want to implement smart pointers. I need a bit of help with that though, as I can't seem to think of how I would go around it, or if it's even possible. My current idea is to free the pointer when it goes out of scope, the compiler would handle inserting the frees. This leads to my questions:
How would I tell when a pointer has gone out of scope?
Is this even possible?
The compiler is written in C, and compiles to C. I thought that I could check when the pointer goes out of scope at compile-time, and insert a free into the generated code for the pointer, i.e:
// generated C code.
int main() {
int *x = malloc(sizeof(*x));
*x = 5;
free(x); // inserted by the compiler
}
The scoping rules (in my language) are exactly the same as C.
My current setup is your standard compiler, first it lexes the file contents, then it parses the token stream, semantically analyzes it, and then generates code to C. The parser is a recursive descent parser. I would like to avoid something that happens on execution, i.e. I want it to be a compile-time check that has little to no overhead, and isn't full blown garbage collection.
For functions, each { starts a new scope, and each } closes the corresponding scope. When a } is reached, the variables inside that block go out of scope. Members of structs go out of scope when the struct instance goes out of scope. There are a couple of exceptions: temporary objects go out of scope at the next ;, and compilers silently put for loops inside their own block scope.
struct thing {
int member;
};
int foo;
int main() {
struct thing a;
{
int b = 3;
for(int c=0; c<b; ++c) {
int d = rand(); //the return value of rand goes out of scope after assignment
} //d and c go out of scope here
} //b goes out of scope here
}//a and its members go out of scope here
//globals like foo go out-of-scope after main ends
C++ tries really hard to destroy objects in the opposite order they're constructed, you should probably do that in your language too.
(This is all from my knowledge of C++, so it might be slightly different from C, but I don't think it is)
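As a sketch of what the generated C could look like under that rule, assuming the compiler frees every owned pointer at the closing brace of its scope, in reverse declaration order. tracked_free and the freed_order log are hypothetical instrumentation added only so the order can be observed; real output would call free directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical instrumentation: records the order of the inserted frees. */
int freed_order[8];
int freed_count = 0;

void tracked_free(void *p, int id)
{
    assert(freed_count < 8);             /* the log must not overflow */
    freed_order[freed_count++] = id;
    free(p);
}

/* What the compiler might emit for a function with one nested block. */
void generated_function(void)
{
    int *a = malloc(sizeof *a);          /* id 1, outer scope */
    {
        int *b = malloc(sizeof *b);      /* id 2, inner scope */
        int *c = malloc(sizeof *c);      /* id 3, inner scope */
        tracked_free(c, 3);              /* inserted: inner scope ends, */
        tracked_free(b, 2);              /* reverse declaration order   */
    }
    tracked_free(a, 1);                  /* inserted: outer scope ends */
}
```

The same frees would also have to be emitted before every return, break, continue or goto that leaves the scope early.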
As for memory, you'll probably want to do a little magic behind the scenes. Whenever the user mallocs memory, you replace it with something that allocates more memory, and "hide" a reference count in the extra space. It's easiest to do that at the beginning of the allocation, and to keep alignment guarantees, you use something akin to this:
typedef union {
long double f;
void* v;
char* c;
unsigned long long l;
} bad_alignment;
void* ref_count_malloc(size_t bytes)
{
void* p = malloc(bytes + sizeof(bad_alignment));
if (p == NULL) return NULL; //allocation failed
int* ref_count = p;
*ref_count = 1; //now is 1 pointer pointing at this block
return (char*)p + sizeof(bad_alignment); //char* cast: arithmetic on void* is not standard C
}
When they copy a pointer, you silently add something akin to this before the copy
void copy_pointer(void* from, void* to) {
if (from != NULL)
ref_count_free(from); //no longer points at previous block
if (to != NULL) {
int* ref_count = (int*)((char*)to - sizeof(bad_alignment));
++*ref_count; //one additional pointing at this block
}
}
And when they free or a pointer goes out of scope, you add/replace the call with something like this:
void ref_count_free(void* ptr) {
if(ptr) {
int* ref_count = (int*)((char*)ptr - sizeof(bad_alignment));
if (--*ref_count == 0) //if no more pointing at this block
free(ref_count); //free from the start of the real allocation, not from ptr
}
}
If you have threads, you'll have to add locks to all that. My C is rusty and the code is untested, so do a lot of research on these concepts.
The problem is slightly more difficult, since your code is straightforward, but... what if another pointer is made to point to the same place as x?
// generated C code.
int main() {
int *x = malloc(sizeof(*x));
int *y = x;
*x = 5;
free(x); // inserted by the compiler, now wrong
}
You doubtlessly will have a heap structure, in which each block has a header that tells a) whether the block is in use, and b) the size of the block. This can be achieved with a small structure, or by using the highest bit for a) in the integer value for b) [is this a 64bit compiler or 32bit?]. For simplicity, let's consider:
typedef struct {
bool allocated: 1;
size_t size;
} BlockHeader;
You would have to add another field to that small structure, which would be a reference count. Each time a pointer points to that block in the heap, you increment the reference count. When a pointer stops pointing to a block, its reference count is decremented. If it reaches 0, then the block can be reclaimed (compacted or whatever). The allocated field is now redundant.
typedef struct {
size_t size;
size_t referenceCount;
} BlockHeader;
Reference counting is quite simple to implement, but comes with a downside: there is overhead each time the value of a pointer changes. Still, it is the simplest scheme to get working, and that's why some programming languages, such as Python, still use it.
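A minimal sketch of such a header sitting in front of each malloc-ed block, assuming a C11 compiler (for max_align_t) and a single thread. The names rc_alloc, rc_retain, rc_release and rc_count are hypothetical; they stand for the calls the compiler would emit on allocation, pointer copy and scope exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored immediately before the user's bytes. The union pads it to the
   strictest fundamental alignment so the payload stays suitably aligned. */
typedef union {
    struct {
        size_t size;
        size_t referenceCount;
    } info;
    max_align_t align;
} BlockHeader;

/* Allocate size bytes with a reference count of 1; NULL on failure. */
void *rc_alloc(size_t size)
{
    BlockHeader *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.referenceCount = 1;
    return h + 1;                 /* payload starts right after the header */
}

static BlockHeader *rc_header(void *p)
{
    assert(p != NULL);
    return (BlockHeader *)p - 1;
}

/* Emitted when a pointer is copied. */
void *rc_retain(void *p)
{
    if (p != NULL)
        ++rc_header(p)->info.referenceCount;
    return p;
}

/* Emitted when a pointer is overwritten or leaves scope.
   Returns 1 if the block was actually freed, 0 otherwise. */
int rc_release(void *p)
{
    if (p == NULL)
        return 0;
    BlockHeader *h = rc_header(p);
    if (--h->info.referenceCount > 0)
        return 0;
    free(h);
    return 1;
}

size_t rc_count(void *p)
{
    return rc_header(p)->info.referenceCount;
}
```

With this in place, the int *y = x; example would become int *y = rc_retain(x);, and the block is only freed once both x and y have been released.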

Memory allocation and changing values

I am very new to C so sorry in advance if this is really basic. This is related to homework.
I have several helper functions, and each changes the value of a given variable (binary operations mostly), i.e.:
void helper1(unsigned short *x, arg1, arg2) --> x = &some_new_x
The main function calls other arguments arg3, arg4, arg5. The x is supposed to start at 0 (16-bit 0) at first, then be modified by helper functions, and after all the modifications, should be eventually returned by mainFunction.
Where do I declare the initial x and how/where do I allocate/free memory? If I declare it within mainFunc, it will reset to 0 every time helpers are called. If I free and reallocate memory inside helper functions, I get the "pointer being freed was not allocated" error even though I freed and allocated everything, or so I thought. A global variable doesn't do, either.
I would say that I don't really fully understand memory allocation, so I assume that my problem is with this, but it's entirely possible I just don't understand how to change variable values in C on a more basic level...
The variable x will exist while the block in which it was declared is executed, even during helper execution, and giving a pointer to the helpers allows them to change its value. If I understand your problem right, you shouldn't need dynamic memory allocation. The following code returns 4 from mainFunction:
void plus_one(unsigned short* x)
{
*x = *x + 1;
}
unsigned short mainFunction(void)
{
unsigned short x = 0;
plus_one(&x);
plus_one(&x);
plus_one(&x);
plus_one(&x);
return x;
}
By your description I'd suggest declaring x in your main function as a local variable (allocated from the stack) which you then pass by reference to your helper functions and return it from your main function by value.
int main()
{
int x; //local variable
helper(&x); //passed by reference
return x; //returned by value
}
Inside your helper you can modify the variable by dereferencing it and assigning whatever value needed:
void helper(int * x)
{
*x = ...; //change value of x
}
The alternative is declaring a pointer to x (which gets allocated from the heap) passing it to your helper functions and free-ing it when you have no use for it anymore. But this route requires more careful consideration and is error-prone.
Functions receive a value-wise copy of their inputs to locally scoped variables. Thus a helper function cannot possibly change the value it was called with, only its local copy.
void f(int n)
{
n = 2;
}
int main()
{
int n = 1;
f(n);
return 0;
}
Despite having the same name, n in f is local to the invocation of f. So the n in main never changes.
The way to work around this is to pass by pointer:
void f(int *n)
{
*n = 2;
}
int main()
{
int n = 1;
f(&n);
// now we also see n == 2.
return 0;
}
Note that, again, n in f is local, so if we changed the pointer n in f, it would have no effect on main's perspective. If we wanted to change the address n in main, we'd have to pass the address of the pointer.
void f1(int* nPtr)
{
nPtr = malloc(sizeof(int)); // only changes the local copy; the caller never sees it
*nPtr = 2;
}
void f2(int** nPtr)
{
// since nPtr is a pointer-to-a-pointer,
// we have to dereference it once to
// reach the "pointer-to-int"
// typeof nPtr = (int*)*
// typeof *nPtr = int*
*nPtr = malloc(sizeof(int));
// deref once to get to int*, deref that for int
**nPtr = 2;
}
int main()
{
int *nPtr = NULL;
f1(nPtr); // passes 'NULL' to param 1 of f1.
// after the call, our 'nPtr' is still NULL
f2(&nPtr); // passes the *address* of our nPtr variable
// nPtr here should no-longer be null.
return 0;
}
---- EDIT: Regarding ownership of allocations ----
The ownership of pointers is a messy can of worms; POSIX (and, since C23, the standard C library) has a function strdup which returns a pointer to a copy of a string. It is left to the programmer to understand that the pointer is allocated with malloc and is expected to be released to the memory manager by a call to free.
This approach becomes more onerous as the thing being pointed to becomes more complex. For example, if you get a directory structure, you might be expected to understand that each entry is an allocated pointer that you are responsible for releasing.
dir = getDirectory(dirName);
for (i = 0; i < numEntries; i++) {
printf("%d: %s\n", i, dir[i]->de_name);
free(dir[i]);
}
free(dir);
If this was a file operation you'd be a little surprised if the library didn't provide a close function and made you tear down the file descriptor on your own.
A lot of modern libraries tend to assume responsibility for their resources and provide matching acquire and release functions, e.g. to open and close a MySQL connection:
// allocate a MySQL descriptor and initialize it.
MYSQL* conn = mysql_init(NULL);
DoStuffWithDBConnection(conn);
// release everything.
mysql_close(conn);
LibEvent has, e.g.
bufferevent_new();
to allocate an event buffer and
bufferevent_free();
to release it, even though what it actually does is little more than malloc() and free(), but by having you call these functions, they provide a well-defined and clear API which assumes responsibility for knowing such things.
This is the basis for the concept known as "RAII" in C++.

Variable scope inside while loop

This is perhaps one of the oddest things I've ever encountered. I don't program much in C, but from what I know to be true, plus checking with different sources online, the variables macroName and macroBody are only defined in the scope of the while loop. So every time the loop runs, I'm expecting macroName and macroBody to get new addresses and be completely new variables. However, that is not true.
What I'm finding is that even though the loop is running again, both variables share the same address and this is causing me serious headache for a linked list where I need to check for uniqueness of elements. I don't know why this is. Shouldn't macroName and macroBody get completely new addresses each time the while loop runs?
I know this is the problem because I'm printing the addresses and they are the same.
while(fgets(line, sizeof(line), fp) != NULL) // Get new line
{
char macroName[MAXLINE];
char macroBody[MAXLINE];
// ... more code
switch (command_type)
{
case hake_macro_definition:
// ... more code
printf("**********%p | %p\n", &macroName, &macroBody);
break;
// .... more cases
}
}
Code that is part of my linked-list code.
struct macro {
struct macro *next;
struct macro *previous;
char *name;
char *body;
};
Function that checks if element already exists inside linked-list. But since *name has the same address, I always end up inside the if condition.
static struct macro *macro_lookup(char *name)
{
struct macro *temp = macro_list_head;
while (temp != NULL)
{
if (are_strings_equal(name, temp->name))
{
break;
}
temp = temp->next;
}
return temp;
}
These arrays are allocated on the stack:
char macroName[MAXLINE];
char macroBody[MAXLINE];
The compiler has pre-allocated space for you that exists at the start of your function. In other words, from the computer's viewpoint, the location of these arrays would be the same as if you had defined them outside the loop body at the top of your function body.
The scope in C merely indicates where an identifier is visible. So the compiler (but not the computer) enforces the semantics that macroName and macroBody cannot be referenced before or after the loop body. But from the computer's viewpoint, the actual data for these arrays exists once the function starts and only goes away when the function ends.
If you were to look at the assembly dump of your code, you'd likely see that your machine's stack pointer is decremented by a big enough amount for your function's stack frame to have space for all of your local variables, including these arrays.
What I need to mention in addition to chrisaycock's answer: you should never use pointers to local variables outside the function these variables were defined in. Consider this example:
int * f()
{
int local_var = 0;
return &local_var;
}
int g(int x)
{
return (x > 0) ? x : 0;
}
int main()
{
int * from_f = f(); //
*from_f = 100; //Undefined behavior
g(15); //some function call to change stack
printf("%d", *from_f); //Will print some random value
return 0;
}
The same, actually, applies to a block. Technically, block-local variables can be cleaned out after the block ends, so on each iteration of a loop the old addresses can be invalid. In practice they usually are not, since the C compiler tends to put these variables at the same address for performance reasons, but you cannot rely on it.
What you need to understand is how memory is allocated. If you want to implement a list, it is a structure that grows. Where does the memory come from? You can not allocate much memory from the stack, plus the memory is invalidated once you return from a function. So, you will need to allocate it from the heap (using malloc).
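A minimal sketch of that idea applied to the macro list, assuming a singly linked list for brevity (the previous pointer from the question is left out) and an explicit head parameter instead of the global macro_list_head. copy_string, macro_add and macro_free_all are hypothetical helpers. Each node owns heap copies of its strings, so reusing macroName and macroBody on the next loop iteration no longer affects nodes already in the list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct macro {
    struct macro *next;
    char *name;   /* heap copy owned by the node */
    char *body;   /* heap copy owned by the node */
};

/* Duplicate a string onto the heap; NULL on failure. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Compare by contents, not by address. */
struct macro *macro_lookup(struct macro *head, const char *name)
{
    for (struct macro *m = head; m != NULL; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m;
    return NULL;
}

/* Prepend a new macro unless the name already exists. Returns the new head;
   on a duplicate or an allocation failure the list is returned unchanged. */
struct macro *macro_add(struct macro *head, const char *name, const char *body)
{
    assert(name != NULL && body != NULL);
    if (macro_lookup(head, name) != NULL)
        return head;
    struct macro *m = malloc(sizeof *m);
    if (m == NULL)
        return head;
    m->name = copy_string(name);
    m->body = copy_string(body);
    if (m->name == NULL || m->body == NULL) {
        free(m->name);
        free(m->body);
        free(m);
        return head;
    }
    m->next = head;
    return m;
}

void macro_free_all(struct macro *head)
{
    while (head != NULL) {
        struct macro *next = head->next;
        free(head->name);
        free(head->body);
        free(head);
        head = next;
    }
}
```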

How to return an integer from a function

Which is considered better style?
int set_int (int *source) {
*source = 5;
return 0;
}
int main(){
int x;
set_int (&x);
}
OR
int *set_int (void) {
int *temp = NULL;
temp = malloc(sizeof (int));
*temp = 5;
return temp;
}
int main (void) {
int *x = set_int ();
}
Coming from a higher-level programming background, I gotta say I like the second version more. Any tips would be very helpful. Still learning C.
Neither.
// "best" style for a function which sets an integer taken by pointer
void set_int(int *p) { *p = 5; }
int i;
set_int(&i);
Or:
// then again, minimise indirection
int an_interesting_int() { return 5; /* well, in real life more work */ }
int i = an_interesting_int();
Just because higher-level programming languages do a lot of allocation under the covers, does not mean that your C code will become easier to write/read/debug if you keep adding more unnecessary allocation :-)
If you do actually need an int allocated with malloc, and to use a pointer to that int, then I'd go with the first one (but bugfixed):
void set_int(int *p) { *p = 5; }
int *x = malloc(sizeof(*x));
if (x == 0) { /* do something about the error */ }
set_int(x);
Note that the function set_int is the same either way. It doesn't care where the integer it's setting came from, whether it's on the stack or the heap, who owns it, whether it has existed for a long time or whether it's brand new. So it's flexible. If you then want to also write a function which does two things (allocates something and sets the value) then of course you can, using set_int as a building block, perhaps like this:
int *allocate_and_set_int() {
int *x = malloc(sizeof(*x));
if (x != 0) set_int(x);
return x;
}
In the context of a real app, you can probably think of a better name than allocate_and_set_int...
Some errors:
int main(){
int x*; //should be int* x; or int *x;
set_int(x);
}
Also, you are not allocating any memory in the first code example.
int *x = malloc(sizeof(int));
About the style:
I prefer the first one, because you have less chances of not freeing the memory held by the pointer.
The first one is incorrect (apart from the syntax error) - you're passing an uninitialised pointer to set_int(). The correct call would be:
int main()
{
int x;
set_int(&x);
}
If they're just ints, and it can't fail, then the usual answer would be "neither" - you would usually write that like:
int get_int(void)
{
return 5;
}
int main()
{
int x;
x = get_int();
}
If, however, it's a more complicated aggregate type, then the second version is quite common:
struct somestruct *new_somestruct(int p1, const char *p2)
{
struct somestruct *s = malloc(sizeof *s);
if (s)
{
s->x = 0;
s->j = p1;
s->abc = p2;
}
return s;
}
int main()
{
struct somestruct *foo = new_somestruct(10, "Phil Collins");
free(foo);
return 0;
}
This allows struct somestruct * to be an "opaque pointer", where the complete definition of type struct somestruct isn't known to the calling code. The standard library uses this convention - for example, FILE *.
Definitely go with the first version. Notice that this allowed you to omit a dynamic memory allocation, which is SLOW, and may be a source of bugs, if you forget to later free that memory.
Also, if you decide for some reason to use the second style, notice that you don't need to initialize the pointer to NULL. This value will either way be overwritten by whatever malloc() returns. And if you're out of memory, malloc() will return NULL by itself, without your help :-).
So int *temp = malloc(sizeof(int)); is sufficient.
Memory management rules usually state that the allocator of a memory block should also deallocate it. This is impossible when you return allocated memory. Therefore, the first should be better.
For a more complex type like a struct, you'll usually end up with a function to initialize it and maybe a function to dispose of it. Allocation and deallocation should be done separately, by you.
C gives you the freedom to allocate memory dynamically or statically, and having a function work only with one of the two modes (which would be the case if you had a function that returned dynamically allocated memory) limits you.
typedef struct
{
int x;
float y;
} foo;
void foo_init(foo* object, int x, float y)
{
object->x = x;
object->y = y;
}
int main()
{
foo myFoo;
foo_init(&myFoo, 1, 3.1416);
}
In the second one you would need a pointer to a pointer for it to work, and in the first you are not using the return value, though you should.
I tend to prefer the first one, in C, but that depends on what you are actually doing, as I doubt you are doing something this simple.
Keep your code as simple as you need to get it done, the KISS principle is still valid.
It is best not to return a piece of allocated memory from a function: if somebody does not know how it works, they might not deallocate the memory.
The memory deallocation should be the responsibility of the code allocating the memory.
The first is preferred (assuming the simple syntax bugs are fixed) because it is how you simulate an Out Parameter. However, it's only usable where the caller can arrange for all the space to be allocated to write the value into before the call; when the caller lacks that information, you've got to return a pointer to memory (maybe malloced, maybe from a pool, etc.)
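A minimal sketch of the out-parameter style for a function that can fail, which is where it earns its keep over a plain return value. parse_int is a hypothetical example, not something from the question. The return value carries only the status and the result is written through the pointer only on success.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal int from text.
   Returns 0 on success and stores the result in *out;
   returns -1 on error and leaves *out untouched. */
int parse_int(const char *text, int *out)
{
    assert(text != NULL && out != NULL);
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;                       /* no digits, or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                       /* does not fit in an int */
    *out = (int)v;
    return 0;
}
```

The caller owns the storage (int x; if (parse_int(s, &x) == 0) ...), so no heap allocation is involved.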
What you are asking more generally is how to return values from a function. It's a great question because it's so hard to get right. What you can learn are some rules of thumb that will stop you making horrid code. Then, read good code until you internalize the different patterns.
Here is my advice:
In general any function that returns a new value should do so via its return statement. This applies for structures, obviously, but also arrays, strings, and integers. Since integers are simple types (they fit into one machine word) you can pass them around directly, not with pointers.
Never pass pointers to integers, it's an anti-pattern. Always pass integers by value.
Learn to group functions by type so that you don't have to learn (or explain) every case separately. A good model is a simple OO one: a _new function that creates an opaque struct and returns a pointer to it; a set of functions that take the pointer to that struct and do stuff with it (set properties, do work); a set of functions that return properties of that struct; a destructor that takes a pointer to the struct and frees it. Hey presto, C becomes much nicer like this.
When you do modify arguments (only structs or arrays), stick to conventions, e.g. stdc libraries always copy from right to left; the OO model I explained would always put the structure pointer first.
Avoid modifying more than one argument in one function. Otherwise you get complex interfaces you can't remember and you eventually get wrong.
Return 0 for success, -1 for errors, when the function does something which might go wrong. In some cases you may have to return -1 for errors, 0 or greater for success.
The standard POSIX APIs are a good template but don't use any kind of class pattern.
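The simple OO model from the advice above can be sketched like this. The counter type and its functions are hypothetical and only illustrate the shape: a _new function, work functions taking the struct pointer first, an accessor returning a plain value, and a destructor.

```c
#include <assert.h>
#include <stdlib.h>

/* In a real project only this forward declaration would go in the header;
   the definition would stay in the .c file, keeping the type opaque. */
typedef struct counter counter;

struct counter {
    int value;
    int step;
};

/* Constructor: returns NULL if allocation fails. */
counter *counter_new(int step)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = 0;
        c->step = step;
    }
    return c;
}

/* Work function: structure pointer first; 0 on success, -1 on error. */
int counter_advance(counter *c)
{
    if (c == NULL)
        return -1;
    c->value += c->step;
    return 0;
}

/* Property accessor: a plain int, returned by value. */
int counter_value(const counter *c)
{
    assert(c != NULL);
    return c->value;
}

/* Destructor: accepts NULL, like free. */
void counter_free(counter *c)
{
    free(c);
}
```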
