Global Pointer in C?

I know a pointer is usually assigned upon its declaration, but I am wondering if there's any way to create a global pointer in C. For example, my code below: is it good practice?
static int *number_args = NULL;
void pro_init(int number)
{
    number_args = &number; /* initialize the pointer value -- is this okay? */
}

Avoid globals. They are a bad idea and usually lead to problems.
You are taking the address of a variable on the stack. That memory will get reused somewhere down the line, with unintended results.
If you feel the need (why?) to have a global pointer, then initialise it from the heap.

That is valid. There are many good reasons to have global variables, especially static global variables. But if something doesn't need to be global, it's better to not make it global.
Also keep in mind that if more than one thread accesses that variable, you'll need to protect it somehow, probably with a mutex, or you may have race conditions.
Also, keep in mind that "number" is a stack variable. Arguments to functions and local variables are both allocated on the stack, and cease to exist outside of their scope. So unless "pro_init()" either never returns, or sets the variable back to NULL before it returns, you'll end up with an invalid pointer.
You might use heap memory instead, for example:
number_args = malloc(sizeof(int));
if (number_args == NULL) { /* handle malloc error */ }
*number_args = number;
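Putting that snippet into a complete function might look like the sketch below. It assumes pro_init() is changed to return a status code, and pro_cleanup() is a hypothetical helper added here for illustration; neither is part of the original question.

```c
#include <stdlib.h>

/* File-scope pointer, visible only inside this translation unit. */
static int *number_args = NULL;

/* Copies `number` into heap storage so the pointer stays valid after
   pro_init() returns. Returns 0 on success, -1 if malloc fails.
   (The int return type is an assumption; the question used void.) */
int pro_init(int number)
{
    int *slot = malloc(sizeof *slot);
    if (slot == NULL) {
        return -1;
    }
    *slot = number;
    free(number_args);   /* free(NULL) is a no-op, so repeated calls do not leak */
    number_args = slot;
    return 0;
}

/* Hypothetical cleanup helper: releases the storage and resets the pointer. */
void pro_cleanup(void)
{
    free(number_args);
    number_args = NULL;
}
```

After pro_init(7) succeeds, *number_args reads 7 until pro_cleanup() is called or pro_init() is called again.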

Related

C - Why variables created in a loop have the same memory address?

Just a simple example of my problem:
while(condition){
    int number = 0;
    printf("%p", &number);
}
That variable will always be in the same memory address. Why?
And what's the real difference between declaring it inside or outside the loop then?
Would I need to malloc the variable every iteration to get different addresses?
That variable will always be in the same memory address. Why?
It's not required to, but your code is so simple that it probably will be across all platforms. Specifically, because it's stored on the stack, it's always in the same place relative to your stack pointer. Keep in mind you're not allocating memory here (no new or malloc), you're just naming existing (stack-relative) memory.
And what's the real difference between declaring it inside or outside the loop then?
In this case, scope. The variable doesn't exist outside the braces it is declared in. Also, outside of the braces, another variable can take its place if it fits in memory and the compiler chooses to do this.
Would I need to malloc the variable every iteration to get different addresses?
Yes, but I have yet to see a good use of malloc to allocate space for an int that a simple stack variable or a standard collection wouldn't do better.
That variable will always be in the same memory address. Why?
The compiler decides where the variable should be, given the operating system constraints. It is much more efficient to keep the variable at the same address than to relocate it at every iteration, although relocation could, theoretically, happen.
You can't rely on it being in the same address every time.
And what's the real difference between declaring it inside or outside the loop then?
The difference is the lifetime of the variable. If it is declared within the loop, it only exists inside the loop, and you can't access it after the loop ends.
When execution of the block ends the lifetime of the object ends and it can no longer be accessed.
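The lifetime difference described above can be sketched with two small functions. The function names are made up for illustration; the only thing that differs between them is where number is declared.

```c
/* `number` is declared inside the loop: a fresh object, re-initialised to 0,
   exists for each iteration, so it is 1 every time it is added. */
int sum_declared_inside(int iterations)
{
    int total = 0;
    for (int i = 0; i < iterations; i++) {
        int number = 0;
        number++;
        total += number;
    }
    return total;
}

/* `number` is declared outside the loop: one object lives across all
   iterations and keeps its value, so it counts 1, 2, 3, ... */
int sum_declared_outside(int iterations)
{
    int total = 0;
    int number = 0;
    for (int i = 0; i < iterations; i++) {
        number++;
        total += number;
    }
    return total;
}
```

With 4 iterations the first function returns 4 (1+1+1+1) and the second returns 10 (1+2+3+4).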
Would I need to malloc the variable every iteration to get different addresses?
malloc is an expensive operation, so it does not make much sense to malloc the variable at every iteration. That said, the allocator decides where the memory is placed, and it may or may not be at the same address as before.
Once again you can't rely on the variable location in the previous iteration to assert where it will be on the next one.
There is a difference in where the variables are stored: allocated variables will be on the heap, as opposed to the stack like in the previous case.
It is being put into the same memory address to save memory.
The only real difference between declaring it within and without the loop is that the variable will no longer be within scope outside the loop if it was declared within the loop.
You would have to use malloc to get a different address each time. Also, you would have to leave the frees until after all the mallocs to get this guarantee.
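A minimal sketch of that last point, assuming a hypothetical helper name and an arbitrary upper bound of 16 blocks: all allocations are kept alive until the comparison is done, and only then freed.

```c
#include <stdlib.h>

#define MAX_BLOCKS 16

/* Allocates `count` ints (at most MAX_BLOCKS) without freeing any of them
   in between, then checks that all addresses differ. Returns 1 if they are
   all distinct, 0 if not, and -1 on bad input or allocation failure. */
int heap_addresses_are_distinct(size_t count)
{
    int *blocks[MAX_BLOCKS];
    size_t made = 0;
    int result = 1;

    if (count > MAX_BLOCKS) {
        return -1;
    }
    while (made < count) {
        blocks[made] = malloc(sizeof *blocks[made]);
        if (blocks[made] == NULL) {
            result = -1;
            break;
        }
        made++;
    }
    for (size_t i = 0; result == 1 && i < made; i++) {
        for (size_t j = i + 1; j < made; j++) {
            if (blocks[i] == blocks[j]) {
                result = 0;
            }
        }
    }
    /* Only now release everything. */
    for (size_t i = 0; i < made; i++) {
        free(blocks[i]);
    }
    return result;
}
```

If each block were freed inside the allocation loop instead, the allocator would be free to hand the same address back on the next iteration.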
That variable will always be in the same memory address. Why?
The object that number designates has automatic storage duration and only exists for the lifetime of the loop body, so logically speaking a new instance is created and destroyed on each loop iteration.
Practically speaking, it's easier to just re-use the same memory location for each loop iteration, which is what most (if not all) C compilers do. It's just not guaranteed to retain its last value from one iteration to the next (especially if you initialize it each iteration).
And what's the real difference between declaring it inside or outside the loop then?
The lifetime of the object (the period of program execution where storage is guaranteed to be reserved for it) changes from the body of the loop to the body of the function. The scope of the identifier (the region of program text where the identifier is visible) changes from the body of the loop to the body of the entire function.
Again, practically speaking, most compilers will allocate stack space for auto objects that are in blocks at function entry - for example, given the code
void foo( void )
{
    int bar;
    for ( bar = 0; bar < 10; bar++ )
    {
        int bletch = 2 * bar;
        ...
    }
}
most compilers will generate instructions to reserve stack space for both bar and bletch at function entry, rather than waiting until loop entry to reserve space for bletch. It's just easier to set the stack pointer once and get it over with. Storage is guaranteed to be reserved for bletch over the lifetime of the loop body, but there's nothing in the language definition that says you can't reserve it before then.
However, if you have a situation like this:
void foo( void )
{
    int bar;
    for ( bar = 0; bar < 10; bar++ )
    {
        if ( bar % 2 == 0 ) // bar is even
        {
            int bletch = 2 * bar;
            ...
        }
        else
        {
            int blurga = 3 * bar + 1;
            ...
        }
    }
}
bletch and blurga cannot exist at the same time, so the compiler may only allocate space for one additional int object, and that same space will be used for either bletch or blurga depending on the value of bar.
There are compilers that, despite you declaring the variable in the inner loop, just allocate them at the entry to the function block.
Modern compilers tend to allocate all memory for local variables in a single shot at function entry, so that only represents a single stack pointer move, against several push pop instructions to get the same result.
Despite that, there's another issue you have not considered. The variable in the inner loop is not visible outside the loop, and the memory used by it can be used for a different variable outside. You know that the memory address is always the same, but once you are out of scope you don't know whether the compiler has given the same address to one of the other variables you use for a different purpose. That is perfectly legal: your variable is automatic, so it ceases to exist as soon as you leave the block (the pair of curly brackets you put around the loop body).

Mallocing in a recursive function

I have a function that is called recursively a number of times. Inside this function I malloc memory for a struct and pass it as an argument into the recursive call of this function. I am confused whether I can keep the name of the variable I am mallocing the same. Or is this going to be a problem?
struct Student {
    char *studentName;
    int studentAge;
};
void recursiveFunction(struct Student *student) { // whoever calls this function sends in a malloced struct
    struct Student *structptr = malloc(sizeof(struct Student));
    // <Do some processing>
    if (condition met) {
        return;
    }
    else {
        recursiveFunction(structptr);
    }
}
// All malloced variables are free'd in another function
Would this be a problem, since the name of the variable being malloced doesn't change in each recursive call?
The short answer is no. When you declare a variable it is scoped at the level where it is declared, in your case within this function. Each successive recursive call creates a new scope and allocates that memory within that scope so the name of your variable will not cause problems. However, you do want to be very careful that you free any memory that you malloc() before returning from your function as it will not be accessible outside the scope of your function unless you pass back a pointer to it. This question provides a lot of helpful information on using malloc() within functions. I also recommend reading more about scope here.
Each malloc() must have a matching free(). Either you need to free the record inside recursiveFunction (e.g. immediately before it exits), or in a function called by recursiveFunction or you need to maintain a list of them and free them elsewhere.
The name of the 'variable being malloced' being the same is irrelevant. In any case, it is not the variable that is being malloc()d; rather it is memory that is being malloc()d and the address stored in a variable. Each recursive iteration of recursiveFunction has a different stack frame and thus a different instance of this variable. So all you need to do is ensure that each malloc() is paired with a free() that is passed the address returned by malloc().
If you want to check you've done your malloc() / free() right, run valgrind on the code.
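One way to "maintain a list of them and free them elsewhere" is to chain the allocations together. The sketch below is only an illustration under that assumption: struct Node, build_chain() and free_chain() are hypothetical names, not part of the question's code.

```c
#include <stdlib.h>

/* Hypothetical record type; `next` chains the allocations together. */
struct Node {
    int value;
    struct Node *next;
};

/* Allocates one node per call, `remaining` levels deep. Each call has its
   own `node` variable even though the name is the same. Returns the head of
   the chain, or NULL when remaining is 0 or the first allocation fails. */
struct Node *build_chain(int remaining)
{
    struct Node *node;
    if (remaining <= 0) {
        return NULL;
    }
    node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    node->value = remaining;
    node->next = build_chain(remaining - 1);
    return node;
}

/* Frees every node: one free() per successful malloc(). Returns the count. */
int free_chain(struct Node *head)
{
    int freed = 0;
    while (head != NULL) {
        struct Node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Calling build_chain(3) and later passing the result to free_chain() releases all three allocations, which is the kind of pairing valgrind would confirm.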
Can I keep the name of the variable I am mallocing the same?
Yes, in a recursive function this is fine. As the function is called recursively, each variable holding the malloc'd pointer (it doesn't hold the memory itself) will be allocated on a new stack frame.
However, you're going to have to free that memory somehow. Only the pointer to the memory is on the stack, so only the pointer is freed when the function exits. The malloc'd memory lives on. Either free it at the end of each call to the function, or return all that memory as part of a larger structure and free it later.
I am confused whether I can keep the name of the variable I am mallocing the same.
You seem to be confused about the concept of scope. Functions in C define scopes for the (local) variables you declare within them. That means that when you declare a local variable bar inside some function foo, then when you reference bar inside that function you reference whatever you declared it to be.
int bar = 21;
void foo(void) {
    int bar = 42;
    // ...
    bar; // This is the bar set to 42
}
Now scope is only the theoretical concept. It's implemented using (among other details that I skip over here) so called stack frames:
When you call foo, then a new stack frame is created on the call stack, containing (this is highly dependent on the target architecture) things like return address (i.e. the address of the instruction that will be executed after foo), parameters (i.e. the values that you pass to a function) and, most importantly, space for the local variables (bar).
Accessing the variable bar in foo is done using addresses relative to the current stack frame. So accessing bar could mean access byte 12 relative to the current stack frame.
When a recursive function calls itself, this is handled (mostly, apart from possible optimizations) like any other function call, and thus a new stack frame is created. Accessing the same (named) variable from within different stack frames will (because, as said, the access uses a relative address) thus access different entities.
[Note: I hope this rather rough description helps you. This is a topic that, when discussed in depth, is extremely dependent on actual implementations (compilers), optimizations used, calling convention, operating system, target architecture, ... ]
I put together a simple stupid example, which hopefully shows that what you want to do should be possible, given that you appropriately free whatever you allocated:
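#include <stdlib.h>

unsigned int crazy_factorial(unsigned int const * const n) {
    unsigned int result;
    if (*n == 0) {
        result = 1;
    } else {
        /* each call gets its own nextN, even though the name is the same */
        unsigned int * const nextN = malloc(sizeof(unsigned int));
        if (nextN == NULL) {
            return 0; /* allocation failed; 0 is never a valid factorial */
        }
        *nextN = *n - 1;
        result = *n * crazy_factorial(nextN);
        free(nextN);
    }
    return result;
}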
Running this with some output shows what's going on.
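The claim that each recursive call gets its own instance of a same-named local can be checked with a sketch like the following. The names and the depth limit of 8 are assumptions for illustration. Addresses are stored as uintptr_t so they can still be compared after the calls have returned.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8

/* Address of `local` for each active call, stored as an integer. */
static uintptr_t frame_addresses[MAX_DEPTH];

/* Recurses until `depth` reaches MAX_DEPTH. `local` is used again after the
   inner call returns, so every instance is alive at the same time. */
int record_frames(int depth)
{
    int local = depth;
    int below;
    if (depth >= MAX_DEPTH) {
        return 0;
    }
    frame_addresses[depth] = (uintptr_t)(void *)&local;
    below = record_frames(depth + 1);
    return local + below;
}

/* Returns 1 if every recorded address differs from all the others. */
int frames_are_distinct(void)
{
    for (size_t i = 0; i < MAX_DEPTH; i++) {
        for (size_t j = i + 1; j < MAX_DEPTH; j++) {
            if (frame_addresses[i] == frame_addresses[j]) {
                return 0;
            }
        }
    }
    return 1;
}
```

Because all eight instances of local exist simultaneously, they must occupy different storage, regardless of how a particular compiler lays out its stack frames.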

scope rules in C

I recently read about scope rules in C. It says that a local or auto variable is available only inside the block of the function in which it is declared. Once outside the function it no longer is visible. Also that its lifetime is only till the end of the final closing braces of the function body.
Now here is the problem. What happens when the address of a local variable is returned from the function to the calling function ?
For example :-
int *fun(void);

int main(void)
{
    int *p = fun();
    return 0;
}

int *fun(void)
{
    int localvar = 0;
    return &localvar;
}
once the control returns back from the function fun, the variable localvar is no longer alive. So how will main be able to access the contents at this address ?
The address can be returned, but the value stored at the address cannot reliably be read. Indeed, it is not even clear that you can safely assign it, though the chances are that on most machines there wouldn't be a problem with that.
You can often read the value at the address, but the behaviour is undefined (read 'bad: to be avoided at all costs!'). In particular, the address may be reused for other variables in other functions, so if you access it after calling other functions, you are unlikely to see the last value stored in the variable by the function that returned the pointer to it.
Why then is a function returning a pointer ever required?
One reason is often 'dynamic memory'. The malloc() family of functions return a pointer to new (non-stack) memory.
Another reason is 'found something at this location in a value passed to me'. Consider strchr() or strstr().
Another reason is 'returning pointer to a static object, either hidden in the function or in the file containing the source for the function'. Consider asctime() et al (and worry about thread-safety).
There are probably a few others, but those are probably the most common.
Note that none of these return a pointer to a local (stack-based) variable.
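The three cases listed above can be sketched as follows. All three function names are hypothetical; only malloc(), free() and strchr() are real library calls.

```c
#include <stdlib.h>
#include <string.h>

/* 1. Dynamic memory: the caller owns the result and must free() it. */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* 2. Pointer into storage the caller passed in (the strchr() pattern).
      Returns NULL if there is no comma. */
const char *after_comma(const char *text)
{
    const char *comma = strchr(text, ',');
    return comma != NULL ? comma + 1 : NULL;
}

/* 3. Pointer to a static object: it outlives the call, but every caller
      shares the same storage, so this is not thread-safe. */
int *call_counter(void)
{
    static int calls = 0;
    calls++;
    return &calls;
}
```

In each case the pointed-to storage is still alive after the function returns, which is exactly what a pointer to an automatic local cannot offer.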
The variable is gone, but the memory location still exists and might even still contain the value you set. It will however probably get overwritten pretty fast as more functions are called and the memory address gets reused for another function's local variables. You can learn more by reading about the Call Stack, which is where local variables of functions are stored.
Referencing that location in memory after the function has returned is dangerous. Of course the location still exists (and it may still contain your value), but you no longer have any claim to that memory region and it will likely be overwritten with new data as the program continues and new local variables are allocated on the stack.
gcc gives me the following warning:
t.c: In function ‘test’:
t.c:3:2: warning: function returns address of local variable [enabled by default]
Consider this test program:
#include <stdio.h>

int * test(int p) {
    int loc = p;
    return &loc; /* deliberately wrong: loc dies when test() returns */
}
int main(void) {
    int *c = test(4);
    test(5);
    printf("%d\n", *c);
    return 0;
}
What do you think this prints?

Why can a function return an array setup by malloc but not one setup by "int cat[3] = {0,0,0};"

Why can I return from a function an array setup by malloc:
int *dog = (int*)malloc(n * sizeof(int));
but not an array setup by
int cat[3] = {0,0,0};
The "cat[ ]" array is returned with a Warning.
Thanks all for your help
This is a question of scope.
int cat[3]; // declares a local variable cat
Local variables versus malloc'd memory
Local variables exist on the stack. When this function returns, these local variables will be destroyed. At that point, the addresses used to store your array are recycled, so you cannot guarantee anything about their contents.
If you call malloc, you will be allocating from the heap, so the memory will persist beyond the life of your function.
If the function is supposed to return a pointer (in this case, a pointer-to-int which is the first address of the integer array), that pointer should point to good memory. Malloc is the way to ensure this.
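A minimal sketch of a function that returns malloc'd memory, assuming a hypothetical helper named make_dogs() that fills the array with 0..n-1:

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap array of n ints holding 0..n-1,
   or NULL if n is 0 or malloc fails. The caller must free() the result. */
int *make_dogs(size_t n)
{
    int *dog;
    if (n == 0) {
        return NULL;
    }
    dog = malloc(n * sizeof *dog);
    if (dog == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        dog[i] = (int)i;
    }
    return dog;
}
```

The returned pointer remains valid in the caller until it is passed to free().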
Avoiding Malloc
You do not have to call malloc inside of your function (although it would be normal and appropriate to do so).
Alternatively, you could pass an address into your function which is supposed to hold these values. Your function would do the work of calculating the values and would fill the memory at the given address, and then it would return.
In fact, this is a common pattern. If you do this, however, you will find that you do not need to return the address, since you already know the address outside of the function you are calling. Because of this, it's more common to return a value which indicates the success or failure of the routine, like an int, than it is to return the address of the relevant data.
This way, the caller of the function can know whether or not the data was successfully populated or if an error occurred.
#include <stdio.h> // include stdio for the printf function
int rainCats (int *cats); // pass a pointer-to-int to function rainCats
int main (int argc, char *argv[]) {
    int cats[3]; // cats is the address to the first element
    int success; // declare an int to store the success value
    success = rainCats(cats); // pass the address to the function
    if (success == 0) {
        int i;
        for (i=0; i<3; i++) {
            printf("cat[%d] is %d \r", i, cats[i]);
            getchar();
        }
    }
    return 0;
}
int rainCats (int *cats) {
    int i;
    for (i=0; i<3; i++) { // put a number in each element of the cats array
        cats[i] = i;
    }
    return 0; // return a zero to signify success
}
Why this works
Note that you never did have to call malloc here because cats[3] was declared inside of the main function. The local variables in main will only be destroyed when the program exits. Unless the program is very simple, malloc will be used to create and control the lifespan of a data structure.
Also notice that rainCats is hard-coded to return 0. Nothing happens inside of rainCats which would make it fail, such as attempting to access a file, a network request, or other memory allocations. More complex programs have many reasons for failing, so there is often a good reason for returning a success code.
There are two key parts of memory in a running program: the stack, and the heap. The stack is also referred to as the call stack.
When you make a function call, information about the parameters, where to return, and all the variables defined in the scope of the function are pushed onto the stack. (It used to be the case that C variables could only be defined at the beginning of the function. Mostly because it made life easier for the compiler writers.)
When you return from a function, everything on the stack is popped off and is gone (and soon when you make some more function calls you'll overwrite that memory, so you don't want to be pointing at it!)
Anytime you allocate memory you are allocating it from the heap. That's some other part of memory, maintained by the allocation manager. Once you "reserve" part of it, you are responsible for it, and if you want to stop pointing at it, you're supposed to let the manager know. If you drop the pointer and can't ask to have it released any more, that's a leak.
You're also supposed to only look at the part of memory you said you wanted. Overwriting not just the part you said you wanted, but past (or before) that part of memory is a classic technique for exploits: writing information into part of memory that is holding computer instructions instead of data. Knowledge of how the compiler and the runtime manage things helps experts figure out how to do this. Well designed operating systems prevent them from doing that.
heap:
int *dog = (int*)malloc(n*sizeof(int));
stack:
int cat[3] = {0,0,0};
Because int cat[3] = {0,0,0}; is declaring an automatic variable that only exists while the function is being called.
String literals are a special case: they have static storage duration, so a function can return a pointer to a quoted string. That does not generalize to automatic arrays, including a local char array that is merely initialized from a string literal.
cat[] is allocated on the stack of the function you are calling, when that stack is freed that memory is freed (when the function returns the stack should be considered freed).
If what you want to do is populate an array of ints in the calling frame, pass in a pointer to an array that you control from the calling frame:
void findMyCats(int *cats);

void somefunction() {
    int cats[3];
    findMyCats(cats);
}
void findMyCats(int *cats) {
    cats[0] = 0;
    cats[1] = 0;
    cats[2] = 0;
}
of course this is contrived and I've hardcoded that the array length is 3 but this is what you have to do to get data from an invoked function.
A single value works because it's copied back to the calling frame;
int findACat() {
    int cat = 3;
    return cat;
}
In findACat, the value 3 is copied from findACat to the calling frame; since it is a known quantity, the compiler can do that for you. The data a pointer points to can't be copied because the compiler does not know how much to copy.
When you define a variable like 'cat' the compiler assigns it an address. The association between the name and the address is only valid within the scope of the definition. In the case of auto variables that scope is the function body from the point of definition onwards.
Auto variables are allocated on the stack. The same address on the stack is associated with different variables at different times. When you return an array, what is actually returned is the address of the first element of the array. Unfortunately, after the return, the compiler can and will reuse that storage for completely unrelated purposes. What you'd see at a source code level would be your returned variable mysteriously changing for no apparent reason.
Now, if you really must return an initialized array, you can declare that array as static. A static variable has a permanent rather than a temporary storage allocation. You'll need to keep in mind that the same memory will be used by successive calls to the function, so the results from the previous call may need to be copied somewhere else before making the next call.
Another approach is to pass the array in as an argument and write into it in your function. The calling function then owns the variable, and the issues with stack variables don't arise.
None of this will make much sense unless you carefully study how the stack works. Good luck.
You cannot return an array. You are returning a pointer. This is not the same thing.
You can return a pointer to the memory allocated by malloc() because malloc() has allocated the memory and reserved it for use by your program until you explicitly use free() to deallocate it.
You may not return a pointer to the memory allocated by a local array because as soon as the function ends, the local array no longer exists.
This is a question of object lifetime, not scope or stack or heap. While those terms are related to the lifetime of an object, they aren't equivalent to lifetime, and it's the lifetime of the object that you're returning that's important. For example, a dynamically allocated object has a lifetime that extends from allocation to deallocation. A local variable's lifetime might end when the scope of the variable ends, but if it's static its lifetime won't end there.
The lifetime of an object that has been allocated with malloc() is until that object has been freed using the free() function. Therefore when you create an object using malloc(), you can legitimately return the pointer to that object as long as you haven't freed it - it will still be alive when the function ends. In fact you should take care to do something with the pointer so it gets remembered somewhere or it will result in a leak.
The lifetime of an automatic variable ends when the scope of the variable ends (so scope is related to lifetime). Therefore, it doesn't make sense to return a pointer to such an object from a function - the pointer will be invalid as soon as the function returns.
Now, if your local variable is static instead of automatic, then its lifetime extends beyond the scope that it's in (therefore scope is not equivalent to lifetime). So if a function has a local static variable, the object will still be alive even when the function has returned, and it would be legitimate to return a pointer to a static array from your function. Though that brings in a whole new set of problems because there's only one instance of that object, so returning it multiple times from the function can cause problems with sharing the data (it basically only works if the data doesn't change after initialization or there are clear rules for when it can and cannot change).
Another example taken from another answer here is regarding string literals - pointers to them can be returned from a function not because of a scoping rule, but because of a rule that says that string literals have a lifetime that extends until the program ends.
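The static and string-literal cases can be sketched like this. Both function names are made up for illustration.

```c
/* The array has static storage duration, so the pointer stays valid after
   the function returns. Every call returns the same storage. */
int *static_cats(void)
{
    static int cat[3] = {0, 0, 0};
    return cat;
}

/* A string literal lives until the program ends, so returning a pointer
   to it is fine. */
const char *cat_name(void)
{
    return "cat";
}
```

Note the sharing problem mentioned above: a value written through the pointer from one call to static_cats() is visible through the pointer returned by the next call.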

Is there any point in declaring pointers for variables that are on the stack?

void my_cool_function()
{
    obj_scene_data scene;
    obj_scene_data *scene_ptr = &scene;
    parse_obj_scene(scene_ptr, "test.txt");
}
Why would I ever create a pointer to a local variable as above if I can just do
void my_cool_function()
{
    obj_scene_data scene;
    parse_obj_scene(&scene, "test.txt");
}
Just in case it's relevant:
int parse_obj_scene(obj_scene_data *data_out, char *filename);
In the specific code you linked, there isn't really a reason.
It could be functionally necessary if you have a function taking an obj_scene_data **. You can't do &&scene, so you'd have to create a local variable before passing the address on.
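A small sketch of the pointer-to-pointer case, using plain int instead of obj_scene_data. The advance() callee is hypothetical and stands in for any function that takes a T ** parameter.

```c
/* Hypothetical callee that needs a pointer to a pointer so it can move
   the caller's cursor forward by one element. */
static void advance(const int **cursor)
{
    (*cursor)++;
}

int second_value(void)
{
    int values[3] = {10, 20, 30};
    const int *cursor = values;  /* a named pointer, so &cursor is valid */
    advance(&cursor);
    return *cursor;
}
```

Without the named cursor variable there would be no pointer object whose address could be passed to advance().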
Yes absolutely you can do this for many reasons.
For example if you want to iterate over the members of a stack allocated array via a pointer.
Or in other cases if you want to point sometimes to one memory address and other times to another memory address. You can setup a pointer to point to one or the other via an if statement and then later use your common code all within the same scope.
Typically in these cases your pointer variable goes out of scope at the same time as your stack allocated memory goes out of scope. There is no harm if you use your pointer within the same scope.
In your exact example there is no good reason to do it.
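Both uses mentioned in this answer can be sketched briefly. The function names and values are assumptions chosen for illustration.

```c
/* Iterates over a stack-allocated array through a pointer. */
int sum_local_array(void)
{
    int values[4] = {1, 2, 3, 4};
    int total = 0;
    for (const int *p = values; p < values + 4; p++) {
        total += *p;
    }
    return total;
}

/* Points at one of two locals, then runs common code through the pointer. */
int bump_selected(int use_first)
{
    int first = 10;
    int second = 20;
    int *target = use_first ? &first : &second;
    *target += 1;   /* common code, independent of which local was chosen */
    return *target;
}
```

In both functions the pointer never outlives the local it points to, so there is no lifetime problem.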
If the function accepts a NULL pointer as input, and you want to decide whether to pass NULL based on some condition, then a pointer to a stack variable is useful to avoid having to call the same function in separate code paths, especially if the rest of the parameters are the same otherwise. For example, instead of this:
void my_function()
{
    obj_data obj = {0};
    if( some condition )
        other_function(&obj, "test.txt");
    else
        other_function(NULL, "test.txt");
}
You could do this:
void my_function()
{
    obj_data obj = {0};
    obj_data *obj_ptr = (condition is true) ? &obj : NULL;
    other_function(obj_ptr, "test.txt");
}
If parse_obj_scene() is a function there may be no good reason to create a separate pointer. But if for some unholy reason it is a macro it may be necessary to reassign the value to the pointer to iterate over the subject data.
Not in terms of semantics, and in fact there is a more general point that you can replace all local variables with function calls with no change in semantics, and given suitable compiler optimisations, equal efficiency. (see section 2.3 of "Lambda: The Ultimate Imperative".)
But the point of writing code is to communicate with the next person to maintain it, and in an imperative language without tail call optimisation, it is usual to use local variables for things which are iterated over, for automatic structures, and to simplify expressions. So if it makes the code more readable, then use it.
