Is the value of an uninitialized variable always 'garbage'? - c

Are there cases where it's ok to use a variable when it has not been initialized, or is it always assumed to be garbage? For example, one case would be:
// global example?
// static example?
// extern example?
// etc.
{
int n=1, max_n;
printf("%d %d\n", n, max_n);
}
In this case max_n has a garbage/undefined value. But are there ever cases where the value is known and can be used, such as doing something like bool item being auto-initialized to 0/false, or is that never the case in C?

By definition, if something is not initialized, it does not have a defined value. If it does have a defined value, it is initialized.
From cppreference:
The value in an uninitialized variable can be anything – it is
unpredictable, and may be different every time the program is run.
Reading the value of an uninitialized variable is undefined behaviour
– which is always a bad idea. It has to be initialized with a value
before you can use it.
Also from cppreference about implicit initialization:
If an initializer is not provided:
objects with automatic storage duration are initialized to indeterminate values (which may be trap representations)
objects with static and thread-local storage duration are zero-initialized
So for example, int a; will be zero-initialized if declared at global (file) scope, as a static variable, as a thread-local variable, etc. In other cases, such as an ordinary local variable, it will be uninitialized.
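As a minimal sketch of the rule quoted above, the following shows objects with static storage duration starting at zero without an explicit initializer. The names (global_counter, next_call_index, statics_start_at_zero and so on) are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* File-scope objects have static storage duration, so they are
 * zero-initialized even though no initializer is written. */
int global_counter;
static double global_ratio;
static bool global_flag;
static int *global_ptr;

/* A static local is also zero-initialized, once, before the program starts. */
int next_call_index(void)
{
    static int calls;   /* starts at 0 without an initializer */
    return calls++;
}

/* Returns true while every static object above still holds its zero value. */
bool statics_start_at_zero(void)
{
    return global_counter == 0
        && global_ratio == 0.0
        && !global_flag
        && global_ptr == NULL;
}
```

An ordinary (automatic) local declared the same way would not get this treatment, which is why the sketch deliberately never reads one.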

In C, reading an uninitialized variable results in undefined behavior. It may be 0, but it may also be any random value that happens to be in that memory address. To make a long story short - don't rely on the values of uninitialized variables.

Not sure if you would consider this a "case", but in C, local variables are allocated on the stack, and are uninitialized.
However, global and static variables are allocated in one of the program data sections, so they are zeroed by the OS at load time.
See this question for more information on that including standard reference:
Why are global and static variables initialized to their default values?

Global and static objects are initialized with the default value for their type (0 for integers, 0.0 for floats, NULL for pointers etc.)
No other uninitialized value can be relied upon.
Here is a quick explanation why:
Most implementations have all local variables on a stack. This stack grows when you declare local variables or when you call a function, and shrinks when you do the reverse action - exit the scope of a variable or return from a function.
Now let's see a program:
void good()
{
int p = 2;
printf("%d", p);
}
void notgood()
{
int p;
printf("%d", p);
}
int main()
{
good();
notgood();
notgood();
}
Here is the stack at the beginning of the program (stack grows downwards). The stack pointer always points to the top element (represented by an arrow):
|---------------------|
|main's return address| <-- stack pointer
Next is immediately after good() gets called:
|---------------------|
|main's return address|
|good's return address| <-- stack pointer
Next we declare p and initialize it with 2:
|---------------------|
|main's return address|
|good's return address|
|value 2 (variable p) | <-- stack pointer
After that, we have the call to printf:
|---------------------|
|main's return address|
|good's return address|
|value 2 (variable p) |
|value 2 (parameter) |
|format string address|
|printf's return addr |
|#printf's frame# | <-- stack pointer
When printf returns, the return address and parameters are popped off the stack, but they are not erased from memory. That would be inefficient. We simply move the stack pointer back.
|---------------------|
|main's return address|
|good's return address|
|value 2 (variable p) | <-- stack pointer
|value 2 (parameter) |
|format string address|
|printf's return addr |
|#printf's frame# |
Next, our good() function returns to main:
|---------------------|
|main's return address| <-- stack pointer
|good's return address|
|value 2 (variable p) |
|value 2 (parameter) |
|format string address|
|printf's return addr |
|#printf's frame# |
Call notgood. Whatever trash is on the stack gets overwritten:
|---------------------|
|main's return address|
|notgood's return addr| <-- stack pointer
|value 2 (variable p) |
|value 2 (parameter) |
|format string address|
|printf's return addr |
|#printf's frame# |
Declare the variable (allocate the space), but we don't initialize. Hence, the old garbage value is still there:
|---------------------|
|main's return address|
|notgood's return addr|
|value 2 (variable p) | <-- stack pointer
|value 2 (parameter) |
|format string address|
|printf's return addr |
|#printf's frame# |
Next, we call printf again. Please note that its return address actually changes, so the old trash is overwritten on the stack:
|---------------------|
|main's return address|
|notgood's return addr|
|value 2 (variable p) |
|value 2 (parameter) |
|format string address|
|printf's return addr |
|#printf's frame# | <-- stack pointer
So, as you can see, if you don't initialize variables, they take the value of whatever there was on the stack.
Be aware that the program may not work, as the compiler could optimize some function calls out.

The answer to your question is NO. It depends on how the variable was declared. If it is a global or static variable, it will be initialized to 0. In all other cases there is nothing you can be sure about.

The result of any operation that uses an uninitialized variable with automatic storage duration is undefined. What such a variable holds initially is indeterminate.
C (2007 draft) Standard (6.7.8 p10):
If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate.
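One related case where the value of an automatic object is known: when an explicit initializer covers only part of an array or struct, C initializes the remaining elements or members the same way as objects with static storage duration, i.e. to zero. A small sketch, with a hypothetical struct item and function names:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */
struct item {
    int id;
    bool active;
    const char *name;
};

/* Only values[0] has an explicit initializer; the rest become 0. */
bool partial_array_is_zero_filled(void)
{
    int values[4] = { 7 };
    return values[0] == 7 && values[1] == 0
        && values[2] == 0 && values[3] == 0;
}

/* { 0 } initializes the first member explicitly; the others are
 * implicitly initialized to false and a null pointer. */
bool zero_initialized_struct(void)
{
    struct item it = { 0 };
    return it.id == 0 && !it.active && it.name == NULL;
}
```

This only applies when some initializer is present. A declaration such as int values[4]; with no initializer at all still leaves every element indeterminate.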

Related

Stack Segment C Arrays

I came across an example in a page outlining the various ways to represent a string in C structures. It explains that an array defined in a function outside main will be stored in the stack segment and as such will not necessarily be present following its return potentially causing a runtime error.
The highlighted possible duplicate explained why the array failed on return, i.e. the returned pointer to element 0 is no longer valid, but it did not show that the reason other variables of the same storage class (auto) are returned successfully is that they pass a value, which survives the removal of the stack frame.
" the below program may print some garbage data as string is stored in stack frame of function getString() and data may not be there after getString() returns. "
char *getString()
{
char str[] = "GfG"; /* Stored in stack segment */
/* Problem: string may not be present after getString() returns */
return str;
}
int main()
{
printf("%s", getString());
getchar();
return 0;
}
I understand that other local C variables will also be defined in their respective stack frames and obviously they can be returned so why is it an issue for arrays?
Thanks
This should roughly explain what happened: after the return from getString(), its stack frame is not valid anymore.
^ ^
| not valid | ^ ^
+------------+ | not valid |
str--> | "GfG" | | not valid | <---+
| --- | | not valid | |
| stack of | +------------+ |
| getString | | return(str)| ----+
+------------+ | --- |
| | | |
| stack of | | stack of |
| main() | | main() |
+------------+ +------------+
If compiled with gcc -W -Wall (should always use those options), it should give the warning:
warning: function returns address of local variable [-Wreturn-local-addr]
The difference is in returning a value as opposed to returning a pointer.
When you do this:
int f()
{
int x = 9;
return x;
}
int main()
{
int a = f();
printf("a=%d\n", a);
return 0;
}
This is valid because even though x is out of scope when f returns, it is the value stored in x (9 in this case) that is returned. That value is then assigned to a and subsequently printed.
In your example you're returning an array. In most contexts, an array used in an expression decays into a pointer to its first element. So return str is the same as return &str[0]. That pointer value is returned and passed to printf. Then printf tries to dereference that pointer, but the memory it points to (the array str) is no longer valid.
So you can return values from a function, but if that value is a pointer to a local variable it will not be valid.
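To illustrate the value-versus-pointer distinction, here is a hedged sketch in which the array is wrapped in a struct so that it can be returned by value. The type short_string and the function get_string_by_value are hypothetical names, not part of the original program.

```c
#include <string.h>

/* Hypothetical wrapper: an array that is a struct member is copied
 * along with the struct when the struct is returned by value. */
struct short_string {
    char text[8];
};

struct short_string get_string_by_value(void)
{
    struct short_string s = { "" };  /* all bytes of text start at 0 */
    strcpy(s.text, "GfG");           /* fits: 3 characters plus '\0' */
    return s;                        /* the whole struct is copied out */
}
```

The caller would store the result in its own struct short_string variable and read text from that copy, so nothing refers to the callee's stack frame after the return.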
There are 2 situations:
When you return a simple value like an int or char from a function, and the variable is defined in that function, the return succeeds because the value itself is copied on return.
Here you have the string "GfG" in str, and when you do return str, what gets copied is the address of the array, not its contents. So in this case the array location (pointer) is copied, while the contents of the array vanish (since the contents were in the local stack frame).
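For comparison, these are three common ways to hand a string back to the caller without returning the address of a local array. This is a sketch under stated assumptions: the function names are hypothetical, and which option fits depends on who should own the memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: a string literal has static storage duration, so the
 * returned pointer stays valid. The caller must not modify it. */
const char *get_string_literal(void)
{
    return "GfG";
}

/* Option 2: the caller supplies the buffer.
 * Returns 0 on success, -1 if the buffer is too small. */
int get_string_into(char *buf, size_t size)
{
    int needed = snprintf(buf, size, "%s", "GfG");
    return (needed >= 0 && (size_t)needed < size) ? 0 : -1;
}

/* Option 3: heap allocation. Returns NULL on failure;
 * the caller must free() the result. */
char *get_string_heap(void)
{
    const char src[] = "GfG";
    char *copy = malloc(sizeof src);
    if (copy != NULL)
        memcpy(copy, src, sizeof src);   /* includes the terminator */
    return copy;
}
```

In each case the storage outlives the function call, which is exactly what the local array str in getString() does not do.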

Why does free work like this?

Given the following code:
typedef struct Tokens {
char **data;
size_t count;
} Tokens;
void freeTokens(Tokens *tokens) {
int d;
for(d = 0;d < tokens->count;d++)
free(tokens->data[d]);
free(tokens->data);
free(tokens);
tokens = NULL;
}
Why do I need that extra:
free(tokens->data);
Shouldn't that be handled in the for loop?
I've tested both against valgrind/drmemory and indeed the top loop correctly deallocates all dynamic memory, however if I remove the identified line I leak memory.
How come?
Let's look at a diagram of the memory you're using in the program:
+---------+ +---------+---------+---------+-----+
| data | --> | char * | char * | char * | ... |
+---------+ +---------+---------+---------+-----+
| count | | | |
+---------+ v v v
+---+ +---+ +---+
| a | | b | | c |
+---+ +---+ +---+
|...| |...| |...|
+---+ +---+ +---+
In C, we can dynamically allocate space for a group (more simply, an array) of elements. However, we can't use an array type to reference that dynamic allocation, and instead use a pointer type. In this case, the pointer just points to the first element of the dynamically allocated array. If you add 1 to the pointer, you'll get a pointer to the second element of the dynamically allocated array, add 2 to get a pointer to the third element, and so on.
In C, the bracket syntax (data[1]) is shorthand for addition and dereferencing to a pointer. So pointers in C can be used like arrays in this way.
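The equivalence between the bracket syntax and pointer arithmetic can be checked with a small sketch like the following (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if p[i] and *(p + i) refer to the same object,
 * with the same value, for every index i below n. */
bool index_matches_pointer_arithmetic(const int *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (&p[i] != p + i || p[i] != *(p + i))
            return false;
    }
    return true;
}
```

The same holds for tokens->data: tokens->data[d] is just *(tokens->data + d), the d-th char * in the dynamically allocated array.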
In the diagram, data points to the first char * in the dynamically allocated array, which is elsewhere in memory.
Each member of the array pointed to by data is a string, itself dynamically allocated (since the elements are char *s).
So, the loop deallocates the strings ('a...', 'b...', 'c...', etc), free(tokens->data) deallocates the array data points to, and finally, free(tokens) frees the entire struct.
data is a pointer to a pointer. This means data points to a dynamically allocated array of pointers, each of which points to the actual data. The for loop frees each of the pointers IN the array, but you still need to free the original pointer TO that array, whose elements you have already freed. That's the reason for the line you pointed out.
As a general rule of thumb, every malloc() should have a corresponding call to free(). If you look at the code which allocates the memory in this program, you will very likely see a very strict correspondence with the code you posted here that frees the memory.
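The allocation code was not posted, so the following is only a guess at what it might look like. tokens_create is a hypothetical constructor, written to make the one-to-one correspondence between each malloc() and each free() visible.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Tokens {
    char **data;
    size_t count;
} Tokens;

/* Releases everything tokens_create allocated; accepts NULL like free(). */
void tokens_free(Tokens *tokens)
{
    if (tokens == NULL)
        return;
    if (tokens->data != NULL) {
        for (size_t i = 0; i < tokens->count; i++)
            free(tokens->data[i]);      /* matches allocation 3 */
    }
    free(tokens->data);                 /* matches allocation 2 */
    free(tokens);                       /* matches allocation 1 */
}

/* Hypothetical constructor: copies `count` strings.
 * Returns NULL if any allocation fails. */
Tokens *tokens_create(const char *const *words, size_t count)
{
    Tokens *tokens = malloc(sizeof *tokens);            /* allocation 1 */
    if (tokens == NULL)
        return NULL;
    tokens->count = 0;
    tokens->data = malloc((count ? count : 1) * sizeof *tokens->data);
                                                        /* allocation 2 */
    if (tokens->data == NULL) {
        free(tokens);
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(words[i]) + 1;
        tokens->data[i] = malloc(len);                  /* allocation 3 */
        if (tokens->data[i] == NULL) {
            tokens_free(tokens);    /* frees the i strings made so far */
            return NULL;
        }
        memcpy(tokens->data[i], words[i], len);
        tokens->count = i + 1;
    }
    return tokens;
}
```

Three kinds of allocation (the struct, the pointer array, each string) mean three kinds of free. Dropping free(tokens->data) leaves allocation 2 unmatched, which is the leak the memory checker reported.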

Reinitializing Pointers for C Language

I'm currently learning C Programming through Dan Gookin's book Beginning C Programming for Dummies.
One of the topics I'm currently reading about is the claim that arrays are in fact pointers. Dan attempted to prove that with the following code:
#include <stdio.h>
int main()
{
int numbers[10];
int x;
int *pn;
pn = numbers; /* initialize pointer */
/* Fill array */
for(x=0;x<10;x++)
{
*pn=x+1;
pn++;
}
pn = numbers;
/* Display array */
for(x=0;x<10;x++)
{
printf("numbers[%d] = %d, address %p\n",
x+1,*pn,pn);
pn++;
}
return(0);
}
My question is really about line 17 (the second pn = numbers;). I realized that if I do not reinitialize the pointer as in line 17, the values of pointer pn displayed in the second for loop are a bunch of garbage that does not make sense. Therefore, I would like to know why the pointer pn needs to be reinitialized for the code to work as intended?
An array is not a pointer, but C allows you to assign an array to a pointer to its element type, with the effect that the pointer will point to the first item in the array. That's what pn = numbers does.
pn is a pointer to an int, not to an array. It points to a single integer. When you increment the pointer, it just shifts to the next memory location. The shift it makes is the size of the type of the pointer, so int in this case.
So what does this prove? Not that an array is a pointer, but only that an array is a continuous block of memory that consists of N times the size of the type of your array item.
If you run the second loop without resetting the pointer, it walks through memory that doesn't belong to the array anymore, and so you get 'garbage', which is just the information that happens to exist at that location.
If you want to iterate over the array again by incrementing a pointer, you will have to reinitialize that pointer to the first item. The for loop does only do one thing, which is counting to 10. It doesn't know about the array and it doesn't know about the pointer, so the loop isn't going to automatically reset the pointer for you.
Since pn is incremented in the first loop, after the first loop is finished, pn will point to an address beyond the numbers array. Therefore, you must initialize pn to the beginning of the array before the second loop since you use the same pointer for printing the contents.
Because you have changed the address contained in pn in the statement pn++ in the following code snippet.
for(x=0;x<10;x++)
{
*pn=x+1;
pn++;
}
The pn pointer is being used to point into the numbers array.
The first for-loop uses pn to set the values, stepping pn through the data element by element. After the end of the loop, pn points off the end of numbers (at a non-allocated 11th element).
For the second for-loop to work, i.e. to use pn to loop through numbers again by stepping through the array, pn needs to be moved to the front of the numbers array, otherwise you'll access memory that you shouldn't be looking at (non-allocated memory).
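A compact sketch of the same two-pass pattern, with the rewind made explicit. The function fill_then_sum is a hypothetical stand-in for the book's program, and it sums the elements instead of printing them.

```c
#include <stddef.h>

/* Fills numbers[0..count-1] with 1..count through a moving pointer,
 * then rewinds the pointer and sums the elements in a second pass. */
int fill_then_sum(int *numbers, size_t count)
{
    int *pn = numbers;                  /* points at numbers[0] */
    for (size_t x = 0; x < count; x++) {
        *pn = (int)x + 1;
        pn++;
    }
    /* pn now points one past the last element: it may be compared
     * with other pointers into the array, but not dereferenced. */

    pn = numbers;                       /* rewind before the second pass */
    int sum = 0;
    for (size_t x = 0; x < count; x++) {
        sum += *pn;
        pn++;
    }
    return sum;
}
```

Without the pn = numbers; rewind, the second loop would start reading at the one-past-the-end position. An alternative that avoids the issue is to use a fresh pointer (or plain indexing) in each loop.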
First arrays are not pointers. They decay to pointers when used in function calls and can be used (almost) the same.
Some subtle differences
int a[5]; /* array */
int *pa = a; /* pointer */
pa[0] = 5;
printf("%d\n", a[0]); /* ok it is the same here */
printf("address of array %p - address of pointer %p, value of pointer %p\n",
(void *)&a, (void *)&pa, (void *)pa); /* &a is the same address as pa, not &pa */
printf("size of array %zu - size of pointer %zu\n", sizeof(a), sizeof(pa));
sizeof(a) is here 5 * sizeof(int) whereas sizeof(pa) is the size of a pointer.
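The sizeof difference can be made concrete with a short sketch. ARRAY_LEN, local_array_len and pointer_size are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Element count of a true array. This gives a meaningless result if
 * `arr` is actually a pointer, because sizeof then measures the pointer. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Here `a` is an array, so sizeof(a) is 5 * sizeof(int). */
size_t local_array_len(void)
{
    int a[5];
    return ARRAY_LEN(a);
}

/* Here `pa` is a pointer, so sizeof reports the size of the pointer
 * itself, no matter how large the array it points into is. */
size_t pointer_size(const int *pa)
{
    return sizeof pa;
}
```

This is why a function that receives an array through a pointer parameter needs the length passed separately.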
Now for your question:
After the first loop, pn points to numbers[10] (one past the last element) and no longer to numbers[0]. That's the reason why you must reset it.
Just to drive the point home, arrays are not pointers. When you declare numbers as int numbers[10], you get the following in memory:
+---+
numbers: | | numbers[0]
+---+
| | numbers[1]
+---+
...
+---+
| | numbers[9]
+---+
There's no storage set aside for a separate pointer to the first element of numbers. What happens is that when the expression numbers appears anywhere, and it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to int", and the value of the expression is the address of the first element of the array.
What you're doing with pn is setting it to point to the first element of numbers, and then "walking" through the array:
+---+
numbers: | | <------+
+---+ |
| | |
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
The expression pn++ advances pn to point to the next integer object, which in this case is the next element of the array:
+---+
numbers: | |
+---+
| | <------+
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
Each pn++ advances the pointer until, at the end of the first loop, you have the following:
+---+
numbers: | |
+---+
| |
+---+
...
+---+
| |
+---+
... <------+
|
+---+ |
pn: | | -------+
+---+
At this point, pn is pointing to the object immediately following the end of the array. This is why you have to reset pn before the next loop; otherwise you're walking through the memory immediately following numbers, which can contain pretty much anything, including trap representations (i.e., bit patterns that don't correspond to a legal value for the given type).
Reading or writing through a pointer that points past the end of an array (and even just forming a pointer more than one past the end) invokes undefined behavior, which can mean anything from your code crashing outright to displaying garbage to working as expected.
While the array is being filled, the pointer pn is incremented as each value is stored. The same pointer variable is then used to print the array contents, which is why it is reinitialized first.

Pointer initialisation gives segmentation fault

I wrote a C program as follows:
CASE 1
int *a; /* pointer variable declaration */
int b; /* actual variable declaration */
*a=11;
a=&b;/* store address of b in pointer variable*/
It gives a segmentation fault when running the program.
I changed the code as follows:
CASE 2
int *a; /* pointer variable declaration */
int b; /* actual variable declaration */
a=&b;/* store address of b in pointer variable*/
*a=11;
Now it's working fine.
If anyone knows please explain why it is giving a segmentation fault in CASE 1.
CASE 1
int *a; /* pointer variable declaration */
int b; /* actual variable declaration */
*a=11; /* Not valid: a is uninitialized, so it points to an unknown address that you do not own; writing there may segfault */
a=&b;/* store address of b in pointer variable*/
This results in a segmentation fault because the address stored in a is not a valid address, and storing 11 there is illegal.
b
+-------+ +--------+
| + | 11 |
|Unknown| | |
+---+---+ +---+----+
| |
| |
+ +
a a
CASE 2
int *a; /* pointer variable declaration */
int b; /* actual variable declaration */
a=&b;/* store address of b in pointer variable*/
*a=11;
Now it's working fine because the address of b is valid, and storing 11 there is legal.
Also, neither case above is a good way to declare a pointer. Prefer
int *a = NULL;
a = malloc(sizeof(int));
*a=5;
free(a);//must
or
int *a = NULL;
int b;
a = &b;
*a=5;
Initializing pointers like this avoids many segmentation faults that are otherwise hard to track down.
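Initializing a pointer to NULL does not make dereferencing it safe, but it does make the "points nowhere yet" state something code can test for. A minimal sketch, with a hypothetical helper store_if_valid:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stores `value` through `target` only if `target` is not NULL.
 * Returns true if the store happened. */
bool store_if_valid(int *target, int value)
{
    if (target == NULL)
        return false;
    *target = value;
    return true;
}
```

Note that this check cannot detect an uninitialized (garbage) pointer. It only helps when pointers are consistently set to NULL until they are given a valid address.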
int *a; // a pointer variable that can hold the memory address of an integer value.
In case 1,
*a = 10; // here you have assigned 10 to an unknown memory address
It shows a segmentation fault because you are assigning a value to a memory address that was never set. This is undefined behaviour.
In case 2,
a=&b; // assigning a proper memory address to a.
*a=11;// assigning value to that address
Consider this example:
#include<stdio.h>
int main()
{
int *a,b=10;
printf("\n%d",b);
a=&b;
*a=100;
printf("-->%d",b);
}
Output: 10-->100
Here this is how it works.
b // name
----------
+ 10 + // value
----------
4000 // address
Assuming the memory location of b is 4000.
a=&b => a=4000;
*a=100 => *(4000)=100 => valueat(4000) => 100
After manipulation it looks like this.
b // name
----------
+ 100 + // value
----------
4000 // address
In one line: in the first code you are dereferencing an uninitialized pointer, which is undefined behaviour, and in the second code you are dereferencing an initialized pointer, which gives access to the value at the address.
A bit of explanation:
First you need to realize that a pointer essentially holds a number (an address), and with *var we tell the compiler that we will use the content of the variable var as an address to fetch the value at that address. Similarly, with **var we tell the compiler that we will first use the stored value of var as an address to fetch a value, and then use this fetched value as an address again and fetch the value stored there.
Therefore in your first declaration it is:
+----------+ +----------+
| garbage | | garbage |
+----------+ +----------+
| a | | b |
+----------+ +----------+
| addr1 | | addr2 |
+----------+ +----------+
Then you try to use the value stored in a as an address. a contains garbage, it can be any value, but you do not have access to any address location. Therefore the next moment when you do *a it will use the stored value in a as an address. Because the stored value can be anything, anything can happen.
If you have permission to access the location , the code will continue to execute without a segmentation fault. If the address happens to be an address from the heap book-keeping structure, or other memory area which your code allocated from heap or stack then when you do *a = 10 it will simply wipe off the existing value with 10 in that location. This can lead to undefined behaviour as now you have changed something without the knowledge of the context having the actual authority of the memory. If you don't have permission to the memory, you simply get a segmentation fault. This is called dereferencing of an uninitialized pointer.
Next statement you do a = &b which just assigns the address of b in a. This doesn't help, because the previous line has dereferenced an uninitialized pointer.
Next code you have something like this after the third statement:
+----------+ +----------+
| addr2 |---+ | garbage |
+----------+ | +----------+
| a | +--> | b |
+----------+ +----------+
| addr1 | | addr2 |
+----------+ +----------+
The third statement assigns the address of b to a. Before that, a is not dereferenced, therefore the garbage value stored in a before the initialization is never used as an address. Now that you have assigned a known, valid address to a, dereferencing a gives you access to the value pointed to by a.
Extending the answer: even if you have assigned a valid address to a pointer, you have to make sure that, at the time of dereferencing the pointer, the lifetime of the object it points to has not ended. One example is returning the address of a local variable.
int *foo (void)
{
int a = 50;
return &a; //Address is valid only until foo returns
}//After this `a' is destroyed (lifetime finishes), accessing this address
//results in undefined behaviour
int main (void)
{
int *v = foo ();
*v = 50; //Incorrect, contents of `v' has expired lifetime.
return 0;
}
Same in the case of accessing freed memory location from heap.
int main (void)
{
char *a = malloc (1);
*a = 'A'; //Works fine, because we have allocated memory
free (a); //Freeing allocated memory
*a = 'B'; //Undefined behaviour, we have already freed
//memory, it's not for us now.
return 0;
}
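One common defensive habit, shown here only as a sketch with a hypothetical helper free_and_clear, is to reset the pointer to NULL at the moment it is freed, so that a later accidental use fails a NULL check instead of touching freed memory.

```c
#include <stdlib.h>

/* Frees the memory *pp points to and sets the caller's pointer to
 * NULL. The pointer is passed by address so that the caller's own
 * variable is the one that gets cleared. */
void free_and_clear(char **pp)
{
    if (pp == NULL)
        return;
    free(*pp);      /* free(NULL) is defined to do nothing */
    *pp = NULL;
}
```

Calling it a second time on the same variable is harmless, because the second call frees NULL. This does not help with other copies of the same address held elsewhere; those remain dangling.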
int *a holds an indeterminate address. So by writing *a, you might be accessing a memory location that is out of bounds or invalid, which is why you get a seg fault.
In the first case you have declared a pointer but you have not assigned the address it should point to. Hence the pointer may contain an address that does not belong to your process (or a junk value that is not a valid address at all, or null, which cannot be dereferenced). The operating system sends a signal to prevent the invalid memory operation, and hence a segmentation fault occurs.
In the second case you first assign the address of the variable to be updated to the pointer and then store the value through it, which is the correct way of doing it, and hence there's no segmentation fault.

Pointer in C (should be simple)

I tried code like this:
int *a;
*a = 10;
printf("%d",*a);
in Eclipse and it is not printing out anything.
Is it because I haven't given an initial value to a?
Thank you, that was helpful. I know it is problematic; I just wasn't sure of the exact problem.
For example, if I do printf("%d",a); I can see it does contain something. Is it C's rule
that I have to give it a place to point to before I can start to change the value at that address?
int *a; This defines a variable which is a pointer to an integer type. The pointer type variable a at the creation contains garbage value.
When you do *a = 10; it tries to use the value stored in a, which is garbage, as an address and store the value 10 there. Because we do not know what a contains and nothing has been allocated for it, a points to some unknown memory location; accessing it is illegal and will likely get you a segmentation fault (or something similar).
Same in the case of printf ("%d", *a); . This also tries to access the value stored at some undefined memory location which you have not allocated.
this variable is this is the location
stored in some with the address
address on the 'garbage'. You do not
stack have permissions to
access this
+-------+---------+
| name | value |
+-------+---------+ +---------+
| a | garbage |---->| ????? |
+-------+---------+ +---------+
After you have defined the pointer type variable, you need to request for some memory location from the operating system and use that memory location value to store into a, and then use that memory location through a.
To do that you need to do the following:
int *a;
a = malloc (sizeof (int)); /* allocates a block of memory
* of size of one integer
*/
*a = 10;
printf ("%d", *a);
free (a); /* When you are done with the memory, you must
* release it yourself with free().
*/
In this case it is like below:
After int *a; . The pointer variable is allocated on the stack. At this moment a contains a garbage value, so a points to whatever address that garbage value happens to be.
this variable is this is the location
stored in some with the address
address on the 'garbage'. You do not
stack have permissions to
access this
+-------+---------+
| name | value |
+-------+---------+ +---------+
| a | garbage |---->| ????? |
+-------+---------+ +---------+
After a = (int *) malloc (sizeof (int)); . Let us assume that malloc returns you some address 0x1234abcd, to be used. At this moment a will contain 0x1234abcd then a points to a valid memory location which was allocated and reserved for you to be used. But note that the value inside 0x1234abcd can be anything, ie. garbage. You can use calloc to set the contents of the memory locations you allocate to 0.
this variable is this is the location
stored in some 0x1234abcd , allocated
address on the by malloc, and reserved
stack for your program. You have
access to this location.
+-------+------------+
| name | value |
+-------+------------+ +---------+
| a | 0x1234abcd |---->| garbage|
+-------+------------+ +---------+
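As a sketch of the calloc alternative mentioned above (the helper names new_zeroed_ints and all_zero are hypothetical), memory obtained from calloc starts out with every int equal to 0 instead of garbage:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Allocates `count` ints, all set to zero by calloc.
 * Returns NULL on failure; the caller must free() the result. */
int *new_zeroed_ints(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Returns true if every one of the `count` ints is 0. */
bool all_zero(const int *values, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (values[i] != 0)
            return false;
    }
    return true;
}
```

With malloc the same check would read indeterminate values, so it is only meaningful for the calloc version.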
After *a = 10; , by *a you access the memory location 0x1234abcd and store 10 into it.
this variable is this is the location
stored in some 0x1234abcd , allocated
address on the by malloc, and reserved
stack for your program. You have
access to this location.
+-------+------------+
| name | value |
+-------+------------+ +---------+
| a | 0x1234abcd |---->| 10 |
+-------+------------+ +---------+
After free (a) , the memory at address 0x1234abcd is freed, i.e. handed back to the memory allocator (and possibly the operating system). Note that after freeing 0x1234abcd the contents of a is still 0x1234abcd , but you can no longer access that memory legally, because you just freed it. Accessing the contents pointed to by the address stored in a will result in undefined behavior, most probably a segmentation fault or heap corruption, as it is freed and you do not have access rights.
this variable is this is the location
stored in some 0x1234abcd , allocated
address on the by malloc. You have freed it.
stack Now you CANNOT access it legally
+-------+------------+
| name | value |
+-------+------------+ +---------+
| a | 0x1234abcd | | 10 |
+-------+------------+ +---------+
the contents of a remains
the same.
EDIT1
Also note the difference between printf ("%d", a); and printf ("%d", *a); . When you refer to a , it simply prints the contents of a, that is 0x1234abcd (strictly, a pointer should be printed with %p and a cast to void *, since %d expects an int). And when you refer to *a, it uses 0x1234abcd as an address, and then prints the contents of that address, which is 10 in this case.
this variable is this is the location
stored in some 0x1234abcd , allocated
address on the by malloc, and reserved
stack for your program. You have
access to this location.
+-------+------------+
| name | value |
+-------+------------+ +---------+
| a | 0x1234abcd |---->| 10 |
+-------+------------+ +---------+
^ ^
| |
| |
(contents of 'a') (contents of the )
| (location, pointed )
printf ("%d", a); ( by 'a' )
|
+----------------+
|
printf ("%d", *a);
EDIT2
Also note that malloc can fail to get you some valid memory location. You should always check if malloc returned you a valid memory location. If malloc cannot get you some memory location to be used then it will return you NULL so you should check if the returned value is NULL or not before use. So finally the code becomes:
int *a;
a = malloc (sizeof (int)); /* allocates a block of memory
* of size of one integer
*/
if (a == NULL)
{
printf ("\nCannot allocate memory. Terminating");
exit (1);
}
*a = 10;
printf ("%d", *a);
free (a); /* When you are done with the memory, you must
* release it yourself with free().
*/
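The allocate, check, then store sequence can also be wrapped in a small helper so the NULL check is never forgotten. This is only a sketch; new_int is a hypothetical name, not a standard function.

```c
#include <stdlib.h>

/* Allocates one int on the heap and stores `value` in it.
 * Returns NULL if malloc fails; the caller must free() the result. */
int *new_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;     /* only written once the allocation is known good */
    return p;
}
```

The caller still has to check the returned pointer against NULL before dereferencing it, and call free() on it when done.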
You haven't allocated memory for a.
Try
int *a;
a = (int*)malloc(sizeof(int)); //Allocating memory for one int.
*a = 10;
printf("%d", *a);
free(a); //Don't forget to free it.
When you declare the pointer, the compiler is only going to reserve memory to store a pointer variable. If you want that pointer to point to something after the fact, it has to be something that had its own memory allocated. Either point it at something that was declared as an int, or allocate memory from the heap for an int.

Resources