Question about pointers and automatic storage duration in C - c

I'm studying pointers on my own and have a doubt about lifetime. Look at this example:
int* fun(){
int arr[10];
for(size_t c = 0;c<10;c++){
arr[c] = c + 10;
}
return arr;
}
int main() {
int* p;
p = fun();
printf("%p",p);
}
This example obviously will print a null address, because the array has been freed after the function finishes. To fix it I tried malloc, and it worked.
After that success I tried another version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
int* fun(){
int arr[10];
for(size_t c = 0;c<10;c++){
arr[c] = c + 10;
}
int* arr2 = arr;
return arr2;
}
int main() {
int* p;
p = fun();
printf("%p",p);
}
Based on my studies, this example should print a null address again, but it worked just as well as the malloc solution. I can't understand why this happens. If int arr[10] is freed, its pointers should point to null, right?

This example obviously will print a null address
No. The address returned from the function will be indeterminate, meaning it has no deterministic value. Likely it is the same address as where arr used to be, but the compiler is free to print anything it pleases, or treat the whole thing as a "trap representation" meaning it may toss an exception/signal etc when this happens.
When I ran your code with gcc, it printed "(nil)", while clang felt that the "obvious" result should be 0x7ffef0b3c870. The program might as well print 0xDEADBEEF, and that would be conforming too.
Based on my studies, this example should print a null address again
Your studies are based on wrong conclusions. You watched a car crash and it landed upside-down, then made the conclusion that when cars crash they always land upside-down.
In either of your examples, the value of the returned pointer is indeterminate.
Sources
C17 6.2.4/2:
If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
C17 3.19.2:
indeterminate value
either an unspecified value or a trap representation

Related

Pointers address location

As part of our training in the Academy of Programming Languages, we also learned C. During the test, we encountered the question of what the program output would be:
#include <stdio.h>
#include <string.h>
int main(){
char str[] = "hmmmm..";
const char * const ptr1[] = {"to be","or not to be","that is the question"};
char *ptr2 = "that is the qusetion";
(&ptr2)[3] = str;
strcpy(str,"(Hamlet)");
for (int i = 0; i < sizeof(ptr1)/sizeof(*ptr1); ++i){
printf("%s ", ptr1[i]);
}
printf("\n");
return 0;
}
Later, after examining the answers, it became clear that the cell (& ptr2)[3] was identical to the memory cell in &ptr1[2], so the output of the program is: to be or not to be (Hamlet)
My question is: is it possible to know, only from the written code and without running any compiler, that a certain pointer (or any variable in general) follows or precedes other variables in memory?
Note that I do not mean arrays, whose elements must of course be in sequence.
In this statement:
(&ptr2)[3] = str;
ptr2 was defined with char *ptr2 inside main. With this definition, the compiler is responsible for providing storage for ptr2. The compiler is allowed to use whatever storage it wants for this—it could be before ptr1, it could be after ptr1, it could be close, it could be far away.
Then &ptr2 takes the address of ptr2. This is allowed, but we do not know where that address will be in relation to ptr1 or anything else, because the compiler is allowed to use whatever storage it wants.
Since ptr2 is a char *, &ptr2 is a pointer to char *, also known as char **.
Then (&ptr2)[3] attempts to refer to element 3 of an array of char * that is at &ptr2. But there is no array there in C’s model of computation. There is just one char * there. When you try to refer to element 3 of an array when there is no element 3 of an array, the behavior is not defined by the C standard.
Thus, this code is a bad example. It appears the test author misunderstood C, and this code does not illustrate what was intended.
char *ptr2 = some initializer;
(&ptr2)[3] = str;
When you evaluate &ptr2, you obtain the address of the memory where the pointer to that initializer is stored.
When you do (&ptr2)[3]=something, you try to write the address of a string 3*sizeof(char *) bytes past the location of ptr2. This is invalid: the behavior is undefined, so it may end in a segmentation fault or silently overwrite whatever happens to be stored there.
No, it's not possible and no such assumptions can be made.
By writing outside a variable's space, this code invokes undefined behavior. It's basically "illegal", and anything can happen when you run it. The C language specification says nothing about variables being allocated on a stack in some particular order that you can exploit. It does, however, say that accessing random memory is undefined behavior.
Basically this code is pretty horrible and should never be used, even less so in a teaching environment. It makes me sad how people misunderstand C and still teach it to others. :/
A program usually is loaded in memory with this structure:
Stack, Mmap'ed files, Heap, BSS (uninitialized static variables), Data segment (Initialized static variables) and Text (Compiled code)
You can learn more here:
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
Depending on how you declare a variable, it goes to one of the places listed above.
The compiler arranges the BSS and data segment variables as it wishes at compile time, so there is usually no way to tell. The same goes for heap variables (the allocator picks whichever block best fits the requested size).
On the stack (which is a LIFO structure) the variables are placed one on top of another, so if you have:
int a = 5;
int b = 10;
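a and b will often be placed one after the other, but the C standard does not guarantee it: the compiler may reorder them, keep them in registers, or remove them entirely. So even in this case you cannot tell from the source alone.
The real exception is a structure or an array: array elements are always contiguous, and structure members are always laid out in declaration order (possibly with padding between them).
In your code ptr1 is an array of pointers to char, so its three pointers are stored one after another; the string literals they point to can be anywhere.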
In fact, do the following exercise:
#include <stdio.h>
#include <string.h>
int main(){
const char * const ptr1[] = {"to be","or not to be","that is the question"};
for (int i = 0; i < 3; i++) {
for (size_t j = 0; j < strlen(ptr1[i]); j++)
printf("%p -> %c\n", (void *)&ptr1[i][j], ptr1[i][j]); /* %p expects a void * */
printf("\n");
}
}
and you will see the memory address and its content!
Have a nice day.

Malloc(0)ing an array in Windows Visual Studio for C allows the program to run perfectly fine

The C program is a Damerau-Levenshtein algorithm that uses a matrix to compare two strings. On the fourth line of main(), I want to malloc() the memory for the matrix (2D array). In testing, I called malloc(0) and it still ran perfectly. It seems that whatever I put in malloc(), the program still works. Why is this?
I compiled the code with the "cl" command in the Visual Studio developer command prompt, and got no errors.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
int distance(char* y, char* x, int** t, int xl, int yl); //declared before use in main
int minimum(int a, int b, int c);
int main(){
char y[] = "felkjfdsalkjfdsalkjfdsa;lkj";
char x[] = "lknewvds;lklkjgdsalk";
int xl = strlen(x);
int yl = strlen(y);
int** t = malloc(0);
int *data = t + yl + 1; //to fill the new arrays with pointers to arrays
for(int i=0;i<yl+1;i++){
t[i] = data + i * (xl+1); //fills array with pointer
}
for(int i=0;i<yl+1;i++){
for(int j=0;j<xl+1;j++){
t[i][j] = 0; //nulls the whole array
}
}
printf("%s", "\nDistance: ");
printf("%i", distance(y, x, t, xl, yl));
for(int i=0; i<yl+1;i++){
for(int j=0;j<xl+1;j++){
if(j==0){
printf("\n");
printf("%s", "| ");
}
printf("%i", t[i][j]);
printf("%s", " | ");
}
}
}
int distance(char* y, char* x, int** t, int xl, int yl){
int isSub;
for(int i=1; i<yl+1;i++){
t[i][0] = i;
}
for(int j=1; j<xl+1;j++){
t[0][j] = j;
}
for(int i=1; i<yl+1;i++){
for(int j=1; j<xl+1;j++){
if(*(y+(i-1)) == *(x+(j-1))){
isSub = 0;
}
else{
isSub = 1;
}
t[i][j] = minimum(t[i-1][j]+1, t[i][j-1]+1, t[i-1][j-1]+isSub); //looks left, above, and diagonal topleft for minimum
if((*(y+(i-1)) == *(x+(i-2))) && (*(y+(i-2)) == *(x+(i-1)))){ //looks at neighbor characters, if equal
t[i][j] = minimum(t[i][j], t[i-2][j-2]+1, 9999999); //since minimum needs 3 args, i include a large number
}
}
}
return t[yl][xl];
}
int minimum(int a, int b, int c){
if(a < b){
if(a < c){
return a;
}
if(c < a){
return c;
}
return a;
}
if(b < a){
if(b < c){
return b;
}
if(c < b){
return c;
}
return b;
}
//a == b here
if(c < a){
return c;
}
return a; //also covers a == b == c, which previously returned nothing
}
Regarding malloc(0) part:
From the man page of malloc(),
The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().
So, the returned pointer is either NULL or a pointer which can only be passed to free(); you cannot expect to dereference that pointer and store something into the memory location.
In either of the above cases, you're trying to use a pointer which is invalid, and that invokes undefined behavior.
Once a program hits UB, its output cannot be justified anyway.
One of the major outcomes of UB is "working fine" (as "wrongly" expected), too.
That said, following the analogy
"you can allocate a zero-sized allocation, you just must not dereference it"
some memory debugger applications hint that usage of malloc(0) is potentially unsafe and red-zone the statements that include a call to malloc(0).
Here's a nice reference related to the topic, if you're interested.
Regarding malloc(<any_size>) part:
In general, accessing out-of-bounds memory is UB, again. If you happen to access memory outside the allocated region, you invoke UB, and the result cannot be predicted.
FWIW, C itself does not impose or perform any bounds checking on its own. So, you're not "restricted" (read as "compiler error") from accessing out-of-bounds memory, but doing so invokes UB.
It seems that whatever I put in malloc(), the program still works. Why is this?
int** t = malloc(0);
int *data = t + yl + 1;
t + yl + 1 is undefined behavior (UB). Rest of code does not matter.
If t == NULL, adding 1 to it is UB, because pointer arithmetic on a null pointer is invalid.
If t != NULL, adding 1 to it is UB, because the result points more than one element past the end of the allocated space.
With UB, the pointer math may have worked as hoped because a typical malloc() hands out larger chunks than the small size requested. It may crash on another platform or machine, on another day, or in another phase of the moon. The code is not reliable even if it works under light testing.
You just got lucky. C does not do rigorous bounds checking because it has a performance cost. Think of a C program as a raucous party happening in a private building, where the OS police are stationed outside. If somebody throws a rock that stays inside the club (an example of an invalid write that violates the ownership convention within the process but stays within the club boundaries) the police don't see it happening and take no action. But if the rock is thrown and it goes flying dangerously out the window (an example of a violation that is noticed by the operating system) the OS police step in and shut the party down.
The C standard says:
If the size of the space requested is zero, the behavior is implementation-defined; the value returned shall be either a null pointer or a unique pointer. [7.10.3]
So we have to check what your implementation says. The question says "Visual Studio," so let's check Visual C++'s page for malloc:
If size is 0, malloc allocates a zero-length item in the heap and returns a valid pointer to that item.
So, with Visual C++, we know that you're going to get a valid pointer rather than a null pointer.
But it's just a pointer to a zero-length item, so there's not really anything safe you can do with that pointer except pass it to free. If you dereference the pointer, the code is allowed to do anything it wants. That's what's meant by "undefined behavior" in the language standards.
So why does it appear to work? Probably because malloc returned a pointer to at least a few bytes of valid memory since the easiest way for malloc to give you a valid pointer to a zero-length item is to pretend you really asked for at least one byte. And then the alignment rules would round that up to something like 8 bytes.
When you dereference the beginning of that allocation, you likely have some valid memory. What you're doing is strictly illegal and non-portable but, with this implementation, likely to work. When you index farther into it, you'll likely start corrupting other data structures (or metadata) in the heap. If you index even farther into it, you're increasingly likely to crash due to hitting an unmapped page.
Why does the standard allow malloc(0) to be implementation-defined instead of just requiring it to return a null pointer?
With pointers, it's sometimes handy to have special values, the most obvious being the null pointer. The null pointer is just a reserved address that will never be used for valid memory. But what if you wanted another special pointer value that had some meaning to your program?
In the dark days before the standard, some mallocs allowed you to effectively reserve additional special pointer values by calling malloc(0). They could have used malloc(1) or any other very small size, but malloc(0) made it clear that you just wanted to reserve an address rather than actual space. So there were many programs that depended on this behavior.
Meanwhile, there were programs that expected malloc(0) to return a null pointer, since that's what their library had always done. When the standards people looked at the existing code and how it used the library, they decided they couldn't choose one method over the other without "breaking" some of the code out there. So they allowed malloc's behavior to remain "implementation-defined."

Is malloc needed for this int pointer example?

The following application works with both the commented out malloced int and when just using an int pointer to point to the local int 'a.' My question is if this is safe to do without malloc because I would think that int 'a' goes out of scope when function 'doit' returns, leaving int *p pointing at nothing. Is the program not seg faulting due to its simplicity or is this perfectly ok?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef struct ht {
void *data;
} ht_t;
ht_t * the_t;
void doit(int v)
{
int a = v;
//int *p = (int *) malloc (sizeof(int));
//*p = a;
int *p = &a;
the_t->data = (void *)p;
}
int main (int argc, char *argv[])
{
the_t = (ht_t *) malloc (sizeof(ht_t));
doit(8);
printf("%d\n", *(int*)the_t->data);
doit(4);
printf("%d\n", *(int*)the_t->data);
}
Yes, dereferencing a pointer to a local stack variable after the function is no longer in scope is undefined behavior. You just happen to be unlucky enough that the memory hasn't been overwritten, released back to the OS or turned into a function pointer to a demons-in-nose factory before you try to access it again.
Not every UB (undefined behaviour) results in a segfault.
Normally, the stack's memory won't be released back to the OS (but it might!), so accessing that memory appears to work. But at the next (bigger) function call, the memory the pointer refers to might be overwritten, so the data you thought was safe there is lost.
Using malloc() inside the function works because the object is stored on the heap.
The allocation remains valid until you free it.
And yes, not every undefined behaviour will result in segmentation fault.
It does not matter that p does not point at anything on return from doit.
Because, you know, p isn't there any more either.
It does matter that you are reading the pointer to a no-longer-existing object in main though. That's UB, but harmless on most modern platforms.
Even worse though, you are reading the non-existent object it once pointed to, which is straight Undefined Behavior.
Still, nobody is obliged to catch you:
Can a local variable's memory be accessed outside its scope?
As an aside, Don't cast the result of malloc (and friends).
Also, be aware that not free-ing memory is not harmless on all platforms: Can I avoid releasing allocated memory in C with modern OSes?
Yes, the malloc is needed; otherwise the pointer would point to a 4-byte slot on the stack, which would be reused for other data as soon as you called other functions or created new local variables.
You can see that if you call the following function afterwards with a value of 10:
void use_memory(int i)
{
int f[128]={0}; /* empty braces are not valid C before C23 */
if(i>0)
{
use_memory(i-1);
}
}

Does local variable be deleted from memory when this function is called in main [duplicate]

I have a question, first of all look at the code
#include <stdio.h>
int sum(); /* function declaration */
int main()
{
int *p2;
p2 = sum(); /* Calling function sum and copying its return value to pointer variable p2 */
printf("%d",*p2);
} /* END of main */
int sum()
{
int a = 10;
int *p = &a;
return p;
} /* END of sum */
I think the answer is 10 and the address of variable a, but my teacher says that a is local to the function, so a and its value will be deleted from memory when the function returns or finishes executing. I tried this code and the answer is, of course, 10 and the address of a. I use the GNU/GCC compiler. Can anyone say what is right and wrong?
Thanks in advance.
Your teacher is absolutely right: even if you fix your program to return int* in place of int, your program still contains undefined behavior. The issue is that the memory in which a used to be placed is ripe for reuse once sum returns. The memory may stay around untouched for you to access, so you might even print ten, but this behavior is still undefined: it may run on one platform and crash on ten others.
You may get the right result, but that is just luck: once sum() has returned, the memory of a is available for reuse and can be taken by any other variable, so the value may change.
For example:
#include <stdio.h>
int* sum(); /* function declaration */
int* sum2(); /* function declaration */
int main()
{
int *p2;
p2 = sum(); /* Calling function sum and copying its return value to pointer variable p2 */
sum2();
printf("%d",*p2);
}
int* sum()
{
int a = 10;
int *p = &a;
return p;
} /* END of sum */
int* sum2()
{
int a = 100;
int *p = &a;
return p;
} /* END of sum2 */
With this code, the memory of a is likely to be reused by sum2(), which overwrites the value with 100. That outcome is typical but not guaranteed, since the behavior is undefined.
Here you just return a pointer to int, suppose you are returning an object:
TestClass* sum()
{
TestClass tc;
TestClass *p = &tc;
return p;
}
Then when you dereference the returned pointer, weird things would happen because the memory it points to might be totally screwed.
Your pointer still points to a place in memory where your 10 resides. However, from C's point of view that memory is unallocated and could be reused. Placing more items on the stack, or allocating memory, could cause that part of memory to be reused.
The object a in the function sum has automatic storage duration. Its lifetime ends as soon as program flow leaves the scope in which it was declared (the function body), here at the return statement at the end of the function.
After that, accessing the memory at which a lived will either do what you expect, summon dragons or set your computer on fire. This is called undefined behavior in C. The C standard itself, however, says nothing about deleting something from memory as it has no concept of memory.
Logically speaking, a no longer exists once sum exits; its lifetime is limited to the scope of the function. Physically speaking, the memory that a occupied is still there and still contains the bit pattern for the value 10, but that memory is now available for something else to use, and may be overwritten before you can use it in main. Your output may be 10, or it may be garbage.
Attempting to access the value of a variable outside that variable's lifetime leads to undefined behavior, meaning the compiler is free to handle the situation any way it wants to. It doesn't have to warn you that you're doing anything hinky, it doesn't have to work the way you expect it to, it doesn't have to work at all.

Dangling Pointer in C

I wrote a program in C having dangling pointer.
#include<stdio.h>
int *func(void)
{
int num;
num = 100;
return &num;
}
int func1(void)
{
int x,y,z;
scanf("%d %d",&y,&z);
x=y+z;
return x;
}
int main(void)
{
int *a = func();
int b;
b = func1();
printf("%d\n",*a);
return 0;
}
I am getting the output as 100 even though the pointer is dangling.
I made a single change in the above function func1(). Instead of taking the value of y and z from standard input as in above program, now I am assigning the value during compile time.
I redefined the func1() as follows:
int func1(void)
{
int x,y,z;
y=100;
z=100;
x=y+z;
return x;
}
Now the output is 200.
Can somebody please explain to me the reason for the above two outputs?
Undefined Behavior means anything can happen, including doing what you expect. Your stack variables weren't overwritten in this case.
void func3() {
int a=0, b=1, c=2;
}
If you include a call to func3() in between func1 and printf you'll get a different result.
EDIT: What actually happens on some platforms.
int *func(void)
{
int num;
num = 100;
return &num;
}
Let's assume, for simplicity, that the stack pointer is 10 before you call this function, and that the stack grows upwards.
When you call the function, the return address is pushed on stack (at position 10) and the stack pointer is incremented to 14 (yes, very simplified). The variable num is then created on stack at position 14, and the stack pointer is incremented to 18.
When you return, you return a pointer to address 14 - return address is popped from stack and the stack pointer is back to 10.
void func2() {
int y = 1;
}
Here, the same thing happens. The return address is pushed at position 10, y is created at position 14, you assign 1 to y (which writes to address 14), you return, and the stack pointer is back at position 10.
Now, your old int * returned from func points to address 14, and the last modification made to that address was func2's local variable assignment. So, you have a dangling pointer (nothing above position 10 in stack is valid) that points to a left-over value from the call to func2
It's because of the way the memory gets allocated.
After calling func and returning a dangling pointer, the part of the stack where num was stored still has the value 100 (which is what you are seeing afterwards). We can reach that conclusion based on the observed behavior.
After the change, it looks like what happens is that the func1 call overwrites the memory location that a points to with the result of the addition inside func1 (the stack space previously used for func is reused now by func1), so that's why you see 200.
Of course, all of this is undefined behavior so while this might be a good philosophical question, answering it doesn't really buy you anything.
It's undefined behavior. It could work correctly on your computer right now, 20 minutes from now, might crash in an hour, etc. Once another object takes the same place on the stack as num, you will be doomed!
Dangling pointers (pointers to locations that have been disassociated) induce undefined behavior, i.e. anything can happen.
In particular, the memory locations get reused by chance in func1. The result depends on the stack layout, compiler optimization, architecture, calling conventions and stack security mechanisms.
With dangling pointers, the result of a program is undefined. It depends on how the stack and the registers are used. With different compilers, different compiler versions and different optimization settings, you'll get a different behavior.
Returning a pointer to a local variable yields undefined behaviour, which means that anything the program does (anything at all) is valid. If you are getting the expected result, that's just dumb luck.
Please study functions from basic C. Your concept is flawed...main should be
int main(void)
{
int *a = func();
int b;
b = func1();
printf("%d\n%d",*a,func1());
return 0;
}
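This may output 100 and 200 on some setups, but *a still reads through a dangling pointer, so the behavior remains undefined and the output is not guaranteed.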

Resources