Assume we have the following:
int main(void) {
char* ptr;
printf("%c\n",ptr[24]); // junk value
ptr[24] = 'H';
printf("%c\n", ptr[24]); // prints H
return 0;
}
When I change the junk value to something else, does that mean I am corrupting memory or is this value literally junk so it doesn't matter what new value I assign to it?
Your program exhibits undefined behaviour, which means that literally anything may happen and it would still be covered by the standard as being undefined. And when I say anything, I mean it to the full extent. It would even be valid for your computer to become sentient and chase you down the street.
What usually happens, although this is not guaranteed, is that you're accessing unmapped address space (on a modern OS with paged memory), causing a segmentation fault or a bus error (depending on architecture, OS and runtime implementation).
ptr is an uninitialized pointer, which means the pointer's value is yet to be defined. An uninitialized pointer, by definition, points to nothing and everything, i.e. to no valid object at all. The only way to make that pointer valid is to assign it the address of a proper C object of the type the pointer dereferences to.
BTW: Plain C has very, very strict typing rules. I sometimes say it's even stricter than C++, because it lacks implicit conversion operators and function overloading. But its sloppy type casting and bad compilers ruined its reputation with respect to type safety.
You are accessing invalid memory locations which invokes undefined behavior. Anything might happen, it can't be predicted.
Since most C implementations do not check whether a pointer is valid before using it, the assignment may well store 'H' at whatever address ptr happens to hold.
But you cannot trust what happens next. Maybe your program crashes, maybe you corrupt memory that another part of your program depends on (or, on a system without memory protection, memory used by another program), or, in a multithreaded environment, another thread may overwrite that value.
I am new to this particular forum, so if there are any egregious formatting choices, please let me know, and I will promptly update.
In the book C Programming: A Modern Approach (authored by K. N. King), the following passage is written:
If a pointer variable p hasn't been initialized, attempting to use the value of p in any way causes undefined behavior. In the following example, the call of printf may print garbage, cause the program to crash, or have some other effect:
int *p;
printf("%d", *p);
As far as I understand pointers and how the compiler treats them, the declaration int *p effectively says, "Hey, if you dereference p in the future, I will look at a block of four consecutive bytes in memory, whose starting address is the value contained in p, and interpret those 4 bytes as a signed integer."
As to whether or not that is correct...if it is correct, then I am a little confused about why the aforementioned block of code:
1. is classified as undefined behavior
2. can cause programs to crash
3. can have some other effect
Commenting on the above-numbered cases:
My understanding of undefined behavior is that, at run time, anything can happen. With that being said, in the above code it appears to me that only a very defined subset of things can happen. I understand that p (due to its lack of initialization) is storing a random address that could point anywhere in memory. However, when printf is passed the dereferenced value *p, won't the compiler just look at the 4 consecutive bytes of memory (which start at whatever random address) and interpret those 4 bytes as a signed integer?
Therefore, printf should only do one thing: print a number that ranges anywhere from -2,147,483,648 to 2,147,483,647. Clearly that is a lot of different possible outputs, but does that really qualify as "undefined behavior"? Further, how could such an "undefined behavior" lead to a "program crash" or "have some other effect"?
Any clarification would be greatly appreciated! Thanks!
The value of an uninitialized variable is indeterminate. It could hold any value (including 0), and it's even possible that a different value could be read each time you attempt to read it. It's also possible that the value could be a trap representation, meaning that attempting to read it will trigger a processor exception that can crash the program.
Assuming you got lucky and were able to read a value for p, due to the virtual memory model most systems use that value may not correspond to an address that is mapped to the process's memory space. So if you attempt to read from that address by dereferencing the pointer it triggers a segmentation fault that can crash the program.
Notice that in both of these scenarios the crash occurs before printf is even called.
Also, compilers are allowed to assume your program does not have undefined behavior and will perform optimizations based on that assumption. That can make your program behave in ways you might not expect.
As for why doing these things is undefined behavior, it is because the C standard says so. In particular, Annex J.2 gives as an example of undefined behavior:
The value of an object with automatic storage duration
is used while it is indeterminate. (6.2.4, 6.7.9, 6.8)
Undefined behavior means that the C standard imposes no requirements on what happens; an implementation is free to do anything and does not have to document it or even be consistent about it.
In a practical sense, p is likely to contain whatever its storage held last: maybe zeros, maybe something more random, maybe a chunk of data from a previous use. On occasion, a compiler will implicitly zero memory for safety's sake, sacrificing a bit of time to offer that feature.
Notably, if p were defined as a char* and you printed it with printf's %s, printf would keep reading until it found a 0x00 byte. If that takes you past a mapped memory boundary, you could get a segmentation fault.
This code should give a segmentation fault, but it works fine. How is that possible? I arrived at it by trial and error while trying out some code on the topic of arrays of pointers. Can anyone explain?
int main()
{
int i,size;
printf("enter the no of names to be entered\n");
scanf("%d",&size);
char *name[size];
for(i=0;i<size;i++)
{
scanf("%s",name[i]);
}
printf("the names in your array are\n");
for(i=0;i<size;i++)
{
printf("%s\n",&name[i]);
}
return 0
The problem in your code (which is incomplete, BTW; you need #include <stdio.h> at the top and a closing } at the bottom) can be illustrated in a much shorter chunk of code:
char *name[10]; // make the size an arbitrary constant
scanf("%s", name[0]); // Read into memory pointed to by an uninitialized pointer
(name could be a single pointer rather than an array, but I wanted to preserve your program's structure for clarity.)
The pointer name[0] has not been initialized, so its value is garbage. You pass that garbage pointer value to scanf, which reads characters from stdin and stores them in whatever memory location that garbage pointer happens to point to.
The behavior is undefined.
That doesn't mean that the program will die with a segmentation fault. C does not require checking for invalid pointers (nor does it forbid it, but most implementations don't do that kind of checking). So the most likely behavior is that your program will take whatever input you provide and attempt to store it in some arbitrary memory location.
If the garbage value of name[0] happens to point to a detectably invalid memory location, your program might die with a segmentation fault. That's if you're lucky. If you're not, it might happen to point to some writable memory location that your program is able to modify. Storing data in that location might be harmless, or it might clobber some critical internal data structure that your program depends on.
Again, your program's behavior is undefined. That means the C standard imposes no requirements on its behavior. It might appear to "work", it might blow up in your face, or it might do anything that it's physically possible for a program to do. Appearing to behave correctly is probably the worst consequence of undefined behavior, since it makes it difficult to diagnose the problem (which will probably appear during a critical demo).
Incidentally, using scanf with a bare %s format specifier is inherently unsafe, since without a field width nothing limits the amount of data it will attempt to read. Even with a properly initialized pointer, a bare %s gives no way to guarantee that the pointer refers to enough memory to hold whatever input it receives.
You may be accustomed to languages that do run-time checking and can reliably detect (most) problems like this. C is not such a language.
I'm not sure what your test case is (I don't have enough reputation to post a comment). I tried the inputs 0 and 1\n1\n2\n.
The details are a little complex to explain, but let's start :-). There are two things you should know. First, main() is a function. Second, char *name[size]; uses either a C99 feature, the variable-length array, or a GNU extension, the zero-length array (supported by gcc).
main() is a function, so all the variables declared in it are local variables. Local variables live on the stack. You need to know this first.
If you input 1\n1\n2\n, the variable-length array is used. It, too, is allocated on the stack. Notice that the elements of the array are not initialized to 0. That is a possible reason why your program runs without a segmentation fault: you cannot be sure that an element will point to an address that isn't writable (at least it didn't for me).
If the input is 0\n, you are using the zero-length array extension supported by GNU. As you saw, it means the array has no elements. The value of name is equal to &size, because size is the last local variable declared before name (consider the stack pointer). The value of name[0] is then whatever is stored at &size, which is zero (='\0'), so it will work fine.
The simple answer to your question lies in what a segmentation fault is:
A segmentation fault (aka segfault) is caused by a program trying to read or write an illegal memory location.
So it all depends upon what is classed as illegal. If the memory in question is a part of the valid address space, e.g. the stack, for the process the program is running, it may not cause a segfault.
When I run this code in a debugger the line:
scanf("%s", name[i]);
overwrites the contents of the size variable, clearly not the intended behaviour, and the code essentially goes into an infinite loop.
But that is just what happens on my 64-bit Intel Linux machine using gcc 5.4. Another environment will probably do something different.
If I put the missing & in front of name[i], it works OK. Whether that is luck or expertly exploiting the intended behaviour of C99 variable-length arrays, as suggested, I'm afraid I don't know.
So welcome to the world of subtle memory overwriting bugs.
MISRA C 2012 directive 4.12 is "Dynamic memory allocation should not be used".
As an example, the document provides this sample of code:
char *p = (char *) malloc(10);
char *q;
free(p);
q = p; /* Undefined behaviour - value of p is indeterminate */
And the document states that:
Although the value stored in the pointer is unchanged following the
call to free, it is possible, on some targets, that the memory to
which it points no longer exists and the act of copying that pointer
could cause a memory exception.
I'm OK with almost all of that sentence except the end. As p and q are both allocated on the stack, how can copying the pointer cause a memory exception?
According to the Standard, copying the pointer with q = p; is undefined behaviour.
Reading J.2 Undefined behaviour states:
The value of a pointer to an object whose lifetime has ended is used (6.2.4).
Going to that chapter we see that:
6.2.4 Storage durations of objects
The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
What is indeterminate:
3.19.2 indeterminate value:
either an unspecified value or a trap representation
Once you free an object through a pointer, all pointers to that memory become indeterminate. Even just reading such an indeterminate pointer value is undefined behaviour (UB). The following is UB:
char *p = malloc(5);
free(p);
if(p == NULL) // UB: even just reading value of p as here, is UB
{
}
First, some history...
When ISO/IEC JTC1/SC22/WG14 first started to formalise the C Language (to produce what is now ISO/IEC 9899:2011) they had a problem.
Many compiler vendors had interpreted things in different ways.
Early on, they made a decision to not break any existing functionality... so where compiler implementations were divergent, the Standard offers unspecified and undefined behaviours.
MISRA C attempts to trap the pit-falls that these behaviours will trigger. So much for the theory...
--
Now to the specifics of this question:
Given that the point of free() is to release the dynamic memory back to the heap, there were three possible implementations, all of which were "in the wild":
reset the pointer to NULL
leave the pointer as was
destroy the pointer
The Standard could not mandate any one of these, so formally leaves the behaviour as undefined - your implementation may follow one path, but a different compiler could do something else... you cannot assume, and it is dangerous to rely on a method.
Personally, I'd rather the Standard was specific, and required free() to set the pointer to NULL, but that's just my opinion.
--
So the TL;DR; answer is, unfortunately: because it is!
While p and q are both pointer variables on the stack, the memory address returned by malloc() is not on the stack.
Once a memory area that was successfully allocated with malloc() is freed, there is no telling who may be using the memory area or what its disposition is.
So once free() is used to free an area of memory previously obtained using malloc(), any attempt to use the memory area is undefined behavior. You might get lucky and it will work. You might be unlucky and it will not. Once you free() a memory area, you no longer own it, something else does.
The issue here would appear to be what machine code is involved in copying a value from one memory location to another. Remember that MISRA targets embedded software development so the question is always what kind of funky processors are out there that do something special with a copy.
The MISRA standards are all about robustness, reliability, and eliminating risk of software failure. They are quite picky.
The value of p cannot be used as such after the memory it points to has been freed. More generally, the value of an uninitialized pointer has the same status: even just reading it for the purpose of copying it invokes undefined behavior.
The reason for this surprising restriction is the possibility of trap representations. Freeing the memory pointed to by p can make its value become a trap representation.
I remember one such target from the early 1990s that behaved this way. It was not an embedded target, and it was in rather widespread use at the time: Windows 2.x. It used the Intel architecture in 16-bit protected mode, where pointers were 32 bits wide, with a 16-bit selector and a 16-bit offset. In order to access the memory, pointers were loaded into a pair of registers (a segment register and an address register) with a specific instruction:
LES BX,[BP+4] ; load pointer into ES:BX
Loading the selector part of the pointer value into a segment register had the side effect of validating the selector value: if the selector did not point to a valid memory segment, an exception would be fired.
The innocent-looking statement q = p; could be compiled in many different ways:
MOV AX,[BP+4] ; loading via DX:AX registers: no side effects
MOV DX,[BP+6]
MOV [BP-6],AX
MOV [BP-4],DX
or
LES BX,[BP+4] ; loading via ES:BX registers: side effects
MOV [BP-6],BX
MOV [BP-4],ES
The second option has 2 advantages:
The code is more compact: one fewer instruction
The pointer value is loaded into registers that can be used directly to dereference the memory, which can result in fewer instructions generated for subsequent statements.
Freeing the memory may unmap the segment and make the selector invalid. The value becomes a trap value and loading it into ES:BX fires an exception, also called trap on some architectures.
Not all compilers would use the LES instruction for just copying pointer values because it was slower, but some did when instructed to generate compact code, a common choice then as memory was rather expensive and scarce.
The C Standard allows for this and lists, as a form of undefined behavior, code where:
The value of a pointer to an object whose lifetime has ended is used (6.2.4).
because this value has become indeterminate as defined this way:
3.19.2 indeterminate value: either an unspecified value or a trap representation
Note however that you can still manipulate the value by aliasing via a character type:
/* dumping the value of the free'd pointer */
unsigned char *pc = (unsigned char*)&p;
size_t i;
for (i = 0; i < sizeof(p); i++)
printf("%02X", pc[i]); /* no problem here */
/* copying the value of the free'd pointer */
memcpy(&q, &p, sizeof(p)); /* no problem either */
There are two reasons that code which examines a pointer after freeing it is problematic even if the pointer is never dereferenced:
The authors of the C Standard did not wish to interfere with implementations of the language on platforms where pointers contain information about the surrounding memory blocks, and which might validate such pointers whenever anything is done with them, whether they are dereferenced or not. If such platforms exist, code which uses pointers in violation of the Standard might not work with them.
Some compilers operate on the presumption that a program will never receive any combination of inputs that would invoke UB, and thus any combination of inputs that would produce UB should be presumed impossible. As a consequence of this, even forms of UB which would have no detrimental effect on the target platform if a compiler simply ignored them may end up having arbitrary and unlimited side-effects.
IMHO, there is no reason why equality, relational, or pointer-difference
operators upon freed pointers should have any adverse effect on any
modern system, but because it is fashionable for compilers to apply crazy
"optimizations", useful constructs which should be usable on commonplace
platforms have become dangerous.
The poor wording in the sample code is throwing you off.
It says "value of p is indeterminate", but it is not the value of p that is indeterminate, because p still has the same value (the address of a memory block which has been released).
Calling free(p) does not change p -- p is only changed once you leave the scope in which p is defined.
Instead, it is the value of what p points to that is indeterminate, since the memory block has been released, and it may as well be unmapped by the operating system. Accessing it either through p or through an aliased pointer (q) may cause an access violation.
An important concept to internalize is the meaning of "indeterminate" or "undefined" behavior. It is exactly that: unknown and unknowable. We would often tell students "It is perfectly legitimate for your computer to melt into a shapeless blob, or for the disk to fly off to Mars". As I read the original documentation included, I did not see any place it said to not use malloc. It merely points out that an erroneous program will fail. Actually, having the program take a memory exception is a Good Thing, because it tells you immediately that your program is defective. Why the document suggests this might be a Bad Thing escapes me. What is a Bad Thing is that on most architectures, it will NOT take a memory exception. Continuing to use that pointer will produce erroneous values, potentially render the heap unusable, and, if that same block of storage is allocated for a different use, corrupting the valid data of that use, or interpreting its values as your own. Bottom line: don't use 'stale' pointers! Or, to put it another way, writing defective code means that it won't work.
Furthermore, the act of assigning p to q is most decidedly NOT "undefined". The bits stored in the variable p, which are meaningless nonsense, are quite easily, and correctly, copied to q. All this means now is that any value that is accessed by p can now also be accessed by q, and since p is undefined nonsense, q is now undefined nonsense. So using either one of them to read or write will produce "undefined" results. If you are lucky enough to be running on an architecture that can cause this to take a memory fault, you will easily detect the improper usage. Otherwise, using either pointer means your program is defective. Plan on spending a lot of hours finding it.
I spent an embarrassing amount of time last night tracking down a segfault in my application. Ultimately, it turned out I'd written:
ANNE_SPRITE_FRAME *desiredFrame;
*desiredFrame = anne_sprite_copy_frame(&sprite->current);
instead of:
ANNE_SPRITE_FRAME desiredFrame;
desiredFrame = anne_sprite_copy_frame(&sprite->current);
In line 1 I created a typed pointer, and in line 2 I set the value of the dereferenced pointer to the struct returned by anne_sprite_copy_frame().
Why was this a problem? And why did the compiler accept this at all? All I can figure is that the problem in example 1 is either:
I'm reserving space for the pointer but not the contents that it points to, or
(unlikely) it's trying to store the return value in the memory of the pointer itself
In line 1 I've created a typed pointer, and in line 2 I set the value of the dereferenced pointer to the struct returned by anne_sprite_copy_frame().
Both of these are allowed in C, which is why this is perfectly acceptable by the compiler.
The compiler doesn't check to make sure your pointer actually points to anything meaningful - it just dereferences and assigns.
One of the best and worst features of C is that the compiler does very little sanity checking for you - it follows your instructions, and does exactly what you tell it to do. You told it to do two legal operations - even though the variables were not initialized properly. As such, you get runtime issues, not compile time problems.
I'm reserving space for the pointer but not the contents that it points to
Yeah, exactly. But the compiler (unless it does some static analysis) can't infer that. It only sees that the syntax is valid and the types match, so it compiles your program. Dereferencing an uninitialized pointer is undefined behavior, though, so your program will most likely work erroneously.
The pointer is uninitialized, but it still has a value so it points somewhere. Writing the return value to that memory address overwrites whatever happens to be there, invoking undefined behavior.
Technically the compiler is not in the business of telling you that a syntactically valid construct will result in undefined (or even likely unexpected) behavior, but I would be surprised if there was no warning issued about this particular usage.
C is weakly typed. You can assign anything to anything with the obvious consequences. You have to be very careful and disciplined if you do not want to spend nights uncovering bugs that turn out "stupid". I mean no offense. I went through the same issues due to an array bound overflow that overwrote other variables and only showed up in some other part of the code trying to use these variables. Nightmare! That's why Java is so much easier to deal with. With C you are an acrobat without a net, with Java, you can afford to fall. That said, I do not mean to say Java is better. C has its raison d'etre.
Working my way through a C tutorial
#include <stdio.h>
int main() {
short s = 10;
int i = *(int *)&s; // wonder about this
printf("%i", i);
return 0;
}
When I tell C that the address of s points to an int, should it not read 4 bytes?
Starting from the leftmost of the 2 bytes of s? In that case, is this not critically dangerous, since I don't know what it is reading beyond the 2 bytes the short occupies?
Should this not crash for trying to access memory that I haven't assigned/belong-to-me?
Don't do that ever
Throw away the tutorial if it teaches/preaches that.
As you pointed out, it will read more bytes than were actually allocated, so it reads some garbage from memory that does not belong to your variable.
In fact it is dangerous: it breaks the Strict Aliasing Rule [detail below] and causes undefined behavior.
The compiler should give you a warning like this.
warning: dereferencing type-punned pointer will break strict-aliasing rules
And you should always listen to your compiler when it cries out that warning.
[Detail]
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)
The exception to the rule is a pointer to a character type (char*, unsigned char*, signed char*), which is allowed to alias an object of any type.
First of all, never do this.
As to why it doesn't crash: since s is a local, it's allocated on the stack. If short and int have different sizes on your architecture (which is not a given), then you will probably end up reading a few more bytes from memory that's on the same memory page as the stack, so there will be no access violation (even though you will read garbage).
Probably.
This is dangerous and undefined behaviour, just as you said.
The reason why it doesn't crash on 32-bit (or 64-bit) platforms is that most compilers allocate at least 32 bits for each stack variable. This makes the access faster, but on, e.g., an 8-bit processor you would get garbage data in the upper bits instead.
No it's not going to crash your program, however it is going to be reading a portion of other variables (or possibly garbage) on the stack. I don't know what tutorial you got this from, but that kind of code is scary.
First of all, on mainstream platforms all data pointers are the same size, so on a 64-bit architecture each char *, short * or int * occupies 8 bytes.
When using a star before an ampersand it will cancel the effect, so *&x is semantically equivalent to just x.
Basically you are right in the sense that since you are accessing an int * pointer, this will fetch 4 bytes instead of the only 2 reserved for 's' storage and the resulting content won't be a perfect reflection of what 's' really means.
However this most likely won't crash since 's' is located on the stack so depending on how your stack is laid out at this point, you will most likely read data pushed during the 'main' function prologue...
For a program to crash due to an invalid memory read, you need to access a memory region that is not mapped, which triggers a 'segmentation fault' at the user-space level and a 'page fault' at the kernel level. By 'mapped' I mean there is a known mapping between a virtual memory region and a physical memory region (such mappings are handled by the operating system). That is why accessing a NULL pointer gives you such an exception: there is no valid mapping for that address in user space. A valid mapping will usually be given to you by calling something like malloc() (note that malloc() is not a syscall but a smart wrapper around the underlying system calls that manages your virtual memory blocks). Your stack is no exception, since it is just memory like anything else, but a region is pre-mapped for you, so when you create a local variable in a block you don't have to worry about its memory location. In this case you are not reading far enough to reach something unmapped.
Now let's say you do something like that:
short s = 10;
int *i = (int *)&s;
*i = -1;
Then your program is more likely to crash, since now you start overwriting data. Depending on the data you are touching, the effect might range from harmless program misbehavior to a program crash if, for instance, you overwrite the return address pushed on the stack... Data corruption is to me one of the hardest (if not the hardest) categories of bugs to deal with, since it can affect your system randomly, in a non-deterministic pattern, and might show up long after the original offending instructions were actually executed.
If you want to understand more about internal memory management, you probably want to look into Virtual Memory Management in Operating System designs.
Hope it helps,