I'm looking at this simple program. My understanding is that trying to modify the values at memory addresses past the maximum index should result in a segmentation fault. However, the following code runs without any problems. I am even able to print out all 6 of the array indexes 0->6
main()
{
int i;
int a[3];
for(i=0; i<6; i++)
a[i] = i;
}
However, when I change the for loop to
for(i=0; i<7; i++)
and executing the program, it will segfault.
This almost seems to me like it is some kind of extra padding done by malloc. Why does this happen only after the 6th index (s+6)? Will this behavior happen with longer/shorter arrays? Excuse my foolishness as lowly java programmer :)
This almost seems to me like it is some kind of extra padding done by malloc.
You did not call malloc(). You declared a as an array of 3 integers in stack memory, whereas malloc() uses heap memory. In both cases, accessing any element past the last one (the third one, a[2], in this case) is undefined behaviour. When it's done with heap memory, it usually causes a segmentation fault.
Well, malloc didn't do it, because you didn't call malloc. My guess is, the extra three writes were enough to chew through your frame pointer and your stack pointer, but not through your return address (but the seventh one hit that). Your program is not guaranteed to crash when you access out-of-bounds memory (though there are other languages which do guarantee it), any more than it is guaranteed not to crash. That's what undefined behavior is: unpredictable.
As others said, accessing an array beyond its limits is undefined behaviour. So what happens? If you access memory that you should not access, it depends on where that memory is.
Generally (but very generally, specifics can vary between systems and compilers), the following main things can happen:
It can be that you simply access other variables of your process that lie directly "behind" the array. If you write to that memory, you simply modify the values of the other variables. You will probably not get a segfault, so you may never notice why your program produces bad results or acts so weird or why, during debugging, your variables have values you never (knowlingly) assigned to them. This is, IMO, really bad, because you think everything is fine while it isn't.
It can be, especially on a stack, that you access other data on the stack, like saved processor registers or even the address to which the processor should return if the function ends. If you overwrite these with some other data, it is hard to tell what happens. In that case, a segfault is probably the lesser of all possible evils. You simply don't know what can happen.
If the memory beyond your array does not belong to your process then, on most modern computers, you will get a segfault or similar exception (e.g. an access violation, or whatever your OS calls it).
I may have forgotten a few more possible problems that can occur, but those are, IMO, the most usual things that happen if you write beyond array bounds.
It depends on the free memory available, if free memory available is less then it will give segmentation fault otherwise it will use the extra memory to store the data and it will not be giving segmentation fault.There is no need for malloc because array itself allocates memory.
In your system memory is available only for 6 integers and when you are trying to access to next memory(which is not accessible or say not free)it is giving segmentation fault.
Related
Consider the following:
int* x = calloc(3,sizeof(int));
x[3] = 100;
which is located inside of a function.
I get no error when I compile and run the program, but when I run it with valgrind I get an "Invalid write of size 4".
I understand that I am accessing a memory place outside of what I have allocated with calloc, but I'm trying to understand what actually happens.
Does some address in the stack(?) still have the value 100? Because there must certainly be more available memory than what I have allocated with calloc. Is the valgrind error more of a "Hey, you probably did not mean to do that"?
I understand that I am accessing a memory place outside of what I have allocated with calloc, but I'm trying to understand what actually happens.
"What actually happens" is not well-defined; it depends entirely on what gets overwritten. As long as you don't overwrite anything important, your code will appear to run as expected.
You could wind up corrupting other data that was allocated dynamically. You could wind up corrupting some bit of heap bookkeeping.
The language does not enforce any kind of bounds-checking on array accesses, so if you read or write past the end of the array, there are no guarantees on what will happen.
Does some address in the stack(?) still have the value 100?
First of all, calloc allocates memory on the heap not stack.
Now, regarding the error.
Sure most of the time there is plenty of memory available when your program is running. However when you allocate memory for x bytes, the memory manager looks for some free chunk of memory of that exact size(+ maybe some more if calloc requested larger memory to store some auxiliary info), there are no guaranties on what the bytes after that chunk are used for, and even no guaranties that they are not read-only or can be accessed by your program.
So anything can happen. In the case if the memory was just there waiting for it to be used by your program, nothing horrible happens, but if that memory was used by something else in your program, the values would be mess up, or worst of all the program could crash because of accessing something that wasn't supposed to be accessed.
So the valgrind error should be treated very seriously.
The C language doesn't require bounds checking on array accesses, and most C compilers don't implement it. Besides if you used some variable size instead of constant value 3, the array size could be unknown during compilation, and there would be no way to check if the access isn't out of bound.
There's no guarantees on what was allocated in the space past x[3] or what will be written there in the future. alinsoar mentioned that x[3] itself does not cause undefined behavior, but you should not attempt to fetch or store a value from there. Often you will probably be able to write and access this memory location without problems, but writing code that relies on reaching outside of your allocated arrays is setting yourself up for very hard to find errors in the future.
Does some address in the stack(?) still have the value 100?
When using calloc or malloc, the values of the array are not actually on the stack. These calls are used for dynamic memory allocation, meaning they are allocated in a seperate area of memory known as the "Heap". This allows you to access these arrays from different parts of the stack as long as you have a pointer to them. If the array were on the stack, writing past the bounds would risk overwriting other information contained in your function (like in the worst case the return location).
The act of doing that is what is called undefined behavior.
Literally anything can happen, or nothing at all.
I give you extra points for testing with Valgrind.
In practice, it is likely you will find the value 100 in the memory space after your array.
Beware of nasal demons.
You're allocating memory for 3 integer elements but accessing the 4th element (x[3]). Hence, the warning message from valgrind. Compiler will not complain about it.
int *array; //it allocate a pointer to an int right?
array=malloc(sizeof(int); //allocate the space for ONE int right?
scanf("%d", &array[4]); //must generate a segmentation fault cause there is not allocated space enough right?
printf("%d",array[4]); //it shouldn't print nothing...
but it print 4! why?
Reading or writing off the end of an array in C results in undefined behavior, meaning that absolutely anything can happen. It can crash the program, or format the hard drive, or set the computer on fire. In your case, it just so happens to be the case that this works out perfectly fine, probably because malloc pads the length of the allocated memory for efficiency reasons (or perhaps because you're trashing memory that malloc wants to use later on). However, this isn't at all portable, isn't guaranteed to work, and is a disaster waiting to happen.
Hope this helps!
Cause the operating system happens to have this memory you have asked. This code is by no means guaranteed to run on another machine or another time.
C doesn't check your code for getting of boundaries of an array size, that depends to the programmer. I guess you can call it an undefined behavior (although it's not exactly what they mean when they say an undefined behavior because mostly the memory would be in a part of the stack or the heap so you can still get to it.
When you say array[4] you actually say *(array + 4) which is of course later translated to *(array + 4*sizeof(int)) and you actually go to a certain space in the memory, the space exists, it could be maybe a read-only, maybe a place of another array or variable in your program or it might just work perfectly. No guarantee it'll be an error, but it's not what undefined behavior.
To understand more about undefined behavior you can go to this article (which I find very interesting).
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
behaviour of malloc(0)
I'm trying to understand memory allocation in C. So I am experimenting with malloc. I allotted 0 bytes for this pointer but yet it can still hold an integer. As a matter of fact, no matter what number I put into the parameter of malloc, it can still hold any number I give it. Why is this?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *ptr = (int*)malloc(0);
*ptr = 9;
printf("%i", *ptr); // 9
free(ptr);
return 0;
}
It still prints 9, what's up with that?
If size is 0, then malloc() returns either NULL, or a unique pointer
value that can later be successfully passed to free().
I guess you are hitting the 2nd case.
Anyway that pointer just by mistake happens to be in an area where you can write without generating segmentation fault, but you are probably writing in the space of some other variable messing up its value.
A lot of good answers here. But it is definitely undefined behavior. Some people declare that undefined behavior means that purple dragons may fly out of your computer or something like that... there's probably some history behind that outrageous claim that I'm missing, but I promise you that purple dragons won't appear regardless of what the undefined behavior will be.
First of all, let me mention that in the absence of an MMU, on a system without virtual memory, your program would have direct access to all of the memory on the system, regardless of its address. On a system like that, malloc() is merely the guy who helps you carve out pieces of memory in an ordered manner; the system can't actually enforce you to use only the addresses that malloc() gave you. On a system with virtual memory, the situation is slightly different... well, ok, a lot different. But within your program, any code in your program can access any part of the virtual address space that's mapped via the MMU to real physical memory. It doesn't matter whether you got an address from malloc() or whether you called rand() and happened to get an address that falls in a mapped region of your program; if it's mapped and not marked execute-only, you can read it. And if it isn't marked read-only, you can write it as well. Yes. Even if you didn't get it from malloc().
Let's consider the possibilities for the malloc(0) undefined behavior:
malloc(0) returns NULL.
OK, this is simple enough. There really is a physical address 0x00000000 in most computers, and even a virtual address 0x00000000 in all processes, but the OS intentionally doesn't map any memory to that address so that it can trap null pointer accesses. There's a whole page (generally 4KB) there that's just never mapped at all, and maybe even much more than 4KB. Therefore if you try to read or write through a null pointer, even with an offset from it, you'll hit these pages of virtual memory that aren't even mapped, and the MMU will throw an exception (a hardware exception, or interrupt) that the OS catches, and it declares a SIGSEGV (on Linux/Unix), or an illegal access (on Windows).
malloc(0) returns a valid address to previously unallocated memory of the smallest allocable unit.
With this, you actually get a real piece of memory that you can legally call your own, of some size you don't know. You really shouldn't write anything there (and probably not read either) because you don't know how big it is, and for that matter, you don't know if this is the particular case you're experiencing (see the following cases). If this is the case, the block of memory you were given is almost guaranteed to be at least 4 bytes and probably is 8 bytes or perhaps even larger; it all depends on whatever the size is of your implementation's minimum allocable unit.
malloc(0) intentionally returns the address of an unmapped page of
memory other than NULL.
This is probably a good option for an implementation, as it would allow you or the system to track & pair together malloc() calls with their corresponding free() calls, but in essence, it's the same as returning NULL. If you try to access (read/write) via this pointer, you'll crash (SEGV or illegal access).
malloc(0) returns an address in some other mapped page of memory
that may be used by "someone else".
I find it highly unlikely that a commercially-available system would take this route, as it serves to simply hide bugs rather than bring them out as soon as possible. But if it did, malloc() would be returning a pointer to somewhere in memory that you do not own. If this is the case, sure, you can write to it all you want, but you'd be corrupting some other code's memory, though it would be memory in your program's process, so you can be assured that you're at least not going to be stomping on another program's memory. (I hear someone getting ready to say, "But it's UB, so technically it could be stomping on some other program's memory. Yes, in some environments, like an embedded system, that is right. No modern commercial OS would let one process have access to another process's memory as easily as simply calling malloc(0) though; in fact, you simply can't get from one process to another process's memory without going through the OS to do it for you.) Anyway, back to reality... This is the one where "undefined behavior" really kicks in: If you're writing to "someone else's memory" (in your own program's process), you'll be changing the behavior of your program in difficult-to-predict ways. Knowing the structure of your program and where everything is laid out in memory, it's fully predictable. But from one system to another, things would be laid out in memory (appearing a different locations in memory), so the effect on one system would not necessarily be the same as the effect on another system, or on the same system at a different time.
And finally.... No, that's it. There really, truly, are only those four
possibilities. You could argue for special-case subset points for
the last two of the above, but the end result will be the same.
For one thing, your compiler may be seeing these two lines back to back and optimizing them:
*ptr = 9;
printf("%i", *ptr);
With such a simplistic program, your compiler may actually be optimizing away the entire memory allocate/free cycle and using a constant instead. A compiler-optimized version of your program could end up looking more like simply:
printf("9");
The only way to tell if this is indeed what is happening is to examine the assembly that your compiler emits. If you're trying to learn how C works, I recommend explicitly disabling all compiler optimizations when you build your code.
Regarding your particular malloc usage, remember that you will get a NULL pointer back if allocation fails. Always check the return value of malloc before you use it for anything. Blindly dereferencing it is a good way to crash your program.
The link that Nick posted gives a good explanation about why malloc(0) may appear to work (note the significant difference between "works" and "appears to work"). To summarize the information there, malloc(0) is allowed to return either NULL or a pointer. If it returns a pointer, you are expressly forbidden from using it for anything other than passing it to free(). If you do try to use such a pointer, you are invoking undefined behavior and there's no way to tell what will happen as a result. It may appear to work for you, but in doing so you may be overwriting memory that belongs to another program and corrupting their memory space. In short: nothing good can happen, so leave that pointer alone and don't waste your time with malloc(0).
The answer to the malloc(0)/free() calls not crashing you can find here:
zero size malloc
About the *ptr = 9, is just like overflowing a buffer (like malloc'ing 10 bytes and access the 11th), you are writing to memory you don't own, and doing that is looking for trouble. In this particular implementation malloc(0) happens to return a pointer instead of NULL.
Bottom line, it is wrong even if it seems to work on a simple case.
Some memory allocators have the notion of "minimum allocatable size". So, even if you pass zero, this will return pointer to the memory of word-size, for example. You need to check up with your system allocator documentation. But if it does return pointer to some memory it'd be wrong to rely on it as the pointer is only supposed to be passed either to be passed realloc() or free().
I have an array that's declared as char buff[8]. That should only be 8 bytes, but looking as the assembly and testing the code, I get a segmentation fault when I input something larger than 32 characters into that buff, whereas I would expect it to be for larger than 8 characters. Why is this?
What you're saying is not a contradiction:
You have space for 8 characters.
You get an error when you input more than 32 characters.
So what?
The point is that nobody told you that you would be guaranteed to get an error if you input more than 8 characters. That's simply undefined behaviour, and anything can (and will) happen.
You absolutely mustn't think that the absence of obvious misbehaviour is proof of the correctness of your code. Code correctness can only be verified by checking the code against the rules of the language (though some automated tools such as valgrind are an immense help).
Writing beyond the end of the array is undefined behavior. Undefined behavior means nothing (including a segmentation fault) is guaranteed.
In other words, it might do anything. More practical, it's likely the write didn't touch anything protected, so from the point of view of the OS everything is still OK until 32.
This raises an interesting point. What is "totally wrong" from the point of view of C might be OK with the OS. The OS only cares about what pages you access:
Is the address mapped for your process ?
Does your process have the rights ?
You shouldn't count on the OS slapping you if anything goes wrong. A useful tool for this (slapping) is valgrind, if you are using Unix. It will warn you if your process is doing nasty things, even if those nasty things are technically OK with the OS.
C arrays have no bound checking.
As other said, you are hitting undefined behavior; until you stay inside the bounds of the array, everything works fine. If you cheat, as far as the standard is concerned, anything can happen, including your program seeming to work right as well as the explosion of the Sun.
What happens in practice is that with stack-allocated variables you are likely to overwrite other variables on the stack, getting "impossible" bugs, or, if you hit a canary value put by the compiler, it may detect the buffer overflow on return from the function. For variables allocated in the so-called heap, the heap allocator may have given some more room than requested, so the mistake may be less easy to spot, although you may easily mess up the internal structures of the heap.
In both cases you can also hit a protected memory page, which will result in your program being terminated forcibly (for the stack this happens less often because usually you have to overwrite the entire stack to get to a protected page).
Your declaration char buff[8] sounds like a stack allocated variable, although it could be heap allocated if part of a struct. Accessing out of bounds of an array is undefined behaviour and is known as a buffer overrun. Buffer overruns on stack allocated memory may corrupt the current stack frame and possibly other stack frames in the call stack. With undefined behaviour, anything could happen, including no apparent error. You would not expect a seg fault immediately because the stack is typically when the thread starts.
For heap allocated memory, memory managers typically allocate large blocks of memory and then sub-allocate from those larger blocks. That is why you often don't get a seg fault when you access beyond the end of a block of memory.
It is undefined behaviour to access beyond the end of a memory block. And it is perfectly valid, according to the standard, for such out of bounds accesses to result in seg faults or indeed an apparently successful read or write. I say apparently successful because if you are writing then you will quite possibly produce a heap corruption by writing out of bounds.
Unless you are not telling us something you answered your owflown question.
declaring
char buff[8] ;
means that the compiler grabs 8 bytes of memory. If you try and stuff 32 char's into it you should get a seg fault, that's called a buffer overflow.
Each char is a byte ( unless you are doing unicode in which it is a word ) so you are trying to put 4x the number of chars that will fit in your buffer.
Is this your first time coding in C ?
Why can i do this?
int array[4]; // i assume this creates an array with 4 indexes
array[25]=1; // i have assigned an index larger than the int declaration
NSLog(#"result: %i", array[25]); // this prints "1" to the screen
Why does this work, if the index exceeds the declaration? what is the significance of the number in the declaration if it has no effect on what you can actually do with the array?
Thanks.
You are getting undefined behavior. It could print anything, it could crash, it could burst into singing (okay that isn't likely but you get the idea).
If it happens to write to a location that is mapped with the adequate permissions it will work. Until one day when it won't because of a different layout.
it is undefined. some OS will give you segmentation fault, while some tolerate this. anyhow, exceeding the array's size should be avoided.
An array is really just a pointer to the start of a contiguous, allocated block of memory.
In this case, you have allocated 4 ints worth of memory.
So if you went array[2] it would think "the memory at array + sizeof(int) * 2"
Change the 2 to 25, and you're just looking at 25 int's worth of memory past the start. Since there's no checks to verify you're in bounds (either when assigning or printing) it works.
There's nothing to say somehting else might be allocated there though!
The number in the decleration determines how many memory should be reserved, in that case 4 * sizeof(int).
If you write to memory out of the bounds, this is possible but not recommended. In C you can access any point in memory available to your program, but if you write to that if it's not reserved for that thing, you can cause memory corruption. Arrays are pointers (but not the other way around).
The behavior depends on the compiler, the platform and some randomness. Don't do it.
It's doing very bad things. If the array is declared locally to a function, you're probably writing on stack locations above the current function's stack frame. If it is declared globally, you're probably writing on memory allocated to adjacent variables. Either way, it is "working" by pure luck.
It is possible your C compiler had padded your array for memory alignment purposes, and by luck your array overrun just happens to still be within the rounded-up allocation. Don't rely on it though.
This is unsafe programming. It really should be avoided because it may not crash your program. Which is really the best thing you could hope for. It could give you garbage results. These are unpredictable and could really screw up your program. However since you don't know that is wrong because it not crashing it will ruin the integrity of your data. Since there is no try/catch with C you really should check inputs. Remember scanf returns an int.
C by design does not perform array bounds checking. It was designed as a systems level language, and the overhead of explicit run-time bounds checking could be prohibitive in a great many cases. Consequently C will allow 'dangerous' code and must be used carefully. If you need something 'safer' then C# or Java may be more appropriate, but there is a definite performance hit.
Automated static analysis tools may help, and there are run-time bounds checking tools for use in development and debugging.
In C an array is a contiguous block of memory. By accessing the array out-of-bounds, you are simply accessing memory beyond the end of the array. What accessing such memory will do is non-deterministic, it may be junk, it may belong to an adjacent variable, or it may belong to the variables of the calling function or above. It maybe a return address for the current function or a calling function. In a memory protected OS such as Windows or Linux, if you access so far out of bounds as to be no longer within the address range assigned to the process, a fault exception will occur.