I'm trying to get to grips with malloc, and so far I'm mostly getting unexpected results when testing and playing around with it.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    int size = 10;
    int *A;
    A = (int *)malloc(size * sizeof(int));
    for (int i = 0; i < 10000; i++)
    {
        A[i] = i;
    }
    for (int i = 0; i < 10000; i++)
    {
        printf("%d) %d\n", i, A[i]);
    }
}
With the example code above, the program runs without an error, even though I only allocated A to hold 10 ints, so I expected the loop to only run 10 times before hitting an error. If I increase the loop bound to about 30-40k instead, it does hit a segmentation fault. And if I increase size to match the loop count, it always works as expected. So I know how to avoid the error, kinda; I was just hoping someone might be kind enough to explain why this is.
Edit: It turned out I didn't appreciate that C doesn't detect out-of-bounds access; I've been looked after way too much by Java and C++. What I had was undefined behavior, and I now know it's my job to prevent it. Thanks to everyone who replied.
C isn't required to perform any bounds checking on array accesses. It can let you read and write past the end of an array without any warning or error.
You're invoking undefined behavior by reading and writing past the end of allocated memory. This means the behavior of your program can't be predicted. It could crash, it could output strange results, or it could (as in your case) appear to work properly.
Just because the program can crash doesn't mean it will.
The code runs without an error, but it is still wrong. You just don't notice it. Your loop runs past the allocated area, but the system remains unaware of that fact until you run out of the much larger area your program can potentially access.
Picture it this way:
<UNAVAILABLE><other data>1234567890<other data><UNAVAILABLE>
Your 10 ints are in the middle of other data, which you can read and even write, with very unpleasant effects. C is not holding your hand here: only once you go outside the total memory available to your program will it crash, not before.
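The fix is to size the allocation to match what the loops actually touch, and to check that malloc succeeded before using the memory. A minimal corrected sketch of the code above:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int size = 10000;                 /* as many ints as the loops use */
    int *A = malloc(size * sizeof *A);
    if (A == NULL)                    /* malloc can fail; check first */
    {
        perror("malloc");
        return 1;
    }
    for (int i = 0; i < size; i++)
    {
        A[i] = i;
    }
    for (int i = 0; i < size; i++)
    {
        printf("%d) %d\n", i, A[i]);
    }
    free(A);                          /* release the block when done */
    return 0;
}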
Undefined behavior doesn't mean "guaranteed segmentation fault"; it may work in some cases.
There is no way of knowing how far beyond an array's bounds you can go before you finally crash; even dereferencing one element beyond a boundary is undefined behavior.
Also: if malloc succeeds, it will allocate at least as much space as you requested, possibly more.
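On glibc you can even observe that last point with malloc_usable_size, which reports how much usable space the allocator actually handed out; note this function is a glibc extension, not standard C, and other C libraries may not provide it:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific, declares malloc_usable_size */

int main(void)
{
    int *A = malloc(10 * sizeof *A);
    if (A == NULL)
        return 1;
    /* Often prints a usable size larger than the 40 bytes requested,
       because the allocator rounds requests up to its bucket sizes. */
    printf("requested %zu, usable %zu\n",
           10 * sizeof *A, malloc_usable_size(A));
    free(A);
    return 0;
}

Even when the usable size is larger, writing past what you requested is still undefined behavior as far as the C standard is concerned.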
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *a;
    a = (int *)malloc(100 * sizeof(int));
    int i = 0;
    for (i = 0; i < 100; i++)
    {
        a[i] = i + 1;
        printf("a[%d] = %d\n", i, a[i]);
    }
    a = (int *)realloc(a, 75 * sizeof(int));
    for (i = 0; i < 100; i++)
    {
        printf("a[%d] = %d\n", i, a[i]);
    }
    free(a);
    return 0;
}
In this program I expected a segmentation fault, because I'm trying to access elements of the array that were freed using realloc(). But the output is pretty much the same, except for a few of the final elements!
So my doubt is whether the memory is actually getting freed? What exactly is happening?
The way realloc works is that it guarantees that a[0]..a[74] will have the same values after the realloc as they did before it.
However, the moment you try to access a[75] after the realloc, you have undefined behaviour. This means that the program is free to behave in any way it pleases, including segfaulting, printing out the original values, printing out some random values, not printing anything at all, launching a nuclear strike, etc. There is no requirement for it to segfault.
So my doubt is whether the memory is actually getting freed?
There is absolutely no reason to think that realloc is not doing its job here.
What exactly is happening?
Most likely, the memory is getting freed by shrinking the original memory block in place, without wiping out the now-unused final 25 array elements. As a result, the undefined behaviour manifests itself by printing out the original values. It is worth noting that even the slightest change to the code, the compiler, the runtime library, the OS, etc. could make the undefined behaviour manifest itself differently.
You may get a segmentation fault, but you may not. The behaviour is undefined, which means anything can happen, but I'll attempt to explain what you might be experiencing.
There's a mapping between your virtual address space and physical pages, and that mapping usually has a granularity of at least 4096 bytes per page (well, there's also swapping to disk, but let's ignore that for the moment).
You get a segmentation fault if you attempt to access virtual address space that doesn't map to a physical page. So your call to realloc may not have resulted in a physical page being returned to the system, meaning it's still mapped into your program and can be used. However, a following call to malloc could hand out that space again, or it could be reclaimed by the system at any time. In the former case you'd possibly overwrite another variable; in the latter case you'd segfault.
Accessing an array beyond its bounds is undefined behaviour. You might encounter a runtime error. Or you might not. The memory manager may well have decided to re-use the original block of memory when you re-sized. But there's no guarantee of that. Undefined behaviour means that you cannot reason about or predict what will happen. There's no grounds for you to expect anything to happen.
Simply put, don't access beyond the end of the array.
Some other points:
The correct main declaration here is int main(void).
Casting the value returned by malloc is not needed and can mask errors. Don't do it.
Always store the return value of realloc into a separate variable so that you can detect NULL being returned and so avoid losing and leaking the original block.
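That last point deserves a sketch, because realloc returns NULL on failure while leaving the original block intact, so assigning the result straight back to a would lose your only pointer to that block. A minimal example of the idiom:

#include <stdlib.h>

int main(void)
{
    int *a = malloc(100 * sizeof *a);
    if (a == NULL)
        return 1;

    int *tmp = realloc(a, 75 * sizeof *a);  /* shrink to 75 elements */
    if (tmp == NULL)
    {
        free(a);       /* a is still valid, so it can be freed or used */
        return 1;
    }
    a = tmp;           /* only overwrite a once realloc has succeeded */

    /* a[0]..a[74] are usable here; a[75] and beyond are not */
    free(a);
    return 0;
}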
Can someone explain why this code works, even though I allocate memory for only 2 cells in the st array?
#include <stdlib.h>

int main(void){
    int *st = (int *)malloc(sizeof(int) * 2);
    int j;
    for (j = 0; j < 5; j++){
        st[j] = j * 10000;
    }
}
While the next code won't work...
#include <stdio.h>
#include <stdlib.h>

int main(void){
    int *st = (int *)malloc(sizeof(int) * 2);
    int j;
    for (j = 0; j < 6; j++){
        st[j] = j * 10000;
    }
    for (j = 0; j < 6; j++)
        printf("st[%d]=%d\n", j, st[j]);
}
As I understand it, I should not be able to put a number in st[j] for j > 1.
Thanks a lot!
Accessing unallocated memory is "undefined behavior". That means the program can exit with a runtime error, but it doesn't have to.
Many compilers build code with certain safeguards around allocated memory to cause crashes when writing beyond it, but these aren't guaranteed to work in every situation. In your particular case it seems that going 12 bytes over the array boundary doesn't trigger them, but going 16 bytes does. That's also something you can't and shouldn't rely on: depending on other circumstances, another program, the same program compiled with other options, or even the same program executed at a different time might behave differently.
In your first example, the memory you write to is never read again. A standard-compliant compiler that can detect this is allowed to eliminate the code, as long as it behaves identically. Since the program is not required to crash, the compiler can replace it with an empty program.
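If you're curious, one way to check what your particular compiler does (assuming GCC or Clang; the exact behavior varies by compiler and version) is to compare the generated assembly at different optimization levels:

/* Compare the assembly at -O0 and -O2 (assuming GCC or Clang):
     cc -O0 -S st.c
     cc -O2 -S st.c
   At -O2 the store loop may vanish entirely: nothing ever reads
   the values, so the compiler may treat the writes as dead code. */
#include <stdlib.h>

int main(void)
{
    int *st = malloc(sizeof(int) * 2);
    if (st == NULL)
        return 1;
    for (int j = 0; j < 5; j++)
    {
        st[j] = j * 10000;   /* j >= 2 writes past the end: undefined */
    }
    free(st);
    return 0;
}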
You are writing past the space allocated for the array. That invokes undefined behavior, and in that case anything could happen.
I'm learning C and trying to build a dynamic array. I found a great tutorial on this, but I don't get it all the way. The code I have now is
#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int size;
    int capacity;
    char *data;
}Brry;

void brry_init(Brry *brry){
    brry->size = 0;
    brry->capacity = 2;
    brry->data = (char *)calloc(brry->capacity, sizeof(char));
}

void brry_insert(Brry *brry, char value){
    brry->data[brry->size++] = value; //should check here whether there is enough memory, but left out while testing something
}

int main(void){
    Brry brry;
    brry_init(&brry);
    for (int i = 0; i < 3; i++) {
        brry_insert(&brry, 'a');
    }
    printf("%c\n", brry.data[2]);
    return 0;
}
In my main function I add 3 elements to the array, but I only allocated space for 2. Yet when I print it, it works just fine? I expected some strange value to be printed. Why is this, or am I doing something wrong?
You are writing into a buffer you didn't allocate enough memory for. That it works is not guaranteed.
What you're doing now is reading some junk value from memory, and who knows what's there; sometimes that leads to a segmentation fault, and other times you're lucky and just get a junk value back without a segfault.
Writing into memory you don't own likewise invokes undefined behavior, so watch out for it.
If you do get an error, it will almost always be a segfault, short for segmentation fault; that's worth reading up on.
The technical term for what you're doing when you read past the bounds of the array is dereferencing a pointer into memory you don't own, which is also worth reading more about.
Yes, you are indeed writing to the third element of a two element array. This means your program will exhibit undefined behavior and you have no guarantee of what is going to happen. In your case you got lucky and the program "worked", but you might not always be so lucky.
Trying to read/write past the end of the array results in undefined behaviour. Exactly what happens depends on several factors which you cannot predict or control. Sometimes, it will seem to read and/or write successfully without complaining. Other times, it may fail horribly and effectively crash your program.
The critical thing is that you should never try to use or rely on undefined behaviour. It's unfortunately a common rookie mistake to think that it will always work because one test happened to succeed. That's definitely not the case, and is a recipe for disaster sooner or later.
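To make the example from the question well-defined, brry_insert needs the capacity check its comment alludes to. A sketch of one common approach, doubling the buffer with realloc when it fills (error handling kept deliberately minimal):

#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int size;
    int capacity;
    char *data;
}Brry;

void brry_init(Brry *brry){
    brry->size = 0;
    brry->capacity = 2;
    brry->data = calloc(brry->capacity, sizeof(char));
}

int brry_insert(Brry *brry, char value){ /* 0 on success, -1 on failure */
    if (brry->size == brry->capacity) {
        int newcap = brry->capacity * 2;          /* double when full */
        char *tmp = realloc(brry->data, newcap);  /* old pointer survives failure */
        if (tmp == NULL)
            return -1;
        brry->data = tmp;
        brry->capacity = newcap;
    }
    brry->data[brry->size++] = value;
    return 0;
}

int main(void){
    Brry brry;
    brry_init(&brry);
    for (int i = 0; i < 3; i++) {
        brry_insert(&brry, 'a');
    }
    printf("%c\n", brry.data[2]);   /* index 2 is now inside the buffer */
    free(brry.data);
    return 0;
}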
#include <stdio.h>

int main(int argc, char const *argv[])
{
    int anArray[5];
    anArray[0] = 54;
    anArray[1] = 54;
    anArray[2] = 54;
    anArray[3] = 54;
    anArray[4] = 54;
    anArray[5] = 54;
    anArray[6] = 54;
    anArray[7] = 54;
    printf("%i\n", anArray[7]);
    return 0;
}
This prints 54.
How does this even work? We say that C arrays are not dynamic, so why does this even compile? And even if it compiles, shouldn't it throw a seg fault?
I have defined an array of 5 elements, then accessed elements 5, 6 and 7. Why is it possible to assign a value to, for example, anArray[5]?
Please note that I have a C++ background and I haven't used this kind of array for a long time.
You are scribbling into memory that you don't own, so anything could happen. You got lucky and the computer let you write and then read the value in that location. But it's just luck: the behavior is undefined.
Note that the exact same thing applies to C++ (since you mentioned it), not only with C-style arrays but also with std::vector::operator[] and std::array in C++11. In C++ you can use vec.at(idx) instead of vec[idx] to always get bounds checking.
The language itself doesn't say the runtime or the compiler has to check you're actually accessing elements inside the bounds of the array. The compiler could emit a warning, but that's it. You are responsible for accessing valid elements. Not doing so results in undefined behavior, which means anything can happen, including appearing to work.
You're basically reading memory at places where you don't know what's there. This can be a useful thing in C (if you really know what you're doing), but it can also cost you hours of frustrating debugging, because what happens there is undefined behaviour.
From Wikipedia:
Many programming languages, such as C, never perform automatic bounds checking to raise speed. However, this leaves many off-by-one errors and buffer overflows uncaught. Many programmers believe these languages sacrifice too much for rapid execution.
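If you want the kind of checking those safer languages do, modern toolchains can bolt it on at test time: AddressSanitizer (assuming a reasonably recent GCC or Clang) instruments every memory access and aborts with a diagnostic on the first out-of-bounds read or write, instead of letting the program appear to work:

/* Build and run with AddressSanitizer (assuming GCC or Clang):
     cc -g -fsanitize=address oob.c && ./a.out
   The run aborts at the marked line with a stack-buffer-overflow
   report instead of silently printing 54. */
#include <stdio.h>

int main(void)
{
    int anArray[5];
    for (int i = 0; i < 5; i++)
        anArray[i] = 54;
    anArray[5] = 54;   /* one past the end: ASan aborts here */
    printf("%i\n", anArray[0]);
    return 0;
}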
No compile-time error: nothing here violates any rule the compiler is required to check, so it has nothing to complain about.
No run-time error: because of undefined behavior, you were simply lucky that the memory locations you were trying to access happened to be accessible at the time!
In the code snippet, I expected a segmentation fault as soon as the program tried to assign a value to count[1]. However, the code continues and executes the second for-loop, only indicating a segmentation fault when the program terminates.
#include <stdio.h>

int main()
{
    int count[1];
    int i;
    for (i = 0; i < 100; i++)
    {
        count[i] = i;
    }
    for (i = 0; i < 100; i++)
    {
        printf("%d\n", count[i]);
    }
    return 0;
}
Could someone explain what is happening?
Reasons for edit:
Improved the example code as per the comments of other users,
int count[0] -> int count[1],
to avoid flame wars.
You're writing beyond the bounds of the array. That doesn't mean you're going to get a segmentation fault. It just means that you have undefined behavior. Your program's behavior is no longer constrained by the C standard. Anything could happen (including the program seeming to work) -- a segfault is just one possible outcome.
In practice, a segmentation fault occurs when you try to access a memory page that is not mapped into your process by the OS. Each page is 4KB on a typical x86 PC, so essentially your process is given memory in 4KB chunks, and you only get a segfault when you touch a page that isn't mapped.
With the small indices you're using, you're still staying within the current memory page, which is mapped into your process, and so the CPU doesn't detect that you're accessing memory out of bounds.
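The page size isn't 4KB everywhere, by the way; on POSIX systems you can query it with sysconf. A minimal sketch:

#include <stdio.h>
#include <unistd.h>   /* sysconf is POSIX, not standard C */

int main(void)
{
    /* Typically prints 4096 on x86 Linux, but other values
       (e.g. 16384 on some ARM machines) are possible. */
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}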
When you write beyond the array bounds, you are probably still writing into memory that is under your process's control, but you are almost certainly overwriting data used by other parts of the program, such as heap bookkeeping or stack-frame management data. It is only when that data is used, such as when the current function attempts to return, that your code might go awry. Actually, you really hope for a seg fault.
Your code is broken:
seg.c:5: warning: ISO C forbids zero-size array ‘count’
Always compile with high warning levels, for example -Wall -pedantic for GCC.
Edit:
What you are effectively doing is corrupting main's stack frame. Since the stack nowadays pretty much always grows downward, this is what's happening:
The first loop overwrites the stack memory holding main's parameters and the return address back into the crt0 startup routines.
The second loop happily reads that memory back.
When main returns, the segmentation fault is triggered, since the return address is fubar-ed.
This is a classic case of buffer overrun and is the basis of many network worms.
Run the program under a debugger and check the addresses of the local variables. In GDB you can say set backtrace past-main so that backtrace shows you all the routines leading up to main.
By the way, the same effect can be achieved without a zero-length array: just make the array's size smaller than the number of loop iterations.
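For completeness, the well-defined version of the program simply sizes the array to match the loops, ideally via a single constant so the two can't drift apart:

#include <stdio.h>

#define N 100   /* one constant keeps the array and both loops in sync */

int main(void)
{
    int count[N];
    int i;
    for (i = 0; i < N; i++)
    {
        count[i] = i;
    }
    for (i = 0; i < N; i++)
    {
        printf("%d\n", count[i]);
    }
    return 0;
}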