I was writing some code and I used the function calloc.
I understand that, when the first and the second arguments passed to this function are both zero, the function is going to allocate the necessary space for 0 elements, each of them with size 0, but here is the strange thing.
This program works fine even if n > 0. Why is that happening? I think it should display an error because I'm trying to write to a position of the array that doesn't exist. Thanks!
#include <stdio.h>
#include <stdlib.h>

int main(){
    int n;
    scanf("%d", &n);
    int *test = calloc(0, 0);
    for(int i = 0; i < n; i++){
        test[i] = 100;
        printf("%d ", test[i]);
    }
    return 0;
}
In C, a lot of wrong and "wrong" things don't produce error messages from the compiler. In fact, you may not even see error messages when you run the program -- but another person running your program on a different computer may see the error.
One important concept in C is called undefined behavior. To put it in simple terms, it means that the behavior of your program is unpredictable, but you can read more about this subject in this question: Undefined, unspecified and implementation-defined behavior.
Your program is undefined for two reasons:
When either size or nmemb is zero, calloc() may return NULL. Your program is not checking the return value of calloc(), so there's a good chance that when you do test[i] you are attempting to dereference a NULL pointer -- you should always check the result of calloc() and malloc() (see the sketch below).
When you call malloc() or calloc(), you are essentially allocating dynamic memory for an array. You can use your pointer to access the elements of the array, but you can't access anything past the end of it. That is, if you allocate n elements you should not try to access the n+1-th element -- not even for reading.
Both items above make your program invoke undefined behavior. There might also be something else undefined about accessing a zero-sized object, besides item #2 listed above, but I'm unsure.
You should always be careful about undefined behavior because, when your program invokes UB, it is essentially unpredictable. You could see a compilation error, the program could give an error message, it could run successfully without any problems, or it could wipe out every single file on your hard disk.
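For reference, here is a hedged sketch of the same program with both issues addressed: the allocation is sized to n and its result is checked before use (the input validation on scanf is an extra precaution I added, not something the original program did):

#include <stdio.h>
#include <stdlib.h>

int main(void){
    int n;
    if (scanf("%d", &n) != 1 || n <= 0)
        return 1;

    int *test = calloc(n, sizeof *test);   /* room for n ints, not zero */
    if (test == NULL) {                    /* always check the result   */
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (int i = 0; i < n; i++) {
        test[i] = 100;                     /* i now always stays within bounds */
        printf("%d ", test[i]);
    }

    free(test);
    return 0;
}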
Related
I was reading through some source code and found functionality that basically allows you to use a plain pointer as if it were an array? The code works as follows:
#include <stdio.h>

int
main (void)
{
  int *s;

  for (int i = 0; i < 10; i++)
    {
      s[i] = i;
    }

  for (int i = 0; i < 10; i++)
    {
      printf ("%d\n", s[i]);
    }

  return 0;
}
I understand that s points to the beginning of an array in this case, but the size of the array was never defined. Why does this work and what are the limitations of it? Memory corruption, etc.
Why does this work
It does not; it only appears to work (which is actually bad luck).
and what are the limitations of it? Memory corruption, etc.
Undefined behavior.
Keep in mind: in your program, whatever memory location you try to use must be defined. Either you make use of compile-time allocation (scalar variable definitions, for example), or, for pointer types, you make them point to some valid memory (the address of a previously defined variable) or allocate memory at run-time (using allocator functions). Using any arbitrary memory location, which is indeterminate, is invalid and will cause UB.
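As an illustration of the two valid options mentioned above (a minimal sketch; the variable names are made up):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int value = 42;
    int *p = &value;                   /* valid: points to an existing object */
    printf("%d\n", *p);

    int *s = malloc(10 * sizeof *s);   /* valid: memory allocated at run-time */
    if (s == NULL)
        return 1;
    for (int i = 0; i < 10; i++)
        s[i] = i;
    printf("%d\n", s[9]);
    free(s);

    return 0;
}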
I understand that s points to the beginning of an array in this case
No. The pointer has automatic storage duration and was not initialized:
int *s;
So it has an indeterminate value and points nowhere.
but the size of the array was never defined
There is no array declared or defined anywhere in the program.
Why does this work and what are the limitations of it?
It works by chance. That is, it produced the expected result when you ran it. But actually the program has undefined behavior.
As I first pointed out in the comments, what you are doing does not work; it only seems to work, but it is in fact undefined behaviour.
In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is not prescribed by the language specification to which the code adheres.
Hence, it might "work" sometimes, and sometimes not. Consequently, one should never rely on such behaviour.
If it were that easy to allocate a dynamic array in C, why would anyone use malloc?! Try it with a value much bigger than 10 to increase the likelihood of triggering a segmentation fault.
Look into the SO thread to see how to properly allocate an array in C.
I've been digging into memory allocation and pointers in C. I was under the impression that if you do not allocate enough memory for a value and then try to put that value in that memory cell, the program would either crash or behave incorrectly.
But what I get is a seemingly correct output where I'd expect something else.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    // Here we intentionally allocate only 1 byte,
    // even though an `int` takes up 4 bytes
    int *address = malloc(1);

    address[0] = 16777215; // this value needs 3 bytes, so it surely cannot fit in 1 byte
    address[1] = 1337;     // just for demo, let's put a random other number in the next memory cell

    printf("%i\n", address[0]); // Prints 16777215. How?! Didn't we overwrite a part of the number?
    return 0;
}
Why does this work? Does malloc actually allocate more than the number of bytes that we pass to it?
EDIT
Thanks for the comments! But I wish to note that being able to write to unallocated memory is not the part that surprises me, and it's not part of the question. I know that writing out of bounds is possible and that it is "undefined behavior".
For me, the unexpected part is that the line address[1] = 1337; does not in any way corrupt the int value at address[0].
It seems that the explanations for this diverge, too.
@Mini suggests that the reason for this is that malloc actually allocates more than what's passed, because of cross-platform differences.
@P__J__ in the comments says that address[1] for some reason points to the next sizeof(int) bytes, not to the next byte. But I don't think I understand what controls this behavior then, because malloc doesn't seem to know anything about what types we will put into the allocated blocks.
EDIT 2
So thanks to the comments, I believe I understand the program behavior now.
The answer lies in pointer arithmetic. The program "knows" that address is a pointer to int, and therefore adding 1 to it (or accessing it via address[1]) gives the address of the block that lies 4 (sizeof(int)) bytes ahead.
And if we really wanted to, we could move just one byte and really corrupt the value at address[0] by coercing address to char *, as described in this answer.
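A small sketch of that pointer arithmetic, this time with enough memory allocated so the accesses are valid (purely illustrative; it assumes int is at least 4 bytes, and the exact corrupted value depends on endianness):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *address = malloc(2 * sizeof *address);   /* this time, room for two ints       */
    if (address == NULL)
        return 1;

    address[0] = 16777215;                        /* 0x00FFFFFF                          */
    address[1] = 1337;                            /* lands sizeof(int) bytes further on  */

    char *bytes = (char *)address;                /* view the same memory byte by byte   */
    bytes[3] = 1;                                 /* poke the 4th byte of address[0]     */

    printf("%d\n", address[0]);                   /* no longer prints 16777215           */

    free(address);
    return 0;
}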
Thanks to all and to @P__J__ and @Blastfurnace in particular!
malloc often allocates more than you actually ask for (it's all system/environment/OS dependent), which is why it works in your scenario (sometimes). However, this is still undefined behavior: it may actually allocate only 1 byte, and you are then writing to what may not be allocated heap memory.
C doesn't mandate any kind of bounds checking on array accesses, and it's possible to overflow storage and write into memory you don't technically own. As long as you don't clobber anything "important", your code will appear to work as intended.
However, the behavior on buffer overruns is undefined, so the results will not generally be predictable or repeatable.
I recently wrote this code in C:
#include <stdio.h>

#define N_ROWS 100

int main() {
    char *inputFileName = "triangle_data.txt";
    FILE *inputFile = fopen(inputFileName, "r");
    if (inputFile == NULL) {
        printf("ERROR: Failed to open \"%s\".\n", inputFileName);
        return -1;
    }

    int triangle[(N_ROWS*(N_ROWS+1))/2 - 1];
    size_t size = sizeof(triangle)/sizeof(int);
    size_t index;
    for (index = 0; !feof(inputFile); ++index) {
        fscanf(inputFile, "%d", &triangle[index]);
    }

    return 1;
}
and was expecting a segmentation fault, since (N_ROWS*(N_ROWS+1))/2 is just enough space to hold the data in the file, but as you can see I made the array one element smaller. Somehow this doesn't trigger a segmentation fault. It does if I replace the body of the for-loop with:
int tmp;
fscanf(inputFile, "%d", &tmp);
triangle[index] = tmp;
What is happening here? If I make the array three elements too small it still doesn't trigger a segmentation fault. Five elements too small triggers one. I'm sure there is enough data in the file.
As a test I printed the array afterwards, and if I chose a smaller array there were elements missing.
What is happening here?
PS: Compiled with clang on OS X.
A segmentation fault doesn't mean that you accessed an array out of bounds, it means that you've accessed a virtual memory address that isn't mapped. Often accessing an array out of bounds will cause this, but just because you aren't seeing a segfault it doesn't mean that all of your memory accesses are valid.
As to why you're seeing the different behavior, it's hard to say, and it isn't necessarily a worthwhile use of time to try to justify different results when the results are specified as undefined. If you're really curious about what's going on, you could look at the assembly generated by the two versions of your code (use the -save-temps argument to clang).
What is happening here?
Your program invokes undefined behavior because you are writing outside your array object. Undefined behavior in C is undefined: your program can work today and crash all the other days, or even print Shakespeare's complete works.
The behaviour of your program (accessing an array element out of bounds) is undefined.
There is no particular requirement that undefined behaviour result in a segmentation fault, or any other observable error condition.
Undefined behaviour means, quite literally, that the C standard imposes no requirements on what is allowed to occur. That means anything can happen, including appearing to work correctly, or working in one circumstance but not another.
The trick is not to worry about the particular potential causes of segmentation faults (or any other error condition that any instance of undefined behaviour might trigger). It is to ensure the program has well-defined behaviour, so such symptoms are guaranteed not to occur.
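As a sketch of what that looks like for the program above (keeping the names from the question), the array can be given its full size and the loop can check both the array bound and the fscanf return value instead of relying on feof():

#include <stdio.h>

#define N_ROWS 100

int main(void)
{
    FILE *inputFile = fopen("triangle_data.txt", "r");
    if (inputFile == NULL)
        return 1;

    int triangle[(N_ROWS * (N_ROWS + 1)) / 2];            /* full size, no off-by-one */
    size_t size = sizeof(triangle) / sizeof(triangle[0]);
    size_t index = 0;

    /* stop at the end of the array or when fscanf stops matching numbers */
    while (index < size && fscanf(inputFile, "%d", &triangle[index]) == 1)
        ++index;

    fclose(inputFile);
    printf("read %zu values\n", index);
    return 0;
}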
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *a;
    a = (int *)malloc(100*sizeof(int));

    int i = 0;
    for (i = 0; i < 100; i++)
    {
        a[i] = i+1;
        printf("a[%d] = %d \n " , i, a[i]);
    }

    a = (int*)realloc(a, 75*sizeof(int));

    for (i = 0; i < 100; i++)
    {
        printf("a[%d] = %d \n " , i, a[i]);
    }

    free(a);
    return 0;
}
In this program I expected a segmentation fault, because I'm trying to access an element of the array that was freed using realloc(). But the output is pretty much the same, except for a few of the final elements!
So my doubt is whether the memory is actually getting freed ? What exactly is happening ?
The way realloc works is that it guarantees that a[0]..a[74] will have the same values after the realloc as they did before it.
However, the moment you try to access a[75] after the realloc, you have undefined behaviour. This means that the program is free to behave in any way it pleases, including segfaulting, printing out the original values, printing out some random values, not printing anything at all, launching a nuclear strike, etc. There is no requirement for it to segfault.
So my doubt is whether the memory is actually getting freed?
There is absolutely no reason to think that realloc is not doing its job here.
What exactly is happening?
Most likely, the memory is getting freed by shrinking the original memory block and not wiping out the now unused final 25 array elements. As a result, the undefined behaviour manifests itself by printing out the original values. It is worth noting that even the slightest change to the code, the compiler, the runtime library, the OS etc. could make the undefined behaviour manifest itself differently.
You may get a segmentation fault, but you may not. The behaviour is undefined, which means anything can happen, but I'll attempt to explain what you might be experiencing.
There's a mapping between your virtual address space and physical pages, and that mapping is usually done in pages of at least 4096 bytes (well, there's also paging to disk, but let's ignore that for the moment).
You get a segmentation fault if you attempt to access virtual address space that doesn't map to a physical page. So your call to realloc may not have resulted in a physical page being returned to the system, so it's still mapped to your program and can be used. However, a following call to malloc could reuse that space, or it could be reclaimed by the system at any time. In the former case you'd possibly overwrite another variable; in the latter case you'll segfault.
Accessing an array beyond its bounds is undefined behaviour. You might encounter a runtime error. Or you might not. The memory manager may well have decided to re-use the original block of memory when you re-sized. But there's no guarantee of that. Undefined behaviour means that you cannot reason about or predict what will happen. There's no grounds for you to expect anything to happen.
Simply put, don't access beyond the end of the array.
Some other points:
The correct main declaration here is int main(void).
Casting the value returned by malloc is not needed and can mask errors. Don't do it.
Always store the return value of realloc into a separate variable, so that you can detect NULL being returned and so avoid losing and leaking the original block (as sketched below).
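A minimal sketch of those points applied to the program above (illustrative; error handling kept deliberately simple):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *a = malloc(100 * sizeof *a);            /* no cast needed             */
    if (a == NULL)
        return 1;

    for (int i = 0; i < 100; i++)
        a[i] = i + 1;

    int *tmp = realloc(a, 75 * sizeof *a);       /* keep a valid if this fails */
    if (tmp == NULL) {
        free(a);
        return 1;
    }
    a = tmp;

    for (int i = 0; i < 75; i++)                 /* only the first 75 elements are valid now */
        printf("a[%d] = %d\n", i, a[i]);

    free(a);
    return 0;
}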
I'm learning C and trying to build a dynamic array. I found a great tutorial on this, but I don't get it all the way. The code I have now is:
#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int size;
    int capacity;
    char *data;
} Brry;

void brry_init(Brry *brry){
    brry->size = 0;
    brry->capacity = 2;
    brry->data = (char *)calloc(brry->capacity, sizeof(char));
}

void brry_insert(Brry *brry, char value){
    brry->data[brry->size++] = value; // should check here whether there is enough capacity, but leaving that out for now
}

int main(void){
    Brry brry;
    brry_init(&brry);

    for (int i = 0; i < 3; i++) {
        brry_insert(&brry, 'a');
    }

    printf("%c\n", brry.data[2]);
    return 0;
}
In my main function I add 3 elements to the array, but I only allocated space for 2. But when I print it, it works just fine? I expected some strange value to be printed. Why is this, or am I doing something wrong?
You are writing into a buffer you didn't allocate enough memory for. That it works is not guaranteed.
What you're doing now is reading from some junk location in memory; who knows what's there. Sometimes that leads to a segmentation fault, and other times you are lucky, just get some junk value back, and it doesn't segfault.
Writing into junk memory invokes undefined behavior, so better watch it.
If you do get errors it will almost always be a segfault, short for segmentation fault.
Read up on it here.
The technical term for what you're doing by reading past the bounds of the array is dereferencing an invalid pointer. You might also want to read more about that here.
Yes, you are indeed writing to the third element of a two element array. This means your program will exhibit undefined behavior and you have no guarantee of what is going to happen. In your case you got lucky and the program "worked", but you might not always be so lucky.
Trying to read/write past the end of the array results in undefined behaviour. Exactly what happens depends on several factors which you cannot predict or control. Sometimes, it will seem to read and/or write successfully without complaining. Other times, it may fail horribly and effectively crash your program.
The critical thing is that you should never try to use or rely on undefined behaviour. It's unfortunately a common rookie mistake to think that it will always work because one test happened to succeed. That's definitely not the case, and is a recipe for disaster sooner or later.
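For completeness, here is one hedged sketch of how brry_insert could grow the buffer when capacity runs out, so the write stays in bounds (it assumes the Brry struct from the question; the growth factor and error handling are arbitrary choices, not the tutorial's):

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int size;
    int capacity;
    char *data;
} Brry;

int brry_insert(Brry *brry, char value)
{
    if (brry->size == brry->capacity) {
        int new_capacity = brry->capacity * 2;
        char *new_data = realloc(brry->data, new_capacity);
        if (new_data == NULL)
            return 0;                      /* insertion failed, old buffer still intact */
        brry->data = new_data;
        brry->capacity = new_capacity;
    }
    brry->data[brry->size++] = value;
    return 1;
}

int main(void)
{
    Brry brry = { 0, 2, malloc(2) };
    if (brry.data == NULL)
        return 1;

    for (int i = 0; i < 3; i++)
        if (!brry_insert(&brry, 'a'))      /* the third insert triggers growth to 4 */
            return 1;

    printf("%c\n", brry.data[2]);          /* index 2 is now within the allocation  */
    free(brry.data);
    return 0;
}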