I recently wrote this code in C:
#include <stdio.h>
#define N_ROWS 100
int main() {
    char *inputFileName = "triangle_data.txt";
    FILE *inputFile = fopen(inputFileName, "r");
    if (inputFile == NULL) {
        printf("ERROR: Failed to open \"%s\".\n", inputFileName);
        return -1;
    }
    /* deliberately one element smaller than the amount of data in the file */
    int triangle[(N_ROWS*(N_ROWS+1))/2 - 1];
    size_t size = sizeof(triangle)/sizeof(int);
    size_t index;
    for (index = 0; !feof(inputFile); ++index) {
        fscanf(inputFile, "%d", &triangle[index]);
    }
    return 1;
}
and was expecting a segmentation fault, since (N_ROWS*(N_ROWS+1))/2 is just enough space to hold the data in the file, but as you can see I made the array one element smaller. Somehow this doesn't trigger a segmentation fault. It does if I replace the body of the for-loop with:
int tmp;
fscanf(inputFile, "%d", &tmp);
triangle[index] = tmp;
What is happening here? If I make the array three elements too small, it still doesn't trigger a segmentation fault. Five elements too small triggers one. I'm sure there is enough data in the file.
As a test, I printed the array afterwards, and with a smaller array some elements were missing.
What is happening here?
PS: Compiled with clang on OS X.
A segmentation fault doesn't mean that you accessed an array out of bounds, it means that you've accessed a virtual memory address that isn't mapped. Often accessing an array out of bounds will cause this, but just because you aren't seeing a segfault it doesn't mean that all of your memory accesses are valid.
As to why you're seeing the different behavior, it's hard to say, and it isn't necessarily a worthwhile use of time to try to explain different results when the behavior is specified as undefined. If you're really curious about what's going on, you could look at the assembly generated by the two versions of your code (use the --save-temps argument to clang).
What is happening here?
Your program invokes undefined behavior because you are writing outside your array object. Undefined behavior in C is undefined: your program may work today and crash every other day, or even print the complete works of Shakespeare.
The behaviour of your program (accessing an array element out of bounds) is undefined.
There is no particular requirement that undefined behaviour result in a segmentation fault, or any other observable error condition.
Undefined behaviour means - literally - that the C standard does not impose any restrictions on what is allowed to occur. That means anything can happen, including appearing to work correctly, or working in one circumstance but not another.
The trick is not to worry about the particular potential causes of segmentation faults (or any other error condition that any instance of undefined behaviour might trigger). It is to ensure the program has well-defined behaviour, so such symptoms are guaranteed not to occur.
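Applied to the triangle question, "ensure the program has well-defined behaviour" could look like the minimal sketch below. The function name `read_ints_bounded` and its `truncated` flag are hypothetical, not from the question. It tests the return value of `fscanf` instead of `feof()` and refuses to store more than `cap` values, so a too-small array becomes a reported condition rather than a silent overflow.

```c
#include <stdio.h>
#include <stddef.h>

/* Reads whitespace-separated ints from `in` into `buf`, never writing more
 * than `cap` elements. Returns the number of values stored. If the input
 * holds more values than fit, *truncated is set to 1 so the caller can
 * report it instead of overflowing the array. */
size_t read_ints_bounded(FILE *in, int *buf, size_t cap, int *truncated)
{
    size_t count = 0;
    int value;
    *truncated = 0;
    /* Loop on fscanf's return value rather than feof(): feof() only reports
     * end-of-file after a read has already hit it, so a feof()-controlled
     * loop typically runs one extra iteration in which fscanf stores nothing. */
    while (fscanf(in, "%d", &value) == 1) {
        if (count == cap) {
            *truncated = 1;   /* a value that does not fit: stop before writing */
            break;
        }
        buf[count++] = value;
    }
    return count;
}
```

In the question's program this would be called as `read_ints_bounded(inputFile, triangle, size, &truncated)`, with `size` computed from `sizeof` as before.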
Related
I have this code in C which takes in a bunch of chars
#include<stdio.h>
# define NEWLINE '\n'
int main()
{
    char c;
    char str[6];
    int i = 0;
    while( ((c = getchar()) != NEWLINE))
    {
        str[i] = c;
        ++i;
        printf("%d\n", i);
    }
    return 0;
}
Input is: testtesttest
Output:
1
2
3
4
5
6
7
8
117
118
119
120
My questions are:
Why don't I get an out of bounds (segmentation fault) exception although I clearly exceed the capacity of the array?
Why do the numbers in the output suddenly jump to very big numbers?
I tried this in C++ and got the same behavior. Could anyone please explain what is the reason for this?
C doesn't check array boundaries. A segmentation fault will only occur if you try to dereference a pointer to memory that your program doesn't have permission to access. Simply going past the end of an array is unlikely to cause that behaviour. Undefined behaviour is just that - undefined. It may appear to work just fine, but you shouldn't be relying on its safety.
Your program causes undefined behaviour by accessing memory past the end of the array. In this case, it looks like one of your str[i] = c writes overwrites the value in i.
C++ has the same rules as C does in this case.
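A bounded version of the question's input loop might look like the following sketch (`read_line_bounded` is a hypothetical name). It keeps the index inside the array, so nothing beyond `str` (such as `i`) can be overwritten, and it stores the character in an `int` so that `EOF` is detected, which the original loop never checks.

```c
#include <stdio.h>
#include <stddef.h>

/* Reads one line from `in` into `buf` (capacity `size`, must be >= 1).
 * Stores at most size - 1 characters plus a terminating '\0'; any extra
 * characters on the line are consumed and discarded instead of being
 * written past the end of `buf`. Returns the number of characters stored. */
size_t read_line_bounded(FILE *in, char *buf, size_t size)
{
    size_t i = 0;
    int c;  /* int, not char, so EOF can be told apart from a real character */
    while ((c = getc(in)) != EOF && c != '\n') {
        if (i + 1 < size)       /* keep one slot free for the '\0' */
            buf[i++] = (char)c;
        /* otherwise drop the character rather than overflow buf */
    }
    buf[i] = '\0';
    return i;
}
```

With `char str[6]` and the input `testtesttest`, this stores `testt` and discards the rest of the line.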
When you access an array index, C and C++ don't do bounds checking. Segmentation faults only happen when you try to read or write to a page that was not allocated (or try to do something on a page which isn't permitted, e.g. trying to write to a read-only page), but since pages are usually pretty big (multiples of a few kilobytes; on Mac OS, 4 KB on Intel machines and 16 KB on Apple silicon), this often leaves you with lots of room to overflow.
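To see the page granularity on a given machine, a POSIX system can be asked directly. This is a sketch assuming a POSIX platform (`sysconf` and `_SC_PAGESIZE` are not available on Windows); `page_size` and `bytes_to_page_end` are hypothetical helper names, and converting a pointer to `uintptr_t` is implementation-defined, though it gives the plain address on common platforms. A large "bytes to page end" value does not make an overflow safe: it only means the OS will not notice it, and the next page may well be mapped too.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Page size in bytes as reported by POSIX sysconf(), or -1 if unknown.
 * Memory protection (and therefore segfaults) works at this granularity. */
long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Bytes from `p` to the end of the page containing it (1..page size).
 * Assumes page_size() succeeded and that uintptr_t holds the raw address. */
size_t bytes_to_page_end(const void *p)
{
    long ps = page_size();
    uintptr_t addr = (uintptr_t)p;
    return (size_t)(ps - (long)(addr % (uintptr_t)ps));
}
```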
If your array is on the stack (like yours), it can be even worse as the stack is usually pretty large (up to several megabytes). This is also the cause of security concerns: writing past the bounds of an array on the stack may overwrite the return address of the function and lead to arbitrary code execution (the famous "buffer overflow" security breaches).
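The stack size limit mentioned above can likewise be queried on POSIX systems with `getrlimit(RLIMIT_STACK, ...)`. The sketch below assumes POSIX; `stack_limit_bytes` is a hypothetical name. The value is only the upper bound the stack may grow to, not how much of it is mapped at a given moment.

```c
#include <limits.h>
#include <sys/resource.h>

/* Stores the current soft stack-size limit, in bytes, in *out and returns 0,
 * or returns -1 if getrlimit() fails. An unlimited stack is reported as
 * ULLONG_MAX. */
int stack_limit_bytes(unsigned long long *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *out = (rl.rlim_cur == RLIM_INFINITY) ? ULLONG_MAX
                                           : (unsigned long long)rl.rlim_cur;
    return 0;
}
```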
The values you get when you read are just what happens to exist at this particular place. They are completely undefined.
If you use C++ (and are lucky enough to work with C++11), the standard library defines the std::array<T, N> type, which is an array that knows its bounds. Its at member function throws std::out_of_range if you try to access an element past the end.
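C has no std::array, but the same idea (the length travels with the data, and every access is checked) can be approximated by hand. This is a sketch only: `IntArray` and `int_array_get` are hypothetical names, and returning an error code stands in for at() throwing.

```c
#include <stddef.h>

/* Hypothetical "array that knows its length": pointer plus element count. */
typedef struct {
    int *data;
    size_t len;
} IntArray;

/* Stores a.data[index] in *out and returns 0, or returns -1 without
 * touching *out if index is out of range (the C stand-in for at()). */
int int_array_get(IntArray a, size_t index, int *out)
{
    if (index >= a.len)
        return -1;
    *out = a.data[index];
    return 0;
}
```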
C does not check array bounds.
In fact, a segmentation fault isn't specifically a runtime error generated by exceeding the array bounds. Rather, it is a result of memory protection that is provided by the operating system. It occurs when your process tries to access memory that does not belong to it, or if it tries to access a memory address that doesn't exist.
Writing outside array bounds (actually even just performing the pointer arithmetic/array subscripting, even if you don't use the result to read or write anything) results in undefined behavior. Undefined behavior is not a reported or reportable error; it means your program could do anything at all. It's very dangerous and you are fully responsible for avoiding it. C is not Java/Python/etc.
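The one exception the C standard makes is the pointer one past the last element: it may be formed and compared against, but never dereferenced. The sketch below (`sum_ints` is a hypothetical name) relies on exactly that and nothing more; forming `arr + n + 1` or `arr - 1` would already be undefined, even without reading through it.

```c
#include <stddef.h>

/* Sums arr[0..n-1] by walking a pointer up to the one-past-the-end position. */
long sum_ints(const int *arr, size_t n)
{
    long total = 0;
    const int *end = arr + n;          /* valid to form, never dereferenced */
    for (const int *p = arr; p != end; ++p)
        total += *p;                   /* only elements before `end` are read */
    return total;
}
```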
Memory allocation is more complicated than it seems. The variable "str", in this case, is on the stack, next to other variables, so it's not followed by unallocated memory. Memory is also usually word-aligned (one "word" is four to eight bytes). You were possibly messing with the value of another variable, with some "padding" (empty space added to maintain word alignment), or with something else entirely.
Like R.. said, it's undefined behavior. Out-of-bounds conditions could cause a segfault... or they could cause silent memory corruption. If you're modifying memory which has already been allocated, this will not be caught by the operating system. That's why out-of-bounds errors are so insidious in C.
Because C/C++ doesn't check bounds.
In most expressions an array is converted to a pointer to its first element, and arr[index] is defined as:
type value = *(arr + index);
The results are (not necessarily) big numbers because they're garbage values, just like an uninitialized variable.
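The equivalence of `arr[index]` and `*(arr + index)`, and the fact that an array is not itself a pointer, can both be checked in a few lines. In this sketch, `ARRAY_LEN` and `subscript_is_pointer_arith` are hypothetical names; `ARRAY_LEN` only gives the right answer on a real array object, because on a decayed pointer `sizeof` measures the pointer instead.

```c
#include <stddef.h>

/* Element count of a true array; wrong (silently) if given a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if &arr[i] and arr + i are the same address for every valid i.
 * The subscript operator is defined as this pointer arithmetic, so it always
 * holds, and neither form checks i against n. */
int subscript_is_pointer_arith(const int *arr, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (&arr[i] != arr + i)
            return 0;
    }
    return 1;
}
```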
To have such out-of-bounds accesses reported at run time, you have to compile with AddressSanitizer, like this:
gcc -fsanitize=address -ggdb -o test test.c
There is more information here.
This question already has answers here:
Writing more characters than malloced. Why does it not fail?
Why don't I get a segmentation fault when I write beyond the end of an array?
I was writing some code and I used the function calloc.
I understand that, when the first and the second arguments passed to this function are both zero, the function is going to allocate the necessary space for 0 elements, each of them with size 0, but here is the strange thing.
This program works fine even if n > 0. Why is that happening? I think it should display an error because I'm trying to write in a position of the array that doesn't exist. Thanks!
#include <stdio.h>
#include <stdlib.h>
int main(){
    int n;
    scanf("%d", &n);
    int *test = calloc(0, 0);
    for(int i = 0; i < n; i++){
        test[i] = 100;
        printf("%d ", test[i]);
    }
    return 0;
}
In C, a lot of wrong and "wrong" things don't produce error messages from the compiler. In fact, you may not even see error messages when you run the program, but another person running your program on a different computer may see the error.
One important concept in C is called undefined behavior. To put it in simple terms, it means that the behavior of your program is unpredictable, but you can read more about this subject in this question: Undefined, unspecified and implementation-defined behavior.
Your program has undefined behavior for two reasons:
1. When either size or nmemb is zero, calloc() may return NULL (or a unique pointer that must not be used to access any object). Your program is not checking the result of calloc(), so there's a good chance that when you do test[i] you are attempting to dereference a NULL pointer. You should always check the result of calloc() and malloc().
2. When you call malloc() or calloc(), you are essentially allocating dynamic memory for an array. You can use your pointer to access the elements of the array, but you can't access anything past the array. That is, if you allocate n elements, you should not try to access the (n+1)-th element, not even for reading.
Both items above make your program invoke undefined behavior. There might also be something undefined about accessing an empty object other than item #2 listed above, but I'm unsure.
You should always be careful about undefined behavior because, when your program invokes UB, it is essentially unpredictable. You could see a compilation error, the program could print an error message, it could run successfully without any problems, or it could wipe out every single file on your hard disk.
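Both points above can be handled with a NULL check and by only indexing the elements that were actually allocated. The sketch below uses the hypothetical name `make_filled_array`; it treats n == 0 explicitly, since for a zero-size request calloc may return either NULL or a pointer that must not be dereferenced, so the caller decides what an empty array means.

```c
#include <stdlib.h>

/* Allocates n ints and sets each to `value`. Returns NULL if n == 0 (there
 * is nothing that may be indexed) or if calloc fails; callers tell the two
 * cases apart by n. The result must be released with free(). */
int *make_filled_array(size_t n, int value)
{
    if (n == 0)
        return NULL;
    int *arr = calloc(n, sizeof *arr);   /* n elements of sizeof(int) bytes */
    if (arr == NULL)
        return NULL;                     /* always check calloc/malloc */
    for (size_t i = 0; i < n; ++i)       /* valid indices are 0..n-1 only */
        arr[i] = value;
    return arr;
}
```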
I'm learning C and trying to build a dynamic array. I found a great tutorial on this but I don't get it all the way. The code I have now is
#include <stdio.h>
#include <stdlib.h>
typedef struct{
    int size;
    int capacity;
    char *data;
}Brry;
void brry_init(Brry *brry){
    brry->size = 0;
    brry->capacity = 2;
    brry->data = (char *)calloc(brry->capacity, sizeof(char));
}
void brry_insert(Brry *brry, char value){
    brry->data[brry->size++] = value; // should check here that there is enough capacity, but I'm trying something out
}
int main(void){
    Brry brry;
    brry_init(&brry);
    for (int i = 0; i < 3; i++) {
        brry_insert(&brry, 'a');
    }
    printf("%c\n", brry.data[2]);
    return 0;
}
In my main function I add 3 elements to the array, but it only allocated space for 2. But when I print it, it works just fine. I expected some strange value to be printed. Why is this, or am I doing something wrong?
You are writing into a buffer you didn't allocate enough memory for. That it works is not guaranteed.
What you're doing now is reading memory that doesn't belong to your buffer. Sometimes that leads to a segmentation fault; other times you get lucky, read some junk value, and it doesn't segfault.
Writing into junk memory will invoke undefined behavior, so better watch it.
If you do get an error, it will almost always be a segfault (short for segmentation fault).
Read up on it here.
The technical term for what you're doing when you read past the bounds of the array is dereferencing a pointer (here, one that points outside the allocation). You might also want to read more about that here.
Yes, you are indeed writing to the third element of a two element array. This means your program will exhibit undefined behavior and you have no guarantee of what is going to happen. In your case you got lucky and the program "worked", but you might not always be so lucky.
Trying to read/write past the end of the array results in undefined behaviour. Exactly what happens depends on several factors which you cannot predict or control. Sometimes, it will seem to read and/or write successfully without complaining. Other times, it may fail horribly and effectively crash your program.
The critical thing is that you should never try to use or rely on undefined behaviour. It's unfortunately a common rookie mistake to think that it will always work because one test happened to succeed. That's definitely not the case, and is a recipe for disaster sooner or later.
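One way to fill in the "check here if I have enough memory" comment is to grow the buffer before writing when it is full. This is a sketch assuming a capacity-doubling strategy (a common choice, not necessarily the tutorial's); it also switches `size` and `capacity` to `size_t`, makes `brry_init` and `brry_insert` return an error code, and adds a hypothetical `brry_free` helper.

```c
#include <stdlib.h>

typedef struct {
    size_t size;
    size_t capacity;
    char *data;
} Brry;

/* Returns 0 on success, -1 if the initial allocation fails. */
int brry_init(Brry *brry)
{
    brry->size = 0;
    brry->capacity = 2;
    brry->data = calloc(brry->capacity, sizeof *brry->data);
    return brry->data != NULL ? 0 : -1;
}

/* Appends `value`, doubling the capacity with realloc when the array is
 * full. Returns 0 on success, -1 if memory could not be obtained (the
 * existing contents are left untouched in that case). */
int brry_insert(Brry *brry, char value)
{
    if (brry->size == brry->capacity) {
        size_t new_cap = brry->capacity ? brry->capacity * 2 : 2;
        char *p = realloc(brry->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        brry->data = p;
        brry->capacity = new_cap;
    }
    brry->data[brry->size++] = value;   /* now always within capacity */
    return 0;
}

void brry_free(Brry *brry)
{
    free(brry->data);
    brry->data = NULL;
    brry->size = brry->capacity = 0;
}
```

With this version, the question's three inserts grow the capacity from 2 to 4 before the third write, so `brry.data[2]` is a valid element rather than a lucky read.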