This question already has answers here:
How dangerous is it to access an array out of bounds?
(12 answers)
Array index out of bound behavior
(10 answers)
No out of bounds error
(7 answers)
Closed 1 year ago.
#include <stdio.h>
#include <string.h>

int main ()
{
    char *strA = "abc";
    int tam_strA = strlen(strA);
    char strB[tam_strA];

    strB[0] = 'a';
    strB[1] = 'b';
    strB[2] = 'c';
    strB[3] = 'd'; // out of bounds: strB has only 3 elements
    strB[9] = 'z'; // far out of bounds

    printf("%c", strB[9]);
    return 0;
}
It prints 'z' without complaint. Why doesn't it produce a segmentation fault? I'm accessing an index that shouldn't exist, because the size (number of elements) of strB equals tam_strA, which is 3.
Also, is there any difference/problem in writing char strB[strlen(strA)]; instead?
The C language has no rule that stops you from accessing invalid memory, and it does not guarantee a segmentation fault when you do. The only promise it makes is that accessing invalid memory causes undefined behavior.
A segmentation fault is one possible outcome, not the only one.
That said, the only problem with
char strB[strlen(strA)];
is that strB will not be long enough to hold the contents of strA, because it lacks one byte for the null terminator. Byte-wise use is fine, but if you copy the contents (or anything else of the same length as strA) into strB and then use it as a string, you will run past the allocated memory (in the absence of a null terminator), invoking undefined behavior.
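A minimal sketch of the difference (the array names here are just illustrative):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *strA = "abc";

    char too_short[strlen(strA)];          // 3 bytes: holds 'a','b','c' but no '\0'
    memcpy(too_short, strA, strlen(strA)); // fine byte-wise, but not a C string

    char ok[strlen(strA) + 1];             // 4 bytes: room for the terminator
    strcpy(ok, strA);                      // safe: copies "abc" plus '\0'

    printf("%s\n", ok);                    // prints "abc"
    return 0;
}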
You only get a segmentation fault when accessing memory that you do not own. You own your entire stack, so strB[9] is a valid memory access in the eyes of the OS. The reason you still shouldn't do this is that the compiler doesn't know you are using that memory, so it may decide to use it for something else. Staying inside declared objects also improves readability and reduces programmer mistakes. And the standard defines the use of undeclared memory as undefined behaviour, so you cannot use it safely. Declaring a variable like int x; (or an array) is what tells the compiler that you will use the memory at x.
This is actually related to the question Why does the first element outside of a defined array default to zero? Read the much more detailed answers over there.
Related
I have this code in C which takes in a bunch of chars
#include <stdio.h>
#define NEWLINE '\n'

int main()
{
    char c;
    char str[6];
    int i = 0;
    while ((c = getchar()) != NEWLINE)
    {
        str[i] = c;
        ++i;
        printf("%d\n", i);
    }
    return 0;
}
Input is: testtesttest
Output:
1
2
3
4
5
6
7
8
117
118
119
120
My questions are:
Why don't I get an out of bounds (segmentation fault) exception although I clearly exceed the capacity of the array?
Why do the numbers in the output suddenly jump to very big numbers?
I tried this in C++ and got the same behavior. Could anyone please explain the reason for this?
C doesn't check array boundaries. A segmentation fault will only occur if you try to dereference a pointer to memory that your program doesn't have permission to access. Simply going past the end of an array is unlikely to cause that behaviour. Undefined behaviour is just that - undefined. It may appear to work just fine, but you shouldn't be relying on its safety.
Your program causes undefined behaviour by accessing memory past the end of the array. In this case, it looks like one of the out-of-bounds str[i] = c writes lands on the counter i itself: 't' has ASCII code 116, so once the low byte of i is overwritten with 't', the following ++i prints 117.
C++ has the same rules as C does in this case.
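A minimal sketch of that overwrite scenario; nothing about the stack layout is guaranteed, so treat this purely as an illustration of the idea rather than a reproducible result:

#include <stdio.h>

int main(void)
{
    int  i = 0;
    char str[6];

    /* Write one element past the end of str. Depending on how the compiler
       laid out the stack frame, this byte may land inside i (or in padding,
       or somewhere else entirely). Either way it is undefined behaviour. */
    str[6] = 't';   /* 't' is ASCII 116 */

    /* If the low byte of i was overwritten, i may now be 116 instead of 0. */
    printf("i = %d\n", i);
    return 0;
}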
When you access an array index, C and C++ don't do bound checking. Segmentation faults only happen when you try to read or write to a page that was not allocated (or try to do something on a page which isn't permitted, e.g. trying to write to a read-only page), but since pages are usually pretty big (multiples of a few kilobytes; on Mac OS, multiples of 4 KB), it often leaves you with lots of room to overflow.
If your array is on the stack (like yours), it can be even worse as the stack is usually pretty large (up to several megabytes). This is also the cause of security concerns: writing past the bounds of an array on the stack may overwrite the return address of the function and lead to arbitrary code execution (the famous "buffer overflow" security breaches).
The values you get when you read are just what happens to exist at this particular place. They are completely undefined.
If you use C++ (and are lucky enough to work with C++11), the standard defines the std::array<T, N> type, which is an array that knows its bounds. The at method will throw if you try to read past the end of it.
C does not check array bounds.
In fact, a segmentation fault isn't specifically a runtime error generated by exceeding the array bounds. Rather, it is a result of memory protection that is provided by the operating system. It occurs when your process tries to access memory that does not belong to it, or if it tries to access a memory address that doesn't exist.
Writing outside array bounds (actually, even just performing the pointer arithmetic/array subscripting, even if you don't use the result to read or write anything) results in undefined behavior. Undefined behavior is not a reported or reportable error; it means your program could do anything at all. It's very dangerous and you are fully responsible for avoiding it. C is not Java/Python/etc.
Memory allocation is more complicated than it seems. The variable str, in this case, is on the stack, next to other variables, so it is not followed by unallocated memory. Memory is also usually word-aligned (one "word" being four to eight bytes). You were possibly messing with the value of another variable, or with some padding (empty space added to maintain word alignment), or something else entirely.
Like R.. said, it's undefined behavior. Out-of-bounds conditions could cause a segfault... or they could cause silent memory corruption. If you're modifying memory which has already been allocated, this will not be caught by the operating system. That's why out-of-bounds errors are so insidious in C.
Because C and C++ don't check bounds.
Array subscripting is defined in terms of pointer arithmetic: when you write arr[index], it is evaluated as:
type value = *(arr + index);
The results are big numbers (though not necessarily) because they are garbage values, just like an uninitialized variable.
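For what it's worth, the subscript/pointer-arithmetic equivalence can be demonstrated directly (in bounds, so this part is well defined):

#include <stdio.h>

int main(void)
{
    int arr[4] = { 10, 20, 30, 40 };

    /* arr[2], *(arr + 2) and even 2[arr] all name the same element. */
    printf("%d %d %d\n", arr[2], *(arr + 2), 2[arr]);  /* prints: 30 30 30 */

    /* &arr[2] is simply arr advanced by two elements. */
    printf("%d\n", (int)(&arr[2] - arr));              /* prints: 2 */
    return 0;
}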
You can catch this kind of error at runtime by compiling with AddressSanitizer:
gcc -fsanitize=address -ggdb -o test test.c
See the AddressSanitizer documentation for more information.
This question already has answers here:
No out of bounds error
(7 answers)
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 3 years ago.
In C, I noticed I can write past the end of a static array, for example:
static char a[10] = {0};

for (int i = 0; i < 20; i++) {
    a[i] = 'a'; // Should fail when i > 9
}
I expected to get a segmentation fault, but it executes just fine.
If static arrays were allocated on the stack, this would make sense, but they're not, so why does it happen?
Note: static int arrays behave similarly. Didn't check other types.
Thanks.
Edit: This is not a duplicate since the other questions were not about static arrays. Unlike "regular" arrays, static arrays are allocated in BSS. The behavior might be different, which is why I'm asking separately.
You will only get a segmentation fault when you actually attempt to write to an illegal address. Your example code writes beyond what you allocated for the array, but that address is not beyond what the OS considers legal for your process to use.
Even if you do not get a segmentation fault, code like this can corrupt other data structures in your program and cause major misbehavior, and, possibly even worse, intermittent and difficult-to-debug failures.
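A small sketch of that kind of silent corruption; whether the hypothetical neighbour object really sits right after a is up to the toolchain, and the out-of-bounds writes are undefined behaviour regardless:

#include <stdio.h>

static char a[10];
static char neighbour[16];   /* another zero-initialized static object */

int main(void)
{
    for (int i = 0; i < 20; i++) {
        a[i] = 'a';          /* out of bounds once i > 9 */
    }
    /* If neighbour happens to follow a in memory, its first bytes are now 'a'
       (97) instead of 0. */
    printf("neighbour[0] = %d\n", neighbour[0]);
    return 0;
}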
This question already has answers here:
How dangerous is it to access an array out of bounds?
(12 answers)
Writing to pointer out of bounds after malloc() not causing error
(7 answers)
Why is it that we can write outside of bounds in C?
(7 answers)
What happens if I try to access memory beyond a malloc()'d region?
(5 answers)
Why does int pointer '++' increment by 4 rather than 1?
(5 answers)
Closed 3 years ago.
I've been digging into memory allocation and pointers in C. I was under the impression that if you do not allocate enough memory for a value and then try to store that value there, the program would either crash or behave incorrectly.
But instead I get seemingly correct output where I'd expect something else.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    // Here we intentionally allocate only 1 byte,
    // even though an `int` typically takes up 4 bytes
    int *address = malloc(1);

    address[0] = 16777215; // 0xFFFFFF needs 3 bytes, so it cannot fit in the 1 byte we allocated
    address[1] = 1337;     // just for demo, put a random other number in the next memory cell

    printf("%i\n", address[0]); // Prints 16777215. How?! Didn't we overwrite a part of the number?
    return 0;
}
Why does this work? Does malloc actually allocate more than the number of bytes that we pass to it?
EDIT
Thanks for the comments! But I wish to note that being able to write to unallocated memory is not the part that surprises me, and it is not part of the question. I know that writing out of bounds is possible and that it is "undefined behavior".
For me, the unexpected part is that the line address[1] = 1337; does not in any way corrupt the int value at address[0].
It seems that the explanations for this diverge, too.
#Mini suggests that the reason for this is that malloc actually allocates more than what's passed, because of cross-platform differences.
#P__J__ in the comments says that address[1] for some reason points to the next sizeof(int) byte, not to the next byte. But I don't think I understand what controls this behavior then, because malloc doesn't seem to know about what types we will put into the allocated blocks.
EDIT 2
So thanks to the comments, I believe I understand the program behavior now.
The answer lies in pointer arithmetic. The program "knows" that address is a pointer to int, so adding 1 to it (or accessing address[1]) gives the address of the block that lies sizeof(int) (typically 4) bytes ahead.
And if we really wanted to, we could move just one byte and genuinely corrupt the value at address[0] by casting address to char *, as described in this answer.
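A minimal sketch of that byte-level view, with enough memory allocated this time so that the only damage is the intentional one-byte overwrite:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *address = malloc(2 * sizeof *address);   /* room for two ints */
    if (!address)
        return 1;

    address[0] = 16777215;                 /* 0x00FFFFFF                           */
    address[1] = 1337;                     /* sizeof(int), typically 4, bytes later */

    unsigned char *bytes = (unsigned char *)address;
    bytes[1] = 0;                          /* overwrite one byte inside address[0] */

    printf("%d %d\n", address[0], address[1]);   /* address[0] is now corrupted,
                                                    address[1] is still 1337 */
    free(address);
    return 0;
}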
Thanks to all and to #P__J__ and #Blastfurnace in particular!
malloc often allocates more than you actually ask for (this is entirely system/environment/OS dependent), which is why it works in your scenario (sometimes). However, this is still undefined behavior: it may actually allocate only 1 byte, and you would be writing to what might not be allocated heap memory.
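A hedged illustration of that over-allocation using malloc_usable_size, a glibc extension (not standard C), so this only works on glibc-based systems:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int main(void)
{
    void *p = malloc(1);
    if (!p)
        return 1;

    /* Typically reports something like 24 rather than 1, because the allocator
       rounds requests up to its minimum chunk size. The extra bytes are still
       not yours to rely on. */
    printf("requested 1 byte, usable size: %zu\n", malloc_usable_size(p));

    free(p);
    return 0;
}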
C doesn't mandate any kinds of bounds checking on array accesses, and it's possible to overflow storage and write into memory you don't technically own. As long as you don't clobber anything "important", your code will appear to work as intended.
However, the behavior on buffer overruns is undefined, so the results will not generally be predictable or repeatable.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Invalid read/write sometimes creates segmentation fault and sometimes does not
I was doing some experimentation with malloc and wrote this very small program on a Linux machine:
#include <stdlib.h>

int main()
{
    int *p = NULL;
    p = (int *)malloc(10);
    *(p + 33*1000) = 5;
    free(p);
    return 0;
}
This program does not give a segmentation fault, but if I change the out-of-bounds assignment to
*(p + 34*1000) = 5;
then it gives a segmentation fault. On my system the page size is 4K.
I am not able to explain why the fault only appears at an offset of roughly 128 KB past p (33*1000 ints is just under 129 KiB; 34*1000 ints is just under 133 KiB).
If anyone can explain this from the perspective of memory management in Linux, that would be great.
You are accessing memory beyond what you allocated for p with both *(p + 33*1000) and *(p + 34*1000), which is undefined behaviour. You cannot reason about the outcome: it may appear to "work", crash, or do anything else.
You are modifying memory that you have not allocated yourself: the address you are writing to is way beyond the limits of your allocation. Whenever you write beyond an object's bounds you run the risk of a segfault; whether you get one depends on the memory location. It may not segfault for a particular address, but this is never a good thing to do and the results will be unpredictable.
This program exhibits undefined behavior (per the C standard) and, strictly speaking, there's nothing else to explain about it.
The language standard does not in any way describe how memory management is or should be implemented at the low level on any particular platform. Some memory areas can be accessible despite you not explicitly allocating them.
malloc(10) allocates 10 bytes, not 10 integers; with a 4-byte int that is only enough room for two of them, so only *(p + 0) and *(p + 1) can be used safely. Anything beyond that is outside the extent of what you've allocated.
Dereferencing elsewhere means you're using an invalid pointer, which may produce the segmentation fault.
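For contrast, a minimal sketch of an allocation that actually covers ten ints, so p[0] through p[9] are valid:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p = malloc(10 * sizeof *p);   /* 10 ints, i.e. 40 bytes on a typical system */
    if (p == NULL)
        return 1;

    for (int i = 0; i < 10; i++)       /* p[0] through p[9] are now in bounds */
        p[i] = i;

    printf("%d\n", p[9]);
    free(p);
    return 0;
}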
Maybe that is simply where the memory accessible to your process happens to end.
In that case every offset up to 33*1000 lands in pages the process can write to, and every offset from 34*1000 onwards does not.