I wrote a short program in C just to see what happens when you index past the end of an array.
I found that it mostly produces random values (I know they are not actually random) up until a point (52 indexes past the end in this case) where it produced 0 every single time. Any access past this point makes the program crash. Why is this? Is it the end of the program's allocated memory space?
main()
{
int ar[4];
ar[0] = 99;
ar[1] = 45;
printf("array: %d, %d random value: %d", ar[0], ar[1], ar[55]);
}
Edit: I also found that if I alter this value that always ends up being 0 (i.e. ar[55] = 1000), then the return code of the program goes up.
... just to see what happens when you index past the end of an array
Trying to access out-of-bounds memory invokes undefined behavior. Anything can happen, just anything.
In your case, for some reason, the memory addresses up to index 52 are accessible from the process, so the access is allowed. Indexes past 52 point to a memory region not allocated to your process's address space and thus raise an access violation, leading to the segfault. This is not deterministic behaviour at all, and there's no way you can rely on the output of a program invoking UB.
Accessing array elements beyond array boundaries (before 0 or from its size up) is undefined behavior. It may or may not produce values, it may cause the program to end abruptly, it may cause your system to stop, restart or catch fire...
Modern systems try to confine undefined behavior within reasonable limits via memory protection, user-space limitations, etc., but even user-space code errors can have dire consequences:
a pacemaker messing with its timing values can cause premature death;
banking software overflowing array boundaries can overwrite account balance information, crediting some random account with untold amounts of dollars;
your self-driving car could behave worse than a drunk driver...
think of nuclear power-plant control software, airplane instruments, military stuff...
There is no question undefined behavior should be avoided.
Regarding the exit status: your program uses an obsolete syntax for the definition of main() (an implicit return type, which is no longer supported in C99 and later) and does not return anything, which means its return value can be any random value, including a different value for every execution. C99 specifies a kludge for the main() function and forces an implicit return 0; at the end of main(), but relying on it is bad style.
Similarly, invoking printf() without a proper prototype is undefined behavior. You should include <stdio.h> before the definition of main().
Lastly, ar[0] and ar[1] are initialized in main(), but ar[2] and ar[3] are not. Be aware that accessing uninitialized values also has undefined behavior. The values can be anything at all, what you describe as random values, but on some systems, they could be trap values, causing undefined behavior by just reading them.
Some very handy tools are available to track this kind of problem in simple and complex programs, most notably valgrind. If you are curious about this subject, you should definitely look at it.
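If you want to try it on the program above, a typical invocation looks like this (array.c is just an assumed file name; valgrind's default Memcheck tool is strongest on heap errors, so a stack overrun like ar[55] may or may not be reported):
$ gcc -g -O0 array.c -o array
$ valgrind ./array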
I am new to this particular forum, so if there are any egregious formatting choices, please let me know, and I will promptly update.
In the book C Programming: A Modern Approach (authored by K. N. King), the following passage is written:
If a pointer variable p hasn't been initialized, attempting to use the value of p in any way causes undefined behavior. In the following example, the call of printf may print garbage, cause the program to crash, or have some other effect:
int *p;
printf("%d", *p);
As far as I understand pointers and how the compiler treats them, the declaration int *p effectively says, "Hey, if you dereference p in the future, I will look at a block of four consecutive bytes in memory, whose starting address is the value contained in p, and interpret those 4 bytes as a signed integer."
As to whether or not that is correct...if it is correct, then I am a little confused about why the aforementioned block of code:
1. is classified as undefined behavior
2. can cause programs to crash
3. can have some other effect
Commenting on the above-numbered cases:
My understanding of undefined behavior is that, at run time, anything can happen. With that being said, in the above code it appears to me that only a very defined subset of things can happen. I understand that p (due to its lack of initialization) is storing a random address that could point anywhere in memory. However, when printf is passed the dereferenced value *p, won't the compiler just look at the 4 consecutive bytes of memory (which start at whatever random address) and interpret those 4 bytes as a signed integer?
Therefore, printf should only do one thing: print a number that ranges anywhere from -2,147,483,648 to 2,147,483,647. Clearly that is a lot of different possible outputs, but does that really qualify as "undefined behavior"? Further, how could such "undefined behavior" lead to "program crash" or "have some other effect"?
Any clarification would be greatly appreciated! Thanks!
The value of an uninitialized variable is indeterminate. It could hold any value (including 0), and it's even possible that a different value could be read each time you attempt to read it. It's also possible that the value could be a trap representation, meaning that attempting to read it will trigger a processor exception that can crash the program.
Assuming you got lucky and were able to read a value for p, due to the virtual memory model most systems use that value may not correspond to an address that is mapped to the process's memory space. So if you attempt to read from that address by dereferencing the pointer it triggers a segmentation fault that can crash the program.
Notice that in both of these scenarios the crash occurs before printf is even called.
Also, compilers are allowed to assume your program does not have undefined behavior and will perform optimizations based on that assumption. That can make your program behave in ways you might not expect.
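As a concrete illustration of that point (read_flag is a hypothetical function, not from the question): in the sketch below, a compiler that assumes no undefined behavior occurs is free to delete the NULL check, because the dereference on the previous line would already be UB if p were NULL.
#include <stddef.h>

int read_flag(int *p)
{
    int v = *p;     /* UB if p is NULL or otherwise invalid...           */
    if (p == NULL)  /* ...so the compiler may assume p != NULL here      */
        return -1;  /* and optimize this check and branch away entirely. */
    return v;
}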
As for why doing these things is undefined behavior, it is because the C standard says so. In particular, Annex J.2 gives as an example of undefined behavior:
The value of an object with automatic storage duration
is used while it is indeterminate. (6.2.4, 6.7.9, 6.8)
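For contrast, a minimal well-defined variant of the book's snippet, with p pointing at a real object before the dereference:
#include <stdio.h>

int main(void)
{
    int x = 42;
    int *p = &x;        /* p now points to a valid int object */
    printf("%d\n", *p); /* well-defined: prints 42 */
    return 0;
}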
Undefined Behavior is defined as "we are not specifying what must happen, it's up to the implementers."
In a practical sense, *p is likely to contain whatever that memory area held last, maybe zeros, maybe something more random, maybe a chunk of data from a previous use. On occasion, a compiler will implicitly zero memory for safety's sake, sacrificing a bit of time to offer that feature.
Notably, if p were defined as a char * and you printf'ed it with %s, it would try to print characters until it found a 0x00 byte. If that walk crosses into unmapped memory, you could get a segmentation fault.
// This code should give a segmentation fault, but it works fine. How is that possible? I just got this code by trial and error while I was trying out some code on the topic of arrays of pointers. Please can anyone explain?
int main()
{
int i,size;
printf("enter the no of names to be entered\n");
scanf("%d",&size);
char *name[size];
for(i=0;i<size;i++)
{
scanf("%s",name[i]);
}
printf("the names in your array are\n");
for(i=0;i<size;i++)
{
printf("%s\n",&name[i]);
}
return 0;
The problem in your code (which is incomplete, BTW; you need #include <stdio.h> at the top and a closing } at the bottom) can be illustrated in a much shorter chunk of code:
char *name[10]; // make the size an arbitrary constant
scanf("%s", name[0]); // Read into memory pointed to by an uninitialized pointer
(name could be a single pointer rather than an array, but I wanted to preserve your program's structure for clarity.)
The pointer name[0] has not been initialized, so its value is garbage. You pass that garbage pointer value to scanf, which reads characters from stdin and stores them in whatever memory location that garbage pointer happens to point to.
The behavior is undefined.
That doesn't mean that the program will die with a segmentation fault. C does not require checking for invalid pointers (nor does it forbid it, but most implementations don't do that kind of checking). So the most likely behavior is that your program will take whatever input you provide and attempt to store it in some arbitrary memory location.
If the garbage value of name[0] happens to point to a detectably invalid memory location, your program might die with a segmentation fault. That's if you're lucky. If you're not, it might happen to point to some writable memory location that your program is able to modify. Storing data in that location might be harmless, or it might clobber some critical internal data structure that your program depends on.
Again, your program's behavior is undefined. That means the C standard imposes no requirements on its behavior. It might appear to "work", it might blow up in your face, or it might do anything that it's physically possible for a program to do. Appearing to behave correctly is probably the worst consequence of undefined behavior, since it makes it difficult to diagnose the problem (which will probably appear during a critical demo).
Incidentally, using scanf with a bare %s format specifier (no field width) is inherently unsafe, since there's no way to limit the amount of data it will attempt to read. Even with a properly initialized pointer, there's no way to guarantee that it points to enough memory to hold whatever input it receives.
You may be accustomed to languages that do run-time checking and can reliably detect (most) problems like this. C is not such a language.
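For completeness, a minimal sketch of one way to make the shorter snippet well-defined: give the pointer valid storage first, and use a field width so scanf cannot write past it (the buffer size of 32 is an arbitrary assumption):
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *name[10];

    name[0] = malloc(32);            /* the pointer now refers to real storage */
    if (name[0] == NULL)
        return EXIT_FAILURE;

    if (scanf("%31s", name[0]) == 1) /* field width: at most 31 chars plus '\0' */
        printf("%s\n", name[0]);

    free(name[0]);
    return 0;
}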
I'm not sure what your test case is (not enough reputation to post a comment). I just tried it with the inputs 0\n and 1\n1\n2\n.
The details are a little complex to explain, but let's start :-). There are two things you should know. First, main() is a function. Second, char *name[size]; uses either a C99 feature, the variable-length array, or (when size is 0) a zero-length array, a GNU extension supported by gcc.
main() is a function, so all the variables declared in it are local variables. Local variables live in the stack section. You need to know that first.
If you input 1\n1\n2\n, the variable-length array is used, and it too is allocated on the stack. Notice that the elements of the array are not initialized to 0; each pointer holds whatever garbage was already on the stack. That is a possible explanation for why it runs without a segmentation fault: you cannot be sure the garbage pointer refers to an address that isn't writable (at least it never failed for me).
If the input is 0\n, you get the zero-length array, the extension supported by GNU. As you saw, it means the array has no elements. The value of name is then equal to &size, because size is the last local variable declared before name (think of the stack pointer). Dereferencing name[0] therefore reads the value stored at &size, which is zero (= '\0'), so it appears to work fine.
The simple answer to your question is that a segmentation fault is:
A segmentation fault (aka segfault) is caused by a program trying to read or write an illegal memory location.
So it all depends upon what is classed as illegal. If the memory in question is part of the valid address space for the process the program is running in, e.g. the stack, it may not cause a segfault.
When I run this code in a debugger the line:
scanf("%s, name[i]);
over writes the content of the size variable, clearly not the intended behaviour, and the code essentially goes into an infinite loop.
But that is just what happens on my 64-bit Intel Linux machine using gcc 5.4. Another environment will probably do something different.
If I put the missing & in front of name[i], it works OK. Whether that is luck, or whether it is expertly exploiting the intended behaviour of C99 variable-length arrays as suggested above, I'm afraid I don't know.
So welcome to the world of subtle memory overwriting bugs.
In a C program we can declare an array like int array[10], so it can store 10 integer values. But when I give input using a loop, it takes more than 10 inputs and doesn't show any error.
What is actually happening??
#include<stdio.H>
main()
{
int array[10],i;
for(i=0;i<=11;i++)
scanf("%d",&array[i]);
for(i=0;i<10;i++)
printf("%d",array[i]);
}
Because C doesn't do any array bounds checking. You as a programmer are responsible for making sure that you don't index out of bounds.
Depending on the used compiler and the system the code is running on, you might read random data from memory or get a SIGSEGV eventually when reading/writing out of bounds.
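A minimal sketch of the question's loop with the bound enforced by the programmer (the limit of 10 matches the array size from the question):
#include <stdio.h>

#define N 10

int main(void)
{
    int array[N], i;

    /* The programmer enforces the bound; the language will not do it for you. */
    for (i = 0; i < N; i++) {
        if (scanf("%d", &array[i]) != 1)
            break; /* stop early on bad or missing input */
    }

    for (int j = 0; j < i; j++)
        printf("%d\n", array[j]);
    return 0;
}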
The C compiler and the runtime are not required to perform any array bounds checking.
What you describe is an example of a whole class of programming errors that result in undefined behavior. From Wikipedia:
In computer programming, undefined behavior refers to computer code whose behavior is specified to be arbitrary.
What this means is that the program is allowed to misbehave (or not) in any way it pleases.
In practice, any of the following are reasonably likely to happen when you write past the end of an array:
The program crashes, either immediately or at a later point.
Other, unrelated, data gets overwritten. This could result in arbitrary misbehaviour and/or in serious security vulnerabilities.
Internal data structures that are used to keep track of allocated memory get corrupted by the out-of-bounds write.
The program works exactly as if more memory had been allocated in the first place (memory is often allocated in blocks, and by luck there might happen to be some spare capacity after the end of the array).
(This is not an exhaustive list.)
There exist tools, such as Valgrind, that can help discover and diagnose this type of error.
The C-language standard does not dictate how variables should be allocated in memory.
So the theoretical answer is that you are performing an unsafe memory access operation, which will lead to undefined behavior (anything could happen).
In practice, however, compilers typically allocate local variables on the stack and global variables in the data section, so the practical answer is:
In the case of a local array, you will either overwrite some other local variable or perform an illegal memory access operation.
In the case of a global array, you will either overwrite some other global variable or perform an illegal memory access operation.
I have this piece of code, and it runs perfectly fine, and I don't know why:
int main(){
int len = 10;
char arr[len];
arr[150] = 'x';
}
Seriously, try it! It works (at least on my machine)!
It doesn't, however, work if I try to change elements at indices that are too large, for instance index 20,000. So the compiler apparently isn't smart enough to just ignore that one line.
So how is this possible? I'm really confused here...
Okay, thanks for all the answers!
So I can use this to write into memory consumed by other variables on the stack, like so:
#include <stdio.h>
main(){
char b[4] = "man";
char a[10];
a[10] = 'c';
puts(b);
}
Outputs "can". That's a really bad thing to do.
Okay, thanks.
C compilers generally do not generate code to check array bounds, for the sake of efficiency. Out-of-bounds array accesses result in "undefined behavior", and one possible outcome is that "it works". It's not guaranteed to cause a crash or other diagnostic, but if you're on an operating system with virtual memory support, and your array index points to a virtual address that hasn't been mapped into your process's address space, your program is more likely to crash.
So how is this possible?
Because the stack was, on your machine, large enough that there happened to be a memory location on the stack at the location to which &arr[150] happened to correspond, and because your small example program exited before anything else referred to that location and perhaps crashed because you'd overwritten it.
The compiler you're using doesn't check for attempts to go past the end of the array (the C99 spec says that the behavior of arr[150] in your sample program is "undefined", so a compiler could even refuse to translate it, but most C compilers don't).
Most implementations don't check for these kinds of errors. Memory access granularity is often very large (4 KiB boundaries), and the cost of finer-grained access control means that it is not enabled by default. There are two common ways for errors to cause crashes on modern OSs: either you read or write data from an unmapped page (instant segfault), or you overwrite data that leads to a crash somewhere else. If you're unlucky, then a buffer overrun won't crash (that's right, unlucky) and you won't be able to diagnose it easily.
You can turn instrumentation on, however. When using GCC, compile with Mudflap enabled.
$ gcc -fmudflap -Wall -Wextra test999.c -lmudflap
test999.c: In function ‘main’:
test999.c:3:9: warning: variable ‘arr’ set but not used [-Wunused-but-set-variable]
test999.c:5:1: warning: control reaches end of non-void function [-Wreturn-type]
Here's what happens when you run it:
$ ./a.out
*******
mudflap violation 1 (check/write): time=1362621592.763935 ptr=0x91f910 size=151
pc=0x7f43f08ae6a1 location=`test999.c:4:13 (main)'
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_check+0x41) [0x7f43f08ae6a1]
./a.out(main+0xa6) [0x400a82]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
Nearby object 1: checked region begins 0B into and ends 141B after
mudflap object 0x91f960: name=`alloca region'
bounds=[0x91f910,0x91f919] size=10 area=heap check=0r/3w liveness=3
alloc time=1362621592.763807 pc=0x7f43f08adda1
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_register+0x41) [0x7f43f08adda1]
/usr/lib/x86_64-linux-gnu/libmudflap.so.0(__mf_wrap_alloca_indirect+0x1a4) [0x7f43f08afa54]
./a.out(main+0x45) [0x400a21]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f43f0538ead]
number of nearby objects: 1
Oh look, it crashed.
Note that Mudflap is not perfect, it won't catch all of your errors.
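Mudflap has since been removed from GCC (version 4.9 and later); on current compilers AddressSanitizer plays a similar role and should flag this particular out-of-bounds write:
$ gcc -fsanitize=address -g test999.c
$ ./a.out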
Native C arrays do not get bounds checking. That would require additional instructions and data structures. C is designed for efficiency and leanness, so it doesn't specify features that trade performance for safety.
You can use a tool like valgrind, which runs your program in a kind of emulator and attempts to detect such things as buffer overflows by tracking which bytes are initialized and which aren't. But it's not infallible, for example if the overflowing access happens to perform an otherwise-legal access to another variable.
Under the hood, array indexing is just pointer arithmetic. When you say arr[150], you are just multiplying 150 by the size of one element and adding that to the address of arr to obtain the address of a particular object. That address is just a number, and it might be nonsense, invalid, or itself an arithmetic overflow. Some of these conditions result in the hardware generating a crash, when it can't find memory to access or detects virus-like activity, but none result in software-generated exceptions because there is no room for a software hook. If you want a safe array, you'll need to build functions that check the index before doing that addition, as sketched below.
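A hedged sketch of that idea (checked_get is a hypothetical helper, not a standard function): wrap the address arithmetic in a function that validates the index before touching memory.
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: validate the index before doing the pointer arithmetic. */
static char checked_get(const char *arr, size_t len, size_t i)
{
    if (i >= len) {
        fprintf(stderr, "index %zu out of bounds (size %zu)\n", i, len);
        exit(EXIT_FAILURE);
    }
    return arr[i];
}

int main(void)
{
    char arr[10] = "hello";
    printf("%c\n", checked_get(arr, sizeof arr, 4));   /* prints 'o' */
    printf("%c\n", checked_get(arr, sizeof arr, 150)); /* rejected at run time */
    return 0;
}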
By the way, the array in your example isn't even technically of fixed size.
int len = 10; /* variable of type int */
char arr[len]; /* variable-length array */
Using a non-const object to set the array size is a new feature since C99. You could just as well have len be a function parameter, user input, etc. This would be better for compile-time analysis:
const int len = 10; /* constant of type int */
char arr[len]; /* constant-length array */
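Note, though, that in C (unlike C++) even a const-qualified int is not an integer constant expression, so the arr above is still formally a VLA. If a genuinely fixed-length array is wanted, an enum constant (or a #define) does the job:
enum { LEN = 10 }; /* an integer constant expression */
char arr[LEN];     /* an ordinary fixed-length array, not a VLA */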
For the sake of completeness: The C standard doesn't specify bounds checking, but neither is it prohibited. It falls under the category of undefined behavior, or errors that need not generate error messages, and can have any effect. It is possible to implement safe arrays; various approximations of the feature exist. C does nod in this direction by making it illegal, for example, to take the difference between pointers into two separate arrays in order to find the out-of-bounds index that would let you reach an arbitrary object A from array B. But the language is very free-form, and if A and B are part of the same memory block obtained from malloc, it is legal. In other words, the more C-specific memory tricks you use, the harder automatic verification becomes, even with C-oriented tools.
Under the C spec, accessing an element past the end of an array is undefined behaviour. Undefined behaviour means that the specification does not say what would happen -- therefore, anything could happen, in theory. The program might crash, or it might not, or it might crash hours later in a completely unrelated function, or it might wipe your hard drive (if you got unlucky and poked just the right bits into the right place).
Undefined behaviour is not easily predictable, and it should absolutely never be relied upon. Just because something appears to work does not make it right, if it invokes undefined behaviour.
Because you were lucky. Or rather unlucky, because it means it's harder to find the bug.
The runtime will only crash if you start touching memory that isn't mapped into your own process (or, in some cases, memory that is mapped but not writable). Your application is given a certain amount of memory when it starts, which in this case is enough, and you can mess about in your own memory as much as you like, but you'll give yourself a nightmare of a debugging job.
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
int main(){
char name[10];
printf("%s\n", name);
return 0;
}
What value does an uninitialized string in C hold? Does the compiler automatically allocate storage of size 10 and fill it with garbage values? What basically happens when the above code runs?
10 bytes are allocated on the stack, that's all. Their value is left as is, which means it is whatever had been written to those 10 bytes before they were allocated.
As the string is uninitialized, its value is not defined - it may be anything. I would also say it is unsafe to print an uninitialized string, as it may not have a terminating zero character, so in theory you may end up printing way more than 10 chars.
And another thing - C does not fill the storage with anything. It just leaves it the way it is.
EDIT: Please note I am not saying that as long as you have a 0 terminating character it is safe to access the uninitialized string. Invoking an undefined behavior is never safe as it is undefined - you never know what will happen.
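For comparison, a minimal well-defined variant of the question's snippet: give the array an initializer so it is zero-filled, and therefore properly terminated, before printing.
#include <stdio.h>

int main(void)
{
    char name[10] = "";   /* "" sets name[0] = '\0' and zero-fills the rest */
    printf("%s\n", name); /* well-defined: prints an empty line */
    return 0;
}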
The contents of uninitialized variables are - unlike in, e.g., Java - undefined. In other words: the contents consist of whatever values were pushed onto the stack earlier by other function calls.
In your particular example, it's probably going to be zeroes. But it doesn't matter.
The key point is that it's undefined. If you can't trust it to always be the same, it's of no use to you. You can't make any assumptions. No other part of your code can depend on it. It's like it didn't exist.
If you're curious as to where the actual contents come from, they are remainders of previous execution contexts stored in the stack. If you run a few function calls, you're going to leave garbage lying around that your program will feel free to overwrite. Those only-good-for-overwriting bytes may end up in your string.
The C standard uses the term "indeterminate", i.e. it can be anything. In real life, it will most likely be filled with random garbage, and if you're unlucky, it won't have a terminating zero byte, so you invoke undefined behavior and the call to printf() will probably crash (segmentation fault, anyone?).
It contains garbage (random) values. Do read up on storage classes and storage duration to get a better understanding.