How badly can C crash?

I have often heard that C can crash spectacularly. Recently I got my first taste of this when a function I expected to return a string instead returned little happy faces. Since then I have been more careful about initializing pointers and mallocing memory for arrays. Still, though, I have trouble believing that a program could crash THAT badly...
I guess it would depend on the scope of the program though? I mean, if a bug in a program that dealt with your fan copied happy faces into some important space in memory...?
My question is, how much myth is there in the world of spectacular C crashes? Can I get some concrete examples of dangerous things that one ought to avoid?
z.

http://xkcd.com/371/
http://blog.raamdev.com/category/technology/programming/cc
http://en.wikipedia.org/wiki/Segmentation_fault

I think it was probably worse way back in the pre-virtual-memory days when you could trash other processes' memory, but nowadays the worst that can happen is really just crashing your own program, usually via segmentation faults from bad pointers.
That excludes of course blowing things up by misusing system resources - you can do that in any language.

Back when I was learning to program in C++, it was on a Mac running System 7 or 8, I don't remember which. Anyway, it had no protected virtual memory, so a lot of mistakes like leaving a dangling pointer or a buffer overrun would cause the whole computer to crash. I recall that when Apple first announced they were going to create a new OS which had protected memory space, at Macworld or something, they showed the source code for a program:
while (true)
    *(int *)i++ = 1;
And when they ran the program, and just the program terminated, and not the whole machine (it had a message like "You do not need to restart your computer"), the whole room full of developers apparently burst into applause. Anyway, obviously not having protected memory made programming in C or C++ much tougher because of the increased severity of a crash.
Nowadays it is not such a big deal: unless you are programming something that runs at supervisor level, you do not have the ability to crash the OS.

The worst things that happened to me were memory corruptions which did not cause a crash immediately, but only after a while... making them so hard to detect... argh

The operating system prevents most horrible issues these days. The worst I've ever done is hard-lock the machine (just had to reboot by holding down the power button) and scramble a few files.
All depends what resources you're accessing, really. If you're writing files, there are some ways directory structures can get tangled that used to confuse system utilities, but most of those problems have been fixed. If you're doing something as root, well, then you can sure make a mess because many more system files are writeable. If you're using the network, there's a lot of stuff that can go moderately wrong, but not much more than using up too much bandwidth is likely. Of course, a few years of programming and you'll see all sorts of unlikely things.
For the most part though, it's ok to experiment and play around. These days the systems are resilient enough that you won't make a mess that's too hard to get back out of. The operating system keeps each program to its own piece of memory, and disallows access to change critical systems unless you're administrator/root. Your garden variety dangling pointer may print funny things or crash your program, but it isn't going to destroy a modern computer.
From a comment in another reply: "I am using the Nintendo DS to run them"
Ok, that matters! (First: Awesome idea! Sounds like fun.) Coding for something like that is not the same in terms of what can go wrong as most coding for a desktop computer. A brief look at the documentation for libnds and some tutorials on Nintendo DS programming indicates to me that there's no OS to speak of. So, I have no idea how much you could do with a stray pointer, probably a lot. Possibly something damaging. It might be a good idea to hunt for people who've done programming for that platform before, see what they have to say.

Here is a quick snippet from Henry Spencer's "Ten Commandments for C Programmers":
Commandment #2 - Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end.
Clearly the holy scriptures were mis-transcribed here, as the words should have been ``null pointer'', to minimize confusion between the concept of null pointers and the macro NULL (of which more anon). Otherwise, the meaning is plain. A null pointer points to regions filled with dragons, demons, core dumps, and numberless other foul creatures, all of which delight in frolicing in thy program if thou disturb their sleep. A null pointer doth not point to a 0 of any type, despite some blasphemous old code which impiously assumes this.
For those who are not familiar with C, I think the best-written concise introduction to the dangers of C is "The Ten Commandments for C Programmers (Annotated Edition)" by Henry Spencer. The way it is written really gets across to the reader the dangers of C, while at the same time being funny (which means the reader will actually pay attention more).
=========================
Personally... C doesn't crash that badly when you are doing desktop development because you have the luxury of seg-faulting. Seg-faults are when the OS sees you trying to REALLY F' things up and it says "hey! you aren't allowed there" and stops the process.
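To make that concrete, here is a minimal sketch (assuming a desktop OS with memory protection) of the kind of bug that gets just the process killed:

#include <stdio.h>

int main(void)
{
    int *p = NULL;                 /* points nowhere */
    printf("about to follow a null pointer...\n");
    *p = 42;                       /* typically dies here with a segmentation fault */
    printf("never reached\n");
    return 0;
}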
When you are doing embedded C development... that is when you get the REALLY spectacular crazy stuff... i.e. they REQUIRE you to power-cycle 99.9% of the time. Like this one time where the code somehow messed up my call stack... and then you are executing some random other function... and then the ISR somehow is still going... and it takes 2 weeks to fix that kind of bug.

Well, if you're writing kernel code, sometimes you can overwrite system critical bits of memory, such as interrupt vectors, global descriptor tables, process tables, causing all sorts of fun stuff!

C itself can't crash anything. Sloppy programming can crash everything.
The "happy faces" are an indication that your code corrupted memory. The only thing C had to do with that is the fact that you chose to use it. (And the fact your OS allowed it to happen is surprising - are you still running a version of DOS?)

Nowadays it is kind of hard to make C crash that badly (unless you are coding an OS kernel or something like that).
Back in the DOS/Win95/Win98 days, you could make a C program crash really, really badly. I used to get this a lot:
Whenever I had a dangerous pointer operation which messed with the memory, I got a character-based screen full of all sorts of characters in different colors, some of them blinking!!! I guess the operations somehow messed with video memory.
But today, since processes run inside the kernel's protected memory, the worst you'll get is your process going away.

Prior to protected-memory architectures, most of the jump vectors used by the operating system were stored in the zero page of memory (addresses starting at zero).
So writing to memory locations in the zero page - easily done with null/bad pointers - would change the operating system's jump vectors, causing all kinds of bizarre crash behavior - everything from locked-out keyboards and blinking video screens full of garbage to hard drive light flashing frenzy, blue screen of death, rebooting, etc.
[coding in assembly language was even more fun]

If your code is running on a remotely modern OS, you can't copy happy faces into random points in memory. You can crash as much as you want, and it will just result in the termination of your process.
The closest you can come to actually messing up your system is abusing processor/memory/disk resources, or spawning so many subprocesses that the OS runs out of PIDs (if it's still using a 32-bit value to store those).

There was a computer, the Commodore PET 4032 (aka the "Fat 40") where it was actually possible to permanently burn out the video chip if you poked the wrong value into the wrong part of memory. You can imagine that if there had been a C compiler on that machine, a wild pointer could actually do irreparable physical damage to the computer.

Back in the DOS days I actually overwrote BIOS information. Had to get a tech to fix it. My first home computer - a 286. Unbootable after a day or two.

On any OS with protected memory the worst that can happen is that your process crashes. Now if your process happens to be part of the kernel or a kernel extension then you can obviously crash the whole OS but that is the worst you can do.
However, this is really the same as in many other languages (for example, in C dereference a null pointer, in Java use an object reference set to null; both will crash your process).
So I believe that with protected memory, C can't do any more damage than any other language (unless your process is part of the OS ;))

Well, back in the DOS days, I managed to over-write part of the boot sector - nothing like rebooting to find "OS Not Found" or whatever the message was.
You learn the hard way to be very, very careful writing to disk after that...

C lets you deal with the machine pretty close to directly. How spectacularly it can crash is a function of what the machine can do.
So: A user mode process without special privileges in a nice modern operating system won't really do that much. But you have software all over the place. Think about the software controlling the braking systems in a train. Think about the software running the emergency intercom when someone really needs help. Think about the software running the signs on a busy highway. Think about the software running a guided missile system.
There is software all over the place these days. A lot of it is written in C.

Back when I was writing Win98 drivers, the BSOD haunted everyone. I remember the following mistake I made:
typedef struct _SOME_TAG_ {
    int nSomeVar;
    int nSomeMore;
    ...
} MYSTRUCT, *PMYSTRUCT;

....

PMYSTRUCT pMyStruct;

// And I use this structure without allocating any memory ;-)
pMyStruct->nSomeVar = 0;
And the driver crashes were so horrible, but we had SoftICE from NuMega. Though it is only 10 years ago... I feel like it was ages back.
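For contrast, a minimal user-space sketch of the fix (a real Win98 driver would use the kernel's allocator rather than malloc, so treat this only as an illustration of "allocate before you dereference"):

#include <stdlib.h>

typedef struct _SOME_TAG_ {
    int nSomeVar;
    int nSomeMore;
} MYSTRUCT, *PMYSTRUCT;

int main(void)
{
    PMYSTRUCT pMyStruct = malloc(sizeof *pMyStruct);  /* allocate first */
    if (pMyStruct == NULL)
        return 1;                                     /* handle allocation failure */

    pMyStruct->nSomeVar = 0;   /* now the write lands in memory we own */

    free(pMyStruct);
    return 0;
}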

Back in college I was tasked with creating a multi-threaded proxy.
From time to time, the proxy wouldn't respond with any of the resources the page pulled. The code causing the issue:
Some code with overflow issues that used variables declared near a file handle. Before knowing what was happening, I found it really weird that moving the file handle declaration "fixed" the issue, lol.
Ps. check this one (not in c, but a good story :)): http://trixter.wordpress.com/2006/02/02/computing-myth-1-software-cannot-damage-hardware/

If the software in question is running on a PC, it may "merely" bring down your computer.
If, however, it's controlling the operation of the engine in your car -- or worse, the ABS -- it won't just be the software that crashes...

A crash is not the worst thing that can happen.
I read about an old Unix file compression program (you know, like Zip) that didn't check the return value from fclose. Yes, fclose can return an error. Output to a file is normally buffered, so even if a call to fwrite or putc seems to work and returns OK, the data may still be in a buffer, waiting to be written. When fclose is called, any unwritten data is flushed, and this may fail, since (for example) the disk may be full. And since the compression program was usually run just because the disk was nearly full, this happened rather often. So the program silently truncated the new, compressed file, the original uncompressed file was removed, and a year or so later, when someone tried to uncompress the file, the end was missing!
I think this is a good example of why throwing exceptions can be good thing.
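A small sketch of the habit that story argues for: treat fclose as a call that can fail, instead of assuming the buffered data made it to disk. The file name and message here are made up.

#include <stdio.h>

int write_report(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    if (fputs("compressed data would go here\n", f) == EOF) {
        fclose(f);
        return -1;
    }

    /* The buffered data may only hit the disk here; a full disk can
       show up as an fclose failure rather than an fputs failure. */
    if (fclose(f) == EOF)
        return -1;

    return 0;
}

int main(void)
{
    if (write_report("report.txt") != 0)
        perror("write_report");
    return 0;
}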

Related

What happens when you do array[-1]? [duplicate]

How dangerous is accessing an array outside of its bounds (in C)? It can sometimes happen that I read from outside the array (I now understand I then access memory used by some other parts of my program or even beyond that) or I am trying to set a value to an index outside of the array. The program sometimes crashes, but sometimes just runs, only giving unexpected results.
Now what I would like to know is, how dangerous is this really? If it damages my program, it is not so bad. If on the other hand it breaks something outside my program, because I somehow managed to access some totally unrelated memory, then it is very bad, I imagine.
I read a lot of 'anything can happen', 'segmentation might be the least bad problem', 'your hard disk might turn pink and unicorns might be singing under your window', which is all nice, but what is really the danger?
My questions:
- Can reading values from way outside the array damage anything apart from my program? I would imagine just looking at things does not change anything, or would it for instance change the 'last time opened' attribute of a file I happened to reach?
- Can setting values way outside of the array damage anything apart from my program? From this Stack Overflow question I gather that it is possible to access any memory location, and that there is no safety guarantee.
- I now run my small programs from within Xcode. Does that provide some extra protection around my program so it cannot reach outside its own memory? Can it harm Xcode?
- Any recommendations on how to run my inherently buggy code safely?
I use OS X 10.7, Xcode 4.6.
As far as the ISO C standard (the official definition of the language) is concerned, accessing an array outside its bounds has "undefined behavior". The literal meaning of this is:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
A non-normative note expands on this:
Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
So that's the theory. What's the reality?
In the "best" case, you'll access some piece of memory that's either owned by your currently running program (which might cause your program to misbehave), or that's not owned by your currently running program (which will probably cause your program to crash with something like a segmentation fault). Or you might attempt to write to memory that your program owns, but that's marked read-only; this will probably also cause your program to crash.
That's assuming your program is running under an operating system that attempts to protect concurrently running processes from each other. If your code is running on the "bare metal", say if it's part of an OS kernel or an embedded system, then there is no such protection; your misbehaving code is what was supposed to provide that protection. In that case, the possibilities for damage are considerably greater, including, in some cases, physical damage to the hardware (or to things or people nearby).
Even in a protected OS environment, the protections aren't always 100%. There are operating system bugs that permit unprivileged programs to obtain root (administrative) access, for example. Even with ordinary user privileges, a malfunctioning program can consume excessive resources (CPU, memory, disk), possibly bringing down the entire system. A lot of malware (viruses, etc.) exploits buffer overruns to gain unauthorized access to the system.
(One historical example: I've heard that on some old systems with core memory, repeatedly accessing a single memory location in a tight loop could literally cause that chunk of memory to melt. Other possibilities include destroying a CRT display, and moving the read/write head of a disk drive with the harmonic frequency of the drive cabinet, causing it to walk across a table and fall onto the floor.)
And there's always Skynet to worry about.
The bottom line is this: if you could write a program to do something bad deliberately, it's at least theoretically possible that a buggy program could do the same thing accidentally.
In practice, it's very unlikely that your buggy program running on a MacOS X system is going to do anything more serious than crash. But it's not possible to completely prevent buggy code from doing really bad things.
In general, Operating Systems of today (the popular ones anyway) run all applications in protected memory regions using a virtual memory manager. It turns out that it is not terribly EASY (per se) to simply read or write to a location that exists in REAL space outside the region(s) that have been assigned / allocated to your process.
Direct answers:
Reading will almost never directly damage another process; however, it can indirectly damage a process if you happen to read a KEY value used to encrypt, decrypt, or validate a program / process. Reading out of bounds can have somewhat adverse / unexpected effects on your code if you are making decisions based on the data you are reading.
The only way you could really DAMAGE something by writing to a location accessible through a memory address is if that address is actually a hardware register (a location that is not for data storage but for controlling some piece of hardware) rather than a RAM location (a hedged sketch of such a register write appears at the end of this answer). Even then, you still won't normally damage something unless you are writing to some one-time-programmable location that is not re-writable (or something of that nature).
Generally running from within the debugger runs the code in debug mode. Running in debug mode does TEND to (but not always) stop your code faster when you have done something considered out of practice or downright illegal.
Never use macros, use data structures that already have array index bounds checking built in, etc....
ADDITIONAL
I should add that the above information really only applies to systems using an operating system with memory protection. If you are writing code for an embedded system, or for a system utilizing an operating system (real-time or other) that does not have memory protection (or virtual addressing), then you should practice a lot more caution in reading and writing to memory. Also, in these cases SAFE and SECURE coding practices should always be employed to avoid security issues.
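To make the hardware-register point from the list above concrete, here is a hedged sketch of a memory-mapped register write; the address and the bit layout are entirely made up (a real one comes from the chip's datasheet), and on a desktop OS with memory protection a write like this would simply fault:

#include <stdint.h>

/* Hypothetical memory-mapped control register at a made-up address.
   On bare metal this pokes hardware directly; a stray pointer that
   happens to land on such an address does exactly the same thing. */
#define CTRL_REG (*(volatile uint32_t *)0x40021000u)

void enable_device(void)
{
    CTRL_REG |= 1u;   /* set an (assumed) enable bit */
}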
Not checking bounds can lead to ugly side effects, including security holes. One of the ugly ones is arbitrary code execution. In the classic example: if you have a fixed-size array and use strcpy() to put a user-supplied string there, the user can give you a string that overflows the buffer and overwrites other memory locations, including the return address the CPU jumps to when your function finishes.
Which means your user can send you a string that will cause your program to essentially call exec("/bin/sh"), which will turn it into a shell, executing anything he wants on your system, including harvesting all your data and turning your machine into a botnet node.
See Smashing The Stack For Fun And Profit for details on how this can be done.
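To make the pattern concrete, here is a hedged sketch of the classic mistake next to one bounded alternative (buffer size and input are made up; never use the vulnerable version in real code):

#include <stdio.h>
#include <string.h>

/* The classic mistake: no bounds check, so a long input
   overflows buf and stomps on whatever lives next to it on the stack. */
void vulnerable(const char *input)
{
    char buf[16];
    strcpy(buf, input);            /* writes past buf if input is longer than 15 chars */
    printf("got: %s\n", buf);
}

/* One safer variant: truncate to the buffer size. */
void safer(const char *input)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%s", input);
    printf("got: %s\n", buf);
}

int main(int argc, char **argv)
{
    if (argc > 1) {
        safer(argv[1]);
        /* vulnerable(argv[1]);  -- with a long argument this is the overflow described above */
    }
    return 0;
}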
You write:
I read a lot of 'anything can happen', 'segmentation might be the
least bad problem', 'your harddisk might turn pink and unicorns might
be singing under your window', which is all nice, but what is really
the danger?
Let's put it this way: load a gun. Point it out the window without any particular aim and fire. What is the danger?
The issue is that you do not know. If your code overwrites something that crashes your program, you are fine, because it will stop in a defined state. However, if it does not crash, then the issues start to arise. Which resources are under the control of your program, and what might it do to them? I know of at least one major issue that was caused by such an overflow. The issue was in a seemingly meaningless statistics function that messed up an unrelated conversion table for a production database. The result was some very expensive cleanup afterwards. Actually, it would have been much cheaper and easier to handle if the issue had formatted the hard disks... in other words: pink unicorns might be the least of your problems.
The idea that your operating system will protect you is optimistic. If possible try to avoid writing out of bounds.
Not running your program as root or any other privileged user means it can't harm most of your system, so generally this is a good idea.
By writing data to some random memory location you won't directly "damage" any other program running on your computer, as each process runs in its own memory space.
If you try to access any memory not allocated to your process the operating system will stop your program from executing with a segmentation fault.
So directly (without running as root and directly accessing files like /dev/mem) there is no danger that your program will interfere with any other program running on your operating system.
Nevertheless - and probably this is what you have heard about in terms of danger - by blindly writing random data to random memory locations by accident you sure can damage anything you are able to damage.
For example your program might want to delete a specific file given by a file name stored somewhere in your program. If by accident you just overwrite the location where the file name is stored you might delete a very different file instead.
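A hedged toy example of that failure mode (the buffers, the overflow, and the file name are all made up, and the exact layout depends on the compiler):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Two unrelated fields that happen to sit next to each other
       (assuming the usual layout with no padding between char arrays). */
    struct {
        char scratch[8];
        char filename[32];
    } s;

    strcpy(s.filename, "old-log.txt");            /* the file we intend to remove later */

    /* Buggy: the source string needs 26 bytes but scratch only has 8,
       so the tail of the copy lands in s.filename. */
    strcpy(s.scratch, "way-more-than-eight-bytes");

    printf("about to remove: %s\n", s.filename);  /* no longer "old-log.txt" */
    /* remove(s.filename);  -- a real program would now hit the wrong file */
    return 0;
}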
NSArrays in Objective-C are assigned a specific block of memory. Exceeding the bounds of the array means that you would be accessing memory that is not assigned to the array. This means:
This memory can have any value. There's no way of knowing if the data is valid based on your data type.
This memory may contain sensitive information such as private keys or other user credentials.
The memory address may be invalid or protected.
The memory can have a changing value because it's being accessed by another program or thread.
Other things use memory address space, such as memory-mapped ports.
Writing data to an unknown memory address can crash your program, overwrite OS memory space, and generally cause the sun to implode.
From the aspect of your program you always want to know when your code is exceeding the bounds of an array. This can lead to unknown values being returned, causing your application to crash or provide invalid data.
You may want to try using the memcheck tool in Valgrind when you test your code -- it won't catch individual array bounds violations within a stack frame, but it should catch many other sorts of memory problem, including ones that would cause subtle, wider problems outside the scope of a single function.
From the manual:
Memcheck is a memory error detector. It can detect the following problems that are common in C and C++ programs.
Accessing memory you shouldn't, e.g. overrunning and underrunning heap blocks, overrunning the top of the stack, and accessing memory after it has been freed.
Using undefined values, i.e. values that have not been initialised, or that have been derived from other undefined values.
Incorrect freeing of heap memory, such as double-freeing heap blocks, or mismatched use of malloc/new/new[] versus free/delete/delete[]
Overlapping src and dst pointers in memcpy and related functions.
Memory leaks.
ETA: Though, as Kaz's answer says, it's not a panacea, and doesn't always give the most helpful output, especially when you're using exciting access patterns.
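If you want to see memcheck in action, a small heap overrun like the one below is the kind of thing it flags (the exact report wording varies by version); build with -g for line numbers and run the binary under valgrind:

/* overrun.c -- build with:  gcc -g overrun.c -o overrun
   then run:                 valgrind ./overrun
   memcheck should report an invalid write just past the block. */
#include <stdlib.h>

int main(void)
{
    int *a = malloc(4 * sizeof *a);
    if (a == NULL)
        return 1;
    a[4] = 7;          /* one element past the end of the heap block */
    free(a);
    return 0;
}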
If you ever do systems level programming or embedded systems programming, very bad things can happen if you write to random memory locations. Older systems and many micro-controllers use memory mapped IO, so writing to a memory location that maps to a peripheral register can wreak havoc, especially if it is done asynchronously.
An example is programming flash memory. Programming mode on the memory chips is enabled by writing a specific sequence of values to specific locations inside the address range of the chip. If another process were to write to any other location in the chip while that was going on, it would cause the programming cycle to fail.
In some cases the hardware will wrap addresses around (most significant bits/bytes of address are ignored) so writing to an address beyond the end of the physical address space will actually result in data being written right in the middle of things.
And finally, older CPUs like the MC68000 can lock up to the point that only a hardware reset can get them going again. I haven't worked on them for a couple of decades, but I believe that when one encountered a bus error (non-existent memory) while trying to handle an exception, it would simply halt until a hardware reset was asserted.
My biggest recommendation is a blatant plug for a product, but I have no personal interest in it and I am not affiliated with them in any way - based on a couple of decades of C programming and embedded systems where reliability was critical, Gimpel's PC Lint will not only detect those sorts of errors, it will make a better C/C++ programmer out of you by constantly harping on you about bad habits.
I'd also recommend reading the MISRA C coding standard, if you can snag a copy from someone. I haven't seen any recent ones but in ye olde days they gave a good explanation of why you should/shouldn't do the things they cover.
Dunno about you, but about the 2nd or 3rd time I get a coredump or hangup from any application, my opinion of whatever company produced it goes down by half. The 4th or 5th time and whatever the package is becomes shelfware and I drive a wooden stake through the center of the package/disc it came in just to make sure it never comes back to haunt me.
I'm working with a compiler for a DSP chip which deliberately generates code that accesses one past the end of an array out of C code which does not!
This is because the loops are structured so that the end of an iteration prefetches some data for the next iteration. So the datum prefetched at the end of the last iteration is never actually used.
Writing C code like that invokes undefined behavior, but that is only a formality from a standards document which concerns itself with maximal portability.
More often than not, a program which accesses out of bounds is not cleverly optimized. It is simply buggy. The code fetches some garbage value and, unlike the optimized loops of the aforementioned compiler, the code then uses the value in subsequent computations, thereby corrupting them.
It is worth catching bugs like that, and so it is worth making the behavior undefined for even just that reason alone: so that the run-time can produce a diagnostic message like "array overrun in line 42 of main.c".
On systems with virtual memory, an array could happen to be allocated such that the address which follows is in an unmapped area of virtual memory. The access will then bomb the program.
As an aside, note that in C we are permitted to create a pointer which is one past the end of an array. And this pointer has to compare greater than any pointer to the interior of an array.
This means that a C implementation cannot place an array right at the end of memory, where the one plus address would wrap around and look smaller than other addresses in the array.
Nevertheless, access to uninitialized or out-of-bounds values is sometimes a valid optimization technique, even if not maximally portable. This is for instance why the Valgrind tool does not report accesses to uninitialized data when those accesses happen, but only when the value is later used in some way that could affect the outcome of the program. You get a diagnostic like "conditional branch in xxx:nnn depends on uninitialized value" and it can be sometimes hard to track down where it originates. If all such accesses were trapped immediately, there would be a lot of false positives arising from compiler-optimized code as well as correctly hand-optimized code.
Speaking of which, I was working with some codec from a vendor which was giving off these errors when ported to Linux and run under Valgrind. But the vendor convinced me that only several bits of the value being used actually came from uninitialized memory, and those bits were carefully avoided by the logic.. Only the good bits of the value were being used and Valgrind doesn't have the ability to track down to the individual bit. The uninitialized material came from reading a word past the end of a bit stream of encoded data, but the code knows how many bits are in the stream and will not use more bits than there actually are. Since the access beyond the end of the bit stream array does not cause any harm on the DSP architecture (there is no virtual memory after the array, no memory-mapped ports, and the address does not wrap) it is a valid optimization technique.
"Undefined behavior" does not really mean much, because according to ISO C, simply including a header which is not defined in the C standard, or calling a function which is not defined in the program itself or the C standard, are examples of undefined behavior. Undefined behavior doesn't mean "not defined by anyone on the planet" just "not defined by the ISO C standard". But of course, sometimes undefined behavior really is absolutely not defined by anyone.
Besides your own program, I don't think you will break anything; in the worst case you will try to read or write from a memory address that corresponds to a page that the kernel didn't assign to your process, generating the proper exception and being killed (your process, I mean).
Arrays with two or more dimensions pose a consideration beyond those mentioned in other answers. Consider the following functions:
char arr1[2][8];
char arr2[4];

int test1(int n)
{
    arr1[1][0] = 1;
    for (int i=0; i<n; i++) arr1[0][i] = arr2[i];
    return arr1[1][0];
}

int test2(int ofs, int n)
{
    arr1[1][0] = 1;
    for (int i=0; i<n; i++) *(arr1[0]+i) = arr2[i];
    return arr1[1][0];
}
The way gcc processes the first function does not allow for the possibility that an attempt to write arr1[0][i] might affect the value of arr1[1][0], and the generated code is incapable of returning anything other than a hardcoded value of 1. Although the Standard defines the meaning of array[index] as precisely equivalent to (*((array)+(index))), gcc seems to interpret the notion of array bounds and pointer decay differently in cases which involve using the [] operator on values of array type, versus those which use explicit pointer arithmetic.
I just want to add some practical examples to this question. Imagine the following code:
#include <stdio.h>

int main(void) {
    int n[5];
    n[5] = 1;
    printf("answer %d\n", n[5]);
    return (0);
}
This has undefined behaviour. If you enable, for example, clang optimisations (-Ofast), it can result in something like:
answer 748418584
(If you compile without optimisations it will probably output the expected answer 1.)
This is because in the first case the assignment to 1 is never actually assembled in the final code (you can look in the godbolt asm code as well).
(However, it must be noted that by that logic main should not even call printf, so the best advice is not to depend on the optimiser to "solve" your UB, but rather to be aware that it may sometimes work this way.)
The takeaway here is that modern optimising C compilers assume undefined behaviour (UB) never occurs, which means the above code is treated as something similar to (but not the same as) the following:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n[5];
    if (0)
        n[5] = 1;
    printf("answer %d\n", (exit(-1), n[5]));
    return (0);
}
This, on the contrary, is perfectly defined.
That's because the first conditional statement never takes its true branch (0 is always false).
And in the second argument to printf we have a sequence point after the left operand of the comma operator, at which point exit is called and the program terminates before the out-of-bounds n[5] is ever evaluated (so it's well defined).
So the second takeaway is that UB is not UB as long as it's never actually evaluated.
Additionally, I don't see it mentioned here that there is a fairly modern Undefined Behaviour sanitiser (at least on clang) which (with the option -fsanitize=undefined) gives the following output on the first example (but not the second):
/app/example.c:5:5: runtime error: index 5 out of bounds for type 'int[5]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.c:5:5 in
/app/example.c:7:27: runtime error: index 5 out of bounds for type 'int[5]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.c:7:27 in
Here are all the samples on godbolt:
https://godbolt.org/z/eY9ja4fdh (first example and no flags)
https://godbolt.org/z/cGcY7Ta9M (first example and -Ofast clang)
https://godbolt.org/z/cGcY7Ta9M (second example and UB sanitiser on)
https://godbolt.org/z/vE531EKo4 (first example and UB sanitiser on)

How am I writing to some spot of memory that I didn't allocate? [duplicate]

Dangers of pointers

I'm trying to learn C. Since I already have some familiarity with higher-level languages (PHP, Javascript, Python), I feel most of the work I have to do involves learning how to replace structures which I used to take for granted (say, variable-sized arrays) with the use of pointers and manual memory management. My problem is that I am a bit worried about playing with pointers.
Usually I try to experiment with other language features, but my problem is that a bad use of pointers could have unexpected results. In particular: is it possible - if I make a mistake - that I may corrupt memory segments that other programs are using, causing those programs to misbehave? Or will the operating system (in my case various flavours of Ubuntu) prevent me from interfering with the memory assigned to other processes?
In the former case I guess it would be possible (though unlikely) that I may cause other programs to write bad data to the disk, corrupting some information I have on the hard drive. Or even worse (and even more unlikely, I guess), it could damage some hardware - for instance, older monitors could be burned out by software that set an out-of-range refresh rate.
I know that my worries are probably not justified, but I would like to know how far the compiler/operating system will go to prevent me from doing dangerous operations when I make a mistake managing pointers.
In particular: is it possible - if I make a mistake - that I may corrupt memory segments that other programs are using, causing those programs to misbehave?
Not on most (all?) major operating systems available today. Memory protection has been a feature of most Unix/Linux systems, Windows, and Mac OS for a decade. Memory protection is an OS-level access control system that prevents programs writing to memory that doesn't belong to them. This is, as you suggest, partly to prevent software from writing into memory that belongs to other processes, but also to prevent software from reading memory that doesn't belong to it (a major security risk).
That's not to say it's something you'll never have to worry about, but if you're starting off learning C on a modern desktop it's not something you should really think about. If you make a mistake in your C code you're probably not going to cripple the OS! :)
It's a very interesting topic, though, and one I think everyone would benefit from knowing about. While learning, you'll almost certainly run into situations where you try to access memory that doesn't belong to you and your process is terminated by the system. Check out these two wiki articles for more information:
http://en.wikipedia.org/wiki/Buffer_overflow and
http://en.wikipedia.org/wiki/Memory_protection
Most modern, desktop-oriented operating systems (including Linux) use virtual memory to prevent a misbehaving program from disrupting other programs. Each process gets its own address space, and if you overflow a buffer or similarly misuse pointers, about the worst that can happen is you crash your process. You shouldn't affect other processes in the system unless you're writing a device driver or running as root (and if you're running as root, you'd have to do something bad to a file another process is reading, etc... you still wouldn't have direct access to another process's memory).
In terms of catching errors before you make them, here are some rules to follow that can help you (a short sketch illustrating the first and third rules follows the list):
Always check any newly-received pointers (such as return values of malloc, function arguments, etc.) for null before using them.
Turn your compiler's warnings all the way up and pay attention to them.
After freeing dynamically allocated memory, set the pointer to null. This should help you catch some dangling pointer or double-free errors. (Of course, if you've copied that pointer and stored it in other places in your code, you could still have these problems unless you're careful.)
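A minimal sketch of the first and third rules; the names and sizes here are arbitrary, chosen only for illustration:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *data = malloc(100 * sizeof *data);   // rule 1: check the result before using it
    if (data == NULL) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }

    data[0] = 42;                             // safe: the pointer is known to be valid

    free(data);
    data = NULL;                              // rule 3: a later accidental free(data) or *data is now easier to catch

    return EXIT_SUCCESS;
}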
Corruption and crashes are possible, but as long as you are careful about the size of the value in memory that the pointer points to, you are sure you don't interchange pointers of different types, and you are sure you DON'T HARDCODE MEMORY ADDRESSES, you should be fine.
You cannot affect other processes except on some rare OSes. You can obviously not damage any hardware. (If that were possible, my iOS device would have burned up a long time ago, given that just about every app in the App Store contains memory faults because programmers are too lazy to read the damn documentation.)
Good example
int *i = malloc(sizeof(int));
// Use i further.
free(i);
Wrong example
int *i;
i = malloc(sizeof(int));   // Allocates space for one int.
double j;
i = (int *)&j;             // i now points at a double, and the only pointer to the malloc'd int is lost (memory leak).
j = 3.1415;                // Writing j itself is fine, but dereferencing i now reinterprets part of a larger, differently typed object.
(Without the cast, i = &j assigns a double * to an int *, which is a constraint violation; most compilers will reject it or at least warn, depending on compiler flags.)
Terrible example, 99% guaranteed corruption
int *i = (int *)0x00ABCDEF; // Hard-coded memory address.
int j = 123;
int *k = &j;
memcpy(i, k, sizeof(int)); // Writes into an arbitrary address (needs <string.h>); on a modern OS this is far more likely to die with a segmentation fault than to silently corrupt anything.
In most modern operating systems (Linux included) each process runs in a separate address space, which means that all the memory it can address is not real RAM but its own private "memory sandbox" that does not affect other applications (it could affect them if you set up read/write memory segments shared with other applications, but both you and the other application have to create those explicitly).
By the way, often when you make mistakes with pointers you end up with pointers that point to invalid memory locations, and the first time you try to dereference them the OS will shut down your application with a segmentation fault; don't worry about it, it's just the OS telling you you're making a mess with pointers. :) The most serious problems, on the other hand, are those where the OS can't detect that you got some pointer stuff wrong.
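As a concrete, deliberately broken sketch (my own illustration, not code from the answer above), dereferencing a pointer that points nowhere valid is the classic way to meet a segmentation fault; the OS kills only this process and nothing else:
#include <stdio.h>

int main(void)
{
    int *p = NULL;   // p points nowhere valid
    *p = 42;         // dereferencing it: the OS typically terminates the process with SIGSEGV
    printf("never reached\n");
    return 0;
}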

Is it possible to write a program in C that does nothing - not even taking up memory?

This is a tricky C question asked in an interview: Write a program that does nothing, not even taking up memory.
Is it possible to do so?
All programs use memory. When you run the program, the OS will set up an address space for the program, copy its arguments into its process space, give it a process ID and a thread, give it some file descriptors for I/O, etc. Even if your program immediately terminates you still use up this memory and CPU time.
No, it's not possible. The code and stack must go somewhere, and that will, nearly always, be in memory.
Ignoring that, surely it's pretty easy to just write an application that exits straight away, for example:
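Something along these lines, which is my own minimal illustration of "exits straight away" rather than code from the original answers:
int main(void)
{
    return 0;   // Do nothing and exit immediately; the OS still sets up a process and address space for it.
}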
Your response should be along the lines of enquiring as to why you'd want to do such a thing; this would show a capacity for thinking beyond the question.
On the surface the question seems to have a simple answer: "No, it can't be done." @templatetypedef has given some reasons.
But perhaps the point of the question is to see how you address it. You might get "marks" for asking "what kind of memory" or for observing some of the points that @templatetypedef made. Or for showing the empty main() function given by @Mihran Hovsepyan and then explaining that some memory will be involved even in this minimal case.
Although there will be some memory allocated by the OS when you launch a program, most people don't know that main() is not the real program entry point. mainCRTStartup is, at least for a Windows console app. If you create a program with a custom entry point you will avoid heap initialization routines, command-argument parsing, global variable initialization, and so on.
So, in some sense, you can make a program that avoids heap management and similar start-up work. But the OS will still read it into memory.
See: http://www.catch22.net/tuts/minexe
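The tutorial above targets Windows; as a rough sketch of the same idea on Linux with GCC (my assumption, not taken from the link), you can supply your own entry point and skip the C runtime start-up code by linking with -nostartfiles:
// tiny.c - build with:  gcc -nostartfiles -o tiny tiny.c   (assumed GCC/Linux invocation)
#include <unistd.h>   // for _exit(), a thin wrapper around the exit system call

void _start(void)     // our own entry point: no CRT initialization, no main()
{
    _exit(0);         // terminate immediately with status 0
}
The program still gets loaded and still becomes a process, so it does not escape the conclusion above; it just skips the usual start-up work.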
An empty program is a program, isn't it?
Below is my no resource use program :)
Also note that, strictly speaking, a program really doesn't consume any resources until the OS loads it and makes it run. When that happens we call it a process.
The correct answer is that it's implementation-specific. An implementation could support null programs, and the execve (or equivalent) mechanism could perform the equivalent of _Exit(0) when it encounters one, but in practice none does.

c runtime error message

This error appeared while creating a file using fopen in the C programming language:
The NTVDM CPU has encountered an illegal instruction. CS:0000 IP0075
OP:f0 00 f0 37 05 Choose 'Close' to terminate the application.
This kind of thing typically happens when a program tries to execute data as code. In turn, this typically happens when something tramples the stack and overwrites a return address.
In this case, I would guess that "IP0075" is the instruction pointer, and that the illegal instructions executed were at address 0x0075. My bet is that this address is NOT mapped to the app's executable code.
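As a deliberately broken sketch of the kind of stack trampling being described (not the OP's code, which was never posted), something like this can overwrite a saved return address; depending on the compiler's stack protection it may instead abort cleanly:
#include <string.h>

static void vulnerable(const char *input)
{
    char buf[8];
    strcpy(buf, input);   // no bounds check: a long input overruns buf and can clobber the saved return address
}

int main(void)
{
    vulnerable("this string is far longer than eight bytes");
    return 0;             // returning from vulnerable() may now jump to a garbage address and "execute data as code"
}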
UPDATE on the possible connection with 'fopen': The OP states that deleting the fopen code makes the problem go away. Unfortunately, this does not prove that the fopen code is the cause of the problem. For example:
The deleted code may include extra local variables, which may mean that the stack trampling hits the return address in one case ... and in the other case hits some word that is not going to be used.
The deleted code may cause the size of the code segment to change, causing some significant address to point somewhere else.
The problem is almost certainly that your application has done something that has "undefined behavior" per the C standard. Anything can happen, and the chances are that it won't make any sense.
Debugging this kind of problem can be really hard. You should probably start by running "lint" or the equivalent over your code and fixing all of the warnings. Next, you should probably use a good debugger and single step the application to try to find where it is jumping to the bad code/address. Then work back to figure out what caused it to happen.
Assuming that it's really the fopen() call that causes problems (it's hard to say without your source code), have you checked that the two character pointers you pass to the function actually point to correctly allocated memory?
Maybe they are not properly initialized?
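For illustration only (the original question posted no code, and the file name here is hypothetical), a call along these lines, with both arguments being valid strings and the return value checked, is the shape to aim for:
#include <stdio.h>

int main(void)
{
    const char *path = "output.txt";  // hypothetical file name, just for this example
    FILE *fp = fopen(path, "w");      // both arguments are valid, NUL-terminated strings
    if (fp == NULL) {                 // fopen returns NULL on failure, so always check
        perror("fopen");
        return 1;
    }
    fputs("hello\n", fp);
    fclose(fp);
    return 0;
}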
Hmmm... you did mention NTVDM, which sounds like an old 16-bit application that crashed inside an old command window with application compatibility set, somehow. As no code was posted, one can only hazard a guess that it's something to do with files (but fopen - how do you know that without showing a hint?). Perhaps a particular filename was longer than the conventional 8.3 DOS format and it borked when attempting to read it, or the 16-bit application is running inside a folder whose name is, again, longer than 8.3?
Hope this helps,
Best regards,
Tom.

Resources