What Can Cause a C Program to Crash Operating System - c

I recently found that a fairly large image manipulation program I'm writing in C on a Windows 8 machine has a bug when used in very particular circumstances. Unfortunately, the bug is causing my entire computer to come to a standstill so that my only option is to pull the plug on the computer (especially annoying when I'm working remotely...)
Because it's an image manipulation program, I can't just flood it with print statements to isolate the problematic section - the problem occurs somewhere in a loop that's called billions of times, so adding a printf slows it down to the point that it would take days to get to a failing iteration.
I understand, therefore, if this question is too broad, as it isn't really reasonable for me to put down all of the code that could cause my problem. I'm simply asking:
What are the circumstances in which C code can, instead of seg faulting or halting the program, actually freeze the entire OS?
When I search the problem, I see code golf questions like this
A C program which crashes the system(shuts down the system)
This is not what I'm asking - obviously I haven't written system("shutdown") anywhere in my loop.
Being most familiar with Python and Java, this problem is not what I'm used to, but in my experience:
Dividing by zero produces a seg fault
Accessing memory by accident that is slightly outside an intended array causes a seg fault (sometimes down the road a little)
Accessing protected memory causes the program to hang
Stack overflow causes a seg fault
Dereferencing a non-initialized pointer causes a seg fault
Is this impression false - could those cases cause the whole system to crash? What cases am I missing? Is it dependent on my version of gcc, or my permission status?
I haven't been able to try to reproduce it on a different operating system yet, as it requires a few dependencies to run the entire program.
If my only option is to sit for days waiting for the program to run with print statements, or avoid weird situations, then, of course, so be it. I'm looking for key places to look for the bug.

On modern systems with hardware-enforced privilege separation between user-mode and kernel-mode, and an operating system that functions to correctly configure these mechanisms, you simply cannot crash the system from a user mode process.
Any of those errors is trapped by the CPU, which calls exception handlers in the OS, which will quickly pull the plug on your process (not on the whole system).
If I had to guess, a piece of hardware is overheating or malfunctioning:
Overheating CPU due to poor thermal conductivity with heatsink
Failing / under-sized power supply
Failing DIMMs
Failing hard drive
Failing CPU
Failing / overheating GPU
I've seen cryptocoin-mining software bring a system to its knees because it was pushing the limits of the GPU. When the card would lock up or reset, the driver would get confused or lock up, and the system would end up needing to be rebooted.
Your system is doing next to nothing when you're just sitting there browsing the web, etc. But when you start running a CPU-intensive application, it can bring out problems that you didn't know were there, and the system locks up.
While this is a little out-of-place on Stack Overflow, it falls into one of those grey areas between hardware and software. I would stress-test your system, keeping an eye on CPU/GPU/memory temperatures, and power supply voltages. Check out MemTest86, Stresslinux.

The most trivial cause of OS freezing is "memory full". If you have processes that use a lot of memory, then your system is going to swap from main memory (typically RAM) to secondary memory (typically disk), which leads to a very large overhead. As a user, what you usually observe is an almost frozen computer, sometimes so frozen that you think it has crashed. If your OS is badly designed, then it sometimes does crash!

Related

Making mistakes in C in a safer manner?

Firstly, I'm a novice.
Secondly, I once ran a C program that tries to modify an OOB array element. I did it in my browser and the website handled the segmentation fault without damaging the environment.
But I'm afraid to do the same thing on my laptop as it might break stuff or corrupt pieces of data.
So, how can I make these kinds of grave mistakes in C without damaging my PC or crying afterwards?
Any kind of solution will be considered: VM or whatever.
Lastly, thanks.
Do not worry! A modern OS protects you! When a segmentation fault happens, it is a signal originating from the memory management unit, saying: the memory you are trying to read/write/execute is not yours. It will interrupt your program before anything can happen. If you look deeper into how memory works, you will see that nothing can happen to other programs, because the memory pages of other processes are not mapped into your process. But there is a scary thing: if you write to memory of your own process where you are allowed to write but did not intend to, weird things will happen. But don't worry, usually they won't do anything, or they are even detected at run time (stack smashing).
Also, if you are using Windows 95 or some other weird thing, you have to be careful, because some old OSes allow you to write to your text segments (the loaded code).
An exception is the so called kernel space. There anything can happen.
EDIT:
I am not saying it is ok to make your programs crash in a segmentation fault in production. I want to say that it is not harmful for your machine.

Can bad C code cause a Blue Screen of Death?

I am a new coder in C, recently moved over from Python, but still like to challenge myself with fairly ambitious projects (like a chess program), and have found that my computer suffers an unusual number of BSODs, both when I am running a program and not (admittedly, attempting to use the entirety of my memory as a hash table may not have been the greatest idea).
So my question is, are these most likely caused by my crappy C code, or is it more likely that my 3 year old, overworked laptop is the culprit?
If it could be the code, what are the big things I should avoid doing so as to prevent this?
BSOD usually contains some information as to what caused it.
What information it contains, and how exactly it is displayed depends on the version of Windows you are running.
As can be seen from the list here:
https://hetmanrecovery.com/recovery_news/bsod-errors
Most BSOD errors come from device / driver / kernel code, and not from your typical userland program.
That said, it might be possible to trigger BSOD if your code uses particularly low level windows API, especially if you run it with administrator privileges.
Note, that simply filling up memory will result in allocations for your program failing, and possibly your program, but not the whole OS crashing.
Also, windows does place limits on how much an individual process can allocate.
One final note:
"3 year old laptop" does not provide enough information to tell anything about your hardware, since there are different tiers of laptops available, and some of the high end 3 year old ones will still perform better than a mid tier one bought yesterday.
As a troubleshooting measure, I would recommend backing up your data, making a clean install of your OS (aka "format the machine"), then making sure all your drivers are up to date.
You may also want to try hardware diagnostic tools, such as MemTest86, check SMART on your storage, etc.
It's not supposed to be possible for anything you do in an ordinary "user space" program to crash the whole computer. Something else must also be wrong. Here are some possibilities:
If you are making the computer do CPU- and RAM-intensive work for long periods, you may stress the hardware to the point where a marginally defective component fails. Usually it's either the RAM, the power supply, or the cooling fans at fault.
Make sure your power supply is rated for all of the kit you have, running simultaneously. Make sure you have enough airflow for the amount of heat you're generating; check for dust-clogged heatsinks and fans that aren't actually spinning. If you have more than one RAM stick, take one out at a time and see if that makes the problem disappear.
I'd like to tell you to get error-correcting RAM if you don't have it already, but for infuriating market differentiation reasons you'd have to replace the motherboard and CPU as well. It's still worth doing, in the long run, but it amounts to replacing the whole computer.
You may be tickling a bug in the OS or the drivers. The most probable culprit is the GPU driver, particularly if your program does anything graphical. Regrettably, all you can do about this is make sure you're fully patched up.

Has programming with C gotten easier with operating system security and execute disable?

I understand that in the past with C you could screw up pointers and memory allocation, and potentially accidentally corrupt other running programs or the operating system itself outside of your program, and crash the system. This would require a restart to pick up the pieces and continue with program development.
Have system security improvements stopped this from happening?
In the past with MSDOS and Windows 3.1/95/98/Me, and MacOS prior to version 10, (generally before preemptive multitasking became the norm for everything) system security generally did not exist. Programs had full control to write data anywhere at any time.
But now with more modern system design and process security, programs generally are blocked by system security from accidentally or intentionally damaging anything else.
The execute-disable feature of modern processors may also be helping with preventing accidentally jumping to a random memory location and running whatever is there as processor machine code.
So how badly can you screw up with modern programming with C without attempting to hack the operating system security?
Can you still manage to accidentally crash the whole system? I assume this is no longer possible. The kernel or other system security steps in and halts the action.
Can you corrupt your login environment and have to log out and back in again? I assume this too is prevented, as processes should not normally have access to other process memory space, even within your own login security environment.
In general it seems like programming in C may now be much easier than it was in the past just due to these system protections that are now used everywhere, to keep you from shooting yourself and the system in the foot.
In the realm of what you can accidentally do, it has certainly gotten easier than the MS-DOS days. I remember a bug that corrupted the in-memory disk cache. I was lucky to have anything left after that one.
Now, unless you're writing C code that runs inside the kernel, that's not possible to do anymore. Nor is crashing the OS in general unless you are actively trying to exploit a flaw.
The various other things, like the NX bit and other such attempts to erect a bit of a security fence around C programs, they do make your program crash faster and at a place closer to where the error really happened. But they aren't anywhere near the level of win that you got with simple separated address spaces. They are designed to make active attempts to exploit things much more difficult, and they're far better at this than they are at catching accidents or mistakes.
And corrupting the login environment as distinct from the OS as a whole is also generally not statistically likely. Though if your program is doing sophisticated file manipulation you could have a bug that messed up the user's files.
And, short of actually crashing the OS you can accidentally cause resource exhaustion. And while this can often be recovered from, it's very uncomfortable while it's going on. Your system slows to a crawl and may not even be able to launch processes. Linux has some protection against this. And if you put your development environment in a cgroup, you can prevent it completely.
Of course, anything that an active exploit could do, you could do by mistake. But, I'm talking about the statistical likelihood of doing these things by accident.
Probably the biggest improvement since separated address spaces is tools like Valgrind that monitor your program on-the-fly for out-of-bounds accesses or accesses to freed memory and the like.
MS-DOS and early Windows were rather weird for their use as general purpose computers with high-quality C development environments and also having such promiscuous memory. It's taken Windows a long time to outgrow that, and programming practices on the platform are still a little weird.
Short answer: yes.
On computers that use virtual memory and keep data and code separate, it is much harder to write catastrophic bugs as you'll get a hardware exception (that the OS translates into something softer). So you can't have bugs that run off to overwrite your own executable code, or have runaway code that starts executing random "op codes" from the data segment. The bug will still be there of course, and needs to be fixed. But it will be a whole lot less mysterious and not nearly as disastrous.
Computers that don't have these security features require more care and testing by the programmer.
Crashing the whole system is still quite possible, but mostly this is because of bugs in the OS and API. The occasional "blue screen of death" in Windows is still a thing. With some effort you could also lag down the whole computer by using 100% CPU or by allocating excessive amounts of heap memory, turning it next to useless.

Debug printf hanging kernel — possible causes and solutions?

There are numerous places in a kernel where adding a printf can make the kernel hang (especially during early boot). (As a side note, in my experience the same thing can happen with printk in the early stages of Linux boot.) There can be many reasons, but sometimes the reason is not obvious. You just put a printf in a function that cannot possibly be executed more than once during boot (or at most a few times, not hundreds of times). Could there still be timing issues?
What are the typical causes of kernel boot hanging by adding printf in code?
As for a solution, you could probably create your own data structure, push your "stack trace" there, and then print it at some later point. But finding the proper point to print it could be problematic. Is there any policy that can be applied in such problematic early stages of boot? Or are we doomed to do a detailed step-by-step analysis every time we encounter such a problem?

How to debug memory issues in embedded application

I'm new to embedded programming but I have to debug a quite complex application running on an embedded platform. I use GDB through a JTAG interface.
My program crashes at some point in an unexpected way. I suppose this happens due to some memory related issue. Does GDB allow me to inspect the memory after the system has crashed, thus being completely unresponsive?
It depends on your setup a bit. In particular, since you're using JTAG, you may be able to set your debugger up to halt the processor when it detects an exception (for example accessing protected memory illegally and so forth). If not, you can replace your exception handlers with infinite loops. Then you can manually unroll the exception to see what the processor was doing that caused the crash. Normally, you'll still have access to memory in that situation and you can either use GDB to look around directly, or just dump everything to a file so you can look around later.
It depends on what has crashed. If the system is only unresponsive (in some infinite loop, deadlock or similar), then it will normally respond to GDB and you will be able to see a backtrace (call stack), etc.
If the system/bus/CPU has actually crashed (at a lower level), then it probably will not respond. In this case you can try setting breakpoints at suspicious places/variables and observing what happens. A simulator (ISS, RTL, if applicable) could also come in handy, to compare its behavior with the hardware.

Resources