Segmentation Fault in a multi-threaded server (C)

I have been developing a multi-threaded server (using Pthreads) for a network for about 2 months now, under Linux (Ubuntu 11.04, 64-bit, kernel 2.6.38).
The code is about 7000 lines of C at the moment. I have been using it on the network, where multiple clients connect to it and get served, and it has been running quite smoothly.
Suddenly I am facing a strange problem. Every now and then (about 1 run in 10) the server crashes with a segmentation fault. I have looked all over the code but cannot find the actual cause. Can anyone suggest what may be going wrong, or what I should try in order to find the actual bug?

Enable core file generation. When the application crashes, load the core file into the debugger to see where it crashed.
Run your application under Valgrind with the memory checker (Memcheck).
Write unit tests. Lots of them, and increase coverage to 100%.
Stress test your application with Valgrind's Helgrind tool, which detects errors in multithreaded applications.
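A minimal sketch of the first point, assuming Linux: instead of relying on ulimit -c in whatever shell or init script starts the server, the server can raise its own core-file size limit at startup with setrlimit(). The function name enable_core_dumps is made up for illustration.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit (RLIMIT_CORE) up to
 * the hard limit so that a crash leaves a core file. An unprivileged process
 * may raise its soft limit only as far as the hard limit; if the hard limit is
 * 0, cores stay disabled and an administrator has to raise it.
 * Returns 0 on success, -1 on failure (errno set by getrlimit/setrlimit). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Call it early in main(). After a crash, gdb ./server core followed by thread apply all bt prints the stack of every thread, which matters in a Pthreads server because the thread that faults is not always the one that corrupted memory. Where the core file actually lands is governed by /proc/sys/kernel/core_pattern, so that is worth checking if no file appears.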

100% coverage isn't realistic, but 85%-95% is achievable with diligence.
On why weird errors like this happen:
http://stromberg.dnsalias.org/~strombrg/checking-early.html
You said this started happening suddenly. Hopefully you've been using a source code control system like Mercurial, Git or SVN. If you have (or perhaps you have nightly backups?), look back at the changes made around the time the problems started; the error is likely an undefined memory reference.

Related

How do you debug the bug that only appears when the load is huge?

We are currently developing cluster-manager software in C. If several nodes connect to the manager, it works perfectly, but if we use tools that simulate 1000 nodes connecting to the manager, it sometimes behaves unexpectedly.
How can one debug this kind of bug, which only appears when the load (connections/nodes) is large?
If I use gdb to step through the code, the app never malfunctions.
In general, you want to use at least these techniques:
Make sure the code compiles and links without warnings. -Wall is a good start, but adding -Wextra is better.
Make sure the application has designed-in logging and tracing that can be turned on or off, has low overhead, and records enough detail to debug these kinds of issues.
Make sure the code has good unit-test coverage.
Make sure the tests are sanitizer-clean.
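A minimal sketch of the designed-in tracing point, assuming a POSIX/glibc system. TRACE, the level names and g_trace_level are hypothetical names, not from any library. The level is compared before any formatting happens, so a disabled trace costs roughly one comparison, and each line carries a timestamp and thread id so interleavings between threads can be reconstructed afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical verbosity levels; pick whatever granularity suits the server. */
enum { TRACE_OFF = 0, TRACE_ERROR = 1, TRACE_INFO = 2, TRACE_DEBUG = 3 };

/* Intended to be set once at startup (e.g. from a command-line flag),
 * before worker threads are created. */
static int g_trace_level = TRACE_ERROR;

/* Writes one timestamped, thread-tagged line to stderr.
 * Returns the number of bytes written, or -1 on an output error. */
static int trace_write(int level, const char *fmt, ...)
{
    struct timespec ts;
    va_list ap;
    int head, body;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    flockfile(stderr); /* keep lines from different threads from interleaving */
    /* glibc-specific cast: pthread_t is an integer type there */
    head = fprintf(stderr, "%ld.%09ld L%d tid=%lu ", (long)ts.tv_sec,
                   ts.tv_nsec, level, (unsigned long)pthread_self());
    va_start(ap, fmt);
    body = vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    funlockfile(stderr);
    return (head < 0 || body < 0) ? -1 : head + body + 1;
}

/* The level test runs first; arguments are not formatted when tracing is off.
 * Evaluates to 0 when the message is suppressed. */
#define TRACE(level, ...) \
    ((level) <= g_trace_level ? trace_write((level), __VA_ARGS__) : 0)
```

Calls such as TRACE(TRACE_DEBUG, "fd %d: read %d bytes", fd, n) could then be placed along the request path and switched on only while reproducing the load problem. Writing to stderr still changes timing somewhat, so a heavily traced run may hide the very bug it is chasing.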
There's also no warning in the valgrind check.
It's not clear whether you've simply run the target application under Valgrind, or whether you also have unit tests and the tests are Valgrind-clean. It's also not clear whether you've observed the application misbehaving under Valgrind or not.
Valgrind used to be the best tool available for heap and uninitialized-memory problems, but as of 2017 this is no longer the case.
Compiler-based Address, Thread and Memory sanitizers catch a significantly wider class of errors (e.g. global and stack overflows, and data races), and you should run your unit tests under all of them.
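To illustrate the kind of bug meant here, a sketch (all names hypothetical) of a classic data race: several threads incrementing a shared counter without a lock, next to the mutex-protected fix. A single run or an ordinary unit test will not reliably show the lost updates, but a ThreadSanitizer build or Helgrind reports the race directly.

```c
#include <pthread.h>

#define NTHREADS   4
#define ITERATIONS 100000

static long g_counter; /* shared by all workers */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Intentionally buggy: unsynchronized read-modify-write on g_counter.
 * ThreadSanitizer and Helgrind both report this as a data race. */
static void *racy_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        g_counter++;
    return NULL;
}

/* Fixed: the mutex serializes the increments. */
static void *locked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&g_lock);
        g_counter++;
        pthread_mutex_unlock(&g_lock);
    }
    return NULL;
}

/* Runs NTHREADS copies of one worker and returns the final count,
 * or -1 if not every thread could be started (started ones are still joined). */
long run_workers(int use_lock)
{
    pthread_t tids[NTHREADS];
    int started = 0;
    void *(*worker)(void *) = use_lock ? locked_worker : racy_worker;

    g_counter = 0;
    while (started < NTHREADS &&
           pthread_create(&tids[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == NTHREADS ? g_counter : -1;
}
```

Built with gcc -g -fsanitize=thread -pthread, a call to run_workers(0) should make ThreadSanitizer print a data race report pointing at g_counter++ in racy_worker, and running a normal build under valgrind --tool=helgrind should flag the same access. run_workers(1) is expected to run clean and return NTHREADS * ITERATIONS.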
When all of the above still fails to find the problem, you may be able to run the real application instrumented with sanitizers.
Lastly, there are tools like GDB tracing and SystemTap; they are harder to learn, but give you significant power. Overview here.
Sadly, the debugger is less useful for debugging concurrency/load issues.
Keep adding logs/printfs, trigger the issue with load testing, then try to narrow it down with more logs/printfs. Repeat.
The faster it is to trigger the bug, the faster this will converge. Also prefer the classic "bisection" / "binary search" technique when adding logs: try to cut the area you're looking at by at least half every time.
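Because printf output changes timing (which may also be why stepping through in gdb makes the problem disappear), one alternative is an in-memory, per-thread ring of recent checkpoints that costs a few stores per call. This is a sketch of that alternative, not something from the answer above; the names are hypothetical and it assumes a C11 compiler for _Thread_local. The CHECKPOINT calls can be moved around with the same bisection approach.

```c
#include <stddef.h>

#define RING_SIZE 1024 /* must be a power of two */

/* One recorded checkpoint: where it was hit and one value of interest
 * (e.g. a connection id or buffer length). */
struct checkpoint {
    const char *file;
    const char *func;
    int line;
    long value;
};

/* One ring per thread, so recording needs no locking and cannot race. */
static _Thread_local struct checkpoint t_ring[RING_SIZE];
static _Thread_local unsigned long long t_count;

static void checkpoint_record(const char *file, const char *func, int line,
                              long value)
{
    struct checkpoint *c = &t_ring[t_count & (RING_SIZE - 1)];

    c->file = file;
    c->func = func;
    c->line = line;
    c->value = value;
    t_count++;
}

/* No I/O, no locks, no formatting: cheap enough for hot paths. */
#define CHECKPOINT(v) checkpoint_record(__FILE__, __func__, __LINE__, (long)(v))

/* Most recent checkpoint of the calling thread, or NULL if none yet. */
const struct checkpoint *checkpoint_last(void)
{
    if (t_count == 0)
        return NULL;
    return &t_ring[(t_count - 1) & (RING_SIZE - 1)];
}
```

With a core file, selecting the crashed thread in gdb and printing t_count and t_ring should usually show the last checkpoints that thread passed. How well gdb reads thread-local variables from a core depends on the toolchain, so it is worth verifying on your own setup first.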

CLion uses system memory excessively

I recently started to use CLion, on Windows 7 64-bit, for editing C files.
One thing that bothers me a lot is that it uses too much system memory. It doesn't cause an out-of-memory error like the one asked about in another question. In fact, CLion's own indicator shows much lower memory consumption (~500 MB out of ~2000 MB) than what it takes from the system (~1000 MB). You can see a snapshot of the system memory usage and CLion's memory display below:
I use CLion for C projects, not C++. My project isn't that big (~5 .c files of under 300 lines each and ~10 .h files). I don't use CLion to compile the project, only to edit it. During the snapshot, no user program was being run by it, and CLion wasn't showing any running processes (indexing etc.). This is its general behaviour.
I'm not sure if what I experience is something expected/normal, or it is caused because of my system setup, project settings or the way I use the IDE.
Are there any known causes of excessive memory usage? Can you suggest practices to decrease memory usage?
The post is 2 years old, but I am also having this issue with CLion 2018.1, and I imagine, others do, too. Some tips that worked for me:
Excluding directories from indexing.
Deleting source files I don't need.
Resolving a circular dependency between two classes. (Note: I can't vouch that it was exactly that, because I tried several things at once, and it seems odd that such a powerful IDE would be affected by such an issue, but I can't rule it out.)
If it's really bad, indexing can be paused. This is guaranteed to reduce memory usage, but of course intelligent completion won't work then.
Currently the RAM usage is stable at ~1 GB with RocksDB, RapidJSON, and ~50 classes.
UPDATE: tweaking clion64.exe.vmoptions reduced the consumption radically.
Same issue here. CLion was just sitting there unused (I leave it open so I don't have to reopen it) with 2 projects and a few files open, nothing major, and it was still eating up 3+ GB. That is not something I can accept, so I'm switching back to Sublime, which works fine. As others have mentioned, I use it only for editing/refactoring; compilation happens in the terminal.
(PyCharm has similar issues)
CLion needs to index and keep all information about the system headers to provide smart completion, auto-import and symbol resolution. Your project is the smallest part of the code base it analyzes.
I have heard that version 2020.3 brings an option to switch off refreshing files.
https://intellij-support.jetbrains.com/hc/en-us/community/posts/360007093580-How-to-disable-refreshing-files-after-build
Unfortunately I cannot try it out in my professional development environment.

Profiling a code running as a service in linux

Maybe this question is noobish, but I am not well versed in the Unix environment and profiling.
I want to profile server code written in C that runs on Ubuntu as a service (I start it with the service command). Once started, it listens for requests and then performs some operations.
I am not able to understand how exactly to do the profiling using tools like gprof, valgrind and sprof.
I have tried all three but was not able to generate any output.
I tried valgrind, but it just runs and doesn't wait for an actual request to come in.
I used gprof and sprof, but no files are being generated.
I looked at several examples on SO and other sites, but they talk about sample code that builds an executable which is then run directly.
I really need some help now.
Thanks
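One likely reason no gmon.out appears: the data for gprof is written only when the program exits normally (via exit() or by returning from main), and it goes into the process's current working directory. A service stopped through the service command normally receives SIGTERM, whose default action kills the process without running exit handlers, and daemons often chdir to / where they may not be allowed to write. A minimal sketch, assuming a POSIX system and a main loop you control (install_term_handler and stop_requested are hypothetical names), that turns SIGTERM into an orderly exit:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the signal handler, polled by the main loop. */
static volatile sig_atomic_t g_stop_requested;

static void on_term(int sig)
{
    (void)sig;
    g_stop_requested = 1; /* only async-signal-safe work in a handler */
}

/* Route SIGTERM and SIGINT to a flag instead of letting them kill the
 * process outright. SA_RESTART is deliberately not set, so a blocking call
 * like accept() in the receiving thread returns -1 with errno == EINTR.
 * Returns 0 on success, -1 on failure. */
int install_term_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) != 0 || sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    return 0;
}

int stop_requested(void)
{
    return g_stop_requested != 0;
}

/* Intended use in the service's main loop (handle_next_request is hypothetical):
 *
 *     install_term_handler();
 *     while (!stop_requested())
 *         handle_next_request();
 *     exit(0);   normal exit lets the -pg runtime write gmon.out
 */
```

With this in place, building with gcc -pg, starting the service, sending it some requests and then stopping it should leave a gmon.out for gprof ./server gmon.out, provided the working directory is writable.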

How to track down exceptional bugs in application when released?

When an application has a serious segmentation-fault issue that is hard to find or track, I can use a debug version, generate a core dump file when the issue happens, and debug the app with that core dump.
But how do you track down exceptional bugs in an application once it is released? There seems to be no core dump file in the release version. Logging is an option, but it is useless when a hard-to-track bug happens.
So my question is: how can I track down these hard-to-track bugs in a release version? Are there any suggestions or technologies available?
The following references may help the discussion.
[1] Core dump in Linux
[2] generate a core dump in linux
[3] Solaris Core dump analysis
You can compile a release version with gcc -g -O2 ...
The lack of core dump is related to your user's setting of resource limits (unless the application is explicitly calling setrlimit or is setuid; then you should offer a way to avoid that call). You might teach your users how to get core dumps (with the appropriate bash ulimit builtin).
(and it is also possible to keep the debugging information in a separate file outside the executable, for example with objcopy --only-keep-debug)
Distributions provide -dbg packages that provide debugging symbols for programs. They are built along with the binary packages and let your users generate meaningful backtraces from core dumps. If you build your packages with the same utilities, you can get these -dbg packages for your own software "nearly free".
I suggest using a crash-reporting system. In my experience, we use Google's Breakpad project for our Windows client program; of course, you can also write your own.
Google Breakpad is an open-source, multi-platform crash-reporting system. It can write a mini or full memory dump when an exception or crash happens, and you can configure it to upload the dump file and any additional files to a specific FTP or HTTP server. This is very helpful for finding bugs.
Here is the link:
Google Breakpad
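If a full crash-reporting system is more than you need, here is a minimal do-it-yourself sketch for Linux/glibc: a handler for fatal signals that writes the faulting thread's backtrace to stderr (which services usually redirect to a log) and then re-raises the signal so the normal core dump still happens. It uses glibc's backtrace() and backtrace_symbols_fd() and does far less than Breakpad (no minidump, no upload); install_crash_handler is a hypothetical name.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Prints the stack of the thread that received the fatal signal, then re-raises it.
 * backtrace() is not formally async-signal-safe; calling it once in
 * install_crash_handler() loads libgcc up front so it need not allocate here. */
static void crash_handler(int sig)
{
    static const char msg[] = "fatal signal received, backtrace follows:\n";
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)ignored;
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    /* SA_RESETHAND already restored the default action; the re-raised signal
     * is delivered when this handler returns, so the process still terminates
     * and the kernel can write a core dump. */
    raise(sig);
}

/* Installs crash_handler for the usual fatal signals. Returns 0 on success, -1 on error. */
int install_crash_handler(void)
{
    static const int fatal[] = { SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGABRT };
    void *warmup[1];
    struct sigaction sa;

    backtrace(warmup, 1); /* force libgcc to load now, outside any signal handler */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND; /* one-shot: default action restored on entry */
    for (size_t i = 0; i < sizeof fatal / sizeof fatal[0]; i++)
        if (sigaction(fatal[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

Call install_crash_handler() early in main(). Linking with -rdynamic lets backtrace_symbols_fd() print names of non-static functions; otherwise the printed offsets can be mapped back with addr2line against a copy of the binary that still has debug information. A crash caused by stack overflow will not reach this handler unless it runs on an alternate stack (sigaltstack() with SA_ONSTACK), which this sketch omits.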
Ask the "customer" for a description of what he or she did to make it crash, and try to replicate it yourself with your own version that has debug information.
The hard part is getting correct information from the customer. Often they will say they did nothing special or nothing different from before. If possible, go see the person having the problem, ask them to do what they do to make the program crash, and write down every step.

Tools/techniques for diagnosing C app crash on Windows

I have written an application in C, which runs as a Windows service. Most users can run the app without any problems, but a significant minority experience crashes caused by an Access Violation, so I know I have a bug somewhere. I have tried setting up virtual machines to mirror the users' configurations as closely as possible, but cannot reproduce the issue.
My background is in Java: when a Java app crashes, it produces a stack trace showing exactly where the problem occurred, but native applications aren't so helpful. What techniques do C developers normally use to track down this type of problem? I have no physical access to the users' machines that experience the crash, but I could send them additional tools to install to capture information. I also have Windows error reports showing Exception Code/Offset etc., but these don't mean much to me. I compiled my application with gcc; are there compiler options I can use to generate more information in the event of a crash?
You could try asking the users to run ProcDump to capture a core dump when the program crashes. Unlike something like Visual Studio, it's a single, simple command-line utility, so there should be no problem getting the users to run it.
On most modern operating systems your app can install a crash handler that'll walk the stack(s) in the event of a crash. I have no experience doing this on Windows, but this article walks through how to do it.

Resources