First, apologies for my bad English.
Sorry for this newbie software question, but I got lost in my own logic...
A bit of background:
I am working on a C networking project in which I am building a server that receives a gradually increasing volume of UDP messages over time. I am trying to add a simple "manager" to this server that can send a report to a specific address when the server crashes.
The approach that comes to mind is to make this manager a listener on the server side: if the server does not receive any message on the predefined port, I assume the server has failed.
But this is not really a deterministic approach. How long should the timeout be before I decide the server has crashed? (If no message is received on the port for 5 minutes, does that mean it has crashed? Not necessarily. I could increase it to 10 minutes, but again, that is unjustifiable and inconsistent.)
I am wondering how an app like gdb does this. If the server (framework) crashes, a core dump file is generated automatically. I need to do something similar, so that when the framework crashes, it is as easy as printing "hello crash". How can I create a "manager" on the server that gives me a report if the server crashes (using C)?
Any ideas would be greatly appreciated.
Thank you so much.
The exit status of a process tells you whether a signal caused it to exit. You can write a C program that uses wait() to get the status, or do it in a shell script (the shell reports death by signal N as 128 + N):
#!/bin/sh
# Run the server, passing along all of this script's arguments.
./server "$@"
EXIT=$?
if [ $EXIT -eq 0 ]
then
    echo exit success
else
    if [ $EXIT -ge 128 ]
    then
        echo exited with signal $(($EXIT - 128))
    else
        echo exited with code $EXIT
    fi
fi
You could choose to restart the server for the failure case or the signal case.
Most servers rely on careful debugging and do not expect to automatically catch and restart when they crash.
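For the C variant mentioned above, here is a minimal sketch of a supervisor built on fork(), execvp() and waitpid(). The helper name run_and_report and its return-value convention are made up for illustration; they are not from any library.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with the given arguments, waits for it,
 * and writes a one-line report into `report`.
 * Returns 0 on clean exit, 1 on a non-zero exit code, 2 if the child was
 * killed by a signal, and -1 if it could not be started or waited for. */
int run_and_report(char *const argv[], char *report, size_t len)
{
    pid_t pid = fork();
    if (pid < 0) {
        snprintf(report, len, "fork failed");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        snprintf(report, len, "waitpid failed");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        /* e.g. SIGSEGV or SIGABRT: this is the "crash" case */
        snprintf(report, len, "crashed with signal %d", WTERMSIG(status));
        return 2;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
        snprintf(report, len, "exited with code %d", WEXITSTATUS(status));
        return 1;
    }
    snprintf(report, len, "exit success");
    return 0;
}
```

A manager could call this with something like {"./server", NULL} and, when the result is 2, send the report text to whatever address it likes. No timeout is involved: waitpid() returns as soon as the server process is gone.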
For a C application accessed via CGI-BIN, documentation online for accessing the process and breaking in GDB relies on manipulating the source code (i.e. adding an infinite loop), in order for the process to be available long enough for a developer to attach, exit the loop, and debug.
Is it feasible that a tool could monitor the process list, and attach via GDB, immediately breaking in order for a developer to achieve this without requiring source code changes?
The rough structure of what I have in mind to develop is something along the lines of:
1. My process monitors the process list on the system.
2. A process matching the name of my application, and owner Apache appears in the list.
3. My process immediately runs 'pgrep' and 'gdb -p', then sends a breakpoint command to pause the process.
4. The developer can then access the process and look at the flow of execution.
Is this feasible as an idea, or is it not possible due to some constraint (e.g. a race condition that cannot always be won)?
Is this feasible
Sure: a trivial shell script will do:
while true; do
    PID=$(pgrep my_app)
    if [[ -n "$PID" ]]; then
        gdb -p "$PID"
    fi
done
a race condition
The problem is that between pgrep and gdb -p the application may make significant progress, or even run to completion.
The only way to avoid that is to intercept all execve system calls on the system, as Tom Tromey's preattach.stp does.
I made a simple client-server program in C using sockets, and now I want to test it by simulating many clients connecting to the server at the same time. I wrote a script to execute the client (./client) 20 times, but it didn't work for me since it waited for each client to finish.
I also wrote another program in C, this time with threads, so that each thread could execute a client with system("./client") and then be detached, but again I had the same problem.
So what is the correct way to implement this?
The easiest solution is to keep your shell script, but put an & after the ./client call, which puts it in the background and runs the next command immediately.
Here's a really simple way to launch a number of clients without waiting for each to complete:
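#!/bin/bash
# Start 20 clients in the background without waiting for each one.
for i in $(seq 1 20)
do
    ./client &
done
# Block until every background client has finished.
wait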
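If you would rather stay in C than use a shell script, the same idea can be sketched with fork() and exec: start every client first, and only wait afterwards. The helper name launch_clients is hypothetical, and the sketch assumes the client takes no arguments.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts `count` copies of `path` at once, then waits
 * for all of them. Returns how many of them exited with status 0. */
int launch_clients(const char *path, int count)
{
    int started = 0;
    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                        /* could not start more clients */
        if (pid == 0) {
            execlp(path, path, (char *)NULL);
            _exit(127);                   /* only reached if exec failed */
        }
        started++;                        /* parent: do not wait yet */
    }

    int succeeded = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            succeeded++;
    }
    return succeeded;
}
```

Calling launch_clients("./client", 20) would have all 20 clients running concurrently. The earlier attempts waited for each client because system() does not return until the command it ran has finished.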
I've got what I think is a strange one here. I have the following environment.
A Linux compiled binary which sets up a signal handler to disable things like Ctrl+C, Ctrl+Z, etc. This is done by calling signal() on SIGINT, SIGTSTP and SIGQUIT. The signal handler simply prints an error message saying that the user is not allowed to abort the program.
After setting up the signal handler, the binary calls an interactive ash script.
This interactive ash script ALSO disables all methods of breaking out of the script. It does this with "trap '' INT TSTP" at the very beginning. This works and if one enters Ctrl+C, etc it simply echoes the control character to the terminal but does not exit.
Individually both the binary and ash script prevent user from exiting.
However, notice what happens below:
If control returns to the binary through normal completion of the interactive shell script, then pressing Ctrl+C in the binary is handled as intended and does not let the user break out of the program. This is the proper behavior.
Where it goes wrong is:
If I type a few Ctrl+C's while the interactive shell script is running, then once control returns to the binary, the exit code it sees is different from the one the shell script actually exited with.
Here is an example:
In C code, let's say I have:
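#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

void sigintHandler(int sig_num)
{
    (void)sig_num;  /* unused */
    fprintf(stderr, "You are not allowed to exit this program.\n");
}

int main(void)
{
    signal(SIGINT, sigintHandler);
    /* system() returns a wait status, not the bare exit code. */
    int ret = system("/etc/scripts/test.sh");
    printf("test.sh returned: %d exit status.\n", ret);
    return 0;
}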
And in test.sh I have:
#!/bin/ash
# Disable interrupts so that one cannot exit shell script.
trap '' INT TSTP
echo -n "Do you want to create abc file? (y/n): "
read answer
if [ "$answer" = "y" ]; then
    touch /tmp/abc
fi
if [ -f /tmp/abc ]; then
    echo "Returning 1"
    exit 1
else
    echo "Returning 2"
    exit 2
fi
If I run the C binary normally I get the correct exit status (1 or 2) depending on whether file exists. Actually I get 256 or 512 which indicates it is storing the exit code in the 2nd byte. Point is this works consistently every time.
But now suppose I hit Ctrl+C while the shell script is running (before answering the question presented) and then answer "n", which corresponds to exit code 2. In the C binary, the code I get back is sometimes 2 (not 512, indicating the exit code is now in the LOWER byte), but MORE often I get back a code of 0! This happens even though I see the message "Returning 2" echoed by the shell script.
This is driving me nuts trying to figure out why a simple exit code is being messed up.
Can anyone provide some suggestions?
Thanks much
Allen
I found the issue.
Previously I was using trap '' INT TSTP to disable interrupts in the shell script. Although this prevents the shell script from being aborted, it led to the issue in this post. I suspect that when the ability to abort the shell script is disabled this way, the upper level shell framework is not aware of it; all it knows is that Ctrl+C (or whatever) was pressed, so it returns SIGINT as the exit code regardless of what the shell script itself exited with.
The solution is to use:
stty -isig
at the beginning of the shell script.
This not only disables interrupts but ALSO lets the upper level framework know that this is what you've done so that it ignores the fact that Ctrl+C was pressed.
I found this information on the following page:
https://unix.stackexchange.com/questions/80975/preventing-propagation-of-sigint-to-parent-process
Thanks everyone,
Allen
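On the 256 and 512 values from the question: system() returns a wait status rather than the bare exit code, and the macros in <sys/wait.h> are the portable way to take it apart. Here is a small sketch; the helper name describe_status is made up for illustration.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: turns the raw value returned by system() or wait()
 * into a readable description. Returns the exit code (0-255) for a normal
 * exit, or -1 for anything else. */
int describe_status(int status, char *out, size_t len)
{
    if (status == -1) {
        snprintf(out, len, "command could not be run");
        return -1;
    }
    if (WIFEXITED(status)) {
        snprintf(out, len, "exit code %d", WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        snprintf(out, len, "killed by signal %d", WTERMSIG(status));
        return -1;
    }
    snprintf(out, len, "unrecognized status %d", status);
    return -1;
}
```

With the usual Linux encoding, a raw value of 512 decodes to "exit code 2", while a raw value of 2 decodes to "killed by signal 2", and signal 2 is SIGINT. So printing the decoded status rather than the raw integer makes it easier to see which of the two cases the binary actually got back.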
So I have this old, nasty piece of C code that I inherited on this project from a software engineer that has moved on to greener pastures. The good news is... IT RUNS! Even better news is that it appears to be bug free.
The problem is that it was designed to run on a server with a set of start up parameters input on the command line. Now, there is a NEW requirement that this server is reconfigurable (didn't see that one coming...). Basically, if the server receives a command over UDP, it either starts this program, stops it, or restarts it with new start up parameters passed in via the UDP port.
Basically the code that I'm considering using to run the obfuscated program is something like this (sorry I don't have the actual source in front of me, it's 12:48AM and I can't sleep, so I hope the pseudo-code below will suffice):
// my "bad_process_manager"
int manage_process_of_doom() {
    while (true) {
        if (socket_has_received_data) {
            int return_val = ParsePacket(packet_buffer);
            // if statement ordering is just for demonstration, the real one isn't as ugly...
            if (packet indicates shutdown) {
                system("killall bad_process"); // process name is totally unique so I'm good?
            } else if (packet indicates restart) {
                system("killall bad_process"); // stop old configuration
                // start with new parameters that were from UDP packet...
                system("./my_bad_process -a new_param1 -b new_param2 &");
            } else { // just start
                system("./my_bad_process -a new_param1 -b new_param2 &");
            }
        }
    }
}
So as a result of the system() calls that I have to make, I'm wondering if there's a neater way of doing so without all the system() calls. I want to make sure that I've exhausted all possible options without having to crack open the C file. I'm afraid that actually manipulating all these values on the fly would result in having to rewrite the whole file I've inherited since it was never designed to be configurable while the program is running.
Also, in terms of starting the process, am I correct to assume that throwing the "&" in the system() call will return immediately, just like I would get control of the terminal back if I ran that line from the command line? Finally, is there a way to ensure that stderr (and maybe even stdout) gets printed to the same terminal screen that the "manager" is running on?
Thanks in advance for your help.
What you need from the server:
Ideally your server process that you're controlling should be creating some sort of PID file. Also ideally, this server process should hold an exclusive lock on the PID file as long as it is still running. This allows us to know if the PID file is still valid or the server has died.
Receive shutdown message:
Try to get a lock on the PID file. If that succeeds, you have nothing to kill: the server has died, and if you proceed to the kill regardless, you may kill the wrong process. Just remove the old PID file.
If the lock fails, the server is still running: read the PID file, do a kill() on that PID, and remove the old PID file.
Receive start message:
You'll need to fork() a new process, then choose your flavor of exec() to start the new server process. The server itself should of course recreate its PID file and take a lock on it.
Receive restart message:
Same as Shutdown followed by Start.
I am working on an application where I need to detect a system shutdown.
However, I have not found any reliable way to get a notification of this event.
I know that on shutdown, my app will receive a SIGTERM signal followed by a SIGKILL. Is there any way to query whether a SIGTERM is part of a shutdown sequence?
Does anyone know if there is a way to query that programmatically (C API)?
As far as I know, the system does not provide any other method to query for an impending shutdown. If it does, that would solve my problem as well. I have been trying out runlevels as well, but changes in runlevel seem to be instantaneous and without any prior warning.
Maybe a little late, but yes: you can determine whether a SIGTERM is part of a shutdown by invoking the runlevel command when the signal arrives. Example:
#!/bin/bash
trap "runlevel >$HOME/run-level; exit 1" term
read line
echo "Input: $line"
Save it as, say, term.sh and run it. If you then execute killall term.sh, you should be able to inspect the run-level file in your home directory. Next, run the script again and execute any of the following:
sudo reboot
sudo halt -p
sudo shutdown -P
and compare the contents of the file. That should give you an idea of how to do it.
There is no way to determine if a SIGTERM is part of a shutdown sequence. To detect a shutdown sequence you can either use rc.d scripts, as ereOn and Eric Sepanson suggested, or use mechanisms like DBus.
However, from a design point of view it makes no sense to ignore SIGTERM even if it is not part of a shutdown. SIGTERM's primary purpose is to politely ask apps to exit cleanly and it is not likely that someone with enough privileges will issue a SIGTERM if he/she does not want the app to exit.
From man shutdown:
If the time argument is used, 5 minutes before the system goes down
the /etc/nologin file is created to ensure that further logins shall
not be allowed.
So you can test for the existence of /etc/nologin. It is not optimal, but it is probably the best you can get.
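Checking for that file from C is a one-liner with access(). The helper names below are made up, and the path is the one quoted from the man page; other systems may place the file elsewhere.

```c
#include <unistd.h>

/* Returns 1 if `path` exists, 0 otherwise. */
int file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Hypothetical helper: treats the presence of /etc/nologin as a hint that
 * a timed shutdown is under way. It is only a hint. */
int shutdown_probably_pending(void)
{
    return file_exists("/etc/nologin");
}
```

Keep in mind that an administrator can also create /etc/nologin by hand to block logins, so a positive result does not prove that a shutdown is coming.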
It's a bit of a hack, but if the server is running systemd you can run
/bin/systemctl list-jobs shutdown.target
... it will report ...
JOB UNIT            TYPE  STATE
755 shutdown.target start waiting    <---- existence means shutting down
1 jobs listed.
... if the server is shutting down or rebooting (hint: there's a reboot.target if you want to look specifically for that).
You will get "No jobs running." if it's not being shut down.
You have to parse the output, which is a bit messy, as systemctl doesn't return a different exit code for the two results. But it does seem reasonably reliable. You will need to watch out for a format change in the messages if you update the system, however.
Making your application respond differently to some SIGTERM signals than to others seems opaque and potentially confusing. It's arguable that you should always respond the same way to a given signal. Adding unusual conditions makes it harder to understand and test application behavior.
Adding an rc script that handles shutdown (by sending a special signal) is a completely standard way to handle such a problem; if this script is installed as part of a standard package (make install or rpm/deb packaging) there should be no worries about control of user machines.
I think I got it.
Source =
https://github.com/mozilla-b2g/busybox/blob/master/miscutils/runlevel.c
I copy part of the code here, just in case the reference disappears.
#include "libbb.h"
...
struct utmp *ut;
char prev;
if (argv[1]) utmpname(argv[1]);
setutent();
while ((ut = getutent()) != NULL) {
    if (ut->ut_type == RUN_LVL) {
        prev = ut->ut_pid / 256;
        if (prev == 0) prev = 'N';
        printf("Runlevel: prev=%c current=%c\n", prev, ut->ut_pid % 256);
        endutent();
        return 0;
    }
}
puts("unknown");
See man systemctl; you can determine whether the system is shutting down like this:
if [ "`systemctl is-system-running`" = "stopping" ]; then
    # Do what you need
fi
This is in bash, but you can do the same from C with system().
The practical answer for what you originally wanted is to check for the shutdown process (e.g. ps aux | grep "shutdown -h") and then, if you want to be sure, check its command-line arguments and the time it was started (e.g. "shutdown -h +240" started at 14:51 will shut down at 18:51).
In the general case, from the point of view of the entire system, there is no way to do this. There are many different ways a "shutdown" can happen. For example, someone can decide to pull the plug in order to hard-stop a program that they know has bad/dangerous behaviour at shutdown time, or a UPS could first send a SIGHUP and then simply fail. Since such a shutdown can happen suddenly and without warning anywhere in the system, there is no way to be sure that it's okay to keep running after a SIGHUP.
If a process receives SIGHUP, you should basically assume that something nastier will follow soon. If you want to do something special and partially ignore SIGHUP, then a) you need to coordinate that with whatever program will do the shutdown, and b) you need to make sure that your software and data will survive if something else does the shutdown and kills you soon after a SIGHUP. Write out any data you have, and only continue writing to append-only files with safe atomic updates.
For your case I'm almost sure your current solution (treat all SIGHUPs as a shutdown) is the correct way to go. If you want to improve things, you should probably add a feature to the shutdown program which does a notify via DBUS or something similar.
When the system shuts down, the rc.d scripts are called.
Maybe you can add a script there that sends some special signal to your program.
However, I doubt you can stop the system shutdown that way.