I'm using unix system() calls to gunzip and gzip files. With very large files sometimes (i.e. on the cluster compute node) these get aborted, while other times (i.e. on the login nodes) they go through. Is there some soft limit on the time a system call may take? What else could it be?
The calling thread should block indefinitely until the task you initiated with system() completes. If what you are observing is that the call returns and the file operation has not completed, it is an indication that the spawned operation failed for some reason.
What does the return value indicate?
Almost certainly not a problem with use of system(), but with the operation you're performing. Always check the return value, but even more so, you'll want to see the output of the command you're calling. For non-interactive use, it's often best to write stdout and stderr to log files. One way to do this is to write a wrapper script that checks for the underlying command, logs the commandline, redirects stdout and stderr (and closes stdin if you want to be careful), then execs the commandline. Run this via system() rather than the OS command directly.
My bet is that the failing machines have limited disk space, or are missing either the target file or the actual gzip/gunzip commands.
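To make "always check the return value" concrete, here is a minimal sketch of decoding what system() returns. The helper name run_and_report is made up for illustration; the macros come from <sys/wait.h>. If the command turns out to have been killed by a signal, that would point at something outside gzip itself, though which cause applies on the compute nodes is only a guess until the status is logged.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command and report how it ended.
 * Returns the command's exit code (0-255), or -1 if it could not be
 * started or was killed by a signal. */
int run_and_report(const char *cmd)
{
    int status = system(cmd);

    if (status == -1) {
        perror("system");              /* the child could not be created or waited for */
        return -1;
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "'%s' was killed by signal %d\n", cmd, WTERMSIG(status));
        return -1;
    }
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        if (code != 0)                 /* 127 usually means the shell could not run the command */
            fprintf(stderr, "'%s' exited with code %d\n", cmd, code);
        return code;
    }
    return -1;
}
```

Called as run_and_report("gunzip bigfile.gz") (file name assumed), this at least distinguishes "gunzip reported an error" from "gunzip was killed" from "the command was never started".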
I'm using unix system() calls to gunzip and gzip files.
Probably silly question: why not use zlib directly from your application?
And system() isn't a system call. It is a wrapper for fork()/exec()/wait(). Check the system() man page. If it doesn't unblock, it might be that your application interferes somehow with wait() - e.g. do you have a SIGCHLD handler?
If it's a Linux system I would recommend using strace to see what's going on and which syscall blocks.
You can even attach strace to already running processes:
# strace -p $PID
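For reference, this is roughly what system() does under the hood, minus the shell and the signal juggling. It is a sketch, not the libc implementation; spawn_and_wait is an illustrative name. Doing it yourself gives you direct access to the errors from fork() and waitpid(), which helps when something in the application interferes with waiting for children.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments and wait for it.
 * Returns the raw wait status (decode with WIFEXITED and friends),
 * or -1 if the child could not be created or waited for. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);         /* only returns on failure */
        perror("execvp");
        _exit(127);                    /* same convention the shell uses */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {          /* retry if a signal handler interrupted us */
            perror("waitpid");         /* e.g. ECHILD when SIGCHLD is being ignored */
            return -1;
        }
    }
    return status;
}
```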
Sounds like I'm running into the same intermittent issue indicating a timeout of some kind. My script runs every day. I'm starting to believe GZIP has a timeout.
gzip -vd filename.txt.gz 2>> tmp/errorcatch.txt 1>> logfile.log
stderr: Error for filename.txt.gz
Moves to next command 'cp filename* new/directory/', resulting in zipped version of filename in new directory
stdout from earlier gzip showing successful unzip of SAME file:
filename.txt.gz: 95.7% -- replaced with filename.txt
Successful out file from gzip is not there in source or new directory.
Following alerts, manual run of 'gzip -vd filename.txt.gz' never fails.
Details:
Only one call in script to unzip that file
Call for unzip is inside a function (for more robust logging and alerting)
Unable to strace in production
Unable to replicate locally
In occurrences over the last month, found no consistency among file size, only
I'll simply be working around it with a retry logic and general scripting improvements, but I want the next google-er to know they're not crazy. This is happening to other people!
I have submitted jobs that have been running for five days, but because of a bug I introduced, all the work could be lost. I make a 'system' call to compress the data file and then remove the original uncompressed file, which can be as big as 4G. So I have this in the C code
strcpy(command,"data"); // I should have added a forward slash here: "data/"
sprintf(command,"%scompress -c -i %s -o %s",command,name,out_name);
system(command);
remove(name); /////This is the problem
The bug is in the sprintf line, in which what I wanted to do was to call a program in data/compress, but due to the missing '/' the system command fails. And thus the data produced is not compressed AND then immediately the original file is DELETED leaving me with nothing! If it was compressed it would have been OK.
There are currently five running jobs in such a state. I need to divert this behavior somehow so that I don't lose five days work. I am thinking to create a fake script named 'datacompress' in the current directory to change the behavior of the running program. Can I do this or are there better options, if at all?
You can make datacompress a symbolic link to data/compress. Oops, this won't work unless the process's $PATH includes the current directory (.).
Another option: remove the user's write permission to the directory containing name. This will cause the remove() function to fail.
If your system has Access Control Lists, remove the process's delete permission on the uncompressed file.
While you're trying to come up with a solution, you can suspend the process with:
kill -STOP <pid>
Create hard links (not symbolic links) to the data files:
ln datafile datafile.bkp
When the program removes the original datafile, the file's contents will remain under the .bkp filename.
And then fix the program to check error status of important things like the compress command.
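A minimal sketch of what that fix could look like, assuming the same "-c -i name -o out_name" command line as in the question. The function name compress_then_remove is made up, and the command is passed to the shell unquoted, so file names containing spaces or shell metacharacters would need extra care.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: compress `name` into `out_name` with `tool` and
 * delete the original only if the tool reported success.
 * Returns 0 on success, -1 if anything went wrong (original is kept). */
int compress_then_remove(const char *tool, const char *name, const char *out_name)
{
    char command[1024];

    /* snprintf into a separate buffer: no overlap between source and
     * destination, and truncation is detected instead of overflowing. */
    int n = snprintf(command, sizeof command, "%s -c -i %s -o %s",
                     tool, name, out_name);
    if (n < 0 || (size_t)n >= sizeof command)
        return -1;

    int status = system(command);
    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        fprintf(stderr, "compression failed, keeping %s\n", name);
        return -1;
    }

    if (remove(name) != 0) {
        perror("remove");
        return -1;
    }
    return 0;
}
```

With this shape, a misspelled tool path makes the shell exit with a non-zero status and the uncompressed file survives.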
Here I have one command which is like interactive mode:
obex_test -b $BD_ADDR $CH_NUM
This command is from a script but I want to run this command through a system call in a C program.
obex_test is nothing but obex file transfer library.
Here I want to receive a file from remote device to local device through bluetooth.
This is the manual page of obex_test
Please can anybody tell me how can I put my C program in interactive mode like this command, and I want to use this command also.
I used popen(command,"r") but it's not useful; it does not take input from the user.
If I used "w" mode then I don't know what happens; I directly get a message like >Unknown Command. It's the error this command gives when we give different options. So it's taken something as a write mode.
You could have two pairs of pipes (created with the pipe(2) system call); one for data from your program to obex_test's stdin and one from obex_test's stdout to your program. Then you would fork and execve... Beware of deadlocks (your program blocked on writing to obex_test stdin when its output pipe is full and blocking it), you might need to call poll(2) or select(2)...
However, as its man page explains, "obex_test is a test application for the libopenobex library". So why not call the functions in the libopenobex library directly, linking it into your program?
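The two-pipe arrangement can be sketched as follows. This is only an outline under simplifying assumptions: run_with_pipes is an illustrative name, the whole input is written before any output is read (so it is safe only for inputs smaller than the pipe buffer), and a child that exits early could raise SIGPIPE in the parent. A real interactive session with obex_test would interleave reads and writes using poll(2) or select(2).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0], send `input` to its stdin and
 * collect its stdout in `out` (NUL-terminated).
 * Returns the number of bytes read, or -1 on error. */
ssize_t run_with_pipes(char *const argv[], const char *input, char *out, size_t outsz)
{
    int to_child[2], from_child[2];

    if (outsz == 0 || pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wire the pipe ends to stdin/stdout, then exec. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: close the ends that belong to the child. */
    close(to_child[0]);
    close(from_child[1]);

    size_t len = strlen(input);
    ssize_t written = write(to_child[1], input, len);
    close(to_child[1]);                 /* the child now sees end-of-file */

    size_t total = 0;
    ssize_t n;
    while (total + 1 < outsz &&
           (n = read(from_child[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(from_child[0]);

    int status;
    waitpid(pid, &status, 0);           /* reap the child */
    return (written == (ssize_t)len) ? (ssize_t)total : -1;
}
```

As a stand-in for obex_test, running it with the argument vector {"tr", "a-z", "A-Z", NULL} and the input "connect\n" fills the buffer with the upper-cased text.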
You can use the system command. Check the manual page for more details.
For example: system( "obex_test -b 172.16.7.1 1234" );
I'm trying to create a TAR archive from my program and then opening the archive for further processing. I have a 2 second delay between calling system() and open(). So far, it works fine, but I'm not sure why the 2 second delay is necessary or if it's the correct solution.
Without the delay, I get error code 2 (ENOENT "No such file or directory") from the open() call. My first thought was the filesystem wasn't updating itself fast enough and open() couldn't find the file. But what if the system is really busy? Do I need a longer delay? Should I loop until open() succeeds instead of the delay? Is the problem something completely different?
UPDATE
The root filesystem is EXT2. /tmp is mounted in RAM using TMPFS. I'm using tar to create an archive, not extract contents of one. Essentially, my program is supposed to create an archive of some log files and send them over the network (that's why I open the archive after creating it).
int return_value = system("/bin/tar -czf /tmp/logs.tar.gz /var/log/mylogs.log* &> /dev/null");
// error checks on return_value as described here: http://linux.die.net/man/2/wait
if(return_value != 0) {
return return_value;
}
//usleep(2000000);
return_value = open("/tmp/logs.tar.gz", O_RDONLY | O_LARGEFILE, 0);
// success or failure depending on whether there's a delay or not
You could even avoid running an external tar command by using libtar directly in your program.
ADDED
And you should show us your program. I'm pretty sure that if the call to system just extracted some file thru tar, it is available just after a successful system call, e.g. something like:
int err = system("/bin/tar xf /tmp/foo.tar bar");
int fd = -1;
if (err == 0)
fd = open("bar", O_RDONLY);
// fd is available
There is no reason to wait a few seconds in this code. You are probably doing something more complex, or you forgot to test the result of system().
You think you are redirecting tar's output with "&>", but actually you are running it in the background, because system() happens to invoke a shell that doesn't support &> and so interprets it as "&" followed by ">". The delay causes your program to wait long enough that tar completes.
The fix is to modify your command to use syntax that your shell supports. Throwing away the error output from tar is probably a mistake in any case.
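One possible shape for the corrected call, as a sketch only: it uses just "> file" and "2>> file", which any POSIX sh understands, discards tar's normal output and appends its error output to a log. The function name make_archive and the error-log parameter are assumptions for illustration, and tar is looked up via $PATH here rather than at /bin/tar.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: create a gzip-compressed tar archive of `source`
 * and wait for tar to finish. stdout is discarded, stderr is appended
 * to `errlog`. Returns 0 if tar exited successfully, -1 otherwise. */
int make_archive(const char *archive, const char *source, const char *errlog)
{
    char command[1024];
    int n = snprintf(command, sizeof command,
                     "tar -czf %s %s > /dev/null 2>> %s",
                     archive, source, errlog);
    if (n < 0 || (size_t)n >= sizeof command)
        return -1;

    /* No '&' anywhere in the command, so system() returns only after
     * tar has finished writing the archive. */
    int status = system(command);
    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

When this returns 0 the archive can be opened straight away, with no sleep in between.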
Here's what I would try:
fork/exec tar yourself, and have your parent collect the tar-child. If system is introducing a race condition with the file system, taking control of the child process creating/reaping may help.
touch an empty file (fopen for writing, then close) and then tar into the new file.
Give tar the --verify option; the file has to exist in order to be verified :)
I have written a program that calculates the battery level remaining on my laptop. I have also defined a threshold value in the program. Whenever the battery level falls below the threshold, I would like to call another process. I used system("./invoke.o"), where invoke.o is the program that I have to run. I am running a script which runs the battery level checker program every 5 seconds. Everything works fine, but when I close the bash shell the automatic invocation of invoke.o stops happening. How can I make invoke.o be invoked regardless of whether bash is closed? I am using Ubuntu Linux.
Try running it as: nohup ./myscript.sh, where the nohup command allows you to close the shell without terminating the process.
You could run your script as a cron job. This lets cron set up standard input and output for you, reschedule the job, and it will send you email if it fails.
The alternative is to run a script in the background with all input and output, including standard error output, redirected.
While you could make a proper daemon out of your program that kind of effort is probably not necessary.
man nohup
man upstart
man 2 setsid (more complex, leads to longer trail of breadcrumbs on daemon launching).
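If you do follow the setsid trail, the core of it is small. The sketch below is a bare-bones outline, not a complete daemon: spawn_detached is an illustrative name, and it does not change directory, reset the umask or write a pid file. It forks, starts a new session so the program no longer belongs to the terminal's session, forks again, points stdin/stdout/stderr at /dev/null and execs the program.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] in its own session, detached from
 * the controlling terminal. Returns 0 once the launch has been handed
 * off, -1 on error. The program itself keeps running independently. */
int spawn_detached(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid > 0) {
        /* Parent: the first child exits quickly, so reap it. */
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    }

    /* First child: become the leader of a new session. */
    if (setsid() < 0)
        _exit(1);

    pid_t pid2 = fork();
    if (pid2 < 0)
        _exit(1);
    if (pid2 > 0)
        _exit(0);                       /* hand off to the grandchild */

    /* Grandchild: drop the terminal's file descriptors, then exec. */
    int fd = open("/dev/null", O_RDWR);
    if (fd >= 0) {
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO)
            close(fd);
    }
    execvp(argv[0], argv);
    _exit(127);
}
```

For the simple case in the question, nohup or cron gets the same effect with far less code.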
I have a Windows batch script (my.bat) which has the following line:
DTBookMonitor.exe 2>&1 > log\cmdProcessLog.txt
So, from my understanding, this runs DTBookMonitor, redirects STDERR to STDOUT and then redirects STDOUT to the file log\cmdProcessLog.txt.
I then run my.bat. DTBookMonitor runs for a significant amount of time, and when I run my.bat a second time (while it is already running), it immediately exits from the second instance of my.bat.
Is this purely because of the redirection to cmdProcessLog?
Better late than never :)
Windows redirection locks the output file so that no other process can open the file for writing at the same time. That is why the second instance fails when it tries to redirect output to the same file.
I'd guess it's either due to that, or because DTBookMonitor only allows one instance of it to run at a time. The following test should shed some light on the situation:
Run the first (long) instance of DTBookMonitor
Run a second instance without redirecting any of its output
Alternatively, run a second instance, but redirect the output to a file other than log\cmdProcessLog.txt
Do you get similar results? Different results?