I'm not an expert in C and I'm looking for some advice to make my program more robust and reliable. Just to give some context: I've written a program that does some scientific computation that takes quite a long time (about 20h), which I'm running on a large university HPC Linux cluster that uses the SLURM scheduling system and NFS-mounted file systems. What seems to happen is that at some point during the 20h the connection to the file system goes stale (on the entire machine, independent of my program), so the first attempt to open and write a file takes a really long time and ends in a segfault (core dumped) error that I have so far not been able to track down precisely.
Below is a minimal file that at least conceptually reproduces the error: the program starts, opens a file, and everything works. It then does some long computation (simulated by sleep()), tries to open and write to the same file again, and fails.
What are some conventions to make my code more robust, so that it reliably writes my results to file without crashing?
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char **argv) {
// Declare variables
FILE *outfile;
char outname[150] = "result.csv";
// Open file for writing
printf("CHECKING if output file '%s' is writable?", outname);
outfile=fopen(outname, "w");
if (outfile == NULL) {
perror("Failed: ");
exit(EXIT_FAILURE);
}
fclose(outfile);
printf(" PASSED.\n");
// Do some computation that takes really long (around 19h)
sleep(3);
// Open file again and Write results
printf("Writing results to %s ...", outname);
outfile=fopen(outname, "w");
if (outfile == NULL) {
perror("Failed writing in tabulate_vector_new: ");
exit(EXIT_FAILURE);
}
fprintf( outfile, "This is the important result.\n");
fclose(outfile);
printf(" DONE.\n");
return 0;
}
It seems odd that your program would segfault due to an NFS issue. I would expect it to hang indefinitely, not crash. That having been said, I would suggest forking a new process to check whether the NFS mount is working. That way, your important code won't be directly involved in testing the problematic file system. Something like the following approach may be useful:
// needs <unistd.h> and <sys/wait.h>
pid_t pid = fork();
if (pid == -1)
{
    // error, failed to fork(). should probably give up now. something is really wrong.
}
else if (pid > 0)
{
    // if the child exits, it has successfully interacted with the NFS file system
    wait(NULL);
    // proceed with attempting to write important data
}
else
{
    // we are the child; exec df in order to test the NFS file system
    execlp("df", "df", "/mnt", (char *)NULL);
    // on success the child has been replaced by df, which will try to statfs(2) /mnt for us
    _exit(127); // only reached if execlp() itself failed
}
The general concept here is that we utilize the df command to check whether the NFS file system (which I assume is at /mnt) is working. If it's temporarily not working, df should hang until it starts working again, and then exit, returning control to your program. If you suspect df might hang forever, you could enhance my example by using alarm(2) to wait a certain period of time, probably at least a few minutes, after which you could retry running df. Note that this could result in zombie df processes sticking around.
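The retry-with-a-deadline idea can be sketched as follows. This is only a sketch under assumptions: run_with_timeout is a hypothetical helper name, and it polls with waitpid(WNOHANG) instead of using alarm(2), which keeps signal handlers out of the picture. A child stuck on a stale NFS mount may not react even to SIGKILL until the mount recovers, so the helper does not block waiting for it after the deadline (this is where the leftover df processes mentioned above come from).

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its arguments and wait at most
 * timeout_s seconds. Returns the exit status (0-255), -1 on fork/wait error
 * or abnormal termination, or -2 if the command was still running at the
 * deadline. */
int run_with_timeout(char *const argv[], unsigned timeout_s)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if execvp() failed */
    }

    const struct timespec tick = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    unsigned long ticks_left = (unsigned long)timeout_s * 10;
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);   /* poll, never block */
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r == -1)
            return -1;
        if (ticks_left-- == 0)
            break;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: try to kill the child, but do not wait for it. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, WNOHANG);
    return -2;
}
```

A caller could build `char *cmd[] = { "df", "/mnt", NULL };` and repeat `run_with_timeout(cmd, 120)` with a pause between attempts until it returns 0, and only then start writing the results. The 120-second limit and the /mnt path are placeholders.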
In the end, the correct solution is to try to get a more reliable NFS server, but until you can do that, I hope this is helpful.
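For the writing step itself, one common convention is to check every stdio return value and to write to a temporary file that is renamed into place only once everything has succeeded. The sketch below assumes a hypothetical helper name (save_result) and a ".tmp" suffix chosen purely for illustration; it is not taken from the question's program.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write text to path via a temporary file, so that a
 * failure never leaves a half-written result behind. Returns 0 on success,
 * -1 on error. */
int save_result(const char *path, const char *text)
{
    char tmp[4096];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;                       /* path too long for the buffer */

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;

    int ok = fputs(text, fp) != EOF      /* buffered write */
          && fflush(fp) == 0             /* push the stdio buffer to the kernel */
          && fsync(fileno(fp)) == 0;     /* ask the kernel to commit the data */
    if (fclose(fp) != 0)                 /* close can report late write errors */
        ok = 0;

    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

Because the function reports failure instead of exiting, the caller can sleep and retry a few times, or fall back to a different directory, before giving up on 20 hours of results.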
Related
In Linux, I am finding the pid of a process by opening a pipe with the "pidof process_name" command and then reading its output using the fgets function. But it fails to find the pid once in a while. Below is my code for finding the pid of my process.
int FindPidByProcessName(char *pName)
{
int pid = -1;
char line[30] = { 0 };
char buf[64] = { 0 };
sprintf(buf, "pidof %s", pName);
//pipe stream to process
FILE *cmd = popen(buf, "r");
if (NULL != cmd)
{
//get line from pipe stream
fgets(line, 30, cmd);
//close pipe
pclose(cmd); cmd = NULL;
//convert string to unsigned LONG integer
pid = strtoul(line, NULL, 10);
}
return pid;
}
In the output, pid=0 sometimes comes back even though the process is listed in the "ps" command output.
So I tried to find the root cause of this issue, and I found that something like the input/output buffering mechanism may be causing the problem in my scenario.
So I tried calling the sync() function before popen(), and strangely my function started working with 100% accuracy.
But the sync() function takes too long to complete (sometimes approximately 2 min), which is not desirable. So I tried fflush(), fsync() and fdatasync() instead, but none of these work properly.
So can anyone please tell me what the exact root cause of this issue was, and how to solve it properly?
OK, the root cause of the error is stored in the errno variable (which, by the way, you do not need to initialize). You can get an informative message using the function
perror("Error: ");
If you use perror, the errno variable is interpreted and you get a descriptive message.
Another way (the right way!) of finding the root cause is compiling your program with the -g flag and running the binary with gdb.
Edit: I strongly suggest using the gdb debugger so that you can see exactly which path your code follows and explain the strange behaviour you described.
Second Edit: errno stores the code of the most recent error. Instead of calling the functions as you do, you should check each return value and report errno immediately:
if ((<function>) <0) {
perror("<function>: ");
exit(1);
}
This is my program:
#include <stdio.h>
int main() {
FILE *logh;
logh = fopen("/home/user1/data.txt", "a+");
if (logh == NULL)
{
printf("error creating file \n");
return -1;
}
// write some data to the log handle and check if it gets written..
int result = fprintf(logh, "this is some test data \n");
if (result > 0)
printf("write successful \n");
else
printf("couldn't write the data to filesystem \n");
while (1) {
};
fclose(logh);
return 0;
}
When I run this program, I see that the file gets created but it does not contain any data. What I understand is that the data is cached in memory before it is actually written to the filesystem, to avoid multiple I/Os and increase performance. I also know that I can call fsync/fdatasync inside the program to force a sync. But can I force the sync from outside without having to change the program?
I tried running the sync command from the Linux shell, but it does not make the data appear in the file. :(
Please help if anybody knows an alternative way to do this.
One useful piece of information: I researched this some more and finally found that, to remove internal buffering altogether, the FILE can be set to unbuffered mode (_IONBF) using int setvbuf(FILE *stream, char *buf, int mode, size_t size).
The I/O functions using FILE pointers cache the data to be written in an internal buffer within the program's memory until they decide to perform a system call to 'really' write it (for normal files, usually when the size of the cached data reaches BUFSIZ).
Until then, there is no way to force writing from outside the program.
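From inside the program, the stdio buffer can be emptied explicitly with fflush() after each record. A minimal sketch, assuming a hypothetical helper called log_line:

```c
#include <stdio.h>

/* Hypothetical helper: write one line to an open log stream and hand it to
 * the kernel right away. Returns 0 on success, -1 on error. */
int log_line(FILE *logh, const char *line)
{
    if (fputs(line, logh) == EOF)
        return -1;
    if (fflush(logh) != 0)      /* empties the stdio buffer via write(2) */
        return -1;
    return 0;
}
```

After log_line() returns 0, other processes can already see the data in the file even though the stream is still open. Calling setvbuf(logh, NULL, _IOLBF, 0) right after fopen() would be an alternative that flushes at every newline without an explicit fflush().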
The problem is that your program does not close the file because of your while statement. Remove these lines:
while (1) {
};
If the intent is to wait forever, then close the file with fclose before executing the while statement.
Is it alright for multiple processes to access (write to) the same file at the same time? Using the following code, it seems to work, but I have my doubts.
The use case in this instance is an executable that gets called every time an email is received and logs its output to a central file.
if (freopen(console_logfile, "a+", stdout) == NULL || freopen(error_logfile, "a+", stderr) == NULL) {
perror("freopen");
}
printf("Hello World!");
This is running on CentOS and compiled as C.
Using the C standard IO facility introduces a new layer of complexity; the file is modified solely via write(2)-family of system calls (or memory mappings, but that's not used in this case) -- the C standard IO wrappers may postpone writing to the file for a while and may not submit complete requests in one system call.
The write(2) call itself should behave well:
[...] If the file was
open(2)ed with O_APPEND, the file offset is first set to the
end of the file before writing. The adjustment of the file
offset and the write operation are performed as an atomic
step.
POSIX requires that a read(2) which can be proved to occur
after a write() has returned returns the new data. Note that
not all file systems are POSIX conforming.
Thus your underlying write(2) calls will behave properly.
For the higher-level C standard IO streams, you'll also need to take care of the buffering. The setvbuf(3) function can be used to request unbuffered output, line-buffered output, or block-buffered output. The default behavior changes from stream to stream -- if standard output and standard error are writing to the terminal, then they are line-buffered and unbuffered, respectively, by default. Otherwise, block-buffering is the default.
You might wish to manually select line-buffered if your data is naturally line-oriented, to prevent interleaved data. If your data is not line-oriented, you might wish to use un-buffered or leave it block-buffered but manually flush the data whenever you've accumulated a single "unit" of output.
If you are writing more than BUFSIZ bytes at a time, your writes might become interleaved. The setvbuf(3) function can help prevent the interleaving.
It might be premature to talk about performance, but line-buffering is going to be slower than block buffering. If you're logging near the speed of the disk, you might wish to take another approach entirely to ensure your writes aren't interleaved.
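One way to sidestep stdio buffering for a shared log is to build each record in memory and hand it to the kernel with a single write(2) on a descriptor opened with O_APPEND. The sketch below assumes a hypothetical helper name (append_record) and a local file system; the Linux open(2) man page warns that O_APPEND can still corrupt files on NFS when several processes append at once.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one complete record with a single write(2).
 * Returns 0 on success, -1 on error or short write. */
int append_record(const char *path, const char *record)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    size_t len = strlen(record);
    ssize_t n = write(fd, record, len);   /* offset is moved to EOF atomically */
    int ok = (n == (ssize_t)len);

    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Each process that handles an email would call something like append_record(logfile, line) once per finished log line, so a record is never split across two system calls by the stdio layer.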
This answer was incorrect. It does work:
So the race condition would be:
process 1 opens it for append, then
later process 2 opens it for append, then
later still 1 writes and closes, then
finally 2 writes and closes.
I'd be impressed if that 'worked' because it isn't clear to me what
working should mean. I assume 'working' means all of the bytes written
by the two processes are in the log file? I'd expect that they both
write starting at the same byte offset, so one will replace the other's
bytes. It will all be okay up to and including step 3, and only show up
as a problem at step 4. Seems like an easy test to write: open, getchar
... write, close.
Is it critical that they can have the file open simultaneously? A
more obvious solution if the write is quick, is to open exclusive.
For a quick check on your system, try:
/* write the first command line argument to a file called foo
* stackoverflow topic 9880935
*/
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int main (int argc, const char * argv[]) {
if (argc <2) {
fprintf(stderr, "Error: need some text to write to the file Foo\n");
exit(1);
}
FILE* fp = freopen("foo", "a+", stdout);
if (fp == NULL) {
perror("Error failed to open file\n");
exit(1);
}
fprintf(stderr, "Press a key to continue\n");
(void) getchar(); /* Yes, I really mean to ignore the character */
if (printf("%s\n", argv[1]) < 0) {
perror("Error failed to write to file: ");
exit(1);
}
fclose(fp);
return 0;
}
I'm very new to C so please bear with me. I have been struggling with this for a really long time and had a hard time narrowing down the cause of the error.
I noticed that when forking processes and writing to a file (only the original process writes to the file), a strange thing happens: the output is nearly multiplied by the number of forks. It is hard to explain, so I made a small test program that you can run to recreate the problem.
#include <stdio.h>
#include <stdlib.h>
void foo()
{
FILE* file = fopen("test", "w");
int i=3;
int pid;
while (i>0)
{
pid=fork();
if(pid==0)
{
printf("Child\n");
exit(0);
}
else if(pid > 0)
{
fputs("test\n", file);
i=i-1;
}
}
}
int main()
{
foo();
exit(EXIT_SUCCESS);
}
Compile and run it once the way it is and once with file=stdout. When writing to stdout the output is:
test
test
test
But when writing to the file the output is:
test
test
test
test
test
test
Also if you add indexing and change i to a larger number you can see some kind of a pattern, but that doesn't help me.
Frankly, I have no idea why this happens or how to fix it. But I am a total novice at C, so there might be a perfectly normal, logical explanation for all this =).
Thank you for all your time and answers.
stdout is usually unbuffered or line buffered; other files are typically block buffered. You need to fflush() them before fork(), or every child will flush its own copy of the buffer, leading to this multiplication.
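A sketch of that fix, reworking foo() from the question: the function name write_with_forks and its two parameters are introduced here for illustration only. Every stdio buffer is flushed before each fork(), so the child's copy of the buffer is empty when it calls exit().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Variant of foo() from the question (path and count are parameters here).
 * Returns 0 on success, -1 on error. */
int write_with_forks(const char *path, int count)
{
    FILE *file = fopen(path, "w");
    if (file == NULL)
        return -1;

    for (int i = 0; i < count; i++) {
        fflush(NULL);               /* empty every stdio buffer before forking */
        pid_t pid = fork();
        if (pid == -1) {
            fclose(file);
            return -1;
        }
        if (pid == 0)
            exit(0);                /* child: its copy of the buffer is empty */
        fputs("test\n", file);      /* parent only */
        waitpid(pid, NULL, 0);      /* reap the child */
    }
    return fclose(file) == 0 ? 0 : -1;
}
```

Calling write_with_forks("test", 3) should leave exactly three lines in the file. An alternative with the same effect is to have the child leave through _exit(), which skips the stdio flush entirely.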
I have written a C utility for Linux that checks the contents of /proc/net/dev once every second. I open the file using fopen("/proc/net/dev", "r") and then fclose() when I'm done.
Since I'm using a 'pseudo' file rather than a real one, does it matter if I open/close the file each time I read from it, or should I just open it when my app starts and keep it open the whole time? The utility is launched as a daemon process and so may run for a long time.
It shouldn't matter, no. However, there might be issues with caching/buffering, which would mean it's actually best (safest) to do as you do it, and re-open the file every time. Since you do it so seldom, there's no performance to be gained by not doing it, so I would recommend keeping your current solution.
What you want is unbuffered reading. Assuming you can't just switch to read() calls, open the device, and then set the stream to unbuffered mode. This has the additional advantage that there is no need to close the stream when you're done. Just rewind it, and start reading again.
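FILE *f = fopen("/proc/net/dev", "r");
if (f == NULL)
{
    perror("fopen");
    exit(EXIT_FAILURE);
}
setvbuf(f, NULL, _IONBF, 0); /* unbuffered; must come before any other operation on f */
while (running)
{
    rewind(f);
    /* ...do your reading... */
}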
The pseudo files in "/proc" are dangerous for daemons because if the kernel decides to drop them they just vanish leaving you with an invalid FILE * struct. That means that your strategy is the only correct one to treat a file in "/proc" (but no one is going to expect that "/proc/net/dev" is removed by the kernel during runtime).
In general (especially for files in "/proc/[PID]") one should open files in "/proc" before an operation and close them as soon as possible after the operation is done.
See this example code. It forks and reads the "/proc/[PID]/status" file of the child, once before the child has exited and once during the cleanup of the child.
#include <unistd.h>
#include <time.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
int main(int argc, char** argv){
pid_t child=fork();
if(child==0){
sleep(1);
} else {
char path[256],buffer[256]; int status,read_length;
sprintf(path,"/proc/%i/status",child);
//do a read while the child is alive
FILE *fd=fopen(path,"r");
if(fd!=0){
read_length=fread(buffer,1,255,fd);
printf("Read: %i\n",read_length);
fclose(fd);
}
//repeat it while the child is cleaned up
fd=fopen(path,"r");
wait(&status);
if(fd!=0){
read_length=fread(buffer,128,1,fd);
printf("Read: %i\n",read_length);
fclose(fd);
}
}
}
The result is as follows
f5:~/tmp # ./a.out
Read: 255
Read: 0
So you see, you could easily get an unexpected result from files in "/proc" if they get deleted by the kernel during your program's runtime.
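The open-late, close-early pattern recommended above could be wrapped in a small function that is called once per sample. This is a sketch with a hypothetical helper name (read_snapshot); it assumes size is at least 1 and treats any open or read failure as "the file is gone".

```c
#include <stdio.h>

/* Hypothetical helper: open path, read up to size-1 bytes, close again.
 * Returns the number of bytes read, or -1 if the file could not be opened
 * or read (for /proc entries, typically because the process is gone). */
long read_snapshot(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    size_t n = fread(buf, 1, size - 1, f);
    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;

    buf[n] = '\0';                  /* make the snapshot a C string */
    return (long)n;
}
```

A daemon would call read_snapshot("/proc/net/dev", buf, sizeof buf) once per second and skip the sample whenever the result is -1, so no FILE pointer outlives the kernel object behind it.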