Piping Concurrency and Files - c

I have a C program which takes a file as an argument, cleans up the file, and writes the cleansed data to a new temp file. It then accepts some stdin, cleans it up, and sends it to stdout.
I have a second program which performs operations on this temp file and on the stdin again.
./file_cleanse <file1.txt> | ./file_operation <temp.txt>
I get either no output or nonsensical output on stdout from ./file_operation, and I believe this is because it is reading from a file that is still being written to (or doesn't yet exist) at that point.
Is there any way to make ./file_operation wait until ./file_cleanse has returned a value in bash?

What about:
./file_cleanse <file1.txt> > /tmp/temporaryFile
./file_operation <temp.txt> < /tmp/temporaryFile

As I understand the question, file_operation needs to read the standard output of file_cleanse after processing the temporary file, but it should not process the temporary file until file_cleanse has written some data to its standard output (the standard input of file_operation).
If that's correct, then a simple synchronization is for file_operation to read a byte (or any convenient larger amount of data) from its standard input. Once that read succeeds, file_cleanse must have finished with the temporary file; file_operation can therefore process the temporary file, and then read the rest of its standard input and process that appropriately.
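For illustration, a minimal sketch of file_operation's side of that handshake (the temp-file name and the processing logic are placeholders, not part of the original programs):

#include <stdio.h>

/* Hypothetical stand-in for file_operation's real temp-file logic. */
static void process_temp_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return;
    /* ... operate on the cleansed temp file ... */
    fclose(fp);
}

int main(void)
{
    /* Block until file_cleanse writes at least one byte down the pipe;
       by then it has already finished writing the temp file. */
    int first = getchar();
    if (first == EOF)
        return 1;                  /* upstream produced no output at all */

    process_temp_file("temp.txt"); /* assumed temp-file name */

    /* Consume the byte already read, then the rest of standard input. */
    int c = first;
    do {
        putchar(c);                /* placeholder for the real stdin processing */
    } while ((c = getchar()) != EOF);

    return 0;
}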

Related

Can an already opened FILE handle reflect changes to the underlying file without re-opening it?

Assuming a plain text file, foo.txt, and two processes:
Process A, a shell script, overwrites the file in regular intervals
$ echo "example" > foo.txt
Process B, a C program, reads from the file in regular intervals
fopen("foo.txt", "r"); getline(buf, len, fp); fclose(fp);
In the C program, keeping the FILE* fp open after the initial fopen(), doing a rewind() and reading again does not seem to reflect the changes that have happened to the file in the meantime. Is the only way to see the updated contents by doing an fclose() and fopen() cycle, or is there a way to re-use the already opened FILE handle, yet reading the most recently written data?
For context, I'm simply trying to find the most efficient way of doing this.
On Unix/Linux, when you create a file with a name which already existed, the old file is not deleted or altered in any way. A new file is created and the directory is updated to point at the new file instead of the old one.
The old file will continue to exist as long as some directory entry points at it (Unix file systems allow the same file to be pointed to by multiple directories) or some program has an open file handle to the file, which is more relevant to your question.
As long as you don't close fp, it continues to refer to the original file, even if that file is no longer referenced by the filesystem. When you close fp, the file will get garbage collected automatically, and the next time you open foo.txt, you'll get a file descriptor for whatever file happens to have that name at that point in time.
In short, with the shell script you indicate, your C program must close and reopen the file in order to see the new contents.
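For example, a reader that always sees the latest contents could simply reopen on every poll (a minimal sketch; the file name and one-second interval are assumptions):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char line[4096];

    for (;;) {
        FILE *fp = fopen("foo.txt", "r");   /* reopen to pick up whatever file now has that name */
        if (fp) {
            if (fgets(line, sizeof line, fp))
                printf("latest: %s", line);
            fclose(fp);
        }
        sleep(1);                           /* poll once per second */
    }
}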
Theoretically, it would be possible for the shell script to overwrite the same file without deleting it, but (a) that's tricky to get right; (b) it's prone to race conditions; and (c) closing and reopening the file is not that time-consuming. But if you did that, you would see the changes. [Note 1]
In particular, it's common (and easy) to append to an existing file, and if you have a shell script which does that, you can keep the file descriptor open and see the changes. However, in that case you would normally have already read to the end of the file before the new data was appended, and the standard C library treats the feof() indicator as sticky; once it gets set, you will continue to get an EOF indication from new reads. If you suspect that some process will be writing more data to the file, you should reset the EOF indication with fseek(fp, 0, SEEK_CUR); before retrying the read.
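For the append-only case, a sketch of a reader that keeps the same FILE* open and resets the EOF indication between polls (the path and interval are assumptions):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("foo.txt", "r");
    if (!fp)
        return 1;

    char line[4096];
    for (;;) {
        while (fgets(line, sizeof line, fp))
            fputs(line, stdout);    /* process newly appended lines */
        /* The EOF indication is sticky; clear it before retrying the read. */
        fseek(fp, 0, SEEK_CUR);
        sleep(1);
    }
}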
Notes
As @amadan points out in a comment, there are race conditions with echo text > foo.txt as well, although the window is a bit shorter. But you can definitely avoid race conditions by using the idiom echo text > temporary_file; mv -f temporary_file foo.txt, because the rename operation is atomic. Of course, that would definitely require you to close and reopen the file. But it's a good idea, particularly if the contents being written are long or critical, or if new files are created frequently.
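The same atomic-replace idiom looks like this if the writer is a C program rather than a shell script (the file names are made up for the example):

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("foo.txt.tmp", "w");
    if (!fp)
        return 1;
    fputs("example\n", fp);
    fclose(fp);

    /* rename() atomically replaces foo.txt: readers see either the old
       contents or the new ones, never a half-written file. */
    if (rename("foo.txt.tmp", "foo.txt") != 0) {
        perror("rename");
        return 1;
    }
    return 0;
}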

read line of file in realtime in C

I know how to do basic file operations in C, but what I'd like to do, if it is possible, is somehow create a variable that represents a live, running file (for example, an access_log from Apache that updates every second). Then I want to be able to read the last line that is entered into the file as it happens, regardless of whether any other process currently has the file open or not.
I'm thinking of code like this:
int main(){
    FILE *live = fattachto("/path/to/apaches/access_log");
    long lastupdated = live->lastwrite();
    while(1){
        if(live->lastwrite() != lastupdated){ printf("File was just updated now\n"); }
        sleep(1);
    }
}
Yes, I did put a sleep in my code because I want to ensure my code doesn't overheat the CPU.
The code above won't execute as-is, because I'm still looking for the correct set of functions to produce the end result.
Any ideas?
Why don't you just set line buffering, or use a small read buffer size, once you've seeked to the end of the log file? Then simply do blocking reads on the lines from the file as they're added.
When a read returns, you can signal, set flags, or initiate processing on any interesting input.
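One way to sketch that approach (reads on a regular file return EOF rather than blocking, so this version clears the sticky EOF and sleeps briefly whenever no new data is available; the path and interval are assumptions, and a real version would also handle log rotation):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *live = fopen("/path/to/apaches/access_log", "r");
    if (!live)
        return 1;

    fseek(live, 0, SEEK_END);               /* start at the current end of the log */

    char line[8192];
    for (;;) {
        if (fgets(line, sizeof line, live)) {
            printf("new line: %s", line);   /* react to each appended line */
        } else {
            clearerr(live);                 /* clear the sticky EOF indicator */
            sleep(1);                       /* then wait for more data */
        }
    }
}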

File is not written on disk until program ends

I'm writing a file using C code on a Unix system. I open it, write a few lines, and close it. Then I call a shell script, say code B, in which this file is to be used, and then return to the main program. However, when code B tries to read the file, the file is empty.
I checked the file on the file system: its size is shown as 0 and no data is present in it. However, after killing the running C process, the file has data in it.
Here is the piece of code -
void writefile(){
    FILE *fp;
    fp = fopen("ABC.txt","w");
    fputs("Some lines...\n",fp);
    fclose(fp);
    system("code_B ABC.txt");
}
Please advise how I can read the file in the shell script without stopping the C process.
If there's some time between the fputs and fclose, add
fflush(fp);
This will cause the buffered contents to be written out to the disk file.
You can also call fsync() on the underlying file descriptor (fileno(fp)) before the fclose(), if you want to push the data all the way to the physical disk.
Take a look at this question:
Does Linux guarantee the contents of a file is flushed to disc after close()?
The kernel ensures that data which is written to a file can be read back afterwards from a different process, even if it is not physically written to the disc yet. So, in usual scenarios, there is no need to call fsync() - still, even with fsync(), the filesystem could decide to further delay physical writes.
One common problem is that the C library has not flushed its buffers yet, in which case you would need to call fflush() - however, you are calling fclose() before launching your sub process, and fclose() internally calls fflush().
Actually, since system() is using a shell to launch the command passed as parameter, you can use the following simple SSCCE to verify that it works:
#include <stdio.h>
#include <stdlib.h>   /* for system() */

void writefile(){
    FILE *fp;
    fp = fopen("ABC.txt","w");
    fputs("Some lines...\n",fp);
    fclose(fp);
    system("cat ABC.txt");
}

int main() {
    writefile();
    return 0;
}
Here, system() simply calls the cat command to print the file contents. The output is:
$ ./writefile
Some lines...

Read file to standard in for parser

I am trying to implement a shell program in a Linux environment. The part I am having trouble with is reading a setup_file inside the shell before it starts taking user input, to do things like set environment variables.
Currently the shell has a parser_results = parse() function which does a "getchar" and waits until the user types something into stdin, then does an execute(parser_result) which executes the command using the output of the parser.
What I want to do is to read the setup_file which has commands inside of it, have the parser read them in and give me the data structures I need. Then I can run execute.
My question is: how do I redirect the contents of the file to stdin? And how do I call the parser to parse this redirected input? I have been playing with dup and dup2 to no avail.
Short answer (to the question 'how do I redirect the contents of the file to stdin') is "You Don't".
You revise your input function to read from a given file stream instead of stdin. To read the setup file, you open it, pass that file pointer to your parsing code, and close it when the parsing code is done; then, when you're ready for user input, you call the parsing code with stdin instead of the file. That saves fiddling with stdin.
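A sketch of that structure, assuming the parser's entry point can be changed to accept a stream; parse() and execute() here are trivial placeholders for the shell's real functions:

#include <stdio.h>

/* Hypothetical: read one command line from the given stream. */
static int parse(FILE *in, char *cmd, size_t cmdlen)
{
    return fgets(cmd, (int)cmdlen, in) != NULL;
}

/* Hypothetical: run a parsed command. */
static void execute(const char *cmd)
{
    printf("would execute: %s", cmd);
}

int main(void)
{
    char cmd[1024];

    /* First pass: run the commands found in the setup file. */
    FILE *setup = fopen("setup_file", "r");
    if (setup) {
        while (parse(setup, cmd, sizeof cmd))
            execute(cmd);
        fclose(setup);
    }

    /* Then switch to interactive input by passing stdin instead. */
    while (parse(stdin, cmd, sizeof cmd))
        execute(cmd);

    return 0;
}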

popen vs. KornShell security

I am writing a C program using some external binaries to achieve a planned goal. I need to run one command which gives me an output, which in turn I need to process, then feed into another program as input. I am using popen, but wonder if that is the same as using a KornShell (ksh) temporary file instead.
For example:
touch myfile && chmod 700 myfile
cat myfile > /tmp/tempfile
process_file < /tmp/tempfile && rm /tmp/tempfile
Since that creates a temporary file which can be readable by root, would it be the same if one used popen in C, knowing that pipes are also files? Or is it safe to assume that the Operating System (OS) will not allow any other process to read your pipe?
You say "that creates a temporary file which can be readable by root", which implies that you are attempting to transfer the data in a way in which the root user cannot read it. That's impossible; in general, the root user has total control of the system, and can thus read any data that is on the system, whether it's in a temporary file or not. Even within a single process, the root user can read the memory of that process.
If you use popen(), there will not be an entry for the file on a filesystem; it creates a pipe, which acts like a file but doesn't actually write the data to disk. Instead, it just passes the data between two programs.
There will be a file descriptor for it; depending on the system, it may be easier or harder to intercept that data, but it will always be possible to do so. For instance, on Linux, you can just look in /proc/<pid>/fd/ to find all of the open file descriptors and manipulate them (read from or write to them).
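For comparison, a minimal sketch of the popen() route in C (the command names are placeholders): the producer's output is read through one pipe, processed, and fed to the consumer through another, without anything being written to the filesystem:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *in  = popen("producer_command", "r");   /* hypothetical first command */
    FILE *out = popen("process_file", "w");       /* hypothetical consumer */
    if (!in || !out)
        return EXIT_FAILURE;

    char buf[4096];
    while (fgets(buf, sizeof buf, in)) {
        /* ... process each line here before handing it on ... */
        fputs(buf, out);
    }

    pclose(in);
    pclose(out);
    return EXIT_SUCCESS;
}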
