Linux ANSI C simultaneous access to files and locking - c

I am writing a Linux ANSI C cgi-bin server program in which several processes access the same files simultaneously.
Is it possible to distinguish between a file that does not exist and a file that is locked?
I can't find the answer with Google.
I'd like to write a program which keeps trying to open a file for a few seconds if fd<0
(assuming that the file is locked for a while).
But if the file does not exist, fd is also <0, so the program will waste time waiting.
Suppose a few threads try to append to the same file with no locking.
One tries to add "AAAA", another "BBBB".
Can the resulting file look like "AABBAABB"?
Or will it always be AAAABBBB or BBBBAAAA?
Or is the result unpredictable?

I am assuming IEEE Std 1003.1-2001 applies, which may defer to the ISO C standard here...
If the open fails (fopen returns NULL, open returns -1), the system sets errno to an error code,
and you can check that code. If the file does not exist, the error is
ENOENT
A component of filename does not name an existing file or filename is an empty string.
For more reference visit:
http://pubs.opengroup.org/onlinepubs/009695399/functions/fopen.html
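A minimal sketch of that check, using open(2) and errno. The helper name open_with_retry and the one-second retry interval are assumptions for illustration, not something from the question. Note that locks on Linux are advisory, so open() normally does not fail just because another process holds a lock; the retry path below only covers other, possibly transient, errors.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` up to `max_tries` times.
 * Returns a file descriptor, or -1 with errno describing the last failure.
 * A missing file (ENOENT) is reported at once instead of being retried. */
int open_with_retry(const char *path, int flags, int max_tries)
{
    int i, fd, saved = EINVAL;  /* EINVAL is reported if max_tries <= 0 */

    for (i = 0; i < max_tries; i++) {
        fd = open(path, flags);
        if (fd >= 0)
            return fd;
        saved = errno;
        if (saved == ENOENT)
            break;              /* file does not exist: waiting will not help */
        if (i + 1 < max_tries)
            sleep(1);           /* assumed retry interval */
    }
    errno = saved;
    return -1;
}
```

A caller can then test errno == ENOENT after a -1 return to tell "no such file" apart from every other failure.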
For point 2: I have more than 100 processes logging data to a single file simultaneously, and I have never seen records merged (the file is always opened in append mode), i.e. it's always like AAAABBBB.
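A sketch of the append-mode pattern, assuming each record is written with a single write() call on a descriptor opened with O_APPEND. With O_APPEND, POSIX specifies that the file offset is moved to the end of the file before each write as one step, which is why whole records are not expected to interleave on a local filesystem. This does not necessarily hold on NFS, or when stdio buffering splits one record across several write() calls. The helper name append_record is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one complete record with a single write().
 * Returns 0 on success, -1 on failure or on a short write. */
int append_record(const char *path, const char *record)
{
    size_t len = strlen(record);
    ssize_t n;
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    n = write(fd, record, len);   /* one call per record */
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```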

Related

How to know from within a Fortran code if a file is opened by any program (Windows 7)

I have a Fortran code that updates a text file in real time by appending new real-time measurements at the very bottom of it. This text file is read (read-only) both by a fluid dynamics code (which runs continuously in real time) and by another executable built from MATLAB code (which performs plotting). I want to add a line in the Fortran code that says: update the text file ONLY IF it is not opened by any other program. I tried using INQUIRE:
do
   INQUIRE(FILE = filename, OPENED = ISopen)
   if (.not.ISopen) then
      ! ADD NEW MEASUREMENTS HERE
      exit
   endif
enddo
Before running this Fortran program I opened the file with TextPad. However, the variable ISopen is false. So I guess INQUIRE only tests whether the file is opened within the Fortran program itself. In fact, if I add at the beginning of the above snippet of code:
OPEN (33,FILE = filename)
then ISopen is true. I then created an executable from a fortran code containing only:
OPEN (33,FILE = filename)
pause
CLOSE(33)
and I ran it and left it in the paused state. I then ran the first code I posted above and ISopen was still false. Any idea how to test from within Fortran whether a file is open in any other program? My operating system is Windows 7.
thanks
In the end I solved it this way. I could not find a way to know if the file is opened, and even if there were a way, some other program could still open it between the time I check and the time I modify it. Therefore, I create a temporary copy of the file, modify this temporary copy, and then move it back, overwriting the original. The move is performed only if the file is not locked (i.e. no other program has the file open to read the data), so I keep trying to move it until it succeeds. I tested this in many situations and it works. The code is:
USE IFPORT
IMPLICIT NONE
character*256 :: DOScall
logical :: keepTRYING
integer :: resul
DOScall = 'copy D:\myfile.txt D:\myfile_TMP.txt' ! create temporary copy
resul = SYSTEM(DOScall)
open(15,file ='D:\myfile_TMP.txt',form = 'formatted')
!.... perform here some writing operations on myfile_TMP........
close(15)
! keep trying to move the copy over the original until the move succeeds
do
   resul = SYSTEM('move D:\myfile_TMP.txt D:\myfile.txt')
   if (resul==0) then
      exit
   else
      pause(10) ! wait before retrying
   endif
enddo
Note that this works perfectly fine for multiple programs reading and one program writing the file (as in my case, where I have 2 programs reading and only one writing). If multiple programs write the same file, I guess some other techniques have to be used.
I do not think it is possible to check if a file is opened by an external process. Some programming languages allow you to check whether the file is locked, but that is still far from telling you whether the file is open: both programs must acquire and release a system lock for it to really work. To my knowledge, standard Fortran does not have that feature directly, but you can use semaphores from C through the C interoperability features.
Most user applications (mostly editors), however, check before updating a file whether the content on disk has changed since they read their copy, and alert the user. They also do that when they lose and regain focus. If you restrict your goal to updating only if the content has not changed since you opened it, you can do the same, or simply open, add and close any time you want to add a new entry. A good editor will notify the user on the other side that the content has been changed by another process.
An alternative is to simulate a lock yourself and buffer the data in Fortran. By buffering I mean collecting some new data (say 100, 1000 or whatever number is convenient) and sending it to the file at once. For each update, you open, update and close the file.
Your lock can be made of two simple files (empty ones, for example), one created by the reader (MATLAB) and the other created by the writer (the Fortran program). Let's name them reading.ongoing for the reader and writing.ongoing for the writer.
On the Fortran side, you do the following any time you have collected enough data to write:
1. Check for the existence of reading.ongoing (using the INQUIRE statement); proceed only if it does not exist.
2. Create writing.ongoing.
3. Check for the existence of reading.ongoing again. If it exists, delete writing.ongoing and go back to step 1. If it does not exist, proceed.
4. Open the data file, write the data and close it.
5. Delete writing.ongoing.
On the MATLAB side, do the same thing, swapping the roles of reading.ongoing and writing.ongoing.
In an exceptional race condition both programs could be blocked because they are trying at the same time. In that case, you could modify step 1 on the MATLAB side to make it wait a few milliseconds before proceeding. This will get you on the road as long as neither program gets killed between step 1 and step 5.
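The writer side of this handshake can be sketched as follows. It is written in portable C rather than Fortran, purely as an illustration; the function names try_append and file_exists and their parameters are hypothetical. As in the description above, the existence check and the creation of the flag file are separate operations, so this is a cooperative protocol and not a real lock.

```c
#include <stdio.h>

/* Returns 1 if `path` can be opened for reading, else 0. */
static int file_exists(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Writer side of the handshake. Returns 1 if `text` was appended,
 * 0 if the reader is active (the caller should retry later),
 * -1 on an I/O error. */
int try_append(const char *data_path, const char *reader_flag,
               const char *writer_flag, const char *text)
{
    FILE *f;
    int ok;

    if (file_exists(reader_flag))        /* step 1: reader busy? */
        return 0;
    f = fopen(writer_flag, "w");         /* step 2: announce the write */
    if (f == NULL)
        return -1;
    fclose(f);
    if (file_exists(reader_flag)) {      /* step 3: check again, back off */
        remove(writer_flag);
        return 0;
    }
    f = fopen(data_path, "a");           /* step 4: open, write, close */
    ok = (f != NULL);
    if (f != NULL) {
        if (fputs(text, f) == EOF)
            ok = 0;
        if (fclose(f) != 0)
            ok = 0;
    }
    remove(writer_flag);                 /* step 5: release */
    return ok ? 1 : -1;
}
```

A caller would loop on try_append, sleeping briefly whenever it returns 0.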
You can also use semaphores through the C interoperability features in Fortran. See the following post for example. You can probably do something similar on the MATLAB side, but I do not have an example. That will lower your headache and let the system manage the lock for you.

Using rename to safely overwrite a shared file in Linux

Here is the setup: I have a shared file (let's call it status.csv) that is read by many processes (let's call them consumers) in a read-only fashion. I have one producer that periodically updates status.csv by creating a temp file, writing data to it and using the C function discussed here:
http://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html
to rename the temp file to status.csv (effectively overwriting it) so that the consumers can process the new data. I want to guarantee (as much as possible in the Linux world) that the consumers won't get a malformed/corrupted/half-old/half-new status.csv file (I want them to get either all of the old data or all of the new). I can't confirm this from the description of rename: it seems to guarantee that the rename action itself is atomic, but I want to know whether a consumer that already has status.csv open will continue to read the file as it was when it was opened, even if the file is renamed/overwritten by the producer in the middle of the read.
I prototyped this expecting the consumers to get some type of error or a half-old/half-new file, but the file always seems to stay in the state it was in when the consumer opened it, even if it is renamed/overwritten multiple times.
BTW, these processes are running on the same machine (RHEL 6).
Thanks!
In Linux and similar systems, if a process has a file open and the file is deleted, the file itself remains undeleted until all processes close it. All that happens immediately is that the directory entry is deleted so that it cannot be opened again.
The same thing happens if rename is used to replace an open file. The old file descriptor still keeps the old file open. However, new opens will see the new file.
Therefore, for your consumers to see the new file, they must close and reopen the file.
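A minimal sketch of the producer side under these rules: write the new contents to a temporary file, then rename() it over the shared name. The helper name replace_file is hypothetical, and both paths are assumed to be on the same filesystem (rename does not work across filesystems). The fsync() call is an extra precaution that the question does not mention.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `data` to `tmp_path`, then rename it over
 * `path`. Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const char *tmp_path, const char *data)
{
    size_t len = strlen(data);
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;   /* new opens of `path` now see the new contents */
}
```

A consumer that opened path before the call keeps reading the old contents through its existing descriptor; only a fresh open() sees the new file.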
Note: your consumers can discover if the file has been replaced by using the stat(2) call. If either the st_dev or st_ino entries (or both) have changed, then the file has been replaced and must be closed and reopened. This is how tail -F works.

popen vs. KornShell security

I am writing a C program using some external binaries to achieve a planned goal. I need to run one command which gives me an output, which in turn I need to process, then feed into another program as input. I am using popen, but wonder if that is the same as using a KornShell (ksh) temporary file instead.
For example:
touch myfile && chmod 700 myfile
cat myfile > /tmp/tempfile
process_file < /tmp/tempfile && rm /tmp/tempfile
Since that creates a temporary file which can be readable by root, would it be the same if one used popen in C, knowing that pipes are also files? Or is it safe to assume that the Operating System (OS) will not allow any other process to read your pipe?
You say "that creates a temporary file which can be readable by root", which implies that you are attempting to transfer the data in a way in which the root user cannot read it. That's impossible; in general, the root user has total control of the system, and can thus read any data that is on the system, whether it's in a temporary file or not. Even within a single process, the root user can read the memory of that process.
If you use popen(), there will not be an entry for the file on a filesystem; it creates a pipe, which acts like a file, but doesn't actually write that data to disk, instead it just passes it between two programs.
There will be a file descriptor for it; depending on the system, it may be easier or harder to intercept that data, but it will always be possible to do so. For instance, on Linux, you can just look in /proc/<pid>/fd/ to find all of the open file descriptors and manipulate them (read from or write to them).

How can I tell if a file is open elsewhere in C on Linux?

How can I tell if a file is open in C? I think the more technical question would be how I can retrieve the number of references to an existing file and determine from that whether it is safe to open.
The idea I am implementing is a file queue. You dump some files, my code processes the files. I don't want to start processing until the producer closes the file descriptor.
Everything is being done in Linux.
Thanks,
Chenz
Digging out that info is a lot of work (you'd have to search through /proc/*/fd).
You'd be better off with any of:
Save to temp, then rename. Write your files to a temporary filename or directory, and when you're done writing, rename them into the directory where your app reads them. Renaming is atomic, so when the file is present you know it's safe to read.
Maybe a variant of the above: when you're done writing the file foo, you create an empty file named foo.finished. You look for the presence of *.finished when processing files.
Lock the files while writing; that way reading the file will just block until the writer unlocks it. See the flock/lockf functions. They're advisory locks though, so both the reader and the writer have to take the lock and honor it.
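A sketch of the locking option using flock(2). The helper name open_locked is hypothetical. It uses LOCK_NB so that a busy file is reported instead of blocking; dropping LOCK_NB gives the blocking behaviour described above. The lock only works if the producer and the consumer both use it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and take an advisory lock on it.
 * `exclusive` is nonzero for a writer (LOCK_EX), zero for a reader
 * (LOCK_SH). Returns the descriptor on success; close() releases the
 * lock. Returns -1 with errno == EWOULDBLOCK if the file is locked in
 * a conflicting mode by another open of the file. */
int open_locked(const char *path, int exclusive)
{
    int flags = exclusive ? (O_WRONLY | O_CREAT | O_APPEND) : O_RDONLY;
    int fd = open(path, flags, 0644);
    int saved;

    if (fd < 0)
        return -1;
    if (flock(fd, (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB) != 0) {
        saved = errno;          /* keep flock's error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The producer would call open_locked(path, 1), write the file and close it; the consumer calls open_locked(path, 0) and skips the file for now if it gets EWOULDBLOCK.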
I don't think there is any way to do this in pure C (it wouldn't be cross platform).
If you know what files you are using ahead of time, you can use inotify to be notified when they open.
Use the lsof command. (List Open Files).
C has facilities for handling files, but not much for getting information on them. In portable C, about the only thing you can do is try to open the file in the desired way and see if it works.
Generally you can't do that, for various reasons (e.g. you cannot tell whether the file has been opened by another user).
If you control the processes that open the file, you can avoid collisions by locking the file (there are many libraries on Linux for that).
If you are in control of both producer and consumer, you could use lockf() or flock() to lock the file.
There is the lsof command on most distros, which shows all currently open files. You can of course grep its output if your files are in the same directory or have some recognizable name pattern.

Get `df` to show updated information on FreeBSD

I recently ran out of disk space on a drive on a FreeBSD server. I truncated the file that was causing problems but I'm not seeing the change reflected when running df. When I run du -d0 on the partition it shows the correct value. Is there any way to force this information to be updated? What is causing the output here to be different?
In BSD a directory entry is simply one of many references to the underlying file data (called an inode). When a file is deleted with the rm(1) command only the reference count is decreased. If the reference count is still positive (e.g. the file has other directory entries due to hard links) then the underlying file data is not removed.
Newer BSD users often don't realize that a program that has a file open is also holding a reference. This prevents the underlying file data from going away while the process is using it. When the process closes the file, if the reference count falls to zero, the file space is marked as available. This scheme avoids the Microsoft Windows type of issue where the system won't let you delete a file because some unspecified program still has it open.
An easy way to observe this is to do the following
cp /bin/cat /tmp/cat-test
/tmp/cat-test &
rm /tmp/cat-test
Until the background process is terminated the file space used by /tmp/cat-test will remain allocated and unavailable as reported by df(1) but the du(1) command will not be able to account for it as it no longer has a filename.
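The same effect can be observed from a C program. This is a minimal sketch with a hypothetical helper name: it unlinks a file while still holding it open, then checks through the descriptor that the link count is 0 while the data remains readable.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, write 4 bytes, unlink it while it
 * is still open, and return the link count seen through the open
 * descriptor, or -1 on any error. */
long links_after_unlink(const char *path)
{
    struct stat st;
    char buf[4];
    long result = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) == 4 &&
        unlink(path) == 0 &&              /* directory entry is gone */
        fstat(fd, &st) == 0 &&            /* the inode is still there */
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, 4) == 4)            /* and the data is readable */
        result = (long)st.st_nlink;
    close(fd);                            /* only now can the space be freed */
    return result;
}
```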
Note that if the system should crash without the process closing the file then the file data will still be present but unreferenced, an fsck(8) run will be needed to recover the filesystem space.
Processes holding files open is one reason why the newsyslog(8) command sends signals to syslogd or other logging programs to inform them they should close and re-open their log files after it has rotated them.
Soft updates can also affect filesystem free space, as the actual inode space recovery can be deferred; the sync(8) command can be used to encourage this to happen sooner.
This probably centres on how you truncated the file. du and df report different things as this post on unix.com explains. Just because space is not used does not necessarily mean that it's free...
Does df --sync work?
