While daemonizing my program in C using code "stolen" from this webpage, I noticed that upon initialisation the daemon creates a lockfile which stores the process pid thusly:
:
lfp=open(LOCK_FILE,O_RDWR|O_CREAT,0640);
if (lfp<0) exit(1); /* can not open */
if (lockf(lfp,F_TLOCK,0)<0) exit(0); /* can not lock */
sprintf(str,"%d\n",getpid());
write(lfp,str,strlen(str)); /* store pid in file */
:
The webpage doesn't seem to bother with cleaning up after the daemon terminates, and searching the web I cannot find a way to handle this either.
All the examples of daemonizing C code I have found create a lockfile, but none of them remove it afterwards.
How should I unlock and then remove the pidfile assuming that I can catch SIGTERM and exit gracefully?
The lock itself is automatically released:
Reference:
File locks are released as soon as the process holding the locks closes some file descriptor for the file.
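The quoted rule can be checked with a small sketch, assuming Linux/glibc; lock_is_free() is a hypothetical helper, not part of the webpage's code. Because lockf()/fcntl() locks never conflict with the caller's own locks, the probe has to run in a separate, forked process:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if another process could lockf() `path`
 * right now, 0 if some other process holds a conflicting lock, -1 on error.
 * The probe runs in a forked child because lockf()/fcntl() locks never
 * conflict with locks held by the calling process itself. */
int lock_is_free(const char *path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd == -1)
            _exit(2);
        /* F_TEST checks from offset 0 onward without taking the lock */
        if (lockf(fd, F_TEST, 0) == 0)
            _exit(0);
        _exit(errno == EAGAIN || errno == EACCES ? 1 : 2);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

While the daemon holds its lock the probe reports 0; once the daemon closes the descriptor (or exits) it reports 1, without anyone unlinking or explicitly unlocking the file.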
To remove the file you can use unlink. I suspect the lock file is usually left in place because future invocations of the program simply reopen and reuse it instead of creating it again, which saves a little overhead.
You can unlock explicitly using F_ULOCK. By the way, the fcntl man page (on Linux, lockf is implemented on top of fcntl) states that locks are released when the file is closed or the process ends.
So once the daemon dies it no longer holds any lock, and a new instance can open the file and take a new lock.
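Putting the two answers together, SIGTERM-aware cleanup could look like the following minimal sketch. It assumes Linux/glibc; install_sigterm_handler(), pidfile_acquire(), pidfile_release() and the stop_requested flag are hypothetical names. The handler only sets a flag, and the cleanup runs in normal code once the main loop sees it. One detail worth noting: lockf() measures a len of 0 from the current file offset, so the descriptor is rewound before F_ULOCK; otherwise the unlock would not cover the bytes that were just written.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Set asynchronously by the handler, polled by the daemon's main loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigterm(int sig)
{
    (void)sig;
    stop_requested = 1;            /* keep the handler minimal */
}

int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Same steps as the question's snippet, plus truncation of any stale pid.
 * Returns the locked descriptor, or -1 (e.g. another instance is running). */
int pidfile_acquire(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0640);
    if (fd < 0)
        return -1;
    if (lockf(fd, F_TLOCK, 0) < 0) {   /* locks from offset 0 onward */
        close(fd);
        return -1;
    }
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (ftruncate(fd, 0) < 0 || write(fd, buf, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Cleanup after the main loop sees stop_requested: remove the name while
 * the lock is still held, then unlock and close. */
void pidfile_release(int fd, const char *path)
{
    unlink(path);
    lseek(fd, 0, SEEK_SET);        /* lockf() len 0 counts from the current offset */
    lockf(fd, F_ULOCK, 0);         /* explicit unlock; close() would release it too */
    close(fd);
}
```

In the daemon, the main loop would become while (!stop_requested) { ... } followed by pidfile_release(fd, LOCK_FILE). Unlinking while the lock is still held narrows the window in which a starting instance could lock a file that is about to disappear, but does not close it completely; the inode-comparison technique discussed further down addresses that.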
Can a batch (Windows cmd) file create or release a mutex without a custom exe?
Similar questions (like this one) only suggest writing a custom C executable.
Can I work with a mutex from a bare batch script, or only by using some standard Windows utilities?
UPD
I do not need a single-instance bat; I need to work with a mutex to send a message to an already running parent process whose mutex is known. A temp file is not an answer.
UPD 2
The parent program holds the mutex (I call hMutex:=CreateMutex( nil, TRUE, 'VCSClient2r123r123refqwe' ); at program startup to prevent multiple instances of the parent program).
The parent program runs svn.exe diff "some_modified_file" --diff-cmd "diff.bat" --force
diff.bat should send some messages to the parent program. Because the mutex already exists and its name is fixed, I want to use it to send information from diff.bat to the parent process.
I have an application that creates multiple instances (processes) of itself, and these processes share a data structure. In that struct there is a file descriptor used for logging data to a file. The logging function checks whether the file descriptor is -1; if it is, it opens the file and stores the descriptor in the shared structure.
Other processes / threads do the same check, but by then the fd is != -1, so they do not open the file and go straight to writing to it. The write fails most of the time and returns -1. When the write did not fail, I checked the file path of the fd using readlink: it pointed to some file other than the log file.
I am assuming this is because, even though the file descriptor value was always 11 (even in subsequent runs), that value refers to a different file in each process. So 11 is just an index into that process's own table of open files? If so, the log file is not even regarded as open in these processes, and even if they did open it, the fd would be different.
So, first question: is this correct? Second question: how do I re-implement this, given that multiple processes need to write to this log file? Would each process need to open the file, or is there another, more efficient way? Do I need to close the file so that other processes can open and write to it?
EDIT:
The software is an open source program called filebench.
The file can be seen here.
The log method is filebench_log. Line 204 is the first check I mentioned, where the file is opened; the write happens at line 293. The fd value is 11 in all processes. It is shared by all processes and set up mostly here. The file is only opened once (verified via print statements).
The shared data struct that holds the fd is called filebench_shm, and the fd is filebench_shm->shm_log_fd.
EDIT 2:
The error message I get is "Bad file descriptor" (errno 9, EBADF).
EDIT 3:
So it seems that each process has its own file descriptor table. From Wikipedia:
On Linux, the set of file descriptors open in a process can be accessed under the path /proc/PID/fd/, where PID is the process identifier.
So the issue I am having is that for two processes with PIDs 101 and 102, file descriptor 11 does not refer to the same file:
/proc/101/fd/11
/proc/102/fd/11
I have a data structure shared between these processes. Since an fd doesn't work, is there another way I can share an open file between them?
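The per-process lookup can be made visible with a Linux-only sketch; fd_target() and fd_refers_to() are hypothetical helpers. They resolve a descriptor through /proc/self/fd, i.e. through the calling process's own table, so the same number checked in two unrelated processes can yield two different paths, which is the symptom described above:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only, hypothetical helper: store in `buf` the path that descriptor
 * `fd` refers to in the *calling* process, via the /proc/self/fd symlink.
 * Returns 0 on success, -1 if fd is not open (or on error). */
int fd_target(int fd, char *buf, size_t bufsize)
{
    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, buf, bufsize - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';                 /* readlink() does not NUL-terminate */
    return 0;
}

/* 1 if fd currently refers to `path` in this process, 0 if not, -1 on error. */
int fd_refers_to(int fd, const char *path)
{
    char buf[4096];
    if (fd_target(fd, buf, sizeof buf) == -1)
        return -1;
    return strcmp(buf, path) == 0;
}
```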
It seems that it would be simplest to open the file before spawning the new processes. This avoids all the coordination complexity regarding opening the file by centralizing it to one time and place.
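A minimal sketch of the open-then-fork approach, assuming POSIX fork(); log_from_children() is a hypothetical helper standing in for filebench's worker startup. The parent opens the log once and each child inherits the descriptor, so the number stored in the shared structure is valid in every child:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open the log once in the parent, then fork
 * `nchildren` workers that all write through the inherited descriptor.
 * Returns 0 if every child wrote its line, -1 otherwise. */
int log_from_children(const char *path, int nchildren)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    int failures = 0;
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == -1) {           /* stop forking, but still reap the others */
            failures++;
            break;
        }
        if (pid == 0) {
            /* Same fd number, same open file description as the parent. */
            char line[64];
            int len = snprintf(line, sizeof line, "worker %d pid %ld\n",
                               i, (long)getpid());
            _exit(write(fd, line, (size_t)len) == len ? 0 : 1);
        }
    }

    int status;
    while (wait(&status) > 0)      /* reap every child we started */
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;

    close(fd);
    return failures == 0 ? 0 : -1;
}
```

Opening with O_APPEND makes each write() land at the current end of the file, so the shared file offset does not matter. Issuing one write() per record in practice keeps lines from interleaving on local file systems.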
I originally wrote this as a solution:
Create a shared memory segment.
Put the file descriptor variable in the segment.
Put a mutex semaphore in the segment
Each process accesses the file descriptor in the segment. If it is not open, lock the semaphore, check again whether it is open, and if not, open the file. Release the mutex.
That way all processes share the same file descriptor.
But this assumes that the underlying open file object is also in the shared memory, which it is not: the segment only holds the integer, and each process resolves that integer through its own descriptor table.
Instead, use the open then fork method mentioned in the other answer, or have each process open the file and use flock to serialize access when needed.
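The second alternative, each process opening the file itself and serializing with flock(), could look like this minimal sketch. log_line_locked() is a hypothetical name, and it assumes a local file system where flock() is supported:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: append `msg` to the log, holding an exclusive
 * flock() for the duration of the write. Returns 0 on success. */
int log_line_locked(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;
    int rc = -1;
    if (flock(fd, LOCK_EX) == 0) {     /* blocks until we own the lock */
        size_t len = strlen(msg);
        if (write(fd, msg, len) == (ssize_t)len)
            rc = 0;
        flock(fd, LOCK_UN);
    }
    close(fd);
    return rc;
}
```

Reopening for every message is the simplest form; a long-lived process can instead keep its own descriptor open and only wrap each write() in flock(LOCK_EX) and flock(LOCK_UN).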
I am wondering if there is a way to prevent users from launching the program multiple times, to avoid some problems.
When I start my program with
/etc/init.d/myprog start
the next time a user executes the same command, it should not run.
The best way is for the launcher to attempt a launch, writing the pid of the launched process to a pid file in /var/run.
Then on subsequent launches, you read the pid file, and do a process listing (ps) to see if a process with that pid is running. If so, the subsequent launch will report that the process is already running and do nothing.
Read up on pid and lock files to get an idea of what is considered standard under the init.d system.
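One way to implement the "is that pid still running" check without parsing ps output is kill(pid, 0), which sends no signal and only performs the existence and permission checks. This is a swapped-in technique rather than what the answer describes; the function name and the pid file format (a single decimal pid) are assumptions:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read the pid stored in `path` and ask the kernel
 * whether that process exists. kill(pid, 0) sends nothing; it only runs the
 * existence and permission checks. Returns 1 if running, 0 if not (or if
 * there is no usable pid file), -1 if the file exists but can't be read. */
int pidfile_process_running(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return errno == ENOENT ? 0 : -1;
    long pid = 0;
    int n = fscanf(f, "%ld", &pid);
    fclose(f);
    if (n != 1 || pid <= 0)
        return 0;                      /* empty or garbage pid file */
    if (kill((pid_t)pid, 0) == 0)
        return 1;
    return errno == EPERM ? 1 : 0;     /* EPERM: exists, owned by another user */
}
```

A pid can be reused after the original process exits, so this check can give a false positive; that is one reason the flock()-based answer below is often preferred.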
You need to open a .lock file and lock it with flock.
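#include <fcntl.h>
#include <sys/file.h>
int fd = open("path/to/file.lock", O_RDWR | O_CREAT, 0644);
if (fd == -1) {
/* error opening file, abort */
}
if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
/* another instance already holds the lock, abort */
}
/* keep fd open for the life of the process: the lock is dropped when it is closed or the process exits */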
The Linux Standard Base supports a start_daemon function that delivers this feature. Use it from your init script.
The start_daemon, killproc and pidofproc functions shall use this algorithm for determining the status and the pid(s) of the specified program. They shall read the pidfile specified or otherwise /var/run/basename.pid and use the pid(s) herein when determining whether a program is running. The method used to determine the status is implementation defined, but should allow for non-binary programs. Compliant implementations may use other mechanisms besides those based on pidfiles, unless the -p pidfile option has been used. Compliant applications should not rely on such mechanisms and should always use a pidfile. When a program is stopped, it should delete its pidfile. Multiple pid(s) shall be separated by a single space in the pidfile and in the output of pidofproc.
This runs the specified program as a daemon. start_daemon shall check if the program is already running using the algorithm given above. If so, it shall not start another copy of the daemon unless the -f option is given. The -n option specifies a nice level. See nice(1). start_daemon should return the LSB defined exit status codes. It shall return 0 if the program has been successfully started or is running and not 0 otherwise.
I'm using flock() for inter-process named mutexes (i.e. some process can decide to hold a lock on "some_name", which is implemented by locking a file named "some_name" in a temp directory):
lockfile = "/tmp/some_name.lock";
fd = open(lockfile, O_RDWR | O_CREAT, 0600);
flock(fd, LOCK_EX);
do_something();
unlink(lockfile);
flock(fd, LOCK_UN);
The lock file should be removed at some point, to avoid filling the temp directory with hundreds of files.
However, there is an obvious race condition in this code; example with processes A, B and C:
A opens file
A locks file
B opens file
A unlinks file
A unlocks file
B locks file (B holds a lock on the deleted file)
C opens file (a new file is created)
C locks file (two processes now hold the same named mutex!)
Is there a way to remove the lock file at some point without introducing this race condition?
Sorry for replying to an old question:
After locking the file, stat the lock file path again and compare its inode number (and device) with those of the descriptor you locked, obtained with fstat; if they differ, close and retry. Like this:
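const char *lockfile = "/tmp/some_name.lock";
struct stat st0, st1;
int fd;
while(1) {
fd = open(lockfile, O_RDWR | O_CREAT, 0600);
flock(fd, LOCK_EX);
fstat(fd, &st0);
// stat() fails if the file was unlinked in the meantime: retry then too
if(stat(lockfile, &st1) == 0 && st0.st_dev == st1.st_dev && st0.st_ino == st1.st_ino) break;
close(fd);
}
do_something();
unlink(lockfile);
flock(fd, LOCK_UN);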
This prevents the race condition, because if a program holds a lock on a file that is still on the file system, every other program that has a leftover file will have a wrong inode number.
I actually proved it in the state-machine model, using the following properties:
If P_i has a descriptor locked on the filesystem then no other process is in the critical section.
If P_i is after the stat with the right inode or in the critical section it has the descriptor locked on the filesystem.
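Wrapped into reusable functions with error checks, the inode comparison might look like the following sketch. named_lock_acquire() and named_lock_release() are hypothetical names; comparing st_dev as well as st_ino is an addition, since inode numbers are only unique within one file system:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: take the named lock at `path`. Loops until the inode
 * we hold the flock() on is the one currently linked at `path`.
 * Returns the locked descriptor, or -1 on error. */
int named_lock_acquire(const char *path)
{
    for (;;) {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd == -1)
            return -1;
        struct stat held, current;
        if (flock(fd, LOCK_EX) == -1 || fstat(fd, &held) == -1) {
            close(fd);
            return -1;
        }
        if (stat(path, &current) == 0 &&
            held.st_dev == current.st_dev && held.st_ino == current.st_ino)
            return fd;                 /* our inode is still the live one */
        close(fd);                     /* locked a file someone unlinked: retry */
    }
}

/* Unlink while the lock is still held, then release it. */
void named_lock_release(int fd, const char *path)
{
    unlink(path);
    flock(fd, LOCK_UN);
    close(fd);
}
```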
In Unix it is possible to delete a file while it is open: the inode is kept until every process that has it open has closed it (or exited).
In Unix it is possible to check whether a file has been removed from all directories by checking whether its link count has dropped to zero.
So instead of comparing the inode numbers of the old and new file paths, you can simply check the nlink count of the file that is already open. This assumes it is just an ephemeral lock file and not a real mutex resource or device.
const char *lockfile = "/tmp/some_name.lock";
for (int attempt = 0; attempt < timeout; ++attempt) {
int fd = open(lockfile, O_RDONLY | O_CREAT, 0444);
if (fd == -1) {
sleep(1); // could not open or create the lockfile, retry
continue;
}
int done = flock(fd, LOCK_EX | LOCK_NB);
if (done != 0) {
close(fd);
sleep(1); // lock held by another proc
continue;
}
struct stat st0;
fstat(fd, &st0);
if(st0.st_nlink == 0) {
close(fd); // lockfile deleted, create a new one
continue;
}
do_something();
unlink(lockfile); // nlink :=0 before releasing the lock
flock(fd, LOCK_UN);
close(fd); // release the ino if no other proc
return true;
}
return false;
If you use these files for locking only, and do not actually write to them, then I suggest you treat the existence of the directory entry itself as an indication for a held lock, and avoid using flock altogether.
To do so, you need an operation which creates a directory entry and reports an error if it already exists. On Linux and with most file systems, passing O_CREAT | O_EXCL to open will work for this. But some platforms and some file systems (older NFS in particular) do not support this. The man page for open therefore suggests an alternative:
Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same file system (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
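Both variants are short in C. In the following sketch, excl_lock() and link_lock() are hypothetical names, and naming the unique file lockfile.host.pid is an assumption following the man page's hint. Each takes the lock by creating the directory entry; releasing it is simply unlink()ing the lock file:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* O_EXCL variant: creating the file *is* taking the lock.
 * Returns 0 if acquired, -1 otherwise (errno == EEXIST if already held). */
int excl_lock(const char *lockfile)
{
    int fd = open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    close(fd);
    return 0;
}

/* link(2) variant from the open(2) man page: create a unique file on the
 * same file system, link it to the lock name, and if link() reports an
 * error, trust the unique file's link count instead. Returns 0 if acquired. */
int link_lock(const char *lockfile)
{
    char host[64] = "localhost";
    gethostname(host, sizeof host - 1);     /* host[63] stays '\0' */
    char unique[4096];
    snprintf(unique, sizeof unique, "%s.%s.%ld", lockfile, host, (long)getpid());

    int fd = open(unique, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    int locked = (link(unique, lockfile) == 0);
    if (!locked) {
        struct stat st;
        locked = (stat(unique, &st) == 0 && st.st_nlink == 2);
    }
    unlink(unique);                /* the lock name keeps the inode alive */
    return locked ? 0 : -1;
}
```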
So this looks like a locking scheme which is officially documented, which indicates a certain level of support and suggests it is considered good practice. But I have seen other approaches as well. bzr, for example, uses directories instead of symlinks in most places. Quoting from its source code:
A lock is represented on disk by a directory of a particular name,
containing an information file. Taking a lock is done by renaming a
temporary directory into place. We use temporary directories because
for all known transports and filesystems we believe that exactly one
attempt to claim the lock will succeed and the others will fail. (Files
won't do because some filesystems or transports only have
rename-and-overwrite, making it hard to tell who won.)
One downside to the above approaches is that they won't block: a failed locking attempt will result in an error, but not wait till the lock becomes available. You will have to poll for the lock, which might be problematic in the light of lock contention. In that case, you might want to further depart from your filesystem-based approach, and use third party implementations instead. But general questions on how to do ipc mutexes have already been asked, so I suggest you search for [ipc] [mutex] and have a look at the results, this one in particular. By the way, these tags might be useful for your post as well.
I have written a simple program to help me test fcntl file locking. Argument 'set' locks my test file. Argument 'get' tells me whether the file is locked. Argument 'un' tries to unlock the file.
In one shell, I run the program to lock the file. I keep the file open and wait for enter.
$ ./lock set
file is locked
hit enter to release lock with a call to fcntl
In the other shell, I run the program to lock the file. It does not work because the file is already locked. I run the program to check the lock. I'm told the lock is not available. All this works as expected.
$ ./lock get
locking is not possible
$ ./lock set
locking file failed
What happens if I try unlocking my file in shell two? It seems fcntl never returns an error when called with l_type = F_UNLCK.
$ ./lock un
unlocked
file either was successfully unlocked or was already unlocked
$ ./lock get
locking is not possible
$ ./lock set
locking file failed
I know my unlocking code works, because if I go back to shell one and let the program unlock:
$ ./lock set
file is locked
hit enter to release lock with a call to fcntl
unlocked
hit enter to close the file
I can confirm the result in shell two:
$ ./lock get
locking is possible
I am only using exclusive write lock on the entire file:
fl.l_type = F_WRLCK;
fl.l_whence = SEEK_SET;
fl.l_start = 0;
fl.l_len = 0;
fl.l_pid = -1; // used by F_GETLK only
result = fcntl(fd, F_SETLK, &fl);
Here's how I'm doing the unlock:
fl.l_type = F_UNLCK;
fl.l_whence = SEEK_SET;
fl.l_start = 0;
fl.l_len = 0;
fl.l_pid = -1; // used by F_GETLK only
result = fcntl(fd, F_SETLK, &fl);
if (!result)
{
printf("unlocked\n");
}
I am working on RHEL 5.5.
Can you explain this fcntl behavior? Thanks!
EDIT: The man page seems to imply that the unlocking operation is meant to be used by the lock owner only: "As well as being removed by an explicit F_UNLCK, record locks are automatically released when the process terminates or if it closes any file descriptor referring to a file on which locks are held."
Because ... that's the way it works?
On Unix systems, locks are advisory. You use them by attempting to obtain a lock, and only doing what you want if you succeed.
This includes unlocking. Unlocking a region on which you hold no lock is not an error, so fcntl with F_UNLCK always returns success; but it only ever removes locks owned by the calling process, which is why the lock taken in shell one is still reported afterwards.
More to the point, that's what POSIX specifies. It's far from the only, or even the most dubious, decision they made concerning file locks (the latter would be that all of a process's locks on a given file are dropped when that process close()s any file descriptor referring to that file).
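The behaviour in the question can be reproduced with a small sketch, assuming Linux/glibc; foreign_unlock_probe() and whole_file() are hypothetical helpers. A child process's F_UNLCK returns success, yet F_GETLK still reports the parent's lock:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill a struct flock covering the whole file. */
static void whole_file(struct flock *fl, short type)
{
    memset(fl, 0, sizeof *fl);
    fl->l_type = type;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 0;                 /* 0 = to end of file, however large */
}

/* A child process issues F_UNLCK on `path`, then asks F_GETLK whether a
 * conflicting write lock is still present. Returns a bitmask:
 * 1 = the child's F_UNLCK returned 0, 2 = a lock is still held afterwards;
 * -1 on error. */
int foreign_unlock_probe(const char *path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd == -1)
            _exit(255);
        struct flock fl;
        int result = 0;
        whole_file(&fl, F_UNLCK);
        if (fcntl(fd, F_SETLK, &fl) == 0)
            result |= 1;           /* "succeeds": we had nothing to unlock */
        whole_file(&fl, F_WRLCK);
        if (fcntl(fd, F_GETLK, &fl) == 0 && fl.l_type != F_UNLCK)
            result |= 2;           /* someone else's lock is still in place */
        _exit(result);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

With the parent holding a write lock the probe returns 3 (unlock "succeeded", lock still there); after the parent itself unlocks, it returns 1.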