I am working on a multithreaded system where a file can be shared among different threads based on the file access permissions.
How can I check if a file is already opened by another thread?
To find out if a named file is already opened on Linux, you can scan the /proc/self/fd directory to see if the file is associated with a file descriptor. The program below sketches out a solution:
DIR *d = opendir("/proc/self/fd");
if (d) {
struct dirent *entry;
struct dirent *result;
entry = malloc(sizeof(struct dirent) + NAME_MAX + 1);
result = 0;
while (readdir_r(d, entry, &result) == 0) {
if (result == 0) break;
if (isdigit(result->d_name[0])) {
char path[NAME_MAX+1];
char buf[NAME_MAX+1];
snprintf(path, sizeof(path), "/proc/self/fd/%s",
result->d_name);
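ssize_t bytes = readlink(path, buf, sizeof(buf) - 1); /* leave room for '\0' */
if (bytes < 0) continue; /* descriptor vanished or link not readable */
buf[bytes] = '\0';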
if (strcmp(file_of_interest, buf) == 0) break;
}
}
free(entry);
closedir(d);
if (result) return FILE_IS_FOUND;
}
return FILE_IS_NOT_FOUND;
From your comment, it seems what you want to do is to retrieve an existing FILE * if one has already been created by a previous call to fopen() on the file. The standard C library provides no mechanism to iterate through all currently opened FILE *s. If there were such a mechanism, you could derive each stream's file descriptor with fileno(), and then query /proc/self/fd/# with readlink() as shown above.
This means you will need to use a data structure to manage your open FILE *s. Probably a hash table using the file name as the key would be the most useful for you.
If you intend to do it in a shell, you can simply use lsof $filename.
You can use int flock(int fd, int operation); to mark a file as locked and also to check if it is locked.
Apply or remove an advisory lock on the open file specified by fd.
The argument operation is one of the following:
LOCK_SH Place a shared lock. More than one process may hold a
shared lock for a given file at a given time.
LOCK_EX Place an exclusive lock. Only one process may hold an
exclusive lock for a given file at a given time.
LOCK_UN Remove an existing lock held by this process.
flock should work in a threaded app if you open the file separately in each thread:
multiple threads able to get flock at the same time
There's more information about flock and its potential weaknesses here.
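As a rough illustration of the "open the file separately" point, here is a minimal sketch that uses a non-blocking exclusive flock() to find out whether someone else already holds the file. The helper names open_locked() and close_locked() are made up for this example. It assumes Linux semantics, where the lock belongs to the open file description, so two separate open() calls conflict even inside one process.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and try to take an exclusive advisory
 * lock without blocking. Returns the locked descriptor on success, or -1
 * (errno == EWOULDBLOCK when another open of the file holds the lock). */
int open_locked(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: drop the lock and close the descriptor.
 * Closing alone would also release the lock. */
void close_locked(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

Each thread would call open_locked() itself; a thread that gets -1 with EWOULDBLOCK knows the file is in use and can retry later or skip it. Since the lock is only advisory, this helps only if every thread goes through the same helper.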
I don't know much in the way of multithreading on Windows, but you have a lot of options if you're on Linux. Here is a FANTASTIC resource. You might also take advantage of any file-locking features offered inherently or explicitly by the OS (ex: fcntl). More on Linux locks here. Creating and manually managing your own mutexes offers you more flexibility than you would otherwise have. user814064's comment about flock() looks like the perfect solution, but it never hurts to have options!
Added a code example:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
FILE *fp;
int counter;
pthread_mutex_t fmutex = PTHREAD_MUTEX_INITIALIZER;
void *foo(void *arg) {
(void)arg; // unused
// pthread_mutex_trylock() tries to take the mutex without
// blocking; it returns EBUSY if the mutex is already locked
//int busy = pthread_mutex_trylock(&fmutex);
// this blocks until the lock is released
pthread_mutex_lock(&fmutex);
fprintf(fp, "counter = %d\n", counter);
printf("counter = %d\n", counter);
counter++;
pthread_mutex_unlock(&fmutex);
return NULL;
}
int main() {
counter = 0;
fp = fopen("threads.txt", "w");
if (fp == NULL) { perror("fopen"); return 1; }
pthread_t thread1, thread2;
if (pthread_create(&thread1, NULL, &foo, NULL))
printf("Error creating thread 1");
if (pthread_create(&thread2, NULL, &foo, NULL))
printf("Error creating thread 2");
pthread_join(thread1, NULL);
pthread_join(thread2, NULL);
fclose(fp);
return 0;
}
If you need to determine whether another thread opened a file instead of knowing that a file was already opened, you're probably doing it the wrong way.
In a multithreaded application, you want to manage resources used in common in a list accessible by all the threads. That list needs to be managed in a multithread safe manner. This just means you need to lock a mutex, do things with the list, then unlock the mutex. Further, reading/writing to the files by more than one thread can be very complicated. Again, you need locking to do that safely. In most cases, it's much easier to mark the file as "busy" (a.k.a. a thread is using that file) and wait for the file to be "ready" (a.k.a. no thread is using it).
So assuming you have a form of linked list implementation, you can have a search of the list in a way similar to:
my_file *my_file_find(const char *filename)
{
my_file *l, *result = NULL;
pthread_mutex_lock(&fmutex);
l = my_list_of_files;
while(l != NULL)
{
if(strcmp(l->filename, filename) == 0)
{
result = l;
break;
}
l = l->next;
}
pthread_mutex_unlock(&fmutex);
return result;
}
If the function returns NULL, no other thread had the file open at the time of the search. That answer can already be stale when the caller sees it: the mutex is released before the function returns, so another thread may open the file in between. If you need to open the file in a safe manner (i.e. only one thread can open file filename), you need a my_file_open() function which locks, searches, adds a new my_file if none was found, and then returns the newly added my_file pointer. If the file is already in the list, my_file_open() would probably return NULL, meaning that the calling thread cannot open the file because another thread is already using it.
Just remember that you can't unlock the mutex between the search and the add. So you can't use the my_file_find() function above without first getting a lock on your mutex (in which case you probably want to have recursive mutexes).
In other words, you can search the existing list, grow the existing list, and shrink it (i.e. close a file) only if you first lock the mutex, do ALL THE WORK, then unlock the mutex.
This is valid for any kind of resources, not just files. It could be memory buffers, a graphical interface widget, a USB port, etc.
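The my_file_open() idea described above could look roughly like the sketch below. The my_file layout, the find_locked() helper, and my_file_close() are assumptions made for illustration; the original only implies a filename and a next pointer. The search helper expects the caller to already hold the mutex, which is one way to avoid needing a recursive mutex.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Assumed node layout; only filename and next are implied by the text. */
typedef struct my_file {
    char *filename;
    struct my_file *next;
} my_file;

static my_file *my_list_of_files = NULL;
static pthread_mutex_t fmutex = PTHREAD_MUTEX_INITIALIZER;

/* Search helper: the caller must already hold fmutex. */
static my_file *find_locked(const char *filename)
{
    for (my_file *l = my_list_of_files; l != NULL; l = l->next)
        if (strcmp(l->filename, filename) == 0)
            return l;
    return NULL;
}

/* Search and insert under ONE lock. Returns the new entry, or NULL if the
 * file is already in use by another thread (or memory ran out). */
my_file *my_file_open(const char *filename)
{
    my_file *f = NULL;
    pthread_mutex_lock(&fmutex);
    if (find_locked(filename) == NULL) {
        f = malloc(sizeof *f);
        if (f != NULL) {
            f->filename = malloc(strlen(filename) + 1);
            if (f->filename == NULL) {
                free(f);
                f = NULL;
            } else {
                strcpy(f->filename, filename);
                f->next = my_list_of_files;
                my_list_of_files = f;
            }
        }
    }
    pthread_mutex_unlock(&fmutex);
    return f;
}

/* Unlink the entry under the lock, then free it. */
void my_file_close(my_file *f)
{
    pthread_mutex_lock(&fmutex);
    for (my_file **pp = &my_list_of_files; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == f) {
            *pp = f->next;
            break;
        }
    }
    pthread_mutex_unlock(&fmutex);
    free(f->filename);
    free(f);
}
```

A thread that gets NULL back can wait and retry, which matches the "busy"/"ready" approach described earlier.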
While executing a C program, a debug print should be written to a log file (say log.txt), but this C program is executed from various places, so the same debug line ends up in log.txt multiple times.
I want that debug line to be present only once in log.txt.
How can I add a check to the C program to achieve this?
I have tried a static variable, but that only detects the first call of the function, not the first write to the file.
Would the access() function help in this scenario?
You need some way for the programs to record that they've done a thing, known as a semaphore.
Simplest thing to do is use a shared location like /var/run/ to place a semaphore file. Then each program attempts to create that file. If it already exists, they can assume someone else already performed the action.
To do that we use the open system call (not fopen) for fine control. Using both the O_CREAT and O_EXCL flags says to create the file, and error if it already exists. This is atomic, we're checking if the file exists and creating it in one action. There's no way two programs could accidentally check & create at the same time.
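#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdbool.h>
// Returns true only for the one caller that manages to create the file.
bool check_semaphore(const char *file) {
// O_CREAT needs a mode argument; O_EXCL makes open() fail if the file exists.
int fd = open(file, O_WRONLY|O_CREAT|O_EXCL, 0644);
if( fd < 0 ) {
return false;
}
close(fd);
return true;
}
int main() {
if( check_semaphore("did_debug_log") ) {
FILE *log = fopen("log.txt", "a");
if( !log ) {
perror("Couldn't open log.txt");
return 1;
}
fprintf(log, "Some debug stuff\n");
fclose(log);
}
return 0;
}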
You can use more sophisticated shared storage as a semaphore, like shared memory or a shared database such as SQLite, but the basic technique is the same: check whether the operation has been completed and declare that you're going to do it, both in a single atomic operation.
I need to read a single directory containing 100K files. Every time I read it with readdir, it takes a lot of time.
Can someone suggest how to read a single directory using multiple threads? Assume the directory has no subdirectories, only files.
Below is what I am trying to make work, but it takes ~5 min per invocation:
void dirwalk(char *dir, void (*fcn)(char *))
{
char name[MAX_PATH];
Dirent *dp;
DIR *dfd;
if ((dfd = opendir(dir)) == NULL) {
fprintf(stderr, "dirwalk: can't open %s\n", dir);
return;
}
while ((dp = readdir(dfd)) != NULL) {
if (strcmp(dp->name, ".") == 0
|| strcmp(dp->name, "..") == 0)
continue; /* skip self and parent */
if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
fprintf(stderr, "dirwalk: name %s %s too long\n",
dir, dp->name);
else {
sprintf(name, "%s/%s", dir, dp->name);
(*fcn)(name);
}
}
closedir(dfd);
}
You can try the following in the order below to see if it improves the performance:
1. Spawn a separate thread with pthread_create() to perform the work done in fcn(), so that an expensive callback does not hold up the directory scan. Depending on your needs, the threads can be joinable or detached. If this does not help, try 2 below.
2. Write a modified dirwalk() as a thread routine and create a bunch of threads (with pthread_create()) that all run it until they reach the end of the directory stream. Remember that the directory stream is shared and readdir() is not required to be reentrant, so either use readdir_r() (deprecated in recent glibc) or call readdir() only while holding a lock and copy the entry name before releasing it. Use a pthread mutex to protect the directory stream: lock immediately before the read and unlock immediately after it, so that the rest of the work is done outside the critical section.
Locking will cost some performance, but it takes care of the concurrency issues and you can't avoid it. However, I think Linux (I hope you are running Linux) would give dirwalk() a little more room to run with more threads, though I am not sure the gain would be as substantial as you might expect.
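Option 2 could be sketched along these lines. This is only an illustration: struct walk, walk_worker(), dirwalk_mt(), MAX_WALKERS and the count_entry() callback are names invented here, and the sketch uses plain readdir() under the mutex instead of readdir_r(). The callback receives the bare entry name, not the full path.

```c
#include <dirent.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define MAX_WALKERS 16              /* assumed upper bound on worker threads */

/* Hypothetical state shared by all workers. */
struct walk {
    DIR *dir;
    pthread_mutex_t lock;
    void (*fcn)(const char *name);
};

/* Thread routine: fetch the next entry under the lock, process it outside. */
static void *walk_worker(void *arg)
{
    struct walk *w = arg;
    char name[256];                 /* assumes names fit (NAME_MAX is 255 on Linux) */

    for (;;) {
        pthread_mutex_lock(&w->lock);
        struct dirent *dp = readdir(w->dir);
        int got = (dp != NULL);
        if (got)                    /* copy before unlocking: dp may be reused */
            snprintf(name, sizeof name, "%s", dp->d_name);
        pthread_mutex_unlock(&w->lock);

        if (!got)
            break;                  /* end of directory (or read error) */
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
            continue;               /* skip self and parent */
        w->fcn(name);
    }
    return NULL;
}

/* Walk `path` with `nthreads` workers. Returns 0 on success, -1 on error. */
int dirwalk_mt(const char *path, int nthreads, void (*fcn)(const char *))
{
    pthread_t tids[MAX_WALKERS];
    struct walk w;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WALKERS)
        return -1;
    if ((w.dir = opendir(path)) == NULL)
        return -1;
    pthread_mutex_init(&w.lock, NULL);
    w.fcn = fcn;

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, walk_worker, &w) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&w.lock);
    closedir(w.dir);
    return started > 0 ? 0 : -1;
}

/* Example callback: just counts the entries it is handed. */
static atomic_int entries_seen;
void count_entry(const char *name)
{
    (void)name;
    atomic_fetch_add(&entries_seen, 1);
}
```

Whether this is faster than a single thread depends on how expensive the callback is; the directory read itself stays serialized by the mutex.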
I am studying mutexes and I am stuck in an exercise. For each file in a given directory, I have to create a thread to read it and display its contents (no problem if order is not correct).
So far, the threads are running this function:
void * reader_thread (void * arg)
{
char * file_path = (char*)arg;
FILE * f;
char temp[20];
int value;
f=fopen(file_path, "r");
printf("Opened %s.\n",file_path);
while (fscanf(f, "%s",temp)!=EOF)
if (!get_number (temp, &value)) /*Gets int value from given string (if numeric)*/
printf("Thread %lu -> %s: %d\n", pthread_self(), file_path, value );
fclose(f);
pthread_exit(NULL);
}
It is called by a function that receives a DIR pointer, previously created by opendir().
(I have omitted some error checking here to make it cleaner, but I get no error at all.)
int readfiles (DIR * dir, char * path)
{
struct dirent * temp = NULL;
char * file_path;
pthread_t thList [MAX_THREADS];
int nThreads=0, i;
memset(thList, 0, sizeof(pthread_t)*MAX_THREADS);
file_path=malloc((257+strlen(path))*sizeof(char));
while((temp = readdir (dir))!=NULL && nThreads<MAX_THREADS) /*Reads files from dir*/
{
if (temp->d_name[0] != '.') /*Ignores the ones beginning with '.'*/
{
get_file_path(path, temp->d_name, file_path); /*Computes path (overwritten every iteration)*/
printf("Got %s.\n", file_path);
pthread_create(&thList[nThreads], NULL, reader_thread, (void * )file_path);
nThreads++;
}
}
printf("readdir: %s\n", strerror (errno )); /*Just in case*/
for (i=0; i<nThreads ; i++)
pthread_join(thList[i], NULL);
if (file_path)
free(file_path);
return 0;
}
My problem here is that, although paths are computed perfectly, the threads don't seem to receive the correct argument. They all read the same file. This is the output I get:
Got test/testB.
Got test/testA.
readdir: Success
Opened test/testA.
Thread 139976911939328 -> test/testA: 3536
Thread 139976911939328 -> test/testA: 37
Thread 139976911939328 -> test/testA: -38
Thread 139976911939328 -> test/testA: -985
Opened test/testA.
Thread 139976903546624 -> test/testA: 3536
Thread 139976903546624 -> test/testA: 37
Thread 139976903546624 -> test/testA: -38
Thread 139976903546624 -> test/testA: -985
If I join the threads before the next one begins, it works OK. So I assume there is a critical section somewhere, but I don't really know how to find it. I have tried mutexing the whole thread function:
void * reader_thread (void * arg)
{
pthread_mutex_lock(&mutex_file);
/*...*/
pthread_mutex_unlock(&mutex_file);
}
And also, mutexing the while loop in the second function. Even both at the same time. But it won't work in any way. By the way, mutex_file is a global variable, which is init'd by pthread_mutex_init() in main().
I would really appreciate a piece of advice with this, as I don't really know what I'm doing wrong. I would also appreciate some good reference or book, as mutexes and System V semaphores are feeling a bit difficult to me.
Thank you very much.
Well, you are passing exactly the same pointer as the file path to both threads. As a result, they read the file name from the same string and end up reading the same file. Actually, you get a little bit lucky here, because in reality you have a race condition: you update the contents of the string pointed to by file_path while firing up threads that read from that pointer, so you may end up with a thread reading that memory while it is being changed. What you have to do is allocate an argument for each thread separately (i.e. call malloc and the related logic inside your while loop), and then free each argument once its thread has exited.
Looks like you're using the same file_path buffer for all threads, just loading it over and over again with the next name. You need to allocate a new string for each thread, and have each thread delete the string after using it.
edit
Since you already have an array of threads, you could just make a parallel array of char[], each holding the filename for the corresponding thread. This would avoid malloc/free.
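A minimal sketch of the "one string per thread" fix suggested in the answers above. The function start_readers(), its parameters, and the MAX_THREADS value are assumptions for illustration; the real reader would open and parse the file instead of just printing its path.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 8               /* assumed limit, as in the question */

/* Each thread owns its argument and frees it when it is done with it. */
static void *reader_thread(void *arg)
{
    char *file_path = arg;
    printf("Thread got %s\n", file_path);   /* real code would read the file */
    free(file_path);
    return NULL;
}

/* Start one thread per name, each with its own heap copy of the path.
 * Returns the number of threads that were started and joined. */
int start_readers(const char *dir, const char *const names[], int count)
{
    pthread_t th[MAX_THREADS];
    int n = 0;

    for (int i = 0; i < count && n < MAX_THREADS; i++) {
        size_t len = strlen(dir) + strlen(names[i]) + 2;   /* '/' and '\0' */
        char *path = malloc(len);
        if (path == NULL)
            break;
        snprintf(path, len, "%s/%s", dir, names[i]);
        if (pthread_create(&th[n], NULL, reader_thread, path) != 0) {
            free(path);             /* thread never started, so we still own it */
            break;
        }
        n++;
    }
    for (int i = 0; i < n; i++)
        pthread_join(th[i], NULL);
    return n;
}
```

Because every thread gets a private buffer, the loop can build the next path without touching memory another thread is reading, and no mutex is needed for the argument.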
I'm writing an app, and the specification says I need to lock a file every time I write to it (this file will be read by other apps that another team is working on).
I made the following function:
int lock_file (int fd)
{
if (fd == -1)
return -1;
struct flock file_locker;
file_locker.l_type = F_WRLCK;
file_locker.l_whence = SEEK_SET;
file_locker.l_start = 0;
file_locker.l_len = 0; //lock the entire file
int locked = fcntl(fd, F_SETLK, &file_locker);
if (locked == -1){
/*handle errors*/
return 0;
}
return 1;
}
I get the return value 1 (meaning everything is OK), but when I wrote a test case I could still write to the locked file.
The test code was:
char *file = "lock_test_ok";
int fd = open(file, O_RDWR);
int locked = lock_file(fd);
/* call popen and try write 'ERROR' in the file */
/* if the file contains ERROR, then fail */
Locking in Unix is advisory: it only stops programs that check the lock themselves; any other program can still write to the file. (Some systems offer mandatory locking, but not that way. It usually involves setting up special properties on the locked file.)
The lock is released when the first process exits and its file descriptors are all closed.
Edit: I think I misunderstood the test scenario -- a popen() call won't be following the locking protocol (which is only advisory, and not enforced by the OS), so the write occurs even if the process that called lock_file() still exists and is holding the lock.
In addition to what Jim said, fcntl locks are advisory. They do not prevent anyone from opening and writing to the file. The only thing they do is prevent other processes from acquiring their own fcntl locks.
If you control all writers to the file, this is fine, because you can just have every writer try to lock the file first. Otherwise you're hosed. Unix does not offer any "mandatory" locks (locks that cause open or write to fail).
I'm looking at some legacy Linux code which uses pthreads.
In one thread a file is read via fgets(). The FILE variable is a global variable shared across all threads. (Hey, I didn't write this...)
In another thread every now and again the FILE is closed and reopened with another filename.
For several seconds after this has happened, the thread fgets() acts as if it is continuing to read the last record it read from the previous file: almost as if there was an error but fgets() was not returning NULL. Then it sorts itself out and starts reading from the new file.
The code looks a bit like this (snipped for brevity so I hope it's still intelligible):
In one thread:
while(gRunState != S_EXIT){
nanosleep(&timer_delay,0);
flag = fgets(buff, sizeof(buff), gFile);
if (flag != NULL){
// do something with buff...
}
}
In the other thread:
fclose(gFile);
gFile = fopen(newFileName,"r");
There's no lock to make sure that the fgets() is not called at the same time as the fclose()/fopen().
Any thoughts as to failure modes which might cause fgets() to fail but not return NULL?
How the described code goes wrong
The stdio library buffers data, allocating memory to store the buffered data. The GNU C library dynamically allocates file structures (some libraries, notably on Solaris, use pointers to statically allocated file structures, but the buffer is still dynamically allocated unless you set the buffering otherwise).
If your thread works with a copy of the global file pointer (because you passed the file pointer to the function as an argument), then it is conceivable that the code would continue to access the data structure that was originally allocated (even though it was freed by the close), and would read data from the buffer that was already present. It would only be when you exit the function, or read beyond the contents of the buffer, that things start going wrong, or when the space that was previously allocated to the file structure is reallocated for a new use.
FILE *global_fp;
void somefunc(FILE *fp, ...)
{
...
while (fgets(buffer, sizeof(buffer), fp) != 0)
...
}
void another_function(...)
{
...
/* Pass global file pointer by value */
somefunc(global_fp, ...);
...
}
Proof of Concept Code
Tested on MacOS X 10.5.8 (Leopard) with GCC 4.0.1:
#include <stdio.h>
#include <stdlib.h>
FILE *global_fp;
const char etc_passwd[] = "/etc/passwd";
static void error(const char *fmt, const char *str)
{
fprintf(stderr, fmt, str);
exit(1);
}
static void abuse(FILE *fp, const char *filename)
{
char buffer1[1024];
char buffer2[1024];
if (fgets(buffer1, sizeof(buffer1), fp) == 0)
error("Failed to read buffer1 from %s\n", filename);
printf("buffer1: %s", buffer1);
/* Dangerous!!! */
fclose(global_fp);
if ((global_fp = fopen(etc_passwd, "r")) == 0)
error("Failed to open file %s\n", etc_passwd);
if (fgets(buffer2, sizeof(buffer2), fp) == 0)
error("Failed to read buffer2 from %s\n", filename);
printf("buffer2: %s", buffer2);
}
int main(int argc, char **argv)
{
if (argc != 2)
error("Usage: %s file\n", argv[0]);
if ((global_fp = fopen(argv[1], "r")) == 0)
error("Failed to open file %s\n", argv[1]);
abuse(global_fp, argv[1]);
return(0);
}
When run on its own source code, the output was:
Osiris JL: ./xx xx.c
buffer1: #include <stdio.h>
buffer2: ##
Osiris JL:
So, empirical proof that on some systems, the scenario I outlined can occur.
How to fix the code
The fix to the code is discussed well in other answers. If you avoid the problem I illustrated (for example, by avoiding global file pointers), that is simplest. Assuming that is not possible, it may be sufficient to compile with the appropriate flags (on many Unix-like systems, the compiler flag '-D_REENTRANT' does the job), and you will end up using thread-safe versions of the basic standard I/O functions. Failing that, you may need to put explicit thread-safe management policies around the access to the file pointers; a mutex or something similar (and modify the code to ensure that the threads use the mutex before using the corresponding file pointer).
A FILE * is just a pointer to various resources. If fclose does not zero out those resources, the stale values may still look plausible enough that fgets does not immediately notice.
That said, until you add some locking, I would consider this code completely broken.
Umm, you really need to control access to the FILE stream with a mutex, at the minimum. You aren't looking at some clever implementation of lock free methods, you are looking at really bad (and dusty) code.
Using thread-local FILE streams is the obvious and most elegant fix; just use locks appropriately to ensure no two threads operate on the same offset of the same file at once. Or, more simply, ensure that threads block (or do other work) while waiting for the file lock to clear. POSIX advisory locks would be best for this; otherwise you're dealing with dynamically growing a tree of mutexes, or initializing a file lock mutex per thread and making each thread check the others' locks (yuck!), since files can be renamed.
I think you are staring down the barrel of some major fixes. Unfortunately (from what you have indicated) there is no choice but to make them. In this case, it's actually easier to debug a threaded program written in this manner than it would be to debug something using forks, so consider yourself lucky :)
You can also replace the nanosleep with a condition wait (pthread_cond_wait) that gets signaled when intended, e.g. when a new file has been opened.
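The mutex-based fix that several answers recommend might look something like the sketch below. The lock gFileLock and the two wrapper functions are assumptions added for illustration; gFile is the shared stream from the question. The point is that fgets() and the fclose()/fopen() pair run under the same mutex, so the reader can never touch a stream that is being torn down.

```c
#include <pthread.h>
#include <stdio.h>

static FILE *gFile = NULL;          /* shared stream, as in the question */
static pthread_mutex_t gFileLock = PTHREAD_MUTEX_INITIALIZER;  /* assumed */

/* Reader side: returns 1 if a line was read into buff, 0 otherwise
 * (no file open yet, end of file, or read error). */
int read_line_locked(char *buff, int size)
{
    int ok = 0;
    pthread_mutex_lock(&gFileLock);
    if (gFile != NULL && fgets(buff, size, gFile) != NULL)
        ok = 1;
    pthread_mutex_unlock(&gFileLock);
    return ok;
}

/* Switcher side: close the old stream and open the new one while holding
 * the same lock. Returns 0 on success, -1 if the new file cannot be opened. */
int switch_file_locked(const char *newFileName)
{
    int ok;
    pthread_mutex_lock(&gFileLock);
    if (gFile != NULL)
        fclose(gFile);
    gFile = fopen(newFileName, "r");
    ok = (gFile != NULL);
    pthread_mutex_unlock(&gFileLock);
    return ok ? 0 : -1;
}
```

The reader loop would call read_line_locked() in place of the bare fgets(), and the other thread would call switch_file_locked() in place of the bare fclose()/fopen() pair.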