I'd like to write a small C program that reads from a file while it is actively being written to. Any ideas?
If you have control over the writing process you should use mmap() with MAP_SHARED in both reader and writer. This way the reader will see the changes done by the writer practically immediately.
Also, note that Linux does not make any snapshot of the data in the file when you open the file, so you should see the changes that are being made in the file even if you just use read() and lseek().
In order to determine whether a file was modified/opened/accessed/etc. in Linux, you can use the inotify API (see the inotify manpage). It allows your process to block until an event you're interested in occurs (as opposed to polling regularly). You can also use epoll() or the more traditional select() on the inotify descriptor to accomplish a similar result.
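For illustration, here is a minimal sketch that combines the two ideas: it blocks on an inotify watch and, each time the file is modified, reads and prints whatever was appended since the last read. The path "/tmp/growing.log" is only an example, and error handling is omitted.
#include <fcntl.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/growing.log";      /* example path */
    int fd = open(path, O_RDONLY);
    int in = inotify_init();
    inotify_add_watch(in, path, IN_MODIFY);

    lseek(fd, 0, SEEK_END);                      /* start at the current end of file */

    char ev[4096], buf[4096];
    for (;;) {
        read(in, ev, sizeof ev);                 /* block until the file changes */

        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            fwrite(buf, 1, n, stdout);           /* consume the newly appended bytes */
        fflush(stdout);
    }
}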
I think that tail -f is exactly what you want, isn't it? Take a look at the source code:
http://www.gnu.org/s/coreutils/
Or this one (not sure if updated): http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c
Background: I'm working with VxWorks for the first time in my life, and I'm having to translate some inherited code so that it will work in VxWorks.
Problem & Solution: The codebase calls sync(). There's no definition for sync() in the VxWorks headers (at least, not the ones I have). I do have fsync(), which requires a file descriptor to work. The function which calls sync() writes to the file with an ofstream object before calling sync... and there's no obvious way to recover a file descriptor from an ofstream object.
After an embarrassingly long time rooting around for options, I discovered the ofstream::flush() function, which ought to work for what I'm trying to do.
However, in my rooting it was pointed out to me that ofstream::flush() and sync/fsync are associated with different things (IIUC, the C library and the operating system, respectively). That suggests that ofstream::flush() won't achieve quite the same thing that sync() or fsync() will.
Can someone lay out the difference for me? My understanding is still far too fuzzy to be relied upon in future.
Followup: The original code inserts a std::endl into the stream and then calls sync. My understanding is that inserting endl has the same effect as calling flush(). If the original code does both, that suggests to me that either
the code is redundant, or
flush() and sync()/fsync() are both required because they do similar but not identical things.
Which is it?
If you know the device name being used for the file system, it would be easy to develop a stub function that performs the sync() operation on that device:
int fd = open(...);   /* open the device or mount point of the file system */
fsync(fd);            /* flush the OS cache for that file system */
close(fd);
Ideally the code would iterate over all file descriptors open on that file system and perform an ioctl() operation on each, passing the command FIOFLUSH or FIOSYNC.
For example:
status = ioctl(fd, FIOSYNC, 0); /* see definition in ioLib.h */
You can have a look at the iosFdShow() code to see how to go through all file descriptors.
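A minimal sketch of such a stub, under the assumption that the file system's device or mount point is known in advance (the "/ata0a" path below is purely a placeholder), might look like this:
#include <fcntl.h>      /* open() */
#include <unistd.h>     /* fsync(), close(); on VxWorks see also ioLib.h */

/* Hypothetical stand-in for sync(): flush a single, known file system. */
int sync_stub(void)
{
    int fd = open("/ata0a", O_RDONLY, 0);   /* "/ata0a" is only a placeholder path */
    if (fd < 0)
        return -1;

    int status = fsync(fd);                 /* or: ioctl(fd, FIOSYNC, 0), see ioLib.h */
    close(fd);
    return status;
}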
flush() flushes the application-level buffer. That is, it ensures all buffered data has been handed over to the operating system with write().
sync() is a system call that makes the operating system synchronize its file cache with the underlying storage. In other words, you call this if you want to make sure the file has actually been written to disk and is not just sitting in some page of memory.
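To illustrate the two layers in plain C (stdio rather than ofstream, but the layering is the same): fflush() corresponds to ofstream::flush() and drains the application buffer into the kernel, while fsync() asks the kernel to push its cached pages to the storage device. The file name is arbitrary.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("example.log", "a");   /* arbitrary file name */
    if (!fp)
        return 1;

    fprintf(fp, "hello\n");    /* data sits in the application (stdio) buffer */
    fflush(fp);                /* buffer handed to the kernel -- like ofstream::flush() */
    fsync(fileno(fp));         /* kernel cache pushed to the disk -- like fsync()/sync() */

    fclose(fp);
    return 0;
}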
I want to implement a C program in Linux (Ubuntu distro) that mimics tail -f. Note that I do not want to actually call tail -f from my C code, rather implement its behaviour. At the moment I can think of two ways to implement it.
When the program is called, I seek to the end of file. Afterwards, I would read to the end of file periodically and print whatever I read if it is not empty.
The second method, which could be more efficient, is to again seek to the end of the file, but this time "somehow" listen for changes to that file and read to the end of file only when it has changed.
With that being said, my question is how to implement the second approach, and whether it is worth the effort. Also, are these the only two options?
NOTE: Thanks for the comments; the question has been updated based on them.
There is no standardized mechanism for monitoring changes to a file, so you'll need to implement a "polling" solution anyway (that is, when you hit the end of file, wait a short amount of time and try again.)
On Linux, you can use the inotify family of system calls, but be aware that it won't always work. It doesn't work for special files or remote filesystems, for example, and it may not work for some local filesystems. It is complicated in the case of symlinks. And so on. There is a Windows equivalent, but I believe it suffers from some of the same issues.
So even if you use a notification system, you'll need the polling solution as a backup, and since OS notifications are not guaranteed to be reliable (that is, if the system is under load, notifications might be dropped), you'll need to poll on timeout even if you are using a notification system.
You might want to take a look at the implementation of the GNU tail utility (http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c) to see how the special cases are handled.
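For completeness, here is a rough sketch of the polling fallback, in the spirit of the first approach from the question; the file path and the one-second interval are arbitrary choices.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("/var/log/syslog", "r");   /* arbitrary example file */
    if (!fp)
        return 1;

    fseek(fp, 0, SEEK_END);                     /* start at the end, like tail -f */

    char line[4096];
    for (;;) {
        if (fgets(line, sizeof line, fp)) {
            fputs(line, stdout);                /* print newly appended data */
        } else {
            clearerr(fp);                       /* clear EOF so the next fgets can see new data */
            sleep(1);                           /* polling interval */
        }
    }
}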
You can implement the requirement with the following steps:
1) fopen with "a+" mode;
2) select() on the opened file descriptor (you need to convert from FILE * to a file descriptor, e.g. with fileno()) and then do the read.
I have several C programs which access (fopen/fprintf) various files on the file system at the same time. What is the best way to make this concurrent access safe? Should I implement some sort of file locking (and what's the best way to do that?), or are there better methods (preferably in the C99 standard library; additional dependencies would be a problem)? Or should I use something like SQLite?
edit:
I am using Linux as operating system.
edit:
I don't really want to have different processes writing to the same files. I'm dealing with a legacy monolithic codebase which saves intermediate steps in files for reuse. I want to speed the calculations up by running several calculations at the same time, which share the same intermediate results.
You could use fcntl() with F_SETLK or F_SETLKW:
struct flock lock = { 0 };
lock.l_type = F_WRLCK;        /* or F_RDLCK for a shared read lock */
lock.l_whence = SEEK_SET;     /* with l_start = l_len = 0, lock the whole file */
fcntl( fd, F_SETLKW, &lock ); /* F_SETLKW waits; F_SETLK fails immediately if locked */
See the fcntl(2) man page or this article for more details.
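A slightly fuller sketch of the whole lock/write/unlock sequence (the file name and record format here are only examples; advisory locks only help if every cooperating process uses them):
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one line to a shared file under an exclusive advisory lock. */
int append_record(const char *line)
{
    int fd = open("shared.dat", O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock lk = { 0 };
    lk.l_type = F_WRLCK;           /* exclusive lock over the whole file */
    lk.l_whence = SEEK_SET;
    fcntl(fd, F_SETLKW, &lk);      /* wait until the lock is granted */

    write(fd, line, strlen(line));

    lk.l_type = F_UNLCK;           /* release the lock (close() would too) */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    return 0;
}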
You can make sure that your files do not get corrupted by concurrent writes from multiple threads/processes by using the copy-on-write technique:
A writer opens the file it would like to update for reading.
The writer creates a new file with a unique name (mkostemps) and copies the contents of the original file into it.
The writer modifies the copy.
The writer renames the copy to the original name using rename. This happens atomically, so that users of the file either see the old version of it or the new, but never a partially updated file.
See Things UNIX can do atomically for more details.
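A condensed sketch of steps 2–4 (for brevity it writes the new contents directly instead of copying the original first; the file names are examples, and the temporary file must be created on the same file system as the target for rename() to be atomic):
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Atomically replace "data.txt" with new contents. */
int replace_file(const char *new_contents)
{
    char tmp[] = "data.txt.XXXXXX";
    int fd = mkstemp(tmp);              /* unique temporary name in the same directory */
    if (fd < 0)
        return -1;

    FILE *fp = fdopen(fd, "w");
    fputs(new_contents, fp);
    fflush(fp);
    fsync(fd);                          /* make sure the new version is on disk */
    fclose(fp);

    return rename(tmp, "data.txt");     /* readers see either the old or the new file */
}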
I have a program that will always be running while the computer is on. It interfaces with a serial-over-USB device. At times the device may not be present when the computer is on.
My question is: what is a good method to detect when the device file becomes present? I could make an infinite loop that continuously checks for the file (assuming I know what its name will be) and break once I get an fd. But is there a better way than this?
Additionally, assuming the device gets unplugged while the program is running, my fd becomes invalid. Is some event or error raised when this happens, so that I can begin checking again until the device file is present?
I read from the fd using a select loop.
On Linux your application can be notified when a device is plugged in and or removed by using udev.
For a great example take a look at this notifier: udev-notify
The libudev API will allow you to listen to kernel events and be notified when a device becomes available or when it gets removed, and then you can decide what to do when such events happen.
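As a rough sketch of what that looks like with libudev (link with -ludev; the "tty" subsystem filter is an assumption that fits serial-over-USB nodes such as /dev/ttyUSB0, so adjust it to your device; error handling is omitted):
#include <libudev.h>
#include <poll.h>
#include <stdio.h>

int main(void)
{
    struct udev *udev = udev_new();
    struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");

    udev_monitor_filter_add_match_subsystem_devtype(mon, "tty", NULL);
    udev_monitor_enable_receiving(mon);

    struct pollfd pfd = { .fd = udev_monitor_get_fd(mon), .events = POLLIN };

    for (;;) {
        if (poll(&pfd, 1, -1) <= 0)          /* block until a udev event arrives */
            continue;

        struct udev_device *dev = udev_monitor_receive_device(mon);
        if (!dev)
            continue;

        const char *node = udev_device_get_devnode(dev);
        printf("%s: %s\n", udev_device_get_action(dev),
               node ? node : "(no device node)");
        udev_device_unref(dev);
    }
}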
Historically, the solution of polling (a loop that determines if said "file" exists at some interval) would be the "best" one. If you're using modern Linux (2.6.12+ but you didn't specify), you can use inotify which probably does exactly what you want -- monitor a directory for the creation of a file (IN_CREATE). The Wikipedia article to which I linked gives a very good overview and helpful links.
The part where it becomes unplugged is tricky. When a file is deleted on *nix (behaviour dating back to forever ago), file descriptors already open on it remain valid. Therefore I would guess that checking errno (from <errno.h>) may be your solution; your post is tagged C, so there is nothing to "throw", but if you check errno when your read (or whatever) call fails, you can gain insight into what happened.
Also check out libudev, which is of course more specific to devices rather than the general "everything is a file" philosophy of *nix.
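If you go the inotify route instead, a minimal sketch of the IN_CREATE case is below: it blocks until something is created under /dev and prints the name (whether /dev itself can be watched depends on the file system backing it; the buffer handling follows the inotify(7) man page).
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));

    int fd = inotify_init();
    inotify_add_watch(fd, "/dev", IN_CREATE);    /* watch for new device nodes */

    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf); /* blocks until an event arrives */
        if (len <= 0)
            continue;

        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len)
                printf("created: /dev/%s\n", ev->name);
            p += sizeof *ev + ev->len;           /* step to the next event record */
        }
    }
}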
I want to write a C program that keeps reading a file and, whenever a new line is added to that file, sends it across the network (let's assume via TCP) to a recipient.
What is the best way to do it?
o keep the file open and do something like tail -F on it to keep reading?
o read file on my own?
I am not worried about sending on network part, I need to have best way of getting a new line out of the file. On that line I might do some filtering too before sending.
Linux has inotify, OS X has kqueue/kevent, Windows has similar stuff. All of these let your process block until the kernel notifies you of a change to the file. This is very efficient because your process can sleep until a change actually happens, but obviously these interfaces are not portable.
The only portable approach is to poll the file periodically to see if it changes, but obviously that is not as efficient.
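Whichever notification or polling scheme you use, remember that a writer can append a line in several pieces, so buffer partial lines and only hand complete ones to your filter-and-send step. A small sketch of that buffering (process_line() is a hypothetical stand-in for your filtering and network code):
#include <stdio.h>

/* Hypothetical hook: filter the line and send it over the network. */
static void process_line(const char *line)
{
    puts(line);
}

/* Feed newly read bytes in; call process_line() only for complete lines. */
void feed(const char *data, size_t n)
{
    static char pending[8192];
    static size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (data[i] == '\n' || used == sizeof pending - 1) {
            pending[used] = '\0';
            process_line(pending);
            used = 0;
        } else {
            pending[used++] = data[i];
        }
    }
}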