After doing tons of research and not being able to find a solution to my problem, I decided to post here on Stack Overflow.
My problem is kind of unusual, so I guess that's why I wasn't able to find any answer:
I have a program that is recording stuff to a file. Then I have another one that is responsible for transferring that file. Finally I have a third one that gets the file and processes it.
My problem is:
The file transfer program needs to send the file while it's still being recorded. The problem is that when the file transfer program reaches end of file, that doesn't mean the file is actually complete, since it is still being recorded.
It would be nice to have a way to check whether the recorder still has the file open or has already closed it, so I could judge whether the end of file is a real end of file or whether there simply is no further data to be read yet.
Hope you can help me out with this one. Maybe you have another idea on how to solve this problem.
Thank you in advance.
GeKod
Simply put: you can't without using filesystem notification mechanisms; Windows, Linux and OS X all have flavors of this. I forget how Windows does it off the top of my head, but Linux has 'inotify' and OS X has 'kqueue'.
The easy way to handle this is to record to a temp file, and when the recording is done, move the file into the 'ready-to-transfer' directory. If you do this so that both locations are on the same filesystem, the move will be atomic and instant, ensuring that any time your transfer utility 'sees' a new file, it will be wholly formed and ready to go.
Or, just have your temp files carry no extension; when recording is done, rename the file to an extension that the transfer agent is polling for.
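In C, the first variant boils down to a single rename(2) call once recording finishes. A minimal sketch, with made-up paths (both must be on the same filesystem for the atomicity guarantee):

    #include <stdio.h>   /* rename(), perror() */

    int main(void)
    {
        /* Record into a temp location on the SAME filesystem as the
           target directory, so that rename() is atomic. */
        const char *tmp   = "/data/tmp/recording.tmp";
        const char *ready = "/data/ready/recording.dat";

        /* ... write and close the temp file here ... */

        if (rename(tmp, ready) != 0) {
            perror("rename");
            return 1;
        }
        /* The transfer program only polls /data/ready, so any file it
           sees there is wholly formed. */
        return 0;
    }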
Have you considered using a stream interface between the recorder program and the one that grabs the recorded data? If you have access to a stream interface (say an OS/stack service) which also provides a reliable end-of-stream signal/primitive, you could use that to replace the file interface.
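On Linux, a named pipe (FIFO) is one such stream interface: when the recorder closes its end, the reader gets a clean EOF, which is exactly the reliable end-of-stream signal mentioned above. A rough sketch of the reader side, assuming a made-up FIFO path and omitting error handling:

    #include <sys/stat.h>   /* mkfifo() */
    #include <fcntl.h>      /* open() */
    #include <unistd.h>     /* read(), close() */

    int main(void)
    {
        const char *fifo = "/tmp/recording.fifo";   /* hypothetical path */
        char buf[4096];
        ssize_t n;

        mkfifo(fifo, 0600);             /* ignoring EEXIST for brevity */

        int fd = open(fifo, O_RDONLY);  /* blocks until the writer opens its end */
        if (fd < 0)
            return 1;

        /* read() returning 0 here is a real end of stream:
           the recorder has closed its side of the pipe. */
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* ... forward the n bytes to the processing side ... */
        }
        close(fd);
        return 0;
    }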
There are no functions/libraries available in C to do this. But a simple alternative is to rename the file once an activity is over. For example, the recorder can open the file with the name file.record and, once done with recording, rename it to file.transfer; the transfer program looks for file.transfer, and once the transfer is done, it renames the file to file.read; the reader reads that and finally renames it to file.done!
You can check whether the file is open as follows:

    FILE_NAME="filename"
    # lsof lists open files; grep for our file name
    FILE_OPEN=$(lsof | grep "$FILE_NAME")
    if [ -z "$FILE_OPEN" ]; then
        echo "File NOT open"
    else
        echo "File open"
    fi
Refer to http://linux.about.com/library/cmd/blcmdl8_lsof.htm
I think an advisory lock will help. If one process tries to use a file which another program is working on, it will be blocked or get an error; if it accesses the file by force, the action may succeed but the result is unpredictable. To maintain consistency, all of the processes that want to access the file should obey the advisory locking rule. I think that will work.
When the file is closed, the lock is freed too, and other processes can then acquire a lock on the file.
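A minimal sketch with flock(2); it is advisory, so it only helps if both sides take the lock (error handling omitted, file name made up):

    #include <sys/file.h>   /* flock() */
    #include <fcntl.h>      /* open() */
    #include <unistd.h>     /* close() */

    /* Writer: hold an exclusive advisory lock for the whole recording. */
    void writer(void)
    {
        int fd = open("data.rec", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        flock(fd, LOCK_EX);      /* exclusive lock */
        /* ... write the data ... */
        close(fd);               /* closing the descriptor releases the lock */
    }

    /* Reader: flock() blocks here until the writer has closed the file. */
    void reader(void)
    {
        int fd = open("data.rec", O_RDONLY);
        flock(fd, LOCK_SH);      /* shared lock; blocks while LOCK_EX is held */
        /* ... the file is complete, safe to read ... */
        close(fd);
    }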
I have a set of configuration files (10 or more), and a user may open any of these files using any editor (e.g. vim, vi, geany, qt, leafpad, ...). How would I come to know which file is opened, and if some writing is done, whether it is saved or not (using C code)?
For the 1st part of your question, please refer e.g. to How to check if a file has been opened by another application in C++?
One way described there is to use a system tool like lsof and call this via a system() call.
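A sketch of that idea, using popen() rather than system() so the output can be inspected directly. This assumes lsof is installed, and note that lsof may need extra privileges to see other users' processes:

    #include <stdio.h>   /* popen(), pclose(), snprintf(), fgets() */

    /* Returns 1 if lsof reports the file as open by some process,
       0 otherwise. The path goes straight into a shell command here;
       a real program should validate or escape it first. */
    int file_is_open(const char *path)
    {
        char cmd[512], line[512];
        int is_open = 0;

        snprintf(cmd, sizeof cmd, "lsof '%s' 2>/dev/null", path);

        FILE *p = popen(cmd, "r");
        if (p == NULL)
            return 0;
        if (fgets(line, sizeof line, p) != NULL)
            is_open = 1;   /* any output (even the header line) means a match */
        pclose(p);
        return is_open;
    }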
For the 2nd part, about knowing whether a file has been modified, you will have to create a backup file to check against. Most editors already do that, but their naming schemes differ, so you might want to take care of it yourself. How? Just automatically create a (hidden) file .mylogfile.txt, if it does not exist, by simply copying mylogfile.txt. If .mylogfile.txt exists, has an older timestamp than mylogfile.txt, and differs in size and/or hash value (using e.g. md5sum), your file was modified.
But before re-implementing this, take a look at How do I make my program watch for file modification in C++?
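A rough sketch of the timestamp/size part of the backup check described above, using stat(2); the hash comparison is left out:

    #include <sys/stat.h>   /* stat() */

    /* Returns 1 if 'path' looks modified compared to its backup copy,
       judging by mtime and size only (add a hash such as md5sum for
       more certainty); returns -1 if either file is missing. */
    int looks_modified(const char *path, const char *backup)
    {
        struct stat cur, bak;

        if (stat(path, &cur) != 0 || stat(backup, &bak) != 0)
            return -1;

        return cur.st_mtime > bak.st_mtime && cur.st_size != bak.st_size;
    }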
Here is the setup: I have a shared file (let's call it status.csv) that is read by many processes (let's call them consumers) in a read-only fashion. I have one producer that periodically updates status.csv by creating a temp file, writing data to it, and using the C function discussed here:
http://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html
to rename the temp file over status.csv (effectively overwriting it) so that the consumers can process the new data. I want to try to guarantee (as much as possible in the Linux world) that the consumers won't get a malformed/corrupted/half-old/half-new status.csv file (I want them to get either all of the old data or all of the new). I can't seem to confirm this from the description of rename: it guarantees that the rename action itself is atomic, but I want to know whether, if a consumer already has the status.csv file open, it will continue to read the same file as it was when it was opened, even if the file is renamed/overwritten by the producer in the middle of this reading operation.
I attempted to prototype this, expecting the consumers to get some type of error or a half-old/half-new file, but the file always seems to be in the state it was in when it was opened by the consumer, even if it is renamed/overwritten multiple times.
BTW, these processes are running on the same machine (RHEL 6).
Thanks!
In Linux and similar systems, if a process has a file open and the file is deleted, the file itself remains undeleted until all processes close it. All that happens immediately is that the directory entry is deleted so that it cannot be opened again.
The same thing happens if rename is used to replace an open file. The old file descriptor still keeps the old file open. However, new opens will see the new file.
Therefore, for your consumers to see the new file, they must close and reopen the file.
Note: your consumers can discover if the file has been replaced by using the stat(2) call. If either the st_dev or st_ino entries (or both) have changed, then the file has been replaced and must be closed and reopened. This is how tail -F works.
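A minimal sketch of that check; it compares the identity of the consumer's open descriptor against whatever the name currently points to:

    #include <sys/stat.h>   /* stat(), fstat() */

    /* Returns 1 if the name 'path' no longer refers to the same file as
       the open descriptor 'fd', i.e. the producer has renamed a new
       version into place and the consumer should close and reopen. */
    int file_was_replaced(int fd, const char *path)
    {
        struct stat by_fd, by_name;

        if (fstat(fd, &by_fd) != 0 || stat(path, &by_name) != 0)
            return 1;   /* treat errors (file briefly missing) as "reopen" */

        return by_fd.st_dev != by_name.st_dev
            || by_fd.st_ino != by_name.st_ino;
    }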
I am designing a logger plugin for my tool. I have a busybox syslog on a target board, and I want to get syslog data from it so I can forward it to my host (not via remote port forwarding of syslog) via my own communication framework. Initially I made use of syslog's ability to forward the messages it receives to a named pipe, but this only works via a patch addition, which is not feasible in my case. So now my idea is to write a configuration file telling syslog to forward all log messages it receives to a file, and to track that file to get my data. I could use tail to monitor the file for changes, but my busybox tail does not support the "--follow" option, and syslog performs logrotate, which causes "tail -f" to fail. I am also not sure whether this is a good method at all. So what I wanted to ask is: is there another way to get the modified data from a file? I could use inotify, but that can only be used to track file changes. So is there a way to do this?
You could try the "diff" utility (or git-diff, which has more facilities).
You could write a script/program that receives inotify events. When an event arrives, the script reopens the file and reads until EOF, starting from the previously saved last-read file position.
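A rough sketch of that pattern; the log path is just an example, error handling is trimmed, and as the comments note, a real implementation must handle logrotate replacing the file:

    #include <stdio.h>         /* fopen(), fgets(), fseek(), ftell() */
    #include <sys/inotify.h>   /* inotify_init(), inotify_add_watch() */
    #include <unistd.h>        /* read() */

    int main(void)
    {
        const char *log = "/var/log/messages";   /* example path */
        char events[4096], line[1024];
        long last_pos = 0;   /* last read position; persist it if needed */

        int in = inotify_init();
        inotify_add_watch(in, log, IN_MODIFY | IN_MOVE_SELF);

        for (;;) {
            /* Block until something happens to the file. */
            if (read(in, events, sizeof events) <= 0)
                break;

            /* Reopen by name, seek to where we stopped, read to EOF. */
            FILE *f = fopen(log, "r");
            if (f == NULL)
                continue;   /* rotated away; wait until it reappears */
            fseek(f, last_pos, SEEK_SET);
            while (fgets(line, sizeof line, f) != NULL) {
                /* ... hand the line to your own framework ... */
            }
            last_pos = ftell(f);
            fclose(f);
            /* Caveat: after logrotate replaces the file, the watch and
               last_pos still refer to the old inode; a real implementation
               must re-add the watch and reset last_pos (compare st_ino). */
        }
        return 0;
    }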
I'm intending to create a program that can permanently follow a large dynamic set of log files and copy their entries over to a database for easier near-realtime statistics. The log files are written by diverse daemons and applications, but their format is known so they can be parsed. Some of the daemons write logs into one file per day, like Apache's cronolog, which creates files like access.20100928. Those files appear with each new day and may disappear when they're gzipped away the next day.
The target platform is an Ubuntu Server, 64 bit.
What would be the best approach to efficiently reading those log files?
I could think of scripting languages like PHP that either open the files themselves and read new data, or use system tools like tail -f to follow the logs, or other runtimes like Mono. Bash shell scripts probably aren't well suited for parsing the log lines and inserting them into a database server (MySQL), not to mention easy configuration of my app.
If my program reads the log files itself, I'd think it should stat() each file once a second or so to get its size, and open the file when it has grown. After reading the file (which should hopefully only return complete lines) it could call tell() to get the current position, and next time directly seek() to the saved position to continue reading. (These are C function names, but I actually wouldn't want to do this in C; Mono/.NET and PHP offer similar functions as well.)
Is that constant stat()ing of the files and subsequent opening and closing a problem? How would tail -f do that? Can I keep the files open and be notified about new data with something like select()? Or does it always return at the end of the file?
In case I'm blocked in some kind of select() or external tail, I'd need to interrupt that every 1, 2 minutes to scan for new or deleted files that shall (no longer) be followed. Resuming with tail -f then is probably not very reliable. That should work better with my own saved file positions.
Could I use some kind of inotify (file system notification) for that?
If you want to know how tail -f works, why not look at the source? In a nutshell, you don't need to periodically interrupt or constantly stat() to scan for changes to files or directories. That's what inotify does.
How can I tell if a file is open in C? I think the more technical question would be: how can I retrieve the number of references to an existing file and use that info to determine whether it is safe to open?
The idea I am implementing is a file queue. You dump some files, my code processes the files. I don't want to start processing until the producer closes the file descriptor.
Everything is being done in linux.
Thanks,
Chenz
Digging out that info is a lot of work (you'd have to search through /proc/*/fd).
You'd be better off with any of:
Save to temp, then rename. Write your files to a temporary filename or directory; when you're done writing, rename it into the directory your app reads from. Renaming is atomic, so when the file is present you know it's safe to read.
Or a variant of the above: when you're done writing the file foo, you create an empty file named foo.finished. You look for the presence of *.finished when processing files.
Lock the files while writing; that way, reading the file will just block until the writer unlocks it. See the flock()/lockf() functions. They're advisory locks, though, so both the reader and writer have to take the lock and honor it.
I don't think there is any way to do this in pure C (it wouldn't be cross platform).
If you know which files you are using ahead of time, you can use inotify to be notified when they are opened and closed.
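For the queue case the interesting event is IN_CLOSE_WRITE, which is delivered when a descriptor that was open for writing is closed, i.e. when the producer is done. A minimal sketch watching a made-up queue directory:

    #include <stdio.h>         /* printf() */
    #include <sys/inotify.h>   /* inotify_*, struct inotify_event */
    #include <unistd.h>        /* read() */

    int main(void)
    {
        /* Buffer aligned as the inotify(7) man page recommends. */
        char buf[4096]
            __attribute__((aligned(__alignof__(struct inotify_event))));

        int in = inotify_init();
        /* One watch on the directory covers every file dropped into it. */
        inotify_add_watch(in, "/var/spool/myqueue", IN_CLOSE_WRITE);

        for (;;) {
            ssize_t len = read(in, buf, sizeof buf);
            if (len <= 0)
                break;

            /* The buffer holds one or more variable-length events. */
            char *p = buf;
            while (p < buf + len) {
                struct inotify_event *ev = (struct inotify_event *)p;
                if (ev->len > 0)
                    printf("ready to process: %s\n", ev->name);
                p += sizeof(*ev) + ev->len;
            }
        }
        return 0;
    }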
Use the lsof command. (List Open Files).
C has facilities for handling files, but not much for getting information on them. In portable C, about the only thing you can do is try to open the file in the desired way and see if it works.
Generally you can't do that, for various reasons (e.g. you cannot tell whether the file is opened by another user).
If you can control the processes that open the file, you can try to avoid collisions by locking the file (there are many libraries on Linux for doing that).
If you are in control of both producer and consumer, you could use lockf() or flock() to lock the file.
There is the lsof command on most distros, which shows all currently open files. You can of course grep its output if your files are in the same directory or have some recognizable name pattern.