A way to make a file contents snapshot in Linux - c

What is the best way to create an "atomic" snapshot of file contents in Linux? The emphasis is not on performance, but on getting the contents as a whole.
I have thought of using sendfile(2) (since 2.6.33) or splice(2), but neither gives any indication that the operation is atomic. Both run entirely in kernel space, but sendfile(2) at least implies that it uses mmap(2), and mmap gives no guarantee that writes made by other processes to the same region mapped as MAP_SHARED won't be visible, even with MAP_PRIVATE (they probably will be, because these are the same pages).
Given that these functions are written with performance in mind, and that sendfile(2) is optimized for use with DMA, I can only assume that they simply copy memory in some background kernel thread, and it is quite possible that other operations may affect the data while it is being copied.
So the only possible solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try later. Is that correct?

So the only possible [filesystem-independent] solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try later. Is that correct?
Almost; there is also fanotify. Plus, as mentioned in a comment, there are some filesystem-specific options, and some possibilities only available in certain configurations.
The lease break timer is configurable via /proc/sys/fs/lease-break-time (in seconds); the default is 45 seconds.
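The lease-based approach from the question might be sketched as follows. This is a minimal sketch under stated assumptions, not a tested tool: the helper name snapshot_with_lease is hypothetical, the caller must own the file (or have CAP_LEASE), nobody may have the file open for writing when the lease is taken (fcntl fails with EAGAIN otherwise), and the copy should finish well within lease-break-time. Instead of racing the timer, it takes the "give up and try later" route: if F_GETLEASE no longer reports F_RDLCK after the copy, a writer has tried to open the file, and the caller should discard the copy and retry. It also ignores SIGIO process-wide, which is an assumption about the rest of the program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* F_SETLEASE / F_GETLEASE are Linux-specific */
#endif
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

#ifndef F_SETLEASE             /* in case a header was included before _GNU_SOURCE */
#define F_SETLEASE 1024
#define F_GETLEASE 1025
#endif

/* Copy src_path to dst_fd while holding a read lease on it.
 * Returns 0 for a consistent snapshot, 1 if a writer tried to open the
 * file during the copy (discard the copy and retry later), or -1 on error.
 * Hypothetical helper; assumes the caller owns src_path or has CAP_LEASE. */
int snapshot_with_lease(const char *src_path, int dst_fd)
{
    /* Lease breaks are signalled with SIGIO, whose default action kills the
       process; this sketch ignores it process-wide and polls F_GETLEASE. */
    signal(SIGIO, SIG_IGN);

    int src = open(src_path, O_RDONLY);   /* a read lease needs an O_RDONLY fd */
    if (src < 0)
        return -1;
    /* Fails with EAGAIN if anyone already has the file open for writing. */
    if (fcntl(src, F_SETLEASE, F_RDLCK) < 0) {
        int saved = errno;
        close(src);
        errno = saved;
        return -1;
    }

    char buf[65536];
    ssize_t n;
    int rc = 0;
    while ((n = read(src, buf, sizeof buf)) > 0)
        if (write(dst_fd, buf, (size_t)n) != n) {
            rc = -1;
            break;
        }
    if (n < 0)
        rc = -1;

    /* Once a writer's open() has started breaking the lease (or the break
       timer has expired), F_GETLEASE stops reporting F_RDLCK. */
    if (rc == 0 && fcntl(src, F_GETLEASE) != F_RDLCK)
        rc = 1;

    fcntl(src, F_SETLEASE, F_UNLCK);      /* let any blocked writer proceed */
    close(src);
    return rc;
}
```

While the lease is held, a process opening the file for writing blocks in open() until the lease is released or the break timer expires, which is what makes the copy consistent when the function returns 0.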
"Just give up and try later" is also a bit defeatist; you do have ways to monitor when the snapshot might work. Consider placing an inotify IN_CLOSE_WRITE and IN_CLOSE_NOWRITE watch on the file, and try the snapshot whenever you receive such an event.
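A minimal sketch of that monitoring with the inotify API (the function names are hypothetical). The watch is meant to be created before the first snapshot attempt, so that a close happening between a failed attempt and the wait is still queued. IN_CLOSE_NOWRITE also fires for every reader, so the loop may wake up more often than strictly necessary.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching `path` for close events; returns an inotify fd or -1. */
int watch_closes(const char *path)
{
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0) {
        perror("inotify_init1");
        return -1;
    }
    if (inotify_add_watch(ifd, path, IN_CLOSE_WRITE | IN_CLOSE_NOWRITE) < 0) {
        perror("inotify_add_watch");
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Block until at least one close event is queued; returns the close bits
 * seen (IN_CLOSE_WRITE and/or IN_CLOSE_NOWRITE), or 0 on error. */
uint32_t wait_for_close(int ifd)
{
    /* Buffer aligned for struct inotify_event, as recommended in inotify(7). */
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(ifd, buf, sizeof buf);
    uint32_t mask = 0;
    for (char *p = buf; len > 0 && p < buf + len;) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        mask |= ev->mask & (IN_CLOSE_WRITE | IN_CLOSE_NOWRITE);
        p += sizeof(struct inotify_event) + ev->len;
    }
    return mask;
}
```

A caller would typically call watch_closes() once, then alternate between a snapshot attempt and wait_for_close() until an attempt succeeds.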
fanotify:
For a few years now, I've been monitoring the progress of Linux fanotify, in the hopes that it would grow enough features that it could be used for automagic file versioning. Essentially, whenever someone opens the file with write permissions, the current file would be snapshotted to temporary storage, marked with some metadata (timestamp, real human user (backtracked through sudo/su), and so on). When that descriptor is closed, another snapshot is taken, and a helper thread/process diffs the two, annotating the changes (or even pushing it to git).
It is limited to local filesystems, but with 2.6.37 and later kernels (including 3.x) the interface is sufficient for monitoring specific files or an entire mount. In your case, fanotify offers features similar to file leases (again, for local filesystems only), but it also lets you simply deny all access during the snapshot. (One can argue whether that is a good idea at all, especially if the file to be snapshotted is a system or configuration file; many programmers overlook error checking, because "some files just have to be always accessible, or your system is broken".)
As far as my change monitoring goes, fanotify should now have all the necessary features, but only if an entire mount is monitored. I was hoping to monitor configuration files on multi-admin clusters, but those files reside on the same mount as all system libraries and binaries do, so the monitoring causes considerable overhead. So much so that it seems more appropriate to just modify SSH configuration, console configuration (getty etc.), sudo configuration, and possibly su, to always include a dynamic library that interposes file access syscalls, and basically does the versioning on behalf of the user. This way service binaries are not affected; only user actions are monitored.

This might work under some circumstances:
1) (Optional) Do something to prevent new processes from opening the file:
a/ rename the file
b/ restrict file permissions
2) Find all existing readers/writers of the file via lsof and kill -STOP them.
3) Do your snapshot.
4) kill -CONT all the readers/writers.
5) (Optional) Revert step 1.
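Steps 2 to 4 might look like this in C. As an assumption, instead of parsing lsof output, the sketch scans /proc/<pid>/fd (Linux-specific) and compares device and inode numbers, so it also catches processes that opened the file under another name. It only sees processes whose /proc entries you are allowed to read, and a process stopped in the middle of a multi-step update can still leave the file inconsistent, which is why this only works "under some circumstances". The function names are hypothetical.

```c
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 2: collect up to `max` PIDs (other than ours) that have `path` open,
 * by stat()ing each /proc/<pid>/fd/<n> link and comparing device/inode.
 * Processes whose fd directory we may not read are silently skipped. */
size_t find_openers(const char *path, pid_t *pids, size_t max)
{
    struct stat target;
    if (stat(path, &target) < 0)
        return 0;
    DIR *proc = opendir("/proc");
    if (!proc)
        return 0;

    size_t count = 0;
    struct dirent *pe;
    while (count < max && (pe = readdir(proc)) != NULL) {
        char *end;
        long pid = strtol(pe->d_name, &end, 10);
        if (*end != '\0' || pid <= 0 || pid == (long)getpid())
            continue;                        /* not a PID directory, or us */
        char fddir[64];
        snprintf(fddir, sizeof fddir, "/proc/%ld/fd", pid);
        DIR *fds = opendir(fddir);
        if (!fds)
            continue;                        /* exited, or not permitted */
        struct dirent *fe;
        while ((fe = readdir(fds)) != NULL) {
            char link[512];
            struct stat st;
            snprintf(link, sizeof link, "%s/%s", fddir, fe->d_name);
            if (stat(link, &st) == 0 && st.st_dev == target.st_dev &&
                st.st_ino == target.st_ino) {
                pids[count++] = (pid_t)pid;
                break;                       /* one match per process */
            }
        }
        closedir(fds);
    }
    closedir(proc);
    return count;
}

/* Steps 2 and 4: send SIGSTOP before the snapshot and SIGCONT after it. */
void signal_all(const pid_t *pids, size_t n, int sig)
{
    for (size_t i = 0; i < n; i++)
        kill(pids[i], sig);
}
```

Usage would be find_openers() into an array, signal_all(..., SIGSTOP), the copy, then signal_all(..., SIGCONT) with the same array.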

Related

mkstemp and hard disk stress

Are temporary files created with mkstemp synced to disk?
Here is what I have:
Program creates temporary file using mkstemp and sends fd to another program.
This temporary file is mmap-ped by both programs and used heavily (up to 400 MB/sec of writes and 400 MB/sec of reads; up to 60 reads and writes per second).
I can't use memfd_create (may not be supported on target devices).
Let's also assume (and this is almost true) that I can't create this file on tmpfs (like in /tmp).
What I need is guarantee that such file will not stress hard disk. I can't allow it to be written to disk even if this only happens once every 5 seconds. If I can't get such guarantee, I will look for another way.
Additional info (not important):
I am writing a Wayland compositor for Android devices. Currently temporary files (Wayland surfaces, actually) are created on tmpfs. And everything works fine as long as SELinux is not enabled. But if I enable SELinux, it prevents fd's from being transferred from client to compositor. The only solution I currently know of is to create temporary files in the app's home dir. But if this approach is dangerous, I will find another.
Are temporary files created with mkstemp synced to disk?
The mkstemp function does not impart any special properties to files it opens that would prevent them from being synced to disk. The filesystem on which they are created might have such a property, but that's independent of file creation. In particular, files created via mkstemp() will persist indefinitely if not removed.
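If the concern is only that the file persists, a common pattern is to unlink the name right after mkstemp() returns and keep using the descriptor: the data lives until the last descriptor or mapping is gone, and the fd can still be passed to the compositor. This is a sketch with a hypothetical helper name and a caller-chosen directory. Note that unlinking does nothing to stop the kernel from writing dirty pages back to the underlying filesystem, so it does not provide the no-disk-writes guarantee asked for.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a temp file from `tmpl` (must end in "XXXXXX"; modified in place),
 * remove its name at once, size it and map it shared.
 * Returns the fd (to be sent to the other process) and stores the mapping
 * in *out, or returns -1 on error. Hypothetical helper. */
int create_shared_buffer(char *tmpl, size_t size, void **out)
{
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    unlink(tmpl);     /* no name left behind; data lives until last close/unmap */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    *out = p;
    return fd;
}
```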
What I need is guarantee that such file will not stress hard disk. I can't allow it to be written to disk even if this only happens once every 5 seconds. If I can't get such guarantee, I will look for another way.
As far as I am aware, even tmpfs filesystems do not guarantee that their contents will remain locked in memory, as opposed to being paged out. They are backed by virtual memory. But if the actual file is comparatively small and all its pages are hot, then they are likely to remain in memory only.
With regard to the larger problem,
everything works fine as long as SELinux is not enabled. But if I enable SELinux, it prevents fd's from being transferred from client to compositor. The only solution I currently know of is to create temporary files in the app's home dir.
By default, newly created files inherit the SELinux type of their parent directory. Your Wayland clients presumably do not have sufficient privilege to modify the SELinux labels of the files they create, but you should be able to administratively create a directory wherever you like with a label conducive to your needs. For example, you could arrange for a subdirectory of /dev/shm to be created for the purpose (at every boot) and relabeled with chcon. If the clients create their temp files there, they should inherit the SELinux type you choose.

FileSystemWatcher handling moving file - another solution

Hi
I was trying to use FileSystemWatcher to detect whether some files or directories have been moved to another location. The problem was that I had to use the OnCreated and OnDeleted events to handle this, but there are many issues with this solution:
How could I detect the change if I select more than one file and press Ctrl+C, Ctrl+V, or right-click and select Copy and then Paste in the same directory?
How could I detect it if I select more than one directory?
Lastly, what if I simulate moving a file? I could delete the file and create one with the same name in a different place.
I know I could use timers, process-lock detection, or a check of which process uses the file (if it is explorer.exe, the file could be being moved), but this solution is not perfect and it is very ineffective. I was thinking about how to solve this issue, and I have decided to implement it in a low-level language. Is it possible to do this using C or assembler? I know that everything is possible in assembler, so is it possible to implement this in asm? I would like to create my own FileSystemWatcher using assembler or C, but where should I look for information on how to do this?
File movement within the same filesystem can be detected easily using a filesystem filter driver, as the filesystem receives the corresponding request from the OS. Other scenarios, such as moving to another disk or moving via a copy/delete sequence, are hardly traceable even with a filter driver, because you would need to match the file that has been created/written to with the file that is being deleted (possibly on another disk).
If you plan to write some security mechanism (like DRM), then I should remind you that the data can be altered during copying (e.g. encrypted or compressed), which makes your task even harder.
Still, you can look at filesystem filter drivers: should you decide to go on with detecting filesystem events, such a driver is a much more reliable and powerful mechanism than FileSystemWatcher.

Exclusive\Synchronized access to a file (Queuing)

We have a shared folder which contains some files that need to be processed.
We also have 3 UNIX servers which run a shell script that takes and processes one file at a time. At the end of the script the file is moved away.
The 3 UNIX servers don't communicate with each other, and they are not aware of each other.
In your opinion, what is the best way to guarantee that each file will be processed exactly once, without raising concurrent access issues/errors?
Either way, you need some kind of file locking mechanism. Some of the possibilities:
First, you can create a temporary lock file for every file being worked on. For example, for the file name.ext you would create name.ext.lock just before you start processing it. If this file already exists (that is, the creation fails with "file exists"), somebody is already working on it, so you shouldn't do anything with it.
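In C, the "create it, and fail if it already exists" step is an open(2) with O_CREAT | O_EXCL, which checks and creates in one atomic step. A minimal sketch with hypothetical helper names; note that, according to open(2), O_EXCL is only reliable on NFS with NFSv3 or later on Linux 2.6 or later, and a lock file left behind by a crashed server has to be cleaned up by hand.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to claim `file` by atomically creating "<file>.lock".
 * Returns 1 if claimed, 0 if another server already holds it, -1 on error. */
int try_claim(const char *file)
{
    char lock[4096];
    if (snprintf(lock, sizeof lock, "%s.lock", file) >= (int)sizeof lock)
        return -1;
    /* O_EXCL makes creation fail with EEXIST if the lock already exists. */
    int fd = open(lock, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;
    dprintf(fd, "%ld\n", (long)getpid());  /* record the owner, for debugging */
    close(fd);
    return 1;
}

/* Remove the lock once `file` has been processed and moved away. */
int release_claim(const char *file)
{
    char lock[4096];
    if (snprintf(lock, sizeof lock, "%s.lock", file) >= (int)sizeof lock)
        return -1;
    return unlink(lock);
}
```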
Second, you could use advisory locks. Advisory locking doesn't work on every type of file sharing, and its primary interface is at the libc level, although on Linux the flock(1) command from util-linux makes it usable from shell scripts as well. I suggest digging into the manual page of the flock(2) call.
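A non-blocking flock(2) attempt might look like this (the helper name is hypothetical). Because the lock belongs to the open file description, it is released automatically if the process holding it dies, unlike a lock file. Whether it is honored across the shared folder depends on the file-sharing protocol, as noted above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open `path` and try to take an exclusive advisory lock without blocking.
 * Returns the locked fd, -2 if another holder has the lock, -1 on error.
 * The lock is released when this fd (and any dup of it) is closed. */
int lock_for_processing(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int busy = (errno == EWOULDBLOCK);
        close(fd);
        return busy ? -2 : -1;
    }
    return fd;
}
```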
Third, and hardest, is something deeply Unix-specific: mandatory locking. Mandatory locks are effective even against processes that know nothing about them. You can read more about them here: Mandatory file lock on linux (note that Linux removed mandatory locking support entirely in kernel 5.15).
In your place I would do the first, if I could modify how the processing works (for example, if I could hook it with a script, or if I were developing the processing script myself). If not, you probably need the third, although it doesn't always work.

Are intermediate files bad practice?

I was recently downvoted (which only bugged me a little :) ) for an answer I gave to this question. The person offered no explanation for the down vote which started me thinking: "Why would you avoid producing intermediate files?" Especially in a language like Python where File IO is laughably easy.
There seemed to be consensus that it was a bad idea, but I know for a fact that intermediate files are used regularly in practice. I worked for a very well respected research firm (let's just say S.O. wouldn't exist without this firm) where it was assumed that your programs would produce files as output. We did this because if your program indeed deserved to be a standalone program then it would need debuggable output and some way of passing its output between processes that could later be examined in case we discovered an error in our output further downstream.
Is it considered bad practice (in cases like the question linked above) to use intermediate files? Why?
One problem with intermediate files arises with multi-threading.
If clients C1 and C2 are handled simultaneously by server process S (which may or may not have forked into separate processes, used threads, or whatever concurrency system), you may get weird issues when both try to create the same intermediate file.
I believe one tenet of the Unix philosophy is that all programs should act as filters; however, this doesn't necessarily mean creating files on disk, and in my opinion using intermediate files leads to unwieldy behaviour. One should also treat the disk as a last resort, using it only to store and retrieve data that should be available after the computer is powered off, and maybe even take care to allow programs to run on read-only media.
Well, there are some issues when you use files; in particular, there may be many unexpected failures while accessing or creating them. The following are all issues that I have personally experienced.
1) The file is on a remote machine (NFS-mounted) and the network is down.
2) There is not enough free space while creating the file.
3) The user presses Ctrl-C partway through to cancel the process, and the file is not deleted.
4) The file is on an NFS mount and the network is slow.
5) The folder in which the file was created was a symbolic link, and its target was deleted.
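Items 2 and 3 can be softened in C. tmpfile(3) returns an unnamed temporary file that is removed automatically when it is closed or the program terminates (glibc unlinks it right away, so even an interrupted run leaves no name behind), and checking fflush() catches errors such as a full disk that may only surface when buffered data is actually written. A minimal sketch with a hypothetical function name:

```c
#include <stdio.h>

/* Write intermediate data to an unnamed temporary file and read it back.
 * Returns the number of bytes read back, or -1 if any step failed
 * (for example ENOSPC surfacing only when the buffer is flushed). */
long roundtrip_intermediate(const char *data, size_t len, char *out, size_t cap)
{
    FILE *tmp = tmpfile();          /* removed automatically on close/exit */
    if (!tmp)
        return -1;
    if (fwrite(data, 1, len, tmp) != len || fflush(tmp) != 0) {
        fclose(tmp);
        return -1;                  /* disk full or I/O error */
    }
    rewind(tmp);
    size_t got = fread(out, 1, cap, tmp);
    if (ferror(tmp)) {
        fclose(tmp);
        return -1;
    }
    fclose(tmp);
    return (long)got;
}
```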
But we still have to use files, because there are hardly any other options when working in bash. In C and C++, though, I think disk access should be considered a last resort. A program producing files as output is OK if that is the only way to communicate with the user, but at least for intermediate results, the use of disk files should be minimized.
If you create temporary files properly (by setting the platform-specific 'temporary' flag, e.g. FILE_ATTRIBUTE_TEMPORARY on Windows, meaning "do not flush the cache to disk unless there is an urgent need"), they are perfectly fine if the task requires them.
There are almost no things in IT that you can't use while having a good reason to. :-)

Daemon logging in Linux

So I have a daemon running on a Linux system, and I want to have a record of its activities: a log. The question is, what is the "best" way to accomplish this?
My first idea is to simply open a file and write to it.
FILE* log = fopen("logfile.log", "w");
/* daemon works...needs to write to log */
fprintf(log, "foo%s\n", (char*)bar);
/* ...all done, close the file */
fclose(log);
Is there anything inherently wrong with logging this way? Is there a better way, such as some framework built into Linux?
Unix has had for a long while a special logging framework called syslog. Type in your shell
man 3 syslog
and you'll get the help for the C interface to it.
An example:
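#include <stdio.h>
#include <unistd.h>
#include <syslog.h>

int main(void) {
    /* Tag messages with "slog" and the PID; LOG_CONS writes to the system
       console if the message cannot be sent to the system logger. */
    openlog("slog", LOG_PID|LOG_CONS, LOG_USER);
    syslog(LOG_INFO, "A different kind of Hello world ... ");
    closelog();    /* optional; closes the connection to the logger */
    return 0;
}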
This is probably going to be a horse race, but yes, the syslog facility, which exists in most if not all Un*x derivatives, is the preferred way to go. There is nothing wrong with logging to a file, but it does leave a number of tasks on your shoulders:
is there a file system at your logging location to save the file
what about buffering (for performance) vs flushing (to get logs written before a system crash)
if your daemon runs for a long time, what do you do about the ever growing log file.
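For comparison, handling those tasks yourself might look like the sketch below (all names are hypothetical): append mode so a restart does not wipe the log, line buffering so each complete line is flushed, and reopening the file on SIGHUP so an external rotation tool can move the old file away. The signal handler only sets a flag, because fopen() and fclose() are not async-signal-safe.

```c
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>

static FILE *log_fp;
static const char *log_path;
static volatile sig_atomic_t reopen_requested;

/* SIGHUP handler: only set a flag; the reopen happens in log_msg(). */
static void on_sighup(int sig) { (void)sig; reopen_requested = 1; }

static int log_open(void)
{
    log_fp = fopen(log_path, "a");          /* append, never truncate */
    if (!log_fp)
        return -1;
    setvbuf(log_fp, NULL, _IOLBF, 0);       /* flush after every newline */
    return 0;
}

int log_init(const char *path)
{
    log_path = path;
    signal(SIGHUP, on_sighup);              /* e.g. sent after rotation */
    return log_open();
}

void log_msg(const char *fmt, ...)
{
    if (reopen_requested) {                 /* the file was moved away */
        reopen_requested = 0;
        if (log_fp)
            fclose(log_fp);
        if (log_open() < 0)
            return;
    }
    if (!log_fp)
        return;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(log_fp, fmt, ap);
    va_end(ap);
    fputc('\n', log_fp);
}
```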
Syslog takes care of all this, and more, for you. The API is similar to the printf family, so you should have no problems adapting your code.
One other advantage of syslog in larger (or more security-conscious) installations: The syslog daemon can be configured to send the logs to another server for recording there instead of (or in addition to) the local filesystem.
It's much more convenient to have all the logs for your server farm in one place rather than having to read them separately on each machine, especially when you're trying to correlate events on one server with those on another. And when one gets cracked, you can't trust its logs any more... but if the log server stayed secure, you know nothing will have been deleted from its logs, so any record of the intrusion will be intact.
I spit a lot of daemon messages out to daemon.info and daemon.debug when I am unit testing. A line in your syslog.conf can stick those messages in whatever file you want.
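In C, that means opening the log with the LOG_DAEMON facility and choosing a priority per message; setlogmask() can hide the debug messages outside testing. A minimal sketch (the function name and ident string are hypothetical):

```c
#include <syslog.h>

/* Log under the "daemon" facility, so messages are selected by
 * daemon.info, daemon.debug, ... in syslog.conf. With verbose == 0,
 * LOG_DEBUG messages are dropped before they reach the syslog daemon. */
void daemon_log_setup(const char *ident, int verbose)
{
    openlog(ident, LOG_PID | LOG_NDELAY, LOG_DAEMON);
    /* LOG_UPTO(p) enables priority p and everything more severe. */
    setlogmask(verbose ? LOG_UPTO(LOG_DEBUG) : LOG_UPTO(LOG_INFO));
}
```

After this, syslog(LOG_INFO, ...) is logged as daemon.info and syslog(LOG_DEBUG, ...) as daemon.debug, the latter only when verbose is non-zero.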
http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/040/4036/4036s1.html has a better explanation of the C API than the man page, imo.
Syslog is a good option, but you may wish to consider looking at log4c. The log4[something] frameworks work well in their Java and Perl implementations, and allow you to choose, from a configuration file, whether to log to syslog, the console, flat files, or user-defined log writers. You can define specific log contexts for each of your modules and have each context log at a different level (trace, debug, info, warn, error, critical) as defined by your configuration, and have your daemon re-read that configuration file on the fly by trapping a signal, allowing you to change log levels on a running server.
As stated above you should look into syslog. But if you want to write your own logging code I'd advise you to use the "a" (write append) mode of fopen.
A few drawbacks of writing your own logging code are: log rotation handling, locking (if you have multiple threads), and synchronization (do you want to wait for the logs to be written to disk?). One of the drawbacks of syslog is that the application doesn't know whether the logs have been written to disk (they might have been lost).
If you use threading and you use logging as a debugging tool, you will want to look for a logging library that uses some sort of thread-safe, but unlocked ring buffers. One buffer per thread, with a global lock only when strictly needed.
This avoids logging causing serious slowdowns in your software and it avoids creating heisenbugs which change when you add debug logging.
If it has a high-speed compressed binary log format that doesn't waste time with format operations during logging and some nice log parsing and display tools, that is a bonus.
I'd provide a reference to some good code for this but I don't have one myself. I just want one. :)
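A minimal sketch of the per-thread part, assuming C11 _Thread_local: each thread appends to its own fixed-size ring without any lock, overwriting its oldest messages once full. The global piece (for example, a lock-protected list of rings so that another thread can dump them all) is left out, and the names and sizes are arbitrary choices.

```c
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 64          /* messages kept per thread (arbitrary) */
#define SLOT_SIZE  128         /* bytes per message, including the NUL */

/* _Thread_local gives every thread its own ring, so appending needs no lock. */
static _Thread_local struct {
    char msg[RING_SLOTS][SLOT_SIZE];
    unsigned long next;        /* messages ever logged by this thread */
} ring;

/* Record a message in the calling thread's ring; once the ring is full,
 * the oldest message is overwritten. */
void ring_log(const char *text)
{
    char *slot = ring.msg[ring.next % RING_SLOTS];
    strncpy(slot, text, SLOT_SIZE - 1);
    slot[SLOT_SIZE - 1] = '\0';    /* strncpy does not terminate long input */
    ring.next++;
}

/* Print the calling thread's retained messages, oldest first
 * (e.g. when a test fails at the end of a debug run). */
void ring_dump(FILE *out)
{
    unsigned long start = ring.next > RING_SLOTS ? ring.next - RING_SLOTS : 0;
    for (unsigned long i = start; i < ring.next; i++)
        fprintf(out, "%s\n", ring.msg[i % RING_SLOTS]);
}
```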
Our embedded system doesn't have syslog, so the daemons I write log debugging output to a file using the "a" open mode, similar to how you've described it. I have a function that opens a log file, spits out the message and then closes the file (I only do this when something unexpected happens). However, as other commenters have mentioned, I also had to write code to handle log rotation, which consists of 'tail -c 65536 logfile > logfiletmp && mv logfiletmp logfile'. It's pretty rough and should maybe be called "log frontal truncation", but it stops our small RAM-disk-based filesystem from filling up with log files.
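The same front truncation can be done in C without spawning tail and mv, sketched below with a hypothetical function name. Like tail -c, it may cut the first kept line in half, and a process that still holds the old file open would keep writing to the replaced file; that is harmless here only because the logging function reopens the file for every message.

```c
#include <stdio.h>

/* Keep only the last `keep` bytes of `path`, like
 *   tail -c keep path > tmp_path && mv tmp_path path
 * `tmp_path` must be on the same filesystem so rename() replaces the log
 * atomically. Returns 0 on success (or if nothing to do), -1 on error. */
int keep_tail(const char *path, const char *tmp_path, long keep)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;
    if (fseek(in, 0, SEEK_END) != 0) { fclose(in); return -1; }
    long size = ftell(in);
    if (size < 0) { fclose(in); return -1; }
    if (size <= keep) { fclose(in); return 0; }   /* already small enough */
    if (fseek(in, size - keep, SEEK_SET) != 0) { fclose(in); return -1; }

    FILE *out = fopen(tmp_path, "wb");
    if (!out) { fclose(in); return -1; }
    char buf[8192];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n)
            break;
    int failed = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) != 0 || failed) {
        remove(tmp_path);
        return -1;
    }
    return rename(tmp_path, path) == 0 ? 0 : -1;
}
```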
There are a lot of potential issues: for example, if the disk is full, do you want your daemon to fail? Also, with the "w" mode you will be overwriting your file every time it is opened. Often a circular file is used, so that you have a fixed amount of space allocated on the machine for your file, but you can keep enough history to be useful without taking up too much space.
There are tools like log4c that can help you. If your code is C++, then you might consider log4cxx from the Apache project (apt-get install liblog4cxx9-dev on Ubuntu/Debian), but it looks like you are using C.
So far nobody has mentioned the Boost.Log library, which has a nice and easy way to redirect your log messages to files, a syslog sink, or even the Windows event log.
