I'd like to create a shared library in C for Linux, an abstract implementation of database management. The shared library would be responsible for reading the file containing the database and writing differences into it. But I have no idea how to handle the multi-process file-handling problems in this case, e.g.: App1 tries to write differences into the database file while App2 currently has the database file open for reading. In this example I'd like to inform App1 that the file is currently open and delay the write sequence until App2 has finished reading the database file.
I was thinking of using some mutual exclusion mechanism or a global enum variable to track the current file status, but after reading some posts I understood that every application that uses a shared library gets its own copy of the library's data in memory, and the processes don't share any memory section while running.
The shared library would be responsible for reading the file containing the database and writing differences into it.
That is possible, but it is quite a complicated solution.
You would need to make sure that multiple processes do not interfere with each other. It's possible to do so with file locks (see man flock) and record locking (see man fcntl), but you must ensure that multiple processes update disjoint file chunks "in place" (without resizing the file).
This is also prone to deadlocks: if one of the processes takes a lock on a region and then goes into an infinite loop, other processes may get stuck as well.
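A minimal sketch of the record-locking approach with fcntl(), assuming the database lives in a file called db.bin (a made-up name) and the writer updates a fixed-size region in place; readers would take F_RDLCK on the same region:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("db.bin", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct flock lk = {0};
    lk.l_type   = F_WRLCK;    /* exclusive lock, because we want to write */
    lk.l_whence = SEEK_SET;
    lk.l_start  = 0;          /* lock only the region we are about to modify */
    lk.l_len    = 4096;

    /* F_SETLKW blocks until conflicting locks are released, which gives the
     * "delay the write until the reader is done" behaviour asked about. */
    if (fcntl(fd, F_SETLKW, &lk) == -1) { perror("fcntl"); close(fd); return 1; }

    /* ... update the locked region in place here ... */

    lk.l_type = F_UNLCK;      /* release the lock (also released on close) */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    return 0;
}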
A much simpler solution involves a client-server design, where the server implements all writes and clients send it read and modify requests.
P.S. There are existing libraries implementing either approach. You will likely save yourself several months of development time by using an existing solution.
I have a set of independent programs that I wrote in C. I would like all of them to write their log to the same file. Obviously this raises the issue of access control: two or more of them could end up writing simultaneously.
What is the most pragmatic way to achieve this?
I came across solutions using pthreads/mutexes/etc., but that sounds like an overkill implementation for something like this.
I am also looking at syslog, but I wonder whether it is really meant for what I need to do.
I feel that I need a daemon service that takes the messages and controls when they are written. I wonder if that already exists.
I am also looking at syslog, but I wonder whether it is really meant for what I need to do.
Yes
I feel that I need a daemon service that takes the messages and controls when they are written. I wonder if that already exists.
It exists in the Unix derivatives (including Linux) and is called... syslogd
More seriously, the syslog function is intended to pass a message to a syslogd daemon that will route it according to its configuration file. Most common uses include writing it to a file or to the system console (especially for panic-level messages, when nobody can be sure whether the file system is still accessible). The syslog system may come with more features than what you are asking for, but it is an extremely robust and extensively tested piece of software. In addition, it is almost certainly already active on your system, so you should have a strong reason to roll your own instead of using it.
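To make that concrete, a minimal sketch of what each of your independent programs would do (the identifier "myprog" and the LOG_LOCAL0 facility are just placeholders); syslogd serialises the entries, so none of the programs needs its own locking:

#include <syslog.h>

int main(void)
{
    openlog("myprog", LOG_PID, LOG_LOCAL0);   /* tag each entry with the PID */
    syslog(LOG_INFO, "processing started");
    syslog(LOG_WARNING, "something looks odd: %s", "details here");
    closelog();
    return 0;
}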
You have two ways:
First: use something that already exists.
For the logging part, syslog (and syslog-ng) are well-known and widely used.
From there, you can configure syslog-ng to listen on an IP connection and to scan a directory for new files.
When your programs want to log, they can connect to syslog-ng directly and send the log message; if the connection fails, they can write a new file in the directory that syslog-ng watches.
That way you don't lose logs if syslog-ng is interrupted for one reason or another.
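A rough sketch of that fallback idea, assuming syslog-ng has a tcp() source on 127.0.0.1:514 and a file/wildcard source watching /var/spool/applog (both values are made up and must match your syslog-ng configuration):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

void log_line(const char *msg)
{
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(514);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s >= 0 && connect(s, (struct sockaddr *)&addr, sizeof addr) == 0) {
        dprintf(s, "%s\n", msg);              /* send straight to syslog-ng */
    } else {
        /* Connection failed: drop a uniquely named file in the directory
         * that syslog-ng scans, so the message is picked up later. */
        char path[256];
        snprintf(path, sizeof path, "/var/spool/applog/%ld.%d.log",
                 (long)time(NULL), (int)getpid());
        FILE *f = fopen(path, "w");
        if (f) { fprintf(f, "%s\n", msg); fclose(f); }
    }
    if (s >= 0) close(s);
}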
Second: develop something really similar to syslog-ng.
In that case, it's up to you.
We have a shared folder which contains some files that need to be processed.
We also have 3 UNIX servers, each of which runs a shell script that takes and processes one file at a time. At the end of the script the file is moved away.
The 3 UNIX servers don't communicate with each other, and they are not aware of each other.
In your opinion, what is the best way to guarantee that each file will be processed exactly once, without raising concurrent-access issues/errors?
One way or another, you need some type of file-locking mechanism. Some of the possibilities:
First, you can create a temporary lock file for every file being worked on. For example, for the file name.ext you would create name.ext.lock just before you start processing it. If this file already exists, i.e. the creation fails with "file exists", it means somebody is already working on the file, so you shouldn't touch it.
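A sketch of that first option in C, assuming the worker knows the data file's name and derives name.ext.lock from it; open(2) with O_CREAT|O_EXCL makes the existence check and the creation a single atomic step:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if we created the lock file and own the data file,
 * 0 if someone else is already working on it, -1 on error. */
int try_claim(const char *lockpath)
{
    int fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd == -1) {
        if (errno == EEXIST)
            return 0;           /* lock file already there: skip this file */
        perror("open");
        return -1;
    }
    close(fd);
    return 1;                   /* remember to unlink(lockpath) when done */
}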
Second, you could use advisory locks. Advisory locking doesn't work on every type of network file system, and it is primarily a libc-level interface, which makes it awkward to use from shell scripts. I suggest digging into the manual of the flock libc call.
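For the second option, a small sketch using flock(2) in non-blocking mode, so a server that loses the race simply skips the file instead of waiting for it:

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 1 if this process got the lock and processed the file,
 * 0 if another server already holds it, -1 if the file can't be opened. */
int process_if_free(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1) return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        close(fd);
        return 0;               /* somebody else holds the advisory lock */
    }

    /* ... process the file here; the lock disappears with close(fd) ... */

    flock(fd, LOCK_UN);
    close(fd);
    return 1;
}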
Third, the hardest and deeply Unix-specific option: mandatory locks. Mandatory locking means that the locks are effective even against processes that don't know anything about them. You can read more about them here: Mandatory file lock on Linux
In your place I would do the first, provided I could modify how the processing works (for example, if I could hook it with a script, or if I were developing the processing script myself). If not, you probably need the third, although it doesn't always work.
I am currently working on a file processing service that looks at a fileshare, where files are uploaded via FTP.
For scalability I've been asked to make this service load-balanceable, so it has to expect that other instances on different machines may also be trying to process these files.
OK, so I thought I should be able to achieve this by obtaining an exclusive lock for my process before processing a file, and skipping any files that may already be locked by another process.
The crux of this approach is shown below (I've left out the error handling for simplicity):
using (FileStream fs = File.Open(myFile, FileMode.Open, FileAccess.ReadWrite, FileShare.Read | FileShare.Delete))
{
//Do work
}
Q1: My process now has a lock on this file. I thought this would mean I could then access the same file (without using the stream) and still have the correct access to it, but based on testing it seems I only have the benefits of the lock through the stream. Is this correct?
(For example, before I included FileShare.Delete, File.Delete(myFile) failed)
The above lock ultimately uses the 'Write' permission to determine which service has the file, but is intended to allow other processes to still read the file. This is because the process that has the lock attempts to verify that the file is a valid zip file, which it does using a third-party library (Xceed.Zip). However, this fails, saying the file "is being used by another process". Using Reflector I ultimately found the problematic call is:
stream = this.m_info.Open(FileMode.Open, FileAccess.Read, FileShare.Read);
Now I would have expected this to work, as it only wants to read the file, but it fails. The reason appears to be outlined in a similar question. However, as this is a third-party API, I can't change their code to use ReadWrite.
Q2: Is there a way I can correctly lock the file so it will not be picked up by the other services, but it can still be verified as a zip file using the external API?
I feel like there should be a 'correct' way to do this, but at the moment the best I can come up with is to lock the file, move it away from the shared directory, and then verify it at the new location.
If you're planning to reactively handle this situation by handling UnauthorizedAccessException I think you're making a serious mistake.
This can be handled by proactively renaming files. For example, you can configure your service to only read files whose name is in the format 'Filename.YYYYMMDD.txt'. Prior to processing the file, you rename it to 'Filename.YYYYMMDD.processing'. Then, after processing the file, you rename it to 'Filename.YYYYMMDD.done'.
You can even take it a step further by making another service that enqueues the filenames. This service would use a FileSystemWatcher that listens for file-created events. Once it receives such an event, it enqueues the filename on a global message queue. Then each of your services just dequeues filenames and no longer has to worry about concurrent access.
HTH
So I have a daemon running on a Linux system, and I want to have a record of its activities: a log. The question is, what is the "best" way to accomplish this?
My first idea is to simply open a file and write to it.
FILE* log = fopen("logfile.log", "w");
/* daemon works...needs to write to log */
fprintf(log, "foo%s\n", (char*)bar);
/* ...all done, close the file */
fclose(log);
Is there anything inherently wrong with logging this way? Is there a better way, such as some framework built into Linux?
Unix has had a special logging framework called syslog for a long while. Type in your shell:
man 3 syslog
and you'll get the help for the C interface to it.
Some examples
#include <stdio.h>
#include <unistd.h>
#include <syslog.h>
int main(void) {
    /* "slog" is the identifier prepended to every entry; LOG_PID adds the
     * process id, and LOG_CONS falls back to the console if syslogd is down. */
    openlog("slog", LOG_PID|LOG_CONS, LOG_USER);
    syslog(LOG_INFO, "A different kind of Hello world ... ");
    closelog();
    return 0;
}
This is probably going to be a one-horse race, but yes, the syslog facility, which exists in most if not all Un*x derivatives, is the preferred way to go. There is nothing wrong with logging to a file, but it does leave a number of tasks on your shoulders:
is there a file system at your logging location to save the file?
what about buffering (for performance) vs. flushing (to get logs written before a system crash)?
if your daemon runs for a long time, what do you do about the ever-growing log file?
Syslog takes care of all this, and more, for you. The API is similar to the printf clan, so you should have no problems adapting your code.
One other advantage of syslog in larger (or more security-conscious) installations: The syslog daemon can be configured to send the logs to another server for recording there instead of (or in addition to) the local filesystem.
It's much more convenient to have all the logs for your server farm in one place rather than having to read them separately on each machine, especially when you're trying to correlate events on one server with those on another. And when one gets cracked, you can't trust its logs any more... but if the log server stayed secure, you know nothing will have been deleted from its logs, so any record of the intrusion will be intact.
I spit a lot of daemon messages out to daemon.info and daemon.debug when I am unit testing. A line in your syslog.conf can stick those messages in whatever file you want.
http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/040/4036/4036s1.html has a better explanation of the C API than the man page, imo.
Syslog is a good option, but you may wish to consider looking at log4c. The log4[something] frameworks work well in their Java and Perl implementations, and allow you to choose, from a configuration file, whether to log to syslog, the console, flat files, or user-defined log writers. You can define specific log contexts for each of your modules and have each context log at a different level (trace, debug, info, warn, error, critical) as defined by your configuration, and you can have your daemon re-read that configuration file on the fly by trapping a signal, allowing you to manipulate log levels on a running server.
As stated above, you should look into syslog. But if you want to write your own logging code, I'd advise you to use the "a" (append) mode of fopen.
A few drawbacks of writing your own logging code are: log-rotation handling, locking (if you have multiple threads), and synchronization (do you want to wait for the logs to be written to disk?). One drawback of syslog is that the application doesn't know whether the logs have actually been written to disk (they might have been lost).
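If you do roll your own, here is a minimal sketch of an append-mode logger (the path /var/log/mydaemon.log is just an example). Opening with "a" maps to O_APPEND, so concurrent writers always add at the end of the file instead of overwriting each other, and the whole entry sits in the stdio buffer until fclose() flushes it:

#include <stdarg.h>
#include <stdio.h>
#include <time.h>

void app_log(const char *fmt, ...)
{
    FILE *f = fopen("/var/log/mydaemon.log", "a");   /* append, never truncate */
    if (!f) return;                                  /* e.g. disk full or no permission */

    char stamp[32];
    time_t now = time(NULL);
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", localtime(&now));
    fprintf(f, "%s ", stamp);

    va_list ap;
    va_start(ap, fmt);
    vfprintf(f, fmt, ap);
    va_end(ap);

    fputc('\n', f);
    fclose(f);                  /* fclose() flushes the buffered entry */
}

You would then call it like app_log("client %d connected", client_fd); from anywhere in the daemon.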
If you use threading and you use logging as a debugging tool, you will want to look for a logging library that uses some sort of thread-safe, but unlocked ring buffers. One buffer per thread, with a global lock only when strictly needed.
This avoids logging causing serious slowdowns in your software and it avoids creating heisenbugs which change when you add debug logging.
If it has a high-speed compressed binary log format that doesn't waste time with format operations during logging and some nice log parsing and display tools, that is a bonus.
I'd provide a reference to some good code for this but I don't have one myself. I just want one. :)
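To make the idea concrete, here is a very rough sketch (nowhere near the polished library being asked for), assuming POSIX threads and GCC-style __thread thread-local storage; each thread appends to its own ring without taking any lock, and the global lock is taken only when a ring is dumped:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define RING_SLOTS 256
#define MSG_LEN    128

/* One ring per thread: appending needs no lock, only dumping does. */
struct ring {
    char     msg[RING_SLOTS][MSG_LEN];
    unsigned head;                       /* next slot to overwrite */
};

static __thread struct ring tls_ring;    /* per-thread buffer */
static pthread_mutex_t dump_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a message in the calling thread's ring; never blocks. */
void ring_log(const char *text)
{
    struct ring *r = &tls_ring;
    snprintf(r->msg[r->head % RING_SLOTS], MSG_LEN, "%ld %s",
             (long)time(NULL), text);
    r->head++;
}

/* Write the calling thread's ring to a stream, oldest entry first.
 * The global lock is taken only here, so normal logging stays cheap. */
void ring_dump(FILE *out)
{
    struct ring *r = &tls_ring;
    unsigned start = r->head > RING_SLOTS ? r->head - RING_SLOTS : 0;

    pthread_mutex_lock(&dump_lock);
    for (unsigned i = start; i < r->head; i++)
        fprintf(out, "%s\n", r->msg[i % RING_SLOTS]);
    pthread_mutex_unlock(&dump_lock);
}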
Our embedded system doesn't have syslog, so the daemons I write do their debugging to a file using the "a" open mode, similar to what you've described. I have a function that opens a log file, spits out the message and then closes the file (I only do this when something unexpected happens). However, I also had to write code to handle log rotation, as other commenters have mentioned, which consists of 'tail -c 65536 logfile > logfiletmp && mv logfiletmp logfile'. It's pretty rough and maybe should be called "log frontal truncation", but it stops our small RAM-disk-based filesystem from filling up with log files.
There are a lot of potential issues: for example, if the disk is full, do you want your daemon to fail? Also, with "w" mode you will be overwriting your file every time it is opened. Often a circular file is used, so that you have a fixed amount of space allocated on the machine for your file but can keep enough history to be useful without taking up too much space.
There are tools like log4c that can help you. If your code is C++, then you might consider log4cxx from the Apache project (apt-get install liblog4cxx9-dev on Ubuntu/Debian), but it looks like you are using C.
So far nobody has mentioned the Boost.Log library, which has a nice and easy way to redirect your log messages to files, a syslog sink, or even the Windows event log.