I think file consistency can be damaged when multiple applications write to the same file. Is there any other case?
I'd like to create a shared library in C for Linux, some abstract implementation of database management. The shared library would be responsible for reading the file containing the database and writing differences into it. But I have no idea how to handle the multiprocessing problems of file handling in this case, e.g.: App1 tries to write differences into the database file while App2 currently has the database file open for reading. In this example I'd like to inform App1 that the file is currently open and delay the write sequence until App2 has finished reading the database file.
I was thinking of using some mutual exclusion mechanism, or a global enum variable to manage the current file status, but after reading some posts I understood that every application that uses a shared library creates its own copy of it in memory, and they don't share any memory section while running.
The shared library would be responsible for reading the file containing the database and writing differences into it.
That is possible, but it is quite a complicated solution.
You would need to make sure that multiple processes do not interfere with each other. It is possible to do so with file locks (see man flock) and record locking (see man fcntl), but you must ensure that multiple processes update disjoint file chunks "in place" (without resizing the file).
This is also prone to deadlocks: if one of the processes takes a lock on a region and then goes into an infinite loop, other processes may get stuck as well.
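For the record-locking route, a minimal sketch with fcntl() might look like this (the file name db.dat and the 0..127 byte range are made-up examples):

```c
/* Advisory record locking with fcntl(): lock a byte range before
   updating it in place, then release the lock.                     */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("db.dat", O_RDWR);          /* hypothetical database file */
    if (fd < 0) { perror("open"); return 1; }

    struct flock lk = {0};
    lk.l_type   = F_WRLCK;                    /* exclusive write lock       */
    lk.l_whence = SEEK_SET;
    lk.l_start  = 0;                          /* lock bytes 0..127          */
    lk.l_len    = 128;

    if (fcntl(fd, F_SETLKW, &lk) < 0) {       /* block until the lock is ours */
        perror("fcntl(F_SETLKW)");
        return 1;
    }

    /* ... update the locked region in place here ... */

    lk.l_type = F_UNLCK;                      /* release the lock           */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    return 0;
}
```

Note that these locks are advisory: they only help if every process that touches the file uses the same locking protocol.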
A much simpler solution is a client-server design, where the server performs all writes and clients send it read and modify requests.
P.S. There are existing libraries implementing either approach. You will likely save yourself several months of development time using existing solutions.
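To illustrate the client-server idea, here is a rough sketch of the server side, assuming a FIFO at /tmp/db.requests and a database file db.dat (both names invented): clients write one-line requests into the FIFO, and only this one process ever touches the database file.

```c
/* Single-writer server: drain requests from a FIFO and apply them.
   A real server would parse each request; this sketch just appends. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *fifo = "/tmp/db.requests";
    mkfifo(fifo, 0666);                        /* ignore EEXIST on restart    */

    for (;;) {
        FILE *in = fopen(fifo, "r");           /* blocks until a client opens */
        if (!in) { perror("fopen fifo"); return 1; }

        FILE *db = fopen("db.dat", "a");       /* single writer: no races     */
        if (!db) { perror("fopen db"); fclose(in); return 1; }

        char line[512];
        while (fgets(line, sizeof line, in))
            fputs(line, db);                   /* "apply" the request         */

        fclose(db);
        fclose(in);                            /* all clients gone: reopen    */
    }
}
```

A client can then be as simple as writing one line into /tmp/db.requests, e.g. from a shell: echo "UPDATE ..." > /tmp/db.requests.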
We have a shared folder which contains some files that need to be processed.
We also have 3 UNIX servers which run a shell script that takes and processes one file at a time. At the end of the script the file is moved away.
The 3 UNIX servers don't communicate with each other, and they are not aware of each other.
In your opinion, what is the best way to guarantee that each file will be processed exactly once, without raising concurrent access issues/errors?
Either way, you need some type of file locking mechanism. Some of the possibilities:
First, you can create a temporary lock file for every file being worked on. For example, for the file name.ext you would create name.ext.lock just before you start processing it. If that file already exists (i.e. the creation fails with "file exists"), it means somebody is already working on the file, so you shouldn't touch it. See the sketch at the end of this answer.
Second, you could use advisory locks. Advisory locking doesn't work on every type of file sharing, and it has only a libc-level interface, so you can't use it from shell scripts. I suggest digging into the manual of the flock libc call.
Third, the hardest and most deeply Unix-specific option: mandatory locks. Mandatory locking means that the locks are effective even against processes that don't know anything about them. You can read more about them here: Mandatory file lock on linux
In your place I would go with the first, if I could modify how the processing works (for example, if I could hook it with a script, or if I were the one developing the processing script). If not, you probably need the third, although it doesn't always work.
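Here is a minimal sketch of the first option in C, using the example names name.ext / name.ext.lock from above:

```c
/* Atomic lock-file creation: O_CREAT|O_EXCL either creates the file
   or fails with EEXIST, so exactly one process wins the lock.       */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *lock = "name.ext.lock";

    int fd = open(lock, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) {
        if (errno == EEXIST)
            fprintf(stderr, "name.ext is already being processed\n");
        else
            perror("open");
        return 1;
    }

    /* ... process name.ext here ... */

    close(fd);
    unlink(lock);                 /* release: remove the lock file when done */
    return 0;
}
```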
I am developing a Linux kernel module to perform read/write operations.
It reads an input file and writes the content to an output file.
I have to introduce an atomic mode to my code.
I wanted to know if there is a way to revert changes to a written file in case of a partial write in atomic mode.
I want to delete all content I have written to the output file in case my program gives an error.
Please reply.
I want to delete all content I have written to the output file in case my program gives an error.
I would avoid developing a kernel module for that purpose.
You can simply do that in the shell or in the application code: write(2) into some temporary file, then rename(2) the file on success, or unlink(2) it on failure. You could also do that in a shell script (e.g. redirect stdout to a temporary file, then mv or rm it). You need to understand more about what inodes are.
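In application code the pattern looks roughly like this (a sketch in plain C; the file names are made up):

```c
/* Write everything to a temporary file, then rename() it over the
   destination on success, or unlink() it on failure.               */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *tmp = "output.txt.tmp";
    const char *dst = "output.txt";

    FILE *f = fopen(tmp, "w");
    if (!f) { perror("fopen"); return 1; }

    int ok = (fputs("all of the output...\n", f) >= 0);
    ok = ok && (fflush(f) == 0);
    fclose(f);

    if (!ok || rename(tmp, dst) != 0) {
        unlink(tmp);              /* failure: throw the partial output away */
        return 1;
    }
    return 0;                     /* success: dst was replaced atomically   */
}
```

The rename(2) step is what makes this atomic: other processes either see the old file or the complete new one, never a half-written file.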
If you insist on having something kernel related, consider FUSE.
NB: kernel code is usually not expected to write files. Only application code writes files, using the filesystem code in the kernel.
PS: You might perhaps be interested in inotify(7).
I want to read a file from a driver for a 3rd party application
(a simple C-based DLL running in user space, but under the control of the 3rd party application).
This file will be written to by a separate C# application.
What should I use so that I do not face any problems?
What is the advantage of using _sopen_s over fopen? I understand the former is more secure, but what is the 'sharing' feature it supports?
I Googled it a number of times but could not find an answer.
_sopen_s is a secure version of open() with sharing. It uses unbuffered I/O and works with file handles (int). It is Microsoft-specific; open() is cross-platform. There is also sopen(), which is the shared-access version.
fopen uses buffering and no file sharing, and works with FILE* structures.
File sharing means that you allow other processes to access the file (or not). E.g. when read sharing is denied, another process will not be able to open the file for reading.
All are legitimate to use. The unbuffered I/O versions work faster if you read the file in large chunks.
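For example, a minimal sketch of opening a file with _sopen_s while denying other processes write access (MSVC only; the file name data.txt is made up):

```c
/* Open for reading with a share mode that blocks concurrent writers. */
#include <fcntl.h>
#include <io.h>
#include <share.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    int fh = -1;
    /* _SH_DENYWR: other processes may still read the file, but not write it. */
    errno_t err = _sopen_s(&fh, "data.txt", _O_RDONLY, _SH_DENYWR, _S_IREAD);
    if (err != 0) {
        printf("open failed, error %d\n", err);
        return 1;
    }

    char buf[4096];
    int n = _read(fh, buf, sizeof buf);        /* unbuffered read */
    printf("read %d bytes\n", n);

    _close(fh);
    return 0;
}
```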
I want to open/create a file and write some data into it in a Hadoop environment. The distributed file system I am using is HDFS.
I want to do it in pseudo-distributed mode. Is there any way I can do this? Please give the code.
I think this post fits your problem :-)
Writing data to hadoop
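If you want to do it from C rather than Java, a rough sketch with libhdfs (the C API that ships with Hadoop) could look like the following; the namenode address localhost:9000 and the path /user/demo/output.txt are assumptions for a pseudo-distributed setup:

```c
/* Connect to HDFS, create a file, and write a small string into it. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include "hdfs.h"

int main(void)
{
    hdfsFS fs = hdfsConnect("localhost", 9000);          /* namenode host/port */
    if (!fs) { fprintf(stderr, "hdfsConnect failed\n"); return 1; }

    const char *path = "/user/demo/output.txt";          /* hypothetical path  */
    hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!out) { fprintf(stderr, "hdfsOpenFile failed\n"); hdfsDisconnect(fs); return 1; }

    const char *msg = "hello hdfs\n";
    hdfsWrite(fs, out, msg, (tSize)strlen(msg));
    hdfsFlush(fs, out);

    hdfsCloseFile(fs, out);
    hdfsDisconnect(fs);
    return 0;
}
```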