Atomically create a file if it does not exist, from a bash script

In system call open(), if I open with O_CREAT | O_EXCL, the system call ensures that the file will only be created if it does not exist. The atomicity is guaranteed by the system call. Is there a similar way to create a file in an atomic fashion from a bash script?
UPDATE:
I found two different atomic ways:
Use set -o noclobber. Then you can use the > operator atomically.
Just use mkdir. mkdir is atomic.
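For the mkdir approach, a minimal sketch (the lock path is hypothetical; rmdir releases the lock):
lockdir=/tmp/myscript.lock
if mkdir "$lockdir" 2>/dev/null; then
    trap 'rmdir "$lockdir"' EXIT   # release the lock when the script exits
    echo "acquired lock"
else
    echo "another instance holds the lock" >&2
    exit 1
fi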

A 100% pure bash solution:
set -o noclobber
{ > file ; } &> /dev/null
This command creates a file named file if no file by that name exists. If a file named file already exists, it does nothing (but returns a non-zero exit code).
Pros of > over the touch command:
Doesn't update timestamp if file already existed
100% bash builtin
Return code as expected: fail if file already existed or if file couldn't be created; success if file didn't exist and was created.
Cons:
need to set the noclobber option (but it's okay in a script, if you're careful with redirections, or unset it afterwards).
I guess this solution is really the bash counterpart of the open system call with O_CREAT | O_EXCL.
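As a usage sketch, the same idea applied to a lock file (the lock path and the cleanup trap are illustrative assumptions, not part of the original answer):
set -o noclobber
lockfile=/tmp/myscript.lock
if { > "$lockfile" ; } 2>/dev/null; then
    trap 'rm -f "$lockfile"' EXIT   # release the lock on exit
    echo "acquired lock"
else
    echo "lock already held" >&2
    exit 1
fi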

Here's a bash function using the mv -n trick:
function mkatomic() {
    f="$(mktemp)"
    mv -n "$f" "$1"
    if [ -e "$f" ]; then
        rm "$f"
        echo "ERROR: file exists:" "$1" >&2
        return 1
    fi
}
Examples:
$ mkatomic foo
$ wc -c foo
0 foo
$ mkatomic foo
ERROR: file exists: foo

You could create it under a randomly-generated name, then rename (mv -n random desired) it into place with the desired name. With mv -n, the move simply does not happen if the destination already exists, which is why you then check whether the source file is still present.
Like this:
#!/bin/bash
touch randomFileName
mv -n randomFileName lockFile
if [ -e randomFileName ] ; then
    echo "Failed to acquire lock"
else
    echo "Acquired lock"
fi

Just to be clear, ensuring the file will only be created if it doesn't exist is not the same thing as atomicity. The operation is atomic if and only if, when two or more separate threads attempt to do the same thing at the same time, exactly one will succeed and all others will fail.
The best way I know of to create a file atomically in a shell script follows this pattern (and it's not perfect):
create a file that has an extremely high chance of not existing (using a decent random number selection or something in the file name), and place some unique content in it (something that no other thread would have - again, a random number or something)
verify that the file exists and contains the contents you expect it to
create a hard link from that file to the desired file
verify that the desired file contains the expected contents
In particular, touch is not atomic, since it will create the file if it's not there, or simply update the timestamp. You might be able to play games with different timestamps, but reading and parsing a timestamp to see if you "won" the race is harder than the above. mkdir can be atomic, but you would have to check the return code, because otherwise, you can only tell that "yes, the directory was created, but I don't know which thread won". If you're on a file system that doesn't support hard links, you might have to settle for a less ideal solution.
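A minimal sketch of that hard-link pattern (the paths and the PID-as-content check are illustrative assumptions):
tmp=$(mktemp /tmp/lock.XXXXXX)            # unique per process, so it cannot collide
echo "$$" > "$tmp"                        # unique content: our PID
if ln "$tmp" /tmp/desired.lock 2>/dev/null \
   && [ "$(cat /tmp/desired.lock)" = "$$" ]; then
    echo "acquired lock"
else
    echo "lost the race" >&2
fi
rm -f "$tmp"                              # the temporary name is no longer needed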

Another way to do this is to use umask to create the file without write permissions while still opening it for writing, like this:
LOCK_FILE=only_one_at_a_time_please
UMASK=$(umask)
umask 777
echo "$$" > "$LOCK_FILE"
umask "$UMASK"
trap "rm '$LOCK_FILE'" EXIT
If the file is missing, the script will succeed at creating it and opening it for writing, despite the file being created without write permissions. If it already exists, the script won't be able to open the file for writing. It would be possible to use exec to open the file and keep the file descriptor around.
rm requires you to have write permission on the directory itself, regardless of the file's own permissions.
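A hedged sketch of that exec variant (it assumes bash's default behaviour, where a failed exec redirection returns status 1 rather than aborting the script, as it would in POSIX mode):
LOCK_FILE=only_one_at_a_time_please
old_umask=$(umask)
umask 777                                  # files we create get mode 000
if { exec 9>"$LOCK_FILE"; } 2>/dev/null; then
    umask "$old_umask"
    trap 'rm -f "$LOCK_FILE"' EXIT
    echo "$$" >&9                          # fd 9 stays writable even though the file is mode 000
else
    umask "$old_umask"
    echo "lock already held" >&2
    exit 1
fi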

touch is the command you are looking for. It updates timestamps of the provided file if the file exists or creates it if it doesn't.

Related

Using a shared library: custom myfopen(), myfrwrite()

Hello, I have created a shared library named logger.so. The library is my custom fopen() and fwrite(); it collects some data and produces a file containing that data.
Now I am writing a bash script and I want to use this library there as well. If the script produces a text file, I should be able to see the extra file that my custom fopen() creates, just as it is produced when I call fopen() from a C file.
My question is: which commands in bash use the fopen() and fwrite() functions?
I have already preloaded my shared library, but it doesn't work. Maybe these commands don't use fopen()/fwrite():
export LD_PRELOAD=./logger.so
read -p 'Enter the number of your files to create: [ENTER]: ' file_number
for ((i=1; i<=file_number; i++))
do
echo file_"$i" > "file_${i}"
done
This may require some trial and error. I see two ways to do this. One is to run potential commands that deal with files in bash and see whether, when traced, they call fopen:
strace bash -c "read a < /dev/null"
or
strace bash -c "read a < /dev/null" 2>&1 | fgrep fopen
This shows that read uses open, not fopen.
Another way is to grep through the source code of bash, as @oguz suggested. When I did this I found several places where fopen is called, but I did not investigate further:
curl https://mirrors.tripadvisor.com/gnu/bash/bash-5.1-rc3.tar.gz|tar -z -x --to-stdout --wildcards \*.c | fgrep fopen
You'll want to unarchive the whole package and search through the .c files one by one, e.g.:
curl https://mirrors.tripadvisor.com/gnu/bash/bash-5.1-rc3.tar.gz|tar -z -v -x --wildcards \*.c
You can also FTP the file or save it via a browser and do the tar standalone if you don't have curl (wget will also work).
Hopefully you can trace the relevant commands but it might not be easy.
Not every program uses the C standard library's stdio functions fopen and fwrite.
But every program uses the open and write syscalls to open and write files, and you can interpose / monkey-patch their libc wrappers.
Modern programs that use io_uring require a different method of interposing.
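If ltrace is available, it traces library calls (unlike strace, which only shows system calls), so it can answer the question directly. A rough sketch, with the traced commands chosen purely for illustration:
# likely no matches: bash redirections use the open()/openat() syscalls directly
ltrace -f bash -c 'echo hello > file_1' 2>&1 | grep -w fopen
# whether an external command such as cat uses fopen depends on its implementation
ltrace -f cat file_1 2>&1 | grep -w fopen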

Potential Dangers of Running Code in Parallel

I am working on OS X and using bash for my shell. I have a script which calls an executable hundreds of times, and each call is independent of the others, so I am going to run this code in parallel. However, each call to the executable appends its output, on a new line, to a shared text file.
The ordering of the text file is not important (although it would be nice, it's not worth over-complicating things since I can just use the Unix sort command); what matters is that every call of the executable actually gets printed to the file. My concern is that if I run the script in parallel, then by some freak accident two threads will check out the text file, print to it, and then save different copies back to the original directory of the text file, thus nullifying one of the writes.
Does this actually happen, or is my understanding of printing to a file flawed? I don't know whether this is decided case by case, so I will provide some mock code of what my program does below.
Script:
#!/bin/sh
abs=$1
input=$(echo "$abs" | awk '{print 0.004 + 0.005*$1 }')
./program "$input"
"./program":
~~Normal .c file stuff here~~
~~VALUE magically calculated here~~
~~run number is pulled out of input and assigned to index for sorting~~
FILE *fpp;
fpp = fopen("Doc.txt","a");
fprintf(fpp,"%d, %.3f\n", index, VALUE);
fclose(fpp);
~~Closing events of program.c~~
Commands to run script in parallel in bash:
printf "%s\n" {0..199} | xargs -P 8 -n 1 ./program
Thanks for any help you guys can offer.
A write() call (which is what fwrite() ultimately performs) on a file opened with the append flag in open() (as fopen() with mode "a" does) is guaranteed to avoid the race condition you describe.
O_APPEND
If set, the file offset shall be set to the end of the file prior to each write.
From the POSIX specification for open() (opengroup.org).
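As a quick sanity check of that guarantee (a sketch, not a proof; it assumes a local filesystem and short lines, each written with a single write()):
rm -f Doc.txt
# 8 parallel writers append 1000 short lines each
printf "%s\n" {1..8} | xargs -P 8 -I{} bash -c \
    'for i in $(seq 1000); do echo "writer {} line $i" >> Doc.txt; done'
wc -l Doc.txt    # expect 8000 intact lines if appends do not clobber each other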
Race conditions are what you are thinking of.
Not 100% sure, but if you simply append to the end of the file, rather than opening it and editing it in place, you should be all right.
If you have the option, make your program write to standard output instead of directly to a file. Then you can let the shell merge the output of your programs:
printf "%s\n" {0..199} | parallel -P 8 -n 1 ./program > merged_output.txt
Yeah, that looks like a recipe for disaster. If those processes both hit opening the file at roughly the same time, only one will "take".
I suggest either (easier) writing to separate files then catting them together when the processing is done, or (harder) sending all results to a consumer process that will write the file for everyone.
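A rough sketch of the "separate files, then combine" approach, assuming ./program is changed to write to standard output as the previous answer suggests (the output file names are illustrative):
printf "%s\n" {0..199} | xargs -P 8 -I{} sh -c './program {} > out.{}.txt'
cat out.*.txt > Doc.txt    # a single writer merges the results, so there is no race
rm -f out.*.txt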

How do I add an operator to Bash in Linux?

I'd like to add an operator ( e.g. ^> ) to handle prepend instead append (>>). Do I need to modify Bash source or is there an easier way (plugin, etc)?
First of all, you'd need to modify bash sources and quite heavily. Because, above all, your ^> would be really hard to implement.
Note that bash redirection operators usually do very simple writes, and work on a single file (or program, in the case of pipes) only. Excluding very specific solutions, you usually can't write to the beginning of a file, for the simple reason that you'd need to move all of the remaining contents forward after each write. You could try doing that, but it would be hard, very inefficient (since every write would require re-writing the whole file) and very unsafe (since on any error you would end up with a random mix of the old and new versions).
That said, you are indeed probably better off with a function or any other solution which would use a temporary file, like others suggested.
For completeness, my own implementation of that:
prepend() {
    local tmp=$(tempfile)
    if cat - "${1}" > "${tmp}"; then
        mv "${tmp}" "${1}"
    else
        rm -f "${tmp}"
        # some error reporting
    fi
}
Note that, unlike what @jpa suggested, you should write the concatenated data to a temporary file, because that operation can fail, and if it does you don't want to lose your original file. Afterwards, you just replace the old file with the new one, or delete the temporary file and handle the failure any way you like.
Synopsis the same as with the other solution:
echo test | prepend file.txt
And a bit modified version to retain permissions and play safe with symlinks (if that is necessary) like >> does:
prepend() {
    local tmp=$(tempfile)
    if cat - "${1}" > "${tmp}"; then
        cat "${tmp}" > "${1}"
        rm -f "${tmp}"
    else
        rm -f "${tmp}"
        # some error reporting
    fi
}
Just note that this version is actually less safe, since if something else writes to the disk and fills it up during the second cat, you'll end up with an incomplete file.
To be honest, I wouldn't personally use it but handle symlinks and resetting permissions externally, if necessary.
^ is a poor choice of character, as it is already used in history substitution.
To add a new redirection type to the shell grammar, start in parse.y. Declare it as a new %token so that it may be used, add it to STRING_INT_ALIST other_token_alist[] so that it may appear in output (such as error messages), update the redirection rule in the parser, and update the lexer to emit this token upon encountering the appropriate characters.
command.h contains enum r_instruction of redirection types, which will need to be extended. There's a giant switch statement in make_redirection in make_cmd.c processing redirection instructions, and the actual redirection is performed by functions throughout redir.c. Scattered throughout the rest of source code are various functions for printing, copying, and destroying pipelines, which may also need to be updated.
That's all! Bash isn't really that complex.
This doesn't discuss how to implement a prepending redirection, which will be difficult as the UNIX file API only provides for appending and overwriting. The only way to prepend to a file is to rewrite it entirely, which (as other answers mention) is significantly more complex than any existing shell redirections.
Might be quite difficult to add an operator, but perhaps a function could be enough?
function prepend { tmp=$(tempfile); cp "$1" "$tmp"; cat - "$tmp" > "$1"; rm "$tmp"; }
Example use:
echo foobar | prepend file.txt
prepends the text "foobar" to file.txt.
I think bash's plugin architecture (loading shared objects via the 'enable' built-in command) is limited to providing additional built-in commands. The redirection operators are part of the syntax for running simple commands, so I think you would need to modify the parser to recognize and handle your new ^> operator.
Most Linux filesystems do not support prepending. In fact, I don't know of any one that has a stable userspace interface for it. So, as stated by others already, you can only rely on overwriting, either just the initial parts, or the entire file, depending on your needs.
You can easily (partially) overwrite initial file contents in Bash, without truncating the file:
exec {fd}<>"$filename"
printf 'New initial contents' >&"$fd"
exec {fd}>&-
Above, $fd is the file descriptor automatically allocated by Bash, and $filename is the name of the target file. Bash opens a new read-write file descriptor to the target file on the first line; this does not truncate the file. The second line overwrites the initial part of the file. The position in the file advances, so you can use multiple commands to overwrite consecutive parts in the file. The third line closes the descriptor; since there is only a limited number available to each process, you want to close them after you no longer need them, or a long-running script might run out.
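A small demonstration of the effect (the file name is chosen purely for illustration): the file keeps its length, and only the leading bytes change.
printf 'AAAABBBBCCCC\n' > demo.txt
exec {fd}<>demo.txt
printf 'XX' >&"$fd"        # overwrites exactly the first two bytes
exec {fd}>&-
cat demo.txt               # prints: XXAABBBBCCCC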
Please note that > does less than you expected:
Remove the > and the following word from the commandline, remembering the redirection.
When the command line has been processed and the command can be launched, fork(2) (or clone(2)) is called to create a new process.
The new process is modified according to the command. That includes things like modified environment variables (SOMEVAR=foo yourcommand), but also changed file descriptors. At this point, a > yourfile on the command line has the effect that the file is open(2)'ed on the stdout file descriptor (that is, #1) in write-only mode, truncating the file to zero bytes. A >> yourfile has the effect that the file is opened on stdout in write-only and append mode.
(Only now is the program launched, e.g. via execv(yourprogram, yourargs).)
The redirections could, for a simple example, be implemented like
open(yourfile, O_WRONLY|O_TRUNC);
or
open(yourfile, O_WRONLY|O_APPEND);
respectively.
The program then launched will have the correct environment set up, and can happily write to fd1. From here, the shell is not involved. The real work is not done by the shell, but by the operating system. As Unix doesn't have a prepend mode (and it would be impossible to integrate that feature correctly), everything you could try would end up in a very lousy hack.
Try to re-think your requirements, there's always a simpler way around.

Redirect stdout to the same file the content is being read from

When I want to modify files using Bash, I usually redirect stdout to another file and then remove the original. Is there a faster way?
For example:
cut -d':' -f2 FILE > FILE
This makes my source file empty (the redirection empties the file before cut even starts reading).
All I can do is:
cut -d':' -f2 FILE > FILE2
rm -f FILE
mv FILE2 FILE
Is there a way to redirect the (modified) output to the original file in just one step?
If you install Colin Watson's sponge utility (the link on Colin's blog seems to be dead, but you can get it as part of Joey Hess' moreutils package), you can use it to "soak up" the output before dumping it back into the file:
cat FILE | cut -d':' -f2 | sponge FILE
I'm afraid I don't have an answer for you in bash, but it's very easy in zsh:
cut -d':' -f2 =(cat FILE) > FILE
The =(...) construct creates a temporary file containing the output of the program mentioned which is removed after the command containing it is finished.
I'm afraid there is no convenient general way of doing that in bash. The problem is that before running any commands, bash opens FILE for output and truncates it, so when the command opens the file, all it sees is the new, empty file. sed has a -i option to do the replacement in place (for which it actually uses a temporary file), zsh has =(...), as mentioned in another answer, and sort checks whether the input and output files match when using -o (a shell redirect would wipe the file before the program ever sees it), but cut and many other utilities don't. You have to use a temporary file for those.
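If neither sponge nor zsh is an option, the usual temp-file dance can be wrapped in a small helper. A hedged sketch (the function name is made up, and mv replaces the original file, so its inode and permissions are not preserved):
inplace() {
    local file=$1; shift
    local tmp
    tmp=$(mktemp) || return 1
    "$@" < "$file" > "$tmp" && mv "$tmp" "$file" || { rm -f "$tmp"; return 1; }
}
# usage: filter FILE through cut, writing the result back to FILE
inplace FILE cut -d':' -f2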

How to redirect output away from /dev/null

I have an application that runs a command as below:
<command> <switches> >& /dev/null
I can configure <command>, but I have no control over <switches>. All the output generated by this command goes to /dev/null. I want the output to be visible on screen or redirected to a log file.
I tried to use freopen() and related functions to reopen /dev/null to another file, but could not get it working.
Do you have any other ideas? Is this possible at all?
Thanks for your time.
PS: I am working on Linux.
Terrible Hack:
Use a text editor in binary mode to open the application binary, find '/dev/null' and replace it with a string of the same length, e.g. '~/tmp/log'.
Make a backup first.
Be careful.
Be very careful.
Did I mention the backup?
Since you can modify the command you run you can use a simple shell script as a wrapper to redirect the output to a file.
#!/bin/bash
"$#" >> logfile
If you save this in your path as capture_output.sh then you can add capture_output.sh to the start of your command to append the output of your program to logfile.
Append # at the end of your command so it becomes <command> # >& /dev/null, thus commenting out the undesired part.
Your application is probably running a shell and passing it that command line.
You need to make it run a script written by you. That script will replace >/dev/null in the command line with >>/your/log and call the real shell with the modified command line.
The first step is to change the shell used by the application. Changing the environment variable SHELL should suffice, i.e., run your application as
SHELL=/home/user/bin/myshell theApp
If that doesn't work, try momentarily linking /bin/sh to your script.
myshell will call the original shell, but after pattern-replacing the parameters:
#!/bin/bash
sh ${1+"${#/\>\/dev\/null/>>\/your\/log}"}
Something along these lines should work.
You can do this with an already running process by using gdb. See the following page: http://etbe.coker.com.au/2008/02/27/redirecting-output-from-a-running-process/
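A hedged sketch of that gdb approach (the PID and log path are placeholders; 02101 is O_WRONLY|O_CREAT|O_APPEND in octal on Linux, and calling functions in the attached process requires suitable ptrace permissions):
gdb -p "$PID" -batch \
    -ex 'call (int) open("/tmp/capture.log", 02101, 0600)' \
    -ex 'call (int) dup2($1, 1)' \
    -ex 'call (int) dup2($1, 2)' \
    -ex 'detach'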
Can you create an alias for that command? If so, alias it to another command that dumps output to a file.
The device file /dev/tty references your application's controlling terminal - if that hasn't changed, then this should work:
freopen("/dev/tty", "w", stdout);
freopen("/dev/tty", "w", stderr);
Alternatively, you can reopen them to point to a log file:
freopen("/var/log/myapp.log", "a", stdout);
freopen("/var/log/myapp.err", "a", stderr);
EDIT: This is NOT a good idea and certainly not worth trying unless you know what this can break. It works for me, may work for you as well.
Ok, This is a really bad hack and probably not worth doing. Assuming that none of the other commands works, and you simply do not have access to the binary/application (which contains the command with /dev/null) and you cannot re-direct the output to other file (by replacing /dev/null).
Then, you can delete /dev/null ($> rm /dev/null) and create your own file in its place (preferably with a soft link) where all the data can be directed. When you are done, you can recreate /dev/null using the following command:
$> mknod -m 666 /dev/null c 1 3
Just to be very clear, this is a bad hack and certainly requires root permissions to work. There's a high chance that your redirected file will contain data from many other applications and binaries that are running and use /dev/null as a sink.
It may not exactly redirect, but it lets you see the output wherever it's being sent:
strace -ewrite -p $PID
It's not that clean (it shows lines like write(#, ...)), but it works! (And it's a one-liner.) You might also dislike the fact that the arguments are abbreviated; to control that, use the -s parameter, which sets the maximum length of strings displayed.
It catches all streams, so you might want to filter that somehow.
You can filter it:
strace -ewrite -p $PID 2>&1 | grep "write(1"
shows only descriptor 1 calls. 2>&1 is to redirect stderr to stdout, as strace writes to stderr by default.
In perl, if you just want to redirect STDOUT to something slightly more useful, you can just do something like:
open STDOUT, '>>', '/var/log/myscript.log';
open STDERR, '>>', '/var/log/myscript.err';
at the beginning of your script, and that'll redirect it for the rest of your script.
Along the lines of e-t172's answer, can you set the last switch to (or append to it):
; echo
If you can put something inline before passing things to /dev/null (not sure if you are dealing with a hardcoded command), you could use tee to redirect to something of your choice.
Example from Wikipedia which allows escalation of a command:
echo "Body of file..." | sudo tee root_owned_file > /dev/null
http://en.wikipedia.org/wiki/Tee_(command)
