I have two directory entries, a and b.
Before, a and b point to different inodes.
Afterwards, I want b to point to the same inode as a does.
I want this to be safe, by which I mean that if I fail somewhere, b either points to its original inode or to the a inode. Most especially, I don't want to end up with b disappearing.
mv is atomic when overwriting.
ln appears to not work when the destination already exists.
So it looks like I can say:
ln a tmp
mv tmp b
which in case of failure will leave a 'tmp' file around, which is undesirable but not a disaster.
Is there a better way to do this?
(what I'm actually trying to do is replace files that have identical content with a single inode containing that content, shared between all directory entries)
ln a tmp ; mv tmp b
is in fact the fastest way to do it atomically, as you stated in your question.
(Nitpicker's corner: it would be faster to place both system calls in one program, as sketched below.)
ln a tmp && mv tmp b || rm tmp
seems better, since if ln fails, the mv will not be executed (and will not clutter up stderr with its own failure), and a tmp link left behind by a failed mv gets removed.
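If you do want to put both system calls in one program, as hinted at above, a minimal C sketch could look like this (the function and parameter names are just placeholders):

#include <stdio.h>
#include <unistd.h>

/* Sketch of `ln a tmp && mv tmp b || rm tmp` in a single program. */
int replace_with_link(const char *a, const char *b, const char *tmp)
{
    if (link(a, tmp) != 0) {        /* ln a tmp */
        perror("link");
        return -1;
    }
    if (rename(tmp, b) != 0) {      /* mv tmp b: atomically replaces b */
        perror("rename");
        unlink(tmp);                /* rm tmp: clean up the stray link */
        return -1;
    }
    return 0;
}

Either way, b always refers either to its old inode or to a's inode, since rename replaces the destination atomically.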
Related
Is there a way to modify an individual file within a tar file without having to rewrite the entire archive? I recognize this would probably result in fragmentation.
Is there any other archive format that does this?
First off, you should ask exactly one question per post on Stack Overflow. If you truly want to do frequent writes to the "archive", then you might be better off simply creating a large file, formatting it with a file system of your choice, and then mounting it:
truncate -s $(( 512*1024*1024 )) 512MiB-filesystem.ext4
mkfs.ext4 512MiB-filesystem.ext4
sudo mount -o loop 512MiB-filesystem.ext4 mountpoint
sudo chmod a+w mountpoint/
echo foo > mountpoint/bar
sudo umount mountpoint
As for your question about TAR: it is possible and a fun exercise, but tooling that actually implements this might be lacking. First off, TAR is a very simple file format: it consists of 512 B blocks that can contain either metadata or actual file contents, simply copied from the original file without any compression.
A TAR can actually contain multiple files for the same path and by convention, the last duplicate path wins. This means, in order to "modify" a file, you can simply append a newer version of that file to the TAR:
tar --append --file archive.tar modified-file
This should be fast, but it would grow the archive with every file change, so it should be used sparingly.
If you want even more in-place modifications, they should be possible but there is no tooling yet for that as far as I know. I would like to implement that into ratarmount but I'm not sure when I'll get to it.
File system operations and how to implement them:
Modifying a file:
File size is constant: As long as the file size does not change, we could simply change the file inside the TAR if we know the offset for the file contents in the TAR archive, which ratarmount does have stored in an SQLite database.
File size is quasi constant: Actually, the file size might even change by up to 511 B and it still would be possible to simply update the file inside the TAR as long as it doesn't change the number of required TAR blocks (512 B). This would also require updating the file size in the TAR metadata block and updating the checksum of that metadata block, though.
Required TAR blocks shrink: If the required TAR blocks become fewer than before, then it still would be rather easy to modify the TAR on the fly as outlined above. But we would have to somehow format the unused blocks. We could simply fill them with zeros, but in this case, we would have to call tar with the --ignore-zeros option to still get a valid tar. Without that, all files after that position would suddenly appear lost, so it might be unsuited in some circumstances. But we could also simply fill the empty blocks with dummy data, e.g., a directory metadata entry for the / (root) folder. As long as it contains the same metadata as the actual root folder, it basically is a no-op. It might even be possible to create dummy metadata blocks for invalid paths like . or .. to effectively create blocks that are ignored even without the --ignore-zeros option.
Required TAR blocks grow: This is the most difficult case. If there is simply no space to put the added data, then we might have to delete the file's entry and move the file to the end of the archive (if it isn't already at the end). Removing the file without rewriting everything else in the TAR would be implemented as mentioned above, by either filling the parts with zeros or dummy metadata blocks. At this point, however, we could implement defragmentation techniques, e.g., by keeping track of all empty / dummy blocks in the TAR and looking for fitting places. Or, if we want to append 1 KiB to a 1 GiB file, it might avoid fragmentation better to move a small file right after the 1 GiB file to the end of the TAR, to make space for the 1 KiB to append.
Modifying file metadata:
In General: In general, metadata can be changed by simply changing it in the metadata block and updating the block checksum (a small sketch of the checksum update follows after this list). This does not require rewriting anything else in the archive.
Removals: This is basically the same as file modifications for shrinking block counts. Simply overwrite the space for this file entry with zeros or dummy blocks and maybe keep track of it for writing files into this space at a later time.
Renames: Renames can actually be more tricky than one might think. In most cases, it can also simply be updated, however, there are two problematic cases:
The file name becomes too long: If the file name becomes too long, then the GNU long name extension will allocate further blocks right after the TAR metadata block, which will contain the very long filename. This, however, would require one more block, which might require moving around blocks inside the TAR as outlined for file modifications.
There are file name collisions: If the target path already exists, then simply updating the metadata might not suffice, depending on the order in which the files appear in the TAR: the last one with the same path wins. This might be easy to circumvent by simply forbidding renames to an existing path, or by calling remove on the existing file beforehand.
Create: This is simple: just append the file to the end of the archive. If implemented manually, we might have to find the actual end of the data, because TAR archives have at least 2 (often more) zero-byte blocks after the last valid data, and simply appending new files after those zero blocks would require the --ignore-zeros option.
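Regarding the checksum updates mentioned in the list above: the header checksum is straightforward to recompute in place. Here is a minimal C sketch, assuming a standard 512 B ustar header block (the offsets 148 and 155 come from the ustar layout):

#include <stdio.h>
#include <string.h>

/* Recompute the checksum of a 512 B ustar header block in place. */
static void tar_update_checksum(unsigned char block[512])
{
    /* While summing, the 8-byte checksum field (offset 148) counts as spaces. */
    memset(block + 148, ' ', 8);

    unsigned sum = 0;
    for (int i = 0; i < 512; i++)
        sum += block[i];

    /* Conventional encoding: six octal digits, a NUL, then a space. */
    snprintf((char *)block + 148, 8, "%06o", sum);
    block[155] = ' ';
}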
I am writing a simple encryption program that takes any given file and writes encrypted data to a temporary file, and I am now looking for the most efficient way to replace the original file with its encrypted counterpart.
I know I could just fopen the original with w and copy the encrypted file into it line by line, but I was wondering if there is a more efficient way, such as making the original file's hard link point to the ciphered file, sparing me the need to rewrite the entire file.
On Linux, you could use mv.
And if the two files are not in the same directory, mv would be the better choice for several reasons, including that mv can be given an option so that there is no prompt when a file is overwritten, i.e.:
mv -f tempfile original_newfile
The result will be that tempfile no longer exists and the original name now refers to the contents of the temporary file.
Note: mv manipulates the hard links (directory entries) to do its work.
As suggested by @Chris-Turner and explained by @Jabberwocky, using rename works fine.
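A minimal sketch of that approach in C (the path names and the function name are placeholders for whatever your program uses; this only works if both paths are on the same filesystem):

#include <stdio.h>

/* Atomically replace the original file with the finished temporary file. */
int finish_encryption(const char *tmp_path, const char *original_path)
{
    if (rename(tmp_path, original_path) != 0) {
        perror("rename");
        return -1;
    }
    return 0;   /* original_path now names the encrypted data; tmp_path is gone */
}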
How can I remove an opened file in Linux?
In shell I can do this:
rm -rf /path/to/file_or_directory
But how can I do it in C?
I don't want to use the system() function.
I have seen the unlink and remove functions, but they don't have any flags to force deletion.
The unlink and remove functions force deletion. The rm command does extra checks before it calls one of those functions, but once you answer y, it just uses that function to do the real work.
Well, I hope this answers your question. This program searches the current directory for the filename; you would have to add support for opening a different directory, which shouldn't be too hard. I don't understand the last line of your question, can you elaborate? In any case, flags aren't necessary for remove and unlink (they force delete).
#include <stdio.h>
#include <string.h>

int main(void)
{
    int status;
    char file_name[25];

    printf("Enter the name of the file you wish to delete\n");
    if (fgets(file_name, sizeof file_name, stdin) == NULL)
        return 1;
    file_name[strcspn(file_name, "\n")] = '\0';   /* strip the trailing newline left by fgets */

    status = remove(file_name);                   /* force deletes a file; directories must be empty */
    if (status == 0)
        printf("%s deleted successfully.\n", file_name);
    else
    {
        printf("Unable to delete the file\n");
        perror("Error");
    }
    return 0;
}
To perform a recursive removal, you have to write a moderately complicated program which performs a file system walk. ISO C has no library features for this; it requires platform-specific functions for scanning the directory structure recursively.
On POSIX systems you can use opendir, readdir and closedir to walk individual directories, and use recursion in your programming language to handle subdirectories. The function ftw and its newer variant nftw perform an encapsulated file system walk; you just supply a callback function to process the visited paths. nftw is better because it has a flags argument with which you can specify FTW_DEPTH to do the search depth first: visit the contents of a directory before reporting the directory itself. That, of course, is what you want for recursive deletion.
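A minimal sketch of such an nftw-based recursive delete (the helper name rm_rf is just illustrative):

#define _XOPEN_SOURCE 500   /* for nftw */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Callback: remove each visited entry. With FTW_DEPTH the contents of a
   directory are visited before the directory itself, so removing the
   (now empty) directory succeeds. */
static int rm_entry(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
    (void)sb; (void)typeflag; (void)ftwbuf;
    if (remove(path) != 0) {
        perror(path);
        return -1;              /* a non-zero return stops the walk */
    }
    return 0;
}

int rm_rf(const char *path)
{
    /* FTW_PHYS: do not follow symbolic links; 16 limits open directory fds */
    return nftw(path, rm_entry, 16, FTW_DEPTH | FTW_PHYS);
}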
On MS Windows, there are FindFirstFile and FindNextFile to cobble together a recursive traversal.
About -f: it only suppresses certain checks done by the rm program above and beyond what the operating system requires. Without -f, you are prompted before deleting a read-only file, even though on a Unix-like system only the directory's write permission is relevant for deletion, not the file's own permissions. The remove library function doesn't have such a check.
By the way, remove is in ISO C, so it is platform-independent. On POSIX systems, it calls rmdir for directories and unlink for other objects. So remove is not only portable, but lets you not worry about what type of thing you're deleting. If a directory is being removed, it has to be empty though. (Not a requirement of the remove function itself, but of mainstream operating systems that support it).
remove or unlink is basically equivalent to rm -f already--that is, it removes the specified item without prompting for further input.
If you want something equivalent to rm -r, you'll need to code up walking through the directory structure and deleting items individually. Boost Filesystem (for one example) has code to let you do that fairly simply while keeping the code reasonably portable.
I'm on Linux. I have a list of files and I'd like to overwrite them with zeros and remove them. I tried using
srm file1 file2 file3 ...
but it's too slow (I have to overwrite and remove ~50 GB of data) and I don't need that kind of security (I know that srm does a lot of passes instead of a single pass with zeros).
I know I could overwrite every single file using the command
cat /dev/zero > file1
and then remove it with rm, but I can't do that manually for every single file.
Is there a command like srm that does a single pass of zeros, or maybe a script that can do cat /dev/zero on a list of files instead of on a single one? Thank you.
Something like this, using stat to get the correct size to write, and dd to overwrite the file, might be what you need:
for f in $(<list_of_files.txt)
do
  # stat: %b = number of allocated blocks, %B = size in bytes of each block reported by %b
  read blocks blocksize < <(stat -c "%b %B" "$f")
  # overwrite in place; conv=notrunc avoids truncating the file (which would release its blocks)
  dd if=/dev/zero bs="$blocksize" count="$blocks" of="$f" conv=notrunc
  rm "$f"
done
Use /dev/urandom instead of /dev/zero for (slightly) better erasure semantics.
Edit: added conv=notrunc option to dd invocation to avoid truncating the file when it's opened for writing, which would cause the associated storage to be released before it's overwritten.
I use shred for doing this.
The following are the options that I generally use.
shred -n 3 -z <filename> - This will make 3 passes to overwrite the file with random data, and then a final pass overwriting the file with zeros. The file will remain on disk, but its contents on disk will be all zeros.
shred -n 3 -z -u <filename> - Similar to above, but also unlinks (i.e. deletes) the file. The default option for deleting is wipesync, which is the most secure but also the slowest. Check the man pages for more options.
Note: -n is used here to control the number of passes of random data. Increasing this number makes the shred operation take longer, but shreds more thoroughly. I think 3 is enough, but I may be wrong.
The purpose of srm is to destroy the data in the file before releasing its blocks.
cat /dev/null > file is not at all equivalent to srm because
it does not destroy the data in the file: the blocks will be released with the original data intact.
Using /dev/zero instead of /dev/null does not even work, because cat /dev/zero never terminates.
Redirecting the output of a program to the file will not work either, for the same reason given for cat /dev/null: the file is truncated first, so its blocks are released before anything is written.
You need a special-purpose program that opens the given file for writing, writes zeros over all bytes of the file, and then removes the file. That's what srm does.
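Such a program is simple to sketch in C. This is only an illustration (a single pass of zeros, with no handling of sparse files or of filesystems that write new data to different blocks):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

/* Overwrite every byte of the file with zeros, flush, then unlink it. */
int zero_and_remove(const char *path)
{
    int fd = open(path, O_WRONLY);          /* no O_TRUNC: keep the blocks */
    if (fd < 0) { perror(path); return -1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return -1; }

    char buf[65536];
    memset(buf, 0, sizeof buf);

    off_t left = st.st_size;
    while (left > 0) {
        size_t chunk = left > (off_t)sizeof buf ? sizeof buf : (size_t)left;
        if (write(fd, buf, chunk) != (ssize_t)chunk) {
            perror("write");
            close(fd);
            return -1;
        }
        left -= chunk;
    }

    fsync(fd);                               /* push the zeros to the device */
    close(fd);
    return unlink(path);
}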
Is there a command like srm that does a single pass of zeros,
Yes, srm does this with the right parameters. From man srm:
srm -llz
-l lessens the security. Only two passes are written: one mode with 0xff and a final mode with random values.
-l -l for a second time lessens the security even more: only one random pass is written.
-z wipes the last write with zeros instead of random data
srm -llzr will do the same recursively if wiping a directory.
You can even use srm -llz [file1] [file2] [file3] to wipe multiple files in this way with a single command.
I want to clean all files in a directory on Linux (not deleting them, only clearing their content).
I need to do it in C.
Actually, you really don't need to do it in C. UNIX includes tools that can do just about any task you want.
find . -type f -exec cp /dev/null {} ';'
That particular snippet above will find all files under the current directory and attempt to copy the null device to it, effectively truncating the file to 0 bytes.
You can change the starting (top level) directory, restrict names (with -name '*.jpg' for example), and even restrict it to the current directory (no subdirectories) with -maxdepth 1.
There are many other options with find that you can discover by entering man find into your command line shell. Just don't enter it into Google, you may get more than you bargained for :-)
If the need to use C is an absolutely non-negotiable one, I would still do it this way but with:
system ("find . -type f -exec cp /dev/null {} ';'");
I'm not keen on re-writing software that someone's already put a bucketload of effort into providing for free :-)
If, after my advice, you still want to do it the hard way, you need to look into opendir, readdir and closedir for processing directories, then just use fopen in write mode followed by fclose on each candidate file.
If you want to navigate whole directory structures rather than just the current directory, you'll have to detect directories from readdir and probably recurse through them.
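For the common case of just truncating every regular file directly inside one directory, a minimal sketch (no recursion, minimal error handling, function name purely illustrative) could look like this:

#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Truncate every regular file directly inside `dir` to zero length. */
static void clear_dir(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL) { perror(dir); return; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);

        struct stat st;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                   /* skip ".", "..", subdirectories, etc. */

        FILE *f = fopen(path, "w");     /* "w" truncates the file to 0 bytes */
        if (f != NULL)
            fclose(f);
    }
    closedir(d);
}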
scandir to list them, then for every file:
fopen(path, "w+")
fstat to get the size
fwrite zeroes over the whole file (if that is what you mean by "clear")
fclose
A nice shell variant would be: shred -z directory/*
In Bash:
for i in directory/*; do > "$i"; done
This will preserve ownership and permissions of the file.
Don't do shell work in C! Save a huge amount of time by using the best tool for the job. If this is homework, mark it as such.
You can open the file in write mode and then close it.