How to "defragment" a directory on ext3? - filesystems

I am running a daemon which analyses files in a directory and then deletes them. If the daemon is not running for whatever reason, files pile up in that directory. Today I had 90k files in there; after starting the daemon again, it processed all of them.
However, the directory itself remains large: "ls -dh ." reports a size of 5.6M. How can I "defragment" that directory? I already figured out that renaming the directory and creating a new one with the same name and permissions solves the problem. However, since files get written in there at any time, there doesn't seem to be a safe way to do that rename-and-recreate: for a moment, the target directory does not exist.
So: a) is there a way, or a (shell) program, to defragment directories on an ext3 filesystem? Or b) is there a way to lock a directory so that attempts to write files block until the rename/create has finished?

"Optimize directories in filesystem. This option causes e2fsck to try to optimize all directories, either by reindexing them if the filesystem supports directory indexing, or by sorting and compressing directories for smaller directories, or for filesystems using traditional linear directories." -- fsck.ext3 -D
Of course this should not be done on a mounted filesystem.
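For example, assuming the filesystem lives on /dev/sdXN (a placeholder device name):

umount /dev/sdXN
e2fsck -f -D /dev/sdXN
mount /dev/sdXN   # remount (assuming an fstab entry)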

Not really applicable to Ext3, but maybe useful for users of other filesystems:
According to https://wiki.archlinux.org/index.php/Btrfs#Defragmentation, Btrfs can apparently defragment the metadata of a directory: btrfs filesystem defragment / will defragment the metadata of the root folder. This uses the online defragmentation support of Btrfs.
Ext4 does support online defragmentation (with e4defrag), but this doesn't seem to apply to directory metadata (according to http://sourceforge.net/p/e2fsprogs/bugs/326/).
I haven't tried either of these solutions, though.

I'm not aware of a way to reclaim free space from within a directory.
5MB isn't very much space, so it may be easiest to just ignore it. If this problem (files stacking up in the directory) occurs on a regular basis, that space will be reused the next time the directory fills up.
If you desperately need to shrink the directory, here's an (ugly) hack that might work.
Replace the directory with a symbolic link to an empty directory. If the problem recurs, you can create a new empty directory and then change the symlink to point to it. Changing the symlink should be atomic, so you won't lose any incoming files. Then you can safely empty and delete the old directory.
[Edited to add: It turns out that this does not work. As Bada points out in the comments you can't atomically change a symlink in the way I suggested. This leaves me with my original point. File systems I'm familiar with don't provide a mechanism to reclaim free space within directory blocks.]

Related

How to get a unique file identity across platforms and filesystems in Go?

I'm looking for a way to get a file's inode on Linux and a file identity on Windows, or any other unique file identity that is as cross-platform as possible and more efficient than hashing file contents. The platforms are Linux, Windows, and Mac.
The purpose of this is to be able to track files as efficiently as possible, so the program can locate them after the user has renamed or moved them. If I have a file path + file ID and the file is not at the path, I need a way to find the new file path on all mounted volumes (or fail if it has been deleted). Hashing all files is prohibitively expensive and creates a whole bunch of new error possibilities, since files are going to be updated and renamed regularly.
I'm hoping there is an existing library but haven't found anything cross-platform yet. If there is none, I'd be very grateful for help on how to use syscalls in x/sys, and to hear about any limitations inodes and file identities have.
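For the Linux side, a minimal sketch of the inode approach, assuming the golang.org/x/sys/unix package the question mentions (on Windows, the analogous identity is the volume serial number plus file index from GetFileInformationByHandle, exposed via x/sys/windows):

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

// FileID identifies a file on a single machine: device number plus
// inode number. It survives renames and moves within one filesystem,
// but not copies across filesystems.
type FileID struct {
    Dev uint64
    Ino uint64
}

// GetFileID returns the (device, inode) pair for path via stat(2).
func GetFileID(path string) (FileID, error) {
    var st unix.Stat_t
    if err := unix.Stat(path, &st); err != nil {
        return FileID{}, err
    }
    return FileID{Dev: uint64(st.Dev), Ino: uint64(st.Ino)}, nil
}

func main() {
    id, err := GetFileID("/etc/hostname")
    if err != nil {
        panic(err)
    }
    fmt.Printf("dev=%d ino=%d\n", id.Dev, id.Ino)
}

One caveat: a (device, inode) pair identifies a file only while it exists, and inode numbers can be reused after deletion, so re-locating a moved file still means scanning mounted volumes for a matching pair.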

What is the meaning of the STAGING_DIR variable in a makefile?

I know this is a noob question, but I couldn't find a good explanation...
I am currently learning how to work with makefiles, and I sometimes get an error like 'STAGING_DIR not set' when running make.
I know I can export a directory as the staging dir and the warning will go away, but what is the meaning of it? And what does make do with it?
If it is in any documentation, a link will be more than enough. Thanks :)
There is no meaning for STAGING_DIR built into make, but the name STAGING_DIR suggests the designer of the makefile intended it to be the name of a directory where some sort of intermediate files would be put.
Most likely you can assign it the path of any directory you create where files built during the build may be placed. However, inspection of the makefile would be necessary to determine how it is used. If it is used solely for intermediate files, and not permanent files, then you can use a temporary directory and remove it when you are done building, or you might use a semi-permanent directory with the benefit of faster builds when you build incrementally. (What constitutes a temporary or semi-permanent directory is mostly a matter of convenience: you might put a temporary directory in a common path used for such things, hopefully on a disk volume with available space and good performance, possibly a directory pointed to by $TMPDIR. Some areas designated for temporary directories may be automatically cleaned up by the operating system at times. You might put a semi-permanent directory somewhere in your own home directory and remove it at your convenience.)
In spite of the name, STAGING_DIR might be used for final output files of the build, in which case using a temporary directory that is removed frequently might not be suitable. Again, inspection of the makefile is necessary to see how it is used.
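For illustration, a 'STAGING_DIR not set' error typically comes from a guard near the top of the makefile, something like this hypothetical sketch (the exact wording varies by project):

# Fail early with a clear message if the caller didn't provide
# a staging directory.
ifndef STAGING_DIR
$(error STAGING_DIR not set)
endif

The guard is then satisfied on the command line, e.g. make STAGING_DIR=$HOME/build/stage, and recipes refer to $(STAGING_DIR) when copying build products into it.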

mkstemp and hard disk stress

Are temporary files created with mkstemp synced to disk?
Here is what I have:
A program creates a temporary file using mkstemp and sends the fd to another program.
This temporary file is mmapped by both programs and used heavily (up to 400 MB/sec of writes and 400 MB/sec of reads; up to 60 reads and writes per second).
I can't use memfd_create (it may not be supported on the target devices).
Let's also assume (and this is almost true) that I can't create this file on tmpfs (like in /tmp).
What I need is a guarantee that such a file will not stress the hard disk. I can't allow it to be written to disk, even if that only happens once every 5 seconds. If I can't get such a guarantee, I will look for another way.
Additional info (not important):
I am writing a Wayland compositor for Android devices. Currently the temporary files (Wayland surfaces, actually) are created on tmpfs, and everything works fine as long as SELinux is not enabled. But if I enable SELinux, it prevents fds from being transferred from client to compositor. The only solution I currently know of is to create the temporary files in the app's home dir. But if that approach is dangerous, I will find another.
Are temporary files created with mkstemp synced to disk?
The mkstemp function does not impart any special properties to files it opens that would prevent them from being synced to disk. The filesystem on which they are created might have such a property, but that's independent of file creation. In particular, files created via mkstemp() will persist indefinitely if not removed.
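As an illustration of that last point, the usual defense is to unlink the file immediately while keeping the descriptor open; a minimal sketch of the pattern in Go, using os.CreateTemp as the analogue of mkstemp:

package main

import "os"

func main() {
    // Create a uniquely named temporary file, as mkstemp would.
    f, err := os.CreateTemp("", "surface-*")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Unlink it immediately: the name disappears, but the open
    // descriptor keeps the inode and its data alive until closed,
    // so the fd can still be mmapped or passed to another process.
    if err := os.Remove(f.Name()); err != nil {
        panic(err)
    }

    // ... use f: write to it, mmap it, send the fd over a socket ...
    if _, err := f.WriteString("still usable after unlink\n"); err != nil {
        panic(err)
    }
}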
What I need is a guarantee that such a file will not stress the hard disk. I can't allow it to be written to disk, even if that only happens once every 5 seconds. If I can't get such a guarantee, I will look for another way.
As far as I am aware, even tmpfs filesystems do not guarantee that their contents will remain locked in memory as opposed to being paged out; they are backed by virtual memory. But if the actual file is comparatively small and all its pages are hot, then they are likely to remain in memory.
With regard to the larger problem,
everything works fine as long as SELinux is not enabled. But if I enable SELinux, it prevents fds from being transferred from client to compositor. The only solution I currently know of is to create the temporary files in the app's home dir.
By default, newly created files inherit the SELinux type of their parent directory. Your Wayland clients presumably do not have sufficient privilege to modify the SELinux labels of the files they create, but you should be able to administratively create a directory wherever you like with a label conducive to your needs. For example, you could have a subdirectory of /dev/shm created for the purpose (at every boot) and relabeled with chcon to an appropriate type. If the clients create their temp files there, then those files should inherit the SELinux type you chose.
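A sketch of that boot-time setup (the directory name and SELinux type are placeholders; the right type depends on your policy):

mkdir -p /dev/shm/wayland-tmp
chcon u:object_r:wayland_tmp_t:s0 /dev/shm/wayland-tmp   # placeholder type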

Why should the HashiCorp Packer file provisioner use a temp directory?

I wonder what the benefits are of the indirect file upload via the /tmp directory that the Packer file provisioner doc suggests?
It would seem reasonable to upload a file/directory to its destination directly, without any intermediate step.
This is probably recommended because the /tmp directory on any machine will likely fit the requirements mentioned in the docs for the destination parameter:
This value must be a writable location and any parent directories must already exist.
The /tmp directory usually exists and is usually readable and writable by any process on the machine, so it's a good suggestion as a standard destination folder.
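For illustration, the common pattern this enables, sketched in Packer's HCL2 syntax (the source name and paths are placeholders): upload to the always-writable /tmp first, then move into a privileged location with sudo, since the provisioning user often can't write there directly.

build {
  sources = ["source.null.example"]  # placeholder source

  provisioner "file" {
    source      = "app.conf"
    destination = "/tmp/app.conf"
  }

  provisioner "shell" {
    inline = ["sudo mv /tmp/app.conf /etc/app.conf"]
  }
}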

Apply file structure diff/patch on remote system?

Is there a tool that creates a diff of a file structure, perhaps based on an MD5 manifest? My goal is to send a package across the wire that contains new/updated files and a list of files to remove. It needs to copy over new/updated files and remove files that have been deleted from the source file structure.
You might try rsync. Depending on your needs, the command might be as simple as this:
rsync -az --del /path/to/master dup-site:/path/to/duplicate
Quoting from rsync's web site:
rsync is an open source utility that provides fast incremental file transfer. rsync is freely available under the GNU General Public License and is currently being maintained by Wayne Davison.
Or, if you prefer Wikipedia:
rsync is a software application for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction. rsync can copy or display directory contents and copy files, optionally using compression and recursion.
@vfilby I'm in the process of implementing something similar.
I've been using rsync for a while, but it gets funky when deploying to a remote server with permission changes that are out of my control. With rsync you can choose not to include permissions, but they still end up being considered for some reason.
I'm now using git diff. This works very well for text files. Diff generates patches, rather than a MANIFEST that you have to include with your files. The nice thing about patches is that there is already an established framework for using and testing them before they're applied.
For example, with the patch utility that comes standard on any *nix box, you can run the patch in dry-run mode. This will tell you whether the patch is actually going to apply cleanly before you run it, which helps you make sure that the files you're updating have not changed while you were preparing the patch.
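For concreteness, a minimal round-trip (filenames are placeholders):

git diff > changes.patch
patch -p1 --dry-run < changes.patch   # verify it would apply cleanly
patch -p1 < changes.patch             # then apply for real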
If this is similar to what you're looking for, I can elaborate on my process.
