Development environment: mobile app in Android
I'm looking for a way to uniquely identify files in a FAT32/VFAT file system (which has no inodes).
I thought about hashing (SHA1?) the full path. The problem with this solution is that it doesn't support moving/renaming.
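To make the limitation concrete, here is a minimal Python sketch of the path-hashing idea. The paths are made up for illustration, and the helper name `path_id` is hypothetical.

```python
import hashlib


def path_id(path: str) -> str:
    """Return the hex SHA-1 digest of the path string (not of the file contents)."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


# Hypothetical paths: the same file before and after being moved.
before = path_id("/sdcard/DCIM/photo.jpg")
after = path_id("/sdcard/Pictures/photo.jpg")

print(before == after)  # False: the ID changes as soon as the path changes
```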
Is there something better, that will hold even when moving/renaming the file?
Thanks
Unfortunately, FAT doesn't have unique file IDs. When they are needed, various system components emulate them by keeping a list of all files on the filesystem in memory (so the ID is unique and valid only while the system is running).
Depending on what you control (a filesystem driver, a filter, or just a user-mode application), you can potentially do the same: keep a list of files and provide a unique ID based on that list.
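As a rough sketch of the user-mode variant, assuming the application itself performs (or is notified of) every move or rename, such a list could look like the following in Python. The class and method names are hypothetical, and the IDs live only as long as the process does.

```python
import itertools
import os


class FileIdRegistry:
    """Hypothetical in-memory table that hands out IDs for paths it has seen."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._id_by_path = {}

    def scan(self, root):
        """Walk the tree and assign an ID to every file not yet listed."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                self.id_for(os.path.join(dirpath, name))

    def id_for(self, path):
        """Return the ID for path, assigning a new one on first sight."""
        if path not in self._id_by_path:
            self._id_by_path[path] = next(self._next_id)
        return self._id_by_path[path]

    def renamed(self, old_path, new_path):
        """Carry the ID over when the application is told about a move."""
        self._id_by_path[new_path] = self._id_by_path.pop(old_path)


registry = FileIdRegistry()
file_id = registry.id_for("/sdcard/notes/a.txt")  # made-up path
registry.renamed("/sdcard/notes/a.txt", "/sdcard/archive/a.txt")
print(registry.id_for("/sdcard/archive/a.txt") == file_id)  # True
```

A move that happens behind the application's back (for example, while the card is mounted on a PC) is invisible to this table, which is the same limitation the in-memory system lists have.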
Related
I'm developing change tracking software to monitor the files of a specific volume. I tried FileSystemWatcher (.NET) and alternate data streams, but they both have limitations (e.g. the change tracking software has to be running 24/7, alternate data streams do not work for read-only files, etc.).
After some investigation I thought that I could read the NTFS change journal directly. This works great if the file is moved, renamed, etc. on the same volume. To identify the file I'm using the File Reference Number.
But if the file is moved to another volume, the File Reference Number naturally changes.
My question:
Is there a unique ID (GUID or something else) that doesn't change even if the file is moved to another volume?
Well...there can be a file GUID, but it's not there by default.
If you have the necessary permissions, you can race through the files and assign a GUID which will be preserved across NTFS volume moves. Your stated goal is exactly why the feature exists. It uses a somewhat unwieldy API called DeviceIoControl...which is used for a gazillion purposes...but one of its control codes is FSCTL_CREATE_OR_GET_OBJECT_ID. Check here for details.
It only creates the GUID if one hasn't already been assigned...which is just how you want it to work. Of course, if the file moves to a non-NTFS volume, you're still outta luck.
Is there any good way to store a Lucene index in a database without an external library that touches the connection layer (like JDBCDirectory), and also without using the file system (even temporarily)? RAMDirectory would be fine for me if I could get specific parts of the index from it: the .cfs "file" and the segments file. I don't know if it's doable. I'd be thankful for any help.
When building web applications we often have files associated with database entries, e.g. we have a user table and each user has an avatar field, which holds the path to the associated image.
To make sure there are no conflicts in filenames we can either:
rename files upon upload to ID.jpg; the path would be then /user-avatars/ID.jpg
or create a sub-directory for each entity, and leave the original filename intact; the path would be then /user-avatars/ID/original_filename.jpg
where ID is the user's unique ID number
Both are perfectly valid from the application logic's point of view.
But which one would be better from a filesystem performance point of view? We have to keep in mind that the number of entries can be very high (millions).
Is there any limit to a number of sub-directories a directory can hold?
It's going to depend on your file system, but I'm going to assume you're talking about something simple like ext3, and you're not running a distributed file system (some of which are quite good at this). In general, file systems perform poorly over a certain number of entries in a single directory, regardless of whether those entries are directories or files. So whether you create one directory per image or put every image in the root directory, you will run into scaling problems. If you look at this answer:
How many files in a directory is too many (on Windows and Linux)?
You'll see that ext3 allows only about 32K subdirectories in a single directory, far fewer than you're proposing.
Off the top of my head, I'd suggest doing some rudimentary sharding into a multilevel directory tree, something like /user-avatars/1/2/12345/original_filename.jpg. (Or something appropriate for your type of ID, but I am interpreting your question to be about numeric IDs.) Doing that will also make your life easier later when you decide you want to distribute across a storage cluster, since you can spread the directories around.
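A minimal Python sketch of that layout, assuming positive numeric IDs; the function name and the `/user-avatars` default are only illustrative:

```python
import posixpath


def sharded_avatar_path(user_id: int, filename: str, root: str = "/user-avatars") -> str:
    """Build root/<1st digit>/<2nd digit>/<id>/<filename> for a numeric ID."""
    digits = str(user_id).zfill(2)  # pad so single-digit IDs still yield two levels
    return posixpath.join(root, digits[0], digits[1], str(user_id), filename)


print(sharded_avatar_path(12345, "original_filename.jpg"))
# /user-avatars/1/2/12345/original_filename.jpg
```

With sequential IDs the leading digits are not evenly distributed, so sharding on the trailing digits instead may spread entries more uniformly across the shard directories.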
Millions of entries (either files or directories) in one parent directory would be hard to deal with for any filesystem. While modern filesystems use sorting and various tree algorithms for quick search for the needed files, even navigating to the folder with Windows Explorer or Midnight Commander or any other file manager will be complicated as the file manager would have to read contents of the directory. The same applies to file search. So subdirectories are preferred for this.
I should note, though, that access to a particular file would be a bit faster when all files are in one directory than when they are separated into subdirectories, at least on NTFS (I measured this myself several times with 400K files).
I've been having a very similar issue with HTML files rather than images, trying to store millions of them on an Ubuntu server with ext4. I ended up running my own benchmarks and found that a flat directory performs way better while being way simpler to use:
Reference: article
If you really want to use files, maybe your best bet is to partition the files off into several subdirectories so that you don't hit a limit. For example, if you have an ID 123456, you can put it in /12/34/56.jpg.
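One way to derive such a path, sketched in Python under the assumption that IDs fit in a fixed number of digits (the `width` parameter and the function name are hypothetical):

```python
def pair_path(file_id: int, ext: str = ".jpg", width: int = 6) -> str:
    """Split a zero-padded ID into two-digit chunks: 123456 -> /12/34/56.jpg."""
    digits = str(file_id).zfill(width)
    if len(digits) > width:
        raise ValueError("ID has more digits than the chosen width")
    pairs = [digits[i:i + 2] for i in range(0, width, 2)]
    return "/" + "/".join(pairs) + ext


print(pair_path(123456))  # /12/34/56.jpg
print(pair_path(42))      # /00/00/42.jpg
```

With two digits per level, no directory ever holds more than 100 entries.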
However, I would recommend just using the database to store this data since you are already using one. You can store the image data and ID in the same table, and you don't have to worry about some of the pesky business of dealing with files like making sure the permissions are set right, etc.
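As a rough illustration of the database route, here is a Python sketch using SQLite. The table and column names are made up, and a production database would need its own schema and size considerations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE user_avatar ("
    "user_id INTEGER PRIMARY KEY, filename TEXT, image BLOB)"
)


def save_avatar(conn, user_id, filename, data):
    """Store (or overwrite) the avatar bytes for one user."""
    conn.execute(
        "INSERT OR REPLACE INTO user_avatar (user_id, filename, image) "
        "VALUES (?, ?, ?)",
        (user_id, filename, data),
    )
    conn.commit()


def load_avatar(conn, user_id):
    """Return the stored bytes, or None if the user has no avatar."""
    row = conn.execute(
        "SELECT image FROM user_avatar WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


save_avatar(conn, 123456, "me.jpg", b"\xff\xd8\xff placeholder bytes")
print(load_avatar(conn, 123456)[:3])  # b'\xff\xd8\xff'
```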
Yes, I know. This question has already been answered in Where to store the Core Data file? and in Store coredata file outside of documents directory?.
@Kendall Helmstetter Gelner and @Matthias Bauch provided very good replies. I upvoted them.
Now my question is quite conceptual and I'll try to explain it.
In the "Where You Should Put Your App’s Files" section of the Apple docs, I've read the following:
Handle support files — files your application downloads or generates and
can recreate as needed — in one of two ways:
In iOS 5.0 and earlier, put support files in the /Library/Caches directory to prevent them from being
backed up
In iOS 5.0.1 and later, put support files in the /Library/Application Support directory and apply the
com.apple.MobileBackup extended attribute to them. This attribute
prevents the files from being backed up to iTunes or iCloud. If you
have a large number of support files, you may store them in a custom
subdirectory and apply the extended attribute to just the directory.
Apple says that support files can be handled in two different ways, depending on the installed iOS version. In my opinion (but maybe I'm wrong) a Core Data file is a support file and so it falls into this category.
That said, does the approach by Matthias and Kendall continue to be valid or not? In particular, if I create a directory, say Private, within the Library folder, does this directory remain hidden in both iOS 5 versions (5.0 and 5.0.1), or do I need to follow Apple's solution? If the latter, could you provide a sample or link?
Thank you in advance.
I would say that a Core Data file is not really a support file - unless you have some way to replicate the data stored, then you would want it backed up.
The support files are more things like images, or databases that are only caches for a remote web site.
So, you could continue to place your Core Data databases where you like (though it should be under Application Support).
Recent addition as of Jan 2013: Apple has started treating pre-loaded Core Data stores that you copy from the bundle into a writable area as if they were support files, even if you also write user data into the same databases. The solution (from DTS) is to set the do-not-backup flag when you copy the databases into place, and then un-set it if user data is written into the database.
If your CoreData store is purely a cache of downloaded network data, continue to make sure it goes someplace like Caches or has the Do Not Backup flag set.
I have very little idea what a database file system is.
Can somebody out here explain to me what actually a database file system is, and what its applications are?
How is it different from a conventional file system?
How can I build it?
Typical file systems (*nix, MS-DOS, etc.) organize files hierarchically. For example,
c:\ represents the top of a hierarchy
c:\foo is the next level in the hierarchy
c:\foo\bar is a sub-node of \foo
etc.
Each file exists in one and only one location in this hierarchy.
By contrast, a database file system organizes files by metadata attributes, for example topic, type, author, etc. Rather than existing in one particular place in a hierarchy, the file exists in multiple "places" depending on its attributes.
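The lookup model can be illustrated with a toy Python index. This is only a sketch of attribute-based retrieval, not a file system, and every name in it is hypothetical.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy index: a file is reachable through every attribute it carries."""

    def __init__(self):
        self._files_by_attr = defaultdict(set)

    def add(self, file_id, **attrs):
        """Register a file under each (attribute, value) pair."""
        for key, value in attrs.items():
            self._files_by_attr[(key, value)].add(file_id)

    def find(self, **attrs):
        """Return the files that match all of the given attributes."""
        sets = [self._files_by_attr.get((k, v), set()) for k, v in attrs.items()]
        return set.intersection(*sets) if sets else set()


index = MetadataIndex()
index.add("report.pdf", topic="storage", type="pdf", author="alice")
index.add("notes.txt", topic="storage", type="text", author="bob")

print(sorted(index.find(topic="storage")))                  # ['notes.txt', 'report.pdf']
print(sorted(index.find(topic="storage", author="alice")))  # ['report.pdf']
```

The same file shows up under `topic`, `type`, and `author` queries alike, which is the sense in which it exists in multiple "places".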
The last question you ask is unanswerable.
Found some good links
DBFS (This one is really good)
Towards A Single Folder Filesystem
It's a file system where files have significant amounts of metadata. For example, the iTunes library might count as a database file system; not only do you have files on disk and know where they are, but you have tags (genres) and other metadata like author (artist).
It's a file system that stores files as blobs in a database, rather than in a hierarchy of directories. Imagine a web-site with no "directory-like" hierarchy in the URL - just loads of tags and categories and a big "search" field - something like that, only on your hard-drive.
Pros & cons? Ask yourself, how many database filesystems have I ever seen? Do you need to ask more?