What is a database file system?

I have very little idea of what a database file system is.
Can somebody explain what a database file system actually is, and what its applications are?
How is it different from a conventional file system?
How can I build one?

Typical file systems (*nix, MS-DOS, etc.) organize files hierarchically. For example,
c:\ represents the top of a hierarchy
c:\foo is the next level in the hierarchy
c:\foo\bar is a sub-node of \foo
and so on.
Each file exists in one and only one location in this hierarchy.
By contrast, a database file system organizes files by metadata attributes, for example topic, type, or author. Rather than existing in one particular place in a hierarchy, a file appears in multiple "places" depending on its attributes.
The last question you ask is unanswerable.
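As a rough illustration of the attribute-based organization described above (not a real file system), here is a minimal Python sketch that uses SQLite as the index. The table name `attrs`, the helper names `tag` and `find`, and the sample file names are all made up for this example.

```python
import sqlite3


def build_index():
    """Create an in-memory index mapping file paths to (attribute, value) pairs."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE attrs (path TEXT, name TEXT, value TEXT)")
    return db


def tag(db, path, **attributes):
    """Attach metadata attributes (topic, type, author, ...) to a file path."""
    db.executemany(
        "INSERT INTO attrs (path, name, value) VALUES (?, ?, ?)",
        [(path, name, value) for name, value in attributes.items()],
    )


def find(db, name, value):
    """Return every path whose attribute `name` equals `value`, sorted by path."""
    rows = db.execute(
        "SELECT DISTINCT path FROM attrs WHERE name = ? AND value = ? ORDER BY path",
        (name, value),
    )
    return [row[0] for row in rows]


# Hypothetical sample data: the same file is reachable through several attributes.
db = build_index()
tag(db, "report.pdf", topic="finance", type="pdf", author="alice")
tag(db, "notes.txt", topic="finance", type="text", author="bob")

print(find(db, "topic", "finance"))  # both files share this topic
print(find(db, "author", "alice"))   # only the file written by alice
```

The point is only that a file has no single "location": it shows up under every attribute it carries.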

Found some good links
DBFS (This one is really good)
Towards A Single Folder Filesystem

It's a file system where files have significant amounts of metadata. For example, the iTunes library might count as a database file system; not only do you have files on disk and know where they are, but you have tags (genres) and other metadata like author (artist).

It's a file system that stores files as blobs in a database, rather than in a hierarchy of directories. Imagine a web-site with no "directory-like" hierarchy in the URL - just loads of tags and categories and a big "search" field - something like that, only on your hard-drive.
Pros & cons? Ask yourself, how many database filesystems have I ever seen? Do you need to ask more?

Related

What is the most appropriate way to store files?

I am dealing with a problem, and before starting development I decided to do some research.
The problem is: what is an efficient way to store files?
I read about it, and some people were against storing files in the database, because it has a negative impact on backups and restores, adds processing time when reading large files from the database, and so on.
A good option would be to use S3 or another cloud solution to store the files, but for this customer the cloud is not an option.
Another option would be to store the files in the file system. The concept is clear, but I am trying to work out what I need to understand before implementing that solution.
For example, we need to consider how to structure the directories: if we stored 100,000 files in one directory, it could become slow to open, and so on. There is also a maximum number of files that can be stored in one directory.
Are there any third-party tools that help manage files in the file system, for example by automatically partitioning files and placing them in directories?

I work with software that has more than 10 million files in the file system. How you structure the folders depends on your case, but what I did was:
Create a folder for each entity (Document, Image...)
Inside each entity folder, create a folder for each object, named after the object's ID, and put its files inside. This could vary.
Example for a Person with the ID 15:
ls /storage/Person/15/Image/
This gives me the 4 images that are linked to the person with ID 15 in the database:
Output:
1.jpg
2.png
3.bmp
4.bmp
If you have a HUGE number of elements, you could split each digit of the ID into its own subfolder. That is, the Person with ID 16549 will have this path: /storage/Person/1/6/5/4/9/Image/
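The path logic above is simple to build into your own code. This is a minimal sketch, assuming numeric IDs; the function name `entity_dir` and its `split_digits` flag are made up for the example.

```python
from pathlib import PurePosixPath


def entity_dir(root, entity, object_id, kind, split_digits=False):
    """Build the storage directory for one object.

    With split_digits=False the ID is a single folder (/storage/Person/15/Image);
    with split_digits=True every digit of the ID becomes its own folder level.
    """
    if object_id < 0:
        raise ValueError("object_id must be non-negative")
    id_parts = list(str(object_id)) if split_digits else [str(object_id)]
    return PurePosixPath(root, entity, *id_parts, kind)


print(entity_dir("/storage", "Person", 15, "Image"))
print(entity_dir("/storage", "Person", 16549, "Image", split_digits=True))
```

With one digit per level, no ID folder ever holds more than ten subfolders plus its own kind folders, at the cost of deeper paths.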
About the limits on files in a folder, I suggest you read this thread: https://stackoverflow.com/a/466596/12914069
I don't know of a third-party tool, but you could certainly build this logic into your code.
English isn't my first language, so tell me if you didn't understand something.

Database Tables for files in filesystem

I needed to save images on my back end, and finally went with storing them in the file system instead of in the database as blobs. So now I have a different issue: I want to make my database as optimized as possible. Here are my needs and my approaches:
I have these entities:
User
Image
In my file system, I can store the images in directories named after the user ID. So basically:
16
    asd.jpg
    blaBla.jpg
would represent the images of the user with ID 16.
Now, I know I will have a lot of directories and a lot of images, and I know that storing their paths in a database would be better than querying the file system. (Or would the OS know the locations of all the directories, making these tables unnecessary?) However, I was wondering whether I should make a table such as (userId, imagePath), connecting every image to a user ID, or (userId, directoryPath), connecting every user with the path to their directory, and then use something like Files.walk(directoryPath) to list the paths of all the images inside that directory. Which would be the better approach, or is this too opinion-based? A completely different approach or any tips would also be appreciated.
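For comparison, the two options in the question can be sketched side by side. This is only an illustration in Python with SQLite, not a recommendation; the table name `user_image`, the column names, and both helper functions are assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_images_from_table(db, user_id):
    """Option 1: one (user_id, image_path) row per image, looked up via an index."""
    rows = db.execute(
        "SELECT image_path FROM user_image WHERE user_id = ? ORDER BY image_path",
        (user_id,),
    )
    return [row[0] for row in rows]


def list_images_from_directory(storage_root, user_id):
    """Option 2: derive the directory from the user ID and list it on demand."""
    user_dir = Path(storage_root) / str(user_id)
    if not user_dir.is_dir():
        return []
    return sorted(p.name for p in user_dir.iterdir() if p.is_file())


# Build a throwaway storage tree that mirrors the layout in the question.
storage_root = tempfile.mkdtemp()
user_dir = Path(storage_root) / "16"
user_dir.mkdir()
for name in ("asd.jpg", "blaBla.jpg"):
    (user_dir / name).write_bytes(b"")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_image (user_id INTEGER, image_path TEXT)")
db.execute("CREATE INDEX idx_user_image_user ON user_image (user_id)")
db.executemany(
    "INSERT INTO user_image VALUES (?, ?)",
    [(16, "16/asd.jpg"), (16, "16/blaBla.jpg")],
)

print(list_images_from_table(db, 16))
print(list_images_from_directory(storage_root, 16))
```

Note that when the directory name is just the user ID, the (userId, directoryPath) table stores nothing that cannot be computed, whereas a per-image table can also carry extra columns (upload date, size) that a directory listing cannot.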

File Management for Large Quantity of Files

Before I begin, I would like to express my appreciation for all of the insight I've gained on stackoverflow and everyone who contributes. I have a general question about managing large numbers of files. I'm trying to determine my options, if any. Here it goes.
Currently, I have a large number of files and I'm on Windows 7. What I've been doing is categorizing the files by copying them into folders based on what needs to be processed together. So, I have one set that contains the files by date (for long term storage) and another that contains the copies by category (for processing and calculations). Of course this doubles my data each time. Now I'm having to create more than one set of categories; 3 copies to be exact. This is quadrupling my data.
For the processing side of things, the data ends up in Excel. Originally, all the data was brought into Excel, and all organization and filtering was performed there. This was time-consuming and not easily maintainable over the long term. Later the workload was shifted to the file system itself, which lightened the work in Excel.
The long and short of it is that this is an extremely inefficient use of disk space. What would be a better way of handling this?
Things that have come to mind:
1. Overlapping Folders
   Is there a way to create a folder that only holds the address of a file, rather than a copy of the file? That way I could have two folders reference the same file.
   To my understanding, a folder is a file listing the memory addresses of the files inside of it, but on Windows a file can only be contained in one folder.
2. Microsoft SQL Server
   Not sure what could be done here.
3. Symbolic Links
   I'm not an administrator, so I cannot execute the mklink command.
   Also, I'm uncertain about any performance issues with this.
4. A Junction
   Apparently not allowed for individual files, only folders, in Windows.
5. Search folders (*.search-ms)
   Maybe I'm missing something, but to my knowledge there is no way to specify individual files to be listed.
6. Hashing the files
   Creating hashes for all the files would allow the files to be stored once. But then I have no idea how I would handle the hashes.
7. XML
   Maybe I could use XML files to attach metadata to the files and somehow search using them.
8. Database File System
   I recently came across this concept in my search. Not sure how it would apply to Windows.
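To make the hashing idea from the list above more concrete, here is a minimal Python sketch of one way the hashes could be handled: each file is identified by a SHA-256 digest of its content, and a category is just a set of digests, so a file can belong to many categories without being copied. The index layout and the names `new_index`, `add_to_category`, and `files_in_category` are assumptions for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def new_index():
    """files: digest -> stored path; categories: name -> set of digests."""
    return {"files": {}, "categories": {}}


def add_to_category(index, category, path):
    """Register a file under a category by content hash; nothing is copied."""
    digest = file_digest(path)
    index["files"].setdefault(digest, str(path))
    index["categories"].setdefault(category, set()).add(digest)
    return digest


def files_in_category(index, category):
    """Return the stored paths of all files in a category, sorted."""
    digests = index["categories"].get(category, set())
    return sorted(index["files"][d] for d in digests)


# Hypothetical sample: one file on disk, listed under two categories.
workdir = Path(tempfile.mkdtemp())
sample = workdir / "2012-01-05.csv"
sample.write_text("a,b\n1,2\n")

index = new_index()
add_to_category(index, "by-date", sample)
add_to_category(index, "processing-set-A", sample)

print(len(index["files"]))                          # distinct files stored
print(files_in_category(index, "processing-set-A"))
```

In practice the index would need to be persisted (for example in a small database) rather than kept in memory.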

I have found a partial solution. First, I discovered that the laptop I'm using is actually logged in as Administrator. As an alternative to options 3 and 4, I have decided to use hard-links, which are part of the NTFS file system. However, due to the large number of files, this is unmanageable using the following command from an elevated command prompt:
mklink /h <link\file> <target\file>
Luckily, Hermann Schinagl has created the Link Shell Extension application for Windows Explorer, along with a very insightful write-up of how Junctions, Symbolic Links, and Hard Links work. The only reason this is currently a partial solution is a separate problem with Windows Explorer, which I intend to post as a separate question. Thank you, Hermann.
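For anyone who would rather script the hard links than create them one at a time, a minimal Python sketch follows. It relies on `os.link`, which creates a hard link; the helper name `link_into_category` and the folder names are made up for the example. Hard links only work when the link and the original file are on the same volume.

```python
import os
import tempfile
from pathlib import Path


def link_into_category(files, category_dir):
    """Hard-link each file into category_dir instead of copying it."""
    category_dir = Path(category_dir)
    category_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for src in map(Path, files):
        dst = category_dir / src.name
        if not dst.exists():
            os.link(src, dst)  # new directory entry pointing at the same data
        links.append(dst)
    return links


# Hypothetical layout: one dated original, linked into one category folder.
workdir = Path(tempfile.mkdtemp())
by_date = workdir / "by_date"
by_date.mkdir()
original = by_date / "2012-01-05.csv"
original.write_text("a,b\n1,2\n")

links = link_into_category([original], workdir / "category_A")

print(os.path.samefile(original, links[0]))  # whether both names refer to one file
print(os.stat(original).st_nlink)            # how many names point at the data
```

Because both names refer to the same data, editing the file through either path changes what the other path sees.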

What is better for performance - many files in one directory, or many subdirectories each with one file?

When building web applications, we often have files associated with database entries, e.g. we have a user table and each user has an avatar field, which holds the path to the associated image.
To make sure there are no conflicts in filenames we can either:
rename files upon upload to ID.jpg; the path would then be /user-avatars/ID.jpg
or create a sub-directory for each entity, and leave the original filename intact; the path would then be /user-avatars/ID/original_filename.jpg
where ID is the user's unique ID number.
Both are perfectly valid from the application logic's point of view.
But which one would be better from a filesystem performance point of view? We have to keep in mind that the number of user entries can be very high (millions).
Is there any limit to the number of sub-directories a directory can hold?

It's going to depend on your file system, but I'm going to assume you're talking about something simple like ext3, and you're not running a distributed file system (some of which are quite good at this). In general, file systems perform poorly beyond a certain number of entries in a single directory, regardless of whether those entries are directories or files. So whether you create one directory per image or put every image in the root directory, you will run into scaling problems. If you look at this answer:
How many files in a directory is too many (on Windows and Linux)?
You'll see that ext3 runs into limits at about 32K subdirectories in a directory, far fewer than you're proposing.
Off the top of my head, I'd suggest doing some rudimentary sharding into a multilevel directory tree, something like /user-avatars/1/2/12345/original_filename.jpg. (Or something appropriate for your type of ID, but I am interpreting your question to be about numeric IDs.) Doing that will also make your life easier later when you decide you want to distribute across a storage cluster, since you can spread the directories around.
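A minimal sketch of that sharding in Python, assuming numeric IDs and using the first digits of the ID as the shard levels. The function name `sharded_avatar_path` and the `levels` parameter are made up for the example.

```python
from pathlib import PurePosixPath


def sharded_avatar_path(user_id, filename, root="/user-avatars", levels=2):
    """Place a user's file under one directory level per leading ID digit.

    IDs shorter than `levels` digits are zero-padded for the shard part only,
    so ID 7 lands in /user-avatars/0/7/7/.
    """
    digits = str(user_id).zfill(levels)
    shards = list(digits[:levels])
    return PurePosixPath(root, *shards, str(user_id), filename)


print(sharded_avatar_path(12345, "original_filename.jpg"))
print(sharded_avatar_path(7, "a.jpg"))
```

One caveat of this particular choice: leading digits of sequential IDs are not evenly distributed, so sharding on trailing digits or on a hash of the ID spreads entries more uniformly.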

Millions of entries (either files or directories) in one parent directory would be hard for any filesystem to deal with. While modern filesystems use sorting and various tree algorithms to find files quickly, even navigating to the folder with Windows Explorer, Midnight Commander, or any other file manager will be slow, because the file manager has to read the contents of the whole directory. The same applies to file search. So subdirectories are preferred for this.
Still, I should note that access to a particular file is a bit faster when all files are in one directory than when they are spread across subdirectories, at least on NTFS (I measured this myself several times with 400K files).

I've been having a very similar issue with HTML files rather than images, trying to store millions of them on an Ubuntu server on ext4. I ended up running my own benchmarks and found that a flat directory performs much better while being much simpler to use:
Reference: article

If you really want to use files, maybe your best bet is to partition the files off into several subdirectories so that you don't hit a limit. For example, if you have an ID 123456, you can put it in /12/34/56.jpg.
However, I would recommend just using the database to store this data since you are already using one. You can store the image data and ID in the same table, and you don't have to worry about some of the pesky business of dealing with files like making sure the permissions are set right, etc.
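To show what the database option could look like, here is a minimal sketch using Python's built-in `sqlite3` module, with the image bytes stored in a BLOB column next to the ID. The table name `avatar`, its columns, and the helper functions are assumptions for illustration; a production database would differ in the details.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE avatar (user_id INTEGER PRIMARY KEY, filename TEXT, data BLOB)"
)


def save_avatar(db, user_id, filename, data):
    """Store (or overwrite) the image bytes for one user."""
    db.execute(
        "INSERT OR REPLACE INTO avatar (user_id, filename, data) VALUES (?, ?, ?)",
        (user_id, filename, data),
    )


def load_avatar(db, user_id):
    """Return (filename, bytes) for the user, or None if nothing is stored."""
    return db.execute(
        "SELECT filename, data FROM avatar WHERE user_id = ?", (user_id,)
    ).fetchone()


# Placeholder bytes standing in for a real image file.
save_avatar(db, 123456, "me.jpg", b"\xff\xd8\xff placeholder image bytes")
print(load_avatar(db, 123456)[0])
print(load_avatar(db, 1))
```

The trade-offs raised elsewhere on this page (backup size, cost of reading large blobs) still apply to this approach.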

Make use of an unknown database file?

I have some database files I'd like to pull data from (and push to).
The first problem is that I don't know what format the database is in.
Each table (or object) seems to have a separate pair of files, such as ACCOUNT.FS5 and ACCOUNT.IDX. Some of them also have .SAV files.
A friend suggested that they are likely to be Flagship database files, presumably because of the FS5 extension. Edit: this is incorrect, they are not Flagship files, they are database files for the software 'EXACT'.
If this is the case, the second problem is that I don't know how I'd go about querying on these files. I have no schema per se, although the application is capable of exporting the data in csv format. Judging by the unfriendly nature of the csv, I'd imagine it to be pretty closely aligned to the database schema.
Any ideas?

If what you think is in these files is not confidential, I would post the project on one of the freelance sites, like "vWorker", and ask for a complete data extraction there.
You can also specify a destination file format (say, .sqlite) that you know how to deal with.
Hope it helped.
Regards
