What is the most appropriate way to store files?

I am dealing with a problem, and before starting development I decided to do some research.
The question is: what is an efficient way to store files?
I read about it, and some people were against storing files in the database, as it has a negative impact on backups/restores, adds processing time when reading large files from the database, and so on.
A good option would be to use S3 or another cloud solution to store the files, but for this particular customer the cloud won't work.
Another option would be to store the files in the file system. The concept is clear, but I'm trying to figure out what I need to understand before implementing that solution.
For example, we need to consider how to structure the directories: if we stored 100,000 files in one directory, it could become slow to open, and there is also a maximum number of files that can be stored in one directory.
Are there any 3rd-party tools that help manage files in the file system, i.e. that automatically partition files and place them into directories?

I work with software that has more than 10 million files in the file system. How you structure the folders depends on your case, but what I did was:
Create a folder for each entity (Document, Image...).
Inside each entity folder, create a folder for each object, with the object's ID as the folder name, and put its files inside (this can vary).
Example for a Person with ID 15:
ls /storage/Person/15/Image/
will list the 4 images that I linked in the database to the person with ID 15:
Output:
1.jpg
2.png
3.bmp
4.bmp
If you have a HUGE number of elements, you could separate each digit of the ID into a subfolder; that is, the Person with ID 16549 would have this path: /storage/Person/1/6/5/4/9/Image/
About limits on the number of files per folder, I suggest you read this thread: https://stackoverflow.com/a/466596/12914069
About a 3rd-party tool, I don't know of one, but you could certainly build this logic into your own code.
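For illustration, the per-digit sharding described above can be built in a few lines (a Python sketch; `build_path` and the `/storage` root are just example names, not from any library):

```python
def build_path(entity: str, obj_id: int, kind: str, root: str = "/storage") -> str:
    """Split each digit of the ID into its own subdirectory,
    e.g. Person 16549 -> /storage/Person/1/6/5/4/9/Image."""
    digits = "/".join(str(obj_id))  # "16549" -> "1/6/5/4/9"
    return f"{root}/{entity}/{digits}/{kind}"

print(build_path("Person", 16549, "Image"))  # /storage/Person/1/6/5/4/9/Image
```

This keeps each directory small (at most 10 subfolders per level) no matter how many objects you store.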
English isn't my first language, tell me if you didn't understand something.

Related

VB.NET Copying Database template files to selected folder location during installation

I have a .NET project as well as a setup project. During installation it asks the user to enter a file location to store their database. The plan is to have an empty .mdf file, with all the tables set up, copied into that folder, and to store the folder path in a config file.
This is mainly because I am planning on having multiple separate applications that all need to access the same database. I have it storing the folder path in my config file; the only things I'm having trouble with are:
where to store the template files (I don't know if I should do this in the setup project or the main project)
how to copy said template files into a new folder
So far I have been unable to find a solution, so any help is appreciated.
Well, here is what I do in a few of my projects; it has proven reliable enough for me over the years (you may or may not want to do the same):
I have the program itself create the database files in an initialization routine. First however, it creates the sub folders in which the database files will be stored, if they don't already exist.
To do this, the program just checks if the folder exists and if the database file exists and if they do not, it creates them on the spot:
If Not Directory.Exists(gSQLDatabasePathName) Then
    Directory.CreateDirectory(gSQLDatabasePathName)
End If
If Not File.Exists(gSQLiteFullDatabaseName) Then
    ...
I also have the program do some other stuff in the initialization routine, like creating an encryption key to be used when storing / retrieving the data - but that may be more than you need (also, for full disclosure, this has some rare issues that I haven't been able to pin down).
Here are some additional considerations:
I appreciate you have said that you want to give the user the choice of where to store their database files. However, I would suggest storing them in the standard locations
Where is the correct place to store my application specific data?
and only allowing the users to move them if they really need to (for example, if the database needs to be shared over the network), as it will make supporting your app harder if every user has their data stored in a different place.
I have found letting the user see in their options/settings windows where their database is stored is a good idea.
It is also a good idea to encourage them to back those files/directories up, and to create automatic backups of several generations for the user.
Hope this helps.

Storing files in multiple directories vs single directory

I am working on a web server where I have to store many image files of different cities.
There are two choices :
All files can be stored in the same directory or
Files of city1 can be stored in city1 directory, files of city2 in city2 directory and so on. Around 20 such directories will be created.
Does it cause any difference in the response time of the server, robustness of the file system, or any other factor? Which is the better choice?
I would argue for choice 2 because it is better for overview and extensibility. Additionally, a filename that already exists in folder city1 can also be used in folder city2; with solution 1 you might get filename collisions. There is no difference in response time worth mentioning. If you read/write the same files often, the server's cache will play a significant role in access time, and that is the same for both solutions.

File Management for Large Quantity of Files

Before I begin, I would like to express my appreciation for all of the insight I've gained on stackoverflow and everyone who contributes. I have a general question about managing large numbers of files. I'm trying to determine my options, if any. Here it goes.
Currently, I have a large number of files and I'm on Windows 7. What I've been doing is categorizing the files by copying them into folders based on what needs to be processed together. So, I have one set that contains the files by date (for long term storage) and another that contains the copies by category (for processing and calculations). Of course this doubles my data each time. Now I'm having to create more than one set of categories; 3 copies to be exact. This is quadrupling my data.
For the processing side of things, the data ends up in Excel. Originally, all the data was brought into Excel, and all organization and filtering was performed there. This was time-consuming and not easily maintainable over the long term. Later the workload was shifted to the file system itself, which lightened the work in Excel.
The long and short of it is that this is an extremely inefficient use of disk space. What would be a better way of handling this?
Things that have come to mind:
Overlapping Folders
Is there a way to create a folder that only holds references to files, rather than copies of them? That way two folders could reference the same file.
To my understanding, a folder is a file listing the addresses of the files inside it, but on Windows a file can only be contained in one folder.
Microsoft SQL Server
Not sure what could be done here.
Symbolic Links
I'm not an administrator, so I cannot execute the mklink command.
Also, I'm uncertain about any performance issues with this.
A Junction
Apparently not allowed for individual files, only for folders on Windows.
Search folders (*.search-ms)
Maybe I'm missing something, but to my knowledge there is no way to specify individual files to be listed.
Hashing the files
Creating hashes for all the files would allow each file to be stored only once. But then I have no idea how I would manage the hashes.
XML
Maybe I could use xml files to attach meta data to the files and somehow search using them.
Database File System
I recently came across this concept in my search. Not sure how it would apply on Windows.
I have found a partial solution. First, I discovered that the laptop I'm using is actually logged in as Administrator. As an alternative to options 3 and 4, I have decided to use hard-links, which are part of the NTFS file system. However, due to the large number of files, this is unmanageable using the following command from an elevated command prompt:
mklink /h <source\file> <target\file>
Luckily, Hermann Schinagl has created the Link Shell Extension application for Windows Explorer and a very insightful reading of how Junctions, Symbolic Links, and Hard Links work. The only reason that this is currently a partial solution, is due to a separate problem with Windows Explorer, which I intend to post as a separate question. Thank you Hermann.

What is better for performance - many files in one directory, or many subdirectories each with one file?

While building web applications, we often have files associated with database entries, e.g. we have a user table and each user has an avatar field, which holds the path to the associated image.
To make sure there are no conflicts in filenames we can either:
rename files upon upload to ID.jpg; the path would be then /user-avatars/ID.jpg
or create a sub-directory for each entity, and leave the original filename intact; the path would be then /user-avatars/ID/original_filename.jpg
where ID is the user's unique ID number.
Both are perfectly valid from the application logic's point of view.
But which one would be better from the filesystem performance point of view? We have to keep in mind that the number of entries can be very high (millions).
Is there any limit to the number of subdirectories a directory can hold?
It's going to depend on your file system, but I'm going to assume you're talking about something simple like ext3, and that you're not running a distributed file system (some of which are quite good at this). In general, file systems perform poorly over a certain number of entries in a single directory, regardless of whether those entries are directories or files. So whether you create one directory per image or put every image in one directory, you will run into scaling problems. If you look at this answer:
How many files in a directory is too many (on Windows and Linux)?
You'll see that ext3 runs into limits at about 32K entries in a directory, far fewer than you're proposing.
Off the top of my head, I'd suggest doing some rudimentary sharding into a multilevel directory tree, something like /user-avatars/1/2/12345/original_filename.jpg. (Or something appropriate for your type of ID, but I am interpreting your question to be about numeric IDs.) Doing that will also make your life easier later when you decide you want to distribute across a storage cluster, since you can spread the directories around.
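A function for that kind of multilevel tree could look roughly like this (a Python sketch; `avatar_path` is a made-up name, and numeric IDs are assumed as in the question):

```python
def avatar_path(user_id: int, filename: str, levels: int = 2) -> str:
    """Shard on the first digits of the ID,
    e.g. 12345 -> user-avatars/1/2/12345/<filename>."""
    s = str(user_id)
    prefix = s.zfill(levels)[:levels]  # pad short IDs so there are always enough digits
    return "/".join(["user-avatars", *prefix, s, filename])
```

With two levels this caps each directory at 10 subfolders per digit level, and the full ID folder keeps the original filename collision-free.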
Millions of entries (either files or directories) in one parent directory would be hard to deal with for any filesystem. While modern filesystems use sorting and various tree algorithms for quick search for the needed files, even navigating to the folder with Windows Explorer or Midnight Commander or any other file manager will be complicated as the file manager would have to read contents of the directory. The same applies to file search. So subdirectories are preferred for this.
Yet I should note that access to a particular file is a bit faster when all files are in one directory than when they are separated into subdirectories, at least on NTFS (I measured this myself several times with 400K files).
I've been having a very similar issue with HTML files rather than images: trying to store millions of them on an Ubuntu server in ext4. I ended up running my own benchmarks and found that a flat directory performs far better while being far simpler to use:
Reference: article
If you really want to use files, maybe your best bet is to partition the files off into several subdirectories so that you don't hit a limit. For example, if you have an ID 123456, you can put it in /12/34/56.jpg.
However, I would recommend just using the database to store this data since you are already using one. You can store the image data and ID in the same table, and you don't have to worry about some of the pesky business of dealing with files like making sure the permissions are set right, etc.
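As a rough illustration of the database route (a SQLite sketch via Python; the table and column names here are invented, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real database file in practice
conn.execute("CREATE TABLE avatars (user_id INTEGER PRIMARY KEY, image BLOB)")

# store the raw image bytes next to the ID -- no paths or permissions to manage
image_bytes = b"\x89PNG..."  # placeholder; normally the uploaded file's bytes
conn.execute("INSERT INTO avatars (user_id, image) VALUES (?, ?)", (15, image_bytes))
conn.commit()

# fetch the image back by ID
row = conn.execute("SELECT image FROM avatars WHERE user_id = ?", (15,)).fetchone()
```

Backups then cover the images automatically, since they live in the same database you are already dumping.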

Efficiency of searching files in a directory?

I am building a website with a user authentication system that allows each user to upload images to their account. Essentially I am doing this for the experience in web development, so please forgive my ignorance on the topic.
My question involves the efficiency of placing files into a directory. Is it more efficient to create a deeper directory structure or to place all files into one folder? The former seems obvious, but does it not depend on the search algorithm implemented by the file system?
For example:
root/user/2012/A/ B/ C/ D/
root/user/2013/A/ B/ C/ D/
root/user/2014/A/ B/ C/ D/
Or dump all files into a single folder?
root/user/
When an image is retrieved, for example by an <img> tag, which way provides a more efficient result? I have searched Google for information on the topic, but couldn't find anything definitive or at my level of understanding.
Accessing a single file should be roughly equivalent either way. Whether to use a single directory or multiple ones really depends on how you use the file listing. If you expect the user to have thousands of files and you only display a single year at a time, it may make sense to break the directory structure into multiple sections to keep file listings manageable. If you always show all the files, I suspect the single folder may be faster, since with subdirectories you would have to run through multiple directory listings. I would do a few tests based on what you expect your app to deal with. My guess is that a single directory should be fine, unless you expect large numbers of files and can break the listing down.
I don't know what OS you intend to run on, but I'd go with the multiple-directories approach, as some filesystems (NTFS on Windows, for example) slow down horribly when dealing with 10,000+ files in a single directory.
