I'm thinking of storing a large number of files in a folder and loading them into my C# program. My concern is the performance of opening a file for reading when the folder contains very many files. Will the time to open and read a file be about the same whether the folder holds one file or one million? Does anyone know the complexity? (O(1), O(n), O(n^2)?)
If you are only opening one file, it does not matter whether the folder contains one file or a million, as long as you specify the path correctly.
If you are trying to read all the files and then search through them, then that is different :P
But if you have the exact file path, there is no real difference.
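To illustrate the point: opening a file by its full path never enumerates the folder, so the number of sibling files does not matter. A minimal C# sketch (the path below is made up):

using System;
using System.IO;

class OpenByPathExample
{
    static void Main()
    {
        // Hypothetical file inside a folder that may contain a million other files.
        string path = @"C:\ManyFiles\data_042.bin";

        // The file system resolves the name through its directory index;
        // the folder is never listed, so the sibling count is irrelevant here.
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine($"Read {bytes.Length} bytes from {path}");
    }
}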
I have a long list of files that are auto-produced every month. They all have the same file name, with a sequential file extension, like this: file.001, file.002, file.003
Each file has differing information, despite having the same name. What I need to do is copy them from their home directory and paste them into a new directory with names that reflect their purpose, and as text files, like this: Budget.txt, Expense.txt, Retention.txt
Is it possible to do this with a batch file? I've been unable to find a method that works. Any help would be appreciated.
EDIT: I've tried that solution, and it works as far as it goes. The frustrating thing is that the extensions are not always the same, but they are always sequentially numbered.
I want to test the upload file size limit in an application, and it's a pain finding/making various PDFs of specific sizes to test/debug this. Does anybody have a better way?
You can write a simple shell script that converts a set of images to a PDF (see: How can I convert a series of images to a PDF from the command line on Linux?) and do it for 1, 2, 3, ..., all image files in a certain directory.
Creating a directory full of copies of a single image should be simple too; start with one image file of the desired size, e.g. 64 KB.
# sketch - assumes ImageMagick's convert is installed and ./image is an image of the desired base size (e.g. 64KB)
END=5
for i in $(seq 1 $END); do cp ./image ./image_$i; done
# mydoc_$i.pdf bundles the first $i copies, so the PDFs grow in size with $i
for i in $(seq 1 $END); do convert $(seq -f "./image_%g" 1 $i) mydoc_$i.pdf; done
I've found an online tool; however, it doesn't seem to work correctly, since it only generates 10 MB files even when you tell it to make a 50 MB file.
https://www.fakefilegenerator.com/generate-file.php
I am wondering if there is any reason why the source code of AngularJS must be written in ONE huge file instead of being broken down into a set of files that are then combined into one distribution file.
In C#, given a folder path, is there a way to get the last-modified file without getting all the files?
I need to quickly find folders that have been updated after a certain time; if the last-modified file in a folder is older than that time, I want to skip the folder entirely.
I noticed that a folder's last-modified time does not get updated when one of its files is updated, so this approach doesn't work.
No; this is why Windows comes with indexing to speed up searching. The NTFS file system wasn't designed with fast searching in mind.
In any case, you can monitor file changes, which is not difficult to do (see the sketch below). If your program can run in the background and monitor changes, then this would work. If you need past history, you can do an initial scan once and then build up your hierarchy from there. As long as your program is always running, it will have a current snapshot and never have to do the slow scan.
You can also use Windows Search itself to find the files. If indexing is available, then it's probably as fast as you'll get.
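For the monitoring idea, a minimal sketch using FileSystemWatcher (the folder path is just a placeholder):

using System;
using System.IO;

class WatchFolderExample
{
    static void Main()
    {
        // Hypothetical root folder to monitor.
        using (var watcher = new FileSystemWatcher(@"C:\SomeFolder"))
        {
            watcher.IncludeSubdirectories = true;
            watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName;

            // Here the events are only logged; a real program would record which
            // folders changed so a later pass can skip everything else.
            watcher.Changed += (s, e) => Console.WriteLine($"Changed: {e.FullPath}");
            watcher.Created += (s, e) => Console.WriteLine($"Created: {e.FullPath}");
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching... press Enter to stop.");
            Console.ReadLine();
        }
    }
}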
Try this.
DirectoryInfo di = new DirectoryInfo(strPath); // strPath holds the folder path
DateTime dt = di.LastWriteTime;
Then you should use
Directory.EnumerateFiles(strPath, "*.*", SearchOption.TopDirectoryOnly);
Then loop over the above collection and get a FileInfo for each file.
I don't see a way to get the modified date of a file without getting a FileInfo reference to it; as far as I know, you can't find the last-modified file without enumerating like this.
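A minimal sketch of that approach, assuming strPath holds the folder path:

using System;
using System.IO;
using System.Linq;

class LastModifiedFileExample
{
    static void Main()
    {
        string strPath = @"C:\SomeFolder"; // hypothetical folder path

        // Enumerate the files lazily and keep the one with the newest LastWriteTime.
        FileInfo newest = Directory
            .EnumerateFiles(strPath, "*.*", SearchOption.TopDirectoryOnly)
            .Select(p => new FileInfo(p))
            .OrderByDescending(f => f.LastWriteTime)
            .FirstOrDefault();

        if (newest != null)
            Console.WriteLine($"{newest.FullName} last modified {newest.LastWriteTime}");
    }
}

Note that this still touches each file's metadata once; it just avoids building the full array of files up front.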
I have millions of audio files, named based on GUIDs (http://en.wikipedia.org/wiki/Globally_Unique_Identifier). How can I store these files in the file system so that I can efficiently add more files to the same file system and efficiently look up a particular file? It should also be scalable in the future.
Files are named based on a GUID (unique file name).
E.g.:
[1] 63f4c070-0ab2-102d-adcb-0015f22e2e5c
[2] ba7cd610-f268-102c-b5ac-0013d4a7a2d6
[3] d03cf036-0ab2-102d-adcb-0015f22e2e5c
[4] d3655a36-0ab3-102d-adcb-0015f22e2e5c
Please give your views.
PS: I have already gone through < Storing a large number of images >. I need the particular data structure/algorithm/logic so that it will also be scalable in the future.
EDIT1: There are around 1-2 million files, and the file system is ext3 (CentOS).
Thanks,
Naveen
That's very easy: build a folder tree based on parts of the GUID values.
For example, make 256 folders, each named after the first byte, and store in each one only the files whose GUID starts with that byte. If that's still too many files per folder, do the same within each folder for the second byte of the GUID. Add more levels if needed. Searching for a file will be very fast.
By selecting the number of bytes you use for each level you can effectively choose the tree structure for your scenario.
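A rough C# sketch of that layout, using the first two bytes of the GUID string as two folder levels (the base directory is made up):

using System;
using System.IO;

class GuidTreeExample
{
    // "63f4c070-..." -> /data/audio/63/f4/63f4c070-...
    static string GetPathFor(string guidFileName, string baseDir)
    {
        string level1 = guidFileName.Substring(0, 2); // first byte of the GUID
        string level2 = guidFileName.Substring(2, 2); // second byte of the GUID
        return Path.Combine(baseDir, level1, level2, guidFileName);
    }

    static void Main()
    {
        string path = GetPathFor("63f4c070-0ab2-102d-adcb-0015f22e2e5c", "/data/audio");
        Directory.CreateDirectory(Path.GetDirectoryName(path)); // ensure the folders exist
        Console.WriteLine(path);
    }
}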
I would try to keep the number of files in each directory to some manageable number. The easiest way to do this is to name the subdirectories after the first 2-3 characters of the GUID.
Construct an n-level-deep folder hierarchy to store your files. The names of the nested folders are the first n characters of the corresponding file name. For example, to store the file "63f4c070-0ab2-102d-adcb-0015f22e2e5c" in a four-level-deep hierarchy, construct 6/3/f/4 and place the file there. The depth of the hierarchy depends on the maximum number of files your system can have. For the few million files in my project, a four-level-deep hierarchy works well.
I did the same thing in my project, which has nearly 1 million files. My requirement was also to process the files by traversing this huge list. I constructed a four-level-deep folder hierarchy, and the processing time dropped from nearly 10 minutes to a few seconds.
As an add-on to this optimization: if you want to process all the files in these deep folder hierarchies, then instead of calling a function to list the directories for the first four levels, just precompute all the possible four-level-deep folder names. Since each GUID character can take 16 possible values, there are 16 folders at each of the first four levels, and precomputing the 16*16*16*16 folder paths takes just a few milliseconds. This saves a lot of time when the files are stored at a shared location where listing a single directory can take nearly a second.
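For illustration, a sketch of both ideas in C#: building the four-level path for a file, and precomputing all 16*16*16*16 possible folder paths instead of listing directories (names and base path are made up):

using System;
using System.Collections.Generic;
using System.IO;

class FourLevelHierarchyExample
{
    const string HexChars = "0123456789abcdef";

    // "63f4c070-..." -> baseDir/6/3/f/4/63f4c070-...
    static string GetPathFor(string guidFileName, string baseDir)
    {
        return Path.Combine(baseDir,
            guidFileName[0].ToString(),
            guidFileName[1].ToString(),
            guidFileName[2].ToString(),
            guidFileName[3].ToString(),
            guidFileName);
    }

    // All 65536 possible four-level folder paths, computed locally in a few ms,
    // so no directory listing is needed on the (possibly slow, shared) file system.
    static IEnumerable<string> AllFourLevelFolders(string baseDir)
    {
        foreach (char a in HexChars)
            foreach (char b in HexChars)
                foreach (char c in HexChars)
                    foreach (char d in HexChars)
                        yield return Path.Combine(baseDir,
                            a.ToString(), b.ToString(), c.ToString(), d.ToString());
    }

    static void Main()
    {
        Console.WriteLine(GetPathFor("63f4c070-0ab2-102d-adcb-0015f22e2e5c", "/data/audio"));

        int count = 0;
        foreach (string folder in AllFourLevelFolders("/data/audio")) count++;
        Console.WriteLine(count); // 65536
    }
}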
Sorting the audio files into separate subdirectories may be slower if dir_index is used on the ext3 volume. (dir_index: "Use hashed b-trees to speed up lookups in large directories.")
This command will set the dir_index feature: tune2fs -O dir_index /dev/sda1