OpenStack - File system changing to read-only mode 1 or 2 days after creating the instance; not able to access any files after that - filesystems

OpenStack - File system changing to read-only mode 1 or 2 days after creating the instance; not able to access any files after that. I am using Ubuntu 18.04 instances. I have installed OpenStack on my server, which is also Ubuntu 18.04. I have enough disk space. After this issue occurs, when I try to remount, I am not able to. What is the cause of this kind of issue and how do I resolve it? Kindly help me resolve this. Attaching logs here:
[Recent Log - Part1][1]
[Recent Log - Part2][2]
[dmesg-log][3]
[syslog][4]
[1]: https://i.stack.imgur.com/ISbKl.jpg
[2]: https://i.stack.imgur.com/07Ekg.jpg
[3]: https://i.stack.imgur.com/24Cy8.jpg
[4]: https://i.stack.imgur.com/tf3WM.jpg

According to the dmesg log the disk is corrupted (I/O error dev vda), and fsck doesn't seem to be able to correct the errors; they keep reappearing. If this instance is critical you should mount it read-only and try to save as much data as possible. You could also try to copy the disk with ddrescue to a different device.
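If it comes to that, a minimal recovery sketch could look like the following, assuming the failing disk shows up inside the instance as /dev/vda and that an empty rescue volume is attached as /dev/vdb (both device names are assumptions - check yours with lsblk):
# remount the filesystem read-only to avoid further damage
sudo mount -o remount,ro /
# install GNU ddrescue (package name on Ubuntu) and copy the failing disk,
# keeping a map file so the copy can be resumed after errors
sudo apt install gddrescue
sudo ddrescue -d -r3 /dev/vda /dev/vdb /root/rescue.map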

Instance turns into Read Only file system and no files whatsoever can be edited, deleted or created
I cloned a volume of an instance (create snapshot -> create new volume -> launch new instance).
It turned out that both the existing and the new instance became read-only.
Log on to the affected instance(s) and run fsck on the mounted disks mapped in fstab.
$ sudo cat /etc/fstab
# /etc/fstab: static file system information.
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
# / was on /dev/ubuntu-vg/ubuntu-lv during curtin installation
/dev/disk/by-id/dm-uuid-LVM-do30PLgpOVFMSMdAM4yk61c90yuy3biE4jAn171mXyGByntK0sddMxpZRL1WLPmq / ext4 defaults 0 0
# /boot was on /dev/vda2 during curtin installation
/dev/disk/by-uuid/1b94eedc-be13-45b6-b7c1-f9892b69296e /boot ext4 defaults 0 0
/swap.img none swap sw 0 0
As you can see, the logical volume is mounted on the root drive /
/dev/disk/by-id/dm-uuid-LVM-do30PLgpOVFMSMdAM4yk61c90yuy3biE4jAn171mXyGByntK0sddMxpZRL1WLPmq /
Therefore, run fsck to fix it:
$ sudo fsck /dev/disk/by-id/dm-uuid-LVM-do30PLgpOVFMSMdAM4yk61c90yuy3biE4jAn171mXyGByntK0sddMxpZRL1WLPmq
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
/dev/mapper/ubuntu--vg-ubuntu--lv contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 400138 has zero dtime.  Fix? yes
Inodes that were part of a corrupted orphan linked list found.  Fix? yes
Inode 400139 was part of the orphaned inode list.  FIXED.
Inode 400140 was part of the orphaned inode list.  FIXED.
Inode 400141 was part of the orphaned inode list.  FIXED.
Inode 400142 was part of the orphaned inode list.  FIXED.
Inode 400333 was part of the orphaned inode list.  FIXED.
Inode 420207 was part of the orphaned inode list.  FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(397395--397407) -3850237 -(3852815--3852822)  Fix? yes
Free blocks count wrong for group #12 (15295, counted=15308).  Fix? yes
Free blocks count wrong for group #117 (5588, counted=5597).  Fix? yes
Free blocks count wrong (1462405, counted=1462427).  Fix? yes
Inode bitmap differences: -(400138--400142) -400333 -420207  Fix? yes
Free inodes count wrong for group #48 (0, counted=6).  Fix? yes
Free inodes count wrong for group #51 (27, counted=28).  Fix? yes
Free inodes count wrong (935590, counted=935597).  Fix ('a' enables 'yes' to all)? yes
/dev/mapper/ubuntu--vg-ubuntu--lv: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mapper/ubuntu--vg-ubuntu--lv: ***** REBOOT SYSTEM *****
/dev/mapper/ubuntu--vg-ubuntu--lv: 309587/1245184 files (0.2% non-contiguous), 3517285/4979712 blocks
$ sudo reboot
All looks normal now.
Thereafter, extend the volume from OpenStack to test further (a sketch of the sequence follows the list below):
pvresize works
vgextend works
lvextend works
resize2fs works
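A rough sketch of that sequence, assuming the instance disk is /dev/vda with the LVM partition on /dev/vda3 and the volume group / logical volume names from the fstab above (adjust all names to your layout; vgextend is only needed if you add a second physical volume to the group):
sudo growpart /dev/vda 3                               # grow the partition (cloud-guest-utils), if needed
sudo pvresize /dev/vda3                                # let LVM see the larger physical volume
sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv    # grow the logical volume
sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv       # grow the ext4 filesystem online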

Related

How to securely set flags with open()?

In this question: What's the connection between flags and mode in open file function in C
In comments:
question
If I create a file using 'S_IRUSR' as the mode bit in the first
open(), then others calling open() with 'O_RDONLY' as the flag will not be
able to read the file
answer
The same user will be able to read the file. And, just a technicality,
it's not that the other users won't be able to read the file, it's
that they won't be able to open the file
So I have asked what the point of setting permissions is when opening (resp. creating) a new file, and that's why I am asking here.
There is also mentioned:
They may, as an extreme example, have write access to the directory,
meaning they could rename your file, copy it to the original name,
have access to that, then delete your renamed one. Permissions do work
but you have to ensure they're set up correctly (like securing the
entire path leading to a file, not just the file itself).
So I do not understand what "set flags correctly" means. And I also still do not understand the purpose of setting permissions when it is not secure. Can someone explain a little bit?
So I have asked what the point of setting permissions is when opening
(resp. creating) a new file, and that's why I am asking here. There is
also mentioned:
They may, as an extreme example, have write access to the directory meaning they could rename your file, copy it to the original name, have access to that then delete your renamed one. Permissions do work but you have to ensure they're set up correctly (like securing the entire path leading to a file, not just the file itself).
The comments presented in the quotation are partially incorrect. It is true that a person with write permission on the directory but no permissions at all on the file could rename the file or remove it (in / from that directory, see below). It is also true that having thereby cleared the way, the same person could create a new file, readable to them, with the other's original name. It is untrue, however, that such a person could copy the original file, whether to its original name or to any other, or could use such a maneuver to read the original file.
A key point here is that a directory entry for file F in directory D is just data in a special kind of file: a mapping from a name to an entry in the underlying filesystem. The entry is part of D, not part of F, so manipulating it requires access to D, not to F. What people commonly refer to as a file's name is not actually part of that file at all, nor is it necessarily unique, for the same underlying file can be linked to the directory tree in multiple places, with the same or different names.
A second key point is related to the first: the permissions of a file reside in the filesystem, not in directory listings, and they are effective at controlling access to the file's contents. This is in fact the same for directories as for other kinds of files: a user needs read permission on a given directory to read its contents, and needs write permission on it to modify its contents. Thus, files' access-control attributes do serve a useful purpose, and they need to be set appropriately for files to adequately serve their intended purpose.
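A quick single-user illustration of that point (hypothetical names): renaming an entry only needs write and execute permission on the directory, while reading the file's contents still needs read permission on the file itself:
mkdir /tmp/demo && cd /tmp/demo
touch secret.txt && chmod 000 secret.txt   # no permissions at all on the file
mv secret.txt renamed.txt                  # works: rename touches the directory, not the file
cat renamed.txt                            # fails for a non-root user: Permission denied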
The answer to the first part of the question is thus twofold:
A file's access-control attributes are an inherent part of that file's filesystem metadata, so there is no avoiding them being set to something when the file is created.
You cannot defer permission setting when you're opening a file (name) that does not yet exist but that you are willing to create, because the file's permission bits are fixed at creation time and are what subsequent opens of the file will be checked against. The appropriate permissions for the file are not derivable from the requested open mode.
The answer to the second part is much simpler: you must specify an access mode when you attempt to open a file because that's how and when the OS enforces access control. Additionally, you may specify a mode that is more restrictive than is permitted to you as an internal control against mistakenly performing unintended actions on the file.
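As a small illustration (hypothetical file name): the permission bits fixed at creation are what later open() calls are checked against, even for the owner, while the flags only state what access a particular open is requesting:
( umask 277; touch /tmp/readonly_demo )   # created with mode 400 (r-- --- ---)
cat /tmp/readonly_demo                    # open for reading: allowed
echo hi >> /tmp/readonly_demo             # open for writing: Permission denied, even for the owner (non-root)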
I do not understand what "set flags correctly" means.
The comment seems clear enough to me, especially in light of the example attached to it, but perhaps it would be clearer to you if the larger statement containing it were rewritten as "you need to take these considerations into account, with respect not just to the file itself but also to every directory in the path to it, to ensure that file permissions correctly address your security goals".
I will try to explain it as briefly as possible. If you execute 'ls -la' somewhere in your file system, you can see something similar to the file list below. It displays the basics of permissions on file systems on Linux and similar operating systems. Running 'man chmod' explains it in detail; you should see the 'man 2 chmod' function description as well.
Proper setting of permissions on files and directories, among other settings, is essential for security. Basically, a user should have the minimal privileges needed to accomplish their legitimate tasks.
drwx r-x r-x 2 root root 4096 Dec 14 23:55 .
drwx r-x r-x 24 root root 4096 Dec 5 19:02 ..
-rwx r-x r-x 1 root root 1113504 Jun 7 2019 file1
-rwx r-x r-x 1 root root 748968 Aug 29 2018 file2
-rwx r-x r-x 1 root root 34888 Jul 4 2019 file3
...
-rwx r-x r-x 1 root root 34888 Jul 4 2019 file_n
^^^ ^^^ ^^^
| | |
| | +---------------------- permissions other users
| +-------------------------- permissions for group members
+------------------------------ permissions for owner
d - is a directory
r - read permission
w - write permission
x - execute permission
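For example, the octal form of chmod maps directly onto those three permission groups (hypothetical file name):
chmod 750 file1    # owner: rwx (7), group: r-x (5), others: --- (0)
ls -l file1        # -rwxr-x--- 1 root root ... file1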

How is getcwd implemented in the kernel (library)?

One process could do
chdir("/to/some/where");
while from another shell
mv /to/some/where /now/different/path/
the 1st process
print getcwd();
#prints /now/different/path/
How is getcwd implemented (at the lowest level, e.g. at the level of the kernel, inodes, ...)?
I know how a common (inode-based) filesystem works, e.g. what a directory contains (the names of the entries and the corresponding inode numbers).
EDIT
Probably the question was too vague - trying to refine it. One possible scenario (from what I know):
1. the kernel knows the inode of the CWD for the given process (and its threads) - e.g. inode number 1000
2. reads the inode (gets the blocks it needs to read)
3. reads the corresponding blocks (i.e. opens the directory)
4. reads the directory entries (names of the entries and the inode numbers)
5. gets the inode number of the parent directory .. (for example 900) and the inode number of . (the current directory)
6. reads the content of the parent directory, where it gets
   - the name of the previous directory (for the inode 1000)
   - the inode number of the next parent directory
7. continues from step 5 until the root inode is reached.
That means that getcwd for
/some/very/very/very/deep/directory/level
takes more raw IO operations (more directory entries need to be read) than for the short
/tmp
where the whole getcwd is done with two reads?
Is this correct, or is it done in a totally different way?
First, you're asking in the wrong place. This question is more about the operating system, so unix.stackexchange is the better place.
Anyway, your proposed solution is true for some ancient UNIX implementations (for example BSD 2.8) or the like: pathname resolution could be done as you described.
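A minimal shell sketch of that kind of resolution (assuming GNU coreutils, simple file names without spaces, and a single filesystem - the real algorithm also compares device numbers at mount points):
mypwd() {
    local dir="." path="" ino parent_ino name
    while :; do
        ino=$(ls -di "$dir" | awk '{print $1}')           # inode of the current level
        parent_ino=$(ls -di "$dir/.." | awk '{print $1}') # inode of its parent
        [ "$ino" -eq "$parent_ino" ] && break             # "." and ".." coincide only at the root
        # scan the parent directory for the entry whose inode matches ours
        name=$(ls -1ia "$dir/.." | awk -v i="$ino" '$1 == i && $2 != "." && $2 != ".." { print $2; exit }')
        path="/$name$path"
        dir="$dir/.."
    done
    printf '%s\n' "${path:-/}"
}
Each loop iteration reads one more parent directory, which is exactly why a deep path costs more raw IO than a shallow one, as the question suggests.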
However, many problems arise - a few of them:
as you said - pathname resolution is too complicated (and yes, deeper directories need more IO)
it depends on the premise that only ONE ROOT directory exists. This isn't true from BSD 4.2 on, which introduced the per-process root directory - what allows the chroot system call - which sets the root to any directory without showing the real path to the process. (One of the coolest FreeBSD features, jails, depends on this.) (Ancient Linuxes also had only one root - the VFS, the virtual filesystem layer, was introduced only in 0.96c.)
and permission problems - e.g. what happens when
#shell1
$ mkdir -p /tmp/some
$ cd /tmp/some
second shell
$ su
# mkdir -p /tmp/my
# chmod 700 /tmp/my
# mv /tmp/some /tmp/my/
The /tmp/my directory isn't readable by the first process, so it can't determine the path - then how should it work with the files? So, in shell1 again:
$ pwd
/tmp/some #the original
$ echo $CWD
/tmp/some
$ /bin/pwd
pwd: .: Permission denied
But you can still do, for example
$ touch bob #works
i.e. the system allows you to work in the "current" directory without letting you know where you are (in both scenarios, i.e. in chroot and in the second one). ;)
That means that every process stores in its table the current working directory:
device number (e.g. hdd1 or hdd2)
inode number on the device
and
the kernel maintains other, global table(s) (in Linux called dentries - directory entries), where it maintains the "inode" -> "path" mapping for every process and every opened file descriptor, as well as inode caches (in Linux maintained by the kernel itself; in BSD it's a job for the vnode driver) and the like.
E.g. when some process asks for the pathname of inode X, the kernel searches the dentry table; if the entry is found it returns immediately, and if not it calls the lookup process, which does the pathname resolution.
When, for example, a rename occurs, the kernel searches the dentry table and, if it finds the entry, changes it as needed.
All of the above is extremely simplified and, as you can see yourself, highly OS dependent. The common base is defined by POSIX, but what happens behind it (i.e. the implementation) - for that you really need to read the kernel sources and/or google for:
linux dentry
linux vfs
freebsd vnode
pathname resolution
and such.
PS: for the nitpickers :) - as I said, everything is oversimplified, so if you want to correct it and add more details - edit the answer - I converted it to a "community wiki" answer.
In current POSIX kernels like Linux (or the *BSDs) the current working directory (as a kernel inode) is part of the process state. So the in-kernel process descriptor (probably some struct task_struct on Linux) contains or refers to that cwd. getcwd is then "simply" a syscall querying that.
The kernel inodes (for opened file descriptors, including working directories) are related to filesystems and are not the same as disk inodes.
Of course, the devil is in the details!
Key point: chdir() only affects the current process and any child processes launched after that - it is not a global state.
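A tiny illustration of that: a subshell's chdir() does not move its parent:
cd /tmp
( cd /etc && pwd )   # the child prints /etc
pwd                  # the parent still prints /tmp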

Clearing sector zero of a removable media device

I need to clear sector 0 of removable media devices (custom USB memory devices), which I have been trying to do from a WPF/C# application. My first attempt was to use dd, but I ran into problems. During the manufacturing of the devices an MBR is created at sector 0 and the volume (logical?) starts at sector 40. When I issue the following command it clears sector 40 and not sector 0:
dd bs=512 count=1 if=/dev/zero of=\\.\E:
I found another version of dd here which includes a wipe utility. I tried this version and I am seeing the same behavior. I can confirm with both HxD and Runtime's DiskExplorer that sector 40 is being cleared and not sector 0. I could use HxD or Runtime's DiskExplorer, but this needs to be scriptable.
Does anyone know of any other method of clearing (filling) sector 0 within Windows XP SP2? Any help would be greatly appreciated. Thanks.
Mark
Solution: My solution used WMI to find the physical drive based upon the logical drive letter. First, query the Win32_LogicalDiskToPartition class to find the logical drive I am looking for. This provides the Antecedent field, which contains something like '...DeviceID="Disk #X, Partition #Y"'. Next, I query the Win32_DiskDriveToDiskPartition class, searching against the Dependent field to find the match for the Antecedent field from the Win32_LogicalDiskToPartition class. Once found, the Antecedent field from Win32_DiskDriveToDiskPartition will yield the physical drive. I selected atzz's answer since it is the closest to my solution. I wanted to use Eugene's suggestion, but I only had a few hours to implement this, so I selected the easier of the two. I will need to revisit this at a later time though.
There are two ways to format a USB drive, from Windows standpoint:
As a floppy disk. In this case the entire USB drive contains a single file system, and its boot record is located in sector 0.
As a hard drive. In this case, sector 0 contains MBR with partition table. Actual file system(s) with their individual boot records are located further on the drive.
I think you are observing the second case. Using \\.\E: to identify the device, you end up accessing the file system's boot record instead of the MBR.
Here is how you can access sector 0 of the USB drive.
Load WinObj (a Sysinternals tool).
In WinObj, under GLOBAL??, find E:. It will be a SymbolicLink pointing to something like \Device\Harddisk2\DP(1)0-0+30.
Under GLOBAL??, find a PhysicalDrive# symlink referring to the same Harddisk# that you found on step 2. Most probably it will have the same numeric suffix as Harddisk#. E.g.: SymbolicLink PhysicalDrive2 refers to \Device\Harddisk2\DR47.
Use the PhysicalDrive# you've found in DD command:
dd bs=512 count=1 if=\\.\PhysicalDrive2 of=mbr.dat
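Assuming the same dd port accepts a physical-drive path for output as well (and after backing up sector 0 as shown above, and triple-checking the drive number), the clearing step would then be the write analogue of that command:
dd bs=512 count=1 if=/dev/zero of=\\.\PhysicalDrive2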
You are trying to clear the logical device E: and not the physical device. Try the following:
Call the CreateFile() WinAPI function to open "\\.\PhysicalDriveX", where X is the number of the device (see the Remarks in the description of the CreateFile function for information about how to open the physical device properly). Then use the WriteFile API function to write 512 bytes at offset 0 of the opened device.
If you get permission denied error when opening the device for writing, you can take our RawDisk product (trial version will work fine for you) which lets one bypass this security measure of Windows.
upd: As for calling CreateFile from C#, see PInvoke.net.

How many files can I put in a directory?

Does it matter how many files I keep in a single directory? If so, how many files in a directory is too many, and what are the impacts of having too many files? (This is on a Linux server.)
Background: I have a photo album website, and every image uploaded is renamed to an 8-hex-digit id (say, a58f375c.jpg). This is to avoid filename conflicts (if lots of "IMG0001.JPG" files are uploaded, for example). The original filename and any useful metadata is stored in a database. Right now, I have somewhere around 1500 files in the images directory. This makes listing the files in the directory (through FTP or SSH client) take a few seconds. But I can't see that it has any effect other than that. In particular, there doesn't seem to be any impact on how quickly an image file is served to the user.
I've thought about reducing the number of images by making 16 subdirectories: 0-9 and a-f. Then I'd move the images into the subdirectories based on what the first hex digit of the filename was. But I'm not sure that there's any reason to do so except for the occasional listing of the directory through FTP/SSH.
FAT32:
Maximum number of files: 268,173,300
Maximum number of files per directory: 2^16 - 1 (65,535)
Maximum file size: 2 GiB - 1 without LFS, 4 GiB - 1 with
NTFS:
Maximum number of files: 2^32 - 1 (4,294,967,295)
Maximum file size
Implementation: 2^44 - 2^6 bytes (16 TiB - 64 KiB)
Theoretical: 2^64 - 2^6 bytes (16 EiB - 64 KiB)
Maximum volume size
Implementation: 2^32 - 1 clusters (256 TiB - 64 KiB)
Theoretical: 2^64 - 1 clusters (1 YiB - 64 KiB)
ext2:
Maximum number of files: 10^18
Maximum number of files per directory: ~1.3 × 10^20 (performance issues past 10,000)
Maximum file size
16 GiB (block size of 1 KiB)
256 GiB (block size of 2 KiB)
2 TiB (block size of 4 KiB)
2 TiB (block size of 8 KiB)
Maximum volume size
4 TiB (block size of 1 KiB)
8 TiB (block size of 2 KiB)
16 TiB (block size of 4 KiB)
32 TiB (block size of 8 KiB)
ext3:
Maximum number of files: min(volumeSize / 2^13, numberOfBlocks)
Maximum file size: same as ext2
Maximum volume size: same as ext2
ext4:
Maximum number of files: 2^32 - 1 (4,294,967,295)
Maximum number of files per directory: unlimited
Maximum file size: 2^44 - 1 bytes (16 TiB - 1)
Maximum volume size: 2^48 - 1 bytes (256 TiB - 1)
I have had over 8 million files in a single ext3 directory. libc readdir() is what find, ls and most of the other methods discussed in this thread use to list large directories.
The reason ls and find are slow in this case is that readdir() only reads 32K of directory entries at a time, so on slow disks it will require many, many reads to list a directory. There is a solution to this speed problem. I wrote a pretty detailed article about it at: http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/
The key takeaway is: use getdents() directly -- http://www.kernel.org/doc/man-pages/online/pages/man2/getdents.2.html -- rather than anything that's based on libc readdir(), so you can specify the buffer size when reading directory entries from disk.
I have a directory with 88,914 files in it. Like yours, it is used for storing thumbnails, and it sits on a Linux server.
Listing files via FTP or a PHP function is slow, yes, but there is also a performance hit when displaying a file. E.g. www.website.com/thumbdir/gh3hg4h2b4h234b3h2.jpg has a wait time of 200-400 ms. As a comparison, on another site I have with around 100 files in a directory, the image is displayed after just ~40 ms of waiting.
I've given this answer because most people have just written about how directory search functions will perform, which you won't be using on a thumbnail folder - you'll just be statically displaying files - but you will be interested in the performance of how the files can actually be used.
It depends a bit on the specific filesystem in use on the Linux server. Nowadays the default is ext3 with dir_index, which makes searching large directories very fast.
So speed shouldn't be an issue, other than the one you already noted, which is that listings will take longer.
There is a limit to the total number of files in one directory. I seem to remember it definitely working up to 32000 files.
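Whether dir_index is enabled on a given ext3/ext4 filesystem can be checked with tune2fs (the device name here is just an example):
sudo tune2fs -l /dev/sda1 | grep -o dir_index   # prints "dir_index" if the feature is on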
Keep in mind that on Linux if you have a directory with too many files, the shell may not be able to expand wildcards. I have this issue with a photo album hosted on Linux. It stores all the resized images in a single directory. While the file system can handle many files, the shell can't. Example:
-shell-3.00$ ls A*
-shell: /bin/ls: Argument list too long
or
-shell-3.00$ chmod 644 *jpg
-shell: /bin/chmod: Argument list too long
I'm working on a similar problem right now. We have a hierarchical directory structure and use image ids as filenames. For example, an image with id=1234567 is placed in
..../45/67/1234567_<...>.jpg
using last 4 digits to determine where the file goes.
With a few thousand images, you could use a one-level hierarchy. Our sysadmin suggested no more than a couple of thousand files in any given directory (ext3), for efficiency / backup / whatever other reasons he had in mind.
For what it's worth, I just created a directory on an ext4 file system with 1,000,000 files in it, then randomly accessed those files through a web server. I didn't notice any premium on accessing those over (say) only having 10 files there.
This is radically different from my experience doing this on ntfs a few years back.
I've been having the same issue. Trying to store millions of files on an Ubuntu server in ext4. Ended up running my own benchmarks. Found out that a flat directory performs way better while being way simpler to use:
Wrote an article.
The biggest issue I've run into is on a 32-bit system. Once you pass a certain number, tools like 'ls' stop working.
Trying to do anything with that directory once you pass that barrier becomes a huge problem.
It really depends on the filesystem used, and also some flags.
For example, ext3 can have many thousands of files; but after a couple of thousands, it used to be very slow. Mostly when listing a directory, but also when opening a single file. A few years ago, it gained the 'htree' option, which dramatically shortened the time needed to get an inode given a filename.
Personally, I use subdirectories to keep most levels under a thousand or so items. In your case, I'd create 256 directories, keyed on the two last hex digits of the ID. Use the last and not the first digits, so the load is balanced.
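A small sketch of that scheme (hypothetical paths), placing each image in a subdirectory named after the last two hex digits of its id:
id="a58f375c"                       # 8-hex-digit image id, as in the question
shard="${id: -2}"                   # last two hex digits -> one of 256 buckets
mkdir -p "/var/www/images/$shard"   # hypothetical image root
mv "$id.jpg" "/var/www/images/$shard/$id.jpg"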
If the time involved in implementing a directory partitioning scheme is minimal, I am in favor of it. The first time you have to debug a problem that involves manipulating a 10000-file directory via the console you will understand.
As an example, F-Spot stores photo files as YYYY\MM\DD\filename.ext, which means the largest directory I have had to deal with while manually manipulating my ~20000-photo collection is about 800 files. This also makes the files more easily browsable from a third party application. Never assume that your software is the only thing that will be accessing your software's files.
It absolutely depends on the filesystem. Many modern filesystems use decent data structures to store the contents of directories, but older filesystems often just added the entries to a list, so retrieving a file was an O(n) operation.
Even if the filesystem does it right, it's still absolutely possible for programs that list directory contents to mess up and do an O(n^2) sort, so to be on the safe side, I'd always limit the number of files per directory to no more than 500.
ext3 does in fact have directory size limits, and they depend on the block size of the filesystem. There isn't a per-directory "max number" of files, but a per-directory "max number of blocks used to store file entries". Specifically, the size of the directory itself can't grow beyond a b-tree of height 3, and the fanout of the tree depends on the block size. See this link for some details.
https://www.mail-archive.com/cwelug@googlegroups.com/msg01944.html
I was bitten by this recently on a filesystem formatted with 2K blocks, which was inexplicably getting directory-full kernel messages ("warning: ext3_dx_add_entry: Directory index full!") when I was copying from another ext3 filesystem. In my case, a directory with a mere 480,000 files was unable to be copied to the destination.
"Depends on filesystem"
Some users mentioned that the performance impact depends on the filesystem used. Of course. Filesystems like ext3 can be very slow. But even if you use ext4 or XFS you cannot prevent listing a folder through ls or find, or through an external connection like FTP, from becoming slower and slower.
Solution
I prefer the same approach as #armandino. For that I use this little function in PHP to convert IDs into a file path that results in 1,000 files per directory:
function dynamic_path($int) {
    // 1000 = 1000 files per dir
    // 10000 = 10000 files per dir
    // 2 = 100 dirs per dir
    // 3 = 1000 dirs per dir
    return implode('/', str_split(ceil($int / 1000), 2)) . '/';
}
or you could use the second version if you want to use alpha-numeric characters:
function dynamic_path2($str) {
    // 26 alpha + 10 num + 3 special chars (._-) = 39 combinations
    // -1 = 39^2 = 1521 files per dir
    // -2 = 39^3 = 59319 files per dir (if every combination exists)
    $left = substr($str, 0, -1);
    return implode('/', str_split($left ? $left : $str[0], 2)) . '/';
}
results:
<?php
$files = explode(',', '1.jpg,12.jpg,123.jpg,999.jpg,1000.jpg,1234.jpg,1999.jpg,2000.jpg,12345.jpg,123456.jpg,1234567.jpg,12345678.jpg,123456789.jpg');
foreach ($files as $file) {
    echo dynamic_path(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>
1/1.jpg
1/12.jpg
1/123.jpg
1/999.jpg
1/1000.jpg
2/1234.jpg
2/1999.jpg
2/2000.jpg
13/12345.jpg
12/4/123456.jpg
12/35/1234567.jpg
12/34/6/12345678.jpg
12/34/57/123456789.jpg
<?php
$files = array_merge($files, explode(',', 'a.jpg,b.jpg,ab.jpg,abc.jpg,ddd.jpg,af_ff.jpg,abcd.jpg,akkk.jpg,bf.ff.jpg,abc-de.jpg,abcdef.jpg,abcdefg.jpg,abcdefgh.jpg,abcdefghi.jpg'));
foreach ($files as $file) {
    echo dynamic_path2(basename($file, '.jpg')) . $file . PHP_EOL;
}
?>
1/1.jpg
1/12.jpg
12/123.jpg
99/999.jpg
10/0/1000.jpg
12/3/1234.jpg
19/9/1999.jpg
20/0/2000.jpg
12/34/12345.jpg
12/34/5/123456.jpg
12/34/56/1234567.jpg
12/34/56/7/12345678.jpg
12/34/56/78/123456789.jpg
a/a.jpg
b/b.jpg
a/ab.jpg
ab/abc.jpg
dd/ddd.jpg
af/_f/af_ff.jpg
ab/c/abcd.jpg
ak/k/akkk.jpg
bf/.f/bf.ff.jpg
ab/c-/d/abc-de.jpg
ab/cd/e/abcdef.jpg
ab/cd/ef/abcdefg.jpg
ab/cd/ef/g/abcdefgh.jpg
ab/cd/ef/gh/abcdefghi.jpg
As you can see, in the $int version every folder contains up to 1,000 files and up to 99 directories, which in turn contain up to 1,000 files and 99 directories each ...
But do not forget that too many directories cause the same performance problems!
Finally you should think about how to reduce the number of files in total. Depending on your use case you can use CSS sprites to combine multiple tiny images like avatars, icons, smilies, etc., or if you use many small non-media files, consider combining them e.g. in JSON format. In my case I had thousands of mini-caches and finally I decided to combine them in packs of 10.
The question comes down to what you're going to do with the files.
Under Windows, any directory with more than 2k files tends to open slowly for me in Explorer. If they're all image files, more than 1k tend to open very slowly in thumbnail view.
At one time, the system-imposed limit was 32,767. It's higher now, but even that is way too many files to handle at one time under most circumstances.
What most of the answers above fail to show is that there is no "One Size Fits All" answer to the original question.
In today's environment we have a large conglomerate of different hardware and software -- some is 32 bit, some is 64 bit, some is cutting edge and some is tried and true - reliable and never changing.
Added to that is a variety of older and newer hardware, older and newer OSes, different vendors (Windows, Unixes, Apple, etc.) and a myriad of utilities and servers that go along.
As hardware has improved and software is converted to 64 bit compatibility, there has necessarily been considerable delay in getting all the pieces of this very large and complex world to play nicely with the rapid pace of changes.
IMHO there is no one way to fix a problem. The solution is to research the possibilities and then by trial and error find what works best for your particular needs. Each user must determine what works for their system rather than using a cookie cutter approach.
I for example have a media server with a few very large files. The result is only about 400 files filling a 3 TB drive. Only 1% of the inodes are used but 95% of the total space is used. Someone else, with a lot of smaller files, may run out of inodes before they come near to filling the space. (On ext4 filesystems, as a rule of thumb, 1 inode is used for each file/directory.)
While theoretically the total number of files that may be contained within a directory is nearly infinite, practically the overall usage determines realistic limits, not just filesystem capabilities.
I hope that all the different answers above have promoted thought and problem solving rather than presenting an insurmountable barrier to progress.
I ran into a similar issue. I was trying to access a directory with over 10,000 files in it. It was taking too long to build the file list and run any type of command on any of the files.
I thought up a little PHP script to do this for myself and tried to figure out a way to prevent it from timing out in the browser.
The following is the PHP script I wrote to resolve the issue.
Listing Files in a Directory with too many files for FTP
Hope it helps someone.
I recall running a program that was creating a huge number of files as output. The files were sorted at 30,000 per directory. I do not recall having any read problems when I had to reuse the produced output. It was on a 32-bit Ubuntu Linux laptop, and even Nautilus displayed the directory contents, albeit after a few seconds.
ext3 filesystem: Similar code on a 64-bit system dealt well with 64000 files per directory.
I respect this doesn't totally answer your question as to how many is too many, but an idea for solving the long term problem is that in addition to storing the original file metadata, also store which folder on disk it is stored in - normalize out that piece of metadata. Once a folder grows beyond some limit you are comfortable with for performance, aesthetic or whatever reason, you just create a second folder and start dropping files there...
Not an answer, but just some suggestions.
Select a more suitable FS (file system). From a historic point of view, all your issues were once central to the filesystems evolving over decades - more modern filesystems better support your needs. First make a comparison/decision table based on your ultimate purpose and a list of filesystems.
I think it's time to shift your paradigm, so I personally suggest using a distributed-system-aware FS, which means no limits at all regarding size, number of files, etc. Otherwise you will sooner or later be challenged by new, unanticipated problems.
I'm not sure this will work, but if you don't mind some experimentation, give AUFS over your current file system a try. I guess it has facilities to present multiple folders as a single virtual folder.
To overcome hardware limits you can use RAID-0.
There is no single figure that is "too many", as long as it doesn't exceed the limits of the OS. However, the more files in a directory, regardless of the OS, the longer it takes to access any individual file, and on most OSes the performance is non-linear, so finding one file out of 10,000 takes more than 10 times longer than finding a file among 1,000.
Secondary problems associated with having a lot of files in a directory include wild card expansion failures. To reduce the risks, you might consider ordering your directories by date of upload, or some other useful piece of metadata.
≈ 135,000 FILES
NTFS | WINDOWS 2012 SERVER | 64-BIT | 4TB HDD | VBS
Problem: Catastrophic hardware issues appear when a [single] specific folder amasses roughly 135,000 files.
"Catastrophic" = CPU Overheats, Computer Shuts Down, Replacement Hardware needed
"Specific Folder" = has a VBS file that moves files into subfolders
Access = the folder is automatically accessed/executed by several client computers
Basically, I have a custom-built script that sits on a file server. When something goes wrong with the automated process (i.e., file spill + dam), the specific folder gets flooded [with unmoved files]. The catastrophe takes shape when the client computers keep executing the script. The file server ends up reading through 135,000+ files, and doing so hundreds of times each day. This work overload ends up overheating my CPU (92 °C, etc.), which ends up crashing my machine.
Solution: Make sure your file-organizing scripts never have to deal with a folder that has 135,000+ files.
flawless,
flawless,
absolutely flawless :
( G. M. - RIP )
function ff () {
    d=$1; f=$2;
    # drop the delimiter $d and everything after it, then insert a "/" after
    # every remaining character and strip the trailing slash
    p=$( echo "$f" | sed "s/$d.*//; s,\(.\),&/,g; s,/$,," );
    echo "$p/$f";
}
ff _D_ 09748abcGHJ_D_my_tagged_doc.json
0/9/7/4/8/a/b/c/G/H/J/09748abcGHJ_D_my_tagged_doc.json
ff - gadsf12-my_car.json
g/a/d/s/f/1/2/gadsf12-my_car.json
and also this
ff _D_ 0123456_D_my_tagged_doc.json
0/1/2/3/4/5/6/0123456_D_my_tagged_doc.json
ff .._D_ 0123456_D_my_tagged_doc.json
0/1/2/3/4/0123456_D_my_tagged_doc.json
enjoy !

Get `df` to show updated information on FreeBSD

I recently ran out of disk space on a drive on a FreeBSD server. I truncated the file that was causing problems but I'm not seeing the change reflected when running df. When I run du -d0 on the partition it shows the correct value. Is there any way to force this information to be updated? What is causing the output here to be different?
In BSD a directory entry is simply one of many references to the underlying file data (called an inode). When a file is deleted with the rm(1) command only the reference count is decreased. If the reference count is still positive (e.g. the file has other directory entries due to hard links), then the underlying file data is not removed.
Newer BSD users often don't realize that a program that has a file open is also holding a reference. This prevents the underlying file data from going away while the process is using it. When the process closes the file, if the reference count falls to zero, the file space is marked as available. This scheme is used to avoid the Microsoft Windows type of issue where it won't let you delete a file because some unspecified program still has it open.
An easy way to observe this is to do the following
cp /bin/cat /tmp/cat-test
/tmp/cat-test &
rm /tmp/cat-test
Until the background process is terminated the file space used by /tmp/cat-test will remain allocated and unavailable as reported by df(1) but the du(1) command will not be able to account for it as it no longer has a filename.
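If lsof is installed (available from ports/packages on FreeBSD), a quick way to spot space held by such deleted-but-still-open files is to list open files whose link count has dropped below one:
lsof +L1    # open files with fewer than 1 link, i.e. unlinked but still held open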
Note that if the system should crash without the process closing the file, then the file data will still be present but unreferenced; an fsck(8) run will be needed to recover the filesystem space.
Processes holding files open is one reason why the newsyslog(8) command sends signals to syslogd or other logging programs to inform them they should close and re-open their log files after it has rotated them.
Softupdates can also affect filesystem free space, as the actual inode space recovery can be deferred; the sync(8) command can be used to encourage this to happen sooner.
This probably centres on how you truncated the file. du and df report different things as this post on unix.com explains. Just because space is not used does not necessarily mean that it's free...
Does df --sync work?
