OpenVMS ODS-5 freeblocks - filesystems

On our OpenVMS 8.3 ODS-5 machines, disks mounted as shadow set members sometimes lose free blocks suddenly, with no obvious cause. Adding FREEBLOCKS to the total size of all files on the disk gives a figure much lower than the total number of blocks on the disk. Can anyone suggest what might be causing this?
I have found that purging files usually eliminates the issue, but I have no explanation for it and cannot find the file(s) causing it.
The machine is not in a cluster, and ANALYZE/RMS told me, and the others I consulted, nothing. All file versions were considered, but it may be that dir/size needs to be qualified further. I am not aware of any temporary/scratch files, but ideally I would like to find them if they exist. The shortfall between TOTALBLOCKS - FREEBLOCKS and the output of dir/siz/grand [000000...] was approximately 60 million blocks (about half the drive).
I am unfamiliar with DFU.

Don't worry, be happy. It is sure NOT to be a problem, just a lack of understanding. (Of course, one could consider that in and of itself a bigger problem than an apparent mismatch in numbers. :-)
"Much lower" is almost meaningless. Everything is relative. How about some quantitative numbers?
Is this a cluster? Each cluster member can, and will, have its own extent cache, possibly 10% of the free space each. Did you flush that/those before counting?
Were the ALLOCATED blocks counted, as one should, or perhaps just the used blocks?
Were all versions of all files included in the count (since purging possibly changed the result)?
Do the applications on the system use TEMPORARY files which are not entered into a directory, and thus possibly not counted?
Have you considered enabling DISK QUOTA, just for the count, not to limit usage?
How about ANALYZE/DISK?
How about poking at the drive with DFU... highly recommended! Likely "Much faster" :-), and "Much more accurate" than anything DIRECTORY based.
Regards,
Hein.


Why does a readdir() call on Linux grow non-linearly?

I have a directory with 1000 files, and readdir() takes less than 1 second, but with 10000 files it takes around 24 seconds.
Why? It should be linear.
Can anyone explain the reason?
And is there a better solution if all I need is to get the file and sub-directory names in a directory?
EDIT
I am on my local Linux PC.
It might be file system specific. Perhaps using a suitably configured Ext4 or BTRFS file system would help. Some file systems use hashing or B-tree techniques to make the complexity of file access in a directory of size N be O(log N); others are still linear, i.e. O(N); and the kernel might do weird things on top of that.
The shell that you use in your huge directories will generally sort entries when globbing (see also glob(7)). And you don't want its auto-completion to take many seconds on each keystroke!
I believe that you should never have huge directories (e.g. with more than a few hundred entries), so 10000 files in a single directory is unreasonable. If that is the case, you had better organize your files differently, e.g. subdir01/file001.txt ... subdir99/file999.txt
BTW, if what you need is to have a lot of small things accessible by some textual key, using an indexed file (like gdbm), an SQLite database, or a real database (PostgreSQL, MongoDB, ...) is much more suitable, and probably more efficient. Don't forget to dump the data (probably in some textual format) for backup.
Notice that the documentation of readdir(3) on Linux, and of POSIX readdir do not mention any time complexity or any linear behavior. This lack of mention is significant.
On the commonly used FAT filesystem (e.g. on many USB keys) the time complexity is probably quadratic.
It has no reason to be linear. At a lower level, a directory is like a file: a collection of clusters. If it is contained in one single cluster, you have only one actual physical read operation; the rest happens in memory. But when your directory becomes excessively large, you will have many physical reads. At that point, as stated by Basile Starynkevitch, it becomes highly dependent on the file system structure.
But IMHO, if you want to browse the directory, it depends essentially on the number of clusters used by the directory. It is much more implementation dependent when you look up a file directly (by name) in a huge directory. Filesystems with linear search will have worse results than filesystems that natively use hashing, like BSD FFS for example.
All operations should be linear on a poor filesystem (e.g. FAT/FAT32 are O(N)).
Seeks, updates and deletes should be better than linear on a good filesystem like NTFS which is O(log N). A full directory listing will still be linear though.
In either case it should be much, much faster than what you have reported in both the small and large cases.
I suspect something else is going on. Very likely your results are biased by other factors than the directory structure, such as:
Disk has a hardware problem which is triggered in the large example but not the small one
Other disk activity from other parts of the system interrupts the test in the large case
Disk hardware pre-fetching. Disks contain RAM caches which will try to predict which sectors will be requested next, and have them ready.
Operating system cache. Operating systems will also cache data in a similar way.
You are possibly doing something with the data other than just readdir, and this other operation has a higher time complexity which dominates.
Your application memory usage pattern is able to fit into L1 cache for small directories but not large ones.
Your application memory usage pattern forces swapping on large directories but not small ones.
readdir is at best linear. If we ignore everything that goes on in the filesystem, the amount of data (file names and other stuff in struct dirent) that has to be copied from the kernel into userland is directly proportional to the number of files. So we start with O(n).
Then the kernel needs to figure out which data to give you. At best it is linearly stored in something that looks like a file. This is what older file systems like FFS and EXT2 do. This gives good performance for readdir (because finding which disk block to give you is just an array lookup), but has the disadvantage that actually opening those files (open, stat or almost anything else that works with the file name) becomes an O(n) operation because every open has to linearly scan the directory to find the file name. This is why there has been so much work in caching directory data for those file systems. Even on those filesystems you might end up seeing that larger directories take longer to read per item because the way file information is stored gets more expensive with file size. Depending on your file (or directory) size the kernel might need to read between 1 and 5 other blocks from disk (or cache) to find out which block to give you.
If you have a different filesystem (most modern ones), they trade the convenience and speed of a linear directory for a more complex structure on disk which gives you a much better performance of open and stat (after all, why would you readdir if you don't intend to do anything with the files?), but as a consequence you end up (not necessarily, but most likely) with worse than linear time to actually perform a readdir because the operation to find out which disk block to read for your information might be O(log n).
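If you want to rule out the per-entry stat()/open() cost that the answers above describe, a rough timing sketch like the following, which assumes nothing beyond plain POSIX opendir/readdir, only enumerates the entries:

/* Minimal sketch: time readdir() alone, without stat()ing each entry.
 * Build with: cc -O2 readdir_timing.c -o readdir_timing
 * (add -lrt for clock_gettime on older glibc). */
#include <dirent.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s directory\n", argv[0]);
        return 1;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    DIR *dir = opendir(argv[1]);
    if (dir == NULL) {
        perror("opendir");
        return 1;
    }

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL)
        count++;                  /* only enumerate; do not stat() or open() */
    closedir(dir);

    clock_gettime(CLOCK_MONOTONIC, &end);
    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%ld entries in %.3f s\n", count, secs);
    return 0;
}

If this loop alone already shows the 24-second behaviour, the directory structure itself is the culprit; if it is fast, the time is going into whatever is done per entry afterwards (stat, open, sorting, ...).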

Conceptual Ideas - Memory is limited for an application but more data needs to be passed

I have a situation as follows (because of IP rights I cannot share technical details):
There are a few individual embedded applications running as part of a larger project.
Each of these applications can occupy a maximum of 9000 MB (9 GB) of memory.
I am upgrading some of the applications to meet a new requirement.
There are a few tables with buffer length 32767 in each application, which are passed to a network server for calculation at a 15 kHz rate.
I need to double that, i.e. to 65534, to be passed to the network at a 30 kHz rate.
The problem arises here:
One of these applications already occupies 8094 MB (8 GB+), so doubling the table buffer length goes beyond the maximum size allowed for the application.
As a result the application output does not appear (but there is no crash).
My question is: have you ever overcome such a problem, and could you share some ideas on how I can do memory management in this particular case? All these programs are written in C++, Perl, C and Python (VxWorks, Linux and Sun Solaris are the operating systems used).
A quick reply is highly appreciated.
Thanks
It is very vague, but I'll try to answer to the point:
If your program needs larger tables for whatever reason, but cannot occupy more memory, you have to change something to compensate for that.
You don't mention why you need larger tables:
If the length of the records has increased, try to reduce their number.
If you can then store only a smaller number of entries, you'll have to send them more quickly so that you don't have to store so many of them.
What you can also do is some compressing in RAM. That depends on the nature of the data, but in general it might help you.
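As a rough illustration of the compress-in-RAM idea, and only under the assumption that the table data is actually compressible, a zlib sketch along these lines keeps a table packed until it is needed (the table contents here are made up):

/* Sketch only: keep a table compressed in RAM with zlib and expand it
 * on demand. Gains depend entirely on how compressible the data is.
 * Link with -lz. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    enum { TABLE_LEN = 65534 };            /* the doubled buffer length */
    static short table[TABLE_LEN];         /* hypothetical table data   */
    for (int i = 0; i < TABLE_LEN; i++)
        table[i] = (short)(i % 100);       /* repetitive => compresses well */

    uLong  src_len = sizeof table;
    uLongf dst_len = compressBound(src_len);
    unsigned char *packed = malloc(dst_len);
    if (!packed || compress(packed, &dst_len, (Bytef *)table, src_len) != Z_OK) {
        fprintf(stderr, "compress failed\n");
        return 1;
    }
    printf("table: %lu bytes raw, %lu bytes compressed\n",
           (unsigned long)src_len, (unsigned long)dst_len);

    /* Later, when the table is actually needed: */
    uLongf out_len = sizeof table;
    short *unpacked = malloc(out_len);
    if (!unpacked ||
        uncompress((Bytef *)unpacked, &out_len, packed, dst_len) != Z_OK ||
        memcmp(unpacked, table, out_len) != 0) {
        fprintf(stderr, "uncompress failed\n");
        return 1;
    }

    free(packed);
    free(unpacked);
    return 0;
}

Whether this buys anything at a 30 kHz send rate depends on how fast the data can be (de)compressed, so it would need measuring on the real hardware.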

Using a database instead of thousands of small files

At work, I have started working on a program that can potentially generate hundreds of thousands of mostly small files an hour. My predecessors have found out that working with many small files can become very slow, so they have resorted to some (in my opinion) crude methods to alleviate the problem.
So I asked my boss why we don't use a database instead, and he gave me his oh-so-famous I-know-better-than-you look and told me that obviously a database that big won't have good performance.
My question is, is it really so? It seems to me that a database engine should be able to handle such data much better than the file system. Here are the conditions we have:
The program mostly writes data. Queries are much less frequent and their performance is not very important.
Millions of files could be generated every day. Most of these are small (a few kilobytes) but some can be huge.
If you think we should opt for the database solution, what open source database system do you think will work best? (If I decide that a database will certainly work better, I'm going to push for a change whatever the boss says!)
This is another one of those "it depends" type questions.
If you are just writing data (write once, read hardly ever) then just use the file system. Maybe use a hash-directory approach to create lots of sub-directories (things tend to go slowly with many files in a single directory).
If you are writing hundreds of thousands of events for later querying (e.g. find everything with X > 10 and Y < 11) then a database sounds like a great idea.
If you are writing hundreds of thousands of bits of non-relational data (e.g. simple key-value pairs) then it might be worth investigating a NoSQL approach.
The best approach is probably to prototype all the ideas you can think of, measure and compare!
As a minimal-impact improvement, I'd split your millions of small files into a hierarchy of directories. So, say you were using UUIDs as your file names: I'd strip out the redundant urn:uuid: at the front, then make 16 directories based on the first letter, and inside them make 16 subdirectories based on the second letter, and add even more levels if you need them. That alone will speed up access quite a bit. Also, I would remove a directory whenever it becomes empty, to make sure the directory entries themselves don't grow larger and larger.
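A minimal sketch of that letter-by-letter layout, assuming POSIX mkdir() and UUID-style names (the root directory name and the example file name below are made up):

/* Sketch: build a sharded path like data/3/f/3f2b...txt from a file
 * name, creating the intermediate directories as needed.
 * EEXIST from mkdir() just means the directory is already there. */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build "<root>/<c1>/<c2>/<name>" into out (two levels of sharding). */
static int sharded_path(const char *root, const char *name,
                        char *out, size_t out_len)
{
    if (strlen(name) < 2)
        return -1;                              /* not enough chars to shard */

    char level1[PATH_MAX], level2[PATH_MAX];
    snprintf(level1, sizeof level1, "%s/%c", root, name[0]);
    snprintf(level2, sizeof level2, "%s/%c", level1, name[1]);

    if (mkdir(root,   0755) != 0 && errno != EEXIST) return -1;
    if (mkdir(level1, 0755) != 0 && errno != EEXIST) return -1;
    if (mkdir(level2, 0755) != 0 && errno != EEXIST) return -1;

    snprintf(out, out_len, "%s/%s", level2, name);
    return 0;
}

int main(void)
{
    char path[PATH_MAX];
    /* hypothetical name with the redundant "urn:uuid:" already stripped */
    if (sharded_path("data", "3f2b1c9a-deadbeef.txt", path, sizeof path) == 0)
        printf("store the file at: %s\n", path);
    return 0;
}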

File descriptor limits and default stack sizes

Where I work we build and distribute a library and a couple of complex programs built on that library. All code is written in C and is available on most 'standard' systems like Windows, Linux, AIX, Solaris, and Darwin.
I started in the QA department, and while running tests recently I have been reminded several times that I need to remember to set the file descriptor limits and default stack sizes higher or bad things will happen. This is particularly the case with Solaris and now Darwin.
Now this is very strange to me because I am a believer in zero required environment fiddling to make a product work. So I am wondering whether there are times where this sort of requirement is a necessary evil, or whether we are doing something wrong.
Edit:
Great comments that describe the problem and give a little background. However, I do not believe I worded the question well enough. Currently we require customers, and hence us the testers, to set these limits before running our code. We do not do this programmatically. And this is not a situation where they MIGHT run out; under normal load our programs WILL run out and segfault.
So, rewording the question: is requiring the customer to change these ulimit values to run our software to be expected on some platforms, i.e. Solaris or AIX, or are we as a company making it too difficult for these users to get going?
Bounty:
I added a bounty to hopefully get a little more information on what other companies are doing to manage these limits. Can you set these programmatically? Should we? Should our programs even be hitting these limits, or could this be a sign that things might be a bit messy under the covers? That is really what I want to know; as a perfectionist, a seemingly dirty program really bugs me.
If you need to change these values in order to get your QA tests to run, then that is not too much of a problem. However, requiring a customer to do this in order for the program to run should (IMHO) be avoided. If nothing else, create a wrapper script that sets these values and launches the application so that users will still have a one-click application launch. Setting these from within the program would be the preferable method, however. At the very least, have the program check the limits when it is launched and (cleanly) error out early if the limits are too low.
If a software developer told me that I had to mess with my stack and descriptor limits to get their program to run, it would change my perception of the software. It would make me wonder "why do they need to exceed the system limits that are apparently acceptable for every other piece of software I have?". This may or may not be a valid concern, but being asked to do something that (to many) can seem hackish doesn't have the same professional edge as a program that you just launch and go.
This problem seems even worse when you say "this is not a situation where they MIGHT run out, under normal load our programs WILL run out and seg fault". A program exceeding these limits is one thing, but a program that doesn't gracefully handle the error conditions resulting from exceeding these limits is quite another. If you hit the file handle limit and attempt to open a file, you should get an error indicating that you have too many files open. This shouldn't cause a program crash in a well-designed program. It may be more difficult to detect stack usage issues, but running out of file descriptors should never cause a crash.
You don't give many details about what type of program this is, but I would argue that it's not safe to assume that users of your program will necessarily have adequate permissions to change these values. In any case, it's probably also unsafe to assume that nothing else might change these values while your program is running, without the user's knowledge.
While there are always exceptions, I would say that in general a program that exceeds these limits needs to have its code re-examined. The limits are there for a reason, and pretty much every other piece of software on your system works within those limits with no problems. Do you really need that many files open at the same time, or would it be cleaner to open a few files, process them, close them, and open a few more? Is your library/program trying to do too much in one big bundle, or would it be better to break it into smaller, independent parts that work together? Are you exceeding your stack limits because you are using a deeply-recursive algorithm that could be re-written in a non-recursive manner? There are likely many ways in which the library and program in question can be improved in order to ease the need to alter the system resource limits.
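That said, for the "check the limits at launch, raise the soft limit if you can, and error out cleanly otherwise" approach suggested earlier, a minimal POSIX getrlimit()/setrlimit() sketch might look like this (the figure of 4096 descriptors is only an illustrative requirement):

/* Sketch: at startup, try to raise the soft file-descriptor limit up to
 * the hard limit, and error out cleanly if even that is not enough. */
#include <stdio.h>
#include <sys/resource.h>

static int ensure_fd_limit(rlim_t needed)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur >= needed)
        return 0;                               /* soft limit already fine */

    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < needed) {
        fprintf(stderr,
                "need %llu file descriptors but the hard limit is %llu;\n"
                "please raise it (e.g. ulimit -H -n) and restart\n",
                (unsigned long long)needed, (unsigned long long)rl.rlim_max);
        return -1;
    }

    rl.rlim_cur = needed;                       /* raise the soft limit only */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}

int main(void)
{
    if (ensure_fd_limit(4096) != 0)
        return 1;                               /* fail early, not mid-run */
    /* ... normal program startup continues here ... */
    return 0;
}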
The short answer is: it's normal, but not inflexible. Of course, limits are in place to prevent rogue processes or users from starving the system of resources. Desktop systems will be less restrictive than server systems, but still have certain limits (e.g. file handles).
This is not to say that limits cannot be altered in persistent/reproducible ways, either by the user at the user's discretion (e.g. by adding the relevant ulimit calls in .profile) or programmatically from within programs/libraries which know with certainty that they will require large numbers of file handles (e.g. setsysinfo(SSI_FD_NEWMAX,...)), stack (provided at pthread creation time), etc.
On Darwin, the default soft limit on the number of open files is 256; the default hard limit is unlimited.
AFAICR, on Solaris, the default soft limit on the number of open files is 16384 and the hard limit is 32768.
For stack sizes, Darwin has soft/hard limits of 8192/65536 KB. I forget what the limit is on Solaris (and my Solaris machine is unavailable - power outages in Poughkeepsie, NY mean I can't get to the VPN to access the machine in Kansas from my home in California), but it is substantial.
I would not worry about the hard limits. If I thought the library might run out of 256 file descriptors, I'd increase the soft limit on Darwin; I would probably not bother on Solaris.
Similar limits apply on Linux and AIX. I can't answer for Windows.
Sad story: a few years ago now, I removed the code that changed the maximum file size limit in a program, because it had not been changed since the days when 2 MB was a big file (and some systems had a soft limit of just 0.5 MB). Once upon a decade and some ago, it actually increased the limit; by the time it was removed, it had become annoying because it reduced the limit. Tempus fugit and all that.
On SuSE Linux (SLES 10), the open files limits are 4096/4096, and the stack limits are 8192/unlimited.
As you have to support a large number of different systems, I would consider it wise to set up certain known-good values for system limits/resources, because the default values can differ wildly between systems.
The default size of pthread stacks is one such case. I recently had to find out that the default on HP-UX 11.31 is 256 KB(!), which isn't very reasonable, at least for our applications.
Setting up well-defined values increases the portability of an application, as you can be sure that there are X file descriptors, a stack size of Y, and so on, on every platform, and that things are not just working by good luck.
I tend to set up such limits from within the program itself, as the user then has fewer things to screw up (someone always tries to run the binary without the wrapper script). To optionally allow runtime customization, environment variables could be used to override the defaults (while still enforcing the minimum limits).
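As a sketch of setting such a defined value from within the program, here is one way to give worker threads an explicit stack size with an optional environment-variable override; the variable name APP_STACK_KB and the sizes are invented for the example:

/* Sketch: create worker threads with an explicit stack size instead of
 * relying on the platform default, optionally raised via an environment
 * variable. Build with: cc -pthread stack_size.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_STACK_BYTES (1024 * 1024)   /* 1 MB: example default   */
#define MIN_STACK_BYTES     (512 * 1024)    /* never go below this     */

static void *worker(void *arg)
{
    (void)arg;
    /* ... thread work ... */
    return NULL;
}

int main(void)
{
    size_t stack = DEFAULT_STACK_BYTES;
    const char *env = getenv("APP_STACK_KB");   /* made-up variable name */
    if (env != NULL) {
        size_t kb = (size_t)strtoul(env, NULL, 10);
        if (kb * 1024 > stack)
            stack = kb * 1024;              /* allow raising, not lowering */
    }
    if (stack < MIN_STACK_BYTES)
        stack = MIN_STACK_BYTES;

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    if (pthread_attr_setstacksize(&attr, stack) != 0) {
        fprintf(stderr, "invalid stack size %zu\n", stack);
        return 1;
    }

    pthread_t tid;
    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}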
Let's look at it this way: it is not very customer-friendly to require customers to set these limits. As detailed in the other answers, you are most likely hitting soft limits, and these can be changed. So change them automatically, if necessary in a script that starts the actual application (you can even write it so that it fails with a nice error message, instead of a segfault, if the hard limits are too low).
That's the practical part of it. Without knowing what the application does I am guessing a bit, but in most cases you should not be anywhere close to hitting any of the default limits of (even the less progressive) operating systems. Assuming the system is not a server that is bombarded with requests (hence the large number of file/socket handles used), it is probably a sign of sloppy programming. Based on experience with programmers, I would guess that file descriptors are left open for files that are only read or written once, or that the system keeps a file descriptor open on a file that is only sporadically changed or read.
Concerning stack sizes, that can mean two things. The standard cause of a program running out of stack is excessive (or unbounded) recursion, which is an error condition that the limits are actually designed to catch. The second is that some big (probably configuration) structures are allocated on the stack which should be allocated in heap memory. It might even be worse: those huge structures may be passed around by value (instead of by reference), which would mean a big hit on available (wasted) stack space as well as a big performance penalty.
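As a small illustration of that second point, the usual fix is to allocate such structures on the heap once and pass them around by pointer; the structure layout below is purely made up:

/* Sketch: avoid putting large (e.g. configuration) structures on the
 * stack or passing them by value; allocate once on the heap and pass
 * a pointer instead. The struct contents are purely illustrative. */
#include <stdlib.h>

struct config {
    char tables[64][32767];      /* large illustrative payload (~2 MB) */
    int  rate_hz;
};

/* Bad: `struct config cfg;` as a local variable, or a function taking
 * `struct config c` by value, burns megabytes of stack per call.      */

static void use(const struct config *cfg)   /* pass by pointer */
{
    (void)cfg;
    /* ... read what you need through the pointer ... */
}

int main(void)
{
    struct config *cfg = calloc(1, sizeof *cfg);   /* heap, not stack */
    if (cfg == NULL)
        return 1;
    cfg->rate_hz = 30000;
    use(cfg);
    free(cfg);
    return 0;
}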
A small tip: if you plan to run the application on a 64-bit processor, please be careful about setting the stack size to unlimited, which on a 64-bit Linux system reports -1 as the stack size.
Thanks
Shyam
Perhaps you could add whatever is appropriate to the start script, like 'ulimit -n -S 4096'.
But having worked with Solaris since 2.6, it's not unusual to modify rlim_fd_cur and rlim_fd_max in /etc/system permanently. In older versions of Solaris, they're just too low for some workloads, like running web servers.

How to programmatically really cleanly delete files?

So you are about to pass your work computer on to one of your colleagues. How do you make sure you really delete all your personal data?
Re-formatting and re-installing the OS will not really solve the problem.
I searched around and found some programs that "wipe" disks.
This got me thinking: how do those programs work?
I mean, what algorithms do they use, and how low-level do those implementations go?
Any ideas?
Most of those programs do a "secure delete" by overwriting the file bits with random noise.
The biggest problem has more to do with the actual implementation of hard drives and file systems than anything else. Fragmentation, caching, where the data you're trying to overwrite actually is: that's the big problem. And it's a very low-level problem -- driver level, really. You're not going to be able to do it with Python, C#, or Java.
Once that problem is solved, there's the one of physical media. Because of the nature of magnetic media, it's very frequently possible to read the previous bits that were once on the hard drive -- even if you overwrote them with a different bit. "Secure delete" programs solve this problem by overwriting several times -- preferably a random but suitably large number of times.
Further Reading:
Data Erasure
Data Remanence
The Great Zero Challenge (provided by #Stefano Borini -- vote him up!)
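A user-level sketch of the multi-pass random overwrite described above, with the caveat already mentioned that caching, block remapping and wear levelling can leave copies of the data that a loop like this never touches:

/* Sketch of a user-level multi-pass overwrite before deletion.
 * Caveat: caching, journaling, block remapping and wear levelling can
 * leave copies of the data that this never reaches.
 * Build with: cc -o wipe wipe.c */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define PASSES 3                       /* example number of passes */

int wipe_file(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[4096];
    for (int pass = 0; pass < PASSES; pass++) {
        lseek(fd, 0, SEEK_SET);
        for (off_t done = 0; done < st.st_size; ) {
            size_t n = sizeof buf;
            if ((off_t)n > st.st_size - done)
                n = (size_t)(st.st_size - done);
            for (size_t i = 0; i < n; i++)
                buf[i] = (unsigned char)rand();   /* pseudo-random fill */
            if (write(fd, buf, n) != (ssize_t)n) {
                close(fd);
                return -1;
            }
            done += (off_t)n;
        }
        fsync(fd);                     /* push each pass toward the disk */
    }
    close(fd);
    return unlink(path);               /* finally remove the name */
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    return wipe_file(argv[1]) == 0 ? 0 : 1;
}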
Safe delete programs overwrite the file multiple times with random patterns of data, so that even residual magnetization cannot be picked up and is lost in the noise.
However, assuming that the Great Zero Challenge has some truth in it, I think you can just fill the file/disk with zeros and call yourself happy, as this residual magnetization is practically impossible to pick up even with a professional setup.
As far as I know, most tools do this with X writes and deletes, where X is some suitably large number. The best way to do this is probably to interface with the hardware at some level, although a cheap and easy way would be to create files until the disk is full, writing random data, delete them, create new files, and repeat.
It's all paranoia anyway. Just deleting a file is usually much more than enough...
