Fastest Method for Counting Files in Directory Hierarchy - file

Quite simply I want to get a count of files in a directory (and all nested sub-directories) as quickly as possible.
I know how to do this using find paired with wc -l and similar methods, but these are extremely slow: they walk every file entry in every directory and count them one by one.
Is this the fastest method, or are there alternatives? For example, I don't need to find specific types of files, so I'm fine with grabbing symbolic links, hidden files, etc., if counting everything with no further processing gets me the file count more quickly.

The fastest method is to use locate + wc or similar; it can't be beaten. The main disadvantage of this method is that it counts not the actual files, but the files recorded in locate's database, and that database can already be a day old.
So it depends on your task: if it tolerates stale results, I would prefer locate.
On my superfast SSD-based machine:
$ time find /usr | wc -l
156610
real 0m0.158s
user 0m0.076s
sys 0m0.072s
$ time locate /usr | wc -l
156612
real 0m0.079s
user 0m0.068s
sys 0m0.004s
On a normal machine the difference will be much, much bigger.
How often the locate database is updated depends on the configuration of the host.
By default it is updated once a day (via a cron job), but you can configure the system so that the update runs every hour or even more frequently. Of course, you can also run it on demand rather than periodically (thanks to William Pursell for the hint).
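For example, a minimal sketch of refreshing the database on demand before counting (assuming updatedb is installed and you have sudo rights):
# Rebuild the locate database now, then count entries under /usr.
sudo updatedb
locate /usr | wc -l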

Try this script as an alternative:
find . -type d -exec bash -c 'd="{}"; arr=($d/*); echo "$d:${#arr[@]}"' \;
In my quick basic testing it came out faster than wc -l.
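If you want a single grand total rather than a per-directory breakdown, one possible variant (a sketch, assuming GNU find and bash; nullglob avoids counting a literal '*' in empty directories and dotglob includes hidden files):
# Count the entries in each directory, then sum the counts with awk.
find . -type d -exec bash -c '
    shopt -s nullglob dotglob
    arr=("$1"/*)
    echo "${#arr[@]}"
' _ {} \; | awk '{ total += $1 } END { print total }'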

Related

ClearCase VOBs not being used for a long time

I administer an SCM environment with ClearCase that has a lot of VOBs.
Many of these VOBs have not been used in a long time. I would like to know whether it is possible to determine the last modification time of these VOBs.
Another question: if I only unregister these VOBs, will the CPU and memory consumption on the VOB server decrease?
In theory, to put these VOBs online again, I will only have to run a register command, right?
Is there any other approach you could recommend for managing this scenario (VOBs not being used for a long time)?
Many of these VOBs have not been used in a long time. I would like to know whether it is possible to determine the last modification time of these VOBs.
You can try using cleartool lshis -all on a VOB tag.
I had a script which filtered the last events with:
cleartool lshis -fmt "%Xn\t%Sd\t%e\t%h\t%u \n" -since 01-Oct-2010 -all <vobname> | grep -v lock | head -1 | grep -o '20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
Another question: if I only unregister these VOBs, will the CPU and memory consumption on the VOB server decrease?
Yes, because there would no longer be a vob_server process associated with that VOB.
In theory, to put these VOBs online again, I will only have to run a register command, right?
Yes, although I prefer unregister/rmtag (as in "Removing ClearCase vobs") before registering and mktagging.
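For reference, the sequence would look roughly like this (a sketch only; the storage and tag paths below are made-up placeholders):
# Take the VOB offline: unregister it and remove its tag.
cleartool unregister -vob /vobstore/myvob.vbs
cleartool rmtag -vob /vobs/myvob
# Later, bring it back online.
cleartool register -vob /vobstore/myvob.vbs
cleartool mktag -vob -tag /vobs/myvob /vobstore/myvob.vbs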

How to list directories on an OpenVMS volume

I've been searching Google as well as the OpenVMS System Administrator's Guide and User Guide, and I still can't find anything about listing the directories present on an OpenVMS volume. I can't see how this could be taken for granted in the docs, since everything else is very specific, so either I'm failing to see it or it can't be done. If it can't be done, then I'm missing some incredibly large chunk of the picture with regard to using VMS. Any suggestions are appreciated.
TIA,
grobe0ba
By "listing", I assume you mean via a command such as Dir...
To see all directories on a volume I would do something like,
$ dir volumeid:[000000...]*.dir
Of course, you need enough privilege to be able to see all the directories on the volume.
For a quick overview of all the directories you may also check out the /TOTAL option for 'directory'.
$ DIRE /TOTAL [*...]
Add /SIZE for effect (and slowdown)
You can of course post-process to your heart's content...
$ pipe dir /total data:[*...] | perl -ne "print if /^Dir/"
Directory DATA:[CDC]
Directory DATA:[CDC.ALPHA]
Directory DATA:[CDC.ALPHA.V8_3]
$ pipe dir /total data:[*...] | searc sys$pipe "ory "
Directory DATA:[CDC]
Directory DATA:[CDC.ALPHA]
Directory DATA:[CDC.ALPHA.V8_3]
$ pipe dir /total data:[*...] | perl -ne "chomp; $x=$1 if /^Di.* (\S+)/; printf qq(%-60s%-19s\n),$x,$_ if /Tot/"
DATA:[CDC] Total of 7 files.
DATA:[CDC.ALPHA] Total of 1 file.
DATA:[CDC.ALPHA.V8_3] Total of 11 files.
Finally, if you are serious about playing with files and directories on OpenVMS, be sure to google for DFU OPENVMS ... download and enjoy.
Unfortunately I do not have the reputation required for commenting so I have to reformulate the answer.
@ChrisB
This answer, while upvoted, is not correct generally speaking. Directories are always files ending in .DIR and having a version of 1. Renaming a directory to *.DIR;x with x>1 will render the directory non-traversable. The .DIR file, however, retains its directory characteristics, and renaming it back to ;1 restores its normal behavior.
So one may add a ;1 to the DIR command:
$ dir volumeid:[000000...]*.dir;1
But again this is not valid, because anyone may create *.DIR files which are not directories (e.g. EDIT TEST.DIR), and there are applications out there doing so.
@Hein
So the second answer, from Hein, which at this time has 0 votes, is the correct one. The one that does exactly the requested operation without a 3rd-party tool is:
$ PIPE DIR /TOTAL volume:[*...] | SEARCH SYS$PIPE "ory "
This command will only show valid directories.

Delete all files except

I have a folder with a few files in it; I like to keep my folder clean of any stray files that can end up in it. Such stray files may include automatically generated backup files or log files, but could be as simple as someone accidentally saving to the wrong folder (my folder).
Rather than having to pick through all this all the time, I would like to know if I can create a batch file that only keeps a number of specified files (by name and location) but deletes anything not on the "list".
[edit] Sorry, when I first saw the question I read bash instead of batch. I won't delete the not-so-useful answer since, as was pointed out in the comments, it could be done with Cygwin.
You can list the files, exclude the ones you want to keep with grep, and then submit the rest to rm.
If all the files are in one directory:
ls | grep -v -f ~/.list_of_files_to_exclude | xargs rm
or in a directory tree
find . | grep -v -f ~/.list_of_files_to_exclude | xargs rm
where ~/.list_of_files_to_exclude is a file with the list of patterns to exclude (one per line).
Before testing it, make a backup copy and substitute rm with echo to see if the output is really what you want.
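For instance, a dry run of the single-directory variant simply prints the candidates instead of deleting them:
# Same pipeline as above, with rm replaced by echo for a dry run.
ls | grep -v -f ~/.list_of_files_to_exclude | xargs echo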
Whitelists for file survival are an incredibly dangerous concept. I would strongly suggest rethinking that.
If you must do it, might I suggest that you actually implement it thus:
Move ALL files to a backup area (one created per run, such as a directory named with the current date and time).
Use your whitelist to copy back the files that you wanted to keep (such as with copy c:\backups\2011_04_07_11_52_04\*.cpp c:\original_dir).
That way, you keep all the non-whitelisted files in case you screw up (and you will at some point, trust me), and you don't have to worry about negative logic in your batch file (remove all files that aren't of these types), instead using the simpler option (move back every file that is of each type).
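The question asks for a Windows batch file, but as noted above the same idea works under Cygwin; here is a minimal shell sketch of the move-everything-then-copy-back approach (the folder paths and the keep-list file name are assumptions):
#!/usr/bin/env bash
# Move everything to a timestamped backup, then copy back only whitelisted files.
set -e
target_dir="$HOME/my_folder"                       # the folder to keep clean
backup_dir="$HOME/backups/$(date +%Y_%m_%d_%H_%M_%S)"
mkdir -p "$backup_dir"
mv "$target_dir"/* "$backup_dir"/                  # move EVERYTHING out first
# Copy back only the files named in the whitelist, one name per line.
while IFS= read -r keep; do
    [ -e "$backup_dir/$keep" ] && cp -p "$backup_dir/$keep" "$target_dir"/
done < "$HOME/.list_of_files_to_keep"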

Clearmake rules for handling stamp file and siblings from a large monolith command?

Given a build process like this:
Run a VerySlowProcess that produces one output file for each given input file.
Run a SecondProcess on each output file from VerySlowProcess
VerySlowProcess is slow to start, but can handle additional input files without much extra delay, so it is invoked with several input files at once. VerySlowProcess may access additional files referenced from the input files, but we cannot match those file accesses to specific input files, and therefore all output files derived from VerySlowProcess get the same configuration record from clearmake.
Since VerySlowProcess is invoked with several input files (including input files that have not changed), many of the output files are overwritten again with identical content. In those cases it would be unnecessary to execute SecondProcess on them, so output is written to a temporary file that is only copied onto the real file if the content has actually changed.
Example Makefile:
all: a.3 b.3

2.stamp:
	@(echo VerySlowProcess simulated by two cp commands)
	@(cp a.1 a.2_tmp)
	@(cp b.1 b.2_tmp)
	@(diff -q a.2_tmp a.2 || (echo created new a.2; cp a.2_tmp a.2))
	@(diff -q b.2_tmp b.2 || (echo created new b.2; cp b.2_tmp b.2))
	@(touch $@)

%.3: %.2 2.stamp
	@(echo Simulating SecondProcess creating $@)
	@(cp $< $@)
If only a.1 is changed, only a.2 is rewritten, but SecondProcess is still executed for b as well:
> clearmake all
VerySlowProcess simulated by two cp commands
Files a.2_tmp and a.2 differ
created new a.2
Simulating SecondProcess creating a.3
Simulating SecondProcess creating b.3
As a workaround we can remove '2.stamp' from the '%.3' dependencies; then it works to execute it like this:
> clearmake 2.stamp && clearmake all
VerySlowProcess simulated by two cp commands
Files a.2_tmp and a.2 differ
created new a.2
Simulating SecondProcess creating a.3
Is there a better way to handle our problem with VerySlowProcess?
Your workaround seems valid.
The only other use of clearmake for supporting "incremental update" is presented here, but I am not sure if it applies in your case.
Incremental updating means that a compound object, such as a library is partially updated by the rebuild of one or more of its components, as opposed to being generated by the build of just one target.
The use of the .INCREMENTAL_TARGET is of importance here.
This special target tells clearmake to merge the entries of the target's previous configuration record with those of the latest build.
This way the build history of the object is not lost, because the object's configuration record is not completely overwritten every time the object gets modified.
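If it does apply, the declaration is just an extra special-target line in the makefile. A hypothetical sketch, using the stamp file from the question as the compound object:
# Hypothetical: ask clearmake to merge 2.stamp's previous configuration
# record with the record produced by the latest build.
.INCREMENTAL_TARGET: 2.stamp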
Here's an alternative scenario with a similar problem... perhaps... though your description does not quite match my scenario.
I have a very slow process that may change certain files but will regenerate all the files.
I want to avoid the slow process and also want to avoid updating the files that do not change.
Check whether regeneration (the slow process) is necessary. This logic needed to be separated out from the makefile into a shell script, since there are issues with clearmake targets being updated that .INCREMENTAL_TARGET could not help resolve.
Logic Overview:
If the md5sums.txt file is empty or the md5sums do not match, then the long process is invoked.
To check md5sums:
md5sum -c md5sums.txt
To build slow target:
clearmake {slowTarget}
this will generate the output to a temp dir and afterwards update the changed elements
To regenerate md5sums:
checkout md5sums.txt
cleartool catcr -ele -s {slowTarget} | sed '1,3d;s/\\/\//g;s/@@.*//;s/^.//;' | xargs -i md5sum {} > md5sums.txt
checkin md5sums.txt
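Tying those steps together, a rough sketch of the wrapper shell script described above (the target and file names come from the answer; the checkout/checkin handling of md5sums.txt is an assumption):
#!/usr/bin/env bash
# Regenerate only when the recorded checksums no longer match.
set -e
if ! md5sum -c md5sums.txt >/dev/null 2>&1; then
    # Checksums missing or stale: run the slow regeneration via clearmake.
    clearmake slowTarget
    # Rebuild the checksum list from the derived object's config record.
    cleartool checkout -nc md5sums.txt
    cleartool catcr -ele -s slowTarget | sed '1,3d;s/\\/\//g;s/@@.*//;s/^.//;' | xargs -i md5sum {} > md5sums.txt
    cleartool checkin -nc md5sums.txt
fi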

Cat selected files fast and easy?

I have been cat'ing files in the Terminal until now, but that is time-consuming when done a lot. What I want is something like this:
I have a folder with hundreds of files, and I want to effectively cat a few files together.
For example, is there a way to select (in the Finder) five split files:
file.txt.001, file.txt.002, file.txt.003, file.txt.004
... and then right-click on them in the Finder, and just click Merge?
I know that isn't possible out of the box of course, but with an Automator action, droplet or shell script, is something like that possible? Or maybe by assigning that cat action a keyboard shortcut, so that when it is hit, the selected files in the Finder are automatically merged into a new file AND placed in the same folder, WITH a name based on the original split files?
In this example, file.001 through file.004 would magically appear in the same folder, as a file named fileMerged.txt?
I have like a million of these kinds of split files, so an efficient workflow for this would be a lifesaver. I'm working on an interactive book, and the publisher gave me this task.
cat * > output.file
works as a sh script. It redirects the contents of the files into that output.file.
* expands to all files in the directory.
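Applied to the split files from the question (the output name is just an example):
# The shell expands the glob in lexical order, so the zero-padded
# .001, .002, ... pieces are concatenated in the right sequence.
cat file.txt.* > fileMerged.txt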
Judging from your description of the file names, you can automate that very easily with bash, e.g.:
PREFIXES=`ls -1 | grep -o "^.*\." | uniq`
for PREFIX in $PREFIXES; do cat ${PREFIX}* > ${PREFIX}.all; done
This will merge all files in one directory that share the same prefix.
ls -1 lists all files in a directory (if the files span multiple directories, you can use find instead). grep -o "^.*\." matches everything up to the last dot in the file name (you could also use sed -e 's/\.[0-9]*$/./' to strip the trailing digits). uniq filters out the duplicates. After that you have something like speech1.txt. sound1.txt. in the PREFIXES variable. The next line loops through those prefixes and merges each group of files using the * wildcard.
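If you want the right-click workflow from the question, one possible approach (a sketch, not tested) is an Automator Service / Quick Action with a "Run Shell Script" action set to pass input as arguments, containing something like:
#!/bin/bash
# Automator passes the selected Finder files as "$@".
# Merge them into fileMerged.txt in the folder of the first selected file.
out="$(dirname "$1")/fileMerged.txt"
cat "$@" > "$out"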
