For loop: process files in different folders (Unix)

I have tried to find an answer to my question by looking at similar topics, but didn't succeed. Maybe I have overlooked something. Any help is appreciated!
So, I have hundreds of folders in my current directory, named from folder1000 to folder1500. In each folder, I have one .fastq file with a different name (Lib1.fastq, Lib2.fastq, etc.). I want to process each of these files in one loop by running a shell script.
Here is my shell script (script.sh) for one file (it creates outputs which are processed further), which I run in my Terminal:
#!/bin/sh
# align reads against the "genome" index and write SAM output
bowtie --threads 4 -v 2 -m 10 -a genome Lib1.fastq --sam > Lib1.sam
# samtools view -h: output SAM including the header (-o sets the output file)
samtools view -h -o Lib1.sam Lib1.bam
# sort alignments by reference name (column 3) and position (column 4)
sort -k 3,3 -k 4,4n Lib1.sam > Lib1.sam.sorted
# ...etc
Here is the loop I am trying to write, also as a shell script (for now only with a simple "head" check, and only with the first few folders), which I run from my current directory where all the folders are located:
#!/bin/sh
for file in ./folder{1000..1005}
do
head -10 *.fastq
done
But as a result I get:
head: *.fastq: No such file or directory
head: *.fastq: No such file or directory
head: *.fastq: No such file or directory
head: *.fastq: No such file or directory
head: *.fastq: No such file or directory
So, even a simple checking command does not work for me in a loop. Somehow I cannot see the files. But if I run the command directly in one of the folders:
MacBook-Air-Maxim:folder1000 maxim$ head -10 *.fastq
then I get the correct result (the first 10 lines of the file displayed).
Could anyone suggest the most convenient way to process all the files?
Thanks a lot and very sorry, I am just learning.

Well, you are traversing through the folders using the variable $file, but you are not using this variable in the loop body. Just use it:
#!/bin/sh
for file in ./folder{1000..1005}
do
head -10 "$file"/*.fastq
done
There are other issues in the overall problem, but this is the answer to the point that is stopping you. Let's tackle the problems one by one :-)
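For the fuller goal, once the loop variable is used, the same pattern extends to the whole pipeline. Here is a minimal sketch, assuming one .fastq file per folder and the same bowtie index ("genome") as in script.sh; note that the {1000..1500} brace expansion needs bash:
#!/bin/bash
# run the per-library pipeline in every folder1000 ... folder1500
for dir in ./folder{1000..1500}
do
    for fq in "$dir"/*.fastq
    do
        base="${fq%.fastq}"        # e.g. ./folder1000/Lib1
        bowtie --threads 4 -v 2 -m 10 -a genome "$fq" --sam > "$base.sam"
        sort -k 3,3 -k 4,4n "$base.sam" > "$base.sam.sorted"
        # ...etc, as in script.sh
    done
done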

Related

Stata in batch mode and log file

Say I have the following folder structure.
/foo
    /bar1
        code.do
    /bar2
I want to run Stata in batch mode and have the log file generated inside /foo/bar2. What exact batch code should I run?
I'll give you examples that I tried and that didn't work. Right now the log file is being created as stata.log inside /foo. Also, I would like to run Stata in batch mode with -b, without seeing the whole output in my GUI.
stata-se < "/foo/bar1/code.do" > "/foo/bar2"
stata-se "/foo/bar1/code.do" "/foo/bar2"
stata-se do "/foo/bar1/code.do" "/foo/bar2"
stata-se -b do "/foo/bar1/code.do" "/foo/bar2"
Both methods work for me. Below are my exact terminal commands, after creating your example directories:
Method 1
$ stata < /home/roberto/Desktop/foo/bar1/code.do > /home/roberto/Desktop/foo/bar2/code.log
Method 2
$ cd /home/roberto/Desktop/foo/bar2
$ stata -b /home/roberto/Desktop/foo/bar1/code.do
Notice that with Method 2, Stata will write the log file to the current directory. Just change directory before running Stata.
Another option is to specify your log file inside your do-file:
log using /home/roberto/Desktop/foo/bar2/code.log, replace
Then you can run the file in batch mode without worrying about the current directory.
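If you want to script Method 2, a small wrapper along these lines would work (paths taken from the example above; substitute your actual Stata executable, e.g. stata-se, if needed):
#!/bin/sh
# change to bar2 so the batch log lands there, then run the do-file in batch mode
cd /home/roberto/Desktop/foo/bar2 || exit 1
stata -b do /home/roberto/Desktop/foo/bar1/code.do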

How to list directories on an OpenVMS volume

I've been searching Google as well as the OpenVMS System Administrator's Guide and User Guide, and still can't find anything regarding listing the directories present on an OpenVMS volume. I can't see how this could be taken for granted in the docs, since everything else is very specific, so either I'm failing to see it or it can't be done. If it can't be done, then I'm missing some incredibly large chunk of the picture in regards to using VMS. Any suggestions are appreciated.
TIA,
grobe0ba
By "listing", I assume you mean via a command such as Dir...
To see all directories on a volume I would do something like,
$ dir volumeid:[000000...]*.dir
Of course, you need enough privilege to be able to see all the directories on the volume.
For a quick overview of all the directories you may also check out the /TOTAL option for 'directory'.
$ DIRE /TOTAL [*...]
Add /SIZE for effect (and slowdown)
You can of course post-process to your heart's content...
$ pipe dir /total data:[*...] | perl -ne "print if /^Dir/"
Directory DATA:[CDC]
Directory DATA:[CDC.ALPHA]
Directory DATA:[CDC.ALPHA.V8_3]
$ pipe dir /total data:[*...] | searc sys$pipe "ory "
Directory DATA:[CDC]
Directory DATA:[CDC.ALPHA]
Directory DATA:[CDC.ALPHA.V8_3]
$ pipe dir /total data:[*...] | perl -ne "chomp; $x=$1 if /^Di.* (\S+)/; printf qq(%-60s%-19s\n),$x,$_ if /Tot/"
DATA:[CDC]                Total of 7 files.
DATA:[CDC.ALPHA]          Total of 1 file.
DATA:[CDC.ALPHA.V8_3]     Total of 11 files.
Finally, if you are serious about playing with files and directories on OpenVMS, be sure to google for DFU OPENVMS ... download and enjoy.
Unfortunately I do not have the reputation required for commenting, so I have to reformulate the answer.
@ChrisB
This answer, while upvoted, is not correct generally speaking. Directories are always files ending in .DIR and having a version of 1. Renaming a directory to *.DIR;x with x > 1 will render the directory non-traversable. The .DIR file however retains its directory characteristics, and renaming it back to ;1 will restore its normal behavior.
So one may add a ;1 to the DIR command
$ dir volumeid:[000000...]*.dir;1
But again this is not valid, because anyone may create *.DIR files which are not directories (e.g. EDIT TEST.DIR), and there are applications out there doing so.
@Hein
So the second answer, from Hein, which at this time has 0 votes, is the correct one. The one that does exactly the requested operation without a 3rd-party tool is:
$ PIPE DIR /TOTAL volume:[*...] | SEARCH SYS$PIPE "ory "
This command will only show valid directories.

Moving/Grouping Files Unix

I have one folder with about 1000 files and I want to group them according to their respective parent folders.
I did ls -R > updated.txt to get the original setup of folders and files.
The updated.txt looks like this:
./Rhodococcus_RHA1:
NC_008268.fna
NC_008269.fna
NC_008270.fna
NC_008271.fna
./Rhodoferax_ferrireducens_T118:
NC_007901.fna
NC_007908.fna
./Rhodopseudomonas_palustris_BisA53:
NC_008435.fna
./Rhodopseudomonas_palustris_BisB18:
NC_007925.fna
./Rhodopseudomonas_palustris_BisB5:
NC_007958.fna
./Rhodopseudomonas_palustris_CGA009:
NC_005296.fna
NC_005297.fna
So, by looking at this file, I know what files go into what folder. The folder with all the 1000 files together looks like this:
results_NC_004193.fna.1.ebwt.map
results_NC_004307.fna.1.ebwt.map
results_NC_004310.fna.1.ebwt.map
results_NC_004311.fna.1.ebwt.map
results_NC_004337.fna.1.ebwt.map
results_NC_004342.fna.1.ebwt.map
results_NC_004343.fna.1.ebwt.map
results_NC_004344.fna.1.ebwt.map
and so on...
You can see that the filenames of all the 1000 files are dependent on their original names in the folder setup (if that's a good way to explain it).
I want to move these results_XXXXXXXX files to folders (have to create new folders) with the original setup. So it should be something like this:
./Rhodococcus_RHA1: (this is a folder)
results_NC_008268.fna.1.ebwt.map
results_NC_008269.fna.1.ebwt.map
results_NC_008270.fna.1.ebwt.map
results_NC_008271.fna.1.ebwt.map
./Rhodoferax_ferrireducens_T118:
results_NC_007901.fna.1.ebwt.map
results_NC_007908.fna.1.ebwt.map
I don't really know how to do this... maybe some kind of mv command? I'd appreciate help with this problem.
Run the following command from the folder where you have those 1000 files. Here path/to/original/files is the path to the original files (the one where you did ls -R). You should get a list of mv commands. Verify several of them to confirm that they are correct. If so, append | sh to the command and rerun it to execute those commands. If you don't have all the corresponding files in the 1000-file folder, you will get mv commands that fail with "file not found"; those errors can be ignored or redirected to /dev/null. This assumes that a file always exists in the original folder, so that the command knows where to move the result file; if not, some of those 1000 files won't be moved. As always, take a good backup before you do this.
find path/to/original/files -type f | awk -F"/" '{ path=$0; sub($NF, "", path); printf("mv results_%s.1.ebwt.map \"%s\"\n", $NF, path);}'
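For example, a safe way to use it (the same command, just previewed first) is:
# 1) preview the generated mv commands
find path/to/original/files -type f | awk -F"/" '{ path=$0; sub($NF, "", path); printf("mv results_%s.1.ebwt.map \"%s\"\n", $NF, path);}' | less
# 2) when they look right, execute them; "file not found" errors can be discarded
find path/to/original/files -type f | awk -F"/" '{ path=$0; sub($NF, "", path); printf("mv results_%s.1.ebwt.map \"%s\"\n", $NF, path);}' | sh 2>/dev/null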

Delete all files except

I have a folder with a few files in it; I like to keep my folder clean of any stray files that can end up in it. Such stray files may include automatically generated backup files or log files, but could be as simple as someone accidentally saving to the wrong folder (my folder).
Rather than have to pick through all this all the time, I would like to know if I can create a batch file that only keeps a number of specified files (by name and location) but deletes anything not on the "list".
[edit] Sorry, when I first saw the question I read bash instead of batch. I won't delete this not-so-useful answer since, as was pointed out in the comments, it could be done with Cygwin.
You can list the files, exclude the ones you want to keep with grep, and then submit the rest to rm.
If all the files are in one directory:
ls | grep -v -f ~/.list_of_files_to_exclude | xargs rm
or in a directory tree
find . | grep -v -f ~/.list_of_files_to_exclude | xargs rm
where ~/.list_of_files_to_exclude is a file with the list of patterns to exclude from deletion (i.e. the files to keep), one per line.
Before testing it, make a backup copy and substitute rm with echo to see if the output is really what you want.
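For example, a dry run (with a hypothetical exclude list) could look like this:
# ~/.list_of_files_to_exclude might contain, one pattern per line:
#   keep_me.txt
#   notes.md
# dry run: echo instead of rm, so nothing is deleted yet
ls | grep -v -f ~/.list_of_files_to_exclude | xargs echo rm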
White lists for file survival are an incredibly dangerous concept. I would strongly suggest rethinking that.
If you must do it, might I suggest that you actually implement it thus:
Move ALL files to a backup area (one created per run, such as a directory named with the current date and time).
Use your white list to copy back the files that you wanted to keep (for example, copy c:\backups\2011_04_07_11_52_04\*.cpp c:\original_dir).
That way, you keep all the non-white-listed files in case you screw up (and you will at some point, trust me), and you don't have to worry about negative logic in your batch file (remove all files that aren't of these types), instead using the simpler option (move back every file that is of each type).
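If you end up doing this under Cygwin/bash (as in the answer above), a rough sketch of that move-then-copy-back idea might be (the backup location and the keep-list file name are assumptions):
#!/bin/bash
# move EVERYTHING to a timestamped backup area, then copy back the keepers
backup="$HOME/backups/$(date +%Y_%m_%d_%H_%M_%S)"
mkdir -p "$backup"
mv ./* "$backup"/
# copy back every white-listed file (one name per line in the list)
while IFS= read -r keep
do
    cp "$backup/$keep" . 2>/dev/null
done < ~/.list_of_files_to_keep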

Cat selected files fast and easy?

I have been cat'ing files in the Terminal until now... but that is time-consuming when done a lot. What I want is something like:
I have a folder with hundreds of files, and I want to effectively cat a few files together.
For example, is there a way to select (in the Finder) five split files;
file.txt.001, file.txt.002, file.txt.003, file.txt.004
.. and then right click on them in the Finder, and just click Merge?
I know that isn't possible out of the box of course, but with an Automator action, droplet or shell script, is something like that possible to do? Or maybe assigning that cat action a keyboard shortcut, so that when it is hit, the selected files in the Finder are automatically merged together into a new file AND placed in the same folder, WITH a name based on the original split files?
In this example, file.001 through file.004 would magically appear in the same folder, as a file named fileMerged.txt?
I have like a million of these kinds of split files, so an efficient workflow for this would be a life saver. I'm working on an interactive book, and the publisher gave me this task...
cat * > output.file
works as a sh script. It redirects the concatenated contents of the files into that output.file.
* expands to all files in the directory.
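For your example names that would be something like this (the shell sorts the glob, so the .001, .002, ... pieces are concatenated in order):
cat file.txt.* > fileMerged.txt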
Judging from your description of the file names, you can automate that very easily with bash, e.g.:
PREFIXES=`ls -1 | grep -o "^.*\." | uniq`
for PREFIX in $PREFIXES; do cat ${PREFIX}* > ${PREFIX}.all; done
This will merge all files in one directory that share the same prefix.
ls -1 lists all files in a directory (if it spans multiple directories, you can use find instead). grep -o "^.*\." will match everything up to the last dot in the file name (you could also use sed -e 's/.[0-9]*$/./' to remove the trailing digits). uniq will filter out duplicates. Then you have something like speech1.txt. and sound1.txt. in the PREFIXES variable. The next line loops through those and merges each group of files individually using the * wildcard.
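To get this from the Finder, one option is an Automator Quick Action (or Service) with a "Run Shell Script" step set to "Pass input: as arguments"; a minimal sketch, with the output name fileMerged.txt taken from your example, could be:
# Automator "Run Shell Script" body ("Pass input: as arguments")
# concatenates the selected files, in name order, into fileMerged.txt
# next to the first selected file
dir=$(dirname "$1")
printf '%s\n' "$@" | sort | while IFS= read -r f
do
    cat "$f"
done > "$dir/fileMerged.txt"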
