Script for renaming files and directories with special characters

I am looking for a script to rename files and directories that have special characters in them.
My files:
?rip?ev <- Directory
- Juhendid ?rip?evaks.doc <- Document
- ?rip?ev 2 <- Subdirectory
-- t?ts?.xml <- Subdirectory file
They need to be like this:
ripev <- Directory
- Juhendid ripevaks.doc <- Document
- ripev 2 <- Subdirectory
-- tts.xml <- Subdirectory file
I need to rename the files and the folders so that the file extension stays as it is; for example, .doc and .xml must not be lost. Last time I did it with rename, every file lost its extension and the files were moved to the parent directory (in this case the ?rip?ev directory), leaving the subdirectories empty. Everything was located under the parent directory /home/samba/.
So in this case I just need to remove the question marks from the file and directory names, without moving anything elsewhere or losing any other character or the extension. I have been searching Google for an answer but haven't found one. I know it can be done with find and rename, but I haven't been able to overcome the complexity of the script. Can anyone help me, please?

You can just do something like this:
find -name '*\?*' -exec bash -c 'echo mv -iv "$0" "${0//\?/}"' {} \;
Note the echo before the mv, so you can see what it does before actually changing anything. Otherwise, the command above:
searches for ? in the name (? is a glob wildcard that matches any single character, so it needs to be escaped)
executes a bash command, passing {} as the first argument (since there is no script name, it's $0 instead of $1)
uses ${0//\?/}, a bash parameter expansion, to replace all occurrences of ? with nothing.
Note also that file types do not depend on the name in Linux, but this should not change any file extension unless the extension contains ?.
Also, this will rename symlinks as well if their names contain ? (it is not clear from the question whether that was expected).
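One caveat with the one-liner above: the substitution is applied to the whole path, so an entry inside a directory whose own name contains ? is given a target path under a directory that may not exist yet. A minimal sketch that sidesteps this, assuming bash and a find that supports -depth, processes children before their parents and only edits the last path component. The function name strip_qmarks is made up for illustration.

```shell
#!/bin/bash
# strip_qmarks DIR: hypothetical helper name, not part of any tool.
# Removes every literal "?" from file and directory names under DIR.
strip_qmarks() {
    # -depth lists the contents of a directory before the directory itself,
    # so each entry is renamed while its parent still has the old name.
    find "$1" -depth -name '*\?*' -exec bash -c '
        for path do
            case $path in
                */*) dir=${path%/*} ;;   # everything before the last slash
                *)   dir=. ;;            # start point given without a slash
            esac
            base=${path##*/}             # last path component only
            mv -i -- "$path" "$dir/${base//\?/}"
        done
    ' bash {} +
}
```

Calling it as strip_qmarks /home/samba would rename in place. Replacing mv with echo mv first gives the same kind of dry run as in the answer.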

I usually do this kind of thing in Perl:
#!/usr/bin/perl
# Recursively remove "?" from the names of everything under a directory.
sub procdir {
    chdir $_[0] or return;
    for (<*>) {
        my $oldname = $_;
        rename($oldname, $_) if s/\?//g;
        procdir($_) if -d;
    }
    chdir "..";
}
procdir("top_directory");


How to cat similarly named sequence files from different directories into a single large FASTA file

I am trying to get the following done. I have circa 40 directories of different species, each with 100s of sequence files that contain orthologous sequences. The sequence files are similarly named for each of the species directories. I want to concatenate the identically named files of the 40 species directories into a single sequence file which is named similarly.
My data looks as follows, e.g.:
directories: Species1 Species2 Species3
Within directory (similar for all): sequenceA.fasta sequenceB.fasta sequenceC.fasta
I want to get single files named: sequenceA.fasta sequenceB.fasta sequenceC.fasta
where the content of the different files from the different species is concatenated.
I tried to solve this with a loop (but this never ends well with me!):
ls . | while read FILE; do cat ./*/"$FILE" >> ./final/"$FILE"; done
This resulted in empty files and errors. I did try to find a solution elsewhere, e.g.: (https://www.unix.com/unix-for-dummies-questions-and-answers/249952-cat-multiple-files-according-file-name.html, https://unix.stackexchange.com/questions/424204/how-to-combine-multiple-files-with-similar-names-in-different-folders-by-using-u) but I have been unable to edit them to my case.
Could anyone give me some help here? Thanks!
In the root directory where your species directories reside, run the following:
$ mkdir output
$ find Species* -type f -name "*.fasta" -exec sh -c 'cat "$1" >> "output/$(basename "$1")"' sh {} \;
This traverses all the files recursively and merges the contents of files with an identical basename into one file under the output directory. The file name is passed to sh as "$1" rather than being pasted into the command text, so names with spaces or quotes are handled safely.
EDIT: even though this was an accepted answer, in a comment the OP mentioned that the real directories don't match a common pattern Species* as shown in the original question. In this case you can use this:
$ find . -type f -not -path "./output/*" -name "*.fasta" -exec sh -c 'cat "$1" >> "output/$(basename "$1")"' sh {} \;
This way, we don't specify a search pattern but instead explicitly exclude the output directory, so that already merged data is not processed again.
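For comparison, the same merge can be written as a plain loop over a glob, which avoids starting a shell per file. This is only a sketch: merge_by_name is a made-up name, and it assumes that every species directory sits directly under the current directory and that the output directory is given as a simple name inside it.

```shell
#!/bin/bash
# merge_by_name OUTDIR: hypothetical helper. Appends every */NAME.fasta
# found one level below the current directory to OUTDIR/NAME.fasta.
merge_by_name() {
    local out=$1 f
    mkdir -p -- "$out"
    for f in */*.fasta; do
        [ -f "$f" ] || continue                  # unmatched glob or not a regular file
        [ "${f%%/*}" = "${out%/}" ] && continue  # never read from the output directory
        cat -- "$f" >> "$out/${f##*/}"
    done
}
```

Run from the directory that holds the species folders, merge_by_name output would leave one combined sequenceA.fasta, sequenceB.fasta, and so on under output/. The glob is expanded in sorted order, so the species appear in the same order in every merged file.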

How to store a directory's contents inside of an array

I want to be able to store a directory's contents inside of an array. I know that you can use:
#!/bin/bash
declare -A array
for i in Directory; do
array[$i]=$i
done
to store directory contents in an associative array. But if there are subdirectories inside of a directory, I want to be able to store them and their contents inside of the same array. I also tried using:
declare -A arr1
find Directory -print0 | while read -d $'\0' file; do
arr1[$file]=$file
echo "${arr1[$file]}"
done
but this just runs into the problem where the array contents vanish once the while loop ends due to the subshell being discarded from the pipeline (not sure if I'm describing this correctly).
I even tried the following:
for i in $(find Directory/*); do
arr2[$i]="$i"
echo $i
done
but the output is a total disaster for files containing any spaces.
How can I store both a directory and all of its subdirectories (and their subdirectories if need be) inside of a single array?
So you know, you don't need associative arrays. A simpler way to add an element to a regular indexed array is:
array+=("$value")
Your find|while approach is on the right track. As you've surmised, you need to get rid of the pipeline. You can do that with process substitution:
while IFS= read -r -d '' file; do
    arr1+=("$file")
done < <(find Directory -print0)
Another way to do this without find is with globbing. If you just want the files under Directory it's as simple as:
array=(Directory/*)
If you want to recurse through all of its subdirectories as well, you can enable globstar and use **:
shopt -s globstar
array=(Directory/**)
The globbing methods are really nice because they automatically handle file names with whitespace and other special characters.
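To illustrate the find-based variant end to end, here is a minimal sketch. The function name list_tree and the array name entries are made up for this example, and mapfile -d '' assumes bash 4.4 or later; on older versions the while read loop shown earlier does the same job.

```shell
#!/bin/bash
# list_tree DIR: hypothetical helper that fills the global indexed array
# "entries" with DIR and everything beneath it (bash 4.4+ for mapfile -d).
list_tree() {
    # NUL-delimited output keeps names with spaces or newlines intact, and
    # process substitution avoids the subshell that a pipeline would create.
    mapfile -d '' -t entries < <(find "$1" -print0)
}

# When using the array, quote the expansion so each name stays one word:
#   for e in "${entries[@]}"; do printf '%s\n' "$e"; done
```

Because the array is filled in the current shell rather than in a pipeline, its contents are still there after list_tree returns.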

Compare file names in two different directories using an array

I don't understand how this code is supposed to work. All I want is to compare the files listed below, but when I run this script nothing happens. I assume the script can be run from anywhere, such as /root. Please take a look.
#!/bin/bash
for file in /var/files/sub/old/*
do
    # Strip path from file name
    file="${file##*/}"
    # Strip everything after the first hyphen
    prefix="${file%%-*}-"
    # Strip everything before the second-to-last dot
    suffix="$(echo $file | awk -F. '{ print "."$(NF-1)"."$NF }')"
    # Create new file name from $prefix and $suffix, and any version number
    new=$(echo "/var/files/new/${prefix}"*"${suffix}")
    # If file exists in the 'new' folder:
    if test -f "${new}"
    then
        # Do string comparison to see if new file is lexicographically "greater than" old
        if [[ "${new##*/}" > "${file}" ]]
        then
            # If so, delete the old version.
            rm /var/sub/files/old/"${file}"
        else
            # 'new' file is NOT newer, delete it instead.
            rm "${new}"
        fi
    fi
done
# Move all new files into the old folder.
mv /var/files/new/* /var/files/sub/old/
Example files inside each of the subdirectories:
/var/files/sub/old/
firefox-24.5.0-1.el5_10.i386.rpm
firefox-24.5.0-1.el5_10.x86_64.rpm
google-1.6.0-openjdk-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
google-1.6.0-openjdk-demo-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
/var/files/new/
firefox-25.5.0-1.el5_10.i386.rpm
firefox-25.5.0-1.el5_10.x86_64.rpm
ie-1.6.0-openjdk-devel-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
ie-1.6.0-openjdk-javadoc-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
ie-1.6.0-openjdk-src-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
google-2.6.0-openjdk-demo-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
In this instance, I want to find the files that correspond to each other. So the matching files in the given example are:
firefox-24.5.0-1.el5_10.i386.rpm
firefox-24.5.0-1.el5_10.x86_64.rpm
google-1.6.0-openjdk-demo-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
in the old/ directory, and their equivalents in the new/ directory are:
firefox-25.5.0-1.el5_10.i386.rpm
firefox-25.5.0-1.el5_10.x86_64.rpm
google-2.6.0-openjdk-demo-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
The matching files share the same leading characters, and the matches should be displayed in the terminal. After that, each pair is compared again to determine which file is more recent, based on the version number after the package name, for example firefox-24.5.0-1.el5_10.i386.rpm versus firefox-25.5.0-1.el5_10.i386.rpm. Here firefox-24.5.0-1.el5_10.i386.rpm is replaced by firefox-25.5.0-1.el5_10.i386.rpm because the latter has the higher version, and the same applies to the other matching files. The old file is removed and the new one takes its place.
So after the script has been executed, the directories should look like this:
/var/files/sub/old/
google-1.6.0-openjdk-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
firefox-25.5.0-1.el5_10.i386.rpm
firefox-25.5.0-1.el5_10.x86_64.rpm
ie-1.6.0-openjdk-devel-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
ie-1.6.0-openjdk-javadoc-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
ie-1.6.0-openjdk-src-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
google-2.6.0-openjdk-demo-1.6.0.0-5.1.13.3.el5_10.x86_64.rpm
/var/files/new/
<<empty: all files here must have been moved to the other directory as replacements>>
Can anyone help me write a script for this? The above is just an example; assume there are many files that need to be matched, removed, and moved.
You can use rpm to get the name of the package without version or architecture strings:
rpm -qi -p /firefox-25.5.0-1.el5_10.i386.rpm
Gives:
Name : firefox
Version : 25.5.0
Release : 1.el5_10
Architecture: i386
....
So you can compare the Names to find related packages.
If the goal is for the newrpms directory to hold only the newest version of each RPM from a combination of sources, you most likely want to combine all the files in a single directory and then use the repomanage tool (from the yum-utils package, at least on CentOS) to tell you which of the RPMs are old, and remove those.
Something like:
repomanage --old combined_rpms_directory | xargs -r rm
As to your initial script:
for i in $(\ls -d ./new/*);
do
    diff ${i} newrpms/;
    rm ${i}
done
You generally don't want to "parse" the output from ls, especially when a glob will do what you want just as easily (for i in ./new/* in this case).
diff ${i} newrpms/ is attempting to diff a file and a directory (or two directories, if your ls/glob happened to catch a directory), and in neither case will diff do what you want there. That being said, what diff does doesn't really matter because, as Barmar said in his comment:
"your script is removing them without testing the result of diff"
A bash script that does the checking. Here's how it works:
Traverse each file in the old files directory. Get the prefix (package name with no version, architecture, etc.), e.g. firefox-; get the suffix (architecture.rpm), e.g. .i386.rpm.
Attempt to match prefix and suffix with any version number within the new files directory, i.e. firefox-*.i386.rpm. If there is a match, $new will contain the full path of the file, e.g. /var/files/new/firefox-25.5.0-1.el5_10.i386.rpm; if there is no match, $new will equal the literal pattern /var/files/new/firefox-*.i386.rpm, which is not a file.
Check the new files directory for the existence of $new.
If it exists, check that $new is indeed newer than the old version. This is done by lexicographical string comparison, i.e. firefox-24.5.0-1.el5_10.i386.rpm is less than firefox-25.5.0-1.el5_10.i386.rpm because it comes earlier in the alphabet. Conveniently, sane versioning schemes also happen to be alphabetical. NB: this may fail, for example, when comparing version 2 to version 10.
A new version of a file in the old files directory has been found! In this case, get rid of the old file with rm. If the file in the new directory is not newer, then delete it instead.
Done removing old versions. The old files directory now has only files without newer versions.
Move all new files into the old directory, leaving the newest files in the old directory and the new directory empty.
#!/bin/bash
for file in /var/files/sub/old/*
do
    # Strip path from file name
    file="${file##*/}"
    # Strip everything after the first hyphen
    prefix="${file%%-*}-"
    # Strip everything before the second-to-last dot
    suffix="$(echo "$file" | awk -F. '{ print "."$(NF-1)"."$NF }')"
    # Create new file name from $prefix and $suffix, and any version number
    new=$(echo "/var/files/new/${prefix}"*"${suffix}")
    # If file exists in the 'new' folder:
    if test -f "${new}"
    then
        # Do string comparison to see if new file is lexicographically "greater than" old
        if [[ "${new##*/}" > "${file}" ]]
        then
            # If so, delete the old version.
            rm /var/files/sub/old/"${file}"
        else
            # 'new' file is NOT newer, delete it instead.
            rm "${new}"
        fi
    fi
done
# Move all new files into the old folder.
mv /var/files/new/* /var/files/sub/old/
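The lexicographic test in the script is where version 2 versus version 10 goes wrong. As a hedged sketch, the comparison could instead lean on sort -V, which is a GNU coreutils extension and so may not exist everywhere. The helper name is_newer is invented for this example, and version sort only approximates RPM's own ordering rules.

```shell
#!/bin/bash
# is_newer A B: hypothetical helper. Succeeds if file name A sorts strictly
# after file name B in version order ("sort -V", GNU coreutils).
is_newer() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

Inside the loop, the [[ "${new##*/}" > "${file}" ]] test would then become: if is_newer "${new##*/}" "${file}". With plain string comparison pkg-10.0 sorts before pkg-2.0; with version sort it sorts after.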

Finding a file in Unix using wildcards in the file name

I have a few files in a folder whose names follow a pattern in which one of the sections is variable.
file1.abc.12.xyz
file2.abc.14.xyz
file3.abc.98.xyz
So the third section (numeric) in the above three file names changes every day.
Now, I have a script which does some tasks on the file data. However, before doing the work, I want to check whether the file exists or not and then do the task:
if(file exist) then
//do this
fi
I wrote the below code using wildcard '*' in numeric section:
export mydir=/myprog/mydata
if[find $mydir/file1.abc.*.xyz]; then
# my tasks here
fi
However, it is not working and gives the error below:
[find: not found [No such file or directory]
Using -f instead of find does not work either:
if[-f $mydir/file1.abc.*.xyz]; then
# my tasks here
fi
What am I doing wrong here ? I am using korn shell.
Thanks for reading!
for i in file1.abc.*.xyz ; do
# use $i here ...
done
I was missing the spaces around the brackets.
For example, "if[-f" should actually be "if [ -f", with spaces before and after the bracket.
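Even with the spaces fixed, [ -f pattern ] breaks as soon as the wildcard matches more than one file, because the test then receives several arguments. A small sketch that works in ksh and other POSIX shells is to let the shell expand the glob into a function's arguments and test only the first one. The name matches_exist is made up for this example.

```shell
# matches_exist FILE...: hypothetical helper. Call it with an unquoted glob.
# If nothing matches, the shell passes the pattern through literally and
# the -e test on that literal string fails.
matches_exist() {
    [ -e "$1" ]
}

mydir=/myprog/mydata
if matches_exist "$mydir"/file1.abc.*.xyz; then
    : # my tasks here
fi
```

Note that "$mydir" is quoted but the wildcard part is not, so the directory name may contain spaces while the pattern is still expanded.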

Replace all files of a certain type in all directories and subdirectories?

I have played around with the find command and anything else I can think of but nothing will work.
I would like my bash script to be able to find all of a file type in a given directory and all of its subdirectories and replace the file with another.
Example: let's say
/home/test1/randomfolder/index.html
/home/test1/randomfolder/stuff.html
/home/different/stuff/index.html
/home/different/stuff/another.html
Each of those .html files needs to be found when the program is given /home/ as the directory to search in, and then replaced by echoing the other file into it.
Is this possible in bash?
This should more or less get you going in the right direction (note that it breaks on file names containing whitespace):
for file in `find . -type f -name \*.html`; do echo "new content" > "$file"; done
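A variant that lets find do the replacing itself copes with spaces in file names, since no word splitting is involved. This is a sketch under assumptions: replace_html is a made-up name, and the replacement file is assumed to live outside the tree being searched (otherwise cp would be asked to copy a file onto itself).

```shell
#!/bin/bash
# replace_html DIR SRC: hypothetical helper. Overwrites every *.html file
# under DIR (recursively) with the contents of the file SRC.
replace_html() {
    find "$1" -type f -name '*.html' -exec cp -- "$2" {} \;
}
```

For the example layout in the question this would be called as replace_html /home /path/to/replacement.html, where the second argument is whatever file holds the new content.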
