Remove all vowels from a file name using shell script

Current code:
find . -depth | \
while read LONG; do
SHORT=$( basename "$LONG" | tr '[aeiou]' '[ ]' )
DIR=$( dirname "$LONG" )
if [ "${LONG}" != "${DIR}/${SHORT}" ]; then
mv "${LONG}" "${DIR}/${SHORT}"
fi
done
So if I have files named aaa, abc and bdf, I get the files ' ', ' bc' and 'bdf'.
What I want instead is 'aaa', 'bc' and 'bdf'.
(Completely remove the a from the second file, and if all the characters (excluding the file extension) are vowels, leave the name unchanged.)

I think the two problems with your solution are:
You're replacing vowels with a space. Shouldn't you replace them with an empty string?
Then you need to test whether SHORT is empty. If it is, discard it, for example by setting SHORT back to the original basename so that no mv happens.

Remove all vowels:
tr -d aeiou
Skip the file if its basename (excluding the file extension) consists only of vowels:
case $SHORT in ''|.*) continue;; esac
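Putting both suggestions together, the loop could look like the sketch below. It assumes bash (for read -d '') and a GNU or BSD find (for -mindepth and -print0); devowel_tree is a hypothetical function name. Note that mv silently overwrites an existing file of the same name, and that the .* pattern from the answer also skips dot-files such as .bashrc.

```shell
#!/usr/bin/env bash
# devowel_tree DIR: hypothetical helper that strips a, e, i, o, u from every
# name below DIR, leaving names whose stem is vowels only unchanged.
devowel_tree() {
  # -depth renames a directory's contents before the directory itself;
  # -print0 with read -d '' keeps names containing spaces or newlines intact.
  find "$1" -mindepth 1 -depth -print0 |
  while IFS= read -r -d '' long; do
    dir=$(dirname -- "$long")
    short=$(basename -- "$long" | tr -d aeiou)
    # nothing left, or only ".ext" left: the stem was all vowels, so skip
    # (as a side effect this also skips dot-files)
    case $short in ''|.*) continue ;; esac
    if [ "$long" != "$dir/$short" ]; then
      mv -- "$long" "$dir/$short"
    fi
  done
}
```

Usage: run devowel_tree . from the top of the tree you want to rename. With files aaa, abc and bdf this leaves aaa and bdf alone and renames abc to bc.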

Related

Recursive parsing and arrays in shell script

I intend to accept a single argument for my shell script my_script.sh and parse the values from it using separators. For example,
./my_script.sh a-e,f/b-1/c-5,g/d
means my primary separator is /, my secondary separator is -, and my tertiary separator is ,. The challenge is that the number of values separated by , or - is variable, not fixed; d, for example, has no - or , at all. I can always parse the values separated by / with:
IFS='/' read -ra list_l1 <<<$1
This way, I get the number of times I need to loop. But I'm stuck on parsing each element of list_l1. For each element,
I need to check whether it contains - and , at all.
If there is a - (optionally followed by values separated by ,), get the values after the - and pass them as separate arguments to another script (e.g. for a, e and f are passed as separate arguments to another script).
If there is no - or ,, just run the other script without arguments (e.g. for d, the other script is run without any arguments).
How can I get this done?
UPDATE:
I managed to figure a way for level one:
IFS='/' read -ra list_l1 <<<$1
for i in "${!list_l1[@]}"; do
list_l2[$i]="${list_l1[$i]//,/$' '}"
# Pseudocode for what I would like to do:
get 'type' from the first part (before '-', as in the example above)
if type == 'a':
pass the parameters after '-' to another .sh script, discarding the separators '-' and ','
elif type == 'b':
pass the parameters after '-' to another .sh script, discarding the separators '-' and ','
elif type == 'c':
pass the parameters after '-' to another .sh script, discarding the separators '-' and ','
elif type == 'd':
pass the parameters after '-' to another .sh script, discarding the separators '-' and ','
# End of pseudocode
done
Take a look at this:
#!/usr/bin/env bash
f() { printf 'I am called with %d arguments: %s\n' "$#" "$*"; }
param='a-e,f/b-1/c-5,g/d'
IFS=/ read -ra a <<< "$param"
for i in "${a[@]}"; do
IFS=- read -r _ b <<< "$i"
IFS=, read -ra c <<< "$b"
f "${c[@]}"
done
$ ./script
I am called with 2 arguments: e f
I am called with 1 arguments: 1
I am called with 2 arguments: 5 g
I am called with 0 arguments:
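The pseudocode in the question also dispatches on the type before the '-'. A sketch of that on top of the same parsing, where run_a, run_b and run_other are hypothetical placeholders for the real scripts (for example ./a.sh "$@"):

```shell
#!/usr/bin/env bash
# run_a, run_b and run_other are hypothetical stand-ins for the real scripts.
run_a()     { printf 'a:%s\n' "$*"; }
run_b()     { printf 'b:%s\n' "$*"; }
run_other() { printf '%s:%s\n' "$1" "${*:2}"; }

dispatch() {
  local group type rest
  local -a groups values
  IFS=/ read -ra groups <<< "$1"          # a-e,f  b-1  c-5,g  d
  for group in "${groups[@]}"; do
    IFS=- read -r type rest <<< "$group"  # type before '-', rest after it
    IFS=, read -ra values <<< "$rest"     # empty array when there is no '-'
    case $type in
      a) run_a "${values[@]}" ;;
      b) run_b "${values[@]}" ;;
      *) run_other "$type" "${values[@]}" ;;
    esac
  done
}

dispatch 'a-e,f/b-1/c-5,g/d'
```

With the sample argument this prints a:e f, b:1, c:5 g and d: on separate lines; the values always arrive as separate arguments, and d gets none.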
Based on what I understood of your question, I produced this code:
Edit 1: calling another script using that array
#!/bin/bash
arg='a-e,f/b-1/c-5,g/d'
# Cuts it into [a-e,f] [b-1] [c-5,g] [d]
IFS='/' read -ra list_l1 <<<"$arg"
echo "First cut on /."
echo "Content of list_l1"
for K in "${!list_l1[@]}"
do
echo "list_l1[$K]: ${list_l1[$K]}"
done
echo ""
declare -A list_l2
echo "Then loop, cut on '-' and replace ',' by ' '."
for onearg in "${list_l1[@]}"
do
IFS='-' read -r part1 part2 <<<"$onearg"
list_l2[$part1]=${part2//,/ }
done
echo "Content of list_l2:"
for K in "${!list_l2[@]}"
do
echo "list_l2[$K]: ${list_l2[$K]}"
done
# Calling another script using these values
echo ""
for K in "${!list_l2[@]}"
do
echo "./another_script.sh ${list_l2[$K]}"
done
Which gives the following output:
$ ./t.bash
First cut on /.
Content of list_l1
list_l1[0]: a-e,f
list_l1[1]: b-1
list_l1[2]: c-5,g
list_l1[3]: d
Then loop, cut on '-' and replace ',' by ' '.
Content of list_l2:
list_l2[a]: e f
list_l2[b]: 1
list_l2[c]: 5 g
list_l2[d]:
./another_script.sh e f
./another_script.sh 1
./another_script.sh 5 g
./another_script.sh
Some details:
The first step is to cut on '/'. This creates list_l1.
Each element of list_l1 starts with its type letter ('a', 'b', 'c', 'd', ...), i.e. the first letter of the element after the cut on '/'.
Then each of these is cut a second time on '-'.
The first part of that cut (left of the '-') becomes the key.
The second part of that cut (right of the '-') becomes the value, with ',' replaced by ' '.
list_l2 is created as an associative array, using the key and value that were just calculated.
This way list_l2 contains everything you need, without having to reference list_l1 later. If you need the list of keys, use ${!list_l2[@]}. If you need the list of values, use ${list_l2[@]}.
Let me know if that meets your requirement.
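Bash does not promise any iteration order for associative arrays, so if the calls must follow the order of the argument, a separate indexed array can record the keys as they are seen. A sketch, assuming bash 4+ for declare -A; order is a hypothetical array name:

```shell
#!/usr/bin/env bash
arg='a-e,f/b-1/c-5,g/d'
declare -A list_l2=()
declare -a order=()            # keys in the order they appear in $arg

IFS='/' read -ra list_l1 <<< "$arg"
for onearg in "${list_l1[@]}"; do
  IFS='-' read -r key value <<< "$onearg"
  list_l2[$key]=${value//,/ }  # "e,f" -> "e f"
  order+=("$key")
done

for key in "${order[@]}"; do
  echo "./another_script.sh ${list_l2[$key]}"
done
```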

Bash: set IFS to Space after specific character only?

I'm using IFS=', ' to split a string of comma-delimited text into an array. The problem is that one of the comma-delimited items occasionally contains a space following a :, and the resulting array contains that item as two separate elements. Is it possible to set IFS to split only on ', ' and ignore a space that follows ':' (or any other character, for that matter)?
The first command below returns the comma-delimited string; note that the second item contains the :. In the output of the second command, MarkerNames[1] and MarkerNames[2] show the unwanted split.
$ exiftool -s3 -TracksMarkersName audioFile.wav
Marker1, Tempo: 120.0, Silence, Marker2, Silence.1, Marker3, Silence.2, Marker4, Silence.3, Marker5
$ IFS=', ' read -r -a MarkerNames <<< $(exiftool -s3 -TracksMarkersName audioFile.wav)
$ declare -p MarkerNames
declare -a MarkerNames='([0]="Marker1" [1]="Tempo:" [2]="120.0" [3]="Silence" [4]="Marker2" [5]="Silence.1" [6]="Marker3" [7]="Silence.2" [8]="Marker4" [9]="Silence.3" [10]="Marker5")'
IFS contains a set of characters, each of which can act as a field separator. So ", " says "any comma or space separates my fields" (a run of spaces, or spaces next to a comma, counts as a single separator).
The simplest workaround I think would be to preprocess the output so you get the breaks where you want them.
IFS='~' read -r -a MarkerNames <<< "$(exiftool -s3 -TracksMarkersName audioFile.wav | sed 's/, /~/g')"
This of course requires you to find another IFS value which doesn't occur in your data. If Bash 4+ is available, maybe use a newline and readarray.
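A sketch of the readarray variant (bash 4+), which avoids hunting for an unused character: every ', ' becomes a newline and readarray -t stores one item per line. split_markers is a hypothetical helper name, and a literal string stands in for the exiftool output:

```shell
#!/usr/bin/env bash
split_markers() {
  local nl=$'\n'
  # ", " -> newline, then one array element per line (readarray is bash 4+)
  readarray -t MarkerNames <<< "${1//, /$nl}"
}

# In practice: split_markers "$(exiftool -s3 -TracksMarkersName audioFile.wav)"
split_markers 'Marker1, Tempo: 120.0, Silence, Marker2'
declare -p MarkerNames   # MarkerNames[1] is "Tempo: 120.0"
```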
You could split on commas and remove the leading / trailing spaces afterwards:
IFS=',' read -r -a MarkerNames <<< $(exiftool -s3 -TracksMarkersName audioFile.wav)
shopt -s extglob # Needed for extended glob
MarkerNames=( "${MarkerNames[@]/#*( )}" ) # Remove leading spaces
MarkerNames=( "${MarkerNames[@]/%*( )}" ) # Remove trailing spaces

Bash Convert text string into array with multiple \r\n as field separator

I have a windows text file in the format:
line\r\n
line\r\n
line\r\n
\r\n
line\r\n
line\r\n
line\r\n
\r\n
...
I want to put this text file into an array where the field separator is \r\n\r\n. I searched for an answer, but nothing I found and tried worked; awk, for example, is too complex for me, and FS= did not work as I expected.
Commands to read arrays in bash can (as far as I know) only use single characters as a field separator, not complete strings like \r\n\r\n.
Workaround
First replace the field separator \r\n\r\n with a single character that does not occur in the string being split. I found that \x1e (the ASCII control character »Record Separator«) works quite well.
Then read the array using the new (one character) field separator.
The field separator will always be removed when reading something to an array. But you can append the separator to each field.
Here is a pure bash solution to read the file file into the array array:
IFS=$'\x1e'
filecontent="$(< file)"
array=(${filecontent//$'\r\n\r\n'/$'\x1e'})
array=("${array[@]/%/$'\r\n\r\n'}")
IFS=$'\x1e' sets bash's field separator which is used to split strings into arrays. Depending on your script you may want to restore the old IFS afterwards (default is IFS=$' \t\n').
Results
For file
A B C\r\n
D E F\r\n
\r\n
G H I\r\n
\r\n
the resulting array will have two entries:
${array[0]}
A B C\r\n
D E F\r\n
\r\n
${array[1]}
G H I\r\n
\r\n
Known Problems
IFS at the beginning and end of the string will be trimmed. Repeated IFS will be squeezed. The file \r\n\r\n will result in an array without entries. Empty entries cannot be created.
\r\n\r\n is appended to all entries in all cases. The file A\r\n\r\nB will result in an array with the two entries A\r\n\r\n and B\r\n\r\n.
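Both known problems come from unquoted word splitting. As a swapped-in technique, a sketch can let read split on the single \x1e character instead, so IFS stays unchanged, no globbing happens and empty records survive. read_records and the global array records are hypothetical names; it still assumes \x1e never occurs in the file and, unlike the answer above, does not re-append the separator:

```shell
#!/usr/bin/env bash
# read_records FILE: hypothetical helper; fills the global array "records".
read_records() {
  local content rec sep=$'\x1e' crlf2=$'\r\n\r\n'
  content=$(< "$1")                 # $(...) already drops trailing \n
  while [[ $content == *$'\r' || $content == *$'\n' ]]; do
    content=${content%?}            # drop any remaining trailing \r or \n
  done
  records=()
  [[ -z $content ]] && return 0     # empty file: no records
  # read -d stops at a single character, so map \r\n\r\n to \x1e first
  while IFS= read -r -d "$sep" rec; do
    records+=("$rec")
  done < <(printf '%s%s' "${content//$crlf2/$sep}" "$sep")
}
```

Usage: read_records file, then declare -p records to inspect the result.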
In Linux all lines of files are terminated with \n.
So the problem is not the \r\n as such, only the \r. Just remove it:
$ tr -d '\r' <file >newfile
To verify that \r is removed you can do:
$ head -n2 newfile |od -t x1c
This takes the first two lines of the new file, and od dumps them as ASCII hex codes. In ASCII hex, \r is \x0d and \n is \x0a.
Once you have removed the \r from your file you can do anything you want.
You can then use all Linux tools (including awk) in a straightforward way, without special settings.
To build an array you can use:
$ while IFS= read -r line; do data+=("$line"); done <newfile
If you want to skip blank lines, this is enough:
$ while IFS= read -r line; do [[ "$line" == "" ]] && continue; data+=("$line"); done <file1
You can of course combine the array creation with on-the-fly removal of the \r, without modifying your existing file, like this:
while IFS= read -r line; do [[ "$line" == "" ]] && continue; data+=("$line"); done < <(tr -d '\r' <file1)
To see what is inside array "data" just use $ declare -p data
PS: Using awk -v RS="\r\n" '{your awk code here}' should also be enough to read the original file directly in awk. RS is the record (line) separator; a multi-character RS is an extension supported by gawk and mawk rather than POSIX awk.
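Along the same lines, once the \r characters are gone, awk's paragraph mode (RS set to the empty string, which POSIX defines as "records are separated by blank lines") splits the file into blocks directly. A sketch, with blocks_of as a hypothetical helper that prints each block on one line, joined by --:

```shell
#!/usr/bin/env bash
# blocks_of FILE: hypothetical helper; one output line per blank-line block.
blocks_of() {
  # delete CRs, then let awk's paragraph mode split on blank lines
  tr -d '\r' < "$1" |
    awk -v RS= '{ rec = $0; gsub(/\n/, "--", rec); print rec }'
}

# one array element per block (readarray needs bash 4+):
# readarray -t blocks < <(blocks_of file)
```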
I wrote this script in pure bash (the answer from socowi is pure bash too):
exec < filern.txt
declare -a array
acc=""
lineno=0
cr=$(echo -en "\r")
while IFS= read -r line; do
line=${line%$cr}
if [ -z "$line" ]; then
let lineno=$lineno+1
array[$lineno]=$acc
acc=""
else
[ -n "$acc" ] && acc="$acc--" # you can use any separator here
acc="$acc$line"
fi
done
# store the last block if the file does not end with an empty line
if [ -n "$acc" ]; then
let lineno=$lineno+1
array[$lineno]=$acc
fi
echo "Read file in array:"
for ((i=1; i<= ${#array[@]}; i++)); do
printf "%3.3d |%s|\n" $i "${array[$i]}"
done
It reads a "real" line of input at a time, and strips the trailing \r.
At this point, a sequence \r\n\r\n turns into an empty line, so that is used to assign the array elements one after the other.
The output from the example file is:
Read file in array:
001 |line--line--line|
002 |line--line--line|
The separator could also be \r, or anything else. I couldn't find a way to strip the trailing \r with something like line=${line% ?? }, so I used a variable (in bash, line=${line%$'\r'} also does it directly). The same trick can be used to add "strange" separators to the variable acc. I hope it helps.

Check if each element of an array is present in a string in bash, ignoring certain characters and order

On the web I found answers to find if an element of array is present in the string. But I want to find if each element in the array is present in the string.
e.g. str1="This_is_a_big_sentence"
Initially str2 was like
str2="Sentence_This_big"
Now I wanted to search if string str1 contains "sentence"&"this"&"big" (All 3, ignore alphabetic order and case)
So I used arr=(${str2//_/ })
How do I proceed now? I know the comm command finds intersections, but it needs sorted lists, and I also need to ignore the _ underscores.
I get my str2 by finding the extension of a particular type of file using the command
for i in `ls snooze.*`; do echo $i | cut -d "." -f2
# Up to here I get str2; now I need to check it as described above. I'm not sure how. I tried putting str2 into an array, and now I just need to check whether all elements of the array occur in str1 (ignoring case and order)
Any help would be highly appreciated. I did try to use this link.
Now I wanted to search if string a contains "sentence"&"this"&"big"
(All 3, ignoring alphabetic order and case)
Here is one approach:
#!/bin/bash
str1="This_is_a_big_sentence"
str2="Sentence_This_big"
if ! grep -qvwFf <(sed 's/_/\n/g' <<<${str1,,}) <(sed 's/_/\n/g' <<<${str2,,})
then
echo "All words present"
else
echo "Some words missing"
fi
How it works
${str1,,} returns the string str1 with all capitals replaced by lower case.
sed 's/_/\n/g' <<<${str1,,} returns str1 converted to lower case, with underscores replaced by newlines so that each word is on its own line.
<(sed 's/_/\n/g' <<<${str1,,}) returns a file-like object containing all the words in str1, each word lower case and on a separate line.
The creation of file-like objects is called process substitution. It allows us, in this case, to treat the output of a shell command as if it were a file to read.
<(sed 's/_/\n/g' <<<${str2,,}) does the same for str2.
Assuming that file1 and file2 each have one word per line, grep -vwFf file1 file2 prints every word in file2 that does not appear in file1. If nothing is printed, every word in file2 appears in file1.
By adding the option -q, grep will return no output but will set an exit code that we can use in our if statement.
In the actual command, file1 and file2 are replaced by our file-like objects.
The remaining grep options can be understood as follows:
-w tells grep to look for whole words only.
-F tells grep to look for fixed strings, not regular expressions.
-f tells grep to look for the patterns to match in the file (or file-like object) which follows.
-v inverts the match: grep selects the lines (words) that do not match any of the patterns.
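For comparison, the same check can be done in pure bash (4+ for the ${var,,} lowercase expansion) by wrapping the first string in underscores and testing each word as a whole _word_ substring. all_words_present is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# all_words_present HAYSTACK WORDS: succeed only if every _-separated word of
# WORDS occurs as a whole _-separated word of HAYSTACK, ignoring case and order.
all_words_present() {
  local hay="_${1,,}_" word
  local -a words
  IFS=_ read -ra words <<< "${2,,}"
  for word in "${words[@]}"; do
    [[ -z $word ]] && continue              # tolerate "__" in WORDS
    [[ $hay == *"_${word}_"* ]] || return 1 # quoted word: no glob matching
  done
  return 0
}

if all_words_present "This_is_a_big_sentence" "Sentence_This_big"; then
  echo "All words present"
fi
```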
Here is an awk solution to check existence of all the words from a string in another string:
str1="This_is_a_big_sentence"
str2="Sentence_This_big"
awk -v RS=_ 'FNR==NR{a[tolower($1)]; next} {delete a[tolower($1)]} END{print (length(a)) ? "Not all words" : "All words"}' <(echo "$str2") <(echo "$str1")
With indentation:
awk -v RS=_ 'FNR==NR {
a[tolower($1)];
next
}
{ delete a[tolower($1)] }
END {
print (length(a)) ? "Not all words" : "All words"
}' <(echo "$str2") <(echo "$str1")
Explanation:
-v RS=_ We use record separator as _
FNR==NR - True only while reading the first input, so this block runs for the words of str2
a[tolower($1)]; next - Populate an array a with each lowercase word as key
{delete a[tolower($1)]} - For each word in str1 delete key in array a
END - If length of array a is still not 0 then there are some words left.
Here's another solution:
#!/bin/bash
str1="This_is_a_big_sentence"
str2="sentence_This_big"
var=0
var2=0
while read -r word
do
if echo "$str1" | grep -qi -- "$word"
then
var=$((var+1))
fi
var2=$((var2+1))
done < <(echo "$str2" | sed -e 's/\(.*\)/\L\1/' -e 's/_/\n/g')
if [[ $var -eq $var2 && $var -ne 0 ]]
then
echo "matched"
else
echo "not matched"
fi
The script first lowercases str2 with sed -e 's/\(.*\)/\L\1/', a substitution that replaces the whole line with its lowercase form (\L is a GNU sed extension), and then replaces each underscore _ with a newline using the second expression, -e 's/_/\n/g'.
The individual words are then fed into a while loop that checks whether each word occurs in str1. Every match increments var, and every iteration of the loop increments var2. If var equals var2, all the words of str2 were found in str1. Hope that helps.
Here's an approach.
if [ "$(echo "This_BIG_senTence" | grep -ioE 'this|big|sentence' | wc -l)" -eq 3 ]; then echo "matched"; fi
How it works.
The grep option -i makes the match case-insensitive, -E enables extended regular expressions, and -o prints each match on its own line. wc -l then counts those lines, and since there are 3 words to find, the test checks for 3. Note that this counts matches rather than distinct words, so a string such as this_this_this would also yield 3.
Note you can also create a grep chain and see if its empty.
if [ -n "$(echo "This_BIG_SenTence" | grep -i this | grep -i big | grep -i sentence)" ]; then echo matched; else echo not_matched; fi
Now I know what you mean. Try this:
#!/bin/bash
# add 4 non-matching examples
> snooze.foo_bar
> snooze.bar_go
> snooze.go_foo
> snooze.no_match
# add 3 matching examples
> snooze.foo_bar_go
> snooze.goXX_XXfoo_XXbarXX
> snooze.bar_go_foo_Ok
str1=("foo" "bar" "go")
for i in snooze.*; do
str2=${i#snooze.}
j=0
found=1
while [[ $j -lt ${#str1[@]} ]]; do
if ! echo $str2 | eval grep \${str1[$j]} >& /dev/null; then
found=0
break
fi
((j++))
done
if [[ $found -ne 0 ]]; then
echo Match found: $str2
fi
done
Resulting print of this script:
Match found: bar_go_foo_Ok
Match found: foo_bar_go
Match found: goXX_XXfoo_XXbarXX
Alternatively, the if..grep line above can be replaced by
if [[ ! $str2 =~ `eval echo \${str1[$j]}` ]]; then
which uses bash's regular expression matching.
Note: this does not take care of special characters in the search strings, such as "\" or " " (space), which may cause problems.
--- Some explanations ---
In the if .. grep line, $j is first expanded to the running index, which goes from 0 to the number of elements in str1 minus 1. Then eval evaluates the whole grep command a second time, so ${str1[jjj]} is expanded to the search word (here, jjj is the already substituted index).
The strategy is to set found=1 (found by default), and then when any grep fails, we set found to 0 and break the inner j-loop.
Everything else should be straightforward.

Reading a space-delimited string into an array in Bash

I have a variable which contains a space-delimited string:
line="1 1.50 string"
I want to split that string with space as a delimiter and store the result in an array, so that the following:
echo ${arr[0]}
echo ${arr[1]}
echo ${arr[2]}
outputs
1
1.50
string
Somewhere I found a solution which doesn't work:
arr=$(echo ${line})
If I run the echo statements above after this, I get:
1 1.50 string
[empty line]
[empty line]
I also tried
IFS=" "
arr=$(echo ${line})
with the same result. Can someone help, please?
To convert a string into an array, create an array from the string and let the string be split according to the IFS (Internal Field Separator) variable, which by default contains space, tab and newline:
arr=($line)
or pass the string to the stdin of the read command using the herestring (<<<) operator:
read -a arr <<< "$line"
For the first example, it is crucial not to use quotes around $line since that is what allows the string to get split into multiple elements.
See also: https://github.com/koalaman/shellcheck/wiki/SC2206
With arr=( $line ), the word splitting comes together with pathname expansion (globbing):
Wildcards (*,? and []) will be expanded to matching filenames.
The correct solution is only slightly more complex:
IFS=' ' read -r -a arr <<< "$line"
No globbing problem; the split character is set in $IFS, variables quoted.
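A small sketch of the difference, using a hypothetical input that contains a *: read -a never performs pathname expansion, while arr=( $line ) only becomes safe once globbing is switched off with set -f:

```shell
#!/usr/bin/env bash
line='1 1.50 *'

# read -a splits on IFS but never expands the * into file names
IFS=' ' read -r -a arr <<< "$line"
printf '[%s]\n' "${arr[@]}"   # prints [1], [1.50] and [*] on separate lines

# arr=( $line ) would expand * here; disabling globbing first prevents that
set -f
arr2=( $line )
set +f
printf '[%s]\n' "${arr2[@]}"  # same three elements
```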
Try this:
arr=(`echo ${line}`);
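If you want the string parsed with full shell syntax (so that quotes group words and globs expand), you can use eval. Only do this with trusted input, because eval also runs any command substitution the string contains: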
eval "arr=($line)"
For example, take the following code.
line='a b "c d" "*" *'
eval "arr=($line)"
for s in "${arr[@]}"; do
echo "$s"
done
If the current directory contained the files a.txt, b.txt and c.txt, then executing the code would produce the following output.
a
b
c d
*
a.txt
b.txt
c.txt
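Another option stores a newline-separated string in arr (not a true array) and relies on the for loop's word splitting:
line="1 1.50 string"
arr=$(echo "$line" | tr " " "\n")
for x in $arr
do
echo "> [$x]"
done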
