Bash - Concatenating backslash while joining an array - arrays

I've been trying to figure out a bash script to determine the server directory path, such as D:\xampp\htdocs, and the project folder's name, such as "my_project", while Grunt is running my postinstall script. So far I can grab the project's folder name, and I can get an array of the remaining indices that comprise the server root path on my system, but I can't seem to join the array with an escaped backslash. This is probably not the best solution (definitely not the most elegant), so if you have any tips or suggestions along the way I'm amenable.
# Determine project folder name and server root directory path
bashFilePath=$0 # get path to post_install.sh
IFS='\' bashFilePathArray=($bashFilePath) # split path on \
len=${#bashFilePathArray[@]} # get array length
# Name of project folder in server root directory
projName=${bashFilePathArray[len-3]} # returns my_project
ndx=0
serverPath=""
while [ $ndx -le `expr $len - 4` ]
do
serverPath+="${bashFilePathArray[$ndx]}\\" # tried in and out of double quotes, also in separate concat below
(( ndx++ ))
done
echo $serverPath # returns D: xampp htdocs, works if you sub out \\ for anything else, such as / will produce D:/xampp/htdocs, just not \\

You can only prefix command invocations, not variable assignments, with IFS, so your line
IFS='\' bashFilePathArray=($bashFilePath)
is just a pair of assignments; the expansion of $bashFilePath is unaffected by the assignment to IFS. Instead, use the read builtin.
IFS='\' read -ra bashFilePathArray <<< "$bashFilePath"
Later, you can use a subshell to easily join the first few elements of the array into a single string.
serverPath=$(IFS='\'; echo "${bashFilePathArray[*]:0:len-3}")
The semicolon is required, since the argument to echo is expanded before echo actually runs, meaning IFS needs to be modified "globally" (within the subshell) rather than just for the echo command. Also, [*] is required in place of the more commonly recommended [@] because here we are making explicit use of the property that a quoted [*] expansion produces a single word, with the elements joined by the first character of IFS, rather than a sequence of words.
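Putting the two steps together, a minimal sketch might look like the following. The path is a made-up stand-in for $0, the array name parts is an assumption, and printf is swapped in for echo so that backslashes are printed literally regardless of shell options.

```shell
#!/usr/bin/env bash
# Hypothetical path standing in for "$0" in the real postinstall script.
bashFilePath='D:\xampp\htdocs\my_project\scripts\post_install.sh'

# Split on backslashes; -r stops read from treating \ as an escape character.
IFS='\' read -ra parts <<< "$bashFilePath"
len=${#parts[@]}

# Name of the project folder (third element from the end).
projName=${parts[len-3]}

# Join the leading elements with \ inside a subshell, so the IFS change
# does not leak into the rest of the script.
serverPath=$(IFS='\'; printf '%s\n' "${parts[*]:0:len-3}")

printf '%s\n' "$projName" "$serverPath"
```

Because the join happens in a command substitution, IFS keeps its normal value for everything that follows.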

Related

Why are loop generated Bash array values concatenated together?

I'm writing a short script to automate output filenames. The testing folder has the following files:
test_file_1.fa
test_file_2.fa
test_file_3.fa
So far, I have the following:
#!/bin/bash
filenames=$(ls *.fa*)
output_filenames=$()
output_suffix=".output.faa"
for name in $filenames
do
output_filenames+=$name$output_suffix
done
for name in $output_filenames
do
echo $name
done
The output for this is:
test_file_1.fa.output.faatest_file_2.fa.output.faatest_file_3.fa.output.faa
Why does this loop 'stick' all of the filenames together as one array variable?
Shell arrays require particular syntax.
output_filenames=() # not $()
output_suffix=".output.faa"
for name in *.fa* # don't parse `ls`
do
output_filenames+=("$name$output_suffix") # parentheses required
done
for name in "${output_filenames[@]}" # braces and index and quotes required
do
echo "$name"
done
https://tldp.org/LDP/abs/html/arrays.html has more examples of using arrays.
"Don't parse ls" => https://mywiki.wooledge.org/ParsingLs
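To see why the original loop glued the names together, it may help to compare the two forms of += side by side. This is only an illustrative sketch; the variable names s and arr are made up for the comparison.

```shell
#!/usr/bin/env bash
# Without parentheses, += appends to a string.
s=""
s+="test_file_1.fa"
s+="test_file_2.fa"

# With parentheses, += appends a new element to an array.
arr=()
arr+=("test_file_1.fa")
arr+=("test_file_2.fa")

printf 'string: %s\n' "$s"
printf 'array has %s elements\n' "${#arr[@]}"
```

The string version ends up as one word with no separator, which is exactly the "stuck together" output shown in the question.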

Bash array with spaces and no spaces in elements

I know this question has been asked in different manners, and I've referred to some answers on here to get to where I am now.
I'm trying to create a script to essentially watch a folder and virus scan files once they're not being written to.
The files/strings I need it to handle will sometimes contain spaces and sometimes not, as well as sometimes special characters. At the moment it will actually work only on files with spaces in the name (as in actually scan and move them), but not for files without spaces. Also, after each file (spaces or not) the while loop breaks, stopping the script with the following output:
./howscan.sh: line 29: snmp.conf: syntax error: invalid arithmetic operator (error token is ".conf")
./howscan.sh: line 34: snmp.conf: syntax error: invalid arithmetic operator (error token is ".conf")
I had it working to handle file names without any spaces, but since I introduced the "${files[$i]}" method to use the array elements it only works on files with spaces and outputs the above error.
Feel free to omit the sepscan part of this, as I'm sure if I can get it working with the other tasks it'll work for that too (just wanted to show the full script for a complete understanding).
Current Script:
#!/bin/bash
set -x
workingpath='/mnt/Incoming/ToScan/'
outputpath='/mnt/Incoming/Scanned/'
logfile='/var/log/howscan.log'
faildir='/mnt/Incoming/ScanFailed/'
sepscan='/opt/Symantec/symantec_antivirus/sav manualscan -c'
# Change to working directory
cd $workingpath
# Exclude files with a given extension, in this case .aspx, and declare the remaining files as the array "files"
shopt -s extglob nullglob
# Loop the below - ie it's a watch folder
while true
do
# Exclude files with .aspx in the file name
files=( !(*.aspx) )
# If the array files is not empty then...
if [ ${#files[@]} -ne 0 ]; then
for i in ${files[*]}
# For every item in the array files process them as follows
# Declare any variables you wish to happen on files here, not globally
# Check each file to see if it's in use using fuser, do nothing and log it if its still in use, or process it and log the results
do
fileopen=`fuser "${files[$i]}" | wc -c`
# Here for 'fileopen' we are checking if the file is being written to.
if [ $fileopen -ne 0 ]; then
echo `date` "${files[$i]}" is being used so has been ignored >> $logfile
else
echo `date` File "${files[$i]}" not being used or accessed >> $logfile
sepscan=`$sepscan "${files[$i]}" | wc -c`
if [ $sepscan = 0 ]; then
mv "${files[$i]}" $outputpath
echo `date` "${files[$i]}" has been virus scanned and moved to $outputpath >> $logfile
else
echo `date` "${files[$i]}" has been moved to $faildir as a virus or error was detected >> $logfile
fi
fi
done
fi
echo `date` 'No more files to process. Waiting 60 seconds...' >> $logfile
sleep 60
done
Let me know if I can provide anything else to help clarify my issue.
Update:
There is a file in the /mnt/Incoming/ToScan/ directory called snmp.conf by the way.
for i in ${files[*]}
should be
for i in ${!files[*]}
# or
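for i in ${!files[@]}
${files[*]} expands to the contents of the array and undergoes word splitting, so i holds a file name such as snmp.conf. Using that as a subscript in "${files[$i]}" makes bash evaluate it as an arithmetic expression, which is where the "invalid arithmetic operator" error comes from. The ${!files[...]} syntax above expands to the list of indices of the array instead.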
You might also need to double quote the variables, e.g.
if [ "$fileopen" -ne 0 ]; then
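As a rough sketch of the two safe ways to walk the array (the file names and the seen array are invented for the demonstration), you can loop over the indices or over the quoted elements directly:

```shell
#!/usr/bin/env bash
# Hypothetical file names, one with a space and one without.
files=("snmp.conf" "my report.txt")

# Option 1: iterate over the indices and look each element up.
for i in "${!files[@]}"; do
    printf 'index %s -> %s\n' "$i" "${files[$i]}"
done

# Option 2: iterate over the elements themselves; the quotes keep
# names containing spaces as a single word.
seen=()
for f in "${files[@]}"; do
    seen+=("$f")
    printf 'file: %s\n' "$f"
done
```

Option 2 avoids the index lookup entirely, so a file name can never end up being evaluated as an arithmetic subscript.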

bash: looping over the files with extra conditions

In the working directory there are several files grouped into several groups based on the end-suffix of the file name. Here is the example for 4 groups:
# group 1 has 5 files
NpXynWT_apo_300K_1.pdb
NpXynWT_apo_300K_2.pdb
NpXynWT_apo_300K_3.pdb
NpXynWT_apo_300K_4.pdb
NpXynWT_apo_300K_5.pdb
# group 2 has two files
NpXynWT_apo_340K_1.pdb
NpXynWT_apo_340K_2.pdb
# group 3 has 4 files
NpXynWT_com_300K_1.pdb
NpXynWT_com_300K_2.pdb
NpXynWT_com_300K_3.pdb
NpXynWT_com_300K_4.pdb
# group 4 has 1 file
NpXynWT_com_340K_1.pdb
I have written a simple bash workflow to
pre-process each of the files via sed: add something within each file
cat together the pre-processed files that belong to the same group
Here is my script implementing the workflow, where I created an array with the names of the groups and loop over the file indices from 1 to 5
# list of 4 groups
systems=(NpXynWT_apo_300K NpXynWT_apo_340K NpXynWT_com_300K NpXynWT_com_340K)
# loop over the groups
for model in "${systems[@]}"; do
# loop over the files inside of each group
for i in {0001..0005}; do
# edit file via SED
sed -i "1 i\This is $i file of the group" "${pdbs}"/"${model}"_"$i"_FA.pdb
done
# after editing cat the pre-processed files
cat "${pdbs}"/"${model}"_[1-5]_FA.pdb > "${output}/${model}.pdb"
done
The questions to improve this script:
1) How would it be possible to add some checking conditions within the inner loop (e.g. by means of an if statement) so that only existing files are considered? In my example the script always loops over 5 files for each group, according to the maximum number in any group (here 5 files in the first group)
for i in {0001..0005}; do
I would rather loop over all of the existing files of the given group and break out of the loop if the file does not exist (e.g. considering the 4th group with only 1 file). Here is an example, which however does not work properly
# loop over the groups with the checking of the presence of the file
for model in "${systems[@]}"; do
i="0"
# loop over the files inside of each group
for i in {0001..9999}; do
if [ ! -f "${pdbs}/${model}_00${i}_FA.pdb" ]; then
echo 'File '${pdbs}/${model}_00${i}_FA.pdb' does not exist!'
break
else
# edit file via SED
sed -i "1 i\This is $i file of the group" "${pdbs}"/"${model}"_00"$i"_FA.pdb
i=$[$i+1]
fi
done
done
Would it be possible to loop over any number of existing files from the group (rather than restricting the loop to a given, e.g. very big, number of files with
for i in {0001..9999}; do?
You can check if a file exists with the -f test, and break if it doesn't:
if [ ! -f "${pdbs}/${model}_${i}_FA.pdb" ]; then
break
fi
Your existing cat command already picks up only the files that exist in each group, because in "${pdbs}"/"${model}"_[1-5]_FA.pdb bash performs filename expansion rather than simply expanding [1-5] to all possible values. You can see this in the following example:
> touch f1 f2 f5 # files f3 and f4 do not exist
> echo f[1-5]
f1 f2 f5
Notice that f[1-5] did not expand to f1 f2 f3 f4 f5.
Update:
If you want your glob expression to match files ending in numbers bigger than 9, the [1-n] syntax will not work. The reason is that the [...] syntax defines a pattern that matches a single character. For instance, the expression foo[1-9] will match files foo1 through foo9, but not foo10 or foo99.
Doing something like foo[1-99] does not work, because it doesn't mean what you might think it means. The inside of the [] can contain any number of individual characters, or ranges of characters. For example, [1-9a-nxyz] would match any character from '1' through '9', from 'a' through 'n', or any of the characters 'x', 'y', or 'z', but it would not match '0', 'q', 'r', etc. Or for that matter, it would also not match any uppercase letters.
So [1-99] is not interpreted as the range of numbers from 1-99, it is interpreted as the set of characters comprised of the range from '1' to '9', plus the individual character '9'. Therefore the patterns [1-9] and [1-99] are equivalent, and will only match characters '1' through '9'. The second 9 in the latter expression is redundant.
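The equivalence can be checked without touching the filesystem, since case uses the same pattern rules as filename expansion. In this small sketch the helper name matches and the sample names are made up for illustration.

```shell
#!/usr/bin/env bash
# matches PATTERN STRING: report whether STRING matches the glob PATTERN.
matches() {
    case $2 in
        $1) echo yes ;;
        *)  echo no ;;
    esac
}

r1=$(matches 'f[1-99]' f5)    # one character from 1-9 follows the f
r2=$(matches 'f[1-99]' f10)   # two characters follow the f
r3=$(matches 'f[1-9]'  f5)    # same set of characters as [1-99]

printf '%s %s %s\n' "$r1" "$r2" "$r3"
```

Only the single-digit name matches either pattern, which is why f10 and f99 are skipped by foo[1-99].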
However, you can still achieve what you want with extended globs, which you can enable with the command shopt -s extglob:
> touch f1 f2 f5 f99 f100000 f129828523
> echo f[1-99999999999] # Doesn't work like you want it to
f1 f2 f5
> shopt -s extglob
> echo f+([0-9])
f1 f2 f5 f99 f100000 f129828523
The +([0-9]) expression is an extended glob expression composed of two parts: the [0-9], whose meaning should be obvious at this point, and the enclosing +(...).
The +(pattern) syntax is an extglob expression that means match one or more instances of pattern. In this case, our pattern is [0-9], so the extglob expression +([0-9]) matches any string of digits 0-9.
However, you should note that this means it also matches things like 000000000. If you are only interested in numbers greater than or equal to 1, you would instead do (with extglob enabled):
> echo f[1-9]*([0-9])
Note the *(pattern) here instead of +(pattern). The * means match zero or more instances of pattern, which is what we want because we've already matched the first digit with [1-9]. For instance, f[1-9]+([0-9]) does not match the filename f1.
You may not want to leave extglob enabled in your whole script, particularly if you have any regular glob expression elsewhere in your script that might accidentally be interpreted as an extglob expression. To disable extglob when you're done with it, do:
shopt -u extglob
There's one other important thing to note here. If a glob pattern doesn't match any files, then it is interpreted as a raw string, and is left unmodified.
For example:
> echo This_file_totally_does_not_exist*
This_file_totally_does_not_exist*
Or more to the point in your case, suppose there are zero files in your 4th case, e.g. there are no files containing NpXynWT_com_340K. In this case, if you try to use a glob containing NpXynWT_com_340K, you get the entire glob as a literal string:
> shopt -s extglob
> echo NpXynWT_com_340K_[1-9]*([0-9])
NpXynWT_com_340K_[1-9]*([0-9])
This is obviously not what you want, especially in the middle of your script where you are trying to cat the matching files. Luckily there is another option you can set to make non-matching globs expand to nothing:
> shopt -s nullglob
> echo This_file_totally_does_not_exist* # prints nothing
As with extglob, there may be unintended behavior elsewhere in your script if you leave nullglob on.
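Applied to the original workflow, one possible shape is to let a glob collect whatever files a group has and loop over that array, so no upper bound is needed. The sketch below is an assumption-laden illustration: it uses a temporary directory, invented file names following the _N_FA.pdb scheme from the question's script, a plain * instead of an extended glob, and a made-up counts array to record the result.

```shell
#!/usr/bin/env bash
shopt -s nullglob                      # non-matching globs expand to nothing

pdbs=$(mktemp -d)                      # hypothetical working directory
touch "$pdbs"/NpXynWT_apo_300K_{1,2,10}_FA.pdb

systems=(NpXynWT_apo_300K NpXynWT_com_340K)
counts=()
for model in "${systems[@]}"; do
    # Collect only the files that actually exist for this group.
    group=( "$pdbs/$model"_*_FA.pdb )
    counts+=("${#group[@]}")
    echo "$model: ${#group[@]} file(s)"
    for f in "${group[@]}"; do
        : # pre-process "$f" here, e.g. with sed -i
    done
done

shopt -u nullglob                      # restore the default afterwards
rm -r "$pdbs"
```

With nullglob set, a group without files simply yields an empty array, so the inner loop body never runs for it.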

Loop thru a filename list and iterate thru a variable/array removing all strings from filenames with bash

I have a list of strings in a variable and would like to remove those strings from a list of filenames. I'm pulling that list from a file that I can add to and modify over time. Some of the strings in the variable may include part of the item that needs to be removed, while the rest may be on another line in the list. That's why I need to loop thru the entire variable list.
I'm familiar using a while loop to loop thru a list but not sure how I can loop thru each line to remove all strings from that filename.
Here's an example:
getstringstoremove=$(cat /text/from/some/file.txt)
echo "$getstringstoremove"
# Or the above can be an array
getstringstoremove=$(cat /text/from/some/file.txt)
declare -a arr=($getstringstoremove)
the above 2 should return the following lines
-SOMe.fil
(Ena)M-3_1
.So[Me].filEna)M-3_2
SOMe.fil(Ena)M-3_3
Here's the loop I was running to grab all filenames from a directory and remove anything other than the filenames
ls -l "/files/in/a/folder/" | awk -v N=9 '{sep=""; for (i=N; i<=NF; i++) {printf("%s%s",sep,$i); sep=OFS}; printf("\n")}' | while read line; do
echo "$line"
returns the following result after each loop
# 1st loop
ilikecoffee1-SOMe.fil(Ena)M-3_1.jpg
# iterate thru $getstringstoremove to remove all strings from the above file.
# 2nd loop
ilikecoffee2.So[Me].filEna)M-3_2.jpg
# iterate thru $getstringstoremove again
# 3rd loop
ilikecoffee3SOMe.fil(Ena)M-3_3.jpg
# iterate thru $getstringstoremove and again
done
the final desired output would be the following
ilikecoffee1.jpg
ilikecoffee2.jpg
ilikecoffee3.jpg
I'm running this in bash on Mac.
I hope this makes sense as I'm stuck and can use some help.
If someone has a better way of doing this by all means it doesn't have to be the way I have it listed above.
You can get the new filenames with this awk one-liner:
$ awk 'NR==FNR{a[$0];next} {for(i in a){n=index($0,i);if(n){$0=substr($0,1,n-1)substr($0,n+length(i))}}} 1' rem.txt files.lst
This assumes your exclusion strings are in rem.txt and there's a files list in files.lst.
Spaced out for easier commenting:
NR==FNR { # suck the first file into the indices of an array,
a[$0]
next
}
{
for (i in a) { # for each file we step through the array,
n=index($0,i) # search for an occurrence of this string,
if (n) { # and if found,
$0=substr($0,1,n-1)substr($0,n+length(i))
# rewrite the line with the string missing,
}
}
}
1 # and finally, print the line.
If you stow the above script in a file, say foo.awk, you could run it as:
$ awk -f foo.awk rem.txt files.lst
to see the resultant files.
Note that this just shows you how to build new filenames. If what you want is to do this for each file in a directory, it's best to avoid running your renames directly from awk, and use shell constructs designed for handling files, like a for loop:
for f in path/to/*.jpg; do
mv -v "$f" "$(awk -f foo.awk rem.txt - <<<"$f")"
done
This should be pretty obvious except perhaps for the awk options, which are:
-f foo.awk, use the awk script from this filename,
rem.txt, your list of removal strings,
-, a hyphen indicating that standard input should be used IN ADDITION to rem.txt, and
<<<"$f", a "here-string" to provide that input to awk.
Note that this awk script will work with both gawk and the non-GNU awk that is included in macOS.
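For a short list, the same literal-removal idea can also be sketched in bash itself using parameter expansion. This is an illustrative alternative rather than part of the awk approach: the strings array is hard-coded here (in practice it would be read from the removal file), and strip_name is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical removal strings; the real list would come from a file.
strings=('-SOMe.fil' '(Ena)M-3_1' '.So[Me].filEna)M-3_2' 'SOMe.fil(Ena)M-3_3')

# strip_name NAME: print NAME with every removal string deleted.
strip_name() {
    local name=$1 s
    for s in "${strings[@]}"; do
        # Quoting "$s" makes the pattern literal, so [ ] ( ) are not
        # treated as glob characters.
        name=${name//"$s"/}
    done
    printf '%s\n' "$name"
}

new1=$(strip_name 'ilikecoffee1-SOMe.fil(Ena)M-3_1.jpg')
new2=$(strip_name 'ilikecoffee2.So[Me].filEna)M-3_2.jpg')
new3=$(strip_name 'ilikecoffee3SOMe.fil(Ena)M-3_3.jpg')
printf '%s\n' "$new1" "$new2" "$new3"
```

This form of expansion should also work on the older bash that ships with macOS, but it is worth printing the new names first before passing them to mv.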
I think I have understood what you mean, and I would do it with Perl which comes built-in to the standard macOS - so nothing to install.
I assume you have a file called remove.txt with your list of stuff to remove, and that you want to run the script on all files in your current directory. If so, the script would be:
#!/usr/bin/perl -w
use strict;
# Load the strings to remove into array "strings"
my @strings = `cat remove.txt`;
for(my $i=0;$i<=$#strings;$i++){
# Strip trailing newlines and quote metacharacters - e.g. *()[]
chomp($strings[$i]);
$strings[$i] = quotemeta($strings[$i]);
}
}
# Iterate over all filenames
my @files = glob('*');
foreach my $file (@files){
my $new = $file;
# Iterate over replacements
foreach my $string (@strings){
$new =~ s/$string//;
}
# Check if name would change
if($new ne $file){
if( -f $new){
printf("Cowardly refusing to rename %s as %s since it involves overwriting\n",$file,$new);
} else {
printf("Rename %s as %s\n",$file,$new);
# rename $file,$new;
}
}
}
Then save that in your HOME directory as renamer. Make it executable - only necessary once - with this command in Terminal:
chmod +x $HOME/renamer
Then you can go into any directory where your madly named files are and run the script like this:
cd path/to/mad/files
$HOME/renamer
As with all things you download off the Internet, make a backup first and just run on a small, copied, subset of your files till you get the idea of how it works.
If you use homebrew as your package manager, you could install rename using:
brew install rename
You could then take all the Perl from my other answer, condense it down to a couple of lines and embed it in a rename command, which would give you the added benefit of being able to do dry-runs etc. The code below does exactly the same as my other answer but is somewhat harder to read for non-Perl folk.
Your command would simply be:
rename --dry-run '
my @strings = map { s/\r|\n//g; $_=quotemeta($_) } `cat remove.txt`;
foreach my $string (@strings){ s/$string//; } ' *
Sample Output
'ilikecoffee(Ena)M-3_1' would be renamed to 'ilikecoffee'
'ilikecoffee-SOMe.fil' would be renamed to 'ilikecoffee'
'ilikecoffee.So[Me].filEna)M-3_2' would be renamed to 'ilikecoffee'
To try and understand it, remember:
the rename part applies the following Perl to each file because of the asterisk at the end
the @strings part reads all the strings from the file remove.txt and removes any carriage returns and linefeeds from them and quotes any metacharacters
the foreach applies each of the deletions to the current filename which rename stores in $_ for you
Note that this method trades simplicity for performance somewhat. If you have millions of files to do, the other method will be quicker because here I read the remove.txt file for each and every file whose name is checked, but if you only have a few hundred/thousand files, I doubt you'll notice it.
This should be much the same, just shorter:
rename --dry-run '
my @strings = `cat remove.txt`; chomp @strings;
foreach my $string (@strings){ s/\Q$string\E//; } ' *

Saving directory content to an array (bash) [duplicate]

This question already has answers here:
How do you store a list of directories into an array in Bash (and then print them out)?
(4 answers)
Closed 7 years ago.
I need to save the contents of two directories in an array to compare them later. This is the solution I wrote:
DirContent()
{
#past '$1' directorys to 'directorys'
local DIRECTORYS=`ls -l --time-style="long-iso" $1 | egrep '^d' | awk '{print $8}'`
local CONTENT
local i
for DIR in $DIRECTORYS
do
i=+1
CONTENT[i]=${DIR}
done
echo $CONTENT
}
Then when I try to print this array I get empty output. Both directories are not empty. Please tell me what am I doing wrong here.
Thanks, Siery.
The core of this question is answered in the one I marked as a duplicate. Here are a few more pointers:
All uppercase variable names are discouraged as they are more likely to clash with environment variables.
You assign to DIRECTORYS (should probably be "directories") the output of a complicated command, which suffers from a few deficiencies:
Instead of backticks as in var=`command`, the syntax var=$(command) is preferred.
egrep is deprecated and grep -E is preferred.
The grep and awk commands could be combined to awk '/^d/ { print $8 }'.
There are better ways to get directories, for example find, but the output of find shouldn't be parsed either.
You shouldn't process the output of ls programmatically: filenames can contain spaces, newlines, other special characters...
DIRECTORYS is now just one long string, and you rely on word splitting to iterate over it. Again, spaces in filenames will trip you up.
DIR isn't declared local.
To increase i, you'd use (( ++i )).
CONTENT[i]=${DIR} is actually okay: the i is automatically expanded here and doesn't have to be prepended by a $. Normally you'd want to quote your variables like "$dir", but in this case we happen to know that it won't be split any further as it already is the result of word splitting.
Array indices start at zero and you're skipping zero. You should increase the counter after the assignment.
Instead of using a counter, you can just append to an array with content+=("$dir").
To print the contents of an array, you'd use echo "${CONTENT[@]}".
But really, what you should do instead of all this: a call DirContent some_directory is equivalent to echo some_directory/*/, and if you want that in an array, you'd just use
arr=(some_directory/*/)
instead of the whole function – this even works for weird filenames. And is much, much shorter.
If you have hidden directories (names starting with .), you can use shopt -s dotglob to include them as well.
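A small self-contained sketch of the glob approach follows. The temporary directory, its contents, and the names base, dirs and count are made up for the demonstration, and nullglob is added as an assumption so that a directory without subdirectories yields an empty array.

```shell
#!/usr/bin/env bash
shopt -s nullglob                      # empty array if nothing matches

base=$(mktemp -d)                      # hypothetical directory for the demo
mkdir "$base/alpha" "$base/with space"
touch "$base/file.txt"                 # a regular file, which must be skipped

# The trailing slash makes the glob match directories only.
dirs=( "$base"/*/ )
count=${#dirs[@]}

printf '%s\n' "${dirs[@]}"
rm -r "$base"
```

Each element keeps its trailing slash, and the directory name containing a space stays a single array element.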
You can try
for((i=0;i<${#CONTENT[*]};i++))
do
echo ${CONTENT[$i]}
done
instead of echo $CONTENT
Also these change are required
((i+=1))
CONTENT[$i]=${DIR}
in your above code

Resources