Bash array with spaces and no spaces in elements - arrays

I know this question has been asked in different manners, and I've referred to some answers on here to get to where I am now.
I'm trying to create a script to essentially watch a folder and virus scan files once they're not being written to.
The files/strings I need it to handle will sometimes contain spaces and sometimes not, and sometimes special characters as well. At the moment it only works on files with spaces in the name (as in, it actually scans and moves them), but not on files without spaces. Also, after each file (spaces or not) the while loop breaks, stopping the script with the following output:
./howscan.sh: line 29: snmp.conf: syntax error: invalid arithmetic operator (error token is ".conf")
./howscan.sh: line 34: snmp.conf: syntax error: invalid arithmetic operator (error token is ".conf")
I had it working to handle file names without any spaces, but since I introduced the "${files[$i]}" method to use the array elements it only works on files with spaces and outputs the above error.
Feel free to omit the sepscan part of this, as I'm sure if I can get it working with the other tasks it'll work for that too (just wanted to show the full script for a complete understanding).
Current Script:
#!/bin/bash
set -x
workingpath='/mnt/Incoming/ToScan/'
outputpath='/mnt/Incoming/Scanned/'
logfile='/var/log/howscan.log'
faildir='/mnt/Incoming/ScanFailed/'
sepscan='/opt/Symantec/symantec_antivirus/sav manualscan -c'
# Change to working directory
cd $workingpath
# Exclude files with a given extension, in this case .aspx, and declare the remaining files as the array "files"
shopt -s extglob nullglob
# Loop the below - ie it's a watch folder
while true
do
# Exclude files with .aspx in the file name
files=( !(*.aspx) )
# If the array files is not empty then...
if [ ${#files[@]} -ne 0 ]; then
for i in ${files[*]}
# For every item in the array files process them as follows
# Declare any variables you wish to happen on files here, not globally
# Check each file to see if it's in use using fuser, do nothing and log it if its still in use, or process it and log the results
do
fileopen=`fuser "${files[$i]}" | wc -c`
# Here for 'fileopen' we are checking if the file is being written to.
if [ $fileopen -ne 0 ]; then
echo `date` "${files[$i]}" is being used so has been ignored >> $logfile
else
echo `date` File "${files[$i]}" not being used or accessed >> $logfile
sepscan=`$sepscan "${files[$i]}" | wc -c`
if [ $sepscan = 0 ]; then
mv "${files[$i]}" $outputpath
echo `date` "${files[$i]}" has been virus scanned and moved to $outputpath >> $logfile
else
echo `date` "${files[$i]}" has been moved to $faildir as a virus or error was detected >> $logfile
fi
fi
done
fi
echo `date` 'No more files to process. Waiting 60 seconds...' >> $logfile
sleep 60
done
Let me know if I can provide anything else to help clarify my issue.
Update:
There is a file in the /mnt/Incoming/ToScan/ directory called snmp.conf by the way.

for i in ${files[*]}
should be
for i in ${!files[*]}
# or
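for i in ${!files[@]}
${files[*]} expands to the contents of the array, which then undergo word splitting, so i holds (pieces of) file names rather than indices. Using a name such as snmp.conf as the subscript in ${files[$i]} makes bash evaluate it as an arithmetic expression, which is where the "invalid arithmetic operator" error comes from. A word from a name with spaces is treated as an unset variable, evaluates to 0, and silently selects the first element, which is why those files appeared to work. The ${!files[*]} and ${!files[@]} forms expand to the list of indices of the array instead.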
You might also need to double quote the variables, e.g.
if [ "$fileopen" -ne 0 ]; then
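The difference between looping over indices and looping over values can be sketched as follows. The two file names are made-up sample data standing in for the globbed list, and the by_index and by_value arrays exist only to make the result easy to inspect.

```shell
#!/bin/bash
# Hypothetical sample data standing in for: files=( !(*.aspx) )
files=("snmp.conf" "my report.txt")

# Loop over the indices (0, 1, ...) and look each element up by index.
by_index=()
for i in "${!files[@]}"; do
    by_index+=("$i:${files[$i]}")
done

# Or loop over the values directly; the double quotes keep
# "my report.txt" together as a single word.
by_value=()
for f in "${files[@]}"; do
    by_value+=("$f")
done

printf '%s\n' "${by_index[@]}" "${by_value[@]}"
```

If the index itself is not needed, the second form is usually the simpler choice, since "$f" can be passed straight to fuser or mv.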

Related

Bash: Using a config file to set variables, but doing it more safely

Extending this question and answer, I'd like some help exploring some solutions to making this exercise of using source to bring a config file into a Bash file more "safely." I say "more safely" because I recognize it may be impossible to do with 100% safety.
I want to use a config file to set variables and arrays and have some comments throughout. Everything else should be disallowed.
The above Q&A suggested starting a regex line to check for things we want, versus what we don't want, before passing it to source.
For example, the regex could be:
(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)
But I'm looking for help in both refactoring that regex, or considering other pathways to get what we're after in the Bash script below, especially after wondering if this approach is futile in the first place?
Example
This is what the desired config file would look like:
#!/bin/bash
disks=([0-UUID]=1234567890123 [0-MountPoint]='/some/path/')
disks+=([1-UUID]=4567890123456 [1-MountPoint]='/some/other/path')
# ...
someNumber=1
rsyncExclude=('.*/' '/dev/' '/proc/' '/sys/' '/tmp/' '/mnt/' '/media/' '/lost+found' '.Trash-*/' '[$]RECYCLE.BIN/' '/System Volume Information/' 'pagefile.sys' '/temp/' '/Temp/' '/Adobe/')
remote='this@123.123.123.123'
# there should be nothing in the config more complicated than above
And this is a simplified version of the bash script it will go into, using the example from @Erman in the Q/A linked to above, to do the checking:
#!/bin/bash
configFile='/blah/blah/config.file'
if [[ -f "${configFile}" ]]; then
# check if the config file contains any commands because that is unexpected and unsafe
disallowedSyntax="(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)"
if egrep -q -iv "${disallowedSyntax}" "${configFile}"; then
printf "%s\n" 'The configuration file is not safe!' >&2 # print to STDERR
exit 1
else
# config file might be okay
if result=$( bash -n "${configFile}" 2>&1 ); then
# set up the 'disk' associative array first and then import
declare -A disks
source <(awk '/^\s*\w++?=/' "${configFile}")
# ...
else
# config has syntax error
printf '%s\n' 'The configuration file has a syntax error.' >&2
exit 1
fi
fi
else
# config file doesn't exist?
printf '%s\n' "The configuration file doesn't exist." >&2
exit 1
fi
I imagine below is ideally what we want to be allowed and disallowed as a starting point?
Allowed
# whole numbers only
var=1
var=123
# quoted stuff
var='foo bar'
var="foo bar"
# arrays
var=('foo' 'bar')
var=("foo" "bar")
var=([0-foo]=1 [0-bar]='blah' ...
var+=(...
# vars with underscores, same format as above
foo_bar=1
...
foo_bar+=(...
# and that's it?
Not allowed*
* Not an exhaustive list (and I'm certain I'm missing things) but the idea is to at least disallow anything not quoted (unless it's a number), and then also anything else that would allow unleash_virus to be run:
var=notquoted
...
var=notquoted unleash_virus
var=`unleash_virus`
...
var='foo bar' | unleash_virus
...
var="foo bar"; unleash_virus
var="foo bar" && unleash_virus
var="foo bar $(unleash_virus)"
...
At least one issue you might encounter is the ${configFile} changing between the syntax check and the subsequent sourcing:
# configFile might seem safe according to your syntax rules:
if egrep -q -iv "${disallowedSyntax}" "${configFile}"; then
printf "%s\n" 'The backup configuration file is not safe!' >&2
exit 1
else
if result=$( bash -n "${configFile}" 2>&1 ); then
declare -A disks
# Warning: config file might have changed
source "${configFile}"
If you cannot guarantee that the contents of the config file remain the same then your regex-check won't help you much.
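One way to narrow that window is to read the file once and then validate and evaluate the same in-memory copy. The sketch below is only an illustration of that idea: load_config_snapshot is a hypothetical helper name, and the pattern is a deliberately strict stand-in (comments, blank lines, name=integer, name='literal') rather than the regex from the question.

```shell
#!/bin/bash
# Hypothetical helper: validate and load one in-memory snapshot of a config,
# so a later change to the file on disk cannot slip past the check.
load_config_snapshot() {
    local file=$1 snapshot line
    # Read the file exactly once; everything below works on this copy.
    snapshot=$(<"$file") || return 1
    # Assumed whitelist: blank/comment lines, name=123, or name='literal'.
    local allowed="^[[:space:]]*(#.*)?$|^[A-Za-z_][A-Za-z0-9_]*=([0-9]+|'[^']*')$"
    while IFS= read -r line; do
        if [[ ! $line =~ $allowed ]]; then
            printf 'rejected: %s\n' "$line" >&2
            return 1
        fi
    done <<< "$snapshot"
    # Evaluate the validated text, not the file.
    eval "$snapshot"
}
```

Usage would be something like load_config_snapshot ./example.conf || exit 1. Two caveats: names that collide with the helper's own locals (file, snapshot, line, allowed) stay local to it, and a whitelist like this still lets a config overwrite variables such as PATH or IFS, so it limits what can be executed, not what can be assigned.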
Since you wanted specific feedback on the regex, here is a variable assignment with a quoted value that is not allowed by the regex:
some_regex_config='\s'
Note that this was the regex at the time of answering:
(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)
Here's a start, thanks to @SasaKanjuh.
Instead of checking for disallowed syntax, we could use awk to only pass parts of the config file that match formatting we expect to eval, and nothing else.
For example, we expect that variables must have some kind of quoting (unless they solely contain a number); arrays start and end with () as usual; and everything else should be ignored...
Here's the awk line that does this:
awk '/^\s*\w+\+?=(\(|[0-9]+$|["'\''][^0-9]+)/ && !/(\$\(|&&|;|\||`)/ { print gensub("(.*[\"'\''\\)]).*", "\\1", 1) }' ./example.conf
The first part matches a line starting with a variable name, up to the = (or +=) sign.
After the = sign, it looks for (, a numeric value, or ' or " followed by a string.
The second part excludes lines containing $(, &&, ;, | or a backtick.
gensub (a GNU awk extension) keeps everything up to and including the last ', " or ), and drops whatever follows.
#!/bin/bash
configFile='./example.conf'
if [[ -f "${configFile}" ]]; then
# config file exists, check if it has OK bash syntax
if result=$( bash -n "${configFile}" 2>&1 ); then
# seems parsable, import the config file
# filter the contents using `awk` first so we're only accepting vars formatted like this:
# var=1
# var='foo bar'
# var="foo bar"
# var=('array' 'etc')
# var+=('and' "so on")
# and everything else should be ignored:
# var=unquoted
# var='foo bar' | unleash_virus
# var='foo bar'; unleash_virus
# var='foo' && unleash_virus
# var=$(unleash_virus)
# var="$(unleash_virus)"
# ...etc
if config=$(awk '/^\s*\w+\+?=(\(|[0-9]+$|["'\''][^0-9]+)/ && !/(\$\(|&&|;|\||`)/ { print gensub("(.*[\"'\''\\)]).*", "\\1", 1) }' "${configFile}"); then
# something matched
# now actually insert the config data into this session by passing it to `eval`
eval "${config}"
else
# no matches from awk
echo "No config content to work with."
exit 1
fi
else
# config file didn't pass the `bash -n` test
echo "Config contains invalid syntax."
exit 1
fi
else
# config file doesn't exist or isn't a file
echo "There is no config file."
exit 1
fi

Bash Array Script Exclude Duplicates

So I have written a bash script (named music.sh) for a Raspberry Pi to perform the following functions:
When executed, look into one single directory (Music folder) and select a random folder to look into. (Note: none of these folders here have subdirectories)
Once a folder within "Music" has been selected, then play all mp3 files IN ORDER until the last mp3 file has been reached
At this point, the script would go back to the folders in the "Music" directory and select another random folder
Then it would again play all mp3 files in that folder in order
Loop indefinitely until input from user
I have this code which does all of the above EXCEPT for the following items:
I would like to NOT play any other "album" that has been played before
Once all albums played once, then shutdown the system
Here is my code so far that is working (WITH duplicates allowed):
#!/bin/bash
folderarray=($(ls -d /home/alphekka/Music/*/))
for i in "${folderarray[@]}";
do
folderitems=(${folderarray[RANDOM % ${#folderarray[@]}]})
for j in "${folderitems[@]}";
do
echo `ls $j`
cvlc --play-and-exit "${j[@]}"
done
done
exit 0
Please note that there isn't a single folder or file that has a space in the name. If there is a space, then I face some issues with this code working.
Anyways, I'm getting close, but I'm not quite there with the entire functionality I'm looking for. Any help would be greatly appreciated! Thank you kindly! :)
Use an associative array as a set. Note that this will work for all valid folder and file names.
#!/bin/bash
declare -A folderarray
# Each folder name is a key mapped to an empty string
for d in /home/alphekka/Music/*/; do
folderarray["$d"]=
done
while [[ "${!folderarray[*]}" ]]; do
# Get a list of the remaining folder names
foldernames=( "${!folderarray[@]}" )
# Pick a folder at random
folder=${foldernames[RANDOM%${#foldernames[@]}]}
# Remove the folder from the set
# Must use single quotes; see below
unset folderarray['$folder']
for j in "$folder"/*; do
cvlc --play-and-exit "$j"
done
done
Dealing with keys that contain spaces (and possibly other special characters) is tricky. The quotes shown in the call to unset above are not syntactic quotes in the usual sense. They do not prevent $folder from being expanded, but they do appear to be used by unset itself to quote the resulting string.
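If the quoting rules around unset are a concern, a different technique avoids them entirely: keep the unplayed folders in an ordinary indexed array and rebuild it without the chosen element. This is not the associative-array approach above, just a minimal sketch of an alternative; the album names are made-up sample data, and the played array stands in for the cvlc call.

```shell
#!/bin/bash
# Hypothetical sample data standing in for: remaining=(/home/alphekka/Music/*/)
remaining=("Abbey Road" "Kind of Blue" "OK Computer")
played=()

while (( ${#remaining[@]} > 0 )); do
    # Pick a random index into what is left.
    idx=$(( RANDOM % ${#remaining[@]} ))
    played+=("${remaining[idx]}")          # the real script would play this folder
    # Rebuild the array without element idx; quoting keeps names with spaces intact.
    remaining=("${remaining[@]:0:idx}" "${remaining[@]:idx+1}")
done

printf 'played: %s\n' "${played[@]}"
```

The loop ends once every album has been picked exactly once, which is the natural place to put a shutdown command if that behaviour is wanted.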
Here's another solution: randomize the list of directories first, save the result in an array and then play (my script just prints) the files from each element of the array
MUSIC=/home/alphekka/Music
OLDIFS=$IFS
IFS=$'\n'
folderarray=($(ls -d $MUSIC/*/|while read line; do echo $RANDOM $line; done| sort -n | cut -f2- -d' '))
for folder in ${folderarray[*]};
do
printf "Folder: %s\n" $folder
fileArray=($(find $folder -type f))
for j in ${fileArray[@]};
do
printf "play %s\n" $j
done
done
For the random shuffling I used this answer.
One liner solution with mpv, rl (randomlines), xargs, find:
find /home/alphekka/Music/ -maxdepth 1 -type d -print0 | rl -d \0 | xargs -0 -l1 mpv

Bash file creation from variable

I am trying to create files from an array called columnHeaders[]
Within the array, I have test values that are currently:
id, source, EN, EN-GB, French-FR, French-DE
When I run my code I get
EN
EN.xml
EN-GB
EN-GB.xml
French-FR
French-FR.xml
French-DE
.xmlch-DE
NOTE that the FRENCH-DE filename gets morphed into .xmlch-DE
Why is this? I can't for the life of me figure out why only that file ends up looking like this. It's driving me crazy!
Thanks for any help.
below is the snippet of my code that is causing me problems:
# take all the languages and either find the files or create new files for them. The language options
# should be stored in the columnHeader array in positions 3 - n
# cycle through all the output languages (so exclude "id, source' inputs)
# in my example, numWordsInLine is 6
c=2
while [ $c -lt $numWordsInLine ]; do
OUTPUT_LANG="${columnHeaders[$c]}"
echo "$OUTPUT_LANG"
OUTPUT_FILE="$OUTPUT_LANG".xml
# HERE'S WHERE YOU CAN SEE OUTPUT_FILE IS WRONG FOR FRENCH_DE
echo "$OUTPUT_FILE"
OUTPUT_BAK="$OUTPUT_LANG".bak
TMP_FILE="~tmp.xml"
if [ -f "$OUTPUT_BAK" ]; then
rm "$OUTPUT_BAK"
fi
# make a backup of the original language.xml file in case of program error or interruption
if [ -f "$OUTPUT_FILE" ]; then
mv "$OUTPUT_FILE" "$OUTPUT_BAK"
fi
if [ -f "$TMP_FILE" ]; then
rm "$TMP_FILE"
fi
c=$(expr $c + 1)
done
I'm betting that you are reading that line of data from a file with DOS newlines.
I'm also betting that the contents of the variable are "fine" but include a trailing carriage return.
Try printf %q\\n "$OUTPUT_FILE" or echo "$OUTPUT_FILE" | cat -v to see.
Then use something like dos2unix on the file to convert it.
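If converting the input file is not an option, the carriage return can also be stripped inside the script. The sketch below simulates the suspected situation with a hard-coded value; the variable clean is introduced here only for illustration.

```shell
#!/bin/bash
# Simulate a value read from a file with DOS (CRLF) line endings.
OUTPUT_LANG=$'French-DE\r'

# %q prints the value in a shell-quoted form, which makes the hidden \r visible.
printf '%q\n' "$OUTPUT_LANG.xml"

# Strip one trailing carriage return, if present, before building the name.
clean=${OUTPUT_LANG%$'\r'}
OUTPUT_FILE="$clean.xml"
printf '%s\n' "$OUTPUT_FILE"
```

The garbled ".xmlch-DE" in the terminal comes from the carriage return moving the cursor back to the start of the line before ".xml" is printed over "Fren".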
Extra (unrelated) comments:
There's also no reason to use expr. ((c++)) will do what you want.
You could even turn the loop itself into for ((c=2;c < $numWordsInLine; c++)); do if you wanted to.
$numWordsInLine is also unnecessary if $columnHeaders is already split into the right "words", since you can use ${#columnHeaders[@]} to get the number of elements.

declare global array in shell [duplicate]

Here is the code I use to separate the files into arrays. Because of the pipe, the loop runs in a subshell, so I am not able to access the arrays normal, executable and directory afterwards: nothing is printed after the #//////// line, and I don't know what is happening there. Please help me with this.
i=0
j=0
k=0
normal[0]=
executable[0]=
directory[0]=
ls | while read line
do
if [ -f $line ];then
#echo "this is normal file>> $line"
normal[i]=$line
i=$((i+1))
fi
if [ -x $line ];then
#echo "this is executable file>> $line"
executable[j]=$line
j=$((j+1))
fi
if [ -d $line ];then
#echo "this is directory>> $line"
directory[k]=$line
k=$((k+1))
fi
done
#//////////////////////////////////////
echo "normal files are"
for k in "${normal[@]}"
do
echo "$k"
done
echo "executable files are"
for k in "${executable[@]}"
do
echo "$k"
done
echo "directories are"
for k in "${directory[@]}"
do
echo "$k"
done
There are several flaws to your script :
Your if tests should be written with [[ rather than [, since [[ does not word-split its operands (more info: here). If you want to keep [ or are not using bash, you will have to quote your line variable, i.e. write all your tests like this: if [ -f "$line" ]; then
Don't use ls to list the current directory, as it misbehaves in some cases. A glob would be better suited in your case (more info: here).
The pipe runs the while loop in a subshell, so the arrays it fills are lost when the loop ends. To avoid the pipe, use a for loop instead: replace ls | while read line with for line in $(ls) or, to take my previous point into account, for line in *
After doing that, I tested your script and it worked perfectly fine. You should note that some folders will be listed under both "executable files" and "directories", because they have +x rights (I don't know if this is the behaviour you wanted).
As a side note, you don't need to declare variables in bash before using them, so your first 6 lines are unnecessary. The variables i, j and k are not necessary either, as you can append to an array with the following syntax: normal+=("$line").
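Put together, the points above might look like the following sketch. classify_dir is a hypothetical helper name; it fills the same three global arrays as the question, but from a glob loop that runs in the current shell, so the arrays are still there afterwards.

```shell
#!/bin/bash
# Hypothetical helper: sort the entries of a directory into three global arrays.
classify_dir() {
    local dir=$1 entry
    normal=() executable=() directory=()
    for entry in "$dir"/*; do
        [[ -e $entry ]] || continue            # the glob matched nothing
        [[ -f $entry ]] && normal+=("$entry")
        [[ -x $entry ]] && executable+=("$entry")
        [[ -d $entry ]] && directory+=("$entry")
    done
    return 0
}

classify_dir .
printf 'normal: %s\n' "${normal[@]}"
printf 'executable: %s\n' "${executable[@]}"
printf 'directory: %s\n' "${directory[@]}"
```

As noted above, a directory with +x rights ends up in both the executable and the directory array; add a -f test to the executable branch if only regular files should count.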
The simplest thing to do is to keep the subshell open until you no longer need the arrays. In other words:
ls | { while read line; do
...
echo "directories: ${directory[@]}" | tr ' ' \\n
}
In other words, add an open brace before the while and a closing brace at the end of the script.

Check whether files in a file list exist in a certain directory

The runtime arguments are as follows: $1 is the path to the file containing the list of files
$2 is the path to the directory containing the files
What I want to do is check that each file listed in $1 exists in the $2 directory
I'm thinking something like:
for f in 'cat $1'
do
if (FILEEXISTSIN$2DIRECTORY)
then echo '$f exists in $2'
else echo '$f is missing in $2' sleep 5 exit
fi
done
As you can see, I want it so that if any of the files listed in $1 don't exist in $2 directory, the script states this then closes. The only part I can't get my head around is the (FILEEXISTSIN$2DIRECTORY) part. I know that you can do [ -e $f ], but I don't know how you can make sure its checking that it exists in the $2 directory.
Edit: Thinking further upon this, perhaps I could use nested for loops?
If your specified input file contains a newline-separated list of files to check, then the following solution (using a while read loop) is robust enough to handle file names with spaces properly.
Generally, you should never make use of a loop of the form for i in $(command), and instead opt for a while loop. See http://mywiki.wooledge.org/DontReadLinesWithFor for more details.
while IFS= read -r file; do
if [[ -e "$2/$file" ]]; then
echo "$file exists in $2"
else
echo "$file does not exist in $2"
sleep 5
exit 1
fi
done < "$1"
Since you're dealing with a list of file names without spaces in the names (because the $(cat $1) notation will split things up like that), it is relatively straightforward:
for file in $(cat $1)
do
if [ -e "$2/$file" ]
then echo "$file exists in $2"
else echo "$file is missing in $2"; sleep 5; exit 1
fi
done
Basically, use the built-in string concatenation facilities to build the full path to the file, and use the test or [ operator to check the file's existence.
The complexities arise if you have to deal with arbitrary file names, especially if one of the arbitrary characters in an arbitrary file name can be the newline character. Suffice to say, they complicate the issue sufficiently that I won't deal with it unless you say you need it dealt with, and even then, I'll negotiate on whether newlines in names need to be handled. The double-quoted variable expansion is a key part of the strategy for dealing with it. The other part of the problem is how to get the file names accurately into a variable.
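For completeness, here is one possible sketch of the arbitrary-name case. It rests on an assumption that is not part of the question: the list file separates names with NUL bytes instead of newlines (for example, produced with printf '%s\0' or find -print0). check_list is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every NUL-terminated name in the list
# file (assumed format) exists inside the given directory.
check_list() {
    local list=$1 dir=$2 name
    # -d '' makes read stop at NUL bytes, so newlines inside names survive.
    while IFS= read -r -d '' name; do
        if [[ ! -e "$dir/$name" ]]; then
            printf '%q is missing in %s\n' "$name" "$dir" >&2
            return 1
        fi
    done < "$list"
}
```

It would be called as check_list "$1" "$2" || exit 1, with the double quotes around every expansion doing the same job as in the loops above.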

Resources