IF depending on number of files per folder

I want to do different actions based on the number of files in each folder whose name starts with the same two letters: if a TS folder contains 6 or fewer files, do one set of actions; otherwise, do another set.
my data looks like this
files/TS01 -- which has 2 files
files/TS02 -- which has 5 files
files/TS03 -- which has 2 files
files/TS04 -- which has 7 files
files/TS05 -- which has 9 files
I have tried
FILES="$TS*"
for W in $FILES
do
doc=$(basename $W)
if [ $W -le 6 ]
then
....
done ...
fi
done
but I get an error saying "integer expression expected"
I have also tried
if [ ls $W -le 6 ]
but I get another error saying "too many arguments"
Can you please help?

To get the number of files I would recommend piping ls -1 into wc -l; this will print the number of entries in your directory as follows...
Atlas $ ls -1 | wc -l
19
I've made a small script which shows how you could then use this result to conditionally do one thing or another...
#!/bin/bash
amount=$(ls -1 | wc -l)
if [ $amount -le 5 ]; then
echo -n "There aren't that many files, only "
else
echo -n "There are a lot of files, "
fi
echo $amount
When executed on a folder with 19 files it echoes..
Atlas $ ./howManyFiles.sh
There are a lot of files, 19
and on one with 5 or fewer files...
Atlas $ ./howManyFiles.sh
There aren't that many files, only 3
Hopefully this helps show you how to get a usable file count from a folder, and then how to use that result in an "if" statement!
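To map that back onto the folder layout in the question, here is a minimal sketch (assuming the TS* directories live under files/ and contain only regular files):
#!/bin/bash
# For each files/TS* directory, count the files inside and branch on the count
for dir in files/TS*/; do
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    if [ "$count" -le 6 ]; then
        echo "$dir: $count files - running the small-folder actions"
    else
        echo "$dir: $count files - running the large-folder actions"
    fi
done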

Related

Bash array with spaces and no spaces in elements

I know this question has been asked in different manners, and I've referred to some answers on here to get to where I am now.
I'm trying to create a script to essentially watch a folder and virus scan files once they're not being written to.
The files/strings I need it to handle will sometimes contain spaces and sometimes not, as well as, occasionally, special characters. At the moment it actually works only on files with spaces in the name (as in, it scans and moves them), but not on files without spaces. Also, after each file (spaces or not) the while loop breaks, stopping the script with the following output:
./howscan.sh: line 29: snmp.conf: syntax error: invalid arithmetic operator (error token is ".conf")
./howscan.sh: line 34: snmp.conf: syntax error: invalid arithmetic operator (error token is ".conf")
I had it working to handle file names without any spaces, but since I introduced the "${files[$i]}" method to use the array elements it only works on files with spaces and outputs the above error.
Feel free to omit the sepscan part of this, as I'm sure if I can get it working with the other tasks it'll work for that too (just wanted to show the full script for a complete understanding).
Current Script:
#!/bin/bash
set -x
workingpath='/mnt/Incoming/ToScan/'
outputpath='/mnt/Incoming/Scanned/'
logfile='/var/log/howscan.log'
faildir='/mnt/Incoming/ScanFailed/'
sepscan='/opt/Symantec/symantec_antivirus/sav manualscan -c'
# Change to working directory
cd $workingpath
# Exclude files with a given extension, in this case .aspx, and declare the remaining files as the array "files"
shopt -s extglob nullglob
# Loop the below - ie it's a watch folder
while true
do
# Exclude files with .aspx in the file name
files=( !(*.aspx) )
# If the array files is not empty then...
if [ ${#files[@]} -ne 0 ]; then
for i in ${files[*]}
# For every item in the array files process them as follows
# Declare any variables you wish to happen on files here, not globally
# Check each file to see if it's in use using fuser, do nothing and log it if its still in use, or process it and log the results
do
fileopen=`fuser "${files[$i]}" | wc -c`
# Here for 'fileopen' we are checking if the file is being writen to.
if [ $fileopen -ne 0 ]; then
echo `date` "${files[$i]}" is being used so has been ignored >> $logfile
else
echo `date` File "${files[$i]}" not being used or accessed >> $logfile
sepscan=`$sepscan "${files[$i]}" | wc -c`
if [ $sepscan = 0 ]; then
mv "${files[$i]}" $outputpath
echo `date` "${files[$i]}" has been virus scanned and moved to $outputpath >> $logfile
else
echo `date` "${files[$i]}" has been moved to $faildir as a virus or error was detected >> $logfile
fi
fi
done
fi
echo `date` 'No more files to process. Waiting 60 seconds...' >> $logfile
sleep 60
done
Let me know if I can provide anything else to help clarify my issue.
Update:
There is a file in the /mnt/Incoming/ToScan/ directory called snmp.conf by the way.
for i in ${files[*]}
should be
for i in ${!files[*]}
# or
for i in ${!files[@]}
${files[*]} expands to the contents of the array and undergoes word splitting. The above syntax expands to a list of indices of the array.
You might also need to double quote the variables, e.g.
if [ "$fileopen" -ne 0 ]; then

Putting files in directory into array variable

I'm writing bash code that will search for specific files in the directory it is run in and add them to an array variable. The problem I am having is formatting the results. I need to find all the compressed files in the current directory and display both the names and sizes of the files in order of last modification. I want to take the results of that command and put them into an array variable, with each element containing a file's name and corresponding size, but I don't know how to do that. I'm not sure if I should be using "find" instead of "ls", but here is what I have so far:
find_files="$(ls -1st --block-size=MB)"
arr=( ($find_files) )
I'm not sure exactly what format you want the array to be in, but here is a snippet that creates an associative array keyed by filename with the size as the value:
$ ls -l test.{zip,bz2}
-rw-rw-r-- 1 user group 0 Sep 10 13:27 test.bz2
-rw-rw-r-- 1 user group 0 Sep 10 13:26 test.zip
$ declare -A sizes; while read SIZE FILENAME ; do sizes["$FILENAME"]="$SIZE"; done < <(find * -prune -name '*.zip' -o -name '*.bz2' | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[@]@A}"
declare -A sizes=(["'test.zip'"]="0" ["'test.bz2'"]="0" )
$
And if you just want an array of literally "filename size" entries, that's even easier:
$ while read SIZE FILENAME ; do sizes+=("$FILENAME $SIZE"); done < <(find * -prune -name '*.zip' -o -name '*.bz2' | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[@]@A}"
declare -a sizes=([0]="'test.zip' 0" [1]="'test.bz2' 0")
$
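If you later need the name and size back out of one of those "filename size" entries, splitting on the last space works, since the size is the final field (a sketch using the array built above):
for entry in "${sizes[@]}"; do
    size=${entry##* }   # everything after the last space
    name=${entry% *}    # everything before the last space
    echo "name=$name size=$size"
done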
Both of these solutions work, and were tested via copy paste from this post.
The first is fairly slow. One problem is external program invocations within a loop - date, for example, is invoked once per file. You could make it quicker by not including the date in the output array (see Notes below). Particularly for method 2 - that would result in no external command invocations inside the while loop. But method 1 is really the problem - orders of magnitude slower.
Also, somebody probably knows how to convert an epoch date to another format in awk, for example, which could be faster. Maybe you could do the sort in awk too. Perhaps just keep the epoch date?
These solutions are bash / GNU heavy and not portable to other environments (process substitution, find -printf). OP tagged linux and bash though, so GNU can be assumed.
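On that awk idea: GNU awk can do the epoch conversion inline with strftime(), avoiding one date call per file. A sketch (assumes gawk; newline-delimited here for simplicity, with the epoch as the last field, as in the pipelines below):
find "$TARGET" -type f -printf '%p %s %T@\n' |
    gawk '{ $NF = strftime("%Y-%m-%d %H:%M:%S", int($NF)); print }'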
Solution 1 - capture any compressed file - using file to match (slow)
The criterion for 'compressed' is whether file output contains the word "compress"
Reliable enough, but perhaps there is a conflict with some other file type description?
file -l | grep compress (file 5.38, Ubuntu 20.04, WSL) indicates for me there are no conflicts at all (all files listed are compression formats)
I couldn't find a way of classifying any compressed file other than this
I ran this on a directory containing 1664 files - time (real) was 40 seconds
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
# Make the array
# Process substitution (<( )) must be used, so the loop runs in the current shell and the array stays in the global environment
while IFS= read -r -d '' path; do
[[ "$(file --brief "${path%% *}")" == *compress* ]] &&
compressed_files[c++]="${path% *} $(date -d @${path##* })"
done < \
<(
find "$TARGET" -type f -printf '%p %s %T#\0' |
awk '{$2 = ($2 / 1024); print}' |
sort -n -k 3
)
# Print results - to test
printf '%s\n' "${compressed_files[@]}"
Solution 2 - use file extensions - orders of magnitude faster
If you know exactly what extensions you are looking for, you can
compose them in a find command
This is a lot faster
On the same directory as above, containing 1664 files - time (real) was 200 milliseconds
This example looks for .gz, .zip, and .7z (gzip, zip and 7zip respectively)
I'm not sure if -type f -and -regex '.*[.]\(gz\|zip\|7z\)' -and -printf may be faster again, now I think of it. I started with globs because I assumed that was quicker
That may also allow for storing the extension list in a variable..
This method avoids a file analysis on every file in your target
It also makes the while loop shorter - you're only iterating matches
Note the repetition of -printf here, this is due to the logic that
find uses: -printf is 'True'. If it were included by itself, it would
act as a 'match' and print all files
It has to be used as a result of a name match being true (using -and)
Perhaps somebody has a better composition?
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
while IFS= read -r -d '' path; do
compressed_files[c++]="${path% *} $(date -d @${path##* })"
done < \
<(
find "$TARGET" \
-type f -and -name '*.gz' -and -printf '%p %s %T@\0' -or \
-type f -and -name '*.zip' -and -printf '%p %s %T@\0' -or \
-type f -and -name '*.7z' -and -printf '%p %s %T@\0' |
awk '{$2 = ($2 / 1024); print}' |
sort -n -k 3
)
# Print results - for testing
printf '%s\n' "${compressed_files[@]}"
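As for a "better composition": find supports explicit grouping with \( ... \), which lets the -type f test and the -printf action appear only once; this should match the behaviour of the repeated form above:
find "$TARGET" -type f \( -name '*.gz' -o -name '*.zip' -o -name '*.7z' \) \
    -printf '%p %s %T@\0'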
Sample output (of either method):
$ comp-find.bash /tmp
/tmp/comptest/websters_english_dictionary.tmp.tar.gz 265.148 Thu Sep 10 07:53:37 AEST 2020
/tmp/comptest/What_is_Systems_Architecture_PART_1.tar.gz 1357.06 Thu Sep 10 08:17:47 AEST 2020
Note:
You can add a literal K to indicate the block size / units (kilobytes)
If you want to print the path only from this array, you can use suffix removal: printf '%s\n' "${compressed_files[@]%% *}"
For no date in the array (it's used to sort, but then its job may be done), simply remove $(date -d @${path##* }) (incl. the space).
Kind of tangential, but to use different date formats, replace $(date -d @${path##* }) with:
$(date -I -d @${path##* }) ISO format - note that short opts style: date -Id @[date] did not work for me
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S) like ISO, but w/ seconds
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S.%N) same again, but w/ nanoseconds (find gives you nanoseconds)
Sorry for the long post, hopefully it's informative.

Bash Array Script Exclude Duplicates

So I have written a bash script (named music.sh) for a Raspberry Pi to perform the following functions:
When executed, look into one single directory (Music folder) and select a random folder to look into. (Note: none of these folders here have subdirectories)
Once a folder within "Music" has been selected, then play all mp3 files IN ORDER until the last mp3 file has been reached
At this point, the script would go back to the folders in the "Music" directory and select another random folder
Then it would again play all mp3 files in that folder in order
Loop indefinitely until input from user
I have this code which does all of the above EXCEPT for the following items:
I would like to NOT play any other "album" that has been played before
Once all albums played once, then shutdown the system
Here is my code so far that is working (WITH duplicates allowed):
#!/bin/bash
folderarray=($(ls -d /home/alphekka/Music/*/))
for i in "${folderarray[#]}";
do
folderitems=(${folderarray[RANDOM % ${#folderarray[#]}]})
for j in "${folderitems[#]}";
do
echo `ls $j`
cvlc --play-and-exit "${j[#]}"
done
done
exit 0
Please note that there isn't a single folder or file that has a space in the name. If there is a space, then I face some issues with this code working.
Anyways, I'm getting close, but I'm not quite there with the entire functionality I'm looking for. Any help would be greatly appreciated! Thank you kindly! :)
Use an associative array as a set. Note that this will work for all valid folder and file names.
#!/bin/bash
declare -A folderarray
# Each folder name is a key mapped to an empty string
for d in /home/alphekka/Music/*/; do
folderarray["$d"]=
done
while [[ "${!folderarray[*]}" ]]; do
# Get a list of the remaining folder names
foldernames=( "${!folderarray[@]}" )
# Pick a folder at random
folder=${foldernames[RANDOM%${#foldernames[@]}]}
# Remove the folder from the set
# Must use single quotes; see below
unset folderarray['$folder']
for j in "$folder"/*; do
cvlc --play-and-exit "$j"
done
done
Dealing with keys that contain spaces (and possibly other special characters) is tricky. The quotes shown in the call to unset above are not syntactic quotes in the usual sense. They do not prevent $folder from being expanded, but they do appear to be used by unset itself to quote the resulting string.
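A quick illustration of why those quotes matter, with a hypothetical key containing a space:
declare -A folderarray=( ["my album/"]= )
folder="my album/"
unset folderarray[$folder]     # word-splits: unset sees 'folderarray[my' and 'album/]' and errors
unset 'folderarray[$folder]'   # works: unset expands $folder inside the subscript itself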
Here's another solution: randomize the list of directories first, save the result in an array and then play (my script just prints) the files from each element of the array
MUSIC=/home/alphekka/Music
OLDIFS=$IFS
IFS=$'\n'
folderarray=($(ls -d $MUSIC/*/ | while read line; do echo $RANDOM $line; done | sort -n | cut -f2- -d' '))
for folder in ${folderarray[*]};
do
printf "Folder: %s\n" $folder
fileArray=($(find $folder -type f))
for j in ${fileArray[@]};
do
printf "play %s\n" $j
done
done
For the random shuffling I used this answer.
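If GNU coreutils is available, shuf collapses that decorate/sort/cut pipeline into one step; with IFS=$'\n' still set as in the script above, this slots straight in (same word-splitting caveats apply):
folderarray=($(ls -d $MUSIC/*/ | shuf))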
One liner solution with mpv, rl (randomlines), xargs, find:
find /home/alphekka/Music/ -maxdepth 1 -type d -print0 | rl -d \0 | xargs -0 -l1 mpv

(pdksh) Looping through n files and store their dates in array

Using pdksh
The stat command is unavailable on the system.
I need to loop through the amount of files found and store their dates in an array. $COMMAND stores the number of files found in $location as can be seen below.
Can someone help me please?
COMMAND=`find $location -type f | wc -l`
CMD_getDate=$(find $location -type f | xargs ls -lrt | awk '{print $6} {print $7}')
Well, first, you don't need to do the wc. The size of the array will tell you how many there are. Here's a simple way to build an array of dates and names (designed for pdksh; there are better ways to do it in AT&T ksh or bash):
set -A files
set -A dates
find "$location" -type f -ls |&
while read -p inum blocks symode links owner group size rest; do
set -A files "${files[@]}" "${rest##* }"
set -A dates "${dates[@]}" "${rest% *}"
done
Here's one way to examine the results:
print "Found ${#files[#]} files:"
let i=0
while (( i < ${#files[#]} )); do
print "The file '${files[i]}' was modified on ${dates[i]}."
let i+=1
done
This gives you the full date string, not just the month and day. That might be what you want - the date output of ls -l (or find -ls) is variable depending on how long ago the file was modified. Given these files, how does your original format distinguish between the modification times of a and b?
$ ls -l
total 0
-rw-rw-r--+ 1 mjreed staff 0 Feb 3 2014 a
-rw-rw-r--+ 1 mjreed staff 0 Feb 3 04:05 b
As written, the code above would yield this for the above directory with location=.:
Found 2 files:
The file './a' was modified on Feb 3 2014.
The file './b' was modified on Feb 3 00:00.
It would help if you indicated what the actual end goal is.

Need bash to separate cat'ed string to separate variables and do a for loop

I need to get a list of files added to a master folder and copy only the new files to the respective backup folders. The paths to each folder contain multiple folders, all named by numbers and only one level deep.
ie /tester/a/100
/tester/a/101 ...
diff -r returns typically "Only in /testing/a/101: 2093_thumb.png" per line in the diff.txt file generated.
NOTE: there is a space after the colon
I need to get the 101 from the path and filename into separate variables and copy them to the backup folders.
I need the lesserfolder var to get 101 without the colon,
and the mainfile var to get 2093_thumb.png from each line of diff.txt, and then run the for loop; but I can't seem to get $file to behave. Each time I try to echo the variables I get all the wrong results.
#!/bin/bash
diff_file=/tester/diff.txt
mainfolder=/testing/a
bacfolder= /testing/b
diff -r $mainfolder $bacfolder > $diff_file
LIST=`cat $diff_file`
for file in $LIST
do
maindir=$file[3]
lesserfolder=
mainfile=$file[4]
# cp $mainfolder/$lesserFolder/$mainfile $bacfolder/$lesserFolder/$mainfile
echo $maindir $mainfile $lesserfolder
done
If I could just get the echo statement working the cp would work then too.
I believe this is what you want:
#!/bin/bash
diff_file=/tester/diff.txt
mainfolder=/testing/a
bacfolder=/testing/b
diff -r -q $mainfolder $bacfolder | egrep "^Only in ${mainfolder}" | awk '{print $3,$4}' > $diff_file
cat ${diff_file} | while read foldercolon mainfile ; do
folderpath=${foldercolon%:}
lesserFolder=${folderpath#${mainfolder}/}
cp $mainfolder/$lesserFolder/$mainfile $bacfolder/$lesserFolder/$mainfile
done
But it is much more reliable (and much easier!) to use rsync for this kind of backup. For example:
rsync -a /testing/a/* /testing/b/
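One detail about that rsync invocation: the * glob skips dotfiles and can overflow the argument list on large directories; the trailing-slash form copies the directory contents directly:
rsync -a /testing/a/ /testing/b/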
You could try a while read loop
diff -r $mainfolder $bacfolder | while read dummy dummy dir file; do
echo $dir $file
done
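Note that diff's "Only in /testing/a/101: file" lines leave a trailing colon on the third field, so a small tweak strips it (building on the loop above):
diff -r $mainfolder $bacfolder | while read dummy dummy dir file; do
    dir=${dir%:}   # strip the trailing colon diff appends to the path
    echo $dir $file
done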
