(pdksh) Looping through n files and storing their dates in an array

Using pdksh. The stat command is unavailable on this system.
I need to loop through the files found and store their dates in an array. $COMMAND stores the number of files found in $location, as can be seen below.
Can someone help me please?
COMMAND=`find $location -type f | wc -l`
CMD_getDate=$(find $location -type f | xargs ls -lrt | awk '{print $6} {print $7}')

Well, first, you don't need to do the wc. The size of the array will tell you how many there are. Here's a simple way to build an array of dates and names (designed for pdksh; there are better ways to do it in AT&T ksh or bash):
set -A files
set -A dates
find "$location" -type f -ls |&
while read -p inum blocks symode links owner group size rest; do
set -A files "${files[#]}" "${rest##* }"
set -A dates "${dates[#]}" "${rest% *}"
done
Here's one way to examine the results:
print "Found ${#files[#]} files:"
let i=0
while (( i < ${#files[#]} )); do
print "The file '${files[i]}' was modified on ${dates[i]}."
let i+=1
done
This gives you the full date string, not just the month and day. That might be what you want - the date output of ls -l (or find -ls) is variable depending on how long ago the file was modified. Given these files, how does your original format distinguish between the modification times of a and b?
$ ls -l
total 0
-rw-rw-r--+ 1 mjreed staff 0 Feb 3 2014 a
-rw-rw-r--+ 1 mjreed staff 0 Feb 3 04:05 b
As written, the code above would yield this for the above directory with location=.:
Found 2 files:
The file './a' was modified on Feb 3 2014.
The file './b' was modified on Feb 3 04:05.
It would help if you indicated what the actual end goal is.

Related

How to get a list of files of the current directory sorted by modification date in a bash script?

I would like to get a list (or array) of all files in my current directory, sorted by modification date. In the terminal, something like ls -lt works, but parsing ls output should not be done in a bash script (http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29)...
I tried to use the -nt operator (https://tips.tutorialhorizon.com/2017/11/18/nt-file-test-operator-in-bash/), but I am hoping that there is a simpler and more elegant solution to this.
This might help you:
In bash with GNU extensions:
Creating an array
mapfile -d '' a < <(find -maxdepth 1 -type f -printf '%T@ %p\0' | sort -z -k1,1g | cut -z -d ' ' -f2-)
or looping over the files:
while read -r -d '' _ file; do
    echo "${file}"
done < <(find -maxdepth 1 -type f -printf '%T@ %p\0' | sort -z -k1,1g)
Here we build up a list of files with the NUL character as the delimiter. Each record consists of the modification time in seconds since the epoch, followed by a space and the file name. We use sort -z to sort that list by modification time. The output of this is passed to a while loop that reads the fields per zero-terminated record. The first field is the modification time, which we read into _, and the remainder is assigned to file.
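To check what ended up in the array, something like this should work (a minimal sketch; a is the array built above, and the negative index needs bash 4.3+):
printf '%s\n' "${a[@]}"   # all files, oldest first
echo "newest: ${a[-1]}"   # the most recently modified file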
In ZSH:
If you want to use another shell like zsh, you can just do something like:
a=( *(Om) )
or
for file in *(Om); do echo "${file}"; done
here Om is a glob-modifier that tells ZSH to sort the output by modification date.
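For example, to act on only the most recently modified file, the qualifiers can be combined (a small sketch; lowercase om sorts newest first, and [1] keeps just the first match):
print -r -- *(om[1])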

Putting files in directory into array variable

I'm writing bash code that will search for specific files in the directory it is run in and add them to an array variable. The problem I am having is formatting the results. I need to find all the compressed files in the current directory and display both the names and sizes of the files in order of last modified. I want to take the results of that command and put them into an array variable with each element containing a file's name and corresponding size, but I don't know how to do that. I'm not sure if I should be using find instead of ls, but here is what I have so far:
find_files="$(ls -1st --block-size=MB)"
arr=( ($find_files) )
I'm not sure exactly what format you want the array to be in, but here is a snippet that creates an associative array keyed by filename with the size as the value:
$ ls -l test.{zip,bz2}
-rw-rw-r-- 1 user group 0 Sep 10 13:27 test.bz2
-rw-rw-r-- 1 user group 0 Sep 10 13:26 test.zip
$ declare -A sizes; while read SIZE FILENAME ; do sizes["$FILENAME"]="$SIZE"; done < <(find * -prune -name '*.zip' -o -name '*.bz2' | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[#]#A}"
declare -A sizes=(["'test.zip'"]="0" ["'test.bz2'"]="0" )
$
And if you just want an array of literally "filename size" entries, that's even easier:
$ while read SIZE FILENAME ; do sizes+=("$FILENAME $SIZE"); done < <(find * -prune -name '*.zip' -o -name '*.bz2' | xargs stat -c "%Y %s %N" | sort | cut -f 2,3 -d " ")
$ echo "${sizes[#]#A}"
declare -a sizes=([0]="'test.zip' 0" [1]="'test.bz2' 0")
$
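Consuming the resulting indexed array is then straightforward (a minimal sketch, using the sizes array built above):
for entry in "${sizes[@]}"; do
    echo "$entry"    # e.g. 'test.zip' 0
done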
Both of these solutions work, and were tested via copy paste from this post.
The first is fairly slow. One problem is external program invocations within a loop: date, for example, is invoked for every file. You could make it quicker by not including the date in the output array (see Notes below). Particularly for method 2, that would result in no external command invocations inside the while loop. But method 1 is really the problem: orders of magnitude slower.
Also, somebody probably knows how to convert an epoch date to another format in awk, for example, which could be faster. Maybe you could do the sort in awk too. Perhaps just keep the epoch date?
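For what it's worth, GNU awk can do that epoch conversion itself with strftime(), which would avoid one date invocation per file. A rough sketch of the idea, not integrated with the scripts below (gawk and the NUL-delimited find output are assumed):
find "$TARGET" -type f -printf '%p %s %T@\0' |
    gawk 'BEGIN { RS = ORS = "\0" } { $NF = strftime("%Y-%m-%d %H:%M:%S", int($NF)); print }'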
These solutions are bash / GNU heavy and not portable to other environments (bash process substitution, find -printf). OP tagged linux and bash though, so GNU can be assumed.
Solution 1 - capture any compressed file - using file to match (slow)
The criterion for 'compressed' is whether the file output contains the word compress
Reliable enough, but perhaps there is a conflict with some other file type description?
file -l | grep compress (file 5.38, Ubuntu 20.04, WSL) indicates for me there are no conflicts at all (all files listed are compression formats)
I couldn't find a way of classifying any compressed file other than this
I ran this on a directory containing 1664 files - time (real) was 40 seconds
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file(1)
# output to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
# Make the array
# Process substitution (< <(...)) is used, to keep the array in the global environment
while IFS= read -r -d '' path; do
    [[ "$(file --brief "${path%% *}")" == *compress* ]] &&
        compressed_files[c++]="${path% *} $(date -d @${path##* })"
done < \
<(
find "$TARGET" -type f -printf '%p %s %T#\0' |
awk '{$2 = ($2 / 1024); print}' |
sort -n -k 3
)
# Print results - to test
printf '%s\n' "${compressed_files[@]}"
Solution 2 - use file extensions - orders of magnitude faster
If you know exactly what extensions you are looking for, you can
compose them in a find command
This is a lot faster
On the same directory as above, containing 1664 files - time (real) was 200 milliseconds
This example looks for .gz, .zip, and .7z (gzip, zip and 7zip respectively)
I'm not sure if -type f -and -regex '.*[.]\(gz\|zip\|7z\)' -and -printf may be faster again, now that I think of it (a rough sketch appears after the notes at the end of this answer). I started with globs because I assumed that would be quicker
That may also allow for storing the extension list in a variable.
This method avoids a file analysis on every file in your target
It also makes the while loop shorter - you're only iterating matches
Note the repetition of -printf here, this is due to the logic that
find uses: -printf is 'True'. If it were included by itself, it would
act as a 'match' and print all files
It has to be used as a result of a name match being true (using -and)
Perhaps somebody has a better composition?
#!/bin/bash
# Capture all files, recursively, in $TARGET, that are
# compressed files. In an indexed array. Using file name
# extensions to match.
# Initialise variables, and check the target is valid
declare -g c= compressed_files= path= TARGET=$1
[[ -r "$TARGET" ]] || exit 1
while IFS= read -r -d '' path; do
    compressed_files[c++]="${path% *} $(date -d @${path##* })"
done < \
<(
find "$TARGET" \
    -type f -and -name '*.gz'  -and -printf '%p %s %T@\0' -or \
    -type f -and -name '*.zip' -and -printf '%p %s %T@\0' -or \
    -type f -and -name '*.7z'  -and -printf '%p %s %T@\0' |
    awk 'BEGIN { RS = ORS = "\0" } { $2 = ($2 / 1024); print }' |
    sort -z -n -k 3
)
# Print results - for testing
printf '%s\n' "${compressed_files[@]}"
Sample output (of either method):
$ comp-find.bash /tmp
/tmp/comptest/websters_english_dictionary.tmp.tar.gz 265.148 Thu Sep 10 07:53:37 AEST 2020
/tmp/comptest/What_is_Systems_Architecture_PART_1.tar.gz 1357.06 Thu Sep 10 08:17:47 AEST 2020
Note:
You can add a literal K to indicate the block size / units (kilobytes)
If you want to print only the path from this array, you can use suffix removal: printf '%s\n' "${compressed_files[@]%% *}"
For no date in the array (it's used to sort, but then its job may be done), simply remove $(date -d @${path##* }) (incl. the space).
Kind of tangential, but to use different date formats, replace $(date -d @${path##* }) with:
$(date -I -d @${path##* }) ISO format - note that the short-opts style, date -Id @[date], did not work for me
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S) like ISO, but w/ seconds
$(date -d @${path##* } +%Y-%m-%d_%H-%M-%S.%N) same again, but w/ nanoseconds (find gives you nanoseconds)
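As for the -regex variant mused about in the notes above, a rough sketch of the single find invocation might be (untested; GNU find's -regextype is assumed):
find "$TARGET" -type f -regextype posix-extended -regex '.*\.(gz|zip|7z)' -printf '%p %s %T@\0'
Only one -printf is needed here, since the single -regex test replaces the three -name alternatives.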
Sorry for the long post, hopefully it's informative.

change folder modified date based on most recent file modified date in folder

I have a number of project folders that all got their date modified set to the current date & time somehow, despite not having touched anything in the folders. I'm looking for a way to use either a batch applet or some other utility that will allow me to drop a folder/folders on it and have their date modified set to the date modified of the most recently modified file in the folder. Can anyone please tell me how I can do this?
In case it matters, I'm on OS X Mavericks 10.9.5. Thanks!
If you start a Terminal, and use stat you can get the modification times of all the files and their corresponding names, separated by a colon as follows:
stat -f "%m:%N" *
Sample Output
1476985161:1.png
1476985168:2.png
1476985178:3.png
1476985188:4.png
...
1476728459:Alpha.png
1476728459:AlphaEdges.png
You can now sort that and take the first line, and remove the timestamp so you have the name of the newest file:
stat -f "%m:%N" *png | sort -rn | head -1 | cut -f2 -d:
Sample Output
result.png
Now, you can put that in a variable, and use touch to set the modification times of all the other files to match its modification time:
newest=$(stat -f "%m:%N" *png | sort -rn | head -1 | cut -f2 -d:)
touch -r "$newest" *
So, if you wanted to be able to do that for any given directory name, you could make a little script in your HOME directory called setMod like this:
#!/bin/bash
# Check that exactly one parameter has been specified - the directory
if [ $# -eq 1 ]; then
    # Go to that directory or give up and die
    cd "$1" || exit 1
    # Get name of newest file
    newest=$(stat -f "%m:%N" * | sort -rn | head -1 | cut -f2 -d:)
    # Set modification times of all other files to match
    touch -r "$newest" *
fi
Then make that executable (only necessary once) with:
chmod +x $HOME/setMod
Now, you can set the modification times of all files in /tmp/freddyFrog like this:
$HOME/setMod /tmp/freddyFrog
Or, if you prefer, you can call that from AppleScript with:
do shell script "$HOME/setMod " & nameOfDirectory
The nameOfDirectory will need to look Unix-y (like /Users/mark/tmp) rather than Apple-y (like Macintosh HD:Users:mark:tmp).

Count ip repeat in log from bash

In bash, how can I count the repetitions of an IP within a log, using a specific search?
For example:
#!/bin/bash
# Log line: [Sat Jul 04 21:55:35 2015] [error] [client 192.168.1.39] Access denied with status code 403.
grep "status\scode\s403" /var/log/httpd/custom_error_log | while read line ; do
pattern='^\[.*?\]\s\[error\]\s\[client\s(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\].*?403'
[[ $line =~ $pattern ]]
res_remote_addr="${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}.${BASH_REMATCH[4]}"
echo "Remote Addr: $res_remote_addr"
done
In the end results, I need to know how many times each IP produced the 403 message, sorted from highest to lowest if possible.
Example output:
200.200.200.200 50 times.
200.200.200.201 40 times.
200.200.200.202 30 times.
... etc ...
We need this to create an HTML report from a monthly Apache log, as a series of events (something like awstats).
There are better ways. The following is my proposal, which should be more readable and easier to maintain:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' log_file | sort | uniq -c | sort -k1,1 -r -n
The output should be in the form:
count1 ip1
count2 ip2
Update:
To filter only 403:
grep -P -o '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?=.*403)' log_file | sort | uniq -c | sort -k1,1 -r -n
Notice that a lookahead suffices.
If the log file is in the format mentioned in the question, the best approach is to use awk to filter out the needed status code and output only the IP. Then use the uniq command to count each occurrence:
awk '/code 403/ {print $8}' error.log | sort | uniq -c |sort -n
In awk, we filter by the regexp /code 403/ and then, for matching lines, we print the 8th field (fields are separated by whitespace), which is the IP.
Then we need to sort the output so that identical IPs are adjacent - this is a requirement of the uniq program.
uniq -c prints each unique line from the input only once, preceded by the number of occurrences. Finally we sort this list numerically to get the IPs sorted by count.
Sample output (first is the number of occurrences, second is the IP):
1 1.1.1.1
10 2.2.2.2
12 3.3.3.3
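If you want the exact "<IP> N times." format from the question, sorted highest to lowest, one more awk stage on top of either answer should do it (a small sketch):
awk '/code 403/ {print $8}' error.log | sort | uniq -c | sort -rn |
    awk '{print $2, $1, "times."}'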

IF depending on number of files per folder (unix)

I want to do different actions based on the number of files in each folder whose name starts with the same two letters: if a TS folder holds 6 or fewer files, do one set of actions, and otherwise do another set.
My data looks like this:
files/TS01 -- which has 2 files
files/TS02 -- which has 5 files
files/TS03 -- which has 2 files
files/TS04 -- which has 7 files
files/TS05 -- which has 9 files
I have tried
FILES="$TS*"
for W in $FILES
do
doc=$(basename $W)
if [ $W -le 6 ]
then
....
done ...
fi
done
but I get an error saying "integer expression expected"
I have also tried
if [ ls $W -le 6 ]
and I get another error saying "too many arguments".
Can you please help?
To get the number of files, I would recommend piping ls -1 into wc -l; this will spit out the number of entries in your directory as follows (ls -1 rather than ls -l, whose "total" line would inflate the count)...
Atlas $ ls -1 | wc -l
19
I've made a small script which shows how you could then use this result to conditionally do one thing or another...
#!/bin/bash
amount=$(ls -1 | wc -l)
if [ "$amount" -le 5 ]; then
    echo -n "There aren't that many files, only "
else
    echo -n "There are a lot of files, "
fi
echo "$amount"
When executed on a folder with 19 files, it echoes...
Atlas $ ./howManyFiles.sh
There are a lot of files, 19
and on one with fewer than 5 files...
Atlas $ ./howManyFiles.sh
There aren't that many files, only 3
Hopefully this helps show you how to get a usable file count from a folder, and then how to use that result in an "if" statement!
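Applied to the files/TS* layout from the question, a minimal sketch might look like this (the paths are taken from the question; the echo lines stand in for the real sets of actions):
for dir in files/TS*/; do
    count=$(find "$dir" -maxdepth 1 -type f | wc -l)
    if [ "$count" -le 6 ]; then
        echo "$dir has $count files: first set of actions here"
    else
        echo "$dir has $count files: other set of actions here"
    fi
done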
