bash, list filenames, array - arrays

I have a dir structure like
$ ls /comp/drive/
2009 2010 2011 2012 2013 2014
$ ls 2009
01 02 03 04 05 06 07 09 10 11 12
$ ls 2013
01 02 04 05 06 08 09 10 12
$ ls 2013/04/*.nc
file4.nc file44.nc file45.nc file49.nc
There is one directory per year, each year contains a few month directories, and the .nc files sit inside those.
What I want is an array of filenames for a given start and end year/month,
e.g. sYear=2011; eYear=2013; sMonth=03; eMonth=08
So I want the array of all filenames from 2011/03 to 2013/08 only, without cd-ing into the directories.
Any bash trick?

sYear=2011; eYear=2013; sMonth=03; eMonth=08
# force base 10 so that 08 and 09 are not rejected as invalid octal
sMonth=$(( 10#$sMonth ))
eMonth=$(( 10#$eMonth ))
shopt -s nullglob # missing years and months without .nc files add nothing
files=( )
for (( curYear=sYear; curYear <= eYear; curYear++ )); do
  for monthDir in "$curYear"/*/; do
    monthDir=${monthDir%/} # drop the trailing slash left by the glob
    curMonth=${monthDir##*/}
    [[ $curMonth =~ ^[0-9]+$ ]] || continue # ignore non-numeric directory names
    # skip months before sMonth in the first year and after eMonth in the last
    (( curYear == sYear && 10#$curMonth < sMonth )) && continue
    (( curYear == eYear && 10#$curMonth > eMonth )) && continue
    files+=( "$monthDir"/*.nc )
  done
done
printf '%q\n' "${files[@]}"
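The reason for the 10# prefix can be checked in isolation. This is a minimal sketch, assuming bash; the variable names plain and fixed are only for illustration.

```bash
month=08

# Plain arithmetic treats the leading 0 as an octal prefix, and 8 is not an
# octal digit, so the expansion fails. The subshell contains the error.
if ( : "$(( month + 1 ))" ) 2>/dev/null; then plain=ok; else plain=error; fi

# The 10# prefix forces base 10, which is what the loop above relies on.
fixed=$(( 10#$month + 1 ))

echo "plain arithmetic: $plain; with 10#: $fixed"
```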

Try this:
sYear=2011
sMonth=03
eYear=2013
eMonth=08
shopt -s nullglob
declare -a files
for year in *; do
  (( year < sYear || year > eYear )) && continue
  for year_month in "$year"/*; do
    month=${year_month##*/}
    (( year == sYear && ${month##0} < ${sMonth##0} )) && continue
    (( year == eYear && ${month##0} > ${eMonth##0} )) && continue
    files+=( "$year_month"/*.nc )
  done
done
echo "${files[@]}"
# printf "$(pwd)/%q\n" "${files[@]}" # for full path
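A different way to express the same range check, not taken from either answer, is to fold year and month into one integer key (year*100+month) and compare once. The sketch below assumes bash and builds a throwaway tree under mktemp; the tree layout and the names root, lo, hi, key and picked are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical demo tree standing in for /comp/drive.
root=$(mktemp -d)
mkdir -p "$root"/2011/02 "$root"/2011/03 "$root"/2012/07 "$root"/2013/08 "$root"/2013/09
for d in "$root"/*/*/; do touch "${d}a.nc"; done

sYear=2011; sMonth=03; eYear=2013; eMonth=08
lo=$(( sYear * 100 + 10#$sMonth ))   # 201103
hi=$(( eYear * 100 + 10#$eMonth ))   # 201308

shopt -s nullglob
picked=()
for dir in "$root"/[0-9][0-9][0-9][0-9]/[0-9][0-9]/; do
  ym=${dir%/}                          # .../2011/03
  month=${ym##*/}
  year=${ym%/*}; year=${year##*/}
  key=$(( year * 100 + 10#$month ))    # one comparable number per month
  if (( key >= lo && key <= hi )); then
    picked+=( "$dir"*.nc )
  fi
done

printf '%s\n' "${picked[@]#"$root"/}"  # paths relative to the demo root
rm -rf "$root"
```

With a single key there is no need to special-case the first and last year.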


Picking input record fields with AWK

Let's say we have a shell variable $x containing a space separated list of numbers from 1 to 30:
$ x=$(for i in {1..30}; do echo -n "$i "; done)
$ echo $x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
We can print the first three input record fields with AWK like this:
$ echo $x | awk '{print $1 " " $2 " " $3}'
1 2 3
How can we print all the fields starting from the Nth field with AWK? E.g.
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
EDIT: I can use cut, sed etc. to do the same but in this case I'd like to know how to do this with AWK.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this awk (the loop starts at field 4 to match the example output):
awk '{for (i=4; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
or pass start position as argument:
awk -v n=4 '{for (i=n; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
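To try the loop on a pipe rather than a file, it can be wrapped in a small shell function. This is only a sketch; fields_from is a hypothetical helper name, not part of awk or of the answer.

```bash
# Print every field from position $1 onward, one output line per input line.
fields_from() {
  awk -v n="$1" '{
    for (i = n; i <= NF; ++i)
      printf "%s%s", $i, (i < NF ? OFS : ORS)   # separator between fields, newline after the last
  }'
}

out=$(printf '1 2 3 4 5 6\n' | fields_from 4)
echo "$out"
```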
Version 4: Shortest is probably using sub to cut off the first three fields and their separators:
$ echo $x | awk 'sub(/^ *([^ ]+ +){3}/,"")'
Output:
4 5 6 7 8 9 ...
This will, however, preserve any extra space after $4:
$ echo "1 2 3 4   5" | awk 'sub(/^ *([^ ]+ +){3}/,"")'
4   5
so if you wanted the space squeezed, you'd need to, for example:
$ echo "1 2 3 4   5" | awk 'sub(/^ *([^ ]+ +){3}/,"") && $1=$1'
4 5
with the exception that if there are only 4 fields and the 4th field happens to be a 0:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"")&&$1=$1'
$ [no output]
in which case you'd need to:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"") && ($1=$1) || 1'
0
Version 1: cut is better suited for the job:
$ cut -d' ' -f 4- <<<$x
Version 2: Using awk you could:
$ echo -n $x | awk -v RS=' ' -v ORS=' ' 'NR>=4;END{printf "\n"}'
Version 3: If you want to preserve those varying amounts of space, using GNU awk you could use split's fourth parameter seps:
$ echo "1 2 3 4 5 6 7" |
gawk '{
  n=split($0,a,FS,seps)                  # actual separators go to seps
  for(i=4;i<=n;i++)                      # loop from 4th
    printf "%s%s",a[i],(i==n?RS:seps[i]) # get fields and separators from the arrays
}'
One more approach: append each field to a variable and print the variable once all fields have been read. Set n= to the field you want to start from.
echo "$x" |
awk -v n=3 '{val="";for(i=n; i<=NF; i++){val=(val?val OFS:"")$i};print val}'
With GNU awk, you can use the join function which has been a built-in include since gawk 4.1:
x=$(seq 30 | tr '\n' ' ')
echo "$x" | gawk '@include "join"
{split($0, arr)
print join(arr, 4, length(arr), "|")}
'
4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30
(Shown here with a '|' instead of a ' ' for clarity...)
Alternative way of including join:
echo "$x" | gawk -i join '{split($0, arr); print join(arr, 4, length(arr), "|")}'
Using GNU awk and gensub:
echo $x | gawk '{ print gensub(/^([[:digit:]]+[[:space:]]){3}(.*$)/, "\\2", 1) }'
gensub matches the first three fields and the rest of the line as two groups, and the replacement "\\2" keeps only the second group. The third argument (1) selects the first match; the target defaults to $0.

Print an element from a sorted array

I wrote a program that sorts an array and prints each element that sits at the same position in the unsorted and the sorted array. In theory this should work, but it doesn't. For some reason all the elements of the sorted array get combined into a single array element.
#!/bin/bash
arr=(6 2 15 90 9 1 4 30 1 3)
function sort(){
  local array=($@) max=$(($# - 1))
  while ((max > 0))
  do
    local i=0
    while ((i < max)); do
      if [ ${array[$i]} \> ${array[$((i + 1))]} ]
      then
        local t=${array[$i]}
        array[$i]=${array[$((i + 1))]}
        array[$((i + 1))]=$t
      fi
      ((i++))
    done
    ((max--))
  done
  echo ${array[@]}
}
arr_sort=($(sort ${arr[@]}))
for ((j=0; j<(( ${#arr[@]} -1 )); j++)); do
  for ((k=0; k<(( ${#arr[@]} -1 )); k++)); do
    if (( ${arr[j]:-0} == ${arr_sort[k]:-0} )); then
      echo ${arr[j]}
      break
    fi
  done
done
Try this:
#!/bin/bash
arr=(6 2 15 90 9 1 4 30 1 3)
arr_sort=( $(echo ${arr[@]} | tr ' ' '\n' | sort -n) )
for ((j=0; j<${#arr[@]}; j++)); do
if (( ${arr[j]} == ${arr_sort[j]} )); then
echo "Match ${arr[j]} at position $j (starting from 0)"
fi
done
Since there are no matches between the unsorted array
6 2 15 90 9 1 4 30 1 3
and the sorted one
1 1 2 3 4 6 9 15 30 90
in the example you gave, you will have no output.
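To see the comparison report something, here is a small sketch with made-up input where two positions do coincide. It assumes bash and GNU or BSD sort; the array contents and the name matches are illustrative only.

```bash
arr=(1 3 2 4)                                   # hypothetical input
arr_sort=( $(printf '%s\n' "${arr[@]}" | sort -n) )

matches=()
for (( j = 0; j < ${#arr[@]}; j++ )); do
  # Compare the element at position j in both arrays, numerically.
  if (( arr[j] == arr_sort[j] )); then
    matches+=( "$j" )
    echo "Match ${arr[j]} at position $j (starting from 0)"
  fi
done
```

Here 1 and 4 are already in their sorted positions, so positions 0 and 3 are reported.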

array substitution throwing set -a error

I have the following piece of code, which works fine when executed standalone but throws an error when embedded in another script, as described further down in this post:
date=$1
set -A max_month 0 31 28 31 30 31 30 31 31 30 31 30 31
eval $(echo $date|sed 's!\(....\)\(..\)\(..\)!year=\1;month=\2;day=\3!')
(( year4=year%4 ))
(( year100=year%100 ))
(( year400=year%400 ))
if [ \( $year4 -eq 0 -a \
$year100 -ne 0 \) -o \
$year400 -eq 0 ]
then
set -A max_month 0 31 29 31 30 31 30 31 31 30 31 30 31
fi
day=$((day+1))
echo $day ${max_month[$month]}
if [ $day -gt ${max_month[$month]} ]
then
day=1
month=$((month+1))
if [ $month -gt 12 ]
then
year=$((year+1))
month=1
fi
fi
new_date=$(printf "%4.4d%2.2d%2.2d" $year $month $day)
echo $new_date
When I embed it into the following script it throws an error (I replaced date=$1 with julian_date_14):
#!/bin/bash
cd
unset project_env
cd /wload/baot/home/baotasa0/UKRB_UKBE/sandboxes/EXTRACTS/UK/RB/UKBA/ukrb_ukba_pbe_acq
. ab* . >> project_setup.log 2>&1
echo unset the environment is doNe
cd /wload/baot/app/data_abinitio/serial/uk_cust
param1=$1
param2=$2
param3=$3
email=$4
header_date_14=$(m_dump /wload/baot/app/data_abinitio/serial/uk_cust/ukrb_ukba_acnt_bde27_src.dml $param1 | head -35)
hdr_dt_14=$(echo "$header_date_14" | awk '$1=="bdfo_run_date" {print $2}')
julian_date_14=$(m_eval '(date("YYYYMMDD"))( unsigned integer(2)) '$hdr_dt_14'') 2>&1
header_date_15=$(m_dump /wload/baot/app/data_abinitio/serial/uk_cust/ukrb_ukba_acnt_bde27_src.dml $param2 | head -35)
hdr_dt_15=$(echo "$header_date_15" | awk '$1=="bdfo_run_date" {print $2}')
julian_date_15=$(m_eval '(date("YYYYMMDD"))( unsigned integer(2)) '$hdr_dt_15'')
header_date_16=$(m_dump /wload/baot/app/data_abinitio/serial/uk_cust/ukrb_ukba_acnt_bde27_src.dml $param3 | head -35)
hdr_dt_16=$(echo "$header_date_16" | awk '$1=="bdfo_run_date" {print $2}')
julian_date_16=$(m_eval '(date("YYYYMMDD"))( unsigned integer(2)) '$hdr_dt_16'')
echo $julian_date_16
if [ "$julian_date_14" = "$julian_date_15" -a "$julian_date_15" = "$julian_date_16" ]
then
echo all are same
else
echo check the file date please
fi
DATE=`echo $julian_date_14 | cut -c8-9`
Date_minus_1=`expr $DATE - 1`
DATE_1=`echo $julian_date_14 | cut -c2-7`
DATE_FINAL="$DATE_1$Date_minus_1"
echo $DATE_FINAL
Error is below -
./auto1.sh: line 70: set: -A: invalid option
set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
Any help will be greatly appreciated.
Thank you in advance!
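The error message comes from bash: set -A is ksh array syntax, and the outer script starts with #!/bin/bash, whose set builtin has no -A option. A minimal sketch of the same next-day calculation in bash syntax follows; the sample date is made up, and the substring expansions replace the eval/sed step purely for illustration.

```bash
date=20240228                                   # hypothetical input, YYYYMMDD

# bash spelling of: set -A max_month 0 31 28 ...
max_month=(0 31 28 31 30 31 30 31 31 30 31 30 31)

year=${date:0:4}; month=${date:4:2}; day=${date:6:2}
if (( (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 )); then
  max_month[2]=29                               # leap year
fi

# 10# keeps months and days such as 08 and 09 from being read as octal.
month=$(( 10#$month ))
day=$(( 10#$day + 1 ))
if (( day > max_month[month] )); then
  day=1
  month=$(( month + 1 ))
  if (( month > 12 )); then
    year=$(( year + 1 ))
    month=1
  fi
fi

new_date=$(printf '%04d%02d%02d' "$year" "$month" "$day")
echo "$new_date"
```

Running the standalone snippet under ksh would explain why it worked on its own.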

zsh process file script

I'm attempting to have this process a number of files, and I want it in a loop so I don't have to monitor it.
#!/usr/local/bin/zsh
X=${1-20}
for (( N=1; N<=X; N++ )); do
for p in *.xml; do
curl -X POST -H "Content-Type:application/xml" -d "@${p}" "https://url /postAPI" > "post_${p}"
sleep 1
done
done
When doing ./work.sh 5 this loops forever!
What's causing the infinite loop?
Edit Based on a comment below
/tmp/tmp.KeFYeM9Z % ls -l
total 4
-rwxr-xr-x 1 naes wheel 218 Nov 20 14:42 work.sh
#!/usr/local/bin/zsh
X=${1-20}
for (( N=1; N<=X; N++ )); do
for p in /tmp/tmp.u6RnKaJ3/*.xml; do
curl -X POST -H "Content-Type:application/xml" -d "@${p}" "https://url /postAPI"
sleep 1
done
done
This still continues the infinite loop
This doesn't:
% cat work1.sh
#!/usr/local/bin/zsh
X=${1-20}
for (( N=1; N<=X; N++ )); do
date
sleep 1
done
% ./work1.sh 5
Thu Nov 20 15:22:27 PST 2014
Thu Nov 20 15:22:28 PST 2014
Thu Nov 20 15:22:29 PST 2014
Thu Nov 20 15:22:30 PST 2014
Thu Nov 20 15:22:31 PST 2014
What in my loop causes the infinite loop?
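You are writing to the same directory you are reading from. Every pass writes a post_<name>.xml for each .xml file it read, and the next pass picks those up as input too. It's not really an infinite loop, but each pass is longer than the one before.
Let's say you start with 10 files. The glob is expanded once per pass, and each pass adds one new level of post_ prefixes (files that already exist are overwritten), so the inner loop handles:
N=1: |p| = 10
N=2: |p| = 20
N=3: |p| = 30
N=4: |p| = 40
N=5: |p| = 50
So ./work.sh 5 makes 150 requests instead of 50, each followed by a one-second sleep, and the default of 20 passes makes 2100.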
This should do the trick:
#!/usr/local/bin/zsh
X=${1-20}
OUTPUT_DIR=/tmp/output/
mkdir -p $OUTPUT_DIR
cd /tmp/tmp.u6RnKaJ3
for (( N=1; N<=X; N++ )); do
echo "Attempt $N"
for p in *.xml; do
curl -X POST -H "Content-Type:application/xml" -d "@${p}" "https://url /postAPI" > "${OUTPUT_DIR}post_${p}"
sleep 1
done
done
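How the input set changes when output lands in the input directory can be checked without any network calls. The sketch below assumes bash (the loop syntax is the same in zsh), replaces curl with an empty write, and uses a scratch directory from mktemp; work and counts are illustrative names.

```bash
work=$(mktemp -d)
touch "$work/a.xml" "$work/b.xml"               # two starting files

counts=()
for (( N = 1; N <= 3; N++ )); do
  matched=0
  for p in "$work"/*.xml; do                    # glob is expanded once per pass
    : > "$work/post_${p##*/}"                   # stand-in for: curl ... > "post_${p}"
    matched=$(( matched + 1 ))
  done
  counts+=( "$matched" )
done

echo "files matched per pass: ${counts[*]}"
rm -rf "$work"
```

In this sketch the three passes match 2, 4 and 6 files. Writing to a separate output directory, as in the script above, keeps every pass at 2.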

shell programming: define array including zero-padded values

I just started using shell programming. I want to automatically change directories and then rename some files in there. Here's my problem: the directories are numbered, but numbers below 10 are zero-padded (01 02 ... 09). How can I define an array using some sort of sequence expression without typing each directory name manually?
This is what I've tried so far:
array = (printf "%.2d " {1..8} {11..27} {29..32} {34..50}) ## should say 01 02 03 ..08 11..27 29..32 34..50
for i in "${array[@]}"
do
echo "dir_a/dir_b/sub$i/dir_c/"
done
However, it doesn't work and the result looks like: "subprintf", "sub%.2d", "sub1" etc.
Can you help me there?
As a next step I want to leave out certain numbers, e.g. 03, 09, 10, 28, 33, as these directories don't exist. Is there an easy way to create such an array without concatenating 5 separate arrays?
Many thanks in advance,
Kati
Is there a need to use arrays? Otherwise, for bash 4, you can do
for i in {01..08} {11..27} {29..32} {34..50}; do
echo "dir_a/dir_b/sub${i}/dir_c/"
done
For an older version of bash you have to add the 0 yourself:
for i in 0{1..8} {11..27} {29..32} {34..50}; do
echo "dir_a/dir_b/sub${i}/dir_c/"
done
Of course, if you want to have an array, you can do
array=({01..08} {11..27} {29..32} {34..50})
or
array=(0{1..8} {11..27} {29..32} {34..50})
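The printf attempt from the question can also be made to work: the missing pieces are the command substitution $( ) and removing the spaces around the equals sign. A minimal sketch, assuming bash:

```bash
# printf reuses its format for every argument, so each number is zero-padded
# to two digits; the unquoted $( ) then splits the output into array elements.
array=( $(printf '%02d ' {1..8} {11..27} {29..32} {34..50}) )

echo "${#array[@]} entries, first ${array[0]}, last ${array[${#array[@]}-1]}"
for i in "${array[@]:0:3}"; do
  echo "dir_a/dir_b/sub$i/dir_c/"
done
```

This form also works on bash versions older than 4, where {01..08} does not keep the leading zero.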
You could do this:
declare -a dirs=('01' '02' '03' '04' '05' '06' '07' '08')
echo "${dirs[@]}"
01 02 03 04 05 06 07 08
# Make up next sequence
declare -a b=( $(seq 11 18) )
echo "${b[@]}"
11 12 13 14 15 16 17 18
# Add sequences together
dirs=( "${dirs[@]}" "${b[@]}" )
echo "${dirs[@]}"
01 02 03 04 05 06 07 08 11 12 13 14 15 16 17 18
find [0-9][0-9] -type d | while read dirname
do
if [ $(echo "${dirname}" | sed -n '/01/p') ]
then
cd "${dirname}"
mv foo bar
cd ..
fi
done
Then you can just write another elif and sed check for every directory which contains files you want to rename. I know it's not what you asked for, but it is infinitely simpler. If you're allowed to, I'd also strongly recommend renaming that directory tree, as well.
#!/bin/bash
raw=({01..08} {11..27} {29..32} {34..50})
filter=(03 09 10 28 33)
# succeed if the first argument equals any of the remaining ones
is_in() {
  for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
  return 1
}
for i in "${raw[@]}"; do
  is_in "$i" "${filter[@]}" || echo "dir_a/dir_b/sub$i/dir_c"
done
It'll take the numbers in the raw array and exclude every occurrence of the ones in the filter array.
