Shell - Looping Array with command and increment command values - arrays

var1=$(echo $getDate | awk '{print $1} {print $2}')
var2=$(echo $getDate | awk '{print $3} {print $4}')
var3=$(echo $getDate | awk '{print $5} {print $6}')
Instead of repeating the code above, I need to loop the same command, increment the field numbers ({print $1} {print $2}) on each pass, and store each value in an array.
I was trying something like the code below, but I am stuck. Can someone help?
COMMAND=`find $locationA -type f | wc -l`
getDate=$(find $locationA -type f | xargs ls -lrt | awk '{print $6} {print $7}')
a=1
b=2
for i in $COMMAND
do
i=$(echo $getDate | awk '{print $a} {print $b}')
myarray+=('$i')
a=$((a+1))
b=$((b+1))
done
PS: I am using ksh.
Problem: $COMMAND stores the number of files found in $locationA. I need to loop over the files found and store their dates in an array.

I don't get the meaning of your example code (what is the 'for' loop supposed to do? What does the variable COMMAND contain?). Also, in your question you ask to store something in an array, while the code you want to simplify doesn't use an array, only plain variables (var1, var2, ...).
If I understand your requirement correctly, your variable getDate contains a string of several words, which are separated by spaces, and you want to assign the first two words to var1, the following two words to var2, and so on. Is this correct?

Now the edited code is at least a bit clearer, though I still don't understand why you use i as a loop variable and then overwrite it in the first statement inside the loop.
However, a few comments:
If you push '$i' into your array, you get a literal '$' sign followed by the letter 'i'. To add the value of the variable i, you need double quotes ("$i").
I don't understand why you want to loop over the content of the variable COMMAND. This variable always holds a single number, which means that the loop is executed exactly once.
You could use a counting loop, incrementing the loop variable by 2 on each iteration. You would have to calculate the number of iterations beforehand.
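A minimal sketch of that counting loop, assuming bash (ksh93 also accepts the C-style for and array +=). The important change from the loop in the question is passing the shell counter into awk with -v: inside single quotes, $a is awk's field reference, not the shell variable a. The value of getDate below is a made-up sample of month/day pairs.

```shell
#!/usr/bin/env bash
# Made-up sample: month/day pairs, like fields 6 and 7 of `ls -l` output
getDate="Jan 3 Feb 14 Mar 27"

# Calculate the number of words once, before the loop
n=$(printf '%s\n' "$getDate" | awk '{ print NF }')

myarray=()
for ((a = 1; a < n; a += 2)); do
  # -v copies the shell counter into awk; $a and $(a+1) are then awk fields
  myarray+=("$(printf '%s\n' "$getDate" | awk -v a="$a" '{ print $a, $(a+1) }')")
done

printf '%s\n' "${myarray[@]}"   # one pair per line
```

This starts one awk per pair, so it is slower than splitting the string once; it is shown mainly because it mirrors the loop in the question.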
Perhaps an easier alternative, which works in bash or in zsh (I did not try other shells), is to first turn your variable into an array,
tmparr=($(echo $getDate|fmt -w 1))
and then use a loop to collect pairs of these elements:
myarray=()
for ((i=0; i<${#tmparr[*]}; i+=2))
do
myarray+=("${tmparr[$i]} ${tmparr[$((i+1))]}")
done
${myarray[0]} will hold a string consisting of the first two words from getDate, etc.

This one should work on zsh, at least with newer versions:
myarray=()
echo $g|fmt -w 1|paste -s -d " \n"|while read s; do myarray+=("$s"); done
This leaves the first pair in ${myarray[1]}, etc.
It doesn't work with bash (and old zsh versions), because these shells would execute the body of the loop in a subshell.
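A minimal sketch of that subshell pitfall and one workaround, assuming bash. The pairing is done by a small hypothetical helper, pairs, built on awk, and getDate holds a made-up sample. Redirecting the loop's input from a process substitution (< <(...), also available in ksh93 and zsh) keeps the loop in the current shell, so the array survives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the words of $1 in pairs, one pair per line
pairs() { printf '%s\n' "$1" | awk '{ for (i = 1; i < NF; i += 2) print $i, $(i+1) }'; }

getDate="Jan 3 Feb 14"   # made-up sample value

# Pitfall: in bash, a while loop at the end of a pipe runs in a subshell,
# so elements appended there are gone once the loop ends.
lost=()
pairs "$getDate" | while read -r s; do lost+=("$s"); done

# Workaround: feed the loop from a process substitution instead of a pipe,
# so it runs in the current shell and the array keeps its elements.
myarray=()
while read -r s; do
  myarray+=("$s")
done < <(pairs "$getDate")

echo "pipe: ${#lost[@]} elements, redirect: ${#myarray[@]} elements"
```

In bash, lost stays empty; in ksh93 or newer zsh, which run the last pipeline element in the current shell, both arrays would end up with two elements.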
ADDED:
On second thought, in zsh this one would be simpler:
myarray=("${(f)$(echo $g|fmt -w 1|paste -s -d ' \n')}")


How to implement awk using loop variables for the row?

I have a file with n rows and 4 columns, and I want to read the content of the 2nd and 3rd columns, row by row. I made this
awk 'NR == 2 {print $2" "$3}' coords.txt
which works for the second row, for example. However, I'd like to include that code inside a loop, so I can go row by row of coords.txt, instead of NR == 2 I'd like to use something like NR == i while going over different values of i.
I'll try to be clearer. I don't want to extract the 2nd and 3rd columns of coords.txt. I want to use every element independently. For example, I'd like to be able to implement the following code
for (i=1; i<=20; i+=1)
awk 'NR == i {print $2" "$3}' coords.txt > auxfile
func(auxfile)
end
where func represents anything I want to do with the value of the 2nd and 3rd columns of each row.
I'm using SPP, which is a mix between FORTRAN and C.
How could I do this? Thank you
It is of course inefficient to invoke awk 20 times. You'd want to push the logic into awk so you only need to parse the file once.
However, one method to pass a shell variable to awk is with the -v option:
for ((i=1; i<20; i+=2)) # for example
do
awk -v line="$i" 'NR == line {print $2, $3}' file
done
Here i is the shell variable, and line is the awk variable.
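Following the advice to parse the file only once, a minimal sketch: a single awk call emits columns 2 and 3 of the first 20 rows, and a while loop hands each pair to a shell function. for_each_row and process_pair are hypothetical names; process_pair only stands in for the asker's func and prints its arguments.

```shell
#!/bin/sh
# Hypothetical stand-in for the asker's func(); replace with the real work
process_pair() {
  printf 'x=%s y=%s\n' "$1" "$2"
}

# for_each_row FILE: awk reads FILE once and stops after row 20
for_each_row() {
  awk 'NR > 20 { exit } { print $2, $3 }' "$1" |
  while read -r x y; do
    process_pair "$x" "$y"
  done
}
```

Usage would be for_each_row coords.txt. The awk process starts once, no matter how many rows are handled.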
Something like this should work; no shell loop is needed:
awk 'BEGIN {f="aux.aux"}
NR<21 {print $2,$3 > f; close(f); system("./mycmd2 " f)}
NR==20 {exit}' file
This calls the command with the temp file name for each of the first 20 lines; the file is overwritten on each call, and close(f) flushes it before the command runs. Of course, if your function takes arguments or reads stdin instead of a file name, there are easier solutions.
Here ./mycmd2 is an executable which takes a filename as an argument. Not sure how you call your function but this is generic enough...
Note also that there is no error handling for the external calls.
Doing it only with awk's system() is hideous, but it would look like this:
system("printf \047%s\\n\047 \047" $2 "\047 \047" $3 "\047 | func \047/dev/stdin\047; ");
If the func() the OP mentioned can be called directly by GNU parallel or xargs, taking the values of $2 and $3 as its own $1 and $2, the OP can even run it in parallel, with awk (mawk, mawk2 or gawk) feeding either tool:
gawk 'BEGIN { OFS = ORS = "\0" } { print $2, $3 } NR == 20 { exit }' file | parallel -0 -N 2 -j 3 func
gawk 'BEGIN { OFS = ORS = "\0" } { print $2, $3 } NR == 20 { exit }' file | xargs -0 -n 2 -P 3 func

Understanding IN Statement in awk

I have trouble understanding the in statement in bash... First of all, here's the code:
#! /bin/bash
dns=()
while read line; do
up=$(nslookup $line | awk -F ': ' 'NR==6 {print $2} ')
dns+=($up)
done < dns.blacklist.txt.txt
awk '{if( $1 in dns ) print $1 " Blacklisted"; else print $1}' thttpd2.log
So thttpd2.log is just a list of IPs, while nslookup gets the IP of each hostname (for blacklisting purposes). Now I want to check whether each IP that connected in the log is on the blacklist, i.e. in the dns array in the code.
All IPs and lookups from nslookup are good: Dns=81.169.145.82 192.0.3.45 and awk $1=81.169.145.82. So how can I check, in the awk statement at the bottom, whether $1 is in dns?
I've been trying for half a day now... I am pretty sure I have not understood "in", so can someone please give me at least a tip?
PS: Current result is just:
81.169.145.82
81.169.145.82
81.169.145.82
192.0.3.45
Goal:
81.169.145.82 Blacklisted
81.169.145.82 Blacklisted
81.169.145.82 Blacklisted
192.0.3.45
It seems better to use the output of your while loop as awk's input; there is no reason to put a bash array in the middle, since awk works with streams, not with bash variables.
So you produce a stream of IPs by reading your blacklist.txt file and parsing the nslookup output. I treat that part as a black box in my answer; I assume you get good results and want to apply your logic to the other file. Running one nslookup and one awk per line is not efficient for large input, but I don't know what you do in that part, so I leave it as is.
while read -r line; do
nslookup "$line" | awk -F ': ' 'NR==6 {print $2}'
done < blacklist.txt | awk 'FNR==NR {dns[$0]; next}
{print ($1 in dns)? $1 " Blacklisted": $1}' - thttpd2.log
You could also pass the blacklist file directly to awk and call the external command you use from inside awk, but I think it is simpler like this.
So how can I check in the awk statement at the lower part, if $1 is in dns?
awk is not shell and shell is not awk. Shell variables are unrelated to awk variables, and awk variables are unrelated to the shell. awk is a separate program with its own syntax, unrelated to the shell, and the shell is a separate program with its own syntax, unrelated to awk.
The construct subscript in array is part of awk syntax: it checks whether subscript is one of the subscripts of the awk array array. It is unrelated to shell variables and bash arrays. Note that the subscript is the index, not the value of the element: array[subscript]=value.
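A tiny sketch of that distinction in plain awk (the key and value are made up): a key stored as a subscript is found by in, while the same text stored only as a value is not.

```shell
#!/bin/sh
# `in` tests whether a key is a subscript of the array; it never looks at values
result=$(awk 'BEGIN {
  dns["81.169.145.82"] = "blacklisted"   # subscript: the IP, value: a label
  is_key = ("81.169.145.82" in dns)      # 1: the IP is a subscript
  is_val = ("blacklisted" in dns)        # 0: "blacklisted" is only a value
  print is_key, is_val
}')
echo "$result"
```

Testing with in also does not create the element, unlike referencing dns["blacklisted"] directly.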
Understanding IN - Statement in Linux bash
In the bash shell, in is a keyword used in the case statement (and in for name in words loops):
case something in
pattern) ;;
esac
Its usage is unrelated to awk's, because shell is not awk.
please give me at least a tip?
First read the input into awk as subscripts of array dns. After that, you may use the awk construct something in dns to check if something is a subscript of an array.
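Applied to the question's sample log, a minimal sketch of those two steps. check_log is a hypothetical helper; it assumes the blacklist file already contains resolved IPs, one per line, and for illustration only 81.169.145.82 is treated as blacklisted, matching the goal output in the question.

```shell
#!/bin/sh
# check_log BLACKLIST LOG: tag each log line whose first field is a blacklisted IP
check_log() {
  awk 'FNR == NR { dns[$1]; next }   # first file: store each IP as a subscript
       { print (($1 in dns) ? $1 " Blacklisted" : $1) }' "$1" "$2"
}
```

FNR == NR only identifies the first file correctly when that file is non-empty; with an empty blacklist, every log line would be read into dns instead.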
You already got answers explaining what in means, but note also that nslookup can read a list of domain names from stdin:
$ cat dns.blacklist.txt.txt
google.com
yahoo.com
$ nslookup < dns.blacklist.txt.txt
Default Server: cdns01.foo.net
Address: 2222:555:beef::1
> Server: cdns01.foo.net
Address: 2222:555:beef::1
Non-authoritative answer:
Name: google.com
Addresses: 2607:f8b0:4009:804::200e
172.217.9.78
> Server: cdns01.foo.net
Address: 2222:555:beef::1
Non-authoritative answer:
Name: yahoo.com
Addresses: 2001:4998:44:3507::8000
2001:4998:124:1507::f001
2001:4998:124:1507::f000
2001:4998:44:3507::8001
2001:4998:24:120d::1:1
2001:4998:24:120d::1:0
98.137.11.163
74.6.143.25
74.6.231.21
98.137.11.164
74.6.143.26
74.6.231.20
so you don't need to wrap anything in a shell loop, e.g. (untested):
nslookup < dns.blacklist.txt.txt |
awk '
NR==FNR {
if ( sub(/^Addresses:/,"") ) { inAddrs=1 }
if ( inAddrs ) {
if ( NF ) { dns[$1] }
else { inAddrs=0 }
}
next
}
{ print $1, ($1 in dns ? "Blacklisted" : "") }
' - thttpd2.log
Note that nslookup can output a list of IP addresses for a given domain, not just 1 as your existing script expects, and the above script will accommodate that.

Sed: Better way to address the n-th line where n are elements of an array

We know that the sed command loops over each line of a file and for each line, it loops over the given commands list and does something. But when the file is extremely large, the time and resource cost on the repeating operation may be terrible.
Suppose that I have an array of line numbers which I want to use as addresses to delete or print with sed command (e.g. A=(20000 30000 50000 90000)) and there is a VERY LARGE object file.
The easiest way may be:
(Remark by @John1024: careful, the line numbers change on each loop.)
( for NL in ${A[@]}; do sed "$NL d" $very_large_file; done; )>.temp_file;
cp .temp_file $very_large_file; rm .temp_file
The problem of the code above is that, for each indexed line number of the array, it needs to loop over the whole file.
To avoid this, one can:
#COMM=`echo "${A[@]}" | sed 's/\s/d;/g;s/$/d/'`;
#sed -i "$COMM" $very_large_file;
#Edited: Better with direct parameter expansion:
sed -i "${A[*]/%/d;}" $very_large_file;
It first prints the array and replaces each SPACE and the END_OF_LINE with sed's d command, so that the string looks like "20000d;30000d;50000d;90000d"; on the second line, this string is used as sed's command list. The result is that this code loops over the file only once.
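A minimal sketch of the single-pass idea, assuming bash and made-up line numbers on a five-line input. Building the script with printf gives one Nd command per line, and the command substitution keeps the whole script in one string, so sed receives it as a single argument and reads the input only once.

```shell
#!/usr/bin/env bash
A=(2 4)   # made-up line numbers to delete

# One "Nd" command per line number, newline-separated, in ONE string
script=$(printf '%sd\n' "${A[@]}")

# sed reads the input once and applies every command to each line
result=$(printf '%s\n' one two three four five | sed "$script")
printf '%s\n' "$result"
```

Passing "${A[@]}" with a per-element suffix directly to sed would instead produce one argument per element, and sed would take everything after the first as file names.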
Moreover, for in-place operation (argument -i), one cannot quit with sed's q command even after the greatest line number of interest has passed, because then the lines after that line (e.g. 90001+) would disappear (it seems that in-place operation just overwrites the file with sed's stdout).
Better ideas?
(Reply to @user unknown:) I think it could be even more efficient if we manage to "quit" the loop once all indexed lines have passed. We can't do that with sed -i, for the reasons above. Printing each line to a file costs more time than copying a file (e.g. cat file1 > file2 versus cp file1 file2). We may benefit from this idea using other methods or tools. This is what I expect.
PS: The points of this question are "line location" and "efficiency"; the "delete lines" operation is just an example. Real tasks involve much more: append/insert/substitute, field splitting, conditional checks followed by reading from/writing to files, calculations, etc.
In other words, they may involve all kinds of operations, with or without subshells, with variables passed around, ... so the tool should allow general line processing, and the problem is how to get to the lines of interest and do all kinds of operations there.
Any comments are appreciated.
First copy the file to testfile so you can check the results.
You want to sort the line numbers, highest first.
echo "${a[@]}" | sed 's/\s/\n/g' | sort -rn
You can feed commands into ed using printf:
printf "%s\n" "command1" "command2" w q | ed -s testfile
Combine these:
printf "%s\n" $(echo "${a[@]}" | sed 's/\s/\n/g' | sort -rn | sed 's/$/d/') w q |
ed -s testfile
Edit (thanks @Ed_Morton):
This can be written in fewer steps with
printf "%s\n" $(printf '%sd\n' "${a[@]}" | sort -rn ) w q | ed -s testfile
I cannot remove the sort: each delete command uses line numbers counted from 1 in the current buffer, so deleting a line shifts all the lines after it. Deleting from the highest number down avoids that shift.
I tried to find a command that edits the file without redirecting to another one, but I started with the remark that you should make a copy. I have no choice: I have to upvote the straightforward awk solution, which doesn't need a sort.
sed is for doing s/old/new, that is all, and when you add a shell loop to the mix you've really gone off the rails (see https://unix.stackexchange.com/q/169716/133219). To delete lines whose numbers are stored in an array (using seq to generate input, since the question provides no sample input/output):
$ a=( 3 7 8 )
$ seq 10 |
awk -v a="${a[*]}" 'BEGIN{split(a,tmp); for (i in tmp) nrs[tmp[i]]} !(NR in nrs)'
1
2
4
5
6
9
10
And if you wanted to stop processing with awk once the last target line has been deleted and let tail finish the job, you could figure out the max value in the array up front and then run awk on just the part up to that last target line:
max=$( printf '%s\n' "${a[@]}" | sort -rn | head -1 )
head -n "$max" file | awk '...' > out
tail -n +"$((max+1))" file >> out
I don't know if that'd really be any faster than just letting awk process the whole file, since awk is very efficient, especially when you're not referencing any fields and so it doesn't do any field splitting, but you could give it a try.
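Put together, the head/awk/tail split might look like this minimal sketch. delete_lines is a hypothetical helper name; it assumes a non-empty, space-separated list of positive line numbers and writes the result to stdout rather than editing in place.

```shell
#!/bin/sh
# delete_lines FILE "N1 N2 ...": print FILE without the listed line numbers.
# Only the first max(N) lines go through awk; tail copies the rest verbatim.
delete_lines() {
  file=$1
  nums=$2
  # Largest line number of interest ($nums is left unquoted on purpose to split it)
  max=$(printf '%s\n' $nums | sort -rn | head -n 1)
  head -n "$max" "$file" |
    awk -v a="$nums" 'BEGIN { n = split(a, tmp, " "); for (i = 1; i <= n; i++) del[tmp[i]] }
      !(NR in del)'
  tail -n +"$((max + 1))" "$file"
}
```

For example, delete_lines big.txt "3 7 8" > out would drop lines 3, 7 and 8, with awk touching only the first 8 lines.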
You could generate an intermediate sed command file from your line numbers, and let sed handle only the part of the file up to the last line of interest:
# one "Nd" sed command per line number
printf '%sd\n' "${A[@]}" > lines_to_delete
# highest line number of interest
max=$(printf '%s\n' "${A[@]}" | sort -n | tail -n 1)
# sed edits only the first $max lines; tail copies the rest unchanged
head -n "$max" input | sed -f lines_to_delete > output
tail -n +"$((max + 1))" input >> output
mv output input

Spaces in array content getting broken with grep

I am using an array to deal with spaces in the lines of my file, but when I use grep to filter on the array's value, it breaks because of the spaces.
For example, my line is as below:
bbbh.cone.abc.com:/home 'bbbh.cone.abc.com
As it has spaces, I am using an array as below:
object1=$(echo "$line" | awk '{print $1}' )
object2=$(echo "$line" | awk '{print $2}' )
object3=$(echo "$line" | awk '{print $3}' )
object4=$(echo "$line" | awk '{print $4}' )
hiteshcharry=("$object1" "$object2" "$object3" "$object4")
grep "${hiteshcharry[@]}" <filename>
It gives me an error because of the spaces.
Below is an example.
I have the following line in my file:
st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'
So there are 2 spaces in the line above. I have written my script so that it can handle a line with at most 4 spaces.
When I run the command below
omnidb -session "$sessionid" -detail | grep "${hiteshcharry[@]}"
it gives me an error because of the spaces. However, when I print the value of the array, it shows the correct value.
In the output, it should keep only the lines equal to the value stored in the array hiteshcharry.
I hope this is clear now.
The output of the omnidb command was shown in a screenshot. I want to grep the lines matching
"st.cone.abc.com:/platform/sun4v/lib/sparcv9/libc_psr.so.1 space 'st.cone.abc.com space [/platform/sun4v/lib/sparcv9/libc_psr.so.1]'"
from the output of the omnidb command.
Thanks. I added declare -p hiteshcharry and it prints each element of the array, but I still get the error shown in the screenshot.
When you pass your array to grep as "${array[@]}", grep sees each array element as a separate argument. So the first element becomes the pattern to search for, and the second element onwards become the names of the files to search. Obviously, that's not what you want.
You can use process substitution to make grep match the strings contained in your array, like this:
omnidb -session "$sessionid" -detail | grep -Fxf <(printf '%s\n' "${hiteshcharry[@]}")
printf prints your array elements one per line.
grep -Fxf treats the above output as a file containing the strings to search for (-F treats them as fixed strings, not patterns; -x matches the whole line of omnidb output, preventing partial matches).
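A minimal sketch of that technique wrapped in a function, assuming bash for the process substitution. filter_exact is a hypothetical name; each argument may contain spaces and becomes one whole-line, fixed-string pattern.

```shell
#!/usr/bin/env bash
# filter_exact STRING...: keep only stdin lines exactly equal to one of the STRINGs
filter_exact() {
  # printf puts one argument per line; grep -f reads that list as its patterns
  grep -Fxf <(printf '%s\n' "$@")
}
```

Applied to the question, this would be omnidb -session "$sessionid" -detail | filter_exact "${hiteshcharry[@]}".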

Assigning command's output to shell variable and get the variables size

I have a file consisting of digits. Usually, each line contains a single number. I would like to count the number of lines in the file that begin with the digit '0'. If there are any, I would like to do some post-processing.
Although I'm able to retrieve correctly the corresponding line numbers, the total number of retrieved lines is not correct. Below, I'm posting the code that I'm using.
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
# linesToRemove=$(grep -n "^0" ${inputFile} | cut -d":" -f1);
linesNr=${#linesToRemove} # <- here, the error
# linesNr=${#linesToRemove[@]} # <- here, the error
if [ "${linesNr}" -gt "0" ]; then
# do something here, e.g. remove corresponding lines.
awk -v n=$linesToRemove 'NR == n {next} {print}' ${anotherFile} > ${outputFile}
fi
Also, for the awk-based command, how can I use a shell variable? I tried the command below, but it doesn't work correctly, since 'myIndex' is interpreted as literal text and not as a variable.
linesToRemove=$(awk -v myIndex="$myIndex" '/^myIndex/ { print NR;}' ${inputFile});
Given the line numbers starting with 0 found in ${inputFile}, I would like to remove the corresponding lines numbers from ${anotherFile}. An example for both ${inputFile} and ${anotherFile} is given below:
// ${inputFile}
0
1
3
0
// ${anotherFile}
2.617300e+01 5.886700e+01 -1.894697e-01 1.251225e+02
5.707397e+01 2.214040e+02 8.607959e-02 1.229114e+02
1.725900e+01 1.734360e+02 -1.298053e-01 1.250318e+02
2.177940e+01 1.249531e+02 1.538853e-01 1.527150e+02
// ${outputFile}
5.707397e+01 2.214040e+02 8.607959e-02 1.229114e+02
1.725900e+01 1.734360e+02 -1.298053e-01 1.250318e+02
In the example above, I need to delete lines 1 and 4 of ${anotherFile} (0 and 3 when counting from zero), given that those lines correspond to the lines starting with 0 in ${inputFile}.
If you want to count the number of lines in the file that begin with 0, then this line is wrong:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
It prints the line number of every line that starts with 0, so your linesToRemove variable will contain all those line numbers, not the total count. Use an END{} block to print the total, e.g.
linesToRemove=$(awk '/^0/ {c++} END {print c+0}' ${inputFile});
As for your second question about using a variable inside awk, use the regex match operator ~, and set your myIndex variable to include the ^ anchor:
linesToRemove=$(awk -v myIndex="^$myIndex" '$0 ~ myIndex{ print NR;}' ${inputFile});
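Finally, if you just want to remove the lines that start with 0, simply skip them:
awk '/^0/{next}{print $0>FILENAME}' file
Note that print > FILENAME truncates the input file on the first print while awk may still be reading it, so this is only safe for small files; writing to a temporary file and then renaming it is safer.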
If you want to remove lines from another file using what is captured in input file, here's one way
paste -d"|" inputfile anotherfile | awk '!/^0/{gsub(/^.*\|/,"");print}'
Or just one awk command
awk 'FNR==NR && /^0/{a[FNR]} NR>FNR && (!(FNR in a))' inputfile anotherfile
Crude explanation: FNR==NR && /^0/ means: while processing the first file, if the line starts with 0, put its line number into array a. NR>FNR means: while processing the second file, print the line if its line number is not in the array. See the gawk documentation for what FNR, NR, etc. mean.
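A minimal sketch of the same two-file idea, written with next and wrapped in a function, run on the question's sample files. remove_flagged is a hypothetical name; like any FNR==NR idiom, it assumes the first file is not empty.

```shell
#!/bin/sh
# remove_flagged FLAGS DATA: print DATA without the lines whose counterpart
# in FLAGS (same line number) starts with 0
remove_flagged() {
  awk 'FNR == NR { if (/^0/) skip[FNR]; next }
       !(FNR in skip)' "$1" "$2"
}
```

With the question's ${inputFile} (0, 1, 3, 0) as FLAGS, lines 1 and 4 of ${anotherFile} are dropped, which gives the expected ${outputFile}.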
I think you have to do the following to assign an array:
linesToRemove=( $(awk '/^0/ { print NR; }' ${inputFile}) )
And to get the number of elements do (as you have in a commented line):
linesNr=${#linesToRemove[@]}
To remove the lines from the file, you could do something like:
sedCmd=""
for lineNr in ${linesToRemove[@]}; do
sedCmd="$sedCmd;${lineNr}d"
done
sed "$sedCmd" ${anotherFile} > ${outputFile}
In general, instead of this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
linesNr=${#linesToRemove}
use this:
linesToRemove=$(awk '/^0/ { print NR; }' ${inputFile});
linesNr=$(echo $linesToRemove | awk '{print NF}')
${#linesToRemove} is the length of the string in characters, while the unquoted echo puts all the line numbers on one line, so awk's NF counts them.
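A minimal sketch comparing the three counts, assuming bash. The string stands in for what the awk command returns for the question's sample inputFile (lines 1 and 4 start with 0).

```shell
#!/usr/bin/env bash
# What $(awk '/^0/ { print NR }' inputFile) returns for the sample inputFile: "1<newline>4"
linesToRemove=$(printf '1\n4\n')

strlen=${#linesToRemove}                          # 3: characters, not line numbers
count=$(echo $linesToRemove | awk '{print NF}')   # 2: unquoted echo joins the numbers on one line
arr=( $linesToRemove )                            # word splitting turns it into a 2-element array
arrlen=${#arr[@]}                                 # 2

echo "string length: $strlen, awk count: $count, array length: $arrlen"
```

The array form avoids starting awk just to count, and the array can then be reused to build the sed command.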
POC :
cat temp.sh
#!/usr/bin/ksh
lines=$(awk '/^d/{print NR}' script.sh)
nooflines=$(echo $lines|awk '{print NF}')
echo $nooflines
torinoco!DBL:/oo_dgfqausr/test/dfqwrk12/vijay> temp.sh
8
torinoco!DBL:/oo_dgfqausr/test/dfqwrk12/vijay>
It greatly depends on the post-processing you are doing, but do you really need the actual count? Why not do something like this:
if grep ^0 $inputfile > /dev/null; then
# There is at least one line with a leading 0
:
fi
grep -v ^0 $inputfile | process-lines-without-leading-zero
grep ^0 $inputfile | process-lines-with-leading-zero
Or, even just:
if grep ^0 $inputfile | process-lines-with-leading-zero; then
# some post processing
:
fi
--EDIT--
Based on what you've said in your comment, I would recommend a different approach. If I understand you correctly, you want to read file a, looking for lines of the form ^0[0-9]*,
and then remove those line numbers from file b. Doing it one line at a time is pretty slow if the files get big. Just do:
cmd=$( grep -n '^0[0-9]*$' a | sed 's/:.*/d;/' )
sed "$cmd" b
The assignment to cmd builds a sed command that deletes those lines: grep -n prefixes each matching line with its line number and a colon, and sed turns each "N:..." into "Nd;". Invoking sed on b then omits those lines. You'll need to redirect the sed output appropriately (perhaps to a temp file and then back to b, or just use 'sed -i' if you're using GNU sed).
Given the large number of edits to this question, it seems easiest to start a new answer. Your problem can be solved with a simple one-liner:
$ sed "$( grep -n ^0 $inputFile | sed 's/:.*/d;/g' )" $anotherFile > $outputFile
