How to increment array dynamically with 'awk'? - arrays

I have a file that contains a list of IPs:
1.1.1.1
2.2.2.2
3.3.3.3
5.5.5.5
1.1.1.1
5.5.5.5
I want to create a file that prints a list of counters for the above-mentioned IPs, like:
1.1.1.1: 2
2.2.2.2: 1
3.3.3.3: 1
5.5.5.5: 2
Where 2,1,1,2 are counters.
I started to write a script that works when the IPs and their number are known in advance, but I don't know how to continue.
./ff.sh file_with_IPs.txt
The script:
#!/bin/sh
file=$1
awk '
BEGIN {
    for (x = 0; x < 4; ++x)
        count[x] = 0;
    ip[0] = "1.1.1.1";
    ip[1] = "2.2.2.2";
    ip[2] = "3.3.3.3";
    ip[3] = "5.5.5.5";
}
{
    if ($1 == ip[0]) {
        count[0] += 1;
    } else if ($1 == ip[1]) {
        count[1] += 1;
    } else if ($1 == ip[2]) {
        count[2] += 1;
    } else if ($1 == ip[3]) {
        count[3] += 1;
    }
}
END {
    for (x = 0; x < 4; ++x) {
        print ip[x] ": " count[x]
    }
}
' "$file" > newfile.txt
The main problem is that I don't know how many IPs are stored in the file or what they look like.
So I need to grow the ip array each time awk encounters a new IP.

I think it is even easier with sort and uniq, but with awk this does it:
awk '{a[$0]++; next}END {for (i in a) print i": "a[i]}' file_with_IPs.txt
Output:
1.1.1.1: 2
3.3.3.3: 1
5.5.5.5: 2
2.2.2.2: 1
(with a little help from the tutorial that sudo_O recommended to me)

You can use uniq for that task, like:
sort IPFILE | uniq -c
(Note that this prints the number of occurrences in front of the IP.)
Or with awk (if there are only IP addresses on the lines):
awk '{ips[$0]++} END { for (k in ips) { print k, ips[k] } }' IPFILE
(Note that this prints the IP addresses unordered; to sort them you can use awk's asort or asorti (see the docs), or simply append a sort after a pipe, as sketched below.)
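For example, a minimal sketch of both options (assuming GNU awk for asorti; IPFILE as above):
awk '{ips[$0]++} END { for (k in ips) print k, ips[k] }' IPFILE | sort
awk '{ips[$0]++} END { n = asorti(ips, keys); for (i = 1; i <= n; i++) print keys[i], ips[keys[i]] }' IPFILE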

Related

Parse string to array with empty elements in bash

Hello, folks.
Currently I'm trying to parse lsblk output into an array for further analysis in bash. The only problem I have is the empty columns, which cause read -a to shift the following values into the positions of the elements that are supposed to be empty. Here is the example:
# lsblk -rno MOUNTPOINT,FSTYPE,TYPE,DISC-GRAN
/usr/portage ext2 loop 4K
disk 512B
part 512B
ext2 part 512B
/ ext4 part 512B
/mnt/extended ext4 part 512B
The MOUNTPOINT and FSTYPE of the second and third lines are empty, as is the MOUNTPOINT of the fourth line, but a 'traditional' read like this:
while read -r LINE; do
    IFS=' ' read -a ARRAY <<< $LINE
    echo "${ARRAY[0]};${ARRAY[1]};${ARRAY[2]};${ARRAY[3]}"
done < <(lsblk -rno MOUNTPOINT,FSTYPE,TYPE,DISC-GRAN)
will produce an incorrect result with shifted columns, with the 0th and 1st elements filled instead of the 2nd and 3rd:
/usr/portage;ext2;loop;4K
disk;512B;;
part;512B;;
ext2;part;512B;
/;ext4;part;512B
/mnt/extended;ext4;part;512B
So, I wonder if there is some 'elegant' solution for this case, or should I just do it the old way, like I used to with the help of string.h in trusty C?
To be clear, I need these values as array elements in a for loop for analysis, not just to print them. Echoing is done only as an example of the misbehavior.
Why don't you specify the -P option to lsblk so that it outputs key="value" pairs, and read that output?
Then you can say something like:
while read -ra a; do
    for ((i = 0; i < ${#a[@]}; i++)); do
        a[i]=${a[i]#*=}       # extract the rvalue
        a[i]=${a[i]//\"/}     # remove the surrounding quotes
    done
    (IFS=';'; echo "${a[*]}")
done < <(lsblk -Pno MOUNTPOINT,FSTYPE,TYPE,DISC-GRAN)
Use awk rather than bash for this. Use NF to assign the variables depending on how many fields are in the line.
awk -v 'OFS=;' 'NF == 4 { mountpoint = $1; fstype = $2; type = $3; gran = $4 }
NF == 3 { mountpoint = ""; fstype = $1; type = $2; gran = $3 }
NF == 2 { mountpoint = ""; fstype = ""; type = $1; gran = $2 }
{print mountpoint, fstype, type, gran}' < <(lsblk -rno MOUNTPOINT,FSTYPE,TYPE,DISC-GRAN)
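For reference, with the sample lsblk output shown in the question, either approach should print something along these lines (reconstructed from the example, not an actual run):
/usr/portage;ext2;loop;4K
;;disk;512B
;;part;512B
;ext2;part;512B
/;ext4;part;512B
/mnt/extended;ext4;part;512B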

awk array that overtypes itself when printed

This is my first question, so please let me know if I miss anything.
This is an awk script that uses arrays to make key-value pairs.
I have a file that has header information separated by colons. The data is below it, separated by colons as well. My goal is to make key-value pairs that print out to a new file. I have everything set up to be placed in arrays, and it prints out almost perfectly.
Here is the input:
...:iscsi_name:iscsi_alias:panel_name:enclosure_id:canister_id:enclosure_serial_number
...:iqn.1111-00.com.abc:2222.blah01.blah01node00::11BLAH00:::
Here is the code:
#!/bin/awk -f
BEGIN {
    FS = ":"
}
{
    x = 1
    if (NR == 1) {
        num_fields = NF   ### This is done in case there are uneven header fields vs. data fields ###
        while (x <= num_fields) {
            head[x] = $x
            x++
        }
    }
    y = 2
    while (y <= NR) {
        if (NR == y) {
            x = 1
            while (x <= num_fields) {
                data[x] = $x
                x++
            }
            x = 1
            while (x <= num_fields) {
                print head[x] "=" data[x]
                x++
            }
        }
        y++
    }
}
END {
    print "This is the end of the arrays and the beginning of the test"
    print head[16]
    print "I am head[16]-" head[16] "- and now I'm going to overwrite everything"
    print "I am data[16]-" data[16] "- and I will not overwrite everything, also there isn't any data in data[16]"
}
Here is the output:
...
iscsi_name=iqn.1111-00.com.abc
iscsi_alias=2222.blah01.blah01node00
panel_name=
enclosure_id=11BLAH00
canister_id=
=nclosure_serial_number ### Here is my issue ###
This is the end of the arrays and the beginning of the test
enclosure_serial_number
- and now I'm going to overwrite everything
I am data[16]-- and I will not overwrite everything, also there isn't any data in data[16]
NOTE: data[16] is not at the end of a line; for some reason there is an extra colon on the data lines, hence the num_fields note above.
Why does head[16] overwrite itself? Is it that there is a newline (\n) at the end of the field? If so, how do I get rid of it? I have tried stripping the last character, no luck. I have tried limiting the number of characters the array can take in for that field, no luck. I have tried many more ideas, no luck.
Full Disclosure: I am relatively new to all of this, I might have messed up these previous fixes!
Does anyone have any ideas as to why this is happening?
Thanks!
-cheezter88
Your script is unnecessarily complex. If you want to set the record size from the first row, just do that.
(I replaced the "..." prefix with "x".)
awk -F: 'NR==1 {n=split($0,h); next} # populate header fields and record size
NR==2 {for(i=1;i<=n;i++) # do the assignment up to header size
print h[i]"="$i}' file
x=x
iscsi_name=iqn.1111-00.com.abc
iscsi_alias=2222.blah01.blah01node00
panel_name=
enclosure_id=11BLAH00
canister_id=
enclosure_serial_number=
If you want to do this for the rest of the records, remove the NR==2 condition, as sketched below.
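For example, a minimal variant along those lines (same file and field separator as above):
awk -F: 'NR==1 {n=split($0,h); next}   # remember the header fields and their count
         {for(i=1;i<=n;i++)            # print every data record against the header
              print h[i]"="$i}' file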

How to keep track of printed items in a for loop?

I was recently dealing with a hash that I wanted to print in a nice manner.
To simplify, it is just an array with two fields, a["name"]="me" and a["age"]=77, and I want to print the data as key1=value1,key2=value2,..., ending with a newline. That is:
name=me,age=77
Since it is not an array whose indices are autoincremented values, I do not know how to loop through it and know when I am processing the last element.
This is important because it allows using a different separator for the last one: that way, a different character (a newline) can be printed there instead of the one printed after the rest of the fields (a comma).
I ended up using a counter to compare to the length of the array:
awk 'BEGIN {
    a["name"] = "me"; a["age"] = 77
    n = length(a)
    for (i in a) {
        count++
        printf "%s=%s%s", i, a[i], (count < n ? "," : ORS)
    }
}'
This works well. However, is there a better way to handle this? I don't like having to add the extra count++.
In general when you know the end point of the loop you put the OFS or ORS after each field:
for (i=1; i<=n; i++) {
printf "%s%s", $i, (i<n?OFS:ORS)
}
but if you don't then you put the OFS before the second and subsequent fields and print the ORS after the loop:
for (idx in array) {
printf "%s%s", (++i>1?OFS:""), array[idx]
}
print ""
I do like the:
n = length(array)
for (idx in array) {
printf "%s%s", array[idx], (++i<n?OFS:ORS)
}
idea to get the end of the loop too, but length(array) is gawk-specific and the resulting code isn't any more concise or efficient than the 2nd loop above:
$ cat tst.awk
BEGIN {
OFS = ","
array["name"] = "me"
array["age"] = 77
for (idx in array) {
printf "%s%s=%s", (++i>1?OFS:""), array[idx], idx
}
print ""
}
vs
$ cat tst.awk
BEGIN {
OFS = ","
array["name"] = "me"
array["age"] = 77
n = length(array) # or non-gawk: for (idx in array) n++
for (idx in array) {
printf "%s=%s%s", array[idx], idx, (++i<n?OFS:ORS)
}
}
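For what it's worth, running either version (e.g. awk -f tst.awk) should print the requested line; keep in mind that the for (idx in array) order is unspecified, so the pairs may come out in either order:
$ awk -f tst.awk
name=me,age=77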

How to get index no. of the element of an array which matches regex in bash?

I have a file like this:
c
a
b<
d
f
I need to get the index no. of the letter which has < as a suffix, in a bash script. I thought of reading the file into an array and then matching it against the regex .<$. But how do I get the index no. of the element which matches this regex?
I need the index no. because I want to modify this file: take the letter which is pointed to, move the < to the next line, and if it is on the last line, wrap around and place the < after the first line.
you need awk '/<$/ { print NR; }' <your-file>
Grep could be used also:
grep -n \< infile
Then:
grep -n \< infile|cut -d : -f 1
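If you would rather keep the matching in bash and get a real array index, here is a minimal sketch (assuming bash 4+ for mapfile; infile and the variable names are illustrative):
# read the file into an array, one line per element
mapfile -t lines < infile
re='<$'
for i in "${!lines[@]}"; do
    # print the 0-based index of every line ending in "<"
    [[ ${lines[i]} =~ $re ]] && echo "$i"
done
Note that awk's NR is 1-based, while bash array indices start at 0.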
So I built the source file:
$ cat file
c
a
b<
d
f<
With the awk below, it will move the < to the next line; but if it is on the last line, the < will be moved to the first line.
awk '{ if (/</) a[NR]
sub(/</,"")
b[NR]=$0 }
END{ for (i in a)
{ if (i==NR) { b[1]=b[1] "<" }
else{ b[i+1]=b[i+1] "<"}
}
for (i=1;i<=NR;i++) print b[i]
}' file
c<
a
b
d<
f

How can I search for simple if statements in C source code?

I'd like to do a search for simple if statements in a collection of C source files.
These are statements of the form:
if (condition)
statement;
Any amount of white space or other sequences (e.g. "} else ") might appear on the same line before the if. Comments may appear between the "if (condition)" and "statement;".
I want to exclude compound statements of the form:
if (condition)
{
statement;
statement;
}
I've tried each of the following in awk:
awk '/if \(.*\)[^{]+;/ {print NR $0}' file.c # (A) No results
awk '/if \(.*\)[^{]+/ {print NR $0}' file.c # (B)
awk '/if \(.*\)/ {print NR $0}' file.c # (C)
(B) and (C) give different results. Both include items I'm looking for and items I want to exclude. Part of the problem, obviously, is how to deal with patterns that span multiple lines.
Edge cases (badly formed comments, odd indenting or curly braces in odd places, etc.) can be ignored.
How can I accomplish this?
Based on Al's answer, but with fixes for a couple of problems (plus I decided to check for simple else clauses too; also, it prints the full if block):
#!/usr/bin/perl -w
my $line_number = 0;
my $in_if = 0;
my $if_line = "";
#ifdef NEW
my $block = "";
#endif /* NEW */
# Scan through each line
while (<>)
{
    # Count the line number
    $line_number += 1;
    # If we're in an if block
    if ($in_if)
    {
        $block = $block . $line_number . "+ " . $_;
        # Check for open braces (and ignore the rest of the if block
        # if there is one).
        if (/{/)
        {
            $in_if = 0;
            $block = "";
        }
        # Check for semi-colons and report if present
        elsif (/;/)
        {
            print $if_line;
            print $block;
            $block = "";
            $in_if = 0;
        }
    }
    # If we're not in an if block, look for one and catch the end of the line
    elsif (/(if \(.*\)|[^#]else)(.*)/)
    {
        # Store the line contents
        $if_line = $line_number . ": " . $_;
        # If the end of the line has a semicolon, report it
        if ($2 =~ ';')
        {
            print $if_line;
        }
        # If the end of the line contains the opening brace, ignore this if
        elsif ($2 =~ '{')
        {
        }
        # Otherwise, read the following lines as they come in
        else
        {
            $in_if = 1;
        }
    }
}
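Usage is the same as for the original script below, e.g. perl parse_if.pl file.c (parse_if.pl is just the name used further down in this thread); the while (<>) loop reads whichever C files you pass on the command line.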
I'm not sure how you'd do this with a one liner (I'm sure you could by using sed's 'n' command to read the next line, but it would be very complicated), so you probably want to use a script for this. How about:
perl parse_if.pl file.c
Where parse_if.pl contains:
#!/usr/bin/perl -w
my $line_number = 0;
my $in_if = 0;
my $if_line = "";
# Scan through each line
while (<>)
{
    # Count the line number
    $line_number += 1;
    # If we're in an if block
    if ($in_if)
    {
        # Check for open braces (and ignore the rest of the if block
        # if there is one).
        if (/{/)
        {
            $in_if = 0;
        }
        # Check for semi-colons and report if present
        elsif (/;/)
        {
            print $if_line_number . ": " . $if_line;
            $in_if = 0;
        }
    }
    # If we're not in an if block, look for one and catch the end of the line
    elsif (/^[^#]*\b(?:if|else|while) \(.*\)(.*)/)
    {
        # Store the line contents
        $if_line = $_;
        $if_line_number = $line_number;
        # If the end of the line has a semicolon, report it
        if ($1 =~ ';')
        {
            print $if_line_number . ": " . $if_line;
        }
        # If the end of the line contains the opening brace, ignore this if
        elsif ($1 =~ '{')
        {
        }
        # Otherwise, read the following lines as they come in
        else
        {
            $in_if = 1;
        }
    }
}
I'm sure you could do something fairly easily in any other language (including awk) if you wanted to; I just thought that I could do it quickest in perl by way of an example.
In awk, each line is treated as a record and "\n" is the record separator. Since the records are parsed line by line, you need to keep track of the line that follows an if; I don't know how you would do this in awk.
In perl, you can do this easily:
open(INFO, "<file.c");
$flag = 0;
while ($line = <INFO>)
{
    if ($line =~ m/if\s*\(/)
    {
        print $line;
        $flag = 1;
    }
    else
    {
        print $line if $flag;   # print the line that follows an "if"
        $flag = 0 if ($flag);
    }
}
Using awk, you can do this:
awk '
BEGIN { flag = 0 }
{
    if ($0 ~ /if/) {
        print $0;
        flag = NR + 1
    }
    if (flag == NR)
        print $0
}' try.c
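As a rough check against the snippets in the question (try.c as above), this prints every line containing "if" plus the line after it, so the simple case comes out as:
if (condition)
statement;
while the compound case still shows the if line followed by its opening brace, which you would then need to filter out.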
