Sorting Awk array by last name

In my script, I start with a file of campaign contributors, and anyone who donates a collective $500 is eligible for a contest. I add everyone who meets that criterion to an array, using an incrementing index so the array grows as needed. Each element is formatted as shown below. In the END portion of the script, I need to sort this array by last name ($2) for printing. I've done some searching but come up empty-handed. I'm not asking for someone to write the script for me, merely to point me toward a better search or offer advice. The contestants array is already filled with the string values I need for the assignment; I just need help sorting it.
v1, v2, and v3 are the campaign contributions. I am using -F'[ :]' in my command so that both spaces and colons act as field separators.
Input file (lab4.data):
Fname Lname:Phone__Number:v1:v2:v3
Mike Harrington:(510) 548-1278:250:100:175
Christian Dobbins:(408) 538-2358:155:90:201
Susan Dalsass:(206) 654-6279:250:60:50
Archie McNichol:(206) 548-1348:250:100:175
Jody Savage:(206) 548-1278:15:188:150
Guy Quigley:(916) 343-6410:250:100:175
Dan Savage:(406) 298-7744:450:300:275
Nancy McNeil:(206) 548-1278:250:80:75
John Goldenrod:(916) 348-4278:250:100:175
Chet Main:(510) 548-5258:50:95:135
Tom Savage:(408) 926-3456:250:168:200
Elizabeth Stachelin:(916) 440-1763:175:75:300
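To see how -F'[ :]' numbers the fields of a record like the ones above, here is a small sketch (show_fields is a hypothetical helper name, not part of the assignment) that prints every field of one line plus the $5+$6+$7 total the question stores in $8:

```shell
# Hypothetical helper: show how -F'[ :]' splits one lab4.data record,
# then print the $5 + $6 + $7 total (the value the question keeps in $8).
show_fields() {
    awk -F'[ :]' '{
        for (i = 1; i <= NF; i++)
            printf "$%d=%s\n", i, $i
        printf "total=%d\n", $5 + $6 + $7
    }'
}

echo 'Mike Harrington:(510) 548-1278:250:100:175' | show_fields
```

The first name and last name land in $1 and $2, the phone number is split into $3 and $4, and the three contributions are $5 to $7, which is why the last name is $2.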
An array holds anyone who gave more than $500. $8 is created to hold the total $5+$6+$7, and the array is initialized and filled in the for loop given below:
$8 = $5+$6+$7;
contestants[len++]
Loop to check and add people to the contestants array (name and number are arrays that hold their respective values for later use):
for(i=0;i<=NR;i++)if(contrib[i]>500){contestants[len++]= name[i]" "number[i] }
Contents of each element (the desired values for contestants[len++]):
[0] Mike Harrington (510) 548-1278
[1] Archie McNichol (206) 548-1348
[2] Guy Quigley (916) 343-6410
[3] Dan Savage (406) 298-7744
[4] John Goldenrod (916) 348-4278
[5] Tom Savage (408) 926-3456
[6] Elizabeth Stachelin (916) 440-1763
Loop to print the array and check that it has been filled correctly (it is):
for (i=0; i <len; i++) {print contestants[i]}
Output:
Mike Harrington (510) 548-1278
Archie McNichol (206) 548-1348
Guy Quigley (916) 343-6410
Dan Savage (406) 298-7744
John Goldenrod (916) 348-4278
Tom Savage (408) 926-3456
Elizabeth Stachelin (916) 440-1763
Desired final output (ignore the formatting; it displays correctly in my terminal, I just had a hard time making it look nice here):
***FIRST QUARTERLY REPORT***
***CAMPAIGN 2004 CONTRIBUTIONS***
Name Phone Jan | Feb | Mar | Total Donated
Mike Harrington (510)548-1278 $ 250 $ 100 $ 175 $ 525
Christian Dobbins (408)538-2358 $ 155 $ 90 $ 201 $ 446
Susan Dalsass (206)654-6279 $ 250 $ 60 $ 50 $ 360
Archie McNichol (206)548-1348 $ 250 $ 100 $ 175 $ 525
Jody Savage (206)548-1278 $ 15 $ 188 $ 150 $ 353
Guy Quigley (916)343-6410 $ 250 $ 100 $ 175 $ 525
Dan Savage (406)298-7744 $ 450 $ 300 $ 275 $ 1025
Nancy McNeil (206)548-1278 $ 250 $ 80 $ 75 $ 405
John Goldenrod (916)348-4278 $ 250 $ 100 $ 175 $ 525
Chet Main (510)548-5258 $ 50 $ 95 $ 135 $ 280
Tom Savage (408)926-3456 $ 250 $ 168 $ 200 $ 618
Elizabeth Stachelin (916)440-1763 $ 175 $ 75 $ 300 $ 550
-----------------------------------------------------------------------------
SUMMARY
-----------------------------------------------------------------------------
The campaign received a total of $6137.00 for this quarter.
The average donation for the 12 contributors was $511.42.
The highest total contribution was $1025.00 made by Dan Savage.
***Thank you Dan Savage***
The following people donated over $500 to the campaign.
They are eligible for the quarterly drawing!!
Listed are their names(sorted by last names) and phone numbers.
John Goldenrod (916) 348-4278
Mike Harrington (510) 548-1278
Archie McNichol (206) 548-1348
Guy Quigley (916) 343-6410
Dan Savage (406) 298-7744
Tom Savage (408) 926-3456
Elizabeth Stachelin (916) 440-1763
Thank you all for your continued support!!

Using gawk, this is straightforward with the built-in sort functions, e.g.:
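BEGIN {
    data["Jane Doe (123) 456-7890"] = 600;
    data["Fred Adams (123) 456-7891"] = 800;
    data["John Smith (123) 456-7892"] = 900;
    exit;
}
END {
    # re-key each entry as "Lname Fname phone" so sorting the keys sorts by last name
    for (i in data) {
        split(i,x," ")
        data1[x[2] " " x[1] " " x[3] " " x[4]] = i;
    }
    # asorti() puts the sorted keys in sdata1[1..n] and returns n
    n = asorti(data1,sdata1);
    # walk sdata1 by number: "for (i in sdata1)" does not guarantee ascending order
    for (i = 1; i <= n; i++) {
        print data1[sdata1[i]],"\t",data[data1[sdata1[i]]];
    }
}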
... which produces:
Fred Adams (123) 456-7891 800
Jane Doe (123) 456-7890 600
John Smith (123) 456-7892 900
In plain awk, the same result can be achieved by writing the array indices to a file, sorting that file and then reading the file back using getline.
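A minimal sketch of that plain-awk route, assuming the lab4.data layout above. sort_contestants is a hypothetical helper, the temporary file comes from mktemp, and sorting on last name then first name is my choice of keys. The array elements are written to the file, sorted by the external sort command, and read back with getline:

```shell
# Hypothetical helper: print the $500+ contributors from lab4.data-style
# input ("Fname Lname:(xxx) xxx-xxxx:v1:v2:v3"), sorted by last name.
sort_contestants() {
    sc_tmp=$(mktemp) || return 1
    awk -F'[ :]' -v tmp="$sc_tmp" '
    { total = $5 + $6 + $7 }                  # same sum the question keeps in $8
    total > 500 { contestants[len++] = $1 " " $2 " " $3 " " $4 }
    END {
        # 1. write the array elements to the temporary file
        for (i = 0; i < len; i++)
            print contestants[i] > tmp
        close(tmp)
        # 2. sort on field 2 (last name), then field 1 (first name)
        cmd = "LC_ALL=C sort -k2,2 -k1,1 " tmp
        # 3. read the sorted lines back with getline and print them
        while ((cmd | getline line) > 0)
            print line
        close(cmd)
    }' "$@"
    sc_status=$?
    rm -f "$sc_tmp"
    return "$sc_status"
}

# usage: sort_contestants lab4.data
```

Setting LC_ALL=C makes sort compare bytes, so the order does not depend on the user's locale. In the real script you would print the SUMMARY text first and then run the getline loop.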

The way to approach this is to produce the pre-SUMMARY output as you read the data, so you don't need to store all of your data in an array. Store just the people who contributed more than $500, and insert each one into the array at its sorted position using an insertion sort.
You would do it something like this:
awk -F':' '
NR==1 {
    print "header stuff"
    next
}
{
    tot = $3 + $4 + $5
    printf "%-20s%10s $%5s $%5s $%5s $%5s\n", $1, $2, $3, $4, $5, tot
}
tot > 500 {
    split($1,name,/ /)
    surname = name[2]
    numContribs++
    # insertion sort: find the first slot whose surname sorts after this one
    for (i=1; i<numContribs; i++) {
        if (surname < surnames[i]) {
            break
        }
    }
    # shift slots i..numContribs-1 up by one to free slot i
    for (j=numContribs; j>i; j--) {
        surnames[j] = surnames[j-1]
        contribs[j] = contribs[j-1]
    }
    surnames[i] = surname
    contribs[i] = $1 " " $2
}
END {
    print "SUMMARY and text below it and then the list of $500+ contributors:"
    for (i=1; i<=numContribs; i++) {
        print contribs[i]
    }
}
' lab4.data
The above is not a fully functional program. It's just intended to show you the right approach per your request.

Related

How to use AWK to print associative array in a loop correctly?

Beth 45 0
Danny 33 0
Thomas 22 40
Mark 65 100
Mary 29 121
Susie 39 76.5
Joey 51 189.52
Peter 23 78.26
Maximus 34 289.71
Rebecca 21 45.79
Sophie 26 28.44
Barbara 24 107.36
Elizabeth 35 105.69
Peach 40 102.69
Lily 41 123
The above is a data file which has three fields: name, age, salary.
I want to print average salary, number, and names for people aged above 30 and under 30.
In this exercise, I want to practise using strings as subscripts.
Here is my AWK code:
BEGIN { OFS = "\t\t" }
{
if ($2 < 30)
{
a = "age below 30";
salary[a] += $NF;
count[a]++;
name[a] = name[a] $1 "\t";
}
else
{
a = "age equals or above 30";
salary[a] += $NF;
count[a]++;
name[a] = name[a] $1 "\t";
}
}
END {
for (a in salary)
for (a in count)
for (a in name)
{
print "The average salary of " a " is " salary[a] / count[a];
print "There are " count[a] " people " a ;
print "Their names are " name[a];
print "********************************************************";
}
}
The following is the output:
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age below 30 is 70.1417
There are 6 people age below 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age below 30 is 70.1417
There are 6 people age below 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age below 30 is 70.1417
There are 6 people age below 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age below 30 is 70.1417
There are 6 people age below 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
The output is very difficult for me to understand.
What I anticipated should look like this:
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
The average salary of age equals or above 30 is 109.679
There are 6 people age below 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age equals or above 30 is 109.679
There are 6 people age below 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
The average salary of age below 30 is 70.1417
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age below 30 is 70.1417
There are 9 people age equals or above 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
The average salary of age below 30 is 70.1417
There are 6 people age below 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age below 30 is 70.1417
There are 6 people age below 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
So my first question is: where did my understanding go wrong?
And my second question is:
I actually don't need so many loops. I just need
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Thomas Mary Peter Rebecca Barbara Sophie
********************************************************
for (a in salary, count, names) doesn't work. Is there a better way?

for (x in salary)
for (y in count)
for (z in name)
print "foo"
says: for every index in salary, loop through every index in count, and for each of those, loop through every index in name, printing "foo" each time. So if salary, count, and name each had 3 entries, you'd print "foo" 3*3*3 = 27 times.
It gets more complicated than that in your code, though, because you're using the same variable to hold the index of each array at every level of the nested loop:
for (a in salary)
for (a in count)
for (a in name)
so I'm not sure what awk is going to do with that - it may even be undefined behavior.
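A quick way to check the multiplication is to count the iterations. This sketch (count_nested is a made-up name) builds three 3-element arrays and uses a different loop variable at each level:

```shell
# Count how many times the body of three nested for-in loops runs
# when each array has 3 elements.
count_nested() {
    awk 'BEGIN {
        x["a"]; x["b"]; x["c"]       # referencing an element creates it
        y["a"]; y["b"]; y["c"]
        z["a"]; z["b"]; z["c"]
        for (p in x)
            for (q in y)
                for (r in z)
                    n++
        print n
    }'
}

count_nested
```

It prints 27: every combination of one index from each array is visited once.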
Since all 3 arrays have the same indices, just pick one of the arrays and loop on its indices; then you can access all 3 arrays using that same index.
$ cat tst.awk
{
bracket = "age " ($2 < 30 ? "under" : "equals or above") " 30"
names[bracket] = (bracket in names ? names[bracket] "\t" : "") $1
count[bracket]++
salary[bracket] += $NF
}
END {
for (bracket in names) {
print "The average salary of", bracket, "is", salary[bracket] / count[bracket]
print "There are", count[bracket], "people", bracket
print "Their names are", names[bracket]
print "********************************************************"
}
}
$ awk -f tst.awk file
The average salary of age equals or above 30 is 109.679
There are 9 people age equals or above 30
Their names are Beth Danny Mark Susie Joey Maximus Elizabeth Peach Lily
********************************************************
The average salary of age under 30 is 70.1417
There are 6 people age under 30
Their names are Thomas Mary Peter Rebecca Sophie Barbara
********************************************************

Eliminate duplicate columns based on separate field using awk array?

I am trying to eliminate a set of duplicate rows based on a separate field.
cat file.txt
1 345 a blue
1 345 b blue
3 452 c blue
3 342 d green
3 342 e green
1 345 f green
I would like to remove duplicate rows based on fields 1 and 2, but separately for each colour. Desired output:
1 345 a blue
3 452 c blue
3 342 d green
1 345 f green
I can achieve this output using a for loop that iterates over the colours:
for i in $(awk '{ print $4 }' file.txt | sort -u); do
    grep -w "${i}" file.txt |
    awk '!x[$1,$2]++' >> output.txt
done
But this is slow. Is there any way to get this output without using a loop?
Thank you.

At least for the example, it is as simple as:
$ awk 'arr[$1,$2,$4]++{next} 1' file
1 345 a blue
3 452 c blue
3 342 d green
1 345 f green
Or, you can negate that:
$ awk '!arr[$1,$2,$4]++' file
You can also use GNU sort to do the same, which may be faster, although the output comes out ordered by those keys rather than in the original input order:
$ sort -k4,4 -k2,2 -k1,1 -u file
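To spell out what the !arr[$1,$2,$4]++ idiom does, here is a long-hand equivalent (dedupe_by_colour is a hypothetical name). arr[$1,$2,$4] joins the three fields with SUBSEP into a single key, and a line is printed only the first time its key is seen:

```shell
# Long-hand version of awk '!arr[$1,$2,$4]++': print a line only the
# first time its (field 1, field 2, field 4) combination appears.
dedupe_by_colour() {
    awk '{
        key = $1 SUBSEP $2 SUBSEP $4     # the same key arr[$1,$2,$4] builds
        if (!(key in seen)) {
            seen[key] = 1
            print
        }
    }' "$@"
}

# usage: dedupe_by_colour file
```

Because the colour is part of the key, "1 345" is kept once for blue and once for green, which is exactly the per-colour behaviour the question's loop was emulating.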

Could you please try this too:
awk '!a[$1,$2,$4]++' Input_file

copy set of every 3 columns to a new file in unix

I have a file with 30 columns (ID, Name, and Place, repeated), and I need to extract 3 columns at a time and put each set into a new file:
ID Name Place ID Name Place ID Name Place ID Name place ...
19 john NY 23 Key NY 22 Tom Ny 24 Jeff NY....
20 Jen NY 22 Jill NY 22 Ki LA 34 Jack Roh ....
So I will have 10 files like these -
Output1.txt
ID Name Place
19 john NY
20 Jen NY
Output2.txt
ID Name Place
23 Key NY
22 Jill NY
and 8 more files like these. I can print the columns with
awk '{print $1,$2,$3}' Input.txt > Output1.txt, but doing that for 10 files is cumbersome. Is there a faster way?
Thanks!

$ awk '{for (i=1;i<=NF;i+=3) {print $i,$(i+1),$(i+2) > ("output" ((i+2)/3) ".txt")}}' file.txt
# output1.txt
ID Name Place
19 john NY
20 Jen NY
# output2.txt
ID Name Place
23 Key NY
22 Jill NY
# output3.txt
ID Name Place
22 Tom Ny
22 Ki LA
# output4.txt
ID Name place
24 Jeff NY
34 Jack Roh

Tweaking Ed Morton's wonderful answer a bit,
awk -v d=3 '{sfx=0; for(i=1;i<=NF;i+=d) {str=fs=""; for(j=i;j<i+d;j++) \
{str = str fs $j; fs=" "}; print str > ("output_file_" ++sfx)} }' file
splits the file up as you requested.
The awk variable d sets the number of columns per output file, which is 3 in your case.

$ awk '{for(i=0;i<=NF/3-1;i++) print $(i*3+1), $(i*3+2), $(i*3+3) > ((i+1) ".txt")}' file
$ cat 1.txt
ID Name Place
19 john NY
20 Jen NY

How to find and extract all words appearing between brackets?

How can I put all the words appearing between braces in a text file into an array, and replace each braced group with a random word from that array?
cat math.txt
First: {736|172|201|109} {+|-|*|%|/} {21|62|9|1|0}
Second: John had {22|12|15} apples and lost {2|4|3}
I need output like:
First: 172-9
Second: John had 15 apples and lost 4

This is trivial in awk:
$ cat tst.awk
BEGIN{ srand() }
{
    for (i=1; i<=NF; i++) {
        if ( match($i,/[{][^}]+[}]/) ) {
            n = split(substr($i,RSTART+1,RLENGTH-2),arr,/\|/)
            idx = int(rand() * n) + 1
            $i = arr[idx]
        }
        printf "%s%s", $i, (i<NF?OFS:ORS)
    }
}
$ awk -f tst.awk file
First: 172 - 9
Second: John had 22 apples and lost 2
$ awk -f tst.awk file
First: 201 - 9
Second: John had 12 apples and lost 2
$ awk -f tst.awk file
First: 201 + 62
Second: John had 12 apples and lost 2
$ awk -f tst.awk file
First: 201 + 1
Second: John had 12 apples and lost 4
The line where idx is set picks a whole number from 1 to n: rand() returns a value in [0, 1), so int(rand() * n) ranges from 0 to n-1 before the +1.

$ awk 'BEGIN{srand()} {for (i=1;i<=NF;i++) {if (substr($i,1,1)=="{") {split(substr($i,2,length($i)-2),a,"|"); j=1+int(rand()*length(a)); $i=a[j]}}; print}' math.txt
First: 172 + 1
Second: John had 12 apples and lost 3
How it works
BEGIN{srand()}
This initializes the random number generator.
for (i=1;i<=NF;i++) {if (substr($i,1,1)=="{") {split(substr($i,2,length($i)-2),a,"|"); j=1+int(rand()*length(a)); $i=a[j]}
This loops through each field. If any field starts with {, then substr is used to remove the first and last characters of the field and the remainder is split with | as the divider into array a. Then, a random index j into array a is chosen. Lastly, the field is replaced with a[j].
print
The line, as revised above, is printed.
The same code as above, but reformatted over multiple lines, is:
awk 'BEGIN{srand()}
{
for (i=1;i<=NF;i++) {
if (substr($i,1,1)=="{") {
split(substr($i,2,length($i)-2),a,"|")
j=1+int(rand()*length(a))
$i=a[j]
}
}
print
}' math.txt
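The expression j=1+int(rand()*length(a)) yields an index from 1 to length(a), because rand() returns a value in [0, 1). A sketch to check that empirically (check_pick is a hypothetical helper; the fixed seed 42 only makes runs repeatable):

```shell
# Draw many indices with 1 + int(rand() * n) and report how many distinct
# values appeared and how many fell outside 1..n.
check_pick() {
    awk -v n="$1" -v draws="$2" 'BEGIN {
        srand(42)                       # fixed seed: repeatable runs
        for (k = 1; k <= draws; k++) {
            j = 1 + int(rand() * n)     # same formula, with n = length(a)
            seen[j]++
        }
        for (j in seen) {
            hits++
            if (j + 0 < 1 || j + 0 > n + 0)
                bad++
        }
        print "distinct=" hits, "out_of_range=" bad + 0
    }'
}

check_pick 5 1000
```

With 1000 draws over 5 choices, every index from 1 to 5 shows up and none falls outside that range.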
Revised Problem with Spaces
Suppose that math.txt now looks like:
$ cat math.txt
First: {736|172|201|109} {+|-|*|%|/} {21|62|9|1|0}
Second: John had {22|12|15} apples and lost {2|4|3}
Third: John had {22 22|12 12|15 15} apples and lost {2 2|4 4|3 3}
The last line has spaces inside the {...}. This changes how awk divides up the fields. For this situation, we can use:
$ awk -F'[{}]' 'BEGIN{srand()} {for (i=2;i<=NF;i+=2) {n=split($i,a,"|"); j=1+int(n*rand()); $i=a[j]}; print}' math.txt
First: 736 + 62
Second: John had 12 apples and lost 3
Third: John had 15 15 apples and lost 2 2
How it works:
-F'[{}]'
This tells awk to use either } or { as field separators.
BEGIN{srand()}
This initializes the random number generator
{for (i=2;i<=NF;i+=2) {n=split($i,a,"|"); j=1+int(n*rand()); $i=a[j]}
With our new definition for the field separator, the even numbered fields are the ones inside braces. Thus, we split these fields on | and randomly select one piece and assign the field to that piece: $i=a[j].
print
Having modified the line as above, we now print it.
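To see which fields land inside the braces with -F'[{}]', this sketch (show_brace_fields is a hypothetical name) prints each even-numbered field together with its field number:

```shell
# With { and } as field separators, the text between each pair of braces
# becomes an even-numbered field; print those fields with their numbers.
show_brace_fields() {
    awk -F'[{}]' '{
        for (i = 2; i <= NF; i += 2)
            print i ": " $i
    }'
}

echo 'Second: John had {22|12|15} apples and lost {2|4|3}' | show_brace_fields
```

The spaces inside the braces stay part of the field, which is why this version handles the Third line where the default whitespace splitting did not.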

You can try this awk:
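awk -F'[{}]' 'BEGIN { srand() }   # seed once; calling srand() inside rand2 reseeds from the clock, so every group in the same second gets the same random value
function rand2(n) {
    return 1 + int(rand() * n);
}
{
    for (i=1; i<=NF; i++)
        if (i%2)
            printf "%s", $i;      # text outside the braces, printed as-is
        else {
            split($i, arr, "|");
            printf "%s", arr[rand2(length(arr))]   # "%s" so a chosen "%" is not read as a format
        };
    printf "%s", ORS
}' math.txt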
First: 736 - 62
Second: John had 22 apples and lost 2
More runs of above may produce:
First: 172 * 9
Second: John had 12 apples and lost 4
First: 109 / 0
Second: John had 15 apples and lost 3
First: 201 % 1
Second: John had 12 apples and lost 3
...
...

how to sort a database

I have two databases that I combined using this function:
def ReadAndMerge():
    library1 = input("Enter 1st filename to read and merge:")
    with open(library1, 'r') as library1names:
        library1contents = library1names.read()
    library2 = input("Enter 2nd filename to read and merge:")
    with open(library2, 'r') as library2names:
        library2contents = library2names.read()
    print(library1contents)
    print(library2contents)
    combined_contents = library1contents + library2contents  # concatenate text
    print(combined_contents)
    return combined_contents
The two databases originally looked like this
Bud Abbott 51 92.3
Mary Boyd 52 91.4
Hillary Clinton 50 82.1
and this
Don Adams 51 90.4
Jill Carney 53 76.3
Randy Newman 50 41.2
After being combined they now look like this
Bud Abbott 51 92.3
Mary Boyd 52 91.4
Hillary Clinton 50 82.1
Don Adams 51 90.4
Jill Carney 53 76.3
Randy Newman 50 41.2
If I wanted to sort this database by last name, how would I go about it?
Is there a built-in sort function in Python, like there is for lists? Is this considered a list?
Or would I have to write another function that locates the last name and then orders the entries alphabetically?

You sort with the built-in sorted() function. But sorting one big string would just sort its characters; you need the data as a list of lines (readlines() gives you that) and a key that picks out the last name. Something like this (untested):
def get_library_names():  # Better name of function
    library1 = input("Enter 1st filename to read and merge:")
    with open(library1, 'r') as library1names:
        library1contents = library1names.readlines()
    library2 = input("Enter 2nd filename to read and merge:")
    with open(library2, 'r') as library2names:
        library2contents = library2names.readlines()
    print(library1contents)
    print(library2contents)
    # each line is "First Last age score": skip blank lines, then sort on the 2nd word
    lines = [line for line in library1contents + library2contents if line.strip()]
    combined_contents = sorted(lines, key=lambda line: line.split()[1])
    print(combined_contents)
    return combined_contents
