How to initialize an array of arrays in awk? - arrays

Is it possible to initialize an array like this in AWK ?
Colors[1] = ("Red", "Green", "Blue")
Colors[2] = ("Yellow", "Cyan", "Purple")
And then to have a two dimensional array where Colors[2,3]="Purple".
From another thread I understand that it's not possible ( "sadly, there is no way to set an array all at once without abusing split()" ). Anyways I want to be 100% sure and I'm sure that there are others with the same question.
I am looking for the easiest method to initialize arrays like the one above, will be nice to have it well written.

If you have GNU awk, you can use a true multidimensional array. Although this answer uses the split() function, it most certainly doesn't abuse it. Run like:
awk -f script.awk
Contents of script.awk:
BEGIN {
x=SUBSEP
a="Red" x "Green" x "Blue"
b="Yellow" x "Cyan" x "Purple"
Colors[1][0] = ""
Colors[2][0] = ""
split(a, Colors[1], x)
split(b, Colors[2], x)
print Colors[2][3]
}
Results:
Purple

You can create a 2-dimensional array easily enough. What you can't do, AFAIK, is initialize it in a single operation. As dmckee hints in a comment, one of the reasons for not being able to initialize an array is that there is no restriction on the types of the subscripts, and hence no requirement that they are pure numeric. You can do multiple assignments as in the script below. The subscripts are formally separated by an obscure character designated by the variable SUBSEP, with default value 034 (U+001C, FILE SEPARATOR). Clearly, if one of the indexes contains this character, confusion will follow (but when was the last time you used that character in a string?).
BEGIN {
Colours[1,1] = "Red"
Colours[1,2] = "Green"
Colours[1,3] = "Blue"
Colours[2,1] = "Yellow"
Colours[2,2] = "Cyan"
Colours[2,3] = "Purple"
}
END {
for (i = 1; i <= 2; i++)
for (j = 1; j <= 3; j++)
printf "Colours[%d,%d] = %s\n", i, j, Colours[i,j];
}
Example run:
$ awk -f so14063783.awk /dev/null
Colours[1,1] = Red
Colours[1,2] = Green
Colours[1,3] = Blue
Colours[2,1] = Yellow
Colours[2,2] = Cyan
Colours[2,3] = Purple
$

Thanks for the answers.
Anyways, for those who want to initialize unidimensional arrays, here is an example:
SColors = "Red_Green_Blue"
split(SColors, Colors, "_")
print Colors[1] " - " Colors[2] " - " Colors[3]

The existing answers are helpful and together cover all aspects, but I thought I'd give a more focused summary.
The question conflates two aspects:
initializing arrays in Awk in general
doing so to fill a two-dimensional array in particular
Array initialization:
Awk has no array literal (initializer) syntax.
The simplest workaround is to:
represent the array elements as a single string and
use the split() function to split that string into the elements of an array.
$ awk 'BEGIN { n=split("Red Green Blue", arr); for (i=1;i<=n;++i) print arr[i] }'
Red
Green
Blue
This is what the OP did in their own helpful answer.
If the elements themselves contain whitespace, use a custom separator that's not part of the data, | in this example:
$ awk 'BEGIN { n=split("Red (1)|Green (2)", arr, "|"); for (i=1;i<=n;++i) print arr[i] }'
Red (1)
Green (2)
Initialization of a 2-dimensional array:
Per POSIX, Awk has no true multi-dimensional arrays, only an emulation of it using a one-dimensional array whose indices are implicitly concatenated with the value of built-in variable SUBSEP to form a single key (index; note that all Awk arrays are associative).
arr[1, 2] is effectively the same as arr[1 SUBSEP 2], where 1 SUBSEP 2 is a string concatenation that builds the key value.
Because there aren't truly multiple dimensions - only a flat array of compound keys - you cannot enumerate the (pseudo-)dimensions individually with for (i in ...), such as to get all sub-indices for primary (pseudo-)dimension 1 only.
The default value of SUBSEP is the "INFORMATION SEPARATOR ONE" character, a a rarely used control character that's unlikely to appear in date; in ASCII and UTF-8 it is represented as single byte 0x1f; if needed, you change the value.
By contrast, GNU Awk, as a nonstandard extension, does have support for true multi-dimensional arrays.
Important: You must then always specify the indices separately; e.g., instead of arr[1,2] you must use arr[1][2].
POSIX-compliant example (similar to TrueY's helpful answer):
awk 'BEGIN {
n=split("Red Green Blue", arrAux); for (i in arrAux) Colors[1,i] = arrAux[i]
n=split("Yellow Cyan Purple", arrAux); for (i in arrAux) Colors[2,i] = arrAux[i]
print Colors[1,2]
print "---"
# Enumerate all [2,*] values - see comments below.
for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] }
}'
Green
---
Yellow
Cyan
Purple
Note that the emulation of multi-dimensional arrays with a one-dimensional array using compound keys has the following inconvenient implications:
Auxiliary array auxArr is needed, because you cannot directly populate a given (pseudo-)dimension of an array.
You cannot enumerate just one (pseudo-)dimension with for (i in ...), you can only enumerate all indices, across (pseudo-)dimensions.
for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] }
above shows how to work around that by enumerating all keys and then matching only the ones whose first constituent index is 2, which means that the key value must start with 2, followed by SUBSEP.
GNU Awk example (similar to Steve's helpful answer, improved with Ed Morton's comment):
GNU Awk's (nonstandard) support for true multi-dimensional arrays makes the inconveniences of the POSIX-compliant solution (mostly) go away
(GNU Awk also doesn't have array initializers, however):
gawk 'BEGIN {
Colors[1][""]; split("Red Green Blue", Colors[1])
Colors[2][""]; split("Yellow Cyan Purple", Colors[2])
# NOTE: Always use *separate* indices: [1][2] instead of [1,2]
print Colors[1][2]
print "---"
# Enumerate all [2][*] values
for (i in Colors[2]) print Colors[2][i]
}'
Note:
Important: As stated, to address a specific element in a multi-dimensional array, always use separate indices; e.g., [1][2] rather than [1,2].
If you use [1,2] you'll get the standard POSIX-mandated behavior, and you'll mistakenly create a new, single index (key) with (string-concatenated) value 1 SUBSEP 2.
split() can conveniently be used to directly populate a sub-array.
As a prerequisite, however, the 2-dimensional target arrays must be initialized:
Colors[1][""] and Colors[2][""] do just that.
Dummy index [""] is just there to create a 2-dimensional array; it is discarded when split() fills that dimension later.
Enumerating a specific dimension with for (i in ...) is supported:
for (i in Colors[2]) ... conveniently enumerates only the sub-indices of Colors[2].

A similar solution. SUBSEP=":" is not really needed, just set to any visible char for demo:
awk 'BEGIN{SUBSEP=":"
split("Red Green Blue",a); for(i in a) Colors[1,i]=a[i];
split("Yellow Cyan Purple",a); for(i in a) Colors[2,i]=a[i];
for(i in Colors) print i" => "Colors[i];}'
Or a little bit more cryptic version:
awk 'BEGIN{SUBSEP=":"
split("Red Green Blue Yellow Cyan Purple",a);
for(i in a) Colors[int((i-1)/3)+1,(i-1)%3+1]=a[i];
for(i in Colors) print i" => "Colors[i];}'
Output:
1:1 => Red
1:2 => Green
1:3 => Blue
2:1 => Yellow
2:2 => Cyan
2:3 => Purple

Related

Indexing perl array

I have below code
#ar1 = ('kaje','beje','ceje','danjo');
$m = 'kajes';
my($next) = grep $_ eq 'kaje',#ar1;
print("Position is $next\n");
print("Next is $ar1[$next+1]\n");
print("Array of c is $ar1[$m+3]\n");
print("Array at m is $ar1[$m]\n");
Output seen:
Position is kaje
Next is beje
Array of c is danjo
Array at m is kaje
I want to understand how this works. Here $next is the matched string and i am able to index on that string for given array. Also $m is not found in the array, yet we get output as first element of array.
ALWAYS use use strict; use warnings;. It answers your question.
In one place, you treat the string kaje as a number.
In two places, you treat the string kajes as a number.
Since these two strings aren't numbers, Perl will use zero and issue a warning.
In other words, Next is beje only "works" because kaje happens to be at index 0. I have no idea how why you believe the last two lines work seeing as kajes is nowhere in the array.
You want
# Find the index of the element with value `kaje`.
my ($i) = grep $ar1[$_] eq 'kaje', 0..$#ar1;

Why is Perl giving "Can't modify string in scalar output" error?

I'm pretty new to Perl and this is my most complex project yet. Apologies if any parts of my explanation don't make sense or I miss something out - I'll be happy to provide further clarification. It's only one line of code that's causing me an issue.
The Aim:
I have a text file that contains a single column of data. It reads like this:
0
a,a,b,a
b,b,b,a
1
a,b,b,a
b,b,b,a
It continues like this with a number in ascending order up to 15, and the following two lines after each number are a combination of four a's or b's separated by commas. I have tied this file to an array #diplo so I can specify specific lines of it.
I also have got a file that contains two columns of data with headers that I have converted into a hash of arrays (with each of the two columns being an array). The name of the hash is $lookup and the array names are the names of the headings. The actual arrays only start from the first value in each column that isn't a heading. This file looks like this:
haplo frequency
"|5,a,b,a,a|" 0.202493719
"|2,b,b,b,a|" 0.161139191
"|3,b,b,b,a|" 0.132602458
This file contains all of the possible combinations of a or b at the four positions combined with all numbers 0-14 and their associated frequencies. In other words, it includes all possible combinations from "|0,a,a,a,a|" followed be "|1,a,a,a,a|" through to "|13,b,b,b,b|" and "|14,b,b,b,b|".
I want my Perl code to go through each of the combinations of letters in #diplo starting with a,a,b,a and record the frequency associated with the row of the haplo array containing each number from 0-14, e.g. first recording the frequency associated with "|0,a,a,b,a|" then "|1,a,a,b,a|" etc.
The output would hopefully look like this:
0 #this is the number in the #diplo file and they increase in order from 0 up to 15
0.011 0.0023 0.003 0.0532 0.163 0.3421 0.128 0.0972 0.0869 0.05514 0.0219 0.0172 0.00824 0.00886 0.00196 #these are the frequencies associated with x,a,a,b,a where x is any number from 0 to 14.
My code:
And here is the Perl code I created to hopefully sort this out (there is more to create the arrays and such which I can post if required, but I didn't want to post a load of code if it isn't necessary):
my $irow = 1; #this is the row/element number in #diplo
my $lrow = 0; #this is the row/element in $lookup{'haplo'}
my $copynumber = 0;
#print "$copynumber, $diplo[2]";
while ($irow < $diplolines - 1) {
while ($copynumber < 15) {
while ($lrow < $uplines - 1) {
if ("|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]) { ##this is the only line that causes errors
if ($copynumber == 0) {
print "$diplo[$irow-1]\n";
#print "$lookup{'frequency'}[$lrow]\t";
}
print "$lookup{'frequency'}[$lrow]\t";
}
$lrow = $lrow + 1;
}
$lrow = 0;
$copynumber = $copynumber + 1;
}
$lrow = 0;
$copynumber = 0;
$irow = $irow + 1;
}
However, the line if ("|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]) is causing an error Can't modify string in scalar assignment near "]) ".
I have tried adding in speech marks, rounded brackets and apostrophes around various elements in this line but I still get some sort of variant on this error. I'm not sure how to get around this error.
Apologies for the long question, any help would be appreciated.
EDIT: Thanks for the suggestions regarding eq, it gets rid of the error and I now know a bit more about Perl than I did. However, even though I don't get an error now, if I put anything inside the if loop for this line, e.g. printing a number, it doesn't get executed. If I put the same command within the while loop but outside of the if, it does get executed. I have strict and warnings on. Any ideas?
= is assignment, == is numerical comparison, eq is string comparison.
You can't modify a string:
$ perl -e 'use strict; use warnings; my $foo="def";
if ("abc$foo" = "abcdef") { print "match\n"; } '
Found = in conditional, should be == at -e line 1.
Can't modify string in scalar assignment at -e line 1, near ""abcdef") "
Execution of -e aborted due to compilation errors.
Nonnumerical strings act like zeroes in a numerical comparison:
$ perl -e 'use strict; use warnings; my $foo="def";
if ("abc$foo" == 0) { print "match\n"; } '
Argument "abcdef" isn't numeric in numeric eq (==) at -e line 1.
match
A string comparison is probably what you want:
$ perl -e 'use strict; use warnings; my $foo="def";
if ("abc$foo" eq "abcdef") { print "match\n"; } '
match
This is the problematic expression:
"|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]
The equals sign (=) is an assignment operator. It assigns the value on its right-hand side to the variable on its left-hand side. Therefore, the left-hand operand needs to be a variable, not a string as you have here.
I don't think you want to do an assignment here at all. I think you're trying to check for equality. So don't use an assignment operator, use a comparison operator.
Perl has two equality comparison operators. == does a numeric comparison to see if its operands are equal and eq does a string comparison. Why does Perl need two operators? Well Perl converts automatically between strings and numbers so it can't possibly know what kind of comparison you want to do. So you need to tell it.
What's the difference between the two types of comparison? Well, consider this code.
$x = '0';
$y = '0.0';
Are $x and $y equal? Well it depends on the kind of comparison you do. If you compare them as numbers then, yes, they are the same value (zero is the same thing whether it's an integer or a real number). But if you compare them as strings, they are different (they're not the same length for a start).
So we now know the following
$x == $y # this is true as it's a numeric comparison
$x eq $y # this is false as it's a string comparison
So let's go back to your code:
"|$copynumber,$diplo[$irow]|" = $lookup{'haplo'}[$lrow]
I guess you started with == here.
"|$copynumber,$diplo[$irow]|" == $lookup{'haplo'}[$lrow]
But that's not right as |$copynumber,$diplo[$irow]| is clearly as string, not a number. And Perl will give you a warning if you try to do a numeric comparison using a value that doesn't look like a number.
So you changed it to = and that doesn't work either as you've now changed it to an assignment.
What you really need is a string comparison:
"|$copynumber,$diplo[$irow]|" eq $lookup{'haplo'}[$lrow]

Use of uninitialized value in array value

I am having an array named #Lst_1. One of my elements is 0 in array.
Whenever I call that element. In example below the value stored in second index of an array is 0.
$Log_Sheet->write(Row,0,"$Lst_1[2]",$Format);
I am getting a warning saying
Use of uninitialized value within #Lst_1 in string.
Please help me do that.
First index of array is 0. Second element will be $List_1[1];
#!/usr/bin/env perl
use v5.22;
use warnings;
my #array = qw(foo bar);
# number of elements in array
say scalar(#array);
# last index of array
say $#array;
# undefined element (warn)
say $array[ $#array + 1];
If you just want to silence the error,
$Log_Sheet->write(Row, 0, $Lst_1[2] // 0, $Format);
This does use a feature of perl 5.10, but that's ancient enough you really should be using a sufficiently new perl to have it. I mean, there's a lot of ancient perl bugs, so it behooves you to be using a newer version.
As far as understanding the issue, no, $Lst_1[2] doesn't contain a 0. It contains an undef, which just happens to be treated mostly like 0 in numeric contexts.
Yes, I did remove the quotes around $Lst_1[2] - that was necessary, because "$Lst_1[2]" treats that undef like a string, so it's become the empty string for the purpose of a "$Lst_1[2]" // 0 test. (The empty string also happens to be treated like 0, so that doesn't change the behavior in a numeric context.)
It's not clear from your short excerpt whether #Lst_1 is less than 3 elements long, or if there's an explicit undef in #Lst_1. You'd need to show a larger excerpt of your code - or possibly even the whole thing and the data it is processing - for us to be able to determine that by looking. You could determine it, however, by adding something like the following in front of the line you gave:
if (#Lst_1 < 3) {
print "\#Lst_1 only has " . #Lst_1 . " elements\n"
} elsif (not defined($Lst_1[2])) {
print "\$Lst_1[2] is set to undef\n";
}
There are two basic ways a list can have an explicit undef element in it. The following code demonstrates both:
my #List = map "Index $_", 0 .. 3;
$List[2] = undef;
$List[5] = "Index 5";
use Data::Dump;
dd #List;
This will output
("Index 0", "Index 1", undef, "Index 3", undef, "Index 5")
The first undef was because I set it, the second was because there wasn't a fifth element but I put something in the sixth slot.

The "invisible" FS in a comma delimited array entry

Suppose I have:
awk 'BEGIN{
c["1","2","3"]=1
c["12","3"]=2
c["123"]=3 # fleeting...
c["1","23"]=4
c["1" "2" "3"]=5 # will replace c["123"] above...
for (x in c) {
print length(x), x, c[x]
split(x, d, "") # is there something that would split c["12", "3"] into "12, 3"?
# better: some awk / gawkism in one step?
for (i=1; i <= length(x); i++)
printf("|%s|", d[i])
print "\n"
}
}'
Prints:
4 123 4
|1||||2||3|
3 123 5
|1||2||3|
4 123 2
|1||2||||3|
5 123 1
|1||||2||||3|
In each case, the use of the , in forming the array entry produces a visually similar result (123) when printed in the terminal but a distinct hash value. It would appear that there is an 'invisible' separator between the the elements that is lost when printing (i.e., what delimiter makes c["12", "3"] hash differently than c["123"])
What value would I use in split to be able to determine where in the string the comma was placed when the array index was created? i.e., if I created an array entry with c["12","3"] what is the easiest way to print "12","3" vs "123" as a visually distinctly different string (in the terminal) than c["123"]?
(I know that I could do c["12" "," "3"] when creating the array entry. But what makes c["12","3"] hash differently than c["123"] and how to print those so they are seen differently in the terminal...)
c["12","3"] = c["12" SUBSEP "3"]
See SUBSEP in the awk man pages. You can set SUBSEP=FS in the BEGIN section if you have a CSV and want to write c["12","3"] instead of c["12" FS "3"] and have commas printed as the separator in the array indices.

How to get a single column of emails from a html textarea into array

I was thinking I could do this on my own but I need some help.
I need to paste a list of email addresses from a local bands mail list into a textarea and process them my Perl script.
The emails are all in a single column; delimited by newlines:
email1#email.com
email2#email.com
email3#email.com
email4#email.com
email5#email.com
I would like to obviously get rid of any whitespace:
$emailgoodintheory =~ s/\s//ig;
and I am running them through basic validation:
if (Email::Valid->address($emailgoodintheory)) { #yada
I have tried all kinds of ways to get the list into an array.
my $toarray = CGI::param('toarray');
my #toarraya = split /\r?\n/, $toarray;
foreach my $address(#toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Above is just to test to see if I was successful. I have no need to print them.
It just loops through, grabs the schedules .txt file and sends each member the band schedule. All that other stuff works but I cannot get the textarea into an array!
So, as you can see, I am pretty lost.
Thank you sir(s), may I have another quick lesson?
You seem a bit new to Perl, so I will give you a thorough explanation why your code is bad and how you can improve it:
1 Naming conventions:
I see that this seems to be symbolic code, but $emailgoodintheory is far less readable than $emailGoodInTheory or $email_good_in_theory. Pick any scheme and stick to it, just don't write all lowercase.
I suppose that $emailgoodintheory holds a single email address. Then applying the regex s/\s//g or the transliteration tr/\s// will be enough; space characters are not case sensitive.
Using a module to validate adresses is a very good idea. :-)
2 Perl Data Types
Perl has three man types of variables:
Scalars can hold strings, numbers or references. They are denoted by the $ sigil.
Arrays can hold an ordered sequence of Scalars. They are denoted by the # sigil.
Hashes can hold an unordered set of Scalars. Some people tend to know them as dicitonaries. All keys and all values must be Scalars. Hashes are denoted by the % sigil.
A word on context: When getting a value/element from a hash/array, you have to change the sigil to the data type you want. Usually, we only recover one value (which always is a scalar), so you write $array[$i] or $hash{$key}. This does not follow any references so
my $arrayref = [1, 2, 3];
my #array = ($arrayref);
print #array[0];
will not print 123, but ARRAY(0xABCDEF) and give you a warning.
3 Loops in Perl:
Your loop syntax is very weird! You can use C-style loops:
for (my $i = 0; $i < #array; $i++)
where #array gives the length of the array, because we have a scalar context. You could also give $i the range of all possible indices in your array:
for my $i (0 .. $#array)
where .. is the range operator (in list context) and $#array gives the highest available index of our array. We can also use a foreach-loop:
foreach my $element (#array)
Note that in Perl, the keywords for and foreach are interchangeable.
4 What your loop does:
foreach my $address(#toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Here you put each element of #toarraya into the scalar $address. Then you try to use it as an array (wrong!) and get the index $arrcnt out of it. This does not work; I hope your program died.
You can use every loop type given above (you don't need to count manually), but the standard foreach loop will suit you best:
foreach my $address (#toarraya){
print "$address<br/>\n";
}
A note on quoting syntax: while qq~ quoted ~ is absolutely legal, this is the most obfuscated code I have seen today. The standard quote " would suffice, and when using qq, try to use some sort of parenthesis (({[<|) as delimiter.
5 complete code:
I assume you wanted to write this:
my #addressList = split /\r?\n/, CGI::param('toarray');
foreach my $address (#addressList) {
# eliminate white spaces
$address =~ s/\s//g;
# Test for validity
unless (Email::Valid->address($address)) {
# complain, die, you decide
# I recommend:
print "<strong>Invalid address »$address«</strong><br/>";
next;
}
print "$address<br/>\n";
# send that email
}
And never forget to use strict; use warnings; and possibly use utf8.

Resources