How to test whether a value is inside an array with awk

my-script.awk
#!/usr/bin/awk -f
BEGIN {
toggleValues="U+4E00,U+9FFF,U+3400,U+4DBF,U+20000,U+2A6DF,U+2A700,U+2B73F,U+2B740,U+2B81F,U+2B820,U+2CEAF,U+F900,U+FAFF"
split(toggleValues, boundaries, ",")
if ("U+4E00" in boundaries) {print "inside"}
}
Run
echo ''| awk -f my-script.awk
Question
Why don't I see inside printed?

awk stores arrays differently than you might expect. Every array is a set of key/value pairs. For an array filled by split(), the key is the integer index, starting at 1, and the value is the substring that split() put into that element.
The awk in condition tests keys, not values. So your "U+4E00" in boundaries condition isn't going to pass. Instead you'll need to iterate your array and look for the value.
for (boundary in boundaries) { if (boundaries[boundary] == "U+4E00") { print "inside" } }
Either that or you can create a new array based on the existing one, but with the values stored as the key so the in operator will work as is.
for (i in boundaries) {boundaries2[boundaries[i]] = ""}
if ("U+4E00" in boundaries2){print "inside"}
This second method is a little hacky, since all your element values are set to "", but it's useful if you are going to process a large file and just want to use the in operator to test whether a field is in your array (as opposed to iterating over the array for each record, which might be more expensive).
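The second approach can be tried out from the shell with a small wrapper. This is only a sketch: the function name in_list and its two arguments are made up for illustration, and it walks the numeric indexes returned by split() instead of using for (i in ...).

```shell
#!/bin/sh
# in_list NEEDLE LIST: hypothetical helper, not part of the original script.
# Prints "inside" if NEEDLE is one of the comma-separated items in LIST.
in_list() {
    awk -v needle="$1" -v list="$2" 'BEGIN {
        n = split(list, items, ",")        # items[1] .. items[n] hold the values
        for (i = 1; i <= n; i++)
            seen[items[i]]                 # referencing an element creates the key
        print ((needle in seen) ? "inside" : "outside")
    }'
}

in_list "U+4E00" "U+4E00,U+9FFF,U+3400"   # a value of the list: inside
in_list "1" "U+4E00,U+9FFF,U+3400"        # only a key of items, not a value: outside
```

The second call shows the distinction the answer describes: "1" is a key of the array produced by split(), but it is not a key of the lookup array built from the values.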

Related

Indexing perl array

I have below code
@ar1 = ('kaje','beje','ceje','danjo');
$m = 'kajes';
my($next) = grep $_ eq 'kaje',@ar1;
print("Position is $next\n");
print("Next is $ar1[$next+1]\n");
print("Array of c is $ar1[$m+3]\n");
print("Array at m is $ar1[$m]\n");
Output seen:
Position is kaje
Next is beje
Array of c is danjo
Array at m is kaje
I want to understand how this works. Here $next is the matched string, and I am able to index into the array with that string. Also, $m is not found in the array, yet we get the first element of the array as output.
ALWAYS use use strict; use warnings;. It answers your question.
In one place, you treat the string kaje as a number.
In two places, you treat the string kajes as a number.
Since these two strings aren't numbers, Perl will use zero and issue a warning.
In other words, Next is beje only "works" because kaje happens to be at index 0. I have no idea why you believe the last two lines work, seeing as kajes is nowhere in the array.
You want
# Find the index of the element with value `kaje`.
my ($i) = grep $ar1[$_] eq 'kaje', 0..$#ar1;

Simplest way to modify comma-separated values in an array?

Let's say I have an array of n elements. Each element is a string of comma-separated x,y coordinate pairs, e.g. "581,284". There is no set character length to these x,y values.
Say I wanted to subtract 8 from each x value, and 5 from each y value.
What would be the simplest way to modify x and y, independently of each other, without permanently splitting the x and y values apart?
e.g. the first array element "581,284" becomes "573,279", the second array element "1013,562" becomes "1005,557", and so forth.
I worked on this problem for a couple of hours (I'm an amateur at bash), and it seemed as if my approach was awfully convoluted.
Please note that the quotation marks above are only added for emphasis, and are not a part of the problem.
Thank you in advance, I've been racking my head over this for a while now!
Edit: The following excerpt is the approach I was taking. I don't know much about bash, as you can tell.
while read value
do
if [[ -z $offset_list ]]
then
offset_list="$value"
else
offset_list="$offset_list,$value"
fi
done < text.txt
new_offset=${offset_list//,/ }
read -a new_array <<< $new_offset
for value in "${new_array[@]}"
do
if [[ $((value%2)) -eq 1 ]]
then
value=$((value-8));
new_array[$counter]=$value
counter=$((counter+1));
elif [[ $((value%2)) -eq 0 ]]
then
value=$((value-5));
new_array[$counter]=$value
counter=$((counter+1));
fi
done
Essentially I had originally read the coordinate pairs, and stripped the commas from them, and then planned on modifying odd/even values which were populated into the new array. At this point I realized that there had to be a more efficient way.
I believe the following should achieve what you are looking for:
#!/bin/bash
input=("581,284" "1013,562")
echo "Initial array ${input[@]}"
for index in "${!input[@]}"; do
    value=${input[$index]}
    x=${value%%,*}
    y=${value##*,}
    input[$index]="$((x-8)),$((y-5))"
done
echo "Modified array ${input[@]}"
${!input[@]} expands to the indexes of the bash array, which lets us loop over them.
${value%%,*} and ${value##*,} rely on bash parameter expansion to remove everything after or before the comma, respectively. This effectively splits your string into two variables.
From there, it's your required math and variable reassignment to mutate the array.
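If the offsets change from case to case, the same parameter expansions can be wrapped in a function. The following is a sketch under assumptions: the name shift_pair, the array name coords, and the argument order (pair, x offset, y offset) are invented for illustration, and it needs bash rather than plain sh because of local and arrays.

```shell
#!/bin/bash
# shift_pair "x,y" dx dy: hypothetical helper; prints "x-dx,y-dy".
shift_pair() {
    local pair=$1 dx=$2 dy=$3
    local x=${pair%%,*}    # text before the first comma
    local y=${pair##*,}    # text after the last comma
    printf '%s,%s\n' "$((x - dx))" "$((y - dy))"
}

coords=("581,284" "1013,562")
for i in "${!coords[@]}"; do
    coords[i]=$(shift_pair "${coords[i]}" 8 5)
done
printf '%s\n' "${coords[@]}"
```

Each element stays a single "x,y" string throughout, so the pairs are never permanently split apart.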

Not able to divide each row of a csv file in the form of array using perl

I am stuck on a problem while parsing a CSV file. The CSV file looks like this:
CPU Name,DISABLE,Memory,Encoding,Extra Encoding
,b,d,,
String1,YES,1TB,Enabled,Enabled
String2,NO,1TB,Enabled,Enabled
String3,YES,1TB,Enabled,Enabled
I want to capture the first two rows in two different arrays. The code that I am using to do it is-
my $row_no =0;
while(my $row=<$fi>){
chomp($row);
$row=~ s/\A\s+//g;
$row=~s/\R//g;
#say $row;
if($row_no==0)
{
#say $row;
my @name_initial = split(',',$row);
say length(@name_initial);
say @name_initial;
}
elsif($row_no==1)
{
#say $row;
@data_type_initial =split(',',$row);
say length(@data_type_initial);
say @data_type_initial;
}
$row_no++;
}
Now I have two arrays built from the first two lines of the file (@name_initial and @data_type_initial respectively). When I print these arrays I can see all 5 values, but when I print the length of each array it is reported as 1. When I print elements by index, each element is in place, so why is the length shown as 1? Also, the second array, built from the second line of the CSV file, is printed as "bd". All the null values are gone, and although it contains the two values 'b' and 'd', its length is printed as 1.
I want to convert each row of the CSV file into an array containing all the null and non-null values, so that I can iterate over the array elements and apply conditions based on null and non-null values. How can I do that?
Have a look at perldoc length. It says this:
length EXPR
length
Returns the length in characters of the value of
EXPR. If EXPR is omitted, returns the length of $_. If EXPR is
undefined, returns undef.
This function cannot be used on an entire array or hash to find out
how many elements these have. For that, use scalar @array and scalar
keys %hash, respectively.
Like all Perl character operations, length normally deals in logical
characters, not physical bytes. For how many bytes a string encoded as
UTF-8 would take up, use length(Encode::encode('UTF-8', EXPR)) (you'll
have to use Encode first). See Encode and perlunicode.
In particular, the bit that says "This function cannot be used on an entire array or hash to find out how many elements these have. For that, use scalar @array and scalar keys %hash, respectively".
So you're using the wrong approach here. Instead of say length(@array), you need say scalar(@array).
To explain the results you're getting: length() expects to be given a scalar value (a string) to measure. So it treats your array as a scalar (effectively adding an invisible call to scalar()) and gets back the number of elements in the array (which is "5"). length() then tells you the number of characters in that string, which is 1.
It's also worth pointing out that you don't need to keep track of your own $row_no variable. Perl has a built-in variable called $. which contains the current record number. Note that $. is 1 for the first line read, not 0.
Using that knowledge (and adding a little whitespace) gives us something like this:
while (my $row = <$fi>) {
    chomp($row);
    $row =~ s/\A\s+//g;
    $row =~ s/\R//g;
    #say $row;
    if ($. == 1) {
        #say $row;
        my @name_initial = split(/,/, $row);
        say scalar(@name_initial);
        say @name_initial;
    } elsif ($. == 2) {
        #say $row;
        @data_type_initial = split(/,/, $row);
        say scalar(@data_type_initial);
        say @data_type_initial;
    }
}
Update: You sneaked a couple of extra questions in at the end of this one. I'd suggest that you raise those separately.

Perl unit testing: check if a string is an array

I have this function that I want to test:
use constant NEXT => 'next';
use constant BACK => 'back';
sub getStringIDs {
return [
NEXT,
BACK
];
}
I've tried to write the following test, but it fails:
subtest 'check if it contains BACK' => sub {
use constant BACK => 'back';
my $strings = $magicObject->getStringIDs();
ok($strings =~ /BACK/);
}
What am I doing wrong?
Your getStringIDs() method returns an array reference.
The regex binding operator (=~) expects a string on its left-hand side. So it converts your array reference to a string. And a stringified array reference will look something like ARRAY(0x1ff4a68). It doesn't give you any of the contents of the array.
You can get from your array reference ($strings) to an array by dereferencing it (@$strings). And you can stringify an array by putting it in double quotes ("@$strings").
So you could do something like this:
ok("@$strings" =~ /BACK/);
But I suspect, you want word boundary markers in there:
ok("@$strings" =~ /\bBACK\b/);
And you might also prefer the like() testing function.
like("@$strings", qr[\bBACK\b], 'Strings array contains BACK');
Update: Another alternative is to use grep to check that one of your array elements is the string "BACK".
# Note: grep in scalar context returns the number of elements
# for which the block evaluated as 'true'. If we don't care how
# many elements are "BACK", we can just check that return value
# for truth with ok(). If we care that it's exactly 1, we should
# use is(..., 1) instead.
ok(scalar(grep { $_ eq 'BACK' } @$strings), 'Strings array contains BACK');
Update 2: Hmm... the fact that you're using constants here complicates this. Constants are subroutines and regexes are strings and subroutines aren't interpolated in strings.
The return value of $magicObject->getStringIDs is an array reference, not a string. It looks like the spirit of your test is that you want to check if at least one element in the array pattern matches BACK. The way to do this is to grep through the dereferenced array and check if there are a non-zero number of matches.
ok( grep(/BACK/,@$strings) != 0, 'contains BACK' );
At one time, the smartmatch operator promised to be a solution to this problem ...
ok( $strings ~~ /BACK/ )
but it has fallen into disrepute and should be used with caution (and the no warnings 'experimental::smartmatch' pragma).
The in operator is your friend.
use Test::More;
use syntax 'in';
use constant NEXT => 'next';
use constant BACK => 'back';
ok BACK |in| [NEXT, BACK], 'BACK is in the arrayref';
done_testing;

How to slice a variable into array indexes?

There is this typical problem: given a list of values, check if they are present in an array.
In awk, the trick val in array does work pretty well. Hence, the typical idea is to store all the data in an array and then keep doing the check. For example, this will print all lines in which the first column value is present in the array:
awk 'BEGIN {<<initialize the array>>} $1 in array_var' file
However, initializing the array takes some extra work, because val in array checks whether the index val is in array, and what we normally have stored in array is a set of values.
This becomes more relevant when providing values from command line, where those are the elements that we want to include as indexes of an array. For example, in this basic example (based on a recent answer of mine, which triggered my curiosity):
$ cat file
hello 23
bye 45
adieu 99
$ awk -v values="hello adieu" 'BEGIN {split(values,v); for (i in v) names[v[i]]} $1 in names' file
hello 23
adieu 99
split(values,v) slices the variable values into an array v[1]="hello"; v[2]="adieu"
for (i in v) names[v[i]] initializes another array names[] with names["hello"] and names["adieu"] with empty value. This way, we are ready for
$1 in names that checks if the first column is any of the indexes in names[].
As you see, we slice into a temp variable v to later on initialize the final and useful variable names[].
Is there any faster way to initialize the indexes of an array instead of setting one up and then using its values as indexes of the definitive?
No, that is the fastest (due to hash lookup) and most robust (due to string comparison) way to do what you want.
This:
BEGIN{split(values,v); for (i in v) names[v[i]]}
happens once on startup and will take close to no time while this:
$1 in array_var
which happens once for every line of input (and so is the place that needs to have optimal performance) is a hash lookup and so the fastest way to compare a string value to a set of strings.
Not an array solution, but one trick is to use pattern matching. To eliminate partial matches, wrap both the search value and the list of values in the delimiter. For your example,
$ awk -v values="hello adieu" 'FS values FS ~ FS $1 FS' file
hello 23
adieu 99
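One caveat with this trick is that ~ treats FS $1 FS as a regular expression, so metacharacters in the data can match more than intended. A possible variation, sketched here with made-up wrapper names filter_regex and filter_literal, swaps the match for awk's index() function, which does a literal substring search.

```shell
#!/bin/sh
# filter_regex and filter_literal: hypothetical helpers for comparison.
# Both read lines on stdin and print those whose first field is in NAMES,
# where NAMES is a space-separated list passed as the first argument.
filter_regex() {
    awk -v values="$1" 'FS values FS ~ FS $1 FS'        # $1 is used as a regex
}
filter_literal() {
    awk -v values="$1" 'index(FS values FS, FS $1 FS)'  # $1 is used literally
}

printf '%s\n' 'hello 23' 'hel.o 45' 'adieu 99' | filter_regex "hello adieu"
printf '%s\n' 'hello 23' 'hel.o 45' 'adieu 99' | filter_literal "hello adieu"
```

With this input the regex version also prints the hel.o line, because the dot matches any character, while the index() version prints only the hello and adieu lines. Both rely on the values being separated by exactly one default field separator.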
