Finding values in array using Perl - arrays

I have two arrays @input0 and @input1. I would like a for loop that goes through every value in @input1 and, if the value exists in @input0, saves it in a new array @input.
All arrays contain numbers only. There are a maximum of 10 numbers per array element (see below):
@input0 = {10061 10552 10553 10554 10555 10556 10557 10558 10559 10560, 10561 10562 10563 10564 10565 10566 10567 10573 10574 10575, ...}
@input1 = {20004 20182 ...}

The most concise and idiomatic way to achieve this in Perl is not a "for" loop but map and grep:
my %seen0 = map { ($_ => 1) } @input0;
my @input = grep { $seen0{$_} } @input1;
If you specifically want a for loop, please explain why the map/grep approach does not work (unless it's homework, in which case the question should be tagged as such).

Short, sweet and slow:
my @input = grep $_ ~~ @input0, @input1;
Verbose and faster with for loop:
my %input0 = map {$_, 1} @input0;
my @input;
for (@input1) {
    push @input, $_ if $input0{$_};
}

You could also use a hashslice + grep:
my %tmp;
@tmp{@input0} = undef;   # Fill all elements of @input0 in hash with value undef
my @input = grep { exists $tmp{$_} } @input1;   # grep for existing hash keys

dgw's answer was nearly there, but contained a couple of things which aren't best practice. I believe this is better:
my %input0_map;
@input0_map{ @input0 } = ();
my @input = grep { exists $input0_map{$_} } @input1;
You should not name a variable 'tmp' unless it's in a very small scope. Since this code snippet isn't wrapped in a brace-block, we don't know how big the scope is.
You should not assign into the hash slice with a single 'undef', because that means the first element is assigned with that literal undef, and the other elements are assigned with implicit undefs. It will work, but it's bad style. Either assign them all with a value, or have them ALL assigned implicitly (as happens if we assign from the empty list).
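As a quick check of the hash-slice approach, here is a minimal self-contained sketch. The sample numbers are made up for illustration; only the variable names come from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data, not the asker's real arrays.
my @input0 = (10061, 10552, 10553);
my @input1 = (20004, 10552, 10061);

my %input0_map;
@input0_map{@input0} = ();    # every key is created with an implicit undef value

# Keep the values of @input1 that also occur in @input0, in @input1's order.
my @input = grep { exists $input0_map{$_} } @input1;

print "@input\n";    # prints: 10552 10061
```

Because the values are all undef, the test has to be exists rather than the truthiness of the hash value.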

Related

Group similar element of array together to use in foreach at once in perl

I have an array in which some elements are similar under a certain condition (if we delete the "n" and "p" from the array elements, the similar elements can be recognised). I want to use these similar elements together in one iteration of a foreach statement. The array is shown below:
my @array = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
The order of the array elements need not always be this way.
I am editing this question again; sorry if I have not explained it properly. I have to print a string multiple times to a file, filled in with the values from the above array. The code below is not correct in any sense; I am only using it to explain my question.
open (FILE, ">" , "test.v");
foreach my $xy (@array){
    print FILE "DUF A1 (.pin1($1), .pin2($2));" ; # $1 and $2 are only placeholders
} # I want to print abc_n and abc_p in one iteration of the foreach loop, followed by the other pairs in successive iterations
close (FILE);
The result i want to print is as follows:
DUF A1 ( .pin1(abc_n), .pin2(abc_p));
DUF A1 ( .pin1(gg_n), .pin2(gg_p));
DUF A1 ( .pin1(munday_n_xy), .pin2(munday_p_xy));
DUF A1 ( .pin1(soc_n), .pin2(soc_p));
The scripting language used is Perl. Your help is really appreciated.
Thank you!
Partitioning a data set depends entirely on how data are "similar under certain conditions."
The condition given is that with removal of _n and _p the "similar" elements become equal (I assume the underscore; the OP says n and p). In such a case one can do
use warnings;
use strict;
use feature 'say';
my @data = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
my %part;
for my $elem (@data) {
    push @{ $part{ $elem =~ s/_(?:n|p)//r } }, $elem;
}
say "$_ => @{$part{$_}}" for keys %part;
The grouped "similar" strings are printed as a demo since I don't understand the logic of the shown output. Please build your output strings as desired.
If this is it and there'll be no more input to process later in code, nor will there be a need to refer to those common factors, then you may want the groups in an array
my @groups = values %part;
If needed, throw in a suitable sort when writing the array: sort { ... } values %part.
For more fluid and less determined "similarity" try "fuzzy matching;" here is one example.
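To connect the grouping back to the output the question asks for, here is a hedged sketch. It assumes every group holds exactly one _n and one _p element, and that sorting each pair puts the _n element first; the @lines array and the sorted key order are choices made here for illustration, not part of the original answer.

```perl
use strict;
use warnings;

my @array = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);

# Group elements that become equal once _n or _p is removed.
my %part;
for my $elem (@array) {
    push @{ $part{ $elem =~ s/_(?:n|p)//r } }, $elem;
}

# Build one output line per pair; sorting puts ..._n before ..._p.
my @lines;
for my $key (sort keys %part) {
    my ($n, $p) = sort @{ $part{$key} };
    push @lines, "DUF A1 ( .pin1($n), .pin2($p));";
}

print "$_\n" for @lines;
```

Writing to test.v instead of standard output only needs a lexical filehandle opened with open and passed to print.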

How to add each element of array to another array's each element?

I have these arrays, each of which has two elements.
@a = ("a", "b");
@i = (1, 2);
@s = ( "\\!", "\?");
How do I make the result such that it'll return
a1!, b2?
And I need them to be a new array like
@new =(a1!,b2?)
I wrote this code to produce the output
$i = length(@a);
for (0..$1) {
@array = push(@array, @a[$i], @s[$i];
}
print @array;
However, it only returned
syntax error at pra.pl line 10, near "];"
Thank you in advance.
use 5.008;
use List::AllUtils qw(zip_by);
⋮
my @new = zip_by { join '', @_ } \@a, \@i, \@s;
zip_by is a subroutine from the List::AllUtils module on CPAN, so it is not built in.
use v6;
⋮
my @new = map { .join }, zip @a, @i, @s;
In Perl 6, zip is already part of the standard library. This additional solution is here for flavour and to show off its strengths: it does the same job with less syntax, and works out of the box.
v6 is not strictly necessary, here I just used it for contrast to indicate the version. But at the beginning of a file it also has the nice property that if you accidentally run Perl 6 code in Perl 5, you'll get a nice error message instead of a cryptic syntax error. Try it! From the use VERSION documentation:
An exception is raised if VERSION is greater than the version of the current Perl
The basic idea you have is good, to iterate simultaneously using index of an array. But the code has many elementary errors and it also doesn't do what the examples show. I suggest to first make a thorough pass through a modern and reputable Perl tutorial.
The examples indicate that you want to concatenate (see . operator) elements at each index
use warnings;
use strict;
use feature 'say';
my @a1 = ('a', 'b');
my @a2 = (1, 2);
my @a3 = ('!', '?');
my @res;
foreach my $i (0..$#a1) {
    push @res, $a1[$i] . $a2[$i] . $a3[$i];
}
say for @res;
where $#a1 is the index of the last element of array @a1. This assumes that all arrays are of the same size and that all their elements are defined.
This exact work can be done using map in one statement
my @res = map { $a1[$_] . $a2[$_] . $a3[$_] } 0..$#a1;
with the same, serious, assumptions. Even if you knew they held, do you know for sure, in every run on any data? For a robust approach see the answer by mwp.
There is also each_array from List::MoreUtils, providing a "simultaneous iterator" for all arrays
use List::MoreUtils qw(each_array);
my $ea = each_array(@a1, @a2, @a3);
my @res;
while ( my ($e1, $e2, $e3) = $ea->() ) {
    push @res, $e1 . $e2 . $e3
}
which is really useful for more complex processing.
A quick run through basics
Always have use warnings; and use strict; at the beginning of your programs. They will catch many errors that would otherwise take a lot of time and nerves.
Don't use single-letter variable names. We quickly forget what they meant and the code gets hard to follow, and they make it way too easy to make silly mistakes.
An array's size is not given by length. It is normally obtained using context -- when an array is assigned to a scalar the number of its elements is returned. For iteration over indices there is $#ary, the index of the last element of @ary. Then the list of indices is 0 .. $#ary, using the range (..) operator
The sigil ($, @, %) at the beginning of an identifier (variable name) indicates the type of the variable (scalar, array, hash). An array element is a scalar so it needs $ -- $ary[0]
The push doesn't return array elements but it rather adds to the array in its first argument the scalars in the list that follows.
The print @array; prints array elements without anything between them. When you quote it spaces are added, print "@array\n";. Note the handy feature say though, which adds the new line.
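The points about context and sigils can be seen together in a short sketch. The array @letters and the other variable names are made up for this illustration.

```perl
use strict;
use warnings;

my @letters = qw(a b c);

my $count      = @letters;         # array in scalar context: number of elements
my $last_index = $#letters;        # index of the last element
my $first      = $letters[0];      # a single element is a scalar, so it takes $
my @pair       = @letters[0, 1];   # a slice is a list, so it takes @

print "count=$count last=$last_index first=$first\n";   # count=3 last=2 first=a
print "@pair\n";                                         # quoted array: a b
```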
Always use strict; and use warnings; (and use my to declare variables).
You don't need to escape ? and ! in Perl strings, and you can use the qw// quote-like operator to easily build lists of terms.
You are using length(@a) to determine the last index, but Perl array indexes are zero-based, so the last index would actually be length(@a) - 1. (But that's still not right. See the next point...)
To get an array's length in Perl, you want to evaluate it in scalar context. The length function is for strings.
You have not accounted for the situation when the arrays are not all the same length.
You assign the last index to a variable $i, but then you reference the variable $1 on the next line. Those are two different variables.
You are iterating from zero to the last index, but you aren't explicitly assigning the iterator to a variable, and you aren't using the implicit iterator variable ($_).
To get a single array element by index in Perl, the syntax is $a[$i], not @a[$i]. Because you only want a single, scalar value, the expression has to start with the scalar sigil $. (If instead you wanted a list of values from an expression, you would start the expression with the array sigil @.)
push modifies the array given by the first argument, so there's no need to assign the result back to the array in the expression.
You are missing a closing parenthesis in your push expression.
In your same code, you have @new and @array, and you are only adding elements from @a and @s (i.e. you forgot about @i).
You are pushing elements onto the array, but you aren't concatenating them into the desired string format.
Here is a working version of your implementation:
use strict;
use warnings;
use List::Util qw{max};
my @a = ("a", "b");
my @i = ("1", "2");
my @s = ("!", "?");
my @array;
my $length = max scalar @a, scalar @i, scalar @s;
foreach my $i (0 .. $length - 1) {
    push @array, ($a[$i] // '') . ($i[$i] // '') . ($s[$i] // '');
}
print @array;
(The // means "defined-or".)
Here's how I might write it:
use strict;
use warnings;
use List::Util qw{max};
my @a = qw/a b/;
my @i = qw/1 2/;
my @s = qw/! ?/;
my @array = map {
    join '', grep defined, $a[$_], $i[$_], $s[$_]
} 0 .. max $#a, $#i, $#s;
print "@array\n";
(The $#a means "give me the index of the last element of the array @a.")
use warnings;
use strict;
use Data::Dumper;
my $result = [];
my @a = ("a", "b");
my @i = (1, 2);
my @s = ("!", "?");
# Walk the indices and concatenate the elements found at each one
for my $index ( 0 .. $#a ) {
    push( @$result, $a[$index] . $i[$index] . $s[$index] );
}
print Dumper($result);

Check whether an array contains a value from another array

I have an array of objects, and an array of acceptable return values for a particular method. How do I reduce the array of objects to only those whose method in question returns a value in my array of acceptable values?
Right now, I have this:
my @allowed = grep {
    my $object = $_;
    my $returned = $object->method;
    grep {
        my $value = $_;
        $value eq $returned;
    } @acceptableValues;
} @objects;
The problem is that this is a compound loop, which I'd like to avoid. This program is meant to scale to arbitrary sizes, and I want to minimize the number of iterations that are run.
What's the best way to do this?
You could transform the acceptable return values into a hash:
my %values = map { $_ => 1 } @acceptableValues;
Then grep on whether the hash holds a true value for the key, instead of your original inner grep:
my @allowed = grep $values{ $_->method }, @objects;
Anyway, grep is pretty fast in itself, and this is just a common approach to checking whether an element is in an array. Try not to optimize what doesn't need it, since it would only be worth it for really big arrays. Then you could, for example, sort the acceptable values and use a binary search, or cache results if they repeat. But again, don't worry about this kind of optimisation unless you're dealing with hundreds of thousands of items or more.
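For a runnable picture of the hash-lookup idea, here is a small sketch. The Item class, its constructor, and the colour values are hypothetical stand-ins for the question's objects; only @acceptableValues, @objects, @allowed and method come from the thread.

```perl
use strict;
use warnings;

# Hypothetical minimal class standing in for the question's objects.
package Item {
    sub new    { my ($class, $value) = @_; return bless { value => $value }, $class }
    sub method { return $_[0]{value} }
}

my @acceptableValues = qw(red blue);
my @objects          = map { Item->new($_) } qw(red green blue red);

# One pass to build the lookup table, then one pass over the objects.
my %values  = map { $_ => 1 } @acceptableValues;
my @allowed = grep { $values{ $_->method } } @objects;

print join(',', map { $_->method } @allowed), "\n";   # prints: red,blue,red
```

This replaces the nested loop with two separate passes, one over each array.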
The elements in the given arrays seem to be unique. So I would build a hash counting the elements from both arrays; any element with a count greater than 1 is present in both arrays.
my %values;
my @allowed;
$values{$_}++ for (@acceptableValues, @objects);
for (keys %values) {
    push @allowed, $_ if $values{$_} > 1;
}

How to use an array reference to pairs of elements

I am considering this answer which uses a single array reference of points, where a point is a reference to a two-element array.
My original code from the question (function extract_crossing) uses two separate arrays $x and $y, which I call like this:
my @x = @{ $_[0] }; my @y = @{ $_[1] };
...
return extract_crossing(\@x, \@y);
The new code below, based on the answer, takes (x, y) points and returns a single data structure, here the x-intercept points:
use strict; use warnings;
use Math::Geometry::Planar qw(SegmentLineIntersection);
use Test::Exception;
sub x_intercepts {
    my ($points) = @_;
    die 'Must pass at least 2 points' unless @$points >= 2;
    my @intercepts;
    my @x_axis = ( [0, 0], [1, 0] );
    foreach my $i (0 .. $#$points - 1) {
        my $intersect = SegmentLineIntersection([@$points[$i,$i+1],@x_axis]);
        push @intercepts, $intersect if $intersect;
    }
    return \@intercepts;
}
which I try to call like this:
my @x = @{ $_[0] }; my @y = @{ $_[1] };
...
my $masi = x_intercepts(\@x);
return $masi;
However, the code does not make sense.
I am confused about passing the "double array" to the x_intercepts() function.
How can you make the example code clearer to the original setting?
If I am understanding the problem here, @ThisSuitIsBlackNot++ has written a function (x_intercepts, which is available in the thread: Function to extract intersections of crossing lines on axes) that expects its argument to be a reference to a list of array references. The x_intercepts subroutine in turn uses a function from Math::Geometry::Planar which expects the points of a line segment to be passed as a series of array references/anonymous arrays that contain the x,y values for each point.
Again - it is not entirely clear - but it seems your data is in two different arrays: one containing all the x values and one with the corresponding y values. Is this the case? If this is not correct please leave a comment and I will remove this answer.
If that is the source of your problem then you can "munge" or transform your data before you pass it to x_intercepts or - as @ThisSuitIsBlackNot suggests - you can rewrite the function. Here is an example of munging your existing data into an "@input_list" to pass to x_intercepts.
use feature 'say';
my @xs = qw/-1 1 3/;
my @ys = qw/-1 1 -1 /;
my @input_list;
foreach my $i ( 0..$#ys ) {
    push @input_list, [ $xs[$i], $ys[$i] ];   # one [x, y] pair per point
}
my $intercept_list = x_intercepts(\@input_list);
say join ",", @$_ for @$intercept_list;
Adding the lines above to your script produces:
Output:
0,0
2,0
You have to be very careful doing this kind of thing and using tests to make sure you are passing the correctly transformed data in an expected way is a good idea.
I think a more general difficulty is that until you are familiar with perl it is sometimes tricky to easily see what sorts of values a subroutine is expecting, where they end up after they are passed in, and how to access them.
A solid grasp of Perl data structures can help with that - for example I think what you are calling a "double array" or "double element" here is an "array of arrays". There are ways to make it easier to see where default arguments passed to a subroutine (in @_) are going (notice how @ThisSuitIsBlackNot has passed them to a nicely named array reference: "($points)"). Copious re-reading of perldoc perlsub can help things seem more obvious.
References are key to understanding Perl subroutines, since to pass an array or hash to a subroutine intact you need to pass it by reference. If the argument passed to x_intercepts is a reference to a list of two anonymous arrays, then once it is assigned to ($points), $points->[0] and $points->[1] will be the references to those two arrays.
I hope this helps and is not too basic (or incorrect). If @ThisSuitIsBlackNot finds the time to provide an answer you should accept it: some very useful examples have been provided.

In Perl, how do I create a hash whose keys come from a given array?

Let's say I have an array, and I know I'm going to be doing a lot of "Does the array contain X?" checks. The efficient way to do this is to turn that array into a hash, where the keys are the array's elements, and then you can just say if($hash{X}) { ... }
Is there an easy way to do this array-to-hash conversion? Ideally, it should be versatile enough to take an anonymous array and return an anonymous hash.
%hash = map { $_ => 1 } @array;
It's not as short as the "@hash{@array} = ..." solutions, but those ones require the hash and array to already be defined somewhere else, whereas this one can take an anonymous array and return an anonymous hash.
What this does is take each element in the array and pair it up with a "1". When this list of (key, 1, key, 1, key, 1) pairs gets assigned to a hash, the odd-numbered ones become the hash's keys, and the even-numbered ones become the respective values.
@hash{@array} = (1) x @array;
It's a hash slice, a list of values from the hash, so it gets the list-y @ in front.
From the docs:
If you're confused about why you use
an '#' there on a hash slice instead
of a '%', think of it like this. The
type of bracket (square or curly)
governs whether it's an array or a
hash being looked at. On the other
hand, the leading symbol ('$' or '@')
on the array or hash indicates whether
you are getting back a singular value
(a scalar) or a plural one (a list).
@hash{@keys} = undef;
The syntax here where you are referring to the hash with an @ is a hash slice. We're basically saying $hash{$keys[0]} AND $hash{$keys[1]} AND $hash{$keys[2]} ... is a list on the left hand side of the =, an lvalue, and we're assigning to that list, which actually goes into the hash and sets the values for all the named keys. In this case, I only specified one value, so that value goes into $hash{$keys[0]}, and the other hash entries all auto-vivify (come to life) with undefined values. [My original suggestion here was set the expression = 1, which would've set that one key to 1 and the others to undef. I changed it for consistency, but as we'll see below, the exact values do not matter.]
When you realize that the lvalue, the expression on the left hand side of the =, is a list built out of the hash, then it'll start to make some sense why we're using that @. [Except I think this will change in Perl 6.]
The idea here is that you are using the hash as a set. What matters is not the value I am assigning; it's just the existence of the keys. So what you want to do is not something like:
if ($hash{$key} == 1) # then key is in the hash
instead:
if (exists $hash{$key}) # then key is in the set
It's actually more efficient to just run an exists check than to bother with the value in the hash, although to me the important thing here is just the concept that you are representing a set just with the keys of the hash. Also, somebody pointed out that by using undef as the value here, we will consume less storage space than we would assigning a value. (And also generate less confusion, as the value does not matter, and my solution would assign a value only to the first element in the hash and leave the others undef, and some other solutions are turning cartwheels to build an array of values to go into the hash; completely wasted effort).
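The difference between testing existence and testing the value can be sketched as follows. The fruit names and variable names are made up for illustration.

```perl
use strict;
use warnings;

my @array = qw(apple pear);

my %set;
@set{@array} = ();    # the keys now exist, and every value is undef

my $by_exists = exists($set{apple}) ? 1 : 0;   # 1: the key is present
my $by_value  = $set{apple}         ? 1 : 0;   # 0: the stored value is undef, which is false
my $missing   = exists($set{plum})  ? 1 : 0;   # 0: no such key

print "$by_exists $by_value $missing\n";       # prints: 1 0 0
```

So a hash filled this way must be queried with exists; a plain if ($set{$key}) test would report every member as absent.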
Note that if typing if ( exists $hash{ key } ) isn’t too much work for you (which I prefer to use since the matter of interest is really the presence of a key rather than the truthiness of its value), then you can use the short and sweet
@hash{@key} = ();
I always thought that
foreach my $item (@array) { $hash{$item} = 1 }
was at least nice and readable / maintainable.
There is a presupposition here, that the most efficient way to do a lot of "Does the array contain X?" checks is to convert the array to a hash. Efficiency depends on the scarce resource, often time but sometimes space and sometimes programmer effort. You are at least doubling the memory consumed by keeping a list and a hash of the list around simultaneously. Plus you're writing more original code that you'll need to test, document, etc.
As an alternative, look at the List::MoreUtils module, specifically the functions any(), none(), true() and false(). They all take a block as the conditional and a list as the argument, similar to map() and grep():
print "At least one value undefined" if any { !defined($_) } @list;
I ran a quick test, loading in half of /usr/share/dict/words to an array (25000 words), then looking for eleven words selected from across the whole dictionary (every 5000th word) in the array, using both the array-to-hash method and the any() function from List::MoreUtils.
On Perl 5.8.8 built from source, the array-to-hash method runs almost 1100x faster than the any() method (1300x faster under Ubuntu 6.06's packaged Perl 5.8.7.)
That's not the full story however - the array-to-hash conversion takes about 0.04 seconds which in this case kills the time efficiency of array-to-hash method to 1.5x-2x faster than the any() method. Still good, but not nearly as stellar.
My gut feeling is that the array-to-hash method is going to beat any() in most cases, but I'd feel a whole lot better if I had some more solid metrics (lots of test cases, decent statistical analyses, maybe some big-O algorithmic analysis of each method, etc.) Depending on your needs, List::MoreUtils may be a better solution; it's certainly more flexible and requires less coding. Remember, premature optimization is a sin... :)
In perl 5.10, there's the close-to-magic ~~ operator:
sub invite_in {
my $vampires = [ qw(Angel Darla Spike Drusilla) ];
return ($_[0] ~~ $vampires) ? 0 : 1 ;
}
See here: http://dev.perl.org/perl5/news/2007/perl-5.10.0.html
Also worth noting for completeness, my usual method for doing this with two same-length arrays @keys and @vals which you would prefer were a hash...
my %hash = map { $keys[$_] => $vals[$_] } (0..$#keys);
Raldi's solution can be tightened up to this (the '=>' from the original is not necessary):
my %hash = map { $_,1 } @array;
This technique can also be used for turning text lists into hashes:
my %hash = map { $_,1 } split(",",$line);
Additionally, if you have a line of values like this: "foo=1,bar=2,baz=3" you can do this:
my %hash = map { split("=",$_) } split(",",$line);
[EDIT to include]
Another solution offered (which takes two lines) is:
my %hash;
#The values in %hash can only be accessed by doing exists($hash{$key})
#The assignment only works with '= undef;' and will not work properly with '= 1;'
#if you do '= 1;' only the hash key of $array[0] will be set to 1;
@hash{@array} = undef;
You could also use Perl6::Junction.
use Perl6::Junction qw'any';
my @arr = ( 1, 2, 3 );
if( any(@arr) == 1 ){ ... }
If you do a lot of set-theoretic operations, you can also use Set::Scalar or a similar module. Then $s = Set::Scalar->new( @array ) will build the set for you, and you can query it with: $s->contains($m).
You can place the code into a subroutine if you don't want to pollute your namespace.
my $hash_ref =
sub{
    my %hash;
    @hash{ @{[ qw'one two three' ]} } = undef;
    return \%hash;
}->();
Or even better:
sub keylist(@){
    my %hash;
    @hash{@_} = undef;
    return \%hash;
}
my $hash_ref = keylist qw'one two three';
# or
my @key_list = qw'one two three';
my $hash_ref = keylist @key_list;
If you really wanted to pass an array reference:
sub keylist(\@){
    my %hash;
    @hash{ @{$_[0]} } = undef if @_;
    return \%hash;
}
my @key_list = qw'one two three';
my $hash_ref = keylist @key_list;
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @a = qw(5 8 2 5 4 8 9);
my @b = qw(7 6 5 4 3 2 1);
my $h = {};
@{$h}{@a} = @b;
print Dumper($h);
gives (note repeated keys get the value at the greatest position in the array - ie 8->2 and not 6)
$VAR1 = {
'8' => '2',
'4' => '3',
'9' => '1',
'2' => '5',
'5' => '4'
};
You might also want to check out Tie::IxHash, which implements ordered associative arrays. That would allow you to do both types of lookups (hash and index) on one copy of your data.

Resources