Searching an array of version numbers in Perl - arrays

I need to find the index of a target version in an index array. The index array looks like: {16.3.1, 16.2.5, 16.1.4, 15.3.5, 15.1.1}.
Each item in this array (such as 16.3.1) is a concatenation of three parts:
16 is the yearly release number.
3 is the quarterly release number, and it is in the range of {1,2,3,4}.
1 is the bi-weekly release number, and it can be one of these 6 options (1,2,3,4,5,6).
And the array is sorted in descending order.
These are the requirements:
If I am given a target version such as 16.1.4, the algorithm should return the matching index from that array, which is 2.
If I am given a target version of 16.1.5, which is not in that array, it should return the index of the next available one, which is also 2.
The target value is always higher than 15.1.1, which means it will always return a valid index back.
I was thinking of converting each value into a number and then doing a search. (For example, 16.1.4 => 16 * 24 + 1 * 6 + 4 = 394, ...)
But I am wondering whether there is a simpler way to solve this problem.

Your solution will basically have the following form:
use List::MoreUtils qw( first_index );
my @versions = qw( 16.3.1 16.2.5 16.1.4 15.3.5 15.1.1 );
my $target = '16.1.4';
my $target_key = make_key($target);
my $index = first_index { make_key($_) <= $target_key } @versions;
For long lists, you'd use a binary search instead.
The previously posted solution assumes you are starting from hardcoded values, whereas this one demonstrates how to start from strings of any origin.
All you need now is a way to generate a key that's easily comparable with string or numerical comparison operators. The following are sorted from fastest to slowest:
# Use numerical comparison functions (<=).
sub make_key {
my @parts = split(/\./, $_[0]);
return ( $parts[0] * 4 + $parts[1] ) * 6 + $parts[2];
}
or
# Use string comparison functions (le).
sub make_key {
my $key = '';
$key .= chr($_) for split(/\./, $_[0]);
return $key;
}
or
use Sort::Key::Natural qw( mkkey_natural );
# Use string comparison functions (le).
sub make_key { mkkey_natural($_[0]) }
The first solution is an implementation of the formula you suggested.
The second solution is similar to version->parse, but without all the overhead and special cases you don't need.
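To make the outline above concrete, here is a minimal sketch that uses only core modules. It assumes the numeric key from the first make_key variant and swaps first_index for first from List::Util applied to the index range; the helper name find_version_index is made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Numeric key following the question's formula:
# 4 quarters per year, 6 bi-weekly releases per quarter.
sub make_key {
    my ($year, $quarter, $biweekly) = split /\./, $_[0];
    return ( $year * 4 + $quarter ) * 6 + $biweekly;
}

# Hypothetical helper: index of the first entry whose key is <= the
# target's key, or -1 if there is none. Expects a descending list.
sub find_version_index {
    my ($target, @versions) = @_;
    my $target_key = make_key($target);
    my $index = first { make_key($versions[$_]) <= $target_key } 0 .. $#versions;
    return defined $index ? $index : -1;
}

my @versions = qw( 16.3.1 16.2.5 16.1.4 15.3.5 15.1.1 );
print "$_ => ", find_version_index($_, @versions), "\n" for qw( 16.1.4 16.1.5 );
```

Both targets from the question resolve to index 2 here, because 16.1.4 is the first entry that is not newer than either of them.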

Your version strings look normal enough, and your description supports the notion that you could probably just do a string comparison without further conversion. And the domain they represent is unlikely to change arbitrarily (quarters will always be quarters, for example). So since they are relationally comparable as strings, a simple string comparison is probably adequate.
The binsearch_pos function from List::BinarySearch will provide the index of the target element, or if the target element is not found, the index at which point the target could be inserted to preserve order. It is a stable binary search, so it will always return the lowest index where the target matches. Those characteristics seem to provide exactly what you need:
use List::BinarySearch qw(binsearch_pos);
my @array = qw(
16.3.1
16.2.5
16.1.4
15.3.5
15.1.1
);
print "$_: $array[$_]\t" foreach 0 .. $#array;
print "\n\n";
print "$_: ", (binsearch_pos {$b cmp $a} $_, @array), "\n"
foreach qw(16.3.1 16.3.6 16.2.7 16.2.5 16.2.4 15.1.1 15.1.3 15.1.0);
If the list of versions is short, then first_index from List::MoreUtils is a simple linear approach that will be sufficiently efficient. If the list is large, a binary search might be worth considering, as it scales logarithmically rather than linearly: as your list of version strings grows, the time needed to search it grows more slowly with a binary search than with a linear search.
Because your list is in descending order, this solution uses $b cmp $a, which accommodates that descending order.
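For readers who cannot install List::BinarySearch, the same idea can be sketched by hand. This is not the module's implementation, only a plain lower-bound search under the same assumptions (a list sorted in descending string order); the name descending_pos is invented for this example.

```perl
use strict;
use warnings;

# Hand-written lower-bound search over a list sorted in DESCENDING
# string order. Returns the lowest index whose element is "le" the
# target; this equals the list length if every element is greater.
sub descending_pos {
    my ($target, $list) = @_;
    my ($lo, $hi) = (0, scalar @$list);
    while ($lo < $hi) {
        my $mid = int( ($lo + $hi) / 2 );
        if ($list->[$mid] gt $target) {
            $lo = $mid + 1;    # element is still above the target, go right
        }
        else {
            $hi = $mid;        # candidate found, keep looking to the left
        }
    }
    return $lo;
}

my @array = qw( 16.3.1 16.2.5 16.1.4 15.3.5 15.1.1 );
print "$_: ", descending_pos($_, \@array), "\n"
    for qw( 16.3.1 16.1.5 16.1.4 15.1.0 );
```

A target lower than everything in the list (15.1.0 here) yields an index one past the end, so a caller would need to check for that case.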

Perl supports a data type called version strings, which packs a version number as a string with a sequence of code points. For instance, v1.2.3 would be represented as the string "\x1\x2\x3".
You can create such a string by writing v followed by a dotted-decimal sequence. Any dotted-decimal literal with two or more dots is treated the same way, even without the v.
So we can solve your problem very simply by using version strings in combination with the first_index function from List::MoreUtils, like this:
use strict;
use warnings 'all';
use feature 'say';
use List::MoreUtils 'first_index';
my @versions = ( v16.3.1, v16.2.5, v16.1.4, v15.3.5, v15.1.1 );
for my $target ( v16.1.4, v16.1.5 ) {
say first_index { $_ le $target } @versions;
}
output
2
2
There may be a problem with getting the version strings into your program in the first place, which is why I asked how you currently read them. But it's really no big issue if you explain what you need
Update
I've changed my answer to use v16.1.4, v16.1.5, etc. It worked fine before, but it's less than obvious that 16.1.4 is a completely different literal from the floating-point value 16.1. On the other hand, v16.1.4 and v16.1 are both version strings.
You also don't really say where your input comes from. Fair enough, you can declare a literal array of versions, as I have in my answer, but presumably your $target won't also be a literal, otherwise there's little point in writing the program in the first place
I hoped you would talk about where these things were coming from so that I could help you, but you probably need to look at the version pragma, which offers class methods that will convert between ordinary strings and version strings
For example, if the target was supplied as a string you could use version->parse to convert it to a version string, which means the final loop above would look like this
use version;
for my $target ( "16.1.5", "16.1.4" ) {
my $vs = version->parse($target);
say first_index { $_ le $vs } @versions;
}
So version->parse("16.1.4") eq v16.1.4 is always true
I hope that has clarified rather than confused
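As a small illustration of what a version string actually contains, the sketch below builds the same bytes by hand from an ordinary string and prints a version string back in dotted form with sprintf's %vd format. It uses only core Perl; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $vstring = v16.1.4;

# Build the same sequence of code points from an ordinary string.
my $packed = join '', map { chr($_) } split /\./, '16.1.4';

print "identical strings\n" if $vstring eq $packed;

# %vd prints the ordinal of each character, joined by dots.
print sprintf('%vd', $vstring), "\n";
```

This is also why plain le and ge comparisons work on version strings: they compare code point by code point.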

Related

Group similar element of array together to use in foreach at once in perl

I have an array whose elements are similar under a certain condition: if we delete the "n" and "p" from the elements, the similar ones can be recognised. I want to use these similar elements together in one iteration of a foreach statement. The array is shown below:
my @array = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
The order of the array elements need not always be like this.
I am editing this question again; sorry if I have not explained it properly. I have to print a string multiple times to a file, filling in the variables from the above array. The code below is not right in any sense; I am only using it to explain my question.
open (FILE, ">" , "test.v");
foreach my $xy (@array){
print FILE "DUF A1 (.pin1($1), .pin2($2));" ; # $1 and $2 are just placeholders
} # I want to print abc_n and abc_p in one iteration of the foreach loop, followed by the other pairs in successive iterations
close (FILE);
The result i want to print is as follows:
DUF A1 ( .pin1(abc_n), .pin2(abc_p));
DUF A1 ( .pin1(gg_n), .pin2(gg_p));
DUF A1 ( .pin1(munday_n_xy), .pin2(munday_p_xy));
DUF A1 ( .pin1(soc_n), .pin2(soc_p));
The scripting language used is Perl. Your help is really appreciated.
Thank You.!!
Partitioning a data set depends entirely on how the data are "similar under certain conditions."
The condition given is that with removal of _n and _p the "similar" elements become equal (I assume that underscore; the OP says n and p). In such a case one can do
use warnings;
use strict;
use feature 'say';
my @data = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
my %part;
for my $elem (@data) {
push @{ $part{ $elem =~ s/_(?:n|p)//r } }, $elem;
}
say "$_ => @{$part{$_}}" for keys %part;
The grouped "similar" strings are printed as a demo since I don't understand the logic of the shown output. Please build your output strings as desired.
If this is it and there'll be no more input to process later in code, nor will there be a need to refer to those common factors, then you may want the groups in an array
my @groups = values %part;
If needed throw in a suitable sorting when writing the array, sort { ... } values %part.
For more fluid and less determined "similarity" try "fuzzy matching;" here is one example.
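Building on the grouping above, one possible way to produce the lines the question asks for is sketched below. It assumes every group holds exactly one _n and one _p element, and it sorts the groups by key only to make the output order predictable; neither assumption comes from the question.

```perl
use strict;
use warnings;

my @data = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);

# Group elements by their name with the _n / _p marker removed.
my %part;
for my $elem (@data) {
    my $key = $elem =~ s/_[np](?=_|$)//r;
    push @{ $part{$key} }, $elem;
}

# Assumption: each group has exactly one _n and one _p member,
# so a plain string sort puts the _n element first.
my @lines;
for my $key (sort keys %part) {
    my ($n, $p) = sort @{ $part{$key} };
    push @lines, "DUF A1 ( .pin1($n), .pin2($p));";
}

print "$_\n" for @lines;
```

Writing to test.v instead of standard output only requires opening a filehandle and printing to it.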

Check whether an array contains a value from another array

I have an array of objects, and an array of acceptable return values for a particular method. How do I reduce the array of objects to only those whose method in question returns a value in my array of acceptable values?
Right now, I have this:
my #allowed = grep {
my $object = $_;
my $returned = $object->method;
grep {
my $value = $_;
$value eq $returned;
} @acceptableValues;
} @objects;
The problem is that this is a compound loop, which I'd like to avoid. This program is meant to scale to arbitrary sizes, and I want to minimize the number of iterations that are run.
What's the best way to do this?
You could transform the accepted return values into a hash
my %values = map { $_ => 1 } @acceptableValues;
And grep with the condition that the key exists instead of your
original grep:
my @allowed = grep $values{ $_->method }, @objects;
That said, grep is pretty fast in itself, and this is just one common
approach to checking whether an element is in an array. Try not to
optimize what doesn't need it, since this is only worthwhile for really big
arrays. In that case you could, for example, sort the accepted results array and
use a binary search, or cache results if they repeat. But again, don't
worry about this kind of optimisation unless you're dealing with hundreds
of thousands of items or more.
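A self-contained sketch of the hash lookup follows. The Thing class and its sample data are hypothetical stand-ins for the asker's objects; only the map and grep lines reflect the approach described above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the asker's objects: method() returns a string.
package Thing {
    sub new    { my ($class, $value) = @_; return bless { value => $value }, $class }
    sub method { return $_[0]{value} }
}

my @acceptableValues = qw(red green);
my @objects = map { Thing->new($_) } qw(red blue green red yellow);

# One pass to build the lookup table, one pass to filter.
my %is_acceptable = map { $_ => 1 } @acceptableValues;
my @allowed = grep { $is_acceptable{ $_->method } } @objects;

print join(', ', map { $_->method } @allowed), "\n";
```

Each object's method is called once, and each check is a single hash lookup rather than a scan of the acceptable values.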
The elements in the given arrays seem to be unique, so I would build a hash containing the count of each element across both arrays. Any element with a count greater than 1 is present in both arrays.
my %values;
my @allowed;
$values{$_}++ for (@acceptableValues, @objects);
for (keys %values) {
push @allowed, $_ if $values{$_} > 1;
}

How to copy data to an array in perl?

I am trying to access the data from the database and copy it to an array. This is my code:
$sth = $dbh->prepare("SELECT * FROM abcd WHERE id=100 ");
$sth->execute;
$N=$sth->rows;
print "$N\n";
while (my @row_val = $sth->fetchrow_array()){
my ($uniqid, $time, $current, $id ) = @row_val;
$y[k]=$current;
$k++;
}
for ($k=0;$k<$N;$k++) {
print "$y[k]\t";
}
But it displays the same value for all $y[k]. How to copy the data from database to an array in perl?
You are using a bareword here:
$y[k]=$current;
# ^--- here, the "k" is a bareword
If you use warnings this will give a warning
Unquoted string "k" may clash with future reserved word at foo.pl line 10.
Argument "k" isn't numeric in array element at foo.pl line 10.
The "k" is interpreted as a string and then converted to a number, which is 0, so all your data is stored in $y[0].
This is why it is a very bad idea to not turn warnings on.
What you probably want instead is to push the new values onto the array:
push @y, $current;
This is, IMO, preferable to using an index, since it does all that work for you. Usually, you only want to specifically get involved with array indexes if the indexes themselves are of value for you, such as when comparing array elements.
This also means that your subsequent for loop
for ($k=0;$k<$N;$k++) {
print "$y[k]\t";
}
Is better written
for (@y) {
print "$_\t";
}
Although this is better written with join:
print join "\t", @y;
As a final note, you should always use
use strict;
use warnings;
It takes a small amount of learning to overcome the additional noise when using these pragmas, but it is well worth it in terms of learning and reducing your time spent debugging. I usually say that not using these pragmas is like covering up the low oil warning lamp in your car: Not knowing about the errors does not solve them.
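Put together, the push-and-join version looks roughly like the sketch below. To keep it runnable without a database, the rows are hard-coded sample data standing in for what fetchrow_array would return; the column values are invented.

```perl
use strict;
use warnings;

# Hypothetical rows in the question's column order:
# (uniqid, time, current, id)
my @rows = (
    [ 1, '10:00', 0.5, 100 ],
    [ 2, '10:05', 0.7, 100 ],
    [ 3, '10:10', 0.9, 100 ],
);

my @y;
for my $row (@rows) {
    my ($uniqid, $time, $current, $id) = @$row;
    push @y, $current;    # no index variable to get wrong
}

print join("\t", @y), "\n";
```

In the real script, the for loop would be the while loop over $sth->fetchrow_array from the question.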
This behaviour is because you are storing everything at index "k": not a number, just the bareword "k".
It is only a coincidence that it works at all. The "same value" is the last value, isn't it?
SOLUTION:
1) Variables are written with $. Keep that in mind when accessing $yourArray[$variableWithIndex].
2) $y[k]=$current; # wrong! you are trying to access index "k"
correct: $y[$k]=$current;
I haven't tested it, but this should work:
$sth = $dbh->prepare("SELECT * FROM abcd WHERE id=100 ");
$sth->execute;
$N=$sth->rows;
print "$N\n";
$k=0; # init first!
while (my @row_val = $sth->fetchrow_array()){
my ($uniqid, $time, $current, $id ) = @row_val;
$y[$k]=$current; # dont forget the $
$k++;
}
for ($k=0;$k<$N;$k++) {
print "$y[$k]\t"; # dont forget the $
}

How to get a single column of emails from a html textarea into array

I was thinking I could do this on my own but I need some help.
I need to paste a list of email addresses from a local band's mailing list into a textarea and process them with my Perl script.
The emails are all in a single column, delimited by newlines:
email1@email.com
email2@email.com
email3@email.com
email4@email.com
email5@email.com
I would like to obviously get rid of any whitespace:
$emailgoodintheory =~ s/\s//ig;
and I am running them through basic validation:
if (Email::Valid->address($emailgoodintheory)) { #yada
I have tried all kinds of ways to get the list into an array.
my $toarray = CGI::param('toarray');
my @toarraya = split /\r?\n/, $toarray;
foreach my $address(@toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Above is just to test to see if I was successful. I have no need to print them.
It just loops through, grabs the schedules .txt file and sends each member the band schedule. All that other stuff works but I cannot get the textarea into an array!
So, as you can see, I am pretty lost.
Thank you sir(s), may I have another quick lesson?
You seem a bit new to Perl, so I will give you a thorough explanation why your code is bad and how you can improve it:
1 Naming conventions:
I see that this seems to be symbolic code, but $emailgoodintheory is far less readable than $emailGoodInTheory or $email_good_in_theory. Pick any scheme and stick to it, just don't write all lowercase.
I suppose that $emailgoodintheory holds a single email address. Then applying the regex s/\s//g or the transliteration tr/ \t\r\n//d will be enough; the /i flag is unnecessary because whitespace has no case.
Using a module to validate addresses is a very good idea. :-)
2 Perl Data Types
Perl has three main types of variables:
Scalars can hold strings, numbers or references. They are denoted by the $ sigil.
Arrays can hold an ordered sequence of Scalars. They are denoted by the @ sigil.
Hashes can hold an unordered set of Scalars. Some people know them as dictionaries. All keys and all values must be Scalars. Hashes are denoted by the % sigil.
A word on context: When getting a value/element from a hash/array, you have to change the sigil to the data type you want. Usually, we only recover one value (which always is a scalar), so you write $array[$i] or $hash{$key}. This does not follow any references so
my $arrayref = [1, 2, 3];
my @array = ($arrayref);
print @array[0];
will not print 123, but ARRAY(0xABCDEF) and give you a warning.
3 Loops in Perl:
Your loop syntax is very weird! You can use C-style loops:
for (my $i = 0; $i < @array; $i++)
where @array gives the length of the array, because we have a scalar context. You could also give $i the range of all possible indices in your array:
for my $i (0 .. $#array)
where .. is the range operator (in list context) and $#array gives the highest available index of our array. We can also use a foreach-loop:
foreach my $element (@array)
Note that in Perl, the keywords for and foreach are interchangeable.
4 What your loop does:
foreach my $address(@toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Here you put each element of @toarraya into the scalar $address. Then you try to use it as an array (wrong!) and get the index $arrcnt out of it. This does not work; I hope your program died.
You can use every loop type given above (you don't need to count manually), but the standard foreach loop will suit you best:
foreach my $address (@toarraya){
print "$address<br/>\n";
}
A note on quoting syntax: while qq~ quoted ~ is absolutely legal, this is the most obfuscated code I have seen today. The standard quote " would suffice, and when using qq, try to use some sort of bracket, such as (, {, [ or <, as the delimiter.
5 complete code:
I assume you wanted to write this:
my @addressList = split /\r?\n/, CGI::param('toarray');
foreach my $address (@addressList) {
# eliminate white spaces
$address =~ s/\s//g;
# Test for validity
unless (Email::Valid->address($address)) {
# complain, die, you decide
# I recommend:
print "<strong>Invalid address »$address«</strong><br/>";
next;
}
print "$address<br/>\n";
# send that email
}
And never forget to use strict; use warnings; and possibly use utf8.
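The split-and-clean steps can be tried without CGI or Email::Valid using the sketch below. The textarea content is simulated, and the regex is only a crude placeholder for the Email::Valid->address call in the real script, not a proper validator.

```perl
use strict;
use warnings;

# Simulated textarea content; the real script would read CGI::param('toarray').
my $textarea = "email1\@email.com\r\n email2\@email.com \n\nnot-an-address\nemail3\@email.com\n";

my (@good, @bad);
for my $line (split /\r?\n/, $textarea) {
    (my $address = $line) =~ s/\s+//g;    # strip all whitespace
    next if $address eq '';               # skip blank lines

    # Crude placeholder check; swap in Email::Valid->address($address).
    if ($address =~ /\A[^\@\s]+\@[^\@\s]+\.[^\@\s]+\z/) {
        push @good, $address;
    }
    else {
        push @bad, $address;
    }
}

print "good: @good\n";
print "bad:  @bad\n";
```

Splitting on /\r?\n/ handles both browser-style CRLF line endings and plain newlines.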

In Perl, how do I create a hash whose keys come from a given array?

Let's say I have an array, and I know I'm going to be doing a lot of "Does the array contain X?" checks. The efficient way to do this is to turn that array into a hash, where the keys are the array's elements, and then you can just say if($hash{X}) { ... }
Is there an easy way to do this array-to-hash conversion? Ideally, it should be versatile enough to take an anonymous array and return an anonymous hash.
%hash = map { $_ => 1 } @array;
It's not as short as the "@hash{@array} = ..." solutions, but those ones require the hash and array to already be defined somewhere else, whereas this one can take an anonymous array and return an anonymous hash.
What this does is take each element in the array and pair it up with a "1". When this list of (key, 1, key, 1, key, 1) pairs gets assigned to a hash, the odd-numbered ones become the hash's keys, and the even-numbered ones become the respective values.
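Since the question asks for something that takes an anonymous array and returns an anonymous hash, here is a minimal sketch of that wrapper. The subroutine name to_lookup and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: array reference in, hash reference out.
sub to_lookup {
    my ($array_ref) = @_;
    my %lookup = map { $_ => 1 } @$array_ref;
    return \%lookup;
}

my $seen = to_lookup([ qw(apple banana cherry) ]);

print "banana found\n"   if $seen->{banana};
print "durian missing\n" unless $seen->{durian};
```

Because every value is 1, a plain truth test on the hash element is enough here.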
@hash{@array} = (1) x @array;
It's a hash slice, a list of values from the hash, so it gets the list-y @ in front.
From the docs:
If you're confused about why you use
an '@' there on a hash slice instead
of a '%', think of it like this. The
type of bracket (square or curly)
governs whether it's an array or a
hash being looked at. On the other
hand, the leading symbol ('$' or '@')
on the array or hash indicates whether
you are getting back a singular value
(a scalar) or a plural one (a list).
@hash{@keys} = undef;
The syntax here where you are referring to the hash with an @ is a hash slice. We're basically saying $hash{$keys[0]} AND $hash{$keys[1]} AND $hash{$keys[2]} ... is a list on the left hand side of the =, an lvalue, and we're assigning to that list, which actually goes into the hash and sets the values for all the named keys. In this case, I only specified one value, so that value goes into $hash{$keys[0]}, and the other hash entries all auto-vivify (come to life) with undefined values. [My original suggestion here was set the expression = 1, which would've set that one key to 1 and the others to undef. I changed it for consistency, but as we'll see below, the exact values do not matter.]
When you realize that the lvalue, the expression on the left hand side of the =, is a list built out of the hash, then it'll start to make some sense why we're using that @. [Except I think this will change in Perl 6.]
The idea here is that you are using the hash as a set. What matters is not the value I am assigning; it's just the existence of the keys. So what you want to do is not something like:
if ($hash{$key} == 1) # then key is in the hash
instead:
if (exists $hash{$key}) # then key is in the set
It's actually more efficient to just run an exists check than to bother with the value in the hash, although to me the important thing here is just the concept that you are representing a set just with the keys of the hash. Also, somebody pointed out that by using undef as the value here, we will consume less storage space than we would assigning a value. (And also generate less confusion, as the value does not matter, and my solution would assign a value only to the first element in the hash and leave the others undef, and some other solutions are turning cartwheels to build an array of values to go into the hash; completely wasted effort).
Note that if typing if ( exists $hash{ key } ) isn’t too much work for you (which I prefer to use since the matter of interest is really the presence of a key rather than the truthiness of its value), then you can use the short and sweet
@hash{@key} = ();
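A short sketch of the set idiom follows, showing why exists is the right test when the slice is assigned an empty list. The array contents and the name %set are arbitrary.

```perl
use strict;
use warnings;

my @array = qw(apple banana cherry);

my %set;
@set{@array} = ();    # every key now exists; every value is undef

print "banana is in the set\n"     if exists $set{banana};
print "banana's value is false\n"  if !$set{banana};
print "durian is not in the set\n" if !exists $set{durian};
```

A truth test such as if ($set{banana}) would fail here even though the key is present, which is the trade-off of storing no values.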
I always thought that
foreach my $item (@array) { $hash{$item} = 1 }
was at least nice and readable / maintainable.
There is a presupposition here, that the most efficient way to do a lot of "Does the array contain X?" checks is to convert the array to a hash. Efficiency depends on the scarce resource, often time but sometimes space and sometimes programmer effort. You are at least doubling the memory consumed by keeping a list and a hash of the list around simultaneously. Plus you're writing more original code that you'll need to test, document, etc.
As an alternative, look at the List::MoreUtils module, specifically the functions any(), none(), true() and false(). They all take a block as the conditional and a list as the argument, similar to map() and grep():
print "At least one value undefined" if any { !defined($_) } @list;
I ran a quick test, loading in half of /usr/share/dict/words to an array (25000 words), then looking for eleven words selected from across the whole dictionary (every 5000th word) in the array, using both the array-to-hash method and the any() function from List::MoreUtils.
On Perl 5.8.8 built from source, the array-to-hash method runs almost 1100x faster than the any() method (1300x faster under Ubuntu 6.06's packaged Perl 5.8.7.)
That's not the full story however - the array-to-hash conversion takes about 0.04 seconds which in this case kills the time efficiency of array-to-hash method to 1.5x-2x faster than the any() method. Still good, but not nearly as stellar.
My gut feeling is that the array-to-hash method is going to beat any() in most cases, but I'd feel a whole lot better if I had some more solid metrics (lots of test cases, decent statistical analyses, maybe some big-O algorithmic analysis of each method, etc.) Depending on your needs, List::MoreUtils may be a better solution; it's certainly more flexible and requires less coding. Remember, premature optimization is a sin... :)
In perl 5.10, there's the close-to-magic ~~ operator:
sub invite_in {
my $vampires = [ qw(Angel Darla Spike Drusilla) ];
return ($_[0] ~~ $vampires) ? 0 : 1 ;
}
See here: http://dev.perl.org/perl5/news/2007/perl-5.10.0.html
Also worth noting for completeness, my usual method for doing this with 2 same-length arrays @keys and @vals which you would prefer were a hash...
my %hash = map { $keys[$_] => $vals[$_] } (0..@keys-1);
Raldi's solution can be tightened up to this (the '=>' from the original is not necessary):
my %hash = map { $_,1 } @array;
This technique can also be used for turning text lists into hashes:
my %hash = map { $_,1 } split(",",$line)
Additionally if you have a line of values like this: "foo=1,bar=2,baz=3" you can do this:
my %hash = map { split("=",$_) } split(",",$line);
[EDIT to include]
Another solution offered (which takes two lines) is:
my %hash;
#The values in %hash can only be accessed by doing exists($hash{$key})
#The assignment only works with '= undef;' and will not work properly with '= 1;'
#if you do '= 1;' only the hash key of $array[0] will be set to 1;
@hash{@array} = undef;
You could also use Perl6::Junction.
use Perl6::Junction qw'any';
my @arr = ( 1, 2, 3 );
if( any(@arr) == 1 ){ ... }
If you do a lot of set-theoretic operations, you can also use Set::Scalar or a similar module. Then $s = Set::Scalar->new( @array ) will build the set for you, and you can query it with $s->contains($m).
You can place the code into a subroutine if you don't want to pollute your namespace.
my $hash_ref =
sub{
my %hash;
@hash{ @{[ qw'one two three' ]} } = undef;
return \%hash;
}->();
Or even better:
sub keylist(@){
my %hash;
@hash{@_} = undef;
return \%hash;
}
my $hash_ref = keylist qw'one two three';
# or
my @key_list = qw'one two three';
my $hash_ref = keylist @key_list;
If you really wanted to pass an array reference:
sub keylist(\@){
my %hash;
@hash{ @{$_[0]} } = undef if @_;
return \%hash;
}
my @key_list = qw'one two three';
my $hash_ref = keylist @key_list;
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @a = qw(5 8 2 5 4 8 9);
my @b = qw(7 6 5 4 3 2 1);
my $h = {};
@{$h}{@a} = @b;
print Dumper($h);
gives (note repeated keys get the value at the greatest position in the array - ie 8->2 and not 6)
$VAR1 = {
'8' => '2',
'4' => '3',
'9' => '1',
'2' => '5',
'5' => '4'
};
You might also want to check out Tie::IxHash, which implements ordered associative arrays. That would allow you to do both types of lookups (hash and index) on one copy of your data.

Resources