I have this simple piece of Perl code:
#!/usr/bin/perl
use Module::CoreList;
use local::lib;
#print ref( $Module::CoreList::version{5.014002});
@m = sort keys $Module::CoreList::version{5.014002};
I want to list the modules that shipped with a particular Perl version. But when I tried to run it, I got:
Experimental keys on scalar is now forbidden at ./a line 5.
Type of arg 1 to keys must be hash or array (not hash element) at ./a line 5, near "};"
Execution of ./a aborted due to compilation errors.
But why is $Module::CoreList::version{5.014002} treated as a scalar, when the value is a hash (or rather a list, once you ask for the keys of that hash)?
As per the documentation of Module::CoreList, %Module::CoreList::version is a hash of hashes keyed on Perl version. Each element of the outer hash is therefore a hash reference (a scalar), not a hash.
You need to dereference the hash reference, by putting a % in front of it, as follows:
@m = sort keys %{ $Module::CoreList::version{5.014002} };
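As a minimal sketch of the same dereference, here it is applied to a made-up hash of hashes. The name %version and its contents are hypothetical stand-ins, not the real Module::CoreList data:

```perl
use strict;
use warnings;

# Hypothetical stand-in for %Module::CoreList::version:
# each value is a hash REFERENCE, which is a scalar.
my %version = (
    '5.014002' => { 'strict' => '1.04', 'warnings' => '1.12', 'Carp' => '1.20' },
);

# $version{'5.014002'} holds a hash ref; %{ ... } dereferences it
# so that keys() is given a real hash.
my @modules = sort keys %{ $version{'5.014002'} };

print "@modules\n";    # prints "Carp strict warnings"
```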
Related
I'm a beginner (a wet lab biologist, who has to fiddle a bit with bioinformatics for the first time in my life) and today I've got stuck on one problem: how to parse an array to a hash of arrays in perl?
This doesn't work:
@myhash{$key} = @mytable;
I've finally circumvented my problem with a for loop:
for(my $i=0;$i<=$#mytable;$i++){$myhash{$key}[$i]=$mytable[$i]};
Of course it works and it does what I need to be done, but it seems to me not a solution to my problem, but just a way to circumvent it... When something doesn't work I like to understand why...
Thank you very much for your advice!
If you are asking how to put an array as one value of a hash, you do this by taking a reference to the array, since references are scalars and the values of hashes must be scalars. This is done with the backslash operator.
$myhash{$key} = \@mytable;
The for loop you describe creates such a reference through autovivification, as $myhash{$key}[0] creates an array reference at $myhash{$key} in order to assign to its index. Also note that the difference between taking a reference and copying each value is that in the former case, changes to the array after the fact will also affect the values referenced via the hash value, and vice versa.
$mytable[5] = 42; # $myhash{$key}[5] is also changed
As Grinnz mentioned, you can store a reference to the array, but any later change to the array will be reflected in the hash (it is the same data).
For example, if you reuse the same array inside a loop, the data in the hash will reflect only the last iteration of the loop.
In that case you will want to store a copy of the array in the hash.
@{$hash{$key}} = @array;
Programming Perl: Data structures
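The difference between storing a reference and storing a copy can be seen in a small sketch. The keys "shared" and "copy" are made up for illustration:

```perl
use strict;
use warnings;

my @mytable = (1, 2, 3);
my %myhash;

$myhash{shared} = \@mytable;      # reference: the same underlying array
@{ $myhash{copy} } = @mytable;    # copy: autovivifies a new, independent array

$mytable[0] = 42;                 # change the original afterwards

my $shared = join ' ', @{ $myhash{shared} };
my $copy   = join ' ', @{ $myhash{copy} };

print "shared: $shared\n";        # prints "shared: 42 2 3"
print "copy: $copy\n";            # prints "copy: 1 2 3"
```

Writing $myhash{copy} = [ @mytable ]; should give the same independent copy.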
I have recently been studying hash tables, and I understand the basic steps are:
create an array, for example
hashtable ht[4];
hash the key
int hash = hash_key(key);
get the index
int index = hash % 4
set to hashtable
ht[index] = insert_or_update(value)
I know there is a hash collision problem: if key1 and key2 have the same hash, they go to the same ht[index], and separate chaining can solve this.
Keys with the same hash go to the same bucket, where they are stored in a linked list.
My question is: what happens if the hashes are different, but the modulus is the same?
For example,
hash(key1): 3
hash(key2): 7
hash(key3): 11
hash(key4): 15
So the index is 3 for all of them: these keys, with different hashes and different key values, go to the same bucket.
I searched Google for some hash table implementations, and it seems they don't deal with this situation. Am I overthinking this? Is anything wrong?
For example, these implementations:
https://gist.github.com/tonious/1377667#file-hash-c-L139
http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)HashTables.html?highlight=%28CategoryAlgorithmNotes%29#CA-552d62422da2c22f8793edef9212910aa5fe0701_156
redis:
https://github.com/antirez/redis/blob/unstable/src/dict.c#L488
nginx:
https://github.com/nginx/nginx/blob/master/src/core/ngx_hash.c#L34
They just compare whether the keys are equal.
If two objects' keys hash to the same bucket, it doesn't really matter if it's because they have the same hash, or because they have different hashes but they both map (via modulo) to the same bucket. As you note, a collision that occurs because of either of these situations is commonly dealt with by placing both objects in a bucket-specific list.
When we look for an object in a hashtable, we are looking for an object that shares the same key. The hashing / modulo operation is just used to tell us in which bucket we should look to see if the object is present. Once we've found the proper bucket, we still need to compare the keys of any found objects (i.e., the objects in the bucket-specific list) directly to be sure we've found a match.
So the situation of two objects with different hashes but that map to the same bucket works for the same reason that two objects with the same hashes works: we only use the bucket to find candidate matches, and rely on the key itself to determine a true match.
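The point above can be illustrated with a toy separate-chaining table. This is only a sketch: toy_hash, ht_set and ht_get are hypothetical names, and the hash function (sum of character codes) is chosen purely so that two keys with different hashes land in the same bucket:

```perl
use strict;
use warnings;

my $NUM_BUCKETS = 4;
# Each bucket is a list of [key, value] pairs.
my @buckets = map { [] } 1 .. $NUM_BUCKETS;

# Toy hash function (an assumption for illustration): sum of character codes.
sub toy_hash {
    my ($key) = @_;
    my $h = 0;
    $h += ord($_) for split //, $key;
    return $h;
}

sub ht_set {
    my ($key, $value) = @_;
    my $bucket = $buckets[ toy_hash($key) % $NUM_BUCKETS ];
    for my $pair (@$bucket) {
        # Match on the key itself, never on the hash or the index.
        if ($pair->[0] eq $key) { $pair->[1] = $value; return; }
    }
    push @$bucket, [ $key, $value ];
    return;
}

sub ht_get {
    my ($key) = @_;
    my $bucket = $buckets[ toy_hash($key) % $NUM_BUCKETS ];
    for my $pair (@$bucket) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;    # not found
}

# "a" hashes to 97 and "e" to 101: different hashes, same bucket (index 1).
ht_set('a', 'first');
ht_set('e', 'second');
print ht_get('a'), "\n";    # prints "first"
print ht_get('e'), "\n";    # prints "second"
```

Both entries end up in the same chain, and the lookup still finds the right one because the final comparison is on the key.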
I have a large array of approx 100,000 items, and a small array of approx 1000 items. I need to search the large array for each of the strings in the small array, and I need the index of the string returned. (So I need to search the 100k array 1000 times)
The large array has been sorted so I guess some kind of binary chop type search would be a lot more efficient than using a foreach loop (using 'last' to break the loop when found) which is what I started with. (this first attempt results in some 30m comparisons!)
Is there a built in search method that would produce a more efficient result, or am I going to have to manually code a binary search? I also want to avoid using external modules.
For the purposes of the question, just assume that I need to find the index of a single string in the large sorted array. (I only mention the 1000 items to give an idea of the scale)
This sounds like classic hash use case scenario,
my %index_for = map { $large_array[$_] => $_ } 0 .. $#large_array;
print "index in large array: ", $index_for{ $small_array[0] };
Using a binary search is probably optimal here. Binary search needs only O(log n) comparisons per lookup (here about 17, since log2(100,000) is roughly 16.6).
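A hand-written binary search needs no external modules. The following is a minimal sketch, assuming the large array was sorted with the default string comparison; binary_search is a hypothetical helper name and the sample data is made up:

```perl
use strict;
use warnings;

# Returns the index of $target in the sorted array referenced by $sorted,
# or -1 if it is absent. Assumes the array is sorted by string order (cmp).
sub binary_search {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    while ($lo <= $hi) {
        my $mid = int( ($lo + $hi) / 2 );
        my $cmp = $target cmp $sorted->[$mid];
        if    ($cmp < 0) { $hi = $mid - 1 }
        elsif ($cmp > 0) { $lo = $mid + 1 }
        else             { return $mid }
    }
    return -1;
}

my @large_array = sort qw(pear apple fig banana cherry);
print binary_search(\@large_array, 'cherry'), "\n";   # prints 2
print binary_search(\@large_array, 'kiwi'),   "\n";   # prints -1
```

If the array was sorted numerically or with a custom comparator, the cmp inside the loop would have to be replaced with the same comparison.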
Alternatively, you can create a hash table that maps items to their indices:
my %positions;
$positions{ $large_array[$_] } = $_ for 0 .. $#large_array;
for my $item (@small_array) {
say "$item has position $positions{$item}";
}
While now each lookup is possible in O(1) without any comparisons, you do have to create the hash table first. This may or may not be faster. Note that hashes can only use strings for keys. If your items are complex objects with their own concept of equality, you will have to derive a suitable key first.
I am new to Perl and having some difficulty with arrays. Can somebody explain to me why I am not able to print the value of an array in the script below?
$sum=();
$min = 999;
$LogEntry = '';
foreach $item (1, 2, 3, 4, 5)
{
$min = $item if $min > $item;
if ($LogEntry eq '') {
push(@sum,"1"); }
print "debugging the IF condition\n";
}
print "Array is: $sum\n";
print "Min = $min\n";
The output I get is:
debugging the IF condition
debugging the IF condition
debugging the IF condition
debugging the IF condition
debugging the IF condition
Array is:
Min = 1
Shouldn't I get Array is: 1 1 1 1 1 (5 times)?
Can somebody please help?
Thanks.
You need two things:
use strict;
use warnings;
at which point the bug in your code ($sum instead of @sum) should become obvious...
$sum is not the same variable as @sum.
In this case you would benefit from starting your script with:
use strict;
use warnings;
strict forces you to declare all variables, and warnings warns you about likely mistakes.
In the meantime, change the first line to:
@sum = ();
and the second-to-last line to:
print "Array is: " . join(', ', @sum) . "\n";
See join.
As others have noted, you need to understand the way Perl uses sigils ($, @, %) to denote data structures and access to the data in them.
You are using a scalar sigil ($), which simply tries to access a scalar variable named $sum. That variable has nothing to do with the completely distinct array variable named @sum, and you obviously want the latter.
What confuses you is likely the fact that, once the array variable @sum exists, you can access individual values in the array using the $sum[0] syntax, but there the sigil plus brackets ($[]) act as a single "unified" syntactic construct.
The first thing you need to do (after using strict and warnings) is to read the following documentation on sigils in Perl (aside from a good Perl book):
https://stackoverflow.com/a/2732643/119280 - brian d. foy's excellent summary
The rest of the answers to the same question
This SO answer
The best summary I can give you on the syntax of accessing data structures in Perl is (quoting from my older comment)
the sigil represents the amount of data from the data structure that you are retrieving ($ for 1 element, @ for a list of elements, % for an entire hash)
whereas the brace style represent what your data structure is (square for array, curly for hash).
As a special case, when there are NO braces, the sigil will represent BOTH the amount of data, as well as what the data structure is.
Please note that in your specific case, it's the last bullet point that matters. Since you're referring to the array as a whole, you won't have braces, and therefore the sigil will represent the data structure type: since it's an array, you must use the @ sigil.
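A short sketch of those sigil and bracket combinations, using made-up variables (%age and its contents are purely illustrative):

```perl
use strict;
use warnings;

my @sum = (1, 1, 1);              # @ with no brackets: the whole array
my %age = (al => 30, bo => 25);   # % with no brackets: the whole hash

my $first = $sum[0];              # $ + [] : one element of an array
my @pair  = @sum[0, 1];           # @ + [] : a list of elements (array slice)
my $al    = $age{al};             # $ + {} : one value of a hash
my @both  = @age{qw(al bo)};      # @ + {} : a list of values (hash slice)

print "$first | @pair | $al | @both\n";   # prints "1 | 1 1 | 30 | 30 25"
```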
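You push the values into the array @sum, then finish up by printing the scalar $sum. @sum and $sum are two completely independent variables. If you print "@sum\n" instead, you should get the output "1 1 1 1 1" (an array interpolated into a double-quoted string has its elements joined with a space by default).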
print "Array is: $sum\n";
will print a non-existent scalar variable called $sum, not the array @sum and not the first item of the array.
If you 'use strict' it will flag the use of undeclared variables like this.
You should definitely add use strict; and use warnings; to your script. That would have complained about the print "Array is: $sum\n"; line (among others).
And you initialize an array with my @sum=(); not with my $sum=();
Like CFL_Jeff mentions, you can't just do a quick print. Instead, do something like:
print "Array is ".join(', ',@array);
Still would like to add some details to this picture. )
See, Perl is well-known as a Very High Level Language. And this is not just because you can replace (1,2,3,4,5) with (1..5) and get the same result.
And not because you may leave your variables without (explicitly) assigning some initial values to them: my @arr is as good as my @arr = (), and my $scal (instead of my $scal = 'some filler value') may actually save you an hour or two one day. Perl is usually (with use warnings, yes) good at spotting undefined values in unusual places - but not so lucky with 'filler values'...
The true point of VHLL is that, in my opinion, you can express a solution in Perl code just like in any human language available (and some may be even less suitable for that case).
Don't believe me? Ok, check your code - or rather your set of tasks, for example.
Need to find the lowest element in an array? Or the sum of all values in an array? The List::Util module is at your command:
use feature 'say';    # enables say()
use List::Util qw( min sum );
my @array_of_values = (1..10);
my $min_value = min( @array_of_values );
my $sum_of_values = sum( @array_of_values );
say "In the beginning was... @array_of_values";
say "With lowest point at $min_value";
say "Collected they give $sum_of_values";
Need to construct an array from another array, filtering out unneeded values? grep is here to save the day:
@filtered_array = grep { $filter_condition } @source_array;
See the pattern? Don't try to code your solution into some machine-codish mumbo-jumbo. ) Find a solution in your own language, then just find means to translate THAT solution into Perl code instead. It's easier than you thought. )
Disclaimer: I do understand that reinventing the wheel may be good for learning why wheels are so useful in the first place. ) But I do see how often wheels are reimplemented - becoming uglier and slower in the process - in production code, just because people got used to this mode of thinking.
I need to see if there are duplicates in an array of strings, what's the most time-efficient way of doing it?
One of the things I love about Perl is its ability to almost read like English. It just sort of makes sense.
use strict;
use warnings;
my @array = qw/yes no maybe true false false perhaps no/;
my %seen;
foreach my $string (@array) {
next unless $seen{$string}++;
print "'$string' is duplicated.\n";
}
Output
'false' is duplicated.
'no' is duplicated.
Turning the array into a hash is the fastest way [O(n)], though it's memory inefficient. Using a for loop is a bit faster than grep, but I'm not sure why.
#!/usr/bin/perl
use strict;
use warnings;
my %count;
my %dups;
for (@array) {
$dups{$_}++ if $count{$_}++;
}
A memory efficient way is to sort the array in place and iterate through it looking for equal and adjacent entries.
# not exactly sort in place, but Perl does a decent job optimizing it
@array = sort @array;
my $last;
my %dups;
for my $entry (@array) {
$dups{$entry}++ if defined $last and $entry eq $last;
$last = $entry;
}
This runs in O(n log n) time, because of the sort, but only needs to store the duplicates rather than a second copy of the data in %count. Worst case memory usage is still O(n) (when everything is duplicated), but if your array is large and there aren't a lot of duplicates you'll win.
Theory aside, benchmarking shows the latter starts to lose on large arrays (like over a million) with a high percentage of duplicates.
If you need the uniquified array anyway, it is fastest to use the heavily-optimized library List::MoreUtils, and then compare the result to the original:
use strict;
use warnings;
use List::MoreUtils 'uniq';
my @array = qw(1 1 2 3 fibonacci!);
my @array_uniq = uniq @array;
# Outer parentheses are needed: a bare print (...) would drop " found!\n".
print( ( @array == @array_uniq ? "no dupes" : "dupes" ) . " found!\n" );
Or if the list is large and you want to bail as soon as a duplicate entry is found, use a hash:
my %uniq_elements;
foreach my $element (@array)
{
die "dupe found!" if $uniq_elements{$element}++;
}
Create a hash or a set or use a collections.Counter().
As you encounter each string/input check to see if there's an instance of that in the hash. If so, it's a duplicate (do whatever you want about those). Otherwise add a value (such as, oh, say, the numeral one) to the hash, using the string as the key.
Example (using Python collections.Counter):
#!python
import collections
counts = collections.Counter(mylist)
uniq = [i for i, c in counts.items() if c == 1]   # iteritems() on Python 2
dupes = [i for i, c in counts.items() if c > 1]
These Counters are built around dictionaries (Python's name for hashed mapping collections).
This is time efficient because hash keys are indexed. In most cases, key lookup and insertion take near-constant time. (In fact Perl "hashes" are so-called because they are implemented using an algorithmic trick called "hashing" --- a sort of checksum chosen for its low probability of collision when fed arbitrary inputs).
If you initialize values to integers, starting with 1, then you can increment each value as you find its key already in the hash. This is just about the most efficient general purpose means of counting strings.
Not a direct answer, but this will return an array without duplicates:
#!/usr/bin/perl
use strict;
use warnings;
my @arr = ('a','a','a','b','b','c');
my %count;
my @arr_no_dups = grep { !$count{$_}++ } @arr;
print @arr_no_dups, "\n";
Please don't ask about the most time efficient way to do something unless you have some specific requirements, such as "I have to dedupe a list of 100,000 integers in under a second." Otherwise, you're worrying about how long something takes for no reason.
Similar to @Schwern's second solution, but checks for duplicates a little earlier, from within the comparison function of sort:
use strict;
use warnings;
@_ = sort { print "dup = $a$/" if $a eq $b; $a cmp $b } @ARGV;
It won't be as fast as the hashing solutions, but it requires less memory and is pretty darn cute.