Combining array to get an array of hashes - arrays

I have three arrays that need to be stored together for future use. The arrays are related: elements at the same position belong together. The element order in each array will always be correct, but once that order is lost there is no easy way to recover it.
How can I combine these arrays together without losing their original order?
I am assuming that an array of hashes is the best way to go, but, please let me know if I'm wrong in that assumption.
Example Arrays:
my @numbers = (5,2,7,32,9);
my @letters = qw(z b t t c);
my @words = qw(tiny book lawn very dance);
Example end result.
my @combined_arrays = (
{
'number' => '5',
'letter' => 'z',
'word' => 'tiny',
},
{
'number' => '2',
'letter' => 'b',
'word' => 'book',
},
{
'number' => '7',
'letter' => 't',
'word' => 'lawn',
},
{
'number' => '32',
'letter' => 't',
'word' => 'very',
},
{
'number' => '9',
'letter' => 'c',
'word' => 'dance',
},
);

I would do it like this
my @combined_arrays = map +{ "number" => $numbers[$_], "letter" => $letters[$_], "word" => $words[$_] }, 0 .. $#letters;
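For reference, here is a self-contained sketch of the same index-based map using only core Perl. The length check at the top is my own addition (an assumption that mismatched arrays should be treated as an error), not part of the answer above.

```perl
use strict;
use warnings;

my @numbers = (5, 2, 7, 32, 9);
my @letters = qw(z b t t c);
my @words   = qw(tiny book lawn very dance);

# Assumption: all three arrays must have the same length; stop otherwise.
die "arrays differ in length\n"
    unless @numbers == @letters && @letters == @words;

# One hash reference per index; the leading + marks the braces as a hash constructor.
my @combined_arrays = map {
    +{ number => $numbers[$_], letter => $letters[$_], word => $words[$_] }
} 0 .. $#letters;

print "$_->{number} $_->{letter} $_->{word}\n" for @combined_arrays;
```

Because the hashes are built in index order, the resulting array keeps the original positions.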

I realise you've already accepted an answer, but I thought I'd just throw out a more concise option that relies on some modules.
I'm using zip (aka mesh, from either List::SomeUtils or List::MoreUtils) and zip_by (from List::UtilsBy), but I'm importing both of them via List::AllUtils.
use strict;
use warnings;
use List::AllUtils qw( zip zip_by );
my @numbers = (5,2,7,32,9);
my @letters = qw(z b t t c);
my @words = qw(tiny book lawn very dance);
my @keys = qw(number letter word);
my @combined = zip_by { +{ zip @keys, @_ } } \@numbers, \@letters, \@words;
It's potentially more readable, but only if you're familiar with what zip and zip_by do. At the very least, it fits inside 80 characters.
Update
I originally had \%{{ zip @keys, @_ }} inside the zip_by. This was to force it to interpret my curlies as a hash-ref. Then I remembered that +{} is a prettier way to disambiguate.
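If installing modules is not an option, roughly the same effect can be had with a hash slice in core Perl. This is only a sketch: the @columns array is a name I introduced for illustration, and it assumes the columns are listed in the same order as @keys.

```perl
use strict;
use warnings;

my @numbers = (5, 2, 7, 32, 9);
my @letters = qw(z b t t c);
my @words   = qw(tiny book lawn very dance);

my @keys    = qw(number letter word);
my @columns = (\@numbers, \@letters, \@words);   # same order as @keys

my @combined = map {
    my $i = $_;
    my %row;
    # Hash slice: assign the i-th element of every column to its key in one go.
    @row{@keys} = map { $_->[$i] } @columns;
    \%row;
} 0 .. $#numbers;

print join(", ", map { "$_=$combined[0]{$_}" } @keys), "\n";
```

Adding a fourth array then only means extending @keys and @columns.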

Related

Identifying elements in one array of hashes that are not in another array of hashes (perl)

I'm a novice perl programmer trying to identify which elements are in one array of hashes but not in another. I'm trying to search through the "new" array, identifying the id, title, and created elements that don't exist from the "old" array.
I believe I have it working with a set of basic for() loops, but I'd like to do it more efficiently. This only came after having tried to use grep() and failed.
These arrays are built from a database as such:
use DBI;
use strict;
use Data::Dumper;
use Array::Utils qw(:all);
sub db_connect_new();
sub db_disconnect_new($);
sub db_connect_old();
sub db_disconnect_old($);
my $dbh_old = db_connect_old();
my $dbh_new = db_connect_new();
# get complete list of articles on each host first (Joomla! system)
my $sql_old = "select id,title,created from mos_content;";
my $sql_new = "select id,title,created from xugc_content;";
my $sth_old = $dbh_old->prepare($sql_old);
my $sth_new = $dbh_new->prepare($sql_new);
$sth_old->execute();
$sth_new->execute();
my $ref_old;
my $ref_new;
my (@rv_old, @rv_new);
while ($ref_old = $sth_old->fetchrow_hashref()) {
push @rv_old, $ref_old;
}
while ($ref_new = $sth_new->fetchrow_hashref()) {
push @rv_new, $ref_new;
}
my @seen = ();
my @notseen = ();
foreach my $i (@rv_old) {
my $id = $i->{id};
my $title = $i->{title};
my $created = $i->{created};
my $seen = 0;
foreach my $j (@rv_new) {
if ($i->{id} == $j->{id}) {
push @seen, $i;
$seen = 1;
}
}
if ($seen == 0) {
print "$i->{id},$i->{title},$i->{state},$i->{catid},$i->{created}\n";
push @notseen, $i;
}
}
The arrays look like this when using Dumper(@rv_old) to print them:
$VAR1 = {
'title' => 'Legal Notice',
'created' => '2004-10-07 00:17:45',
'id' => 14
};
$VAR2 = {
'created' => '2004-11-15 16:04:06',
'id' => 86096,
'title' => 'IRC'
};
$VAR3 = {
'id' => 16,
'created' => '2004-10-07 16:15:29',
'title' => 'About'
};
I tried to use grep() using array references, but I don't think I understand arrays, hashes, and references well enough to do it properly. My failed grep() attempts are below. I'd appreciate any ideas of how to do this properly.
I believe the problem with this is that I don't know how to reference the id field in the second array of hashes. Most of the examples using grep() that I've seen are to just look through an entire array, like you would with regular grep(1). I need to iterate through one array, checking each of the values from the id field with the id field from another array.
my $rv_old_ref = \@rv_old;
my $rv_new_ref = \@rv_new;
for my $i ( 0 .. $#rv_old) {
my $match = grep { $rv_new_ref->$_ == $rv_old_ref->$_ } @rv_new;
push @notseen, $match if !$match;
}
}
I also tried variations on the grep() above:
1) if (($p) = grep ($hash_ref->{id}, @rv_old)) {
2) if ($hash_ref->{id} ~~ @rv_old) {
There are a number of libraries that compare arrays. However, your comparison involves complex data structures (the arrays have hashrefs as elements) and this at least complicates use of all modules that I am aware of.
So here is a way to do it by hand. I use the shown array and its copy with one value changed.
use warnings;
use strict;
use feature 'say';
use List::Util qw(none); # in List::MoreUtils with older Perls
use Data::Dump qw(dd pp);
sub hr_eq {
my ($e1, $e2) = @_;
return 0 if scalar keys %$e1 != scalar keys %$e2;
foreach my $k1 (keys %$e1) {
return 0 if !exists($e2->{$k1}) or $e1->{$k1} ne $e2->{$k1};
}
return 1
}
my @a1 = (
{ 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
{ 'created' => '2004-11-15 16:04:06', 'id' => 86096, 'title' => 'IRC' },
{ 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my @a2 = (
{ 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
{ 'created' => '2004-11-15 16:xxx:06', 'id' => 86096, 'title' => 'IRC' },
{ 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my @only_in_two = grep {
my $e2 = $_;
none { hr_eq($e2, $_) } @a1;
} @a2;
dd \@only_in_two;
This correctly identifies the element in @a2 that doesn't exist in @a1 (with xxx in timestamp).
Notes
This finds what elements of one array are not in another, not the full difference between arrays. It is what the question specifically asks for.
The comparison relies on details of your data structure (hashref); there's no escaping that, unless you want to reach for more comprehensive libraries (like Test::More).
This uses string comparison, ne, even for numbers and timestamps. See whether it makes sense for your real data to use more appropriate comparisons for particular elements.
Searching through a whole list for each element of a list is an O(N*M) algorithm. Solutions of such (quadratic) complexity are usable as long as data isn't too big; however, once data gets big enough so that size increases have clear effects they break down rapidly (slow down to the point of being useless). Time it to get a feel for this in your case.
An O(N+M) approach exists here, utilizing hashes, shown in ikegami's answer. This is much better algorithmically, once the data is large enough for it to show. However, as your array carries a complex data structure (hashrefs), a bit of work is needed to come up with a working program, especially as we don't know the data. But if your data is sizable then you surely want to implement this.
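As a rough illustration of how the hash-based O(N+M) idea could be combined with whole-record comparison, each hashref can be flattened into a string that serves as a lookup key. The record_key helper is hypothetical, and the sketch assumes flat records whose values are defined scalars that never contain a NUL character.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a flat hashref into a string key.
# Assumption: values are plain defined scalars that never contain "\0".
sub record_key {
    my ($rec) = @_;
    return join "\0", map { ($_, $rec->{$_}) } sort keys %$rec;
}

my @a1 = (
    { id => 14, title => 'Legal Notice', created => '2004-10-07 00:17:45' },
    { id => 16, title => 'About',        created => '2004-10-07 16:15:29' },
);
my @a2 = (
    { id => 14, title => 'Legal Notice', created => '2004-10-07 00:17:45' },
    { id => 16, title => 'About',        created => '2004-10-07 16:xxx:29' },
);

# One pass over @a1 to build the lookup, one pass over @a2 to filter.
my %in_a1;
$in_a1{ record_key($_) } = 1 for @a1;
my @only_in_two = grep { !$in_a1{ record_key($_) } } @a2;

print "only in second: $_->{id} ($_->{created})\n" for @only_in_two;
```

Like the version with ne, this compares everything as strings, so 14 and '14.0' would count as different.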
Some comments on filtering.
The question correctly observes that for each element of an array, as it's processed in grep, the whole other array need be checked.
This is done in the body of grep using none from List::Util. It returns true if the code in its block evaluates false for all elements of the list; thus, if "none" of the elements satisfy that code. This is the heart of the requirement: an element must not be found in the other array.
Care is needed with the default $_ variable, since it is used by both grep and none.
In grep's block $_ aliases the currently processed element of the list, as grep goes through them one by one; we save it into a named variable ($e2). Then none comes along and in its block "takes possession" of $_, assigning elements of @a1 to it as it processes them. The current element of @a2 is also available since we have copied it into $e2.
The test performed in none is pulled into a subroutine, which I call hr_eq to emphasize that it is specifically for equality comparison of (elements in) hashrefs.
It is in this sub where the details can be tweaked. Firstly, instead of bluntly using ne for values for each key, you can add custom comparisons for particular keys (numbers must use ==, etc). Then, if your data structures change this is where you'd adjust specifics.
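One possible shape for such per-key comparisons is a small dispatch table consulted inside hr_eq. This is a sketch, not the answer's code: the %compare table is my own naming, and it assumes id is the only numeric field.

```perl
use strict;
use warnings;

# Assumption: 'id' is numeric; every other key falls back to string comparison.
my %compare = (
    id => sub { $_[0] == $_[1] },
);

sub hr_eq {
    my ($e1, $e2) = @_;
    return 0 if keys %$e1 != keys %$e2;
    for my $k (keys %$e1) {
        return 0 unless exists $e2->{$k};
        my $cmp = $compare{$k} || sub { $_[0] eq $_[1] };
        return 0 unless $cmp->($e1->{$k}, $e2->{$k});
    }
    return 1;
}

# '014' and 14 differ as strings but are equal as numbers.
my $same = hr_eq({ id => '014', title => 'About' }, { id => 14, title => 'About' });
my $diff = hr_eq({ id => 14, title => 'About' }, { id => 14, title => 'IRC' });
print "same: $same, diff: $diff\n";
```

New fields with special rules (dates, case-insensitive titles) would get their own entry in the table.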
You could use grep.
for my $new_row (@new_rows) {
say "$new_row->{id} not in old"
if !grep { $_->{id} == $new_row->{id} } @old_rows;
}
for my $old_row (@old_rows) {
say "$old_row->{id} not in new"
if !grep { $_->{id} == $old_row->{id} } @new_rows;
}
But that's an O(N*M) solution, while there exists an O(N+M) solution that would be far faster.
my %old_keys; ++$old_keys{ $_->{id} } for @old_rows;
my %new_keys; ++$new_keys{ $_->{id} } for @new_rows;
for my $new_row (@new_rows) {
say "$new_row->{id} not in old"
if !$old_keys{$new_row->{id}};
}
for my $old_row (@old_rows) {
say "$old_row->{id} not in new"
if !$new_keys{$old_row->{id}};
}
If both of your database connections are to the same database, this can be done far more efficiently within the database itself.
Create a temporary table with three fields, id, old_count (DEFAULT 0) and new_count (DEFAULT 0).
INSERT OR UPDATE from the old table into the temporary table, incrementing old_count in the process.
INSERT OR UPDATE from the new table into the temporary table, incrementing new_count in the process.
SELECT the rows of the temporary table which have 0 for old_count or 0 for new_count.
select mos_content.id, mos_content.title, mos_content.created from mos_content
LEFT JOIN xugc_content USING(id)
WHERE xugc_content.id IS NULL;
Gives you the rows that are in mos_content but not in xugc_content.
That's even shorter than the Perl code.

How to compare two hashes of different levels in perl without using sub routines or modules?

my arrays are
my @arr = ('mars','earth','jupiter');
my @arr1 = ('mercury','mars');
my @arr2 = ('planet','earth','star','sun','planet2','mars');
%space = ( 'earth'=>{
'planet'=> {
'1' =>'US',
'2' =>'UK'
},
'planet2'=>{
'1' =>'AFRICA',
'2' =>'AUS'
}
},
'sun'=>{
'star' =>{
'1' =>'US',
'2' =>'UK'
}
},
'mars' =>{
'planet2' =>{
'1' =>'US',
'2' =>'UK'
}
}
);
now i am comparing the first two arrays in the following manner
foreach (@arr)
{
$arr_hash{$_} =1;
}
foreach my $name (keys %space)
{
foreach my $key (keys %{$space{$name}})
if ($arr_hash{$name} !=1)
{
#do something
}
now how should i compare the third array? I am trying something like this
else
{
if($arr2_hash{$key}{$name} !=1)
{
#do something else
}
I want to check whether each pair in @arr2, for example planet + earth (its first and second elements), is also present as a key combination in %space.
any help?
I've done this twice now in Perl. Once for Test::More's is_deeply() and again for perl5i's are_equal(). Doing it right is not simple. Doing it without subroutines is just silly. If you want to see how this is done, look at are_equal(), though it can be done better.
But I don't think you actually need to compare two hashes.
What I think is happening is you need to check if the things in the various arrays are present in %space. For example...
my @arr = ('mars','earth','jupiter');
That would be true, true, and false.
my @arr1 = ('mercury','mars');
False, true.
my @arr2 = ('planet','earth','star','sun','planet2','mars');
Assuming these are pairs, they're all true.
I'm going to use better variable names than @arr which describe the contents, not the type of the structure. I'm also going to assume that use strict; use warnings; use v5.10; is present.
The first two are simple, loop through the array and check if there's an entry in %space. And we can do both arrays in one loop.
for my $name (@names1, @names2) {
print "$name...";
say $space{$name} ? "Yes" : "No";
}
The third set is a little trickier, and how the data is laid out makes it harder. Putting pairs in a list is awkward, that's what hashes are for. This would make more sense...
my %object_types = (
earth => "planet", sun => "star", mars => "planet2"
);
Then it's easy. Check that $space{$name}{$type} is true.
for my $name (keys %object_types) {
my $type = $object_types{$name};
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
Or if you're stuck with the array we can iterate through the list in pairs.
# $i will be 0, 2, 4, etc...
for( my $i = 0; $i < $#stellar_objects; $i+=2 ) {
my($type, $name) = ($stellar_objects[$i], $stellar_objects[$i+1]);
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
What if you had a hash of types with multiple names to check instead?
my %object_types = (
planet =>['earth'],
star =>['sun'],
planet2 =>['earth','mars']
);
Same idea, but we need an inner loop over the names array. Good use of plural variable names helps keep things straight.
for my $type (keys %object_types) {
my $names = $object_types{$type};
for my $name (@$names) {
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
}
Since these are really a set of pairs to search for, combining them into a big hash is a disservice. A better data structure to feed this search might be a list of pairs.
my #searches = (
[ planet => 'earth' ],
[ star => 'sun' ],
[ planet2 => 'earth' ],
[ planet2 => 'mars' ],
);
for my $search (#searches) {
my($type, $name) = @$search;
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
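One caveat with the lookups above: reading $space{$name}{$type} for a name that is not in %space quietly creates an empty $space{$name} entry (autovivification). A minimal sketch, assuming that side effect is unwanted, checks each level with exists first. The has_pair helper and the trimmed %space are illustrative only.

```perl
use strict;
use warnings;

my %space = (
    earth => { planet => { 1 => 'US' } },
    sun   => { star   => { 1 => 'US' } },
);

# Check each level in turn so a missing name is never created as a side effect.
sub has_pair {
    my ($name, $type) = @_;
    return ( exists $space{$name} && exists $space{$name}{$type} ) ? 1 : 0;
}

print "earth / planet: ", has_pair('earth', 'planet'), "\n";
print "jupiter / planet: ", has_pair('jupiter', 'planet'), "\n";
print "jupiter was created\n" if exists $space{jupiter};
```

The last line prints nothing here, because the short-circuiting && never touches the second level for a missing name.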
For the record, %space is poorly designed. The first two levels are fine, name and type, it's the country hashes that are awkward.
'sun'=>{
'star' =>{
# This part
'1' =>'US',
'2' =>'UK'
}
},
This has none of the advantages of a hash, and all of the disadvantages. The advantage of a hash is it's very fast to look up a single key, but this makes it awkward by making the interesting part a value. If the key is trying to impose an order on the hash, use an array.
sun => {
star => [ 'US', 'UK' ]
},
Then you can get the list of countries as an array reference: $countries = $space{$name}{$type}
If you want fast key lookup and order doesn't matter, use a hash with the keys being the thing stored, and the value being 1 (just a placeholder for "true").
sun => {
star => { 'US' => 1, 'UK' => 1 }
},
This takes advantage of hash key lookup and allows $space{$name}{$type}{$country} to quickly check for existence. The "values" (even though they're stored as keys) are also guaranteed to be unique. This is formally known as a set, a collection of unique values.
And you can store further information in the value.
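A short sketch of the set layout in use, assuming the question's data were reshaped this way (the country FR is made up purely to show a failed lookup):

```perl
use strict;
use warnings;

# Assumed reshaping of the question's data: countries stored as a set (hash keys).
my %space = (
    sun  => { star    => { US => 1, UK => 1 } },
    mars => { planet2 => { US => 1, UK => 1 } },
);

my ($name, $type) = ('sun', 'star');

# Membership test is a single hash lookup.
my $has_uk = exists $space{$name}{$type}{UK} ? 1 : 0;
my $has_fr = exists $space{$name}{$type}{FR} ? 1 : 0;

# The members themselves are the keys; sort for a stable order.
my @countries = sort keys %{ $space{$name}{$type} };

print "UK: $has_uk, FR: $has_fr, all: @countries\n";
```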

Difficulties initializing an array in Perl

I have the following code:
print Dumper($dec_res->{repositories}[0]);
print Dumper($dec_res->{repositories}[1]);
my @repos = ($dec_res->{repositories});
print scalar @repos . "\n";
and the output is the following:
$VAR1 = {
'status' => 'OK',
'name' => 'apir',
'svnUrl' => 'https://url.whatever/svn/apir',
'id' => 39,
'viewvcUrl' => 'https://url.whatever/viewvc/apir/'
};
$VAR1 = {
'status' => 'OK',
'name' => 'CCDS',
'svnUrl' => 'https://url.whatever/svn/CCDS',
'id' => 26,
'viewvcUrl' => 'https://url.whatever/viewvc/CCDS/'
};
1
So my question is why $dec_res->{repositories} is clearly an array but @repos is not?
Here I printed the size but even trying to access elements with $repos[0] still returns an error.
Dumping $repos[0] actually print the whole structure... like dumping $dec_res->{repositories}
$dec_res->{repositories} is clearly an array
It isn't. It is an array reference.
but @repos is not?
It is an array.
You are creating a list that is one item long, and that item is the array reference. You then assign the list to the array, so the array holds that single item.
You need to dereference the array instead.
my @repos = @{$dec_res->{repositories}};
perlref explains more about references in Perl.
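To make the difference concrete, here is a small sketch with a stand-in for the decoded response. The contents of $dec_res are assumed from the Dumper output in the question, trimmed to two fields.

```perl
use strict;
use warnings;

# Stand-in for the decoded response in the question (structure assumed).
my $dec_res = {
    repositories => [
        { name => 'apir', id => 39 },
        { name => 'CCDS', id => 26 },
    ],
};

my @wrong = ($dec_res->{repositories});       # one element: the array reference
my @repos = @{ $dec_res->{repositories} };    # two elements: the hash references

print scalar(@wrong), "\n";
print scalar(@repos), "\n";
print "$repos[1]{name}\n";
```

After dereferencing, $repos[0] and $repos[1] are the individual repository hashes rather than the whole list.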

How do I store a hash which is inside an array element?

The background
I got a Perl module which utilizes an array for its input/output parameters, like this:
Execute({inputfile => $req->{modules}.'filename', param => \@xchange});
Inside the module a hash is built and returned via reference
$param[0] = \%values;
This is all fine and good (I think) and print Dumper @xchange[0]; will output my desired content as
$VAR1 = { '33' => 'Title1', '53' => 'Title2', '21' => 'Title3' };
The goal
I would like to loop over the content and print the key/value pairs one by one, for example like this
%testhash = ('33' => 'Test1', '53' => 'Test2', '21' => 'Test3' );
foreach $key (keys %testhash) {
print "LOOP: $key, value=$testhash{$key}\n";
}
This loop does work as intended and dumping my testhash via print Dumper \%testhash; outputs the same as the array element above
$VAR1 = { '33' => 'Test1', '53' => 'Test2', '21' => 'Test3' };
The problem
The trouble now seems to be that although both structures appear to be of the same kind, I can't get my head around how to properly access the returned hash which is stored inside @xchange[0].
I did try %realhash = @xchange[0]; and %realhash = \@xchange[0];, but then print Dumper \%realhash; will output $VAR1 = { 'HASH(0xa7b29c0)' => undef }; or $VAR1 = { 'REF(0xa7833a0)' => undef }; respectively.
So I either need a way to get the content of @xchange[0] inside a clean new hash or a way to foreach loop over the hash inside the @xchange[0] element.
I guess I am getting screwed by the whole hash reference concept, but I am at a loss here and can't think of another way to google for it.
$xchange[0] is a hash reference. Use the dereference operator %{...} to access it as a hash.
%realhash = %{$xchange[0]};
@xchange[0] is a scalar value; it contains the reference to a hash. When you assign it to a hash
%hash = @xchange[0];
the reference is stringified into something like HASH(0xa7b29c0), and you get the warnings
Scalar value @xchange[0] better written as $xchange[0] at ...
Reference found where even-sized list expected at ...
That is to say, you get these warnings, unless you have been so foolish as to not turn warnings on with use warnings.
The first one means what it says. The second one means that the list you assign to a hash should have an even number of elements: one value for every key. You only passed a "key" (something that Perl took as a key). The value then becomes undef, as noted in your Data::Dumper output:
$VAR1 = { 'HASH(0xa7b29c0)' => undef }
What you need to do is dereference the reference.
my $href = $xchange[0];
my %hash = %$href; # using a transition variable
my %hash2 = %{ $xchange[0] }; # using support curly braces
perldsc
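If the goal is only to loop over the key/value pairs, the copy into a new hash can be skipped altogether. A minimal sketch, using the sample data from the question as a stand-in for what the module returns (the numeric sort is my addition, to get a stable order):

```perl
use strict;
use warnings;

# Stand-in for the array the module fills in (contents taken from the question).
my @xchange = ( { 33 => 'Title1', 53 => 'Title2', 21 => 'Title3' } );

# Loop over the referenced hash directly; no copy into a new hash is needed.
my @lines;
for my $key (sort { $a <=> $b } keys %{ $xchange[0] }) {
    push @lines, "LOOP: $key, value=$xchange[0]{$key}";
}
print "$_\n" for @lines;
```

Here $xchange[0]{$key} is shorthand for $xchange[0]->{$key}; the arrow is optional between subscripts.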
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys=1;
my %testhash = ('33' => 'Test1', '53' => 'Test2', '21' => 'Test3' );
# Add hash as first element of xchange AoH
my @xchange = \%testhash;
# Dereference 1st element of AoH as a hash
my %realhash = %{ $xchange[0] };
# Dump new hash
print Dumper(\%realhash);
__END__
$VAR1 = {
'21' => 'Test3',
'33' => 'Test1',
'53' => 'Test2'
};

Iterate through Array of Hashes in a Hash in Perl

I have an Array of Hashes in a Hash that looks like this:
$VAR1 = {
'file' => [
{
'pathname' => './out.log',
'size' => '51',
'name' => 'out.log',
'time' => '1345799296'
},
{
'pathname' => './test.pl',
'size' => '2431',
'name' => 'test.pl',
'time' => '1346080709'
},
{
'pathname' => './foo/bat.txt',
'size' => '24',
'name' => 'bat.txt',
'time' => '1345708287'
},
{
'pathname' => './foo/out.log',
'size' => '75',
'name' => 'out.log',
'time' => '1346063384'
}
]
};
How can I iterate through these "file entries" in a loop and access their values? Is it easier to copy my @array = @{ $filelist{file} }; so I only have an array of hashes?
No need to copy:
foreach my $file (@{ $filelist{file} }) {
print "path: $file->{pathname}; size: $file->{size}; ...\n";
}
There are no arrays of hashes in Perl, only arrays of scalars. It only happens that there's a bunch of syntactic sugar in case those scalars are references to arrays or hashes.
In your example, $VAR1 holds a reference to a hash containing a reference to an array containing references to hashes. Yeah, that's quite a lot of nesting to deal with. Plus, the outer hash seems kinda useless, since it contains only one value. So yes, I think giving the inner array a meaningful name would definitely make things clearer. It's not actually a deep "copy": only the references to the inner hashes are copied, not their contents. All of the following are equivalent:
my @files = @{ $VAR1->{file} }; # dereferencing with the -> operator
my @files = @{ ${$VAR1}{file} }; # dereferencing with the sigil{ref} syntax
my @files = @{ $$VAR1{file} }; # same as above with syntactic sugar
Note that when using the sigil{ref} syntax, the sigil obeys the same rules as usual: %{$ref} (or %$ref) is the hash referenced by $ref, but the element of %{$ref} for a given key is ${$ref}{key} (or $$ref{key}). The braces can contain arbitrary code returning a reference, while the short version can only be used when a scalar variable already holds the reference.
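The sigil rules described here can be checked with a small sketch. The $filelist structure is a trimmed-down stand-in for the one in the question, and the variable names are mine.

```perl
use strict;
use warnings;

# Trimmed-down version of the structure in the question.
my $filelist = {
    file => [
        { name => 'out.log', size => 51 },
        { name => 'test.pl', size => 2431 },
    ],
};

# Three spellings of the same element access on a hash reference.
my $a1 = $filelist->{file};
my $a2 = ${$filelist}{file};
my $a3 = $$filelist{file};

# All three hold the same array reference, so they compare equal numerically.
print "same reference\n" if $a1 == $a2 && $a2 == $a3;

# Whole-container dereference uses the container's own sigil.
my @files = @{ $filelist->{file} };
my %first = %{ $files[0] };
print "$first{name}: $first{size}\n";
```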
Once your array of references to hashes is in a variable, iterating over it is as easy as:
for (@files) {
my %file = %$_;
# do stuff with %file
}
See: http://perldoc.perl.org/perlref.html

Resources