Iterate through Array of Hashes in a Hash in Perl

I have an Array of Hashes in a Hash that looks like this:
$VAR1 = {
'file' => [
{
'pathname' => './out.log',
'size' => '51',
'name' => 'out.log',
'time' => '1345799296'
},
{
'pathname' => './test.pl',
'size' => '2431',
'name' => 'test.pl',
'time' => '1346080709'
},
{
'pathname' => './foo/bat.txt',
'size' => '24',
'name' => 'bat.txt',
'time' => '1345708287'
},
{
'pathname' => './foo/out.log',
'size' => '75',
'name' => 'out.log',
'time' => '1346063384'
}
]
};
How can I iterate through these "file entries" in a loop and access their values? Is it easier to copy my @array = @{ $filelist{file} }; so I only have an array of hashes?

No need to copy:
foreach my $file (@{ $filelist{file} }) {
print "path: $file->{pathname}; size: $file->{size}; ...\n";
}
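A self-contained version of that loop, as a sketch. It assumes the Dumper output came from \%filelist (so %filelist is a plain hash, as the answer's $filelist{file} implies); the running total and the @names list are added only for illustration.

```perl
use strict;
use warnings;

# Assumption: %filelist is a plain hash whose 'file' key holds an array reference
my %filelist = (
    file => [
        { pathname => './out.log',     size => '51',   name => 'out.log', time => '1345799296' },
        { pathname => './test.pl',     size => '2431', name => 'test.pl', time => '1346080709' },
        { pathname => './foo/bat.txt', size => '24',   name => 'bat.txt', time => '1345708287' },
        { pathname => './foo/out.log', size => '75',   name => 'out.log', time => '1346063384' },
    ],
);

my $total = 0;
my @names;

# @{ ... } turns the array reference into a list; each $file is a hash reference
for my $file ( @{ $filelist{file} } ) {
    $total += $file->{size};
    push @names, $file->{name};
    print "path: $file->{pathname}; size: $file->{size}\n";
}
print "total size: $total\n";
```

Nothing is copied here: the loop walks the original array, and $file aliases each hash reference in turn.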

There are no arrays of hashes in Perl, only arrays of scalars. It just happens that there's a lot of syntactic sugar for the case where those scalars are references to arrays or hashes.
In your example, $VAR1 holds a reference to a hash containing a reference to an array containing references to hashes. Yeah, that's quite a lot of nesting to deal with. Plus, the outer hash seems kinda useless, since it contains only one value. So yes, I think giving the inner array a meaningful name would definitely make things clearer. It's not actually a deep copy: only the hash references are copied into the new array, not the hashes they point to. All of the following are equivalent:
my @files = @{ $VAR1->{file} };    # dereferencing with the -> operator
my @files = @{ ${$VAR1}{file} };   # dereferencing with the sigil{ref} syntax
my @files = @{ $$VAR1{file} };     # same as above with syntactic sugar
Note that when using the sigil{ref} syntax, the sigil obeys the same rules as usual: %{$ref} (or %$ref) is the hash referenced by $ref, but the element of %{$ref} for a given key is ${$ref}{key} (or $$ref{key}). The braces can contain arbitrary code returning a reference, while the short version can only be used when a scalar variable already holds the reference.
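A small sketch of those rules side by side, using a made-up one-entry structure shaped like the question's: three spellings of one element, the whole-hash form, and a case where the braces are mandatory because the reference comes from an expression rather than a plain scalar variable.

```perl
use strict;
use warnings;

my $ref  = { name => 'out.log', size => 51 };
my $VAR1 = { file => [ $ref ] };

# Element access: these three spellings read the same value
my @names = ( $ref->{name}, ${$ref}{name}, $$ref{name} );

# Whole hash: %{$ref} and the short form %$ref are the same hash
my $key_count  = keys %{$ref};
my $same_count = keys %$ref;

# The short form needs a plain scalar variable; when the reference comes
# from an expression (here $VAR1->{file}), the braces are required
my $file_count = @{ $VAR1->{file} };
my $first_name = ${ $VAR1->{file} }[0]{name};

print "@names / $key_count keys / $file_count file(s) / $first_name\n";
```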
Once your array of references to hashes is in a variable, iterating over it is as easy as:
for (@files) {
my %file = %$_;
# do stuff with %file
}
See: http://perldoc.perl.org/perlref.html
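To make the "only the references are copied" point concrete, here is a sketch using a trimmed copy of the question's data. It also shows what happens if the @{ } dereference is left out.

```perl
use strict;
use warnings;

# $VAR1 as Data::Dumper printed it, trimmed to two entries
my $VAR1 = {
    file => [
        { pathname => './out.log', size => '51',   name => 'out.log', time => '1345799296' },
        { pathname => './test.pl', size => '2431', name => 'test.pl', time => '1346080709' },
    ],
};

# Pitfall: without @{ }, the array gets ONE element, the array reference itself
my @wrong = $VAR1->{file};

# Dereferencing gives one element per file entry
my @files = @{ $VAR1->{file} };

# The copy is shallow: @files holds the same hash references as $VAR1->{file},
# so a change made through @files is visible through $VAR1 as well
$files[0]{size} = 52;

for (@files) {
    my %file = %$_;    # this, by contrast, copies the hash itself
    print "$file{name}: $file{size}\n";
}
```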

Related

Identifying elements in one array of hashes that are not in another array of hashes (perl)

I'm a novice perl programmer trying to identify which elements are in one array of hashes but not in another. I'm trying to search through the "new" array, identifying the id, title, and created elements that don't exist from the "old" array.
I believe I have it working with a set of basic for() loops, but I'd like to do it more efficiently. I only arrived at this after trying to use grep() and failing.
These arrays are built from a database like this:
use DBI;
use strict;
use Data::Dumper;
use Array::Utils qw(:all);
sub db_connect_new();
sub db_disconnect_new($);
sub db_connect_old();
sub db_disconnect_old($);
my $dbh_old = db_connect_old();
my $dbh_new = db_connect_new();
# get complete list of articles on each host first (Joomla! system)
my $sql_old = "select id,title,created from mos_content;";
my $sql_new = "select id,title,created from xugc_content;";
my $sth_old = $dbh_old->prepare($sql_old);
my $sth_new = $dbh_new->prepare($sql_new);
$sth_old->execute();
$sth_new->execute();
my @rv_old;
my @rv_new;
my $ref_old;
my $ref_new;
while ($ref_old = $sth_old->fetchrow_hashref()) {
    push @rv_old, $ref_old;
}
while ($ref_new = $sth_new->fetchrow_hashref()) {
    push @rv_new, $ref_new;
}
my @seen = ();
my @notseen = ();
foreach my $i (@rv_old) {
    my $id = $i->{id};
    my $title = $i->{title};
    my $created = $i->{created};
    my $seen = 0;
    foreach my $j (@rv_new) {
        if ($i->{id} == $j->{id}) {
            push @seen, $i;
            $seen = 1;
        }
    }
    if ($seen == 0) {
        print "$i->{id},$i->{title},$i->{created}\n";
        push @notseen, $i;
    }
}
The arrays look like this when using Dumper(@rv_old) to print them:
$VAR1 = {
'title' => 'Legal Notice',
'created' => '2004-10-07 00:17:45',
'id' => 14
};
$VAR2 = {
'created' => '2004-11-15 16:04:06',
'id' => 86096,
'title' => 'IRC'
};
$VAR3 = {
'id' => 16,
'created' => '2004-10-07 16:15:29',
'title' => 'About'
};
I tried to use grep() using array references, but I don't think I understand arrays, hashes, and references well enough to do it properly. My failed grep() attempts are below. I'd appreciate any ideas of how to do this properly.
I believe the problem with this is that I don't know how to reference the id field in the second array of hashes. Most of the grep() examples I've seen just search an entire array, as you would with regular grep(1). I need to iterate through one array, checking each value of its id field against the id field of the other array.
my $rv_old_ref = \@rv_old;
my $rv_new_ref = \@rv_new;
for my $i ( 0 .. $#rv_old) {
    my $match = grep { $rv_new_ref->$_ == $rv_old_ref->$_ } @rv_new;
    push @notseen, $match if !$match;
}
I also tried variations on the grep() above:
1) if (($p) = grep ($hash_ref->{id}, @rv_old)) {
2) if ($hash_ref->{id} ~~ @rv_old) {
There are a number of libraries that compare arrays. However, your comparison involves complex data structures (the arrays have hashrefs as elements) and this at least complicates use of all modules that I am aware of.
So here is a way to do it by hand. I use the shown array and its copy with one value changed.
use warnings;
use strict;
use feature 'say';
use List::Util qw(none); # in List::MoreUtils with older Perls
use Data::Dump qw(dd pp);
sub hr_eq {
    my ($e1, $e2) = @_;
    return 0 if scalar keys %$e1 != scalar keys %$e2;
    foreach my $k1 (keys %$e1) {
        return 0 if !exists($e2->{$k1}) or $e1->{$k1} ne $e2->{$k1};
    }
    return 1;
}
my @a1 = (
    { 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
    { 'created' => '2004-11-15 16:04:06', 'id' => 86096, 'title' => 'IRC' },
    { 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my @a2 = (
    { 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
    { 'created' => '2004-11-15 16:xxx:06', 'id' => 86096, 'title' => 'IRC' },
    { 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my @only_in_two = grep {
    my $e2 = $_;
    none { hr_eq($e2, $_) } @a1;
} @a2;
dd \@only_in_two;
This correctly identifies the element in @a2 that doesn't exist in @a1 (the one with xxx in its timestamp).
Notes
This finds what elements of one array are not in another, not the full difference between arrays. It is what the question specifically asks for.
The comparison relies on details of your data structure (hashref); there's no escaping that, unless you want to reach for more comprehensive libraries (like Test::More).
This uses string comparison, ne, even for numbers and timestamps. See whether it makes sense for your real data to use more appropriate comparisons for particular elements.
Searching through a whole list for each element of another list is an O(N*M) algorithm. Solutions of such (quadratic) complexity are usable as long as the data isn't too big; once it grows large enough for size increases to have a clear effect, they degrade rapidly (slowing down to the point of being useless). Time it to get a feel for this in your case.
An O(N+M) approach that uses hashes exists here, shown in ikegami's answer. It is much better algorithmically, once the data is large enough for the difference to show. However, since your arrays carry complex data structures (hashrefs), a bit of work is needed to come up with a working program, especially as we don't know the data. But if your data is sizable then you surely want to implement it.
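One way to do that work, as a sketch that is not taken from either answer: reduce each hashref to a canonical string key and use ordinary hash lookups. The helper row_key is hypothetical; it assumes every value is defined and that no key or value contains the "\0" separator. Like hr_eq, it compares values as strings.

```perl
use strict;
use warnings;

# Hypothetical helper: canonical string for a flat hashref. Keys are sorted,
# so key order doesn't matter. Assumes defined values and no "\0" in the data.
sub row_key {
    my ($h) = @_;
    return join "\0", map { ($_, $h->{$_}) } sort keys %$h;
}

my @a1 = (
    { title => 'Legal Notice', created => '2004-10-07 00:17:45', id => 14 },
    { created => '2004-11-15 16:04:06', id => 86096, title => 'IRC' },
    { id => 16, created => '2004-10-07 16:15:29', title => 'About' },
);
my @a2 = (
    { title => 'Legal Notice', created => '2004-10-07 00:17:45', id => 14 },
    { created => '2004-11-15 16:xxx:06', id => 86096, title => 'IRC' },
    { id => 16, created => '2004-10-07 16:15:29', title => 'About' },
);

# One pass over @a1 builds the lookup table ...
my %in_a1;
$in_a1{ row_key($_) } = 1 for @a1;

# ... and one pass over @a2 does one hash lookup per element: O(N+M) overall
my @only_in_two = grep { !$in_a1{ row_key($_) } } @a2;

print scalar(@only_in_two), " element(s) only in \@a2, id $only_in_two[0]{id}\n";
```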
Some comments on filtering.
The question correctly observes that for each element of an array, as it's processed in grep, the whole other array needs to be checked.
This is done in the body of grep using none from List::Util. It returns true if the code in its block evaluates false for all elements of the list; thus, if "none" of the elements satisfy that code. This is the heart of the requirement: an element must not be found in the other array.
Care is needed with the default $_ variable, since it is used by both grep and none.
In grep's block $_ aliases the currently processed element of the list, as grep goes through them one by one; we save it into a named variable ($e2). Then none comes along and in its block "takes possession" of $_, assigning elements of @a1 to it as it processes them. The current element of @a2 is also available since we have copied it into $e2.
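A minimal sketch of why the copy into $e2 matters, comparing ids only on made-up data: without it, both sides of the comparison inside none's block refer to the same @a1 element.

```perl
use strict;
use warnings;
use List::Util qw(none);   # none() is in List::Util 1.33 and later

my @a1 = ( { id => 14 }, { id => 16 } );
my @a2 = ( { id => 14 }, { id => 17 } );

# Buggy: inside none's block $_ is the @a1 element on BOTH sides of ==,
# so every element "matches itself" and grep keeps nothing
my @buggy = grep { none { $_->{id} == $_->{id} } @a1 } @a2;

# Fixed: save grep's $_ in a named variable before none rebinds $_
my @fixed = grep {
    my $e2 = $_;
    none { $_->{id} == $e2->{id} } @a1;
} @a2;

print "buggy: ", scalar(@buggy), ", fixed: ", join(',', map { $_->{id} } @fixed), "\n";
```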
The test performed in none is pulled into a subroutine, which I call hr_eq to emphasize that it is specifically for equality comparison of (elements in) hashrefs.
It is in this sub where the details can be tweaked. Firstly, instead of bluntly using ne for values for each key, you can add custom comparisons for particular keys (numbers must use ==, etc). Then, if your data structures change this is where you'd adjust specifics.
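As an illustration of that kind of tweak, here is a variant of hr_eq that compares some keys numerically. The %numeric_keys table is a hypothetical name, and treating only id as numeric is an assumption about the data.

```perl
use strict;
use warnings;

# Assumption for illustration: 'id' is numeric and compared with ==,
# every other key as a string with ne. %numeric_keys is a hypothetical name.
my %numeric_keys = ( id => 1 );

# A variant of hr_eq with a per-key comparison
sub hr_eq {
    my ($e1, $e2) = @_;
    return 0 if scalar keys %$e1 != scalar keys %$e2;
    for my $k (keys %$e1) {
        return 0 if !exists $e2->{$k};
        if ( $numeric_keys{$k} ) {
            return 0 if $e1->{$k} != $e2->{$k};
        }
        else {
            return 0 if $e1->{$k} ne $e2->{$k};
        }
    }
    return 1;
}

# '14.0' == 14 numerically, although the strings differ
my $same_id    = hr_eq( { id => 14, title => 'Legal Notice' }, { id => '14.0', title => 'Legal Notice' } );
# titles are still compared as strings, so case matters
my $diff_title = hr_eq( { id => 14, title => 'Legal Notice' }, { id => 14, title => 'legal notice' } );

print "same_id=$same_id diff_title=$diff_title\n";
```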
You could use grep.
for my $new_row (@new_rows) {
    say "$new_row->{id} not in old"
        if !grep { $_->{id} == $new_row->{id} } @old_rows;
}
for my $old_row (@old_rows) {
    say "$old_row->{id} not in new"
        if !grep { $_->{id} == $old_row->{id} } @new_rows;
}
But that's an O(N*M) solution, while there exists an O(N+M) solution that would be far faster.
my %old_keys; ++$old_keys{ $_->{id} } for @old_rows;
my %new_keys; ++$new_keys{ $_->{id} } for @new_rows;
for my $new_row (@new_rows) {
    say "$new_row->{id} not in old"
        if !$old_keys{$new_row->{id}};
}
for my $old_row (@old_rows) {
    say "$old_row->{id} not in new"
        if !$new_keys{$old_row->{id}};
}
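To try the hash-lookup version end to end, here it is with a few made-up rows standing in for the fetchrow_hashref() results, collecting the ids instead of printing them one at a time.

```perl
use strict;
use warnings;

# Made-up rows standing in for the two fetchrow_hashref() result sets
my @old_rows = ( { id => 14 }, { id => 86096 }, { id => 16 } );
my @new_rows = ( { id => 14 }, { id => 16 },    { id => 17 } );

# Index each side by id once
my %old_keys; ++$old_keys{ $_->{id} } for @old_rows;
my %new_keys; ++$new_keys{ $_->{id} } for @new_rows;

# Each check is now a hash lookup instead of a scan of the other array
my @missing_from_new = map { $_->{id} } grep { !$new_keys{ $_->{id} } } @old_rows;
my @missing_from_old = map { $_->{id} } grep { !$old_keys{ $_->{id} } } @new_rows;

print "not in new: @missing_from_new\n";
print "not in old: @missing_from_old\n";
```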
If both of your database connections are to the same database, this can be done far more efficiently within the database itself.
Create a temporary table with three fields, id, old_count (DEFAULT 0) and new_count (DEFAULT 0).
INSERT OR UPDATE from the old table into the temporary table, incrementing old_count in the process.
INSERT OR UPDATE from the new table into the temporary table, incrementing new_count in the process.
SELECT the rows of the temporary table which have 0 for old_count or 0 for new_count.
SELECT mos_content.id, mos_content.title, mos_content.created
FROM mos_content
LEFT JOIN xugc_content ON xugc_content.id = mos_content.id
WHERE xugc_content.id IS NULL;
Gives you the rows that are in mos_content but not in xugc_content.
That's even shorter than the Perl code.

Combining array to get an array of hashes

I have three arrays that need to be stored together for future use. Each array is related to the others, and the elements at each position are meant to be matched together. The order of the array elements will always be correct, but beyond that, there is no easy way to discern the correct order once it is lost.
How can I combine these arrays together without losing their original order?
I am assuming that an array of hashes is the best way to go, but, please let me know if I'm wrong in that assumption.
Example Arrays:
my @numbers = (5,2,7,32,9);
my @letters = qw(z b t t c);
my @words = qw(tiny book lawn very dance);
Example end result.
my @combined_arrays = (
{
'number' => '5',
'letter' => 'z',
'word' => 'tiny',
},
{
'number' => '2',
'letter' => 'b',
'word' => 'book',
},
{
'number' => '7',
'letter' => 't',
'word' => 'lawn',
},
{
'number' => '32',
'letter' => 't',
'word' => 'very',
},
{
'number' => '9',
'letter' => 'c',
'word' => 'dance',
},
);
I would do it like this
my @combined_arrays = map { "number" => $numbers[$_] , "letter" => $letters[$_] , "word" => $words[$_] } , 0 .. $#letters;
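A self-contained version of the map approach, as a sketch assuming all three arrays have the same length. Here the anonymous hash is written with a leading + inside a block instead of relying on Perl guessing that {...} is an expression; that is a stylistic choice, not part of the answer above.

```perl
use strict;
use warnings;

my @numbers = (5, 2, 7, 32, 9);
my @letters = qw(z b t t c);
my @words   = qw(tiny book lawn very dance);

# Walk the shared indices; each iteration returns one hash reference.
# The leading + marks {...} as an anonymous hash rather than a block.
my @combined_arrays = map {
    +{ number => $numbers[$_], letter => $letters[$_], word => $words[$_] }
} 0 .. $#numbers;

print "$_->{number} $_->{letter} $_->{word}\n" for @combined_arrays;
```

Because map walks the indices in order, the resulting array keeps the original positions.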
I realise you've already accepted an answer, but I thought I'd just throw out a more concise option that relies on some modules.
I'm using zip (aka mesh, from either List::SomeUtils or List::MoreUtils) and zip_by (from List::UtilsBy), but I'm importing both of them via List::AllUtils.
use strict;
use warnings;
use List::AllUtils qw( zip zip_by );
my @numbers = (5,2,7,32,9);
my @letters = qw(z b t t c);
my @words = qw(tiny book lawn very dance);
my @keys = qw(number letter word);
my @combined = zip_by { +{ zip @keys, @_ } } \@numbers, \@letters, \@words;
It's potentially more readable, but only if you're familiar with what zip and zip_by do. At the very least, it fits inside 80 characters.
Update
I originally had \%{{ zip @keys, @_ }} inside the zip_by. This was to force it to interpret my curlies as a hash-ref. Then I remembered that +{} is a prettier way to disambiguate.
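If List::AllUtils isn't installed, a similar keys-driven result can be sketched with core Perl only, using a hash slice. This is an alternative technique, not the module-based answer, and it assumes all columns have the same length.

```perl
use strict;
use warnings;

my @numbers = (5, 2, 7, 32, 9);
my @letters = qw(z b t t c);
my @words   = qw(tiny book lawn very dance);
my @keys    = qw(number letter word);

# Columns in the same order as @keys
my @columns = (\@numbers, \@letters, \@words);

my @combined = map {
    my $i = $_;                        # save the index before the inner map rebinds $_
    my %row;
    # Hash slice: assigns one column value to each key in a single statement
    @row{@keys} = map { $_->[$i] } @columns;
    \%row;                             # a fresh %row each iteration, so each ref is distinct
} 0 .. $#numbers;
```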

Difficulties initializing an array in Perl

I have the following code:
print Dumper($dec_res->{repositories}[0]);
print Dumper($dec_res->{repositories}[1]);
my @repos = ($dec_res->{repositories});
print scalar @repos . "\n";
and the output is the following:
$VAR1 = {
'status' => 'OK',
'name' => 'apir',
'svnUrl' => 'https://url.whatever/svn/apir',
'id' => 39,
'viewvcUrl' => 'https://url.whatever/viewvc/apir/'
};
$VAR1 = {
'status' => 'OK',
'name' => 'CCDS',
'svnUrl' => 'https://url.whatever/svn/CCDS',
'id' => 26,
'viewvcUrl' => 'https://url.whatever/viewvc/CCDS/'
};
1
So my question is: why is $dec_res->{repositories} clearly an array but @repos is not?
Here I printed the size, but even trying to access elements with $repos[0] still returns an error.
Dumping $repos[0] actually prints the whole structure... just like dumping $dec_res->{repositories}.
$dec_res->{repositories} is clearly an array
It isn't. It is an array reference.
but @repos is not?
It is an array.
You are creating a list that is one item long, and that item is the array reference. You then assign the list to the array, so the array holds that single item.
You need to dereference the array instead.
my @repos = @{$dec_res->{repositories}};
perlref explains more about references in Perl.
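A sketch of the difference, with a hypothetical stand-in for the decoded response trimmed to two fields per repository:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the decoded response, trimmed to two fields
my $dec_res = {
    repositories => [
        { name => 'apir', id => 39 },
        { name => 'CCDS', id => 26 },
    ],
};

my @one   = ( $dec_res->{repositories} );      # a list of ONE item: the array reference
my @repos = @{ $dec_res->{repositories} };     # dereferenced: one item per repository
my $count = @{ $dec_res->{repositories} };     # element count, no copy needed

print scalar(@one), " vs ", scalar(@repos), "\n";
print "$_->{name} ($_->{id})\n" for @repos;
```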

Referenced array dropped in size to one element

Dear fellow perl programmers,
I wanted to access this array
my @vsrvAttribs = qw(
Code
Description
vsrv_id
vsrv_name
vsrv_vcpu_no
vsrv_vmem_size
vsrv_vdspace_alloc
vsrv_mgmt_ip
vsrv_os
vsrv_virt_platf
vsrv_owner
vsrv_contact
vsrv_state
);
through a variable composed of a variable and a string suffix, which of course led to an error message like this:
Can't use string ("@vsrvAttribs") as an ARRAY ref while "strict refs" in use at cmdbuild.pl line 262.
Therefore I decided to get the reference to the array through a hash
my %attribs = ( vsrv => @vsrvAttribs );
And this is the code where I need to get the content of aforementioned array
foreach my $classTypeKey (keys %classTypes) {
my @attribs = $attribs{$classTypeKey};
print Dumper(\@attribs);
}
It seems I can get the reference to the array @vsrvAttribs, but when I check the contents of the array with Dumper, the array has only one element:
$VAR1 = [
'Code'
];
Do you have any idea where could be the problem?
How do you store the array in a hash and access it later?
You need to store your array by reference like this:
my %attribs = ( vsrv => \@vsrvAttribs );
Note the backslash before the @ sigil. This tells perl that you want a reference to the array.
Then, when you access the array stored in $attribs{vsrv}, you need to treat it as a reference instead of as an array. You'll do something like this:
foreach my $classTypeKey (keys %classTypes) {
    # make a copy of the array by dereferencing
    my @attribs = @{ $attribs{$classTypeKey} };
    # OR just use the array reference if profiling shows performance issues:
    my $attribs = $attribs{$classTypeKey};
    # these will show the same thing if you haven't done anything to @attribs
    # in the interim
    print Dumper(\@attribs);
    print Dumper($attribs);
}
Why did you only get one value and where did the rest of the array go?
Your missing values from @vsrvAttribs weren't lost; they were assigned as keys and values to %attribs itself. Try adding the following just after your assignment and you'll see it for yourself:
my %attribs = ( vsrv => @vsrvAttribs );
print Dumper(\%attribs);
You'll see output like this:
$VAR1 = {
'vsrv_contact' => 'vsrv_state',
'vsrv_virt_platf' => 'vsrv_owner',
'vsrv' => 'Code',
'vsrv_name' => 'vsrv_vcpu_no',
'vsrv_mgmt_ip' => 'vsrv_os',
'Description' => 'vsrv_id',
'vsrv_vmem_size' => 'vsrv_vdspace_alloc'
};
This is because perl interpreted your assignment by expanding the contents of @vsrvAttribs into multiple elements of the list literal ():
my %attribs = (
# your key => first value from array
vsrv => 'Code',
# subsequent values of the array
Description => 'vsrv_id',
vsrv_name => 'vsrv_vcpu_no',
vsrv_vmem_size => 'vsrv_vdspace_alloc',
vsrv_mgmt_ip => 'vsrv_os',
vsrv_virt_platf => 'vsrv_owner',
vsrv_contact => 'vsrv_state',
);
This is legal in perl and there are reasons where you might want to do this but in your case it wasn't what you wanted.
Incidentally, perl would have warned you that it was doing something you might not want if your array had had an even number of elements. Your 13 elements plus the hash key "vsrv" make 14, which is even, and Perl will take any list with an even number of elements and happily turn it into a hash. If your array had one more element (15 items in total, counting the hash key), you would get a warning: Odd number of elements in hash assignment at foo.pl line 28.
See "Making References" and "Using References" in perldoc perlreftut for more information.
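A small sketch of both behaviours with short made-up arrays: an even-sized list is flattened silently, an odd-sized one triggers the warning (captured here with a __WARN__ handler so it can be inspected), and storing a reference keeps the array whole.

```perl
use strict;
use warnings;

# Collect run-time warnings so they can be inspected
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my @three = qw(a b c);
my %flat  = ( vsrv => @three );     # 4 items: vsrv=>'a', b=>'c', no warning

my @four  = qw(a b c d);
my %odd   = ( vsrv => @four );      # 5 items: 'd' gets undef, and perl warns

my %by_ref = ( vsrv => \@three );   # one key, one array reference

print "flat: ", join(',', sort keys %flat), "\n";
print "warning: $_" for @warnings;
```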
If you use a bare array in a hash definition like
my %attribs = ( vsrv => @vsrv_attribs )
the array is expanded and used as key/value pairs, so you will get
my %attribs = (
vsrv => 'Code',
Description => 'vsrv_id',
vsrv_name => 'vsrv_vcpu_no',
vsrv_vmem_size => 'vsrv_vdspace_alloc',
...
)
The value of a Perl hash element can only be a scalar, so if you want an array of values there you have to store a reference to the array, as shown below.
It is also a bad idea to use capitals in Perl identifiers for anything except globals, such as package names. Local names are conventionally lower-case alphanumeric plus underscore, so $class_type_key instead of $classTypeKey.
use strict;
use warnings;
use Data::Dumper;
my @vsrv_attribs = qw(
Code
Description
vsrv_id
vsrv_name
vsrv_vcpu_no
vsrv_vmem_size
vsrv_vdspace_alloc
vsrv_mgmt_ip
vsrv_os
vsrv_virt_platf
vsrv_owner
vsrv_contact
vsrv_state
);
my %attribs = (
vsrv => \@vsrv_attribs,
);
for my $class_type_key (keys %attribs) {
my $attribs = $attribs{$class_type_key};
print Dumper $attribs;
}
output
$VAR1 = [
'Code',
'Description',
'vsrv_id',
'vsrv_name',
'vsrv_vcpu_no',
'vsrv_vmem_size',
'vsrv_vdspace_alloc',
'vsrv_mgmt_ip',
'vsrv_os',
'vsrv_virt_platf',
'vsrv_owner',
'vsrv_contact',
'vsrv_state'
];
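The same idea also replaces the original plan of building a variable name from a string. A sketch with the attribute list trimmed to four entries; the host class type and @host_attribs are hypothetical, added only to show several lists side by side.

```perl
use strict;
use warnings;

my @vsrv_attribs = qw(Code Description vsrv_id vsrv_name);
# Hypothetical second class type, only to show several lists side by side
my @host_attribs = qw(Code Description host_id host_name host_ip);

# Instead of building a variable *name* from a string (a symbolic reference,
# which "strict refs" rejects), key a hash by the type and store references
my %attribs = (
    vsrv => \@vsrv_attribs,
    host => \@host_attribs,
);

my %count;
for my $class_type_key (sort keys %attribs) {
    my $list = $attribs{$class_type_key};
    $count{$class_type_key} = @$list;    # array in scalar context: element count
    print "$class_type_key: ", join(', ', @$list), "\n";
}
```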

How do I store a hash which is inside an array element?

The background
I got a Perl module which utilizes an array for its input/output parameters, like this:
Execute({inputfile => $req->{modules}.'filename', param => \@xchange});
Inside the module a hash is built and returned via reference
$param[0] = \%values;
This is all fine and good (I think) and print Dumper @xchange[0]; will output my desired content as
$VAR1 = { '33' => 'Title1', '53' => 'Title2', '21' => 'Title3' };
The goal
I would like to loop over the content and print the key/value pairs one by one, for example like this
%testhash = ('33' => 'Test1', '53' => 'Test2', '21' => 'Test3' );
foreach $key (keys %testhash) {
print "LOOP: $key, value=$testhash{$key}\n";
}
This loop does work as intended and dumping my testhash via print Dumper \%testhash; outputs the same as the array element above
$VAR1 = { '33' => 'Test1', '53' => 'Test2', '21' => 'Test3' };
The problem
The trouble now seems to be that, although both structures appear to be of the same kind, I can't get my head around how to properly access the returned hash that is stored inside @xchange[0].
I did try %realhash = @xchange[0]; and %realhash = \@xchange[0];, but then print Dumper \%realhash; will output $VAR1 = { 'HASH(0xa7b29c0)' => undef }; or $VAR1 = { 'REF(0xa7833a0)' => undef }; respectively.
So I either need a way to get the content of @xchange[0] inside a clean new hash or a way to foreach loop over the hash inside the @xchange[0] element.
I guess I am getting screwed by the whole hash reference concept, but I am at a loss here and can't think of another way to google for it.
$xchange[0] is a hash reference. Use the dereference operator %{...} to access it as a hash.
%realhash = %{$xchange[0]};
@xchange[0] is a one-element slice whose only value is a scalar: the reference to a hash. When you assign it to a hash
%hash = @xchange[0];
The reference is stringified into something like HASH(0xa7b29c0), and you get the warnings
Scalar value @xchange[0] better written as $xchange[0] at ...
Reference found where even-sized list expected at ...
That is to say, you get these warnings, unless you have been so foolish as to not turn warnings on with use warnings.
The first one means what it says. The second one means that the list you assign to a hash should have an even number of elements: one value for every key. You only passed a "key" (something that Perl took as a key). The value then becomes undef, as noted in your Data::Dumper output:
$VAR1 = { 'HASH(0xa7b29c0)' => undef }
What you need to do is dereference the reference.
my $href = $xchange[0];
my %hash = %$href;               # using a transition variable
my %hash2 = %{ $xchange[0] };    # using the %{ ... } block syntax
perldsc
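The question also asked for a way to loop over the hash without copying it first. A sketch, with a stand-in for what the module stored in $xchange[0]; the numeric sort is only there to make the order predictable.

```perl
use strict;
use warnings;

# Stand-in for what the module stored in $xchange[0]
my @xchange = ( { '33' => 'Title1', '53' => 'Title2', '21' => 'Title3' } );

my @lines;
# Loop over the referenced hash in place; no copy into a new hash is needed.
# Keys are sorted numerically only to make the order predictable.
for my $key ( sort { $a <=> $b } keys %{ $xchange[0] } ) {
    push @lines, "LOOP: $key, value=$xchange[0]{$key}";
}
print "$_\n" for @lines;
```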
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys=1;
my %testhash = ('33' => 'Test1', '53' => 'Test2', '21' => 'Test3' );
# Add hash as first element of xchange AoH
my @xchange = \%testhash;
# Dereference 1st element of AoH as a hash
my %realhash = %{ $xchange[0] };
# Dump new hash
print Dumper(\%realhash);
__END__
$VAR1 = {
'21' => 'Test3',
'33' => 'Test1',
'53' => 'Test2'
};
