Merge Perl hashes into one array and loop through it


I'm creating a Perl plugin for cPanel which has to get all domains in a user's account and display them in an HTML select field. Originally, I'm a PHP developer, so I'm having a hard time understanding some of the logic of Perl. I do know that cPanel plugins can also be written in PHP, but for this plugin I'm limited to Perl.
This is how I get the data from cPanel:
my @user_domains = $cpliveapi->uapi('DomainInfo', 'list_domains');
@user_domains = $user_domains[0]{cpanelresult}{result}{data};
This is what it looks like using print Dumper @user_domains:
$VAR1 = {
'addon_domains' => ['domain1.com', 'domain2.com', 'domain3.com'],
'parked_domains' => ['parked1.com', 'parked2.com', 'parked3.com'],
'main_domain' => 'main-domain.com',
'sub_domains' => ['sub1.main-domain.com', 'sub2.main-domain.com']
};
I want the data to look like this (thanks @simbabque):
@domains = qw(domain1.com domain2.com domain3.com main-domain.com parked1.com parked2.com parked3.com);
So, I want to exclude sub_domains and merge the others in 1 single-dimensional array so I can loop through them with a single loop. I've struggled the past few days with what sounds like an extremely simple task, but I just can't wrap my head around it.

You need something like this
If you find you have a copy of List::Util that doesn't include uniq then you can either upgrade the module or use this definition
sub uniq {
my %seen;
grep { not $seen{$_}++ } @_;
}
From your dump, the uapi call is returning a reference to a hash. That goes into $cp_response and then drilling down into the structure fetches the data hash reference into $data
delete removes the subdomain information from the hash.
The lists you want are the values of the hash to which $data refers, so I extract those. Those values are references to arrays of strings if there is more than one domain in the list, or simple strings if there is only one
The map converts all the domain names to a single list by dereferencing array references, or passing strings straight through. That is what the ref() ? @$_ : $_ is doing. Finally, uniq removes multiple occurrences of the same name
use List::Util 'uniq';
my $cp_response = $cpliveapi->uapi('DomainInfo', 'list_domains');
my $data = $cp_response->{cpanelresult}{result}{data};
delete $data->{sub_domains};
my @domains = uniq map { ref() ? @$_ : $_ } values %$data;
output
parked1.com
parked2.com
parked3.com
domain1.com
domain2.com
domain3.com
main-domain.com
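
As a self-contained variation on the above, here is a minimal sketch that runs without cPanel. It assumes the data has exactly the shape shown in the question's Dumper output (the hash below is sample data, not a real API response), it skips sub_domains without deleting anything from the original structure, and it sorts the result so the order is predictable. The <option> markup is only an illustration of the single loop the question asks for.

```perl
use strict;
use warnings;

# Sample data shaped like the Dumper output in the question (assumed, not fetched from cPanel).
my $data = {
    addon_domains  => [ 'domain1.com', 'domain2.com', 'domain3.com' ],
    parked_domains => [ 'parked1.com', 'parked2.com', 'parked3.com' ],
    main_domain    => 'main-domain.com',
    sub_domains    => [ 'sub1.main-domain.com', 'sub2.main-domain.com' ],
};

# Pick every key except sub_domains, leaving $data itself untouched.
my @wanted_keys = grep { $_ ne 'sub_domains' } keys %$data;

# Flatten: dereference array references, pass plain strings through,
# and drop duplicates with a %seen hash.
my %seen;
my @domains = grep { !$seen{$_}++ }
              map  { ref($_) ? @$_ : $_ }
              @{$data}{@wanted_keys};
@domains = sort @domains;

# One loop over the flat list, here emitting <option> elements.
print qq{<option value="$_">$_</option>\n} for @domains;
```

Filtering the keys instead of calling delete keeps the sub_domains entry available in case other code still needs it.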

That isn't doing what you think it's doing. {} is the anonymous hash constructor, so you're making a 1 element array, with a hash in it.
You probably want:
use Data::Dumper;
my %user_domains = (
'addon_domains' => ['domain1.com', 'domain2.com', 'domain3.com'],
'parked_domains' => ['parked1.com', 'parked2.com', 'parked3.com'],
'main_domain' => 'main-domain.com',
'sub_domains' => ['sub1.main-domain.com', 'sub2.main-domain.com'],
);
print Dumper \%user_domains;
At that point you can iterate through the array elements, either with a double loop:
foreach my $key ( keys %user_domains ) {
if ( not ref $user_domains{$key} ) {
print $user_domains{$key},"\n";
next;
}
foreach my $domain ( @{$user_domains{$key}} ) {
print $domain,"\n";
}
}
Or if you really want to 'flatten' your hash:
my @flatten = map { ref $_ ? @$_ : $_ } values %user_domains;
print Dumper \@flatten;
(You need the ref test, because without it, the non-array main-domain won't work properly)
So for the sake of consistency, you might be better off with:
my %user_domains = (
'addon_domains' => ['domain1.com', 'domain2.com', 'domain3.com'],
'parked_domains' => ['parked1.com', 'parked2.com', 'parked3.com'],
'main_domain' => ['main-domain.com'],
'sub_domains' => ['sub1.main-domain.com', 'sub2.main-domain.com'],
);
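
If you cannot change how the data arrives, the same consistency can be imposed after the fact. The following is a rough sketch, assuming a hash shaped like the one above (trimmed here for brevity): it wraps any plain string in an array reference, after which every value can be dereferenced the same way.

```perl
use strict;
use warnings;

my %user_domains = (
    addon_domains => [ 'domain1.com', 'domain2.com' ],
    main_domain   => 'main-domain.com',
);

# Wrap any plain string in an array reference so every value has the same shape.
for my $key (keys %user_domains) {
    $user_domains{$key} = [ $user_domains{$key} ]
        unless ref($user_domains{$key}) eq 'ARRAY';
}

# Now a plain dereference works for every key, with no ref test needed.
my @flat = map { @$_ } @user_domains{ sort keys %user_domains };
print "$_\n" for @flat;
```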

Related

Perl ... create horizontal children of a %hash using @array items

I've been banging my head on this awhile and searched many ways. I'm sure this is going to boil down to being really basic.
I have data in an @array that I want to move to a tree in a %hash.
This might be something more appropriate to JSON? But I haven't delved into it before and I don't need to save out/restore this information.
Desire:
Create a dependent tree of USB devices that can nest under each other that can track the end point (deviceC) through a hub (deviceB) and finally the root (deviceA).
Example:
Simplified (I hope ... this isn't from the actual longer script):
I want to convert an array in this format:
my @array = ['deviceA','deviceB','deviceC'];
to multidimensional hashes equal to:
my %hash = ('deviceA' => { 'deviceB' => { 'deviceC' => '' } } )
that would dump like:
$VAR1 = {
'deviceA' => {
'deviceB' => {
'deviceC' => ''
}
}
};
For just looking at a single device this isn't necessary, but I'm building out an IOMMU -> PCI Device -> USB map that contains many devices.
NOTES:
I'm trying to avoid installing CPAN modules so the script is portable to similar systems (Proxmox VE)
The last device (deviceC above) has no children
value '' is fine
undef would probably work
mixing the types would work but I need to know how to set that
I will never need to modify or manipulate the hash once created
I don't know the right way to recurse the @array to populate the %hash children.
I want the data horizontal for each USB device
I'd switch to an Object/package but each device can have a different set of children (or none) making it infeasible to know Object names
Some USB devices have no children (root hubs) ... similar to %hash = ('deviceA' => '')
Some have 1 child that is the final device ... similar to %hash = ('deviceA' => { 'deviceB' =>'' } )
Some have multiple steps between the root via additional hub(s) ... similar to %hash = ('deviceA' => { 'deviceB' => { 'deviceC' => '' } } ) or more
Starting point :
This is basic and incomplete but will run:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);
# data in from parsing usb device path:
my @array = ['deviceA','deviceB','deviceC'];
# needs to be converted to:
my %hash = ('deviceA' => { 'deviceB' => { 'deviceC' => '' } } );
print "\n\%hash:\n" . Dumper \%hash;
Pseudo-code
This section is NOT working code in any form. I'm just trying to make a note of what I'm thinking. I know the format is wrong, I've tried multiple ways to create this and I'd look even dumber showing all of my attempts :)
I'm very new to refs and I'm not going to try and get that right here. The idea below is:
For each item in @array:
Create a way (either a ref or a copy of the current hash) that can be used next iteration to place the next child
Attach item as a child of the previous iteration with an empty value (that can be appended if there is further iteration)
my @array = ['deviceA','deviceB','deviceC'];
my %hash = {};
my %trackref;
for (@array) {
%trackref = %hash; # a copy of the existing that won't change when %hash updates
$hash{last_child} ::append_child:: $_;
}

You're actually pretty close, but it seems that you need to understand references a bit better. perldoc perlref is probably a good starting point to understand references.
A few mistakes in your code, before looking at the solution:
my @array = [ ... ];: [] creates an arrayref, not an array, which means that @array actually stores a single scalar item: a reference to another array. Use () to initialize an array: my @array = ( ... );.
my %hash = {};: similarly, {} creates a hashref, not a hash. This means that the line stores a single hashref in %hash, which will cause this warning: Reference found where even-sized list expected at hash.pl line (because a hash contains key-value pairs and you only provided a key). Use () for a simple (i.e., not a hashref) hash. In this case, however, you don't need to initialize %hash: my %hash; and my %hash = () do the same thing (that is, create an empty hash).
%trackref = %hash; copies the content of %hash into %trackref. This means that, contrary to what the name "trackref" implies, %trackref doesn't contain a reference to anything, but a copy of %hash. Use \%hash to create a reference to %hash.
Note that if you already have a hashref, then assigning it to another variable copies the reference. For instance, if you do my $hash1 = {}; my $hash2 = $hash1, then both $hash1 and $hash2 reference the same hash.
So, fixing those issues in your attempt, we get:
my @array = ('deviceA','deviceB','deviceC');
my %hash;
my $trackref = \%hash;
for my $usb (@array) {
$trackref->{$usb} = {};
$trackref = $trackref->{$usb};
}
print Dumper \%hash;
Which outputs:
$VAR1 = {
'deviceA' => {
'deviceB' => {
'deviceC' => {}
}
}
};
The main change that I made was to replace your $hash{last_child} ::append_child:: $_; with $trackref->{$usb} = {};. But the idea remains the same: "Attach item as a child of the previous iteration with an empty value", to reuse your words.
To help you understand the code a bit better, let's see what happens in the loop step by step:
Before the first iteration, %hash is empty and $trackref references %hash.
In the first iteration, we put deviceA => {} in $trackref (or, more pedantically, we associate {} with the key deviceA in $trackref). Since $trackref references %hash, this puts deviceA => {} in %hash. Then, we store in $trackref this new {} that we just created, which means that $trackref now references $hash{deviceA}.
In the second iteration, we put deviceB => {} in $trackref. $trackref references $hash{deviceA} (which we created in the previous iteration), which means that %hash is now (deviceA => { deviceB => {} }). We then store in $trackref the new {}.
And so on...
You'll note that in the innermost hash, {} is associated with the key deviceC. When iterating over the hash, you can thus tell whether you are at the end by doing something like if (%$hash) (instead of just if ($hash), which you would use if this last {} were undef or ''). Let me know if that's an issue: we can add a bit of code to convert this {} into undef (alternatively, you can do it yourself; it will be a good exercise to get used to references)
Minor remark: @array and %hash are poor array and hash names, because the @ already indicates an array, and % already indicates a hash. It's possible that you used those names just for this small example for your question, in which case, no problem. However, if you use those names in your actual code, consider changing them for something more explicit... @usb_devices and %usb_devices_tree maybe?
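
For reference, here is one possible sketch of the variant where the endpoint gets '' rather than {}, extended to merge several device chains into a single tree, since the question mentions mapping many devices. The device names and the @usb_paths layout are made up for illustration; only core modules are used.

```perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

# Hypothetical device paths; each inner array is one root-to-endpoint chain.
my @usb_paths = (
    [ 'deviceA', 'deviceB', 'deviceC' ],
    [ 'deviceA', 'deviceB', 'deviceD' ],
    [ 'deviceE' ],
);

my %usb_devices_tree;
for my $path (@usb_paths) {
    my @devices = @$path;
    my $leaf    = pop @devices;          # the endpoint has no children
    my $node    = \%usb_devices_tree;
    for my $device (@devices) {
        # Reuse the existing hash for this hub, or create one.
        $node->{$device} = {} unless ref $node->{$device};
        $node = $node->{$device};
    }
    # Mark the endpoint with '' unless it already has children.
    $node->{$leaf} = '' unless ref $node->{$leaf};
}

$Data::Dumper::Sortkeys = 1;             # stable key order in the dump
print Dumper \%usb_devices_tree;
```

With this layout, a simple ref test on a value distinguishes a hub (hash reference) from an endpoint (empty string).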

hash with array of hashes in perl

I know this topic has been covered, but other posts usually have static hashes and arrays and they don't show how to load the hashes and arrays.
I am trying to process a music library. I have a hash with album name and an array of hashes that contain track no, song title and artist. This is loaded from an XML file generated by iTunes.
the pared down code follows:
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';
use feature qw( say );
use XML::LibXML qw( );
use URI::Escape;
my $source = "Library.xml";
binmode STDOUT, ":utf8";
# load the xml doc
my $doc = XML::LibXML->load_xml( location => $source )
or warn $! ? "Error loading XML file: $source $!"
: "Exit status $?";
my %hCompilations;
my %track;
# extract xml fields
my @album_nodes = $doc->findnodes('/plist/dict/dict/dict');
for my $album_idx (0..$#album_nodes) {
my $album_node = $album_nodes[$album_idx];
my $trackName = $album_node->findvalue('key[text()="Name"]/following-sibling::*[position()=1]');
my $artist = $album_node->findvalue('key[text()="Artist"]/following-sibling::*[position()=1]');
my $album = $album_node->findvalue('key[text()="Album"]/following-sibling::*[position()=1]');
my $compilation = $album_node->exists('key[text()="Compilation"]');
# I only want compilations
if( ! $compilation ) { next; }
%track = (
trackName => $trackName,
trackArtist => $artist,
);
push @{$hCompilations{$album}} , %track;
}
#loop through each album access the album name field and get what should be the array of tracks
foreach my $albumName ( sort keys %hCompilations ) {
print "$albumName\n";
my @trackRecs = @{$hCompilations{$albumName}};
# how do I loop through the trackrecs?
}

This line isn't doing what you think it is:
push @{$hCompilations{$album}} , %track;
This will unwrap your hash into a list of key/value pairs and will push each of those individually onto your array. What you want is to push a reference to your hash onto the array.
You could do that by creating a new copy of the hash:
push @{$hCompilations{$album}} , { %track };
But that takes an unnecessary copy of the hash - which will have an effect on your program's performance. A better idea is to move the declaration of that variable (my %track) inside the loop (so you get a new variable each time round the loop) and then just push a reference to the hash onto your array.
push @{$hCompilations{$album}} , \%track;
You already have the code to get the array of tracks, so iterating across that array is simple.
my @trackRecs = @{$hCompilations{$albumName}};
foreach my $track (@trackRecs) {
print "$track->{trackName}/$track->{trackArtist}\n";
}
Note that you don't need the intermediate array:
foreach my $track (@{$hCompilations{$albumName}}) {
print "$track->{trackName}/$track->{trackArtist}\n";
}
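
Putting the pieces together, here is a small stand-alone sketch. It replaces the XML parsing with a made-up @rows array (album, track name, artist), so the names and values are purely illustrative; the point is the my %track declared inside the loop and the reference pushed onto the per-album array.

```perl
use strict;
use warnings;

# Made-up rows standing in for the values pulled out of the XML.
my @rows = (
    [ 'Now 1', 'Song One',   'Artist A' ],
    [ 'Now 1', 'Song Two',   'Artist B' ],
    [ 'Now 2', 'Song Three', 'Artist C' ],
);

my %hCompilations;
for my $row (@rows) {
    my ($album, $trackName, $artist) = @$row;

    # Declared inside the loop, so each iteration gets a fresh hash.
    my %track = (
        trackName   => $trackName,
        trackArtist => $artist,
    );
    push @{ $hCompilations{$album} }, \%track;
}

# Each album maps to an array of hash references, one per track.
for my $albumName (sort keys %hCompilations) {
    print "$albumName\n";
    for my $track (@{ $hCompilations{$albumName} }) {
        print "  $track->{trackName}/$track->{trackArtist}\n";
    }
}
```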

First of all, you want to push the hash as a single element, so instead of
push @{$hCompilations{$album}} , %track;
use
push @{$hCompilations{$album}} , {%track};
in the loop you can access the tracks with:
foreach my $albumName ( sort keys %hCompilations ) {
print "$albumName\n";
my @trackRecs = @{$hCompilations{$albumName}};
# how do I loop through the trackrecs?
foreach my $track (@trackRecs) {
print $track->{trackName} . "/" . $track->{trackArtist} . "\n";
}
}

Identifying elements in one array of hashes that are not in another array of hashes (perl)

I'm a novice perl programmer trying to identify which elements are in one array of hashes but not in another. I'm trying to search through the "new" array, identifying the id, title, and created elements that don't exist from the "old" array.
I believe I have it working with a set of basic for() loops, but I'd like to do it more efficiently. This only came after having tried to use grep() and failed.
These arrays are built from a database as such:
use DBI;
use strict;
use Data::Dumper;
use Array::Utils qw(:all);
sub db_connect_new();
sub db_disconnect_new($);
sub db_connect_old();
sub db_disconnect_old($);
my $dbh_old = db_connect_old();
my $dbh_new = db_connect_new();
# get complete list of articles on each host first (Joomla! system)
my $sql_old = "select id,title,created from mos_content;";
my $sql_new = "select id,title,created from xugc_content;";
my $sth_old = $dbh_old->prepare($sql_old);
my $sth_new = $dbh_new->prepare($sql_new);
$sth_old->execute();
$sth_new->execute();
my $ref_old;
my $ref_new;
while ($ref_old = $sth_old->fetchrow_hashref()) {
push @rv_old, $ref_old;
}
while ($ref_new = $sth_new->fetchrow_hashref()) {
push @rv_new, $ref_new;
}
my @seen = ();
my @notseen = ();
foreach my $i (@rv_old) {
my $id = $i->{id};
my $title = $i->{title};
my $created = $i->{created};
my $seen = 0;
foreach my $j (@rv_new) {
if ($i->{id} == $j->{id}) {
push @seen, $i;
$seen = 1;
}
}
if ($seen == 0) {
print "$i->{id},$i->{title},$i->{state},$i->{catid},$i->{created}\n";
push @notseen, $i;
}
}
The arrays look like this when using Dumper(@rv_old) to print them:
$VAR1 = {
'title' => 'Legal Notice',
'created' => '2004-10-07 00:17:45',
'id' => 14
};
$VAR2 = {
'created' => '2004-11-15 16:04:06',
'id' => 86096,
'title' => 'IRC'
};
$VAR3 = {
'id' => 16,
'created' => '2004-10-07 16:15:29',
'title' => 'About'
};
I tried to use grep() using array references, but I don't think I understand arrays, hashes, and references well enough to do it properly. My failed grep() attempts are below. I'd appreciate any ideas of how to do this properly.
I believe the problem with this is that I don't know how to reference the id field in the second array of hashes. Most of the examples using grep() that I've seen are to just look through an entire array, like you would with regular grep(1). I need to iterate through one array, checking each of the values from the id field with the id field from another array.
my $rv_old_ref = \@rv_old;
my $rv_new_ref = \@rv_new;
for my $i ( 0 .. $#rv_old) {
my $match = grep { $rv_new_ref->$_ == $rv_old_ref->$_ } @rv_new;
push @notseen, $match if !$match;
}
I also tried variations on the grep() above:
1) if (($p) = grep ($hash_ref->{id}, @rv_old)) {
2) if ($hash_ref->{id} ~~ @rv_old) {

There are a number of libraries that compare arrays. However, your comparison involves complex data structures (the arrays have hashrefs as elements) and this at least complicates use of all modules that I am aware of.
So here is a way to do it by hand. I use the shown array and its copy with one value changed.
use warnings;
use strict;
use feature 'say';
use List::Util qw(none); # in List::MoreUtils with older Perls
use Data::Dump qw(dd pp);
sub hr_eq {
my ($e1, $e2) = @_;
return 0 if scalar keys %$e1 != scalar keys %$e2;
foreach my $k1 (keys %$e1) {
return 0 if !exists($e2->{$k1}) or $e1->{$k1} ne $e2->{$k1};
}
return 1
}
my @a1 = (
{ 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
{ 'created' => '2004-11-15 16:04:06', 'id' => 86096, 'title' => 'IRC' },
{ 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my @a2 = (
{ 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
{ 'created' => '2004-11-15 16:xxx:06', 'id' => 86096, 'title' => 'IRC' },
{ 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my @only_in_two = grep {
my $e2 = $_;
none { hr_eq($e2, $_) } @a1;
} @a2;
dd \@only_in_two;
This correctly identifies the element in @a2 that doesn't exist in @a1 (with xxx in timestamp).
Notes
This finds what elements of one array are not in another, not the full difference between arrays. It is what the question specifically asks for.
The comparison relies on details of your data structure (hashref); there's no escaping that, unless you want to reach for more comprehensive libraries (like Test::More).
This uses string comparison, ne, even for numbers and timestamps. See whether it makes sense for your real data to use more appropriate comparisons for particular elements.
Searching through a whole list for each element of a list is an O(N*M) algorithm. Solutions of such (quadratic) complexity are usable as long as data isn't too big; however, once data gets big enough so that size increases have clear effects they break down rapidly (slow down to the point of being useless). Time it to get a feel for this in your case.
An O(N+M) approach exists here, utilizing hashes, shown in ikegami's answer. This is much better algorithmically, once the data is large enough for it to show. However, as your array carries a complex data structure (hashrefs), a bit of work is needed to come up with a working program, especially as we don't know the data. But if your data is sizable then you surely want to implement this.
Some comments on filtering.
The question correctly observes that for each element of an array, as it's processed in grep, the whole other array need be checked.
This is done in the body of grep using none from List::Util. It returns true if the code in its block evaluates false for all elements of the list; thus, if "none" of the elements satisfy that code. This is the heart of the requirement: an element must not be found in the other array.
Care is needed with the default $_ variable, since it is used by both grep and none.
In grep's block $_ aliases the currently processed element of the list, as grep goes through them one by one; we save it into a named variable ($e2). Then none comes along and in its block "takes possession" of $_, assigning elements of @a1 to it as it processes them. The current element of @a2 is also available since we have copied it into $e2.
The test performed in none is pulled into a subroutine, which I call hr_eq to emphasize that it is specifically for equality comparison of (elements in) hashrefs.
It is in this sub where the details can be tweaked. Firstly, instead of bluntly using ne for values for each key, you can add custom comparisons for particular keys (numbers must use ==, etc). Then, if your data structures change this is where you'd adjust specifics.
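
To give an idea of what the hash-based O(N+M) variant could look like for whole-row comparison, here is a rough sketch. It assumes every row has exactly the id, title and created keys and that no value contains a NUL character (used as the separator); row_key is a hypothetical helper, not part of any module, and the data is a shortened version of the arrays above.

```perl
use strict;
use warnings;

my @a1 = (
    { title => 'Legal Notice', created => '2004-10-07 00:17:45', id => 14 },
    { title => 'IRC',          created => '2004-11-15 16:04:06', id => 86096 },
);
my @a2 = (
    { title => 'Legal Notice', created => '2004-10-07 00:17:45', id => 14 },
    { title => 'IRC',          created => '2004-11-15 16:xxx:06', id => 86096 },
);

# Build one string per row from the fields that define "the same row".
# Assumes every row has exactly these keys and no value contains "\0".
sub row_key {
    my ($row) = @_;
    return join "\0", map { $row->{$_} } qw(id title created);
}

# One pass over @a1 to index it, one pass over @a2 to filter.
my %in_a1;
$in_a1{ row_key($_) } = 1 for @a1;
my @only_in_two = grep { !$in_a1{ row_key($_) } } @a2;

print "$_->{id}: $_->{created}\n" for @only_in_two;
```

Like hr_eq, this compares everything as strings; if some fields need numeric or date-aware comparison, they would have to be normalized inside row_key first.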

You could use grep.
for my $new_row (@new_rows) {
say "$new_row->{id} not in old"
if !grep { $_->{id} == $new_row->{id} } @old_rows;
}
for my $old_row (@old_rows) {
say "$old_row->{id} not in new"
if !grep { $_->{id} == $old_row->{id} } @new_rows;
}
But that's an O(N*M) solution, while there exists an O(N+M) solution that would be far faster.
my %old_keys; ++$old_keys{ $_->{id} } for @old_rows;
my %new_keys; ++$new_keys{ $_->{id} } for @new_rows;
for my $new_row (@new_rows) {
say "$new_row->{id} not in old"
if !$old_keys{$new_row->{id}};
}
for my $old_row (@old_rows) {
say "$old_row->{id} not in new"
if !$new_keys{$old_row->{id}};
}
If both of your database connections are to the same database, this can be done far more efficiently within the database itself.
Create a temporary table with three fields, id, old_count (DEFAULT 0) and new_count (DEFAULT 0).
INSERT OR UPDATE from the old table into the temporary table, incrementing old_count in the process.
INSERT OR UPDATE from the new table into the temporary table, incrementing new_count in the process.
SELECT the rows of the temporary table which have 0 for old_count or 0 for new_count.

select mos_content.id, mos_content.title, mos_content.created from mos_content
LEFT JOIN xugc_content USING(id)
WHERE xugc_content.id IS NULL;
Gives you the rows that are in mos_content but not in xugc_content.
That's even shorter than the Perl code.

Accessing returned values as an array

I have simple XML that I want to read in Perl and make hash containing all read keys.
Consider this code:
my $content = $xml->XMLin("filesmap.xml")->{Item};
my %files = map { $_->{Path} => 1 } @$content;
This snippet works great when XML file contains many Item tags. Then $content is a reference to array. But when there is only one Item I get an error when dereferencing as array. My assumption is that $content is a reference to the scalar, not array.
What is the practice to make sure I get array of values read from XML?

What you need is to not use XML::Simple and then it's really trivial. My favourite for fairly straightforward XML samples is XML::Twig
use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'filesmap.xml' );
my @files = map { $_ -> trimmed_text } $twig -> get_xpath ( '//Path' );
With a more detailed XML sample (and desired result) I'll be able to give you a better answer.
Part of the problem with XML::Simple is it tries to turn an XML data structure into perl data structures, and - because hashes are key-values and unordered but arrays are ordered, it has to guess. And sometimes it does it wrong, and other times it's inconsistent.
If you want it to be consistent, you can set:
my $xml = XMLin( "filesmap.xml", ForceArray => 1, KeyAttr => [], ForceContent => 1 );
But really - XML::Simple is just a route to pain. Don't use it. If you don't like XML::Twig, try XML::LibXML instead.

What I would say you need is a flatten-ing step.
my %files
= map { $_->{Path} => 1 }
# flatten...
map { ref() eq 'ARRAY' ? @$_ : $_ }
$xml->XMLin("filesmap.xml")->{Item}
;

You can do a check and force the return into an array reference if necessary:
my $content = $xml->XMLin("filesmap.xml")->{Item};
$content = ref $content eq 'ARRAY'
? $content
: [$content];
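
The same check can be wrapped in a small helper so it is written only once. This is just a sketch: as_list is a hypothetical name, and the two sample values stand in for what the parser might return for several Item elements versus a single one, so no XML module is needed to try it.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list whether given an array reference or a single value.
sub as_list {
    my ($value) = @_;
    return unless defined $value;    # a missing element gives an empty list
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

# Stand-ins for what the parser might hand back for many Items and for one Item.
my $many = [ { Path => 'a.txt' }, { Path => 'b.txt' } ];
my $one  =   { Path => 'c.txt' };

for my $content ($many, $one) {
    my %files = map { ( $_->{Path} => 1 ) } as_list($content);
    print join(', ', sort keys %files), "\n";
}
```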

Hash in array in a hash

I'm trying to identify the output of Data::Dumper, it produces the output below when used on a hash in some code I'm trying to modify:
print Dumper(\%unholy_horror);
$VAR1 = {
'stream_details' => [
{
'file_path' => '../../../../tools/test_data/',
'test_file' => 'test_file_name'
}
]
};
Is this a hash inside an array inside a hash? If not what is it? and what is the syntax to access the "file path" and "test_file" keys, and their values.
I want to iterate over that inner hash like below, how would I do that?
while ( ($key, $value) = each %hash )
{
print "key: $key, value: $hash{$key}\n";
}

You're correct. It's a hash in an array in a hash.
my %top;
$top{'stream_details'}[0]{'file_path'} = '../../../../tools/test_data/';
$top{'stream_details'}[0]{'test_file'} = 'test_file_name';
print Dumper \%top;
You can access the elements as above, or iterate with 3 levels of for loop - assuming you want to iterate the whole thing.
foreach my $topkey ( keys %top ) {
print "$topkey\n";
foreach my $element ( @{$top{$topkey}} ) {
foreach my $subkey ( keys %$element ) {
print "$subkey = ",$element->{$subkey},"\n";
}
}
}
I would add - sometimes you get some quite odd seeming hash topologies as a result of parsing XML or JSON. It may be worth looking to see if that's what's happening, because 'working' with the parsed object might be easier.
The above might be the result of:
#JSON
{"stream_details":[{"file_path":"../../../../tools/test_data/","test_file":"test_file_name"}]}
Or something similar from an API. (I think it's unlikely to be XML, since XML doesn't implicitly have 'arrays' in the way JSON does).
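
If you only care about the inner hash, as in the while/each loop from the question, one way to sketch it is to take a reference to that hash first and dereference it for each. This assumes the array holds exactly one element, as in the dump shown; %top mirrors that structure.

```perl
use strict;
use warnings;

my %top = (
    stream_details => [
        {
            file_path => '../../../../tools/test_data/',
            test_file => 'test_file_name',
        },
    ],
);

# Direct access to one value: hash key, then array index, then hash key.
print "path: $top{stream_details}[0]{file_path}\n";

# Take a reference to the inner hash, then loop over it with each.
my $details = $top{stream_details}[0];
while ( my ($key, $value) = each %$details ) {
    print "key: $key, value: $value\n";
}
```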