Hash and array with Perl - arrays

I have a .plist file that can be seen by Data::Dumper into this form
$VAR1 = {
'SPOOLS' => [
'SPOOL1',
'SPOOL2',
'SPOOL3'
],
'path' => '/usr/local/thanks/for/your/help',
'contentMatch' => [
{
'priority' => '1',
'match' => '*.hello'
},
{
'match' => '*.guys',
'priority' => '2'
}
]
};
To access my .plist file in Perl, I use this code:
my $locPlist = "conf.plist";
my $configdict = NSDictionary->dictionaryWithContentsOfFile_($locPlist);
my $plistref = Foundation::perlRefFromObjectRef($configdict);
my %plist = %{$plistref};
I know how to access the "path" with "$plist{path}" and the SPOOLS array with "$plist{SPOOLS}[0]" but:
How can I get the of SPOOLS into an array, something like my #array = $plist{SPOOLS} and how can I also get the content of "contentMatch" ?
Thanks for your help!
EDIT:
Thank you so much for your help, I was able to access the data. But is there a cleaner way to get back the content from contentMatch in a hash and access directly instead of doing that:
my $number_matches = scalar #{ $plistref->{contentMatch} } ;
my $a = 0;
my %events;
foreach ( #{ $plistref->{contentMatch} } ) {
$events{match}[$a] = $_->{match};
$events{priority}[$a] = $_->{priority};
$a = $a+1;
}
Best,
Tim.

my #spools = #{ $plistref->{SPOOLS} };
my #content_match = #{ $plistref->{contentMatch} };
Or, to dig a little deeper into the contentMatch example:
foreach ( #{ $plistref->{contentMatch} } ) {
say "match: $_->{match}";
say "priority: $_->{priority}";
}
Plenty more information about dealing with complex data structures in Perl in the Perl Data Structures Cookbook.
Update: You ask for help accessing the contentMatch data in another way. I'm not really sure what you're asking - all your sample code seems to do is to transpose an array of hashes into a hash of arrays. And I think that the array of hashes is a much better representation of of the data.
But it's your code and your data. So I think this is how I'd write what you're trying to do.
my %events;
foreach ( #{ $plistref->{contentMatch} } ) {
push #{ $events{match} }, $_->{match};
push #{ $events{priority}, $_->{priority};
}
I'd be far happier keeping it as an array of hashes, and I've already given you that code.
my #content_match = #{ $plistref->{contentMatch} };
I think that understanding how complex data structures work in Perl is crucial to becoming a competent Perl programmer. I highly recommend getting to grips with the examples in perldsc.

Related

Identifying elements in one array of hashes that are not in another array of hashes (perl)

I'm a novice perl programmer trying to identify which elements are in one array of hashes but not in another. I'm trying to search through the "new" array, identifying the id, title, and created elements that don't exist from the "old" array.
I believe I have it working with a set of basic for() loops, but I'd like to do it more efficiently. This only came after having tried to use grep() and failed.
These arrays are built from a database as such:
use DBI;
use strict;
use Data::Dumper;
use Array::Utils qw(:all);
sub db_connect_new();
sub db_disconnect_new($);
sub db_connect_old();
sub db_disconnect_old($);
my $dbh_old = db_connect_old();
my $dbh_new = db_connect_new();
# get complete list of articles on each host first (Joomla! system)
my $sql_old = "select id,title,created from mos_content;";
my $sql_new = "select id,title,created from xugc_content;";
my $sth_old = $dbh_old->prepare($sql_old);
my $sth_new = $dbh_new->prepare($sql_new);
$sth_old->execute();
$sth_new->execute();
my $ref_old;
my $ref_new;
while ($ref_old = $sth_old->fetchrow_hashref()) {
push #rv_old, $ref_old;
}
while ($ref_new = $sth_new->fetchrow_hashref()) {
push #rv_new, $ref_new;
}
my #seen = ();
my #notseen = ();
foreach my $i (#rv_old) {
my $id = $i->{id};
my $title = $i->{title};
my $created = $i->{created};
my $seen = 0;
foreach my $j (#rv_new) {
if ($i->{id} == $j->{id}) {
push #seen, $i;
$seen = 1;
}
}
if ($seen == 0) {
print "$i->{id},$i->{title},$i->{state},$i->{catid},$i->{created}\n";
push #notseen, $i;
}
}
The arrays look like this when using Dumper(#rv_old) to print them:
$VAR1 = {
'title' => 'Legal Notice',
'created' => '2004-10-07 00:17:45',
'id' => 14
};
$VAR2 = {
'created' => '2004-11-15 16:04:06',
'id' => 86096,
'title' => 'IRC'
};
$VAR3 = {
'id' => 16,
'created' => '2004-10-07 16:15:29',
'title' => 'About'
};
I tried to use grep() using array references, but I don't think I understand arrays, hashes, and references well enough to do it properly. My failed grep() attempts are below. I'd appreciate any ideas of how to do this properly.
I believe the problem with this is that I don't know how to reference the id field in the second array of hashes. Most of the examples using grep() that I've seen are to just look through an entire array, like you would with regular grep(1). I need to iterate through one array, checking each of the values from the id field with the id field from another array.
my $rv_old_ref = \#rv_old;
my $rv_new_ref = \#rv_new;
for my $i ( 0 .. $#rv_old) {
my $match = grep { $rv_new_ref->$_ == $rv_old_ref->$_ } #rv_new;
push #notseen, $match if !$match;
}
I also tried variations on the grep() above:
1) if (($p) = grep ($hash_ref->{id}, #rv_old)) {
2) if ($hash_ref->{id} ~~ #rv_old) {
There are a number of libraries that compare arrays. However, your comparison involves complex data structures (the arrays have hashrefs as elements) and this at least complicates use of all modules that I am aware of.
So here is a way to do it by hand. I use the shown array and its copy with one value changed.
use warnings;
use strict;
use feature 'say';
use List::Util qw(none); # in List::MoreUtils with older Perls
use Data::Dump qw(dd pp);
sub hr_eq {
my ($e1, $e2) = #_;
return 0 if scalar keys %$e1 != scalar keys %$e2;
foreach my $k1 (keys %$e1) {
return 0 if !exists($e2->{$k1}) or $e1->{$k1} ne $e2->{$k1};
}
return 1
}
my #a1 = (
{ 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
{ 'created' => '2004-11-15 16:04:06', 'id' => 86096, 'title' => 'IRC' },
{ 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my #a2 = (
{ 'title' => 'Legal Notice', 'created' => '2004-10-07 00:17:45', 'id' => 14 },
{ 'created' => '2004-11-15 16:xxx:06', 'id' => 86096, 'title' => 'IRC' },
{ 'id' => 16, 'created' => '2004-10-07 16:15:29', 'title' => 'About' }
);
my #only_in_two = grep {
my $e2 = $_;
none { hr_eq($e2, $_) } #a1;
} #a2;
dd \#only_in_two;
This correctly identifies the element in #a2 that doesn't exist in #a1 (with xxx in timestamp).
Notes
This finds what elements of one array are not in another, not the full difference between arrays. It is what the question specifically asks for.
The comparison relies on details of your data structure (hashref); there's no escaping that, unless you want to reach for more comprehensive libraries (like Test::More).
This uses string comparison, ne, even for numbers and timestamps. See whether it makes sense for your real data to use more appropriate comparisons for particular elements.
Searching through a whole list for each element of a list is an O(N*M) algorithm. Solutions of such (quadratic) complexity are usable as long as data isn't too big; however, once data gets big enough so that size increases have clear effects they break down rapidly (slow down to the point of being useless). Time it to get a feel for this in your case.
An O(N+M) approach exists here, utilizing hashes, shown in ikegami answer. This is much better algorithmically, once the data is large enough for it to show. However, as your array carries complex data structure (hashrefs) a bit of work is needed to come up with a working program, specially as we don't know data. But if your data is sizable then you surely want to implement this.
Some comments on filtering.
The question correctly observes that for each element of an array, as it's processed in grep, the whole other array need be checked.
This is done in the body of grep using none from List::Util. It returns true if the code in its block evaluates false for all elements of the list; thus, if "none" of the elements satisfy that code. This is the heart of the requirement: an element must not be found in the other array.
Care is needed with the default $_ variable, since it is used by both grep and none.
In grep's block $_ aliases the currently processed element of the list, as grep goes through them one by one; we save it into a named variable ($e2). Then none comes along and in its block "takes possession" of $_, assigning elements of #a1 to it as it processes them. The current element of #a2 is also available since we have copied it into $e2.
The test performed in none is pulled into a a subroutine, which I call hr_eq to emphasize that it is specifically for equality comparison of (elements in) hashrefs.
It is in this sub where the details can be tweaked. Firstly, instead of bluntly using ne for values for each key, you can add custom comparisons for particular keys (numbers must use ==, etc). Then, if your data structures change this is where you'd adjust specifics.
You could use grep.
for my $new_row (#new_rows) {
say "$new_row->{id} not in old"
if !grep { $_->{id} == $new_row->{id} } #old_rows;
}
for my $old_row (#old_rows) {
say "$old_row->{id} not in new"
if !grep { $_->{id} == $old_row->{id} } #new_rows;
}
But that's an O(N*M) solution, while there exists an O(N+M) solution that would be far faster.
my %old_keys; ++$old_keys{ $_->{id} } for #old_rows;
my %new_keys; ++$new_keys{ $_->{id} } for #new_rows;
for my $new_row (#new_rows) {
say "$new_row->{id} not in old"
if !$old_keys{$new_row->{id}};
}
for my $old_row (#old_rows) {
say "$old_row->{id} not in new"
if !$new_keys{$old_row->{id}};
}
If both of your database connections are to the same database, this can be done far more efficiently within the database itself.
Create a temporary table with three fields, id, old_count (DEFAULT 0) and new_count (DEFAULT 0).
INSERT OR UPDATE from the old table into the temporary table, incrementing old_count in the process.
INSERT OR UPDATE from the new table into the temporary table, incrementing new_count in the process.
SELECT the rows of the temporary table which have 0 for old_count or 0 for new_count.
select id,title,created from mos_content
LEFT JOIN xugc_content USING(id)
WHERE xugc_content.id IS NULL;
Gives you the rows that are in mos_content but not in xugc_content.
That's even shorter than the Perl code.

How to compare two hashes of different levels in perl without using sub routines or modules?

my arrays are
my #arr = ('mars','earth','jupiter');
my #arr1 = ('mercury','mars');
my #arr2 = ('planet','earth','star','sun','planet2','mars');
%space = ( 'earth'=>{
'planet'=> {
'1' =>'US',
'2' =>'UK'
},
'planet2'=>{
'1' =>'AFRICA',
'2' =>'AUS'
}
},
'sun'=>{
'star' =>{
'1' =>'US',
'2' =>'UK'
}
},
'mars' =>{
'planet2' =>{
'1' =>'US',
'2' =>'UK'
}
}
);
now i am comparing the first two arrays in the following manner
foreach (#arr)
{
$arr_hash{$_} =1;
}
foreach my $name (keys %space)
{
foreach my $key (keys %{$space{$name}})
if ($arr_hash{$name} !=1)
{
#do something
}
now how should i compare the third array? I am trying something like this
else
{
if($arr2_hash{$key}{$name} !=1)
{
#do something else
}
I want to check whether the planet+earth pair(ex. the combination of key1 and key2 should be matched with first and second element in #arr2) is present in %space too?
any help?
I've done this twice now in Perl. Once for Test::More's is_deeply() and again for perl5i's are_equal(). Doing it right is not simple. Doing it without subroutines is just silly. If you want to see how this is done, look at are_equal(), though it can be done better.
But I don't think you actually need to compare two hashes.
What I think is happening is you need to check if the things in the various arrays are present in %space. For example...
my #arr = ('mars','earth','jupiter');
That would be true, true, and false.
my #arr1 = ('mercury','mars');
False, true.
my #arr2 = ('planet','earth','star','sun','planet2','mars');
Assuming these are pairs, they're all true.
I'm going to use better variable names than #arr which describe the contents, not the type of the structure. I'm also going to assume that use strict; use warnings; use v5.10; is present.
The first two are simple, loop through the array and check if there's an entry in %space. And we can do both arrays in one loop.
for my $name in (#names1, #names2) {
print "$name...";
say $space{$name} ? "Yes" : "No";
}
The third set is a little trickier, and how the data is laid out makes it harder. Putting pairs in a list is awkward, that's what hashes are for. This would make more sense...
my %object_types = (
earth => "planet", sun => "star", mars => "planet2"
);
Then it's easy. Check that $space{$name}{$type} is true.
for my $name (keys %object_types) {
my $type = $object_types{$name};
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
Or if you're stuck with the array we can iterate through the list in pairs.
# $i will be 0, 2, 4, etc...
for( my $i = 0; $i < $#stellar_objects; $i+=2 ) {
my($type, $name) = ($stellar_objects[$i], $stellar_objects[$i+1]);
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
What if you had a hash of types with multiple names to check instead?
my %object_types = (
planet =>['earth'],
star =>['sun'],
planet2 =>['earth','mars']
);
Same idea, but we need an inner loop over the names array. Good use of plural variable names helps keep thing straight.
for my $type (keys %object_types) {
my $names = $object_types{$type};
for my $name (#$names) {
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
}
Since these are really a set of pairs to search for, combining them into a big hash is a disservice. A better data structure to feed this search might be a list of pairs.
my #searches = (
[ planet => 'earth' ],
[ star => 'sun' ],
[ planet2 => 'earth' ],
[ planet2 => 'mars' ],
);
for my $search (#searches) {
my($type, $name) = #$search;
print "$name / $type...";
say $space{$name}{$type} ? "Yes" : "No";
}
For the record, %space is poorly designed. The first two levels are fine, name and type, it's the country hashes that are awkward.
'sun'=>{
'star' =>{
# This part
'1' =>'US',
'2' =>'UK'
}
},
This has none of the advantages of a hash, and all of the disadvantages. The advantage of a hash is it's very fast to look up a single key, but this makes it awkward by making the interesting part a value. If the key is trying to impose an order on the hash, use an array.
sun => {
star => [ 'US', 'UK' ]
},
Then you can get a list the countries: $countries = $space{$name}{$type}
If you want fast key lookup and order doesn't matter, use a hash with the keys being the thing stored, and the value being 1 (just a placeholder for "true").
sun => {
star => { 'US' => 1, 'UK' => 1 }
},
This takes advantage of hash key lookup and allows $space{$name}{$type}{$country} to quickly check for existence. The "values" (even though they're stored as keys) are also guaranteed to be unique. This formally known as a set, a collection of unique values.
And you can store further information in the value.

Merge Perl hashes into one array and loop through it

I'm creating a Perl plugin for cPanel which has to get all domains in the account of a user and display it in a HTML select field. Originally, I'm a PHP developer, so I'm having a hard time understanding some of the logic of Perl. I do know that cPanel plugins can also be written in PHP, but for this plugin I'm limited to Perl.
This is how I get the data from cPanel:
my #user_domains = $cpliveapi->uapi('DomainInfo', 'list_domains');
#user_domains = $user_domains[0]{cpanelresult}{result}{data};
This is what it looks like using print Dumper #user_domains:
$VAR1 = {
'addon_domains' => ['domain1.com', 'domain2.com', 'domain3.com'],
'parked_domains' => ['parked1.com', 'parked2.com', 'parked3.com'],
'main_domain' => 'main-domain.com',
'sub_domains' => ['sub1.main-domain.com', 'sub2.main-domain.com']
};
I want the data to look like this (thanks #simbabque):
#domains = qw(domain1.com domain2.com domain3.com main-domain.com parked1.com parked2.com parked3.com);
So, I want to exclude sub_domains and merge the others in 1 single-dimensional array so I can loop through them with a single loop. I've struggled the past few days with what sounds like an extremely simple task, but I just can't wrap my head around it.
You need something like this
If you find you have a copy of List::Util that doesn't include uniq then you can either upgrade the module or use this definition
sub uniq {
my %seen;
grep { not $seen{$_}++ } #_;
}
From your dump, the uapi call is returning a reference to a hash. That goes into $cp_response and then drilling down into the structure fetches the data hash reference into $data
delete removes the subdomain information from the hash.
The lists you want are the values of the hash to which $data refers, so I extract those. Those values are references to arrays of strings if there is more than one domain in the list, or simple strings if there is only one
The map converts all the domain names to a single list by dereferencing array references, or passing strings straight through. That is what the ref() ? #$_ : $_ is doing. FInally uniq removes multiple occurrences of the same name
use List::Util 'uniq';
my $cp_response = $cpliveapi->uapi('DomainInfo', 'list_domains');
my $data = $cp_response->{cpanelresult}{result}{data};
delete $data->{sub_domains};
my #domains = uniq map { ref() ? #$_ : $_ } values %$data;
output
parked1.com
parked2.com
parked3.com
domain1.com
domain2.com
domain3.com
main-domain.com
That isn't doing what you think it' doing. {} is the anonymous hash constructor, so you're making a 1 element array, with a hash in it.
You probably want:
use Data::Dumper;
my %user_domains = (
'addon_domains' => ['domain1.com', 'domain2.com', 'domain3.com'],
'parked_domains' => ['parked1.com', 'parked2.com', 'parked3.com'],
'main_domain' => 'main-domain.com',
'sub_domains' => ['sub1.main-domain.com', 'sub2.main-domain.com'],
);
print Dumper \%user_domains;
And at which point the 'other' array elements you can iterate through either a double loop:
foreach my $key ( keys %user_domains ) {
if ( not ref $user_domains{$key} ) {
print $user_domains{$key},"\n";
next;
}
foreach my $domain ( #{$user_domains{$key}} ) {
print $domain,"\n";
}
}
Or if you really want to 'flatten' your hash:
my #flatten = map { ref $_ : #$_ ? $_ } values %user_domains;
print Dumper \#flatten;
(You need the ref test, because without it, the non-array main-domain won't work properly)
So for the sake of consistency, you might be better off with:
my %user_domains = (
'addon_domains' => ['domain1.com', 'domain2.com', 'domain3.com'],
'parked_domains' => ['parked1.com', 'parked2.com', 'parked3.com'],
'main_domain' => ['main-domain.com'],
'sub_domains' => ['sub1.main-domain.com', 'sub2.main-domain.com'],
);

Accessing returned values as an array

I have simple XML that I want to read in Perl and make hash containing all read keys.
Consider this code:
my $content = $xml->XMLin("filesmap.xml")->{Item};
my %files = map { $_->{Path} => 1 } #$content;
This snippet works great when XML file contains many Item tags. Then $content is a reference to array. But when there is only one Item I get an error when dereferencing as array. My assumption is that $content is a reference to the scalar, not array.
What is the practice to make sure I get array of values read from XML?
What you need is to not use XML::Simple and then it's really trivial. My favourite for fairly straightforward XML samples is XML::Twig
use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'filesmap.xml' );
my #files = map { $_ -> trimmed_text } $twig -> get_xpath ( '//Path' );
With a more detailed XML sample (and desired result) I'll be able to give you a better answer.
Part of the problem with XML::Simple is it tries to turn an XML data structure into perl data structures, and - because hashes are key-values and unordered but arrays are ordered, it has to guess. And sometimes it does it wrong, and other times it's inconsistent.
If you want it to be consistent, you can set:
my $xml = XMLin( "filesmap.xml", ForceArray => 1, KeyAttr => [], ForceContent => 1 );
But really - XML::Simple is just a route to pain. Don't use it. If you don't like XML::Twig, try XML::LibXML instead.
What I would say you need is a flatten-ing step.
my %files
= map { $_->{Path} => 1 }
# flatten...
map { ref() eq 'ARRAY' ? #$_ : $_ }
$xml->XMLin("filesmap.xml")->{Item}
;
You can do a check and force the return into an array reference if necessary:
my $content = $xml->XMLin("filesmap.xml")->{Item};
$content = ref $content eq 'ARRAY'
? $content
: [$content];

How to create objects out of each element in array?

I have a module with a new constructor:
package myClass;
sub new
{
my $class = shift;
my $arrayreference = shift;
bless $arrayreference, $class;
return $arrayreference;
};
I want to do something like:
foreach $ref (#arrayref)
{
$array1 = myClass->new($ref);
}
$array1 is being rewritten each time, but I want each element in the array to have a distinct object name (ex. $array1, $array2, $array3 etc.)
If you are working with a plural data structure (an array), then you need to store the result into a plural container (or multiple scalar containers). The idomatic way to do this is to use the map function:
my #object_array = map {myClass->new($_)} #source_array;
If you know that #source_array contains a fixed number of items, and you want scalars for each object:
my ($foo, $bar, $baz) = map {myClass->new($_)} #source_with_3_items;
I think you should use some hash or array to contain the objects.
foreach $ref (#arrayref)
{
push #array, myClass->new($ref);
$hash{$key++} = myClass->new($ref);
}
thus you can access them with $array[42] or $hash{42}.
There is essentially no name difference between $array[1] and $array1. There is a programmatic difference in that $array[1] can be "pieced together" and, under modern Perl environments $array1 can't. Thus I can write $array[$x] for any valid $x and get an item with a "virtual name" of $array.$x.
my #objects = map { MyClass->new( $_ ); } #data_array;
Thus, if you just want to append a number, you probably just want to collect your objects in an array. However, if you want a more complex naming scheme, one or more levels of hashes is probably a good way to go.
If you had a way to derive the name from the object data once formed, and had a method called name, you could do this:
my %object_map
= map { my $o = MyClass->new( $_ ); ( $o->name => $o ); } #data_array
;
Are you are trying to do it in place?
my #objects = (
{ ...args for 1st object... },
{ ...args for 2nd object... },
...
);
$_ = Class->new($_) for #objects;
However, you should avoid reusing variables like that.
my #object_data = (
{ ...args for 1st object... },
{ ...args for 2nd object... },
...
);
my #objects = map Class->new($_), #object_data;
I agree with Ade YU and Eric Strom, and have +1'd their answers: you should use one of their approaches. But what you ask is technically possible, using symbolic references, so for completeness' sake:
foreach my $i (0 .. $#arrayref)
{
no strict refs;
my $varname = 'array' . ($i + 1);
${$varname} = myClass->new($arrayref[$i]);
}

Resources