Generate a data structure in Perl - arrays

i have Perl "config files" containing data structures like this:
'xyz' => {
'solaris' => [
"value1",
"valueN",
],
'linux' => [
"valueX",
"valueN",
],
},
I load them with a simple:
%config = do '/path/to/file.conf';
Now I would like to generate config files like this: build the data structure directly and print it to a config file.
I can fill the hash of hashes (of arrays, or anything else) in the normal way, but how do I dump it to a config file afterwards?
Is there a clean and easy way of doing it,
instead of having to do dirty things like:
print $FH "'xyz' => {\n";
print $FH " 'solaris' => [\n";
etc.
i "guess" Data::Dumper could do that..
thanks!

You want:
$Data::Dumper::Terse = 1;
See the documentation.
$Data::Dumper::Terse or $OBJ->Terse([NEWVAL])
When set, Data::Dumper will emit single, non-self-referential values as atoms/terms rather than statements. This means that the $VARn names will be avoided where possible, but be advised that such output may not always be parseable by eval.
Update (to address the comment below):
Data::Dumper will add the correct punctuation in order for you to get back exactly what you give it. If you give it a hash reference, then you will get a string that starts and ends with curly braces.
$ perl -MData::Dumper -E'$Data::Dumper::Terse=1; say Dumper { foo => { bar => "baz" }}'
{
'foo' => {
'bar' => 'baz'
}
}
If you give it an array reference, then you will get back a string that starts and ends with square brackets.
$ perl -MData::Dumper -E'$Data::Dumper::Terse=1; say Dumper [ foo => { bar => "baz" }]'
[
'foo',
{
'bar' => 'baz'
}
]
If, for some reason, you want neither of those, then give it a list of values.
$ perl -MData::Dumper -E'$Data::Dumper::Terse=1; say Dumper ( foo => { bar => "baz" })'
'foo'
{
'bar' => 'baz'
}
If you have a hash reference and you don't want the surrounding braces (which seems like a strange requirement, to be honest) then dereference the reference before passing it to Dumper(). That will convert the hash reference to a hash and the hash will be "unrolled" to a list by being passed to a function.
$ perl -MData::Dumper -E'$Data::Dumper::Terse=1; $ref = { foo => { bar => "baz" }}; say Dumper %$ref'
'foo'
{
'bar' => 'baz'
}
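Putting this together for the original question, a round trip could look roughly like the sketch below. The helper names write_config and read_config are made up for illustration. The sketch writes the file as a hash reference (with the surrounding braces), so it is read back into a scalar rather than with %config = do ...; that is an assumption about what is acceptable, not a requirement of Data::Dumper.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Write a data structure to $path as a single Perl term (hypothetical helper).
sub write_config {
    my ($path, $config) = @_;
    local $Data::Dumper::Terse    = 1;   # no "$VAR1 = " prefix
    local $Data::Dumper::Sortkeys = 1;   # stable key order between runs
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} Dumper($config);
    close $fh or die "Cannot close $path: $!";
    return;
}

# Read the file back; it evaluates to a hash reference (hypothetical helper).
# Use an absolute path, or one starting with ./, so that do() does not search @INC.
sub read_config {
    my ($path) = @_;
    my $config = do $path;
    die "Cannot parse $path: $@" if $@;
    die "Cannot read $path: $!" unless defined $config;
    return $config;
}

my (undef, $path) = tempfile(SUFFIX => '.conf', UNLINK => 1);
write_config($path, {
    xyz => {
        solaris => [ 'value1', 'valueN' ],
        linux   => [ 'valueX', 'valueN' ],
    },
});
my $config = read_config($path);
print $config->{xyz}{linux}[0], "\n";    # valueX
```

If the existing readers must keep using %config = do ..., they can dereference instead: %config = %{ do $path }.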

Related

How can I make an array of values after duplicate keys in a hash?

I have a question regarding duplicate keys in hashes.
Say my dataset looks something like this:
>Mammals
Cats
>Fish
Clownfish
>Birds
Parrots
>Mammals
Dogs
>Reptiles
Snakes
>Reptiles
Snakes
What I would like to get out of my script is a hash that looks like this:
$VAR1 = {
'Birds' => 'Parrots',
'Mammals' => 'Dogs', 'Cats',
'Fish' => 'Clownfish',
'Reptiles' => 'Snakes'
};
I found a possible answer here (https://www.perlmonks.org/?node_id=1116320). However I am not sure how to identify the values and the duplicates with the format of my dataset.
Here's the code that I have been using:
use Data::Dumper;
open($fh, "<", $file) || die "Could not open file $file $!/n";
while (<$fh>) {
chomp;
if($_ =~ /^>(.+)/){
$group = $1;
$animals{$group} = "";
next;
}
$animals{$group} .= $_;
push @{$group (keys %animals)}, $animals{$group};
}
print Dumper(\%animals);
When I execute it the push function does not seem to work as the output from this command is the same as when the command is absent (in the duplicate "Mammal" group, it will replace the cat with the dog instead of having both as arrays within the same group).
Any suggestions as to what I am doing wrong would be highly appreciated.
Thanks !
You're very close here. We can't get exactly the output you want from Data::Dumper because hashes can only have one value per key. The easiest way to fix that is to assign a reference to an array to the key and add things to it. But since you want to eliminate the duplicates as well, it's easier to build hashes as an intermediate representation then transform them to arrays:
use strict;
use warnings;
use Data::Dumper;
my $file = "animals.txt";
open(my $fh, "<", $file) || die "Could not open file $file: $!\n";
my (%animals, $group);
while (<$fh>) {
    chomp;
    if (/^>(.+)/) {
        $group = $1;
        next;
    }
    # Store each animal as a hash key so that duplicates collapse
    $animals{$group} = {} unless exists $animals{$group};
    $animals{$group}->{$_} = 1;
}
# Transform the hashes to arrays
foreach my $group (keys %animals) {
    # Make the hash into an array of its keys
    $animals{$group} = [ sort keys %{$animals{$group}} ];
    # Throw away the array if we only have one thing
    $animals{$group} = $animals{$group}->[0] if @{ $animals{$group} } == 1;
}
print Dumper(\%animals);
Result is
$VAR1 = {
'Reptiles' => 'Snakes',
'Fish' => 'Clownfish',
'Birds' => 'Parrots',
'Mammals' => [
'Cats',
'Dogs'
]
};
which is as close as you can get to what you had as your desired output.
For ease in processing the ingested data, it may actually be easier to not throw away the arrays in the one-element case so that every entry in the hash can be processed the same way (they're all references to arrays, no matter how many things are in them). Otherwise you've added a conditional to strip out the arrays, and you have to add another conditional test in your processing code to check
if (ref $item) {
# This is an anonymous array
} else {
# This is just a single entry
}
and it's easier to just have one path there instead of two, even if the else just wraps the single item into an array again. Leave them as arrays (delete the $animals{$group} = $animals{$group}->[0] line) and you'll be fine.
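As a rough sketch of the "always arrays" variant, the function below builds the array references directly and uses a separate %seen hash to skip duplicates while keeping file order. The name group_records and the choice to pass the lines in as a list are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Group ">Header" / value lines into a hash of array references,
# skipping values already seen for that header (hypothetical helper).
sub group_records {
    my (@lines) = @_;
    my (%groups, %seen, $group);
    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        if ($line =~ /^>(.+)/) {
            $group = $1;
            next;
        }
        next unless defined $group;      # ignore values before the first header
        push @{ $groups{$group} }, $line unless $seen{$group}{$line}++;
    }
    return \%groups;
}

my $groups = group_records(
    ">Mammals\n", "Cats\n", ">Fish\n", "Clownfish\n",
    ">Mammals\n", "Dogs\n", ">Reptiles\n", "Snakes\n",
    ">Reptiles\n", "Snakes\n",
);
print "$_: @{ $groups->{$_} }\n" for sort keys %$groups;
# Fish: Clownfish
# Mammals: Cats Dogs
# Reptiles: Snakes
```

Every value is an array reference here, so the consuming code needs no ref() check.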
Given:
__DATA__
>Mammals
Cats
>Fish
Clownfish
>Birds
Parrots
>Mammals
Dogs
>Reptiles
Snakes
>Reptiles
Snakes
(at the end of the source code or a file with that content)
If you are willing to slurp the file, you can do something with a regex and a HoH like this:
use Data::Dumper;
use warnings;
use strict;
my %animals;
my $s;
while(<DATA>){
$s.=$_;
}
while($s=~/^>(.*)\R(.*)/mg){
++$animals{$1}{$2};
}
print Dumper(\%animals);
Prints:
$VAR1 = {
'Mammals' => {
'Cats' => 1,
'Dogs' => 1
},
'Birds' => {
'Parrots' => 1
},
'Fish' => {
'Clownfish' => 1
},
'Reptiles' => {
'Snakes' => 2
}
};
From there you can arrive at your format with this complete Perl program:
$s.=$_ while(<DATA>);
++$animals{$1}{$2} while($s=~/^>(.*)\R(.*)/mg);
while ((my $k, my $v) = each (%animals)) {
print "$k: ". join(", ", keys %$v) . "\n";
}
Prints:
Fish: Clownfish
Birds: Parrots
Mammals: Cats, Dogs
Reptiles: Snakes
(Know that the output order may be different than file order since Perl hashes do not maintain insertion order...)

Perl, how to find a size of array in hash from json

So, i have this example of json:
{
"tab" : {
"sort" : "true",
"sort_by" : "0",
"name" : "blablabla",
"cols" : [
"time_ep",
"count_warning",
"count_critical"
]
}
}
And after i decoded it into perl hash, i got a problem with "cols" array.
The print Dumper ${$params->{$tab}}{cols} looks like that
$VAR1 = [
'time_ep',
'count_warning',
'count_critical'
];
I can't find the size of this array. When I try, I get 1 or ARRAY(address), but when I fetch a single element of the array like this:
print Dumper ${$params->{$tab}}{cols}[1] - I get what I need:
$VAR1 = 'count_warning';
I tried various options with refs, but nothing gave me what i need.
Any suggestions?
To get the size of cols:
print scalar @{$params->{tab}->{cols}};
You're retrieving an array reference from this key, dereferencing it to get an array and then using it in a scalar context - which returns the size of array.
#!/usr/bin/env perl
use strict;
use warnings;
use JSON;
use Data::Dumper;
my $params = from_json(
'{
"tab" : {
"sort" : "true",
"sort_by" : "0",
"name" : "blablabla",
"cols" : [
"time_ep",
"count_warning",
"count_critical"
]
}
}'
);
print Dumper $params;
print scalar @{ $params->{tab}->{cols} };
Note - scalar explicitly forces scalar context, but it will happen implicitly if you do 'scalar operations' like concatenation or numeric comparison. (print doesn't force scalar context)
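To make the context rules concrete, here is a small sketch of a few ways to reach the count. It uses the core JSON::PP module instead of JSON purely so that it runs without extra installs, and the variable names are illustrative.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my $params = decode_json(
    '{"tab":{"cols":["time_ep","count_warning","count_critical"]}}'
);
my $cols = $params->{tab}{cols};       # an array reference

my $count   = @$cols;                  # scalar assignment: element count
my $last    = $#{$cols};               # index of the last element
my ($first) = @$cols;                  # list assignment: first element

print "count=$count last=$last first=$first\n";
# count=3 last=2 first=time_ep
print "cols has " . @$cols . " entries\n";   # concatenation gives scalar context
```

Printing the reference itself ($cols) is what produces ARRAY(0x...); the array has to be dereferenced before it is put in scalar context.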

Self-deleting array elements (once they become undefined)

I have a Perl script generating an array of weak references to objects. Once one of these objects goes out of scope, the reference to it in the array will become undefined.
ex (pseudo code):
# Imagine an array of weak references to objects
my @array = ( $obj1_ref, $obj2_ref, $obj3_ref );
# Some other code here causes the last strong reference
# of $obj2_ref to go out of scope.
# We now have the following array
@array = ( $obj1_ref, undef, $obj3_ref )
Is there a way to make the undefined reference automatically remove itself from the array once it becomes undefined?
I want @array = ( $obj1_ref, $obj3_ref ).
EDIT:
I tried this solution and it didn't work:
#!/usr/bin/perl
use strict;
use warnings;
{
    package Object;
    sub new { my $class = shift; bless({ @_ }, $class) }
}
{
    use Scalar::Util qw(weaken);
    use Data::Dumper;
    my $object = Object->new();
    my $array;
    $array = sub { \@_ }->( grep defined, @$array );
    {
        my $object = Object->new();
        @$array = ('test1', $object, 'test3');
        weaken($array->[1]);
        print Dumper($array);
    }
    print Dumper($array);
}
Output:
$VAR1 = [
'test1',
bless( {}, 'Object' ),
'test3'
];
$VAR1 = [
'test1',
undef,
'test3'
];
The undef is not removed from the array automatically.
Am I missing something?
EDIT 2:
I also tried removing undefined values from the array in the DESTROY method of the object, but that doesn't appear to work either. It appears that since the object is still technically not "destroyed" yet, the weak references are still defined until the DESTROY method is completed...
No, there isn't, short of using a magical (e.g. tied) array.
If you have a reference to an array instead of an array, you can use the following to filter out the undefined element efficiently without "hardening" any of the references.
$array = sub { \@_ }->( grep defined, @$array );
This doesn't copy the values at all, in fact. Only "C pointers" get copied.
Perl won't do this for you automatically. You have a couple of options. The first is to clean it yourself whenever you use it:
my @clean = grep { defined $_ } @dirty;
Or you could create a tie'd array and add that functionality to the FETCH* and POP hooks.
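As a sketch of the "clean it yourself" option that avoids copying the references, the helper below removes undefined slots in place with splice. The name prune_undef is hypothetical. The sketch assumes that splice moves the remaining elements rather than copying them, so surviving weak references should stay weak; the isweak check at the end is there to confirm this for the kept element.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Remove undefined slots from the array in place (hypothetical helper).
# Walking backwards keeps the remaining indexes valid while splicing.
sub prune_undef {
    my ($array) = @_;
    for my $i (reverse 0 .. $#$array) {
        splice @$array, $i, 1 unless defined $array->[$i];
    }
    return $array;
}

my $keep = { name => 'kept' };
my @refs;
{
    my $temp = { name => 'temporary' };
    @refs = ($keep, $temp);
    weaken($_) for @refs;          # the array now holds only weak references
}
# $temp is gone, so its slot in @refs is now undef
prune_undef(\@refs);

print scalar(@refs), "\n";                          # 1
print isweak($refs[0]) ? "weak" : "strong", "\n";   # weak
```

This still has to be called explicitly before each use of the array; only a tied array would make the cleanup happen automatically.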

Cast a string into a hash or array in perl

I am currently parsing a comma separated string of 2-tuples into a hash of scalars. For example, given the input:
"ip=192.168.100.1,port=80,file=howdy.php",
I end up with a hash that looks like:
%hash =
{
ip => 192.168.100.1,
port => 80,
file => howdy.php
}
Code works fine and looks something like this:
my $paramList = $1;
my @paramTuples = split(/,/, $paramList);
my %hash;
foreach my $paramTuple (@paramTuples) {
    my($key, $val) = split(/=/, $paramTuple, 2);
    $hash{$key} = $val;
}
I'd like to expand the functionality from just taking scalars to also take arrays and hashes. So, another example input could be:
"ips=(192.168.100.1,192.168.100.2),port=80,file=howdy.php,hashthing={key1 => val1, key2 => val2}",
I end up with a hash that looks like:
%hash =
{
ips => (192.168.100.1, 192.168.100.2), # <--- this is an array
port => 80,
file => howdy.php,
hashthing => { key1 => val1, key2 => val2 } # <--- this is a hash
}
I know I can parse the input string character by character. For each tuple I would do the following: If the first character is a ( then parse an array. Else, if the first character is a { then parse a hash. Else parse a scalar.
A co-worker of mine indicated he thought you could turn a string that looked like "(red,yellow,blue)" into an array or "{c1 => red, c2 => yellow, c3 => blue}" into a hash with some kind of cast function. If I went this route, I could use a different delimiter instead of a comma to separate my 2-tuples like a |.
Is this possible in perl?
I think the "cast" function you're referring to might be eval. Be aware that eval runs the string as Perl code, so only use it on input you trust.
Using eval
use strict;
use warnings;
use Data::Dumper;
my $string = "{ a => 1, b => 2, c => 3}";
my $thing = eval $string;
print "thing is a ", ref($thing),"\n";
print Dumper $thing;
Will print:
thing is a HASH
$VAR1 = {
'a' => 1,
'b' => 2,
'c' => 3
};
Or for arrays:
my $another_string = "[1, 2, 3 ]";
my $another_thing = eval $another_string;
print "another_thing is ", ref ( $another_thing ), "\n";
print Dumper $another_thing;
another_thing is ARRAY
$VAR1 = [
1,
2,
3
];
Although note that eval requires you to use brackets suitable for the appropriate data types - {} for anon hashes, and [] for anon arrays. String values such as IP addresses need quoting too: an unquoted 192.168.100.1 is parsed by Perl as a v-string, not as text. So to take your example above:
my %hash4;
my $ip_string = "ips=['192.168.100.1','192.168.100.2']";
my ( $key, $value ) = split ( /=/, $ip_string );
$hash4{$key} = eval $value;
my $hashthing_string = "{ key1 => 'val1', key2 => 'val2' }";
$hash4{'hashthing'} = eval $hashthing_string;
print Dumper \%hash4;
Gives:
$VAR1 = {
'hashthing' => {
'key2' => 'val2',
'key1' => 'val1'
},
'ips' => [
'192.168.100.1',
'192.168.100.2'
]
};
Using map to make an array into a hash
If you want to turn an array into a hash, the map function is for that.
my @array = ( "red", "yellow", "blue" );
my %hash = map { $_ => 1 } @array;
print Dumper \%hash;
Using slices of hashes
You can also use a slice if you have known values and known keys:
my @keys = ( "c1", "c2", "c3" );
my %hash2;
@hash2{@keys} = @array;
print Dumper \%hash2;
JSON / XML
Or if you have control over the export mechanism, you may find exporting as JSON or XML format would be a good choice, as they're well defined standards for 'data as text'. (You could perhaps use Perl's Storable too, if you're just moving data between Perl processes).
Again, to take the %hash4 above (with the IPs quoted as strings):
use JSON;
print encode_json(\%hash4);
Gives us:
{"hashthing":{"key2":"val2","key1":"val1"},"ips":["192.168.100.1","192.168.100.2"]}
Which you can also pretty-print:
use JSON;
print to_json(\%hash4, { pretty => 1} );
To get:
{
"hashthing" : {
"key2" : "val2",
"key1" : "val1"
},
"ips" : [
"192.168.100.1",
"192.168.100.2"
]
}
This can be read back in with a simple:
my $data_structure = decode_json ( $input_text );
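For a complete round trip, a sketch along these lines may help. It uses the core JSON::PP module rather than JSON so that it runs on a stock Perl, and turns on canonical so that the key order is stable between runs; both choices are assumptions for the sake of a reproducible example.

```perl
use strict;
use warnings;
use JSON::PP;

my %hash4 = (
    ips       => [ '192.168.100.1', '192.168.100.2' ],
    hashthing => { key1 => 'val1', key2 => 'val2' },
);

# canonical() sorts the keys, so the encoded text is the same on every run
my $json = JSON::PP->new->canonical;

my $text = $json->encode(\%hash4);
print "$text\n";
# {"hashthing":{"key1":"val1","key2":"val2"},"ips":["192.168.100.1","192.168.100.2"]}

# Decoding returns a reference to a fresh copy of the structure
my $copy = $json->decode($text);
print $copy->{ips}[1], "\n";           # 192.168.100.2
```

Unlike eval, decoding JSON never executes the input, which makes it a safer choice for data that comes from outside the program.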
Style point
As a point of style - can I suggest that the way you've formatted your data structures isn't ideal. If you 'print' them with Dumper then that's a common format that most people will recognise. So your 'first hash' looks like:
Declared as (note the my prefix, the () for the declaration, and the quotes required under strict):
my %hash3 = (
"ip" => "192.168.100.1",
"port" => 80,
"file" => "howdy.php"
);
Dumped as (brackets of {} because it's an anonymous hash, but still quoting strings):
$VAR1 = {
'file' => 'howdy.php',
'ip' => '192.168.100.1',
'port' => 80
};
That way you'll have a bit more joy with people being able to reconstruct and interpret your code.
Note too - that the dumper style format is also suitable (in specific limited cases) for re-reading via eval.
Try this but compound values will have to be parsed separately.
my $qr_key_1 = qr{
( # begin capture
[^=]+ # equal sign is separator. NB: spaces captured too.
) # end capture
}msx;
my $qr_value_simple_1 = qr{
( # begin capture
[^,]+ # comma is separator. NB: spaces captured too.
) # end capture
}msx;
my $qr_value_parenthesis_1 = qr{
\( # starts with parenthesis
( # begin capture
[^)]+ # end with parenthesis NB: spaces captured too.
) # end capture
\) # end with parenthesis
}msx;
my $qr_value_brace_1 = qr{
\{ # starts with brace
( # begin capture
[^\}]+ # end with brace NB: spaces captured too.
) # end capture
\} # end with brace
}msx;
my $qr_value_3 = qr{
(?: # group alternative
$qr_value_parenthesis_1
| # or other value
$qr_value_brace_1
| # or other value
$qr_value_simple_1
) # end group
}msx;
my $qr_end = qr{
(?: # begin group
\, # ends in comma
| # or
\z # end of string
) # end group
}msx;
my $qr_all_4 = qr{
$qr_key_1 # capture a key
\= # separates key from value(s)
$qr_value_3 # capture a value
$qr_end # end of key-value pair
}msx;
while( my $line = <DATA> ){
print "\n\n$line"; # for demonstration; remove in real script
chomp $line;
while( $line =~ m{ \G $qr_all_4 }cgmsx ){
my $key = $1;
my $value = $2 // $3 // $4; # defined-or, so a value of "0" is not skipped
print "$key = $value\n"; # for demonstration; remove in real script
}
}
__DATA__
ip=192.168.100.1,port=80,file=howdy.php
ips=(192.168.100.1,192.168.100.2),port=80,file=howdy.php,hashthing={key1 => val1, key2 => val2}
Addendum:
The reason why it is so difficult to expand the parse is, in one word, context. The first line of data, ip=192.168.100.1,port=80,file=howdy.php is context free. That is, all the symbols in it do not change their meaning. Context-free data format can be parsed with regular expressions alone.
Rule #1: If the symbols denoting the data structure never change, it is a context-free format and regular expressions can parse it.
The second line, ips=(192.168.100.1,192.168.100.2),port=80,file=howdy.php,hashthing={key1 => val1, key2 => val2} is a different issue. The meaning of the comma and equal sign changes.
Now, you're thinking the comma doesn't change; it still separates things, doesn't it? But it changes what it separates. That is why the second line is more difficult to parse. The second line has three contexts, in a tree:
main context
+--- list context
+--- hash context
The tokenizer must switch parsing sets as the data switches context. This requires a state machine.
Rule #2: If the contexts of the data format form a tree, then it requires a state machine and different parsers for each context. The state machine determines which parser is in use. Since every context except the root has only one parent, the state machine can switch back to the parent at the end of its current context.
And this is the last rule, for completeness' sake. It is not used in this problem.
Rule #3: If the contexts form a DAG (directed acyclic graph) or a recursive (aka cyclic) graph, then the state machine requires a stack so it will know which context to switch back to when it reaches the end of the current context.
Now, you may have noticed that there is no state machine in the above code. It's there but it's hidden in the regular expressions. But hiding it has a cost: the list and hash contexts are not parsed. Only their strings are found. They have to be parsed separately.
Explanation:
The above code uses the qr// operator to create the parsing regular expression. The qr// operator compiles a regular expression and returns a reference to it. This reference can be used in a match, substitute, or another qr// expression. Think of each qr// expression as a subroutine. Just like normal subroutines, qr// expressions can be used in other qr// expressions, building up complex regular expressions from simpler ones.
The first expression, $qr_key_1, captures the key name in the main context. Since the equal sign separates the key from the value, it captures all non-equal-sign characters. The "_1" on the end of the variable name is what I use to remind myself that one capture group is present.
The options on the end of the expression, /m, /s, and /x, are recommended in Perl Best Practices but only the /x option has an effect. It allows spaces and comments in the regular expression.
The next expression, $qr_value_simple_1, captures simple values for the key.
The next one, $qr_value_parenthesis_1, handles the list context. This is possible only because a closing parenthesis has only one meaning: end of list context. But it also has a price: the list is not parsed; only its string is found.
And again for $qr_value_brace_1: the closing brace has only one meaning. And the hash is also not parsed.
The $qr_value_3 expression combines the value REs into one. The $qr_value_simple_1 must be last but the others can be in any order.
The $qr_end parses the end of a field in the main context. There is no number at its end because it does not capture anything.
And finally, $qr_all_4 puts them all together to create the RE for data.
The RE used in the inner loop, m{ \G $qr_all_4 }cgmsx, parses out each field in the main context. The \G assertion means: if the string has been changed since the last call (or it has never been called), then start the match at the beginning of the string; otherwise, start where the last match finished. This is used in conjunction with the /c and /g options to parse each field out from the $line, one at a time, for processing inside the loop.
And that is briefly what is happening inside the code. ☺
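To illustrate the "parsed separately" step, here is a minimal sketch of two helpers that take the captured inner strings (the text between the parentheses or braces) and turn them into an array reference and a hash reference. The names trim, parse_list and parse_hash are hypothetical, and the sketch assumes well-formed input with no nesting and no commas or => inside the values.

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace (hypothetical helper).
sub trim {
    my ($s) = @_;
    $s =~ s/\A\s+|\s+\z//g;
    return $s;
}

# "a, b, c" -> [ 'a', 'b', 'c' ]
sub parse_list {
    my ($inner) = @_;
    return [ map { trim($_) } split /,/, $inner ];
}

# "k1 => v1, k2 => v2" -> { k1 => 'v1', k2 => 'v2' }
sub parse_hash {
    my ($inner) = @_;
    my %hash;
    for my $pair (split /,/, $inner) {
        my ($key, $value) = map { trim($_) } split /=>/, $pair, 2;
        $hash{$key} = $value;
    }
    return \%hash;
}

my $ips   = parse_list('192.168.100.1,192.168.100.2');
my $thing = parse_hash('key1 => val1, key2 => val2');
print scalar(@$ips), " ips, key2 is $thing->{key2}\n";
# 2 ips, key2 is val2
```

Inside the loop, the capture group that matched says which helper applies: $2 is the parenthesised list, $3 the braced hash, and $4 the simple value.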

In Perl how can I read a file of unknown length into multiple hashes to be stored in an array for later use?

I have a config file that looks a bit like this:
add
    1
    2
concatenate
    foo
    bar
    blat
What I'm trying to do is turn this into hashes like %hash = (name=>"add", args=> [1,2]) etc, and push the hash references into a single array. Looping through the file and creating each hash seems straightforward enough, except I get stuck when it comes to naming these hashes to push their references into the array. The config file is going to change all the time and have a variable number of different name/arg combinations to store. Is there a way to iterate through hash names so I can push them into an array one at a time?
So far it looks like this:
my %temphash = (name=>'add', args=>[1,2]);
push (@array, \%temphash);
Can I make that %temphash into something generated on the fly and push it before moving on to the next one?
Edit: Context
The plan is to use those 'name' keys to call subroutines. So something like this could work:
my %subhash = (add=>\&addNumbers, concatenate=>\&concat);
Except the list of subroutines I'm going to need to call are in the config file and I won't know what they are until I start reading from it. Even if I include the names of the subroutines right there in the config file, how do I iterate through them and add them as elements to that hash?
Well, you can simply use curly brackets to make an anonymous hash:
push @array, { name => 'add', args => [1,2] };
You can create the same effect by utilising the lexical scope of the my declaration. E.g.:
my @array;
while ( ... ) {
    ...
    my %hash = ( ... );
    push @array, \%hash;
}
If I'm correctly understanding what you're asking, then you can write:
push @array, { name=>'add', args=>[1,2] };
where { ... } is a reference to an anonymous hash.
That said, I'm a bit surprised that you want an array of hashes, when each hash has just a name and args. Why not have a single hash mapping from names to args? :
%array = ( add => [ 1, 2 ], concatenate => [ 'foo', 'bar', 'blat' ] );
Something like this will do what you need
use strict;
use warnings;
open my $fh, '<', 'data_file' or die $!;
my $item;
my @data;
while (<$fh>) {
    chomp;
    next unless /^(\s*)(.+?)\s*$/;
    if ($1) {
        # Indented line: an argument for the current item
        push @{ $item->{args} }, $2;
    }
    else {
        # Unindented line: store the previous item and start a new one
        push @data, $item if $item;
        $item = { name => $2, args => [] };
    }
}
push @data, $item if $item;
use Data::Dump;
dd \@data;
output
[
{ args => [1, 2], name => "add" },
{ args => ["foo", "bar", "blat"], name => "concatenate" },
]
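For the follow-up about calling subroutines by name, one common approach is a dispatch table: a hash that maps each name from the config file to a code reference. The sketch below is only an illustration; the handler bodies and the run_item helper are assumptions, since the question does not show what addNumbers or concat actually do.

```perl
use strict;
use warnings;

# Hypothetical handlers, keyed by the names that appear in the config file.
my %dispatch = (
    add         => sub { my $sum = 0; $sum += $_ for @_; return $sum },
    concatenate => sub { return join '', @_ },
);

# Look up the handler for one parsed item and call it with the item's args.
sub run_item {
    my ($item) = @_;
    my $handler = $dispatch{ $item->{name} }
        or die "No handler for '$item->{name}'\n";
    return $handler->( @{ $item->{args} } );
}

my @data = (
    { name => 'add',         args => [ 1, 2 ] },
    { name => 'concatenate', args => [ 'foo', 'bar', 'blat' ] },
);
print run_item($_), "\n" for @data;
# 3
# foobarblat
```

Names that appear in the config file but have no entry in %dispatch are rejected, which avoids calling arbitrary subroutines by name.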

Resources