I am new to Perl and have only a basic understanding of hashes. I have a hash of array of hash of array of hash of array of hash (HoAoHoAoHoAoH) as follows.
%my_hash = (
key00 => 'value00',
key01 => [
{ key10 => 'value10',
key11 => 'value11',
key12 => [
{ key20 => 'value20',
key21 => 'value21',
key22 => [
{ key30 => 'value30',
key31 => [
{ color => 'blue', quantity => 10, boxes => [0,1,3] },
{ color => 'red', quantity => 2, boxes => [2,3] },
{ color => 'green', quantity => 5, boxes => [0] },
],
},
],
},
]
}
]
);
What is the easiest way to access the "color", "quantity" and "boxes"? I also need to do arithmetic operations with the "quantity"s, such as 10+2+5 (quantity0+quantity1+quantity2).
This looks a lot like an XY problem. What are you trying to solve here?
You can access an element of your data structure like this:
print $my_hash{key01}[0]{key12}[0]{key22}[0]{key31}[0]{color},"\n";
You can also iterate the bottom elements with:
foreach my $something ( @{ $my_hash{key01}[0]{key12}[0]{key22}[0]{key31} } ) {
print $something->{'color'};
print $something->{'quantity'}
}
But this doesn't look like a real problem - what are you actually trying to accomplish? I might guess you're trying to parse XML or similar, in which case there's almost certainly a better approach.
Related
My task is to convert an array of hashes, each with x keys, into an (x-1)-dimensional hash.
Example:
use Data::Dumper;
my $arr = [
{
'source' => 'source1',
'group' => 'group1',
'param' => 'prm1',
'value' => 1,
},
{
'source' => 'source1',
'group' => 'group1',
'param' => 'prm2',
'value' => 2,
},
];
my $res;
for my $i (@$arr) {
$res->{ $i->{source} } = {};
$res->{ $i->{source} }{ $i->{group} } = {};
$res->{ $i->{source} }{ $i->{group} }{ $i->{param} } = $i->{value};
}
warn Dumper $res;
my $res_expected = {
'source1' => {
'group1' => {
'prm1' => 1, # wasn't added, why ?
'prm2' => 2
}
}
};
However, it doesn't work as expected: 'prm1' => 1 wasn't added. What is wrong, and how can I solve this task?
The problem is that you are assigning to the source even if something was there, and you lose it. Just do a ||= instead of = and you'll be fine.
Or even easier, just use the fact that Perl autovivifies and leave that out.
my $res;
for my $i (@$arr) {
$res->{ $i->{source} }{ $i->{group} }{ $i->{param} } = $i->{value};
}
warn Dumper $res;
The first two lines in the for loop are what is causing your problem. They assign a new, empty hash reference on each iteration of the loop, erasing what was entered in the previous iteration. In Perl, there is no need to initialize a nested reference this way, because autovivification creates the intermediate hashes for you. Just remove those two lines and your data structure will be as you wish.
The method you chose only shows 'prm2' => 2 because that was the last item entered.
Is there a way to pick a value from each hash in an array of hashes and reformat the array into a single hash keyed by that value?
Is there a method I can use for this?
Example
[
{
"qset_id" => 1,
"name" => "New1"
},
{
"qset_id" => 2,
"name" => "New2"
}
]
Result
{
1 => {
"name" => "New1"
},
2 => {
"name" => "New2"
}
}
You can do basically arbitrary manipulation using the reduce function on arrays or hashes; for example, this will get your result:
array.reduce({}) do |result, item|
result[item["qset_id"]] = { "name" => item["name"] }
result
end
You can do the same thing with each.with_object do:
array.each.with_object({}) do |item, result|
result[item["qset_id"]] = { "name" => item["name"] }
end
It's basically the same thing, but you don't have to make each iteration return the result (called a 'memo object').
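A third variant of the same transformation, sketched here as a self-contained script: build [key, value] pairs with map and hand them to Hash[]. The sample rows are copied from the question; the names rows and by_id are illustrative only and do not come from the original post.

```ruby
# Sample rows copied from the question; `rows` and `by_id` are illustrative names.
rows = [
  { "qset_id" => 1, "name" => "New1" },
  { "qset_id" => 2, "name" => "New2" }
]

# Turn each row into a [key, value] pair, then let Hash[] assemble
# the pairs into a single hash keyed by qset_id.
by_id = Hash[rows.map { |row| [row["qset_id"], { "name" => row["name"] }] }]

p by_id
```

If two rows share the same qset_id, the later row replaces the earlier one, which is also how the reduce and with_object versions behave.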
You could iterate over the array and map each element into a new hash:
h1.map{|h| {h['qset_id'] => {'name' => h['name']}} }
# => [{1=>{"name"=>"New1"}}, {2=>{"name"=>"New2"}}]
... but that would return an array. You could pull the elements into a second hash like this:
h2 = {}
h1.each do |h|
h2[h['qset_id']] = {'name' => h['name']}
end
>> h2
=> {1=>{"name"=>"New1"}, 2=>{"name"=>"New2"}}
I realize there are many questions of varying degrees of similarity to this one. I've searched at length (using: [ruby] merge array of hashes on key) for them and I have attempted bits and pieces of each answer to try to solve this on my own. Before coming to StackOverflow, I even shared my question with my colleagues who have been equally stumped. This seems to be a unique question or we're all just staring too closely at it to see an otherwise obvious answer.
Essential Requirements
The solution must work with the Ruby 1.8.7 standard library (no gems). Please feel free to additionally illustrate solutions for other versions of Ruby, but doing so will not automatically make one answer better than another.
The structure of the input data cannot be changed by its provider; the entire data structure is delivered as-is. If the data needs to be temporarily rearranged to provide the most efficient answer, that's perfectly fine as long as the output matches the required sample below. In addition, the solution can make no assumptions about the position of the sorting keys within the Hashes.
The source variable cannot be altered in any way; it is immutable at run-time (this is checked), so the result must be provided to a new variable.
The sample data below is fiction but the problem is real. There are other levels of Arrays-of-Hashes that must also be merged on other keys in the same way; so, the very best answer can be generically applied to arbitrary levels of the data structure.
The best solution will be easy to read, maintain, and apply to arbitrary -- though similar -- data structures. It needn't be a one-liner but if you can meet all the requirements in a single line of Ruby code, kudos to you.
Sample Data
If we think of the Apache Tomcat server.xml file as a Ruby data structure rather than XML, it can provide a very good analog for this problem. Assume further that the default configuration is merged upstream -- before being delivered to you -- with data that you must consolidate before some later operation consumes the resulting data structure. The source data will look very much like this:
source = {
:Server => {
:'attribute.port' => 8005,
:'attribute.shutdown' => 'SHUTDOWN',
:Listener => [
{ :'attribute.className' => 'org.apache.catalina.startup.VersionLoggerListener' },
{ :'attribute.className' => 'org.apache.catalina.core.AprLifecycleListener',
:'attribute.SSLEngine' => 'off'},
{ :'attribute.className' => 'org.apache.catalina.core.JasperListener' },
{ :'attribute.className' => 'org.apache.catalina.core.JreMemoryLeakPreventionListener' },
{ :'attribute.className' => 'org.apache.catalina.core.AprLifecycleListener',
:'attribute.SSLEngine' => 'on'}
],
:Service => [
{ :'attribute.name' => 'Catalina',
:Connector => [
{ :'attribute.port' => 8080,
:'attribute.protocol' => 'HTTP/1.1'},
{ :'attribute.port' => 8009,
:'attribute.protocol' => 'AJP/1.3'}
],
:Engine => {
:'attribute.name' => 'Catalina',
:'attribute.defaultHost' => 'localhost',
:Realm => {
:'attribute.className' => 'org.apache.catalina.realm.LockOutRealm',
:Realm => [
{ :'attribute.className' => 'org.apache.catalina.realm.UserDatabaseRealm',
:'attribute.resourceName' => 'UserDatabase'}
]
},
:Host => [
{ :'attribute.name' => 'localhost',
:'attribute.appBase' => 'webapps',
:Valve => [
{ :'attribute.className' => 'org.apache.catalina.valves.AccessLogValve',
:'attribute.directory' => 'logs'}
]
}
]
}
},
{ :'attribute.name' => 'Catalina',
:Connector => [
{ :'attribute.port' => 8080,
:'attribute.protocol' => 'HTTP/1.1',
:'attribute.secure' => true,
:'attribute.scheme' => 'https',
:'attribute.proxyPort' => 443}
]
},
{ :'attribute.name' => 'JSVCBridge',
:Connector => [
{ :'attribute.port' => 8010,
:'attribute.protocol' => 'HTTP/2'}
]
},
{ :'attribute.name' => 'Catalina',
:Engine => {
:Host => [
{ :'attribute.name' => 'localhost',
:Valve => [
{ :'attribute.className' => 'org.apache.catalina.valves.RemoteIpValve',
:'attribute.internalProxies' => '*',
:'attribute.remoteIpHeader' => 'X-Forwarded-For',
:'attribute.protocolHeader' => 'X-Forwarded-Proto',
:'attribute.protocolHeaderHttpsValue' => 'https'}
]
}
]
}
}
]
}
}
The challenge is to produce this result from it:
result = {
:Server => {
:'attribute.port' => 8005,
:'attribute.shutdown' => 'SHUTDOWN',
:Listener => [
{ :'attribute.className' => 'org.apache.catalina.startup.VersionLoggerListener' },
{ :'attribute.className' => 'org.apache.catalina.core.AprLifecycleListener',
:'attribute.SSLEngine' => 'on'},
{ :'attribute.className' => 'org.apache.catalina.core.JasperListener' },
{ :'attribute.className' => 'org.apache.catalina.core.JreMemoryLeakPreventionListener' },
],
:Service => [
{ :'attribute.name' => 'Catalina',
:Connector => [
{ :'attribute.port' => 8080,
:'attribute.protocol' => 'HTTP/1.1',
:'attribute.secure' => true,
:'attribute.scheme' => 'https',
:'attribute.proxyPort' => 443},
{ :'attribute.port' => 8009,
:'attribute.protocol' => 'AJP/1.3'}
],
:Engine => {
:'attribute.name' => 'Catalina',
:'attribute.defaultHost' => 'localhost',
:Realm => {
:'attribute.className' => 'org.apache.catalina.realm.LockOutRealm',
:Realm => [
{ :'attribute.className' => 'org.apache.catalina.realm.UserDatabaseRealm',
:'attribute.resourceName' => 'UserDatabase'}
]
},
:Host => [
{ :'attribute.name' => 'localhost',
:'attribute.appBase' => 'webapps',
:Valve => [
{ :'attribute.className' => 'org.apache.catalina.valves.AccessLogValve',
:'attribute.directory' => 'logs'},
{ :'attribute.className' => 'org.apache.catalina.valves.RemoteIpValve',
:'attribute.internalProxies' => '*',
:'attribute.remoteIpHeader' => 'X-Forwarded-For',
:'attribute.protocolHeader' => 'X-Forwarded-Proto',
:'attribute.protocolHeaderHttpsValue' => 'https'}
]
}
]
}
},
{ :'attribute.name' => 'JSVCBridge',
:Connector => [
{ :'attribute.port' => 8010,
:'attribute.protocol' => 'HTTP/2'}
]
}
]
}
}
The Question
We need source to become result. To get there, :Listener gets merged by attribute.className; :Service gets merged by attribute.name; the resulting Arrays of :Connector get merged by attribute.port; and such. The identification of the location of the Arrays-of-Hashes within the data structure and the key which each is to be merged on should be easily provided to the solution.
The real essence of this question is finding that generic solution that can apply to multiple arbitrary levels of a complex data structure like this, merge Arrays-of-Hashes by a supplied key, and produce the merged result after the set of location and key pairs is provided.
Thank you all very much for your time and interest in this question.
There may be more elegant ways of condensing this code but I finally developed an answer to this very challenging question. While Wand Maker's answer came close, it was based on the untenable assumption that the order of the keys in the Hashes would be predictable and stable. As this is a Ruby 1.8.7 problem and because the data provider makes no such guarantee, I had to take a different path; we had to inform the merge engine which key to use for each Array-of-Hashes.
My (non-optimized) solution required three functions and an external Hash that defines the necessary merge keys:
deepMergeHash walks through a hash, deeply scanning for Arrays
deepMergeArrayOfHashes performs the desired merge against an Array-of-Hashes
subMergeHelper recursively assists deepMergeArrayOfHashes
The trick was to not only treat the Hash recursively, but to always be aware of the "present" location within the Hash so that the necessary merge key could be known. Having established a way to determine that location, defining, finding, and using the merge keys became trivial.
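The idea of tracking the "present" location can be sketched in isolation. The helper below is hypothetical (array_trails is not part of the solution that follows); it only assumes the same trail format as the merge-key table, where each level adds ':' plus the key name and array indices are left out.

```ruby
# Hypothetical helper: returns the crumb trail of every Array found
# while walking a nested Hash. Array indices are deliberately left out
# of the trail, so all elements of one Array share the same location.
def array_trails(node, trail = '')
  found = []
  node.each do |key, value|
    here = trail + ':' + key.to_s
    case value
    when Hash
      found.concat(array_trails(value, here))
    when Array
      found << here
      value.each do |element|
        found.concat(array_trails(element, here)) if element.is_a?(Hash)
      end
    end
  end
  found
end

# Small made-up structure that mimics the shape of the sample data.
config = {
  :Server => {
    :Listener => [{ :name => 'a' }],
    :Service  => [{ :Connector => [{ :port => 8080 }] }]
  }
}

puts array_trails(config).uniq.sort
```

Each printed trail is a string that could serve as a key in a merge-key lookup table, which is why knowing the location makes finding the merge key a plain Hash lookup.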
The Solution
def subMergeHelper(lhs, rhs, mergeKeys, crumbTrail)
lhs.merge(rhs){|subKey, subLHS, subRHS|
mergeTrail = crumbTrail + ':' + subKey.to_s
case subLHS
when Array
deepMergeArrayOfHashes(subLHS + subRHS, mergeKeys, mergeTrail)
when Hash
subMergeHelper(subLHS, subRHS, mergeKeys, mergeTrail)
else
subRHS
end
}
end
def deepMergeArrayOfHashes(arrayOfHashes, mergeKeys, crumbTrail)
mergedArray = arrayOfHashes
if arrayOfHashes.all? {|e| e.class == Hash}
if mergeKeys.has_key?(crumbTrail)
mergeKey = mergeKeys[crumbTrail]
mergedArray = arrayOfHashes.group_by{|evalHash| evalHash[mergeKey.to_sym]}.map{|groupID, groupArrayOfHashes|
groupArrayOfHashes.reduce({}){|memoHash, evalHash|
memoHash.merge(evalHash){|hashKey, lhs, rhs|
deepTrail = crumbTrail + ':' + hashKey.to_s
case lhs
when Array
deepMergeArrayOfHashes(lhs + rhs, mergeKeys, deepTrail)
when Hash
subMergeHelper(lhs, rhs, mergeKeys, deepTrail)
else
rhs
end
}
}
}
else
$stderr.puts "[WARNING] deepMergeArrayOfHashes: received an Array of Hashes without merge key at #{crumbTrail}."
end
else
$stderr.puts "[WARNING] deepMergeArrayOfHashes: received an Array containing non-Hashes at #{crumbTrail}?"
end
return mergedArray
end
def deepMergeHash(hashConfig, mergeKeys, crumbTrail = '')
return hashConfig unless Hash == hashConfig.class
mergedConfig = {}
hashConfig.each{|nodeKey, nodeValue|
nodeCrumb = nodeKey.to_s
testTrail = crumbTrail + ':' + nodeCrumb
case nodeValue
when Hash
mergedConfig[nodeKey] = deepMergeHash(nodeValue, mergeKeys, testTrail)
when Array
mergedConfig[nodeKey] = deepMergeArrayOfHashes(nodeValue, mergeKeys, testTrail)
else
mergedConfig[nodeKey] = nodeValue
end
}
return mergedConfig
end
Example Use
Using the data in the question, we can now:
mergeKeys = {
':Server:Listener' => 'attribute.className',
':Server:Service' => 'attribute.name',
':Server:Service:Connector' => 'attribute.port',
':Server:Service:Engine:Host' => 'attribute.name',
':Server:Service:Engine:Host:Valve' => 'attribute.className',
':Server:Service:Engine:Realm:Realm' => 'attribute.className'
}
mergedConfig = deepMergeHash(source, mergeKeys)
I can't seem to perform a successful equality test like (result == mergedConfig), but a visual inspection of mergedConfig shows that it is identical to result except that the order of some keys changes. I suspect that's a side-effect of using Ruby 1.8.x and is acceptable for this question.
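One possible way to check the two structures without relying on ordering is to reduce both to a canonical form first. This is only a sketch under assumptions: canonical is a hypothetical helper, it treats every Array as unordered (which may be too loose for data where order matters), and it compares keys by their string form, so :a and 'a' would be treated as the same key.

```ruby
# Hypothetical helper: reduce a nested structure to an order-independent form.
def canonical(node)
  case node
  when Hash
    # Hashes become [key, value] pairs sorted by key name.
    node.map { |key, value| [key.to_s, canonical(value)] }.sort_by { |pair| pair.first }
  when Array
    # Arrays are sorted by the printed form of their canonical elements.
    node.map { |element| canonical(element) }.sort_by { |element| element.inspect }
  else
    node
  end
end

# Made-up data: same content, different element and key order.
left  = { :Valve => [{ :name => 'a', :dir => 'logs' }, { :name => 'b' }] }
right = { :Valve => [{ :name => 'b' }, { :dir => 'logs', :name => 'a' }] }

puts left == right
puts canonical(left) == canonical(right)
```

The plain comparison fails because the Array elements are in a different order, while the canonical forms compare equal.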
Happy coding, everyone and thank you so much for your interest in this discussion.
The solution below assumes that hashes are merged on the value of the first key of each hash in the given array:
def merge_ary(ary_hash)
# Let's not process something that is not an array of hashes
return ary_hash if not ary_hash.all? {|h| h.class == Hash }
# If it is an array of hashes, let's group them by the value of the first key,
# then reduce each resulting group of hashes by merging them.
c = ary_hash.group_by {|h| h.values.first}.map do |k,v|
v_reduced = v.reduce({}) do |memo_hash, h|
memo_hash.merge(h) do |k, v1, v2|
v1.class == Array ? merge_ary(v1 + v2) : v2
end
end
[k, v_reduced]
end
return Hash[c].values
end
def merge_hash(hash)
t = hash.map do |k,v|
new_v = v
if v.class == Hash
new_v = merge_hash(v)
elsif v.class == Array
new_v = merge_ary(v)
end
[k,new_v]
end
return Hash[t]
end
# Test the output
merge_hash(source) == result
#=> true
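The first-key assumption matters as soon as two hashes list their keys in a different order. The snippet below is a small illustration with made-up data (a, b and the variable names are not from the original answers). It assumes Ruby 1.9 or later, where hashes keep insertion order; on Ruby 1.8.7 the order of values is not guaranteed at all, which makes grouping by the first value even less predictable.

```ruby
# Two made-up hashes that share the merge key :name but list keys differently.
a = { :name => 'Catalina', :port => 8080 }
b = { :port => 8443, :name => 'Catalina' }

# Grouping on "whatever value comes first" splits them apart...
by_first_value = [a, b].group_by { |h| h.values.first }

# ...while grouping on an explicitly named key keeps them together.
by_named_key = [a, b].group_by { |h| h[:name] }

puts by_first_value.size
puts by_named_key.size
```

Grouping on a named key is the approach that needs the merge key to be supplied from outside, since it cannot be inferred from position.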
I thought I could do it the way shown below. However, when I sort it this way, the output is a list of hexadecimal values instead of the "item" strings from the array @menu. What I want to achieve is to sort by the item name.
my @menu = (
{ item => "Blazer", price => 100, color => "Brown" },
{ item => "Jeans", price => 50, color => "Blue" },
{ item => "Shawl", price => 30, color => "Red" },
{ item => "Suit", price => 40, color => "Black" },
{ item => "Top", price => 25, color => "White" },
);
my @test = sort {item } @menu;
foreach (@test){
print $_;
}
Your print $_ prints the string value of each hash reference, so you will get something like HASH(0x1d33524). You need to print the fields of each hash that you're interested in.
Also, you need a proper comparison expression inside the sort block. Just giving the name of a hash key won't do anything useful.
use strict;
use warnings;
my @menu = (
{ item => 'Blazer', price => 100, color => 'Brown' },
{ item => 'Jeans', price => 50, color => 'Blue' },
{ item => 'Shawl', price => 30, color => 'Red' },
{ item => 'Suit', price => 40, color => 'Black' },
{ item => 'Top', price => 25, color => 'White' },
);
my @test = sort { $a->{item} cmp $b->{item} } @menu;
for ( @test ) {
print "@{$_}{qw/ item price color /}\n";
}
output
Blazer 100 Brown
Jeans 50 Blue
Shawl 30 Red
Suit 40 Black
Top 25 White
Update
If all you want is a sorted list of the item field values then you can write this more simply
use strict;
use warnings;
my @menu = (
{ item => 'Blazer', price => 100, color => 'Brown' },
{ item => 'Jeans', price => 50, color => 'Blue' },
{ item => 'Shawl', price => 30, color => 'Red' },
{ item => 'Suit', price => 40, color => 'Black' },
{ item => 'Top', price => 25, color => 'White' },
);
my @test = sort map { $_->{item} } @menu;
print "$_\n" for @test;
output
Blazer
Jeans
Shawl
Suit
Top
The contents of the curlies need to be an expression that returns whether the element in $a should appear before or after the element in $b in the final result (a negative, zero, or positive number, as cmp and <=> return).
The elements, in this case, are references to hashes. You want to compare the item element of those hashes, so
sort { $a->{item} cmp $b->{item} }
The first argument to sort BLOCK LIST is the block that compares two members of the list, not a way to extract the things to compare. See sort.
my @test = sort { $a->{item} cmp $b->{item} } @menu;
Sort::Key allows you to specify "what to sort by", not "how to compare elements".
use Sort::Key qw{ keysort };
# ...
my @test = keysort { $_->{item} } @menu;
In your code without strict, the string "item" is used to compare the elements, which doesn't really change the order in any way. What you see in the output is the representation of the members of the array, i.e. hash references. If you want to see the items only, use
for (@test) {
print $_->{item}, "\n";
}
See also List::UtilsBy:
use List::UtilsBy 'sort_by';
my @test = sort_by { $_->{item} } @menu;
This is my data structure created by Data::Dumper->Dumper:
$VAR1 = {
'name' => 'genomic',
'class' => [
{
'reference' => [
{
'name' => 'chromosome',
'referenced-type' => 'Chromosome'
},
{
'name' => 'chromosomeLocation',
'referenced-type' => 'Location'
},
{
'name' => 'sequence',
'referenced-type' => 'Sequence'
},
{
'name' => 'sequenceOntologyTerm',
'referenced-type' => 'SOTerm'
}
],
}
],
};
(trimmed for clarity)
I would like to return a reference to an array of each name value under reference in a single line.
Currently I have
$class->[0]{reference}[0..3]{name}
but to no avail.
Also this example has four sibling-hashes with indexes 0..3, how can I represent the whole array independent of the number of elements?
There isn't an easy syntax to do that, unfortunately. You'll have to use map:
my $array_ref = [
map { $_->{name} } @{ $class->[0]{reference} }
];
Then, if you dump out $array_ref, you'll see it contains:
$array_ref = [
'chromosome',
'chromosomeLocation',
'sequence',
'sequenceOntologyTerm'
];
If you need references to the original strings (not copies), you just need a backslash before $_ (so it'd be \$_->{name} inside the map).
$class->[0]{reference} is an array reference, so you have to dereference it with @{}:
@{$class->[0]{reference}}
That is the whole array; you can then use slice syntax on the end to get a part of it:
@{$class->[0]{reference}}[0..3]
From there you're working with an array of hashrefs, so you'll have to iterate over it with for or map.