I realize there are many questions of varying degrees of similarity to this one. I've searched at length (using: [ruby] merge array of hashes on key) for them and I have attempted bits and pieces of each answer to try to solve this on my own. Before coming to StackOverflow, I even shared my question with my colleagues who have been equally stumped. This seems to be a unique question or we're all just staring too closely at it to see an otherwise obvious answer.
Essential Requirements
The solution must work with the Ruby 1.8.7 standard library (no gems). Please feel free to additionally illustrate solutions for other versions of Ruby, but doing so will not automatically make one answer better than another.
The structure of the input data cannot be changed by its provider; the entire data structure is delivered as-is. If the data needs to be temporarily rearranged to provide the most efficient answer, that's perfectly fine as long as the output matches the required sample below. In addition, the solution can make no assumptions about the position of the sorting keys within the Hashes.
The source variable cannot be altered in any way; it is immutable at run-time (this is checked), so the result must be provided to a new variable.
The sample data below is fiction but the problem is real. There are other levels of Arrays-of-Hashes that must also be merged on other keys in the same way; so, the very best answer can be generically applied to arbitrary levels of the data structure.
The best solution will be easy to read, maintain, and apply to arbitrary -- though similar -- data structures. It needn't be a one-liner but if you can meet all the requirements in a single line of Ruby code, kudos to you.
Sample Data
If we think of the Apache Tomcat server.xml file as a Ruby data structure rather than XML, it can provide a very good analog for this problem. Assume further that the default configuration is merged upstream -- before being delivered to you -- with data that you must consolidate before some later operation consumes the resulting data structure. The source data will look very much like this:
source = {
:Server => {
:'attribute.port' => 8005,
:'attribute.shutdown' => 'SHUTDOWN',
:Listener => [
{ :'attribute.className' => 'org.apache.catalina.startup.VersionLoggerListener' },
{ :'attribute.className' => 'org.apache.catalina.core.AprLifecycleListener',
:'attribute.SSLEngine' => 'off'},
{ :'attribute.className' => 'org.apache.catalina.core.JasperListener' },
{ :'attribute.className' => 'org.apache.catalina.core.JreMemoryLeakPreventionListener' },
{ :'attribute.className' => 'org.apache.catalina.core.AprLifecycleListener',
:'attribute.SSLEngine' => 'on'}
],
:Service => [
{ :'attribute.name' => 'Catalina',
:Connector => [
{ :'attribute.port' => 8080,
:'attribute.protocol' => 'HTTP/1.1'},
{ :'attribute.port' => 8009,
:'attribute.protocol' => 'AJP/1.3'}
],
:Engine => {
:'attribute.name' => 'Catalina',
:'attribute.defaultHost' => 'localhost',
:Realm => {
:'attribute.className' => 'org.apache.catalina.realm.LockOutRealm',
:Realm => [
{ :'attribute.className' => 'org.apache.catalina.realm.UserDatabaseRealm',
:'attribute.resourceName' => 'UserDatabase'}
]
},
:Host => [
{ :'attribute.name' => 'localhost',
:'attribute.appBase' => 'webapps',
:Valve => [
{ :'attribute.className' => 'org.apache.catalina.valves.AccessLogValve',
:'attribute.directory' => 'logs'}
]
}
]
}
},
{ :'attribute.name' => 'Catalina',
:Connector => [
{ :'attribute.port' => 8080,
:'attribute.protocol' => 'HTTP/1.1',
:'attribute.secure' => true,
:'attribute.scheme' => 'https',
:'attribute.proxyPort' => 443}
]
},
{ :'attribute.name' => 'JSVCBridge',
:Connector => [
{ :'attribute.port' => 8010,
:'attribute.protocol' => 'HTTP/2'}
]
},
{ :'attribute.name' => 'Catalina',
:Engine => {
:Host => [
{ :'attribute.name' => 'localhost',
:Valve => [
{ :'attribute.className' => 'org.apache.catalina.valves.RemoteIpValve',
:'attribute.internalProxies' => '*',
:'attribute.remoteIpHeader' => 'X-Forwarded-For',
:'attribute.protocolHeader' => 'X-Forwarded-Proto',
:'attribute.protocolHeaderHttpsValue' => 'https'}
]
}
]
}
}
]
}
}
The challenge is to produce this result from it:
result = {
:Server => {
:'attribute.port' => 8005,
:'attribute.shutdown' => 'SHUTDOWN',
:Listener => [
{ :'attribute.className' => 'org.apache.catalina.startup.VersionLoggerListener' },
{ :'attribute.className' => 'org.apache.catalina.core.AprLifecycleListener',
:'attribute.SSLEngine' => 'on'},
{ :'attribute.className' => 'org.apache.catalina.core.JasperListener' },
{ :'attribute.className' => 'org.apache.catalina.core.JreMemoryLeakPreventionListener' },
],
:Service => [
{ :'attribute.name' => 'Catalina',
:Connector => [
{ :'attribute.port' => 8080,
:'attribute.protocol' => 'HTTP/1.1',
:'attribute.secure' => true,
:'attribute.scheme' => 'https',
:'attribute.proxyPort' => 443},
{ :'attribute.port' => 8009,
:'attribute.protocol' => 'AJP/1.3'}
],
:Engine => {
:'attribute.name' => 'Catalina',
:'attribute.defaultHost' => 'localhost',
:Realm => {
:'attribute.className' => 'org.apache.catalina.realm.LockOutRealm',
:Realm => [
{ :'attribute.className' => 'org.apache.catalina.realm.UserDatabaseRealm',
:'attribute.resourceName' => 'UserDatabase'}
]
},
:Host => [
{ :'attribute.name' => 'localhost',
:'attribute.appBase' => 'webapps',
:Valve => [
{ :'attribute.className' => 'org.apache.catalina.valves.AccessLogValve',
:'attribute.directory' => 'logs'},
{ :'attribute.className' => 'org.apache.catalina.valves.RemoteIpValve',
:'attribute.internalProxies' => '*',
:'attribute.remoteIpHeader' => 'X-Forwarded-For',
:'attribute.protocolHeader' => 'X-Forwarded-Proto',
:'attribute.protocolHeaderHttpsValue' => 'https'}
]
}
]
}
},
{ :'attribute.name' => 'JSVCBridge',
:Connector => [
{ :'attribute.port' => 8010,
:'attribute.protocol' => 'HTTP/2'}
]
}
]
}
}
The Question
We need source to become result. To get there, :Listener gets merged by attribute.className; :Service gets merged by attribute.name; the resulting Arrays of :Connector get merged by attribute.port; and such. The identification of the location of the Arrays-of-Hashes within the data structure and the key which each is to be merged on should be easily provided to the solution.
The real essence of this question is finding that generic solution that can apply to multiple arbitrary levels of a complex data structure like this, merge Arrays-of-Hashes by a supplied key, and produce the merged result after the set of location and key pairs is provided.
Thank you all very much for your time and interest in this question.
There may be more elegant ways of condensing this code but I finally developed an answer to this very challenging question. While Wand Maker's answer came close, it was based on the untenable assumption that the order of the keys in the Hashes would be predictable and stable. As this is a Ruby 1.8.7 problem and because the data provider makes no such guarantee, I had to take a different path; we had to inform the merge engine which key to use for each Array-of-Hashes.
My (non-optimized) solution required three functions and an external Hash that defines the necessary merge keys:
deepMergeHash walks through a hash, deeply scanning for Arrays
deepMergeArrayOfHashes performs the desired merge against an Array-of-Hashes
subMergeHelper recursively assists deepMergeArrayOfHashes
The trick was to not only treat the Hash recursively, but to always be aware of the "present" location within the Hash so that the necessary merge key could be known. Having established a way to determine that location, defining, finding, and using the merge keys became trivial.
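To make the "present location" idea concrete, here is a minimal sketch of walking a Hash while carrying the crumb trail along. Note that array_trails and the small sample Hash are hypothetical and not part of the solution; the helper only lists the trail of every Array it meets, in the same ':Parent:Child' format, which may help when drafting the table of merge keys.

```ruby
# Hypothetical helper (not part of the solution): collects the crumb
# trail of every Array found while walking a nested structure.
def array_trails(node, trail = '', found = [])
  case node
  when Hash
    # Descend into each value, extending the trail with the key name.
    node.each { |key, value| array_trails(value, trail + ':' + key.to_s, found) }
  when Array
    # Record the trail once, then look inside each element. Array
    # indices are deliberately left out of the trail.
    found << trail unless found.include?(trail)
    node.each { |element| array_trails(element, trail, found) }
  end
  found
end

# Small stand-in for the real data, for illustration only.
sample = {
  :Server => {
    :Listener => [{ :'attribute.className' => 'A' }],
    :Service  => [
      { :'attribute.name' => 'Catalina',
        :Connector => [{ :'attribute.port' => 8080 }] }
    ]
  }
}

puts array_trails(sample).sort
```

Each trail printed this way is a candidate key for the merge-key Hash; which attribute to merge on still has to be chosen by hand.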
The Solution
def subMergeHelper(lhs, rhs, mergeKeys, crumbTrail)
  lhs.merge(rhs){|subKey, subLHS, subRHS|
    mergeTrail = crumbTrail + ':' + subKey.to_s
    case subLHS
    when Array
      deepMergeArrayOfHashes(subLHS + subRHS, mergeKeys, mergeTrail)
    when Hash
      subMergeHelper(subLHS, subRHS, mergeKeys, mergeTrail)
    else
      subRHS
    end
  }
end
def deepMergeArrayOfHashes(arrayOfHashes, mergeKeys, crumbTrail)
  mergedArray = arrayOfHashes
  if arrayOfHashes.all? {|e| e.class == Hash}
    if mergeKeys.has_key?(crumbTrail)
      mergeKey = mergeKeys[crumbTrail]
      mergedArray = arrayOfHashes.group_by{|evalHash| evalHash[mergeKey.to_sym]}.map{|groupID, groupArrayOfHashes|
        groupArrayOfHashes.reduce({}){|memoHash, evalHash|
          memoHash.merge(evalHash){|hashKey, lhs, rhs|
            deepTrail = crumbTrail + ':' + hashKey.to_s
            case lhs
            when Array
              deepMergeArrayOfHashes(lhs + rhs, mergeKeys, deepTrail)
            when Hash
              subMergeHelper(lhs, rhs, mergeKeys, deepTrail)
            else
              rhs
            end
          }
        }
      }
    else
      $stderr.puts "[WARNING] deepMergeArrayOfHashes: received an Array of Hashes without merge key at #{crumbTrail}."
    end
  else
    $stderr.puts "[WARNING] deepMergeArrayOfHashes: received an Array containing non-Hashes at #{crumbTrail}?"
  end
  return mergedArray
end
def deepMergeHash(hashConfig, mergeKeys, crumbTrail = '')
  return hashConfig unless Hash == hashConfig.class
  mergedConfig = {}
  hashConfig.each{|nodeKey, nodeValue|
    nodeCrumb = nodeKey.to_s
    testTrail = crumbTrail + ':' + nodeCrumb
    case nodeValue
    when Hash
      mergedConfig[nodeKey] = deepMergeHash(nodeValue, mergeKeys, testTrail)
    when Array
      mergedConfig[nodeKey] = deepMergeArrayOfHashes(nodeValue, mergeKeys, testTrail)
    else
      mergedConfig[nodeKey] = nodeValue
    end
  }
  return mergedConfig
end
Example Use
Using the data in the question, we can now:
mergeKeys = {
':Server:Listener' => 'attribute.className',
':Server:Service' => 'attribute.name',
':Server:Service:Connector' => 'attribute.port',
':Server:Service:Engine:Host' => 'attribute.name',
':Server:Service:Engine:Host:Valve' => 'attribute.className',
':Server:Service:Engine:Realm:Realm' => 'attribute.className'
}
mergedConfig = deepMergeHash(source, mergeKeys)
I can't seem to perform a successful equality test like (result == mergedConfig), but a visual inspection of mergedConfig shows that it is identical to result except that the order of some keys changes. I suspect that's a side-effect of using Ruby 1.8.x and is acceptable for this question.
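One possible explanation, offered as a guess rather than a diagnosis: Ruby compares Hashes without regard to key order, but Arrays compare element by element, so a difference in the order of merged Array elements (for example, from grouping on Ruby 1.8.x) may be what makes the comparison fail. Under that assumption, the sketch below compares two structures through an order-insensitive canonical form. The helper name canonical and the two small Hashes are hypothetical.

```ruby
# Hypothetical helper: converts a nested structure into a canonical
# form in which neither Hash key order nor Array element order matters.
def canonical(node)
  case node
  when Hash
    # Represent a Hash as its key/value pairs, sorted by key name.
    node.map { |key, value| [key.to_s, canonical(value)] }.sort_by { |pair| pair.first }
  when Array
    # Canonicalize the elements first, then sort them by their text form.
    node.map { |element| canonical(element) }.sort_by { |element| element.inspect }
  else
    node
  end
end

# Same content; Array elements and Hash keys appear in different orders.
a = { :x => [{ :id => 1, :v => 'a' }, { :id => 2 }] }
b = { :x => [{ :id => 2 }, { :v => 'a', :id => 1 }] }

puts a == b
puts canonical(a) == canonical(b)
```

This treats every Array as an unordered collection, so it is only appropriate where element order carries no meaning. It also cannot tell a Hash apart from an Array of two-element pairs, which is acceptable for a visual-inspection replacement but not for a general-purpose comparison.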
Happy coding, everyone and thank you so much for your interest in this discussion.
A solution based on the assumption that the hashes are merged on the value of the first key of each hash in the given array is given below:
def merge_ary(ary_hash)
  # Let's not process something that is not an array of hashes
  return ary_hash if not ary_hash.all? {|h| h.class == Hash }
  # If it is an array of hashes, group them by the value of the first key,
  # then reduce each resulting group of hashes by merging them.
  c = ary_hash.group_by {|h| h.values.first}.map do |k,v|
    v_reduced = v.reduce({}) do |memo_hash, h|
      # The conflict key is named _key so that it cannot overwrite the
      # outer k (block parameters are not block-local in Ruby 1.8).
      memo_hash.merge(h) do |_key, v1, v2|
        v1.class == Array ? merge_ary(v1 + v2) : v2
      end
    end
    [k, v_reduced]
  end
  return Hash[c].values
end
def merge_hash(hash)
  t = hash.map do |k,v|
    new_v = v
    if v.class == Hash
      new_v = merge_hash(v)
    elsif v.class == Array
      new_v = merge_ary(v)
    end
    [k,new_v]
  end
  return Hash[t]
end
# Test the output
merge_hash(source) == result
#=> true
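The grouping above depends on h.values.first, that is, on the position of the merge key inside each hash. The following sketch, with two made-up hashes, illustrates why that can matter: on Ruby 1.9 and later a Hash keeps insertion order, so two hashes with identical content can still disagree on their first value, while on Ruby 1.8.7 the iteration order is not defined by insertion at all. Grouping by an explicit key avoids the dependency.

```ruby
# Two hypothetical hashes with identical content but different key order.
a = { :'attribute.name' => 'Catalina', :'attribute.port' => 8080 }
b = { :'attribute.port' => 8080, :'attribute.name' => 'Catalina' }

# Hash equality ignores key order.
same_content = (a == b)

# Grouping by position depends on which value happens to come first.
by_position = [a, b].group_by { |h| h.values.first }.size

# Grouping by an explicit key does not depend on key order.
by_key = [a, b].group_by { |h| h[:'attribute.name'] }.size

puts same_content
puts by_position
puts by_key
```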
I am working with an array in this form:
"car_documents_attributes"=>{
"1562523330183"=>{
"id"=>"", "filename"=>"tyYYqHeqSFOnqLHEz5lO_rc_tispor12756_6wldwu.pdf", "document_type"=>"contract"
},
"1562523353208"=>{
"id"=>"", "filename"=>"a9P8TyECRiKbI2YdRVZy_rc_tispor12756_bbtzdz.pdf", "document_type"=>"request"
},
"1562523353496"=>{
"id"=>"", "filename"=>"WCM5FHOfSw6yNSUrfPPm_rc_tispor12756_dqu9r2.pdf", "document_type"=>"notes"
},
...
}
I need to find out whether this array contains an item where document_type=contract (there can be none, one, or several).
Currently I loop through the array item by item, which can be slow if there are tens of items.
Is there a better and faster way to simply check whether the array contains an item with document_type = contract?
That's a hash containing more hashes. What you can do is access car_documents_attributes, iterate over its values, and check whether any document_type is "contract":
data = {
"car_documents_attributes" => {
"1562523330183" => { "id" => "", "filename" => "tyYYqHeqSFOnqLHEz5lO_rc_tispor12756_6wldwu.pdf", "document_type" => "contract"},
"1562523353208" => { "id" => "", "filename" => "a9P8TyECRiKbI2YdRVZy_rc_tispor12756_bbtzdz.pdf", "document_type" => "request" },
"1562523353496" => { "id" => "", "filename" => "WCM5FHOfSw6yNSUrfPPm_rc_tispor12756_dqu9r2.pdf", "document_type" => "notes" }
}
}
p data['car_documents_attributes'].any? { |_, doc| doc['document_type'] == 'contract' }
# true
Didn't know it was data coming from the params. If so, you do need to permit what's being received or convert the params to an unsafe hash.
Also, you can try using fetch instead of [] when getting car_documents_attributes: if that key isn't in data, [] returns nil, and calling any? on nil raises a NoMethodError. With fetch you can supply an empty hash as the default:
data.fetch('car_documents_attributes', {}).any? { |_, doc| doc['document_type'] == 'contract' }
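The difference between [] and fetch can be seen with a payload in which the key is missing. This is a minimal sketch; the empty data hash is a made-up input for illustration.

```ruby
# Hypothetical payload in which 'car_documents_attributes' is missing.
data = {}

# With [], the lookup returns nil, and calling any? on nil raises.
error_class = nil
begin
  data['car_documents_attributes'].any? { |_, doc| doc['document_type'] == 'contract' }
rescue NoMethodError => e
  error_class = e.class
end
puts error_class

# With fetch and an empty-hash default, any? simply finds nothing.
found = data.fetch('car_documents_attributes', {}).any? { |_, doc| doc['document_type'] == 'contract' }
puts found
```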
Is there a way I can pick a value in hash of array, and reformat it to be only hash?
Is there any method I can do with it?
Example
[
{
"qset_id" => 1,
"name" => "New1"
},
{
"qset_id" => 2,
"name" => "New2"
}
]
Result
{
1 => {
"name" => "New1"
},
2 => {
"name" => "New2"
}
}
You can do basically arbitrary manipulation using the reduce function on arrays or hashes; for example, this will get your result:
array.reduce({}) do |result, item|
result[item["qset_id"]] = { "name" => item["name"] }
result
end
You can do the same thing with each.with_object:
array.each.with_object({}) do |item, result|
result[item["qset_id"]] = { "name" => item["name"] }
end
It's basically the same thing, but you don't have to make each iteration return the result (called a 'memo object').
You could iterate over the array (called h1 here) and map each element into a new hash:
h1.map{|h| {h['qset_id'] => {'name' => h['name']}} }
# => [{1=>{"name"=>"New1"}}, {2=>{"name"=>"New2"}}]
... but that would return an array. You could pull the elements into a second hash like this:
h2 = {}
h1.each do |h|
h2[h['qset_id']] = {'name' => h['name']}
end
>> h2
=> {1=>{"name"=>"New1"}, 2=>{"name"=>"New2"}}
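The answers above hard-code the "name" field. As a hedged generalization, the sketch below indexes an array of hashes by one key and keeps whatever other fields each hash has. The helper name index_by_key is hypothetical, and each_with_object assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper: turns an array of hashes into a hash keyed by
# the value found under `key`, keeping every other field of each row.
def index_by_key(rows, key)
  rows.each_with_object({}) do |row, result|
    # Copy the row without the key that becomes the index.
    result[row[key]] = row.reject { |field, _| field == key }
  end
end

rows = [
  { "qset_id" => 1, "name" => "New1" },
  { "qset_id" => 2, "name" => "New2" }
]

p index_by_key(rows, "qset_id")
```

If two rows share the same key value, the later row replaces the earlier one; that is a design choice of this sketch, not something the question specifies.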
I am new to Perl and know only a little about hashes. I have a hash of array of hash of array of hash of array of hash (HoAoHoAoHoAoH) as follows.
%my_hash = (
key00 => 'value00',
key01 => [
{ key10 => 'value10',
key11 => 'value11',
key12 => [
{ key20 => 'value20',
key21 => 'value21',
key22 => [
{ key30 => 'value30',
key31 => [
{ color => 'blue', quantity => 10, boxes => [0,1,3] },
{ color => 'red', quantity => 2, boxes => [2,3] },
{ color => 'green', quantity => 5, boxes => [0] },
],
},
],
},
]
}
]
);
What is the easiest way to access the "color", "quantity" and "boxes"? I also need to do arithmetic operations with the "quantity"s, such as 10+2+5 (quantity0+quantity1+quantity2).
This looks a lot like an XY problem. What are you trying to solve here?
You can access an element of your data structure like this:
print $my_hash{key01}[0]{key12}[0]{key22}[0]{key31}[0]{color},"\n";
You can also iterate the bottom elements with:
foreach my $something ( @{ $my_hash{key01}[0]{key12}[0]{key22}[0]{key31} } ) {
    print $something->{'color'};
    print $something->{'quantity'};
}
But this doesn't look like a real problem - what are you actually trying to accomplish? I might guess you're trying to parse XML or similar, in which case there's almost certainly a better approach.
This is my data structure created by Data::Dumper->Dumper:
$VAR1 = {
'name' => 'genomic',
'class' => [
{
'reference' => [
{
'name' => 'chromosome',
'referenced-type' => 'Chromosome'
},
{
'name' => 'chromosomeLocation',
'referenced-type' => 'Location'
},
{
'name' => 'sequence',
'referenced-type' => 'Sequence'
},
{
'name' => 'sequenceOntologyTerm',
'referenced-type' => 'SOTerm'
}
],
}
],
};
(trimmed for clarity)
I would like to return a reference to an array of each name value under reference in a single line.
Currently I have
$class->[0]{reference}[0..3]{name}
but to no avail.
Also this example has four sibling-hashes with indexes 0..3, how can I represent the whole array independent of the number of elements?
There isn't an easy syntax to do that, unfortunately. You'll have to use map:
my $array_ref = [
    map { $_->{name} } @{ $class->[0]{reference} }
];
Then, if you dump out $array_ref, you'll see it contains:
$array_ref = [
'chromosome',
'chromosomeLocation',
'sequence',
'sequenceOntologyTerm'
];
If you need references to the original strings (not copies), you just need a backslash before $_ (so it'd be \$_->{name} inside the map).
$class->[0]{reference} is an array reference, so you have to dereference it with @{}:
@{$class->[0]{reference}}
That is the 'whole array'; you can then use slice syntax on the end to get a part of it:
@{$class->[0]{reference}}[0..3]
From there you're working with an array of hashrefs, so you'll have to iterate over it with for or map.
I have this output from Dumper
'group' => {
'1104' => {
'a' => 1
},
'52202' => {
'b' => 1,
'c' => 1
},
'52201' => {
'c' => 1
},
'52200' => {
'c' => 1
}
},
which I assume is an Array of Hashes of Arrays of Hashes?
I would like to declare this structure myself.
Is there a way to do this, so next time I see such a complex structure, I can do that in no time? =)
Your output is a hash of hashes of hashes, with the first hash containing only a single element. The {} mark a hash reference, so you'd reproduce your data structure as follows, where the resulting $hohoh is a reference to a HoHoH.
my $hohoh = {
'group' => {
'1104' => {
'a' => 1
},
'52202' => {
'b' => 1,
'c' => 1
},
'52201' => {
'c' => 1
},
'52200' => {
'c' => 1
}
},
};
print $hohoh->{group}{1104}{a}; # -> 1
I recommend reading the Perl Data Structures Cookbook (perldsc).
Since the types of variables, and of hash values, can change in Perl, there isn't any way to "declare" a three-level hash the way you're probably thinking. You can instantiate an empty hashref into each key as it's created, which is a similar idea:
# First pass
my $data = {};
# Later...
$data->{group} = {};
# Still later...
$data->{group}->{1104} = {};
# Finally...
$data->{group}->{1104}->{a} = 1;
But you could just as easily fill in the data as you obtain it, allowing autovivification to do its thing:
my $data;
# Fill one piece of data... Perl creates all three hash levels now.
$data->{group}->{1104}->{a} = 1;
# Fill another piece of data, this one has two values in the "bottom" hash.
$data->{group}->{52202} = { b => 1, c => 2};
But there is no way (in plain Perl) to "enforce" that the values for any particular key contain hashes rather than strings or subroutine references, which is usually what is intended by declaration in languages with C-like type systems.