Perl merge hash of array - arrays

I have a hash of arrays of numbers.
I would like to merge the hash elements that share common numbers, e.g.:
Input Hash
%HoA = (
A1 => [ "1", "2", "3", "4" ],
A2 => [ "5", "6", "7", "8" ],
A3 => [ "1", "9", "10", "11" ],
A4 => [ "12", "13", "14", "10" ],
);
Output Hash
%HoA_output = (
A1 => [ "1", "2", "3", "4", "9", "10", "11", "12", "13", "14" ],
A2 => [ "5", "6", "7", "8" ],
);
I need a solution that could quickly evaluate a hash that has nearly 50k keys with 8 numbers in each array.
regards

This is essentially a graph problem where you want to determine the connected components of the graph.
This solution uses the Graph::Undirected::Components module, whose sole purpose is to do exactly that. Hopefully it will be fast enough for your extended data set, but it is far easier for you to determine that than for me.
The program creates a graph and adds edges (connections) from every key in the data to each element of its value. Then, calling connected_components returns all the distinct sets of nodes, both keys and values, that are connected to one another.
The final for loop separates the keys from the values again using part from List::MoreUtils, based on whether the node appears as a key in the original hash data. (You will have to adjust this if any of the keys can also appear in the values.) Then the first of the keys (in string order, via minstr), together with the sorted values, is used to create a new element in the %result hash.
use strict;
use warnings;
use Graph::Undirected::Components;
use List::Util 'minstr';
use List::MoreUtils 'part';
my %data = (
    A1 => [ 1, 2, 3, 4 ],
    A2 => [ 5, 6, 7, 8 ],
    A3 => [ 1, 9, 10, 11 ],
    A4 => [ 12, 13, 14, 10 ],
);
my $graph = Graph::Undirected::Components->new;
while ( my ($k, $v) = each %data ) {
    $graph->add_edge($k, $_) for @$v;
}
my %result;
for my $component ( $graph->connected_components ) {
    my @keys_vals = part { $data{$_} ? 0 : 1 } @$component;
    my $key = minstr @{ $keys_vals[0] };
    my @values = sort { $a <=> $b } @{ $keys_vals[1] };
    $result{$key} = \@values;
}
use Data::Dump;
dd \%result;
output
{ A1 => [1 .. 4, 9 .. 14], A2 => [5 .. 8] }
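For readers who want to see what finding connected components involves without the module, the grouping can be sketched with a union-find (disjoint-set) structure. This is only an illustration in Python, not the module's implementation; the names merge_by_shared_values, find and union are hypothetical. Keys and values are tagged so that a key and a value with the same spelling stay distinct, and each merged group is named after its smallest key, as in the Perl program above.

```python
from collections import defaultdict

def merge_by_shared_values(hoa):
    """Merge entries of a dict of lists that share at least one value."""
    parent = {}

    def find(x):
        # Walk up to the root, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Each key is joined to every value in its list.
    for key, values in hoa.items():
        find(("key", key))  # register keys that have empty lists too
        for value in values:
            union(("key", key), ("val", value))

    # Collect the keys and values that ended up under the same root.
    keys_by_root = defaultdict(list)
    vals_by_root = defaultdict(set)
    for key, values in hoa.items():
        root = find(("key", key))
        keys_by_root[root].append(key)
        vals_by_root[root].update(values)

    return {min(keys): sorted(vals_by_root[root])
            for root, keys in keys_by_root.items()}

hoa = {
    "A1": [1, 2, 3, 4],
    "A2": [5, 6, 7, 8],
    "A3": [1, 9, 10, 11],
    "A4": [12, 13, 14, 10],
}
result = merge_by_shared_values(hoa)
print(result)
```

Each key/value pair costs close to constant time, so the work grows roughly linearly with the total number of array elements, which should be comfortable for 50k keys of 8 numbers each.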

OK, pretty fundamentally - this isn't an easy one, because you do need to check each element against each other to see if they're present. The best I can come up with is saving some effort by merging lists as you go, and using an index to track dupes.
I would approach it like this:
use strict;
use warnings;
use Data::Dumper;
my %HoA = (
A1 => [ "1", "2", "3", "4" ],
A2 => [ "5", "6", "7", "8" ],
A3 => [ "1", "9", "10", "11" ],
A4 => [ "12", "13", "14", "10" ],
);
print Dumper \%HoA;
my %found;
sub merge_and_delete {
    my ( $first_key, $second_key ) = @_;
    print "Merging $first_key with $second_key\n";
    #use hash to remove dupes.
    my %elements;
    foreach my $element ( @{ $HoA{$first_key} }, @{ $HoA{$second_key} } )
    {
        $elements{$element}++;
        #update index - don't want to point it to an array we're deleting
        $found{$element} = $first_key;
    }
    #sorting for neatness - you might want to do a numeric sort instead,
    #as by default %HoA contains text elements.
    $HoA{$first_key} = [ sort keys %elements ];
    delete $HoA{$second_key};
}
foreach my $key ( sort keys %HoA ) {
    print "$key\n";
    foreach my $element ( sort @{ $HoA{$key} } ) {
        if ( $found{$element} ) {
            #this element is present in another list, we merge.
            print "$element found in $found{$element}\n";
            merge_and_delete( $found{$element}, $key );
            last;
        }
        else {
            #add this unique match to our index
            print "$element -> $key\n";
            $found{$element} = $key;
        }
    }
}
print Dumper \%HoA;
You iterate over each element of %HoA and build an index table, %found. You use this index to detect whether an element has already been seen; if it has, you trigger a merge and then update the index. You may need to watch memory consumption on a large data set though, because the index can grow to be nearly as large as the original data set (if enough unique elements are present).
But because we stop processing on the first match, we don't need to check every key any more, and because we discard the merged array and update the index, we don't need to do an all-to-all comparison any more.


Match beginning of array with another array in mongodb

Let's say I have some documents that have an array like this:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"letters": ["a","b","c","d"]
},
{
"_id": ObjectId("5a934e000102030405000001"),
"letters": ["a","b"]
},
{
"_id": ObjectId("5a934e000102030405000002"),
"letters": ["a"]
},
{
"_id": ObjectId("5a934e000102030405000003"),
"letters": ["x","a","b"]
}
]
I want to retrieve all the documents whose letters array starts with a given array of length n. For example: ["a","b"]
So the result would be like this:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"letters": ["a","b","c","d"]
},
{
"_id": ObjectId("5a934e000102030405000001"),
"letters": ["a","b"]
}
]
I have searched on mongo docs and stack overflow, and the only thing that's close is using $all operator but that's not exactly what I want.
I think it could be done by first slicing the array and then matching it with the query array, but I couldn't find anything.
You can simply use array indexes in the match query:
check that index 0 has the value a
check that index 1 has the value b
db.collection.find({
"letters.0": "a",
"letters.1": "b"
})
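The same idea extends to a prefix of any length by generating one positional condition per element. A minimal sketch in Python, assuming an array of plain values as in the question; prefix_filter is a hypothetical helper name, and the resulting dictionary is simply the filter document you would hand to your driver's find call:

```python
def prefix_filter(field, prefix):
    """Build a filter matching documents whose array `field` starts with `prefix`."""
    # "letters.0" addresses index 0 of the letters array, "letters.1" index 1, and so on.
    return {f"{field}.{i}": value for i, value in enumerate(prefix)}

query = prefix_filter("letters", ["a", "b"])
print(query)  # {'letters.0': 'a', 'letters.1': 'b'}
```

Note that an empty prefix produces an empty filter, which matches every document.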
Query
slice and take the first 2 elements of $letters
check whether the result equals ["a", "b"]
*This is a general solution for any array: replace the 2 with the size of your array, and ["a", "b"] with your array.
db.collection.aggregate(
[{"$match": {"$expr": {"$eq": [{"$slice": ["$letters", 2]}, ["a", "b"]]}}}])

Change property name when converting Powershell object to JSON using ConvertTo-Json?

I have a dataset consisting of a two-dimensional array (flexData[5][2]). I have this defined in my Powershell script as follows:
class flexData {
[DateTime]$dateTime
[string]$firmwareVersion
[string[][]]$flexData
}
$flexObj = [flexData]@{dateTime = $(Get-Date); firmwareVersion = 'u031C'; flexData = @(@(0, 1), @(320, 17), @(45, 36), @(0, 0))}
The problem with this is that the output object that ConvertTo-Json spits out is hard to read:
{
"dateTime": "2021-10-11T13:58:25.0937842+02:00",
"firmwareVersion": "u031C",
"flexData": [
[
"0",
"1"
],
[
"320",
"17"
],
[
"45",
"36"
],
[
"0",
"0"
]
]
}
Is there a way, instead of using a single key name and a two-dimensional array, to convert this to flexData0, flexData1 ... flexData4 and keep my actual data as single-dimensional arrays? I could obviously do this by manually defining my class as:
class flexData {
[DateTime]$dateTime
[string]$firmwareVersion
[string[]]$flexData0
[string[]]$flexData1
[string[]]$flexData2
[string[]]$flexData3
[string[]]$flexData4
}
But is there a smarter way of doing this? Especially since I would also like to add a third dimension to my array to store multiple iterations of flexData.
You could add a constructor to your flexData class that creates an object from the top-level array instead:
class flexData {
    [DateTime]$dateTime
    [string]$firmwareVersion
    [psobject]$flexData
    flexData([DateTime]$dateTime, [string]$firmwareVersion, [string[][]]$flexData){
        $this.dateTime = $dateTime
        $this.firmwareVersion = $firmwareVersion
        # Create object from nested array
        $dataProperties = [ordered]@{}
        for($i = 0; $i -lt $flexData.Length; $i++){
            $dataProperties["$i"] = $flexData[$i]
        }
        $this.flexData = [pscustomobject]$dataProperties
    }
}
Now, the individual outer array items will be listed as properties named 0 through (N-1):
PS ~> $data = [flexData]::new($(Get-Date), 'u031C', @(@(0, 1), @(320, 17), @(45, 36), @(0, 0)))
PS ~> $data |ConvertTo-Json
{
"dateTime": "2021-10-11T14:21:48.4026882+02:00",
"firmwareVersion": "u031C",
"flexData": {
"0": [
"0",
"1"
],
"1": [
"320",
"17"
],
"2": [
"45",
"36"
],
"3": [
"0",
"0"
]
}
}
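The reshaping itself is just "give each row of the outer array its own named key". To get the flexData0 ... flexDataN names asked for in the question, the property name in the constructor's loop could presumably be "flexData$i" instead of "$i". The transformation is sketched below in Python rather than PowerShell, purely as an illustration; keyed_rows and the flat record layout are assumptions, not part of the answer above.

```python
import json

def keyed_rows(rows, prefix="flexData"):
    """Turn a list of rows into a dict keyed prefix0, prefix1, ..."""
    return {f"{prefix}{i}": row for i, row in enumerate(rows)}

rows = [["0", "1"], ["320", "17"], ["45", "36"], ["0", "0"]]

# Merge the generated keys into the record next to the other fields.
record = {"firmwareVersion": "u031C", **keyed_rows(rows)}
print(json.dumps(record, indent=2))
```

Because the keys are generated from the row index, the same function works for any number of rows, so nothing has to be declared by hand when the data grows.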

access / print nth element of sub array, for every array

I have a multidimensional array:
@multarray = ( [ "one", "two", "three" ],
[ 4, 5, 6, ],
[ "alpha", "beta", "gamma" ]
);
I can access @multarray[0]
[
[0] [
[0] "one"
[1] "two"
[2] "three"
]
]
or even @multarray[0][0]
"one"
But how do I access, say, the 1st sub-element of every sub-array? Something akin to multarray[*][0], to produce:
"one"
4
"alpha"
Thanks!
You can use map and dereference each array:
use warnings;
use strict;
use Data::Dumper;
my @multarray = (
    [ "one", "two", "three" ],
    [ 4, 5, 6, ],
    [ "alpha", "beta", "gamma" ]
);
my @subs = map { $_->[0] } @multarray;
print Dumper(\@subs);
__END__
$VAR1 = [
'one',
4,
'alpha'
];
See also: perldsc
Using a for() loop, you can loop over the outer array, and use any of the inner elements. In this example, I've set $elem_num to 0, which is the first element. For each loop over the outer array, we take each element (which is an array reference), then, using the $elem_num variable, we print out the contents of the inner array's first element:
my $elem_num = 0;
for my $elem (@multarray){
    print "$elem->[$elem_num]\n";
}

Remove value from Array of Dictionaries

How I can remove a dict from an array of dictionaries?
I have an array of dictionaries like so: var posts = [[String:String]]() - and I would like to remove a dictionary with a specific key, how would I proceed? I know that I can remove a value from a standard array by doing Array.removeAtIndex(_:) or a dictionary key by doing key.removeValue(), but an array of dictionaries is more tricky.
Thanks in advance!
If I understood your question correctly, this should work:
var posts: [[String:String]] = [
["a": "1", "b": "2"],
["x": "3", "y": "4"],
["a": "5", "y": "6"]
]
for (index, post) in posts.enumerated() {
    var post = post  // mutable copy of the dictionary
    post.removeValue(forKey: "a")
    posts[index] = post
}
/*
This leaves posts as:
[
    ["b": "2"],
    ["y": "4", "x": "3"],
    ["y": "6"]
]
*/
Since both your dictionary and the wrapping array are value types, modifying post inside the loop only modifies a copy of the dictionary (the copy we created by declaring it with var). So we need to set the value at index to the newly modified version of the dictionary.
Removing all Dictionaries with a given key
let input = [
[
"Key 1" : "Value 1",
"Key 2" : "Value 2",
],
[
"Key 1" : "Value 1",
"Key 2" : "Value 2",
],
[
"Key 1" : "Value 1",
"Key 2" : "Value 2",
"Key 3" : "Value 3",
],
]
let keyToRemove = "Key 3"
//keep dicts only if their value for keyToRemove is nil (meaning key doesn't exist)
let result = input.filter{ $0[keyToRemove] == nil }
print("Input:\n")
dump(input)
print("\n\nAfter removing all dicts which have the key \"\(keyToRemove)\":\n")
dump(result)
Removing only the first Dictionary with a given key
var result = input
//remove only the first dict whose value for keyToRemove is not nil (meaning key exists)
if let index = result.firstIndex(where: { $0[keyToRemove] != nil }) {
    result.remove(at: index)
}
print("Input:\n")
dump(input)
print("\n\nAfter removing the first dict which has the key \"\(keyToRemove)\":\n")
dump(result)

Why is my sort function not working

I have a file which I have read into an array, with multiple columns and I want to sort numerically by the second column. I've looked up countless similar questions and tried to directly incorporate the answers given.
here is the basic code I am using:
use strict;
use warnings;
use diagnostics;
my @arrayed = (
    "\ndog", "10", "barks",
    "\ncat", "20", "meows",
    "\nfish", "5", "plop",
    "\nant", "30", "walk",
);
print "@arrayed";
print "\n";
my @sortedarray = sort { $a->[1] <=> $b->[1] } @arrayed;
print "@sortedarray";
exit;
This gives me an error: can't use string ("dog") as an array reference while strict is turned on. I tried a few other examples with other files and arrays but always get this message, so I assume there must be something intrinsically wrong with my code.
Could anybody more experienced shed a little light on what I'm doing wrong, please, and help me sort by the numbered column while still maintaining the row structure?
You have a flat array, but you want an array-of-arrays:
use strict;
use warnings;
use diagnostics;
use Data::Dumper;
my @arrayed = (
    ["dog", "10", "barks"],
    ["cat", "20", "meows"],
    ["fish", "5", "plop"],
    ["ant", "30", "walk"],
);
print Dumper(\@arrayed);
my @sortedarray = sort { $a->[1] <=> $b->[1] } @arrayed;
print Dumper(\@sortedarray);
__END__
$VAR1 = [
[
'dog',
'10',
'barks'
],
[
'cat',
'20',
'meows'
],
[
'fish',
'5',
'plop'
],
[
'ant',
'30',
'walk'
]
];
$VAR1 = [
[
'fish',
5,
'plop'
],
[
'dog',
10,
'barks'
],
[
'cat',
20,
'meows'
],
[
'ant',
30,
'walk'
]
];
Your assignment does not create a multi-dimensional array:
my @arrayed = (
    "\ndog", "10", "barks",
    "\ncat", "20", "meows",
    "\nfish", "5", "plop",
    "\nant", "30", "walk",
);
You would need to use array references inside those parentheses:
my @arrayed = (
    [ "\ndog", "10", "barks" ],
    [ "\ncat", "20", "meows" ],
    [ "\nfish", "5", "plop" ],
    [ "\nant", "30", "walk" ]
);
The brackets [ ... ] create anonymous array references, which can then be stored in the array.
One of the most important things to know when debugging is what your data looks like. Doing something like what you did
print "@arrayed";
is not very useful, since it will only show a list of the elements separated by spaces. Also, if you had done this with a multi-dimensional array, you would get output like this:
ARRAY(0x7fd658) ARRAY(0x7fd7f0)
Which is what array references look like when stringified. Instead, you should use the Data::Dumper module:
use Data::Dumper;
print Dumper \@arrayed;
Notice that you are printing a reference to the array. The output would be a data structure looking like what toolic has shown in his answer:
$VAR1 = [
[ ...
Note that the brackets, again, denote array references.
