Unique Combos from powershell array - No duplicate combos - arrays

I'm trying to figure out the best way to get unique combinations from a powershell array. For instance, my array might be
@(B,C,D,E)
I would be hoping for an output like this :
B
C
D
E
B,C
B,D
B,E
C,D
C,E
D,E
B,C,D
C,D,E
B,C,D,E
I do not want re-arranged combos. If combo C,D exists already then I do not want combo D,C. It's redundant for my purposes.
I looked into the functions here : Get all combinations of an array
But they aren't what I want. I've been working on figuring this out myself, but have spent quite a bit of time without success. I thought I'd ask the question here so that if someone else already knows, I'm not wasting my time.
Thanks!

This is an adaptation of a solution from a C# class I took that asked this same question: for any set, find all subsets, including the empty set.
function Get-Subsets ($a){
    # Uncomment the following to ensure only unique inputs are parsed,
    # e.g. 'B','C','D','E','E' would become 'B','C','D','E'
    #$a = $a | Select-Object -Unique
    # Create an array to store output
    $l = @()
    # For any set of length n the maximum number of subsets is 2^n
    for ($i = 0; $i -lt [Math]::Pow(2,$a.Length); $i++)
    {
        # Temporary array to hold output
        [string[]]$out = New-Object string[] $a.Length
        # Iterate through each element
        for ($j = 0; $j -lt $a.Length; $j++)
        {
            # Start at the end of the array and work your way towards the front
            if (($i -band (1 -shl ($a.Length - $j - 1))) -ne 0)
            {
                # Store the element in the temp array
                $out[$j] = $a[$j]
            }
        }
        # Stick the subset into the output array
        $l += -join $out
    }
    # Group the subsets by length, iterate through them and sort
    $l | Group-Object -Property Length | %{$_.Group | sort}
}
Use like so:
PS C:\> Get-Subsets @('b','c','d','e')
b
c
d
e
bc
bd
be
cd
ce
de
bcd
bce
bde
cde
bcde
Note that computational costs go up exponentially with the length of the input array.
Elements  SecondsToComplete
15 46.3488228
14 13.4836299
13 3.6316713
12 1.2542701
11 0.4472637
10 0.1942997
9 0.0867832
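The bitmask counting approach translates directly to other languages. Here is a rough Python equivalent for illustration (the function name get_subsets is mine, not from the answer above):

```python
def get_subsets(a):
    """Enumerate all subsets of a sequence by counting through a bitmask."""
    n = len(a)
    subsets = []
    for i in range(2 ** n):
        # bit (n - j - 1) of i decides whether a[j] is included in this subset
        subsets.append([a[j] for j in range(n) if i & (1 << (n - j - 1))])
    # group by length and sort within each group, like the PowerShell version
    return sorted(subsets, key=lambda s: (len(s), s))
```

As with the PowerShell version, the empty subset is included (it sorts first), and the cost grows as 2^n.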

My tired attempt at this. I did manage to get it to produce the expected results, but how it does it is not as elegant. It uses recursion.
Function Get-Permutations{
    Param(
        $theInput
    )
    $theInput | ForEach-Object{
        $element = $_
        $sansElement = ($theInput | Where-Object{$_ -ne $element})
        If($sansElement.Count -gt 1){
            # Build a collection of permutations using the remaining elements that were not isolated in this pass.
            # Use the single element since it is a valid permutation
            $perms = ,$element
            For($elementIndex = 0;$elementIndex -le ($sansElement.Count - 1);$elementIndex++){
                $perms += ,@(,$element + $sansElement[0..$elementIndex] | Sort-Object)
            }
            # The For loop does not send to output properly, so collect the results of this pass in $perms
            $perms
            # If there are more than 2 elements in $sansElement then we need to be sure they are accounted for
            If($sansElement.Count -gt 2){Get-Permutations $sansElement}
        }
    }
}
Get-Permutations B,C,D,E | %{$_ -join ","} | Sort-Object -Unique
I hope I can explain myself clearly... So, each pass of the function takes an array. Each individual element of that array is isolated from the rest of the array; these are represented by the variables $element and $sansElement.
Using those variables we build individual and progressively larger arrays composed of those elements. Let this example show, using the array 1,2,3,4:
1
1,2
1,2,3
1,2,3,4
The above is done for each "number"
2
2,1
2,1,3
2,1,3,4
and so forth. If the returned array contains more than two elements (1,2 would be the same as 2,1 in your example, so we don't care about pairs beyond one match) we take that array and run it through the same function.
The real issue is that the logic here (I know this might be hard to swallow) creates several duplicates. I suppose you could collect into a hashtable instead, which I will explore, but that does not remove the logic flaw.
Regardless of me beating myself up, as long as you don't have thousands of elements the process will still produce results.
Get-Permutations returns an array of arrays. PowerShell displays that one element per line. You asked for comma-delimited output, which is where -join comes in. Sort-Object -Unique takes those sorted strings and discards the duplicates.
Sample Output
B
B,C
B,C,D
B,C,D,E
B,C,E #< Missing from your example output.
B,D
B,D,E #< Missing from your example output.
B,E
C
C,D
C,D,E
C,E
D
E
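For comparison, the order-insensitive combinations the question asks for are exactly what itertools.combinations produces in Python; a minimal sketch (helper name unique_combos is mine):

```python
from itertools import combinations

def unique_combos(items):
    """Yield every 1..n element combination, ignoring order
    (so once C,D is emitted, D,C never appears)."""
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield ",".join(combo)

print(list(unique_combos(["B", "C", "D", "E"])))
```

For 4 elements this yields 4 + 6 + 4 + 1 = 15 combinations, matching the sample output above (including B,C,E and B,D,E).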

Related

Create and split an array twice all inline in Powershell

I have the following code which works but I am looking for a way to do this all inline without the need for creating the unnecessary variables $myArray1 and $myArray2:
$line = "20190208 10:05:00,Source,Severity,deadlock victim=process0a123b4";
$myArray1 = $line.split(",");
$myArray2 = $myArray1[3].split("=");
$requiredValue = $myArray2[1];
So I have a string $line which I want to:
split by commas into an array.
take the fourth item [3] of the new array
split this by the equals sign into another array
take the second item of this array [1]
and store the string value in a variable.
I have tried using Select -index but I haven't been able to then pipe the result and split it again.
The following works:
$line.split(",") | Select -index 3
However, the following results in an error:
$line.split(",") | Select -index 3 | $_.split("=") | Select -index 1
Error message: Expressions are only allowed as the first element of a pipeline.
$line.Split(',')[3].Split('=')[1]
Try the code below:
$requiredValue = "20190208 10:05:00,Source,Severity,deadlock victim=process0a123b4" -split "," -split "=" | select -Last 1
Mudit already provided an answer, here's another about your particular case.
Piping to foreach and accessing 2nd element does the trick:
$line.split(",") | Select -index 3 | % {$_.split("=")[1]}
process0a123b4
That being said, aim for readability and ease of maintenance. There's nothing wrong with having intermediate variables. Memory is cheap nowadays, programmers' time is not. Optimization is due when it's needed and only then after careful profiling to see what's the actual bottleneck.
You could pipe the second split to a foreach
$line.split(",") | Select -index 3 | foreach { $_.split("=") | Select -index 1 }

How to split an array into chunks with jq?

I have a very large JSON file containing an array. Is it possible to use jq to split this array into several smaller arrays of a fixed size? Suppose my input was this: [1,2,3,4,5,6,7,8,9,10], and I wanted to split it into 3 element long chunks. The desired output from jq would be:
[1,2,3]
[4,5,6]
[7,8,9]
[10]
In reality, my input array has nearly three million elements, all UUIDs.
There is an (undocumented) builtin, _nwise, that meets the functional requirements:
$ jq -nc '[1,2,3,4,5,6,7,8,9,10] | _nwise(3)'
[1,2,3]
[4,5,6]
[7,8,9]
[10]
Also:
$ jq -nc '_nwise([1,2,3,4,5,6,7,8,9,10];3)'
[1,2,3]
[4,5,6]
[7,8,9]
[10]
Incidentally, _nwise can be used for both arrays and strings.
(I believe it's undocumented because there was some doubt about an appropriate name.)
TCO-version
Unfortunately, the builtin version is carelessly defined, and will not perform well for large arrays. Here is an optimized version (it should be about as efficient as a non-recursive version):
def nwise($n):
  def _nwise:
    if length <= $n then . else .[0:$n] , (.[$n:]|_nwise) end;
  _nwise;
For an array of size 3 million, this is quite performant:
3.91s on an old Mac, 162746368 max resident size.
Notice that this version (using tail-call optimized recursion) is actually faster than the version of nwise/2 using foreach shown elsewhere on this page.
The following stream-oriented definition of window/3, due to Cédric Connes (github:connesc), generalizes _nwise and illustrates a "boxing technique" that circumvents the need for an end-of-stream marker, so it can be used even if the stream contains the non-JSON value nan. A definition of _nwise/1 in terms of window/3 is also included.
The first argument of window/3 is interpreted as a stream. $size is the window size and $step specifies the number of values to be skipped. For example,
window(1,2,3; 2; 1)
yields:
[1,2]
[2,3]
window/3 and _nwise/1
def window(values; $size; $step):
  def checkparam(name; value):
    if (value | isnormal) and value > 0 and (value | floor) == value
    then .
    else error("window \(name) must be a positive integer")
    end;
  checkparam("size"; $size)
  | checkparam("step"; $step)
  # We need to detect the end of the loop in order to produce the terminal partial group (if any).
  # For that purpose, we introduce an artificial null sentinel, and wrap the input values into
  # singleton arrays in order to distinguish them.
  | foreach ((values | [.]), null) as $item (
      {index: -1, items: [], ready: false};
      (.index + 1) as $index
      # Extract items that must be reused from the previous iteration
      | if (.ready | not) then .items
        elif $step >= $size or $item == null then []
        else .items[-($size - $step):]
        end
      # Append the current item unless it must be skipped
      | if ($index % $step) < $size then . + $item
        else .
        end
      | {$index, items: ., ready: (length == $size or ($item == null and length > 0))};
      if .ready then .items else empty end
    );

def _nwise($n): window(.[]; $n; $n);
Source:
https://gist.github.com/connesc/d6b87cbacae13d4fd58763724049da58
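As a cross-check of the windowing semantics, here is a simplified Python sketch (the function name window is mine; it assumes step <= size, i.e. no item skipping, unlike the full jq definition above):

```python
def window(values, size, step):
    """Yield windows of `size` items advancing by `step`; emit a final
    partial window only if it holds items no full window covered.
    Assumes step <= size (no skipping)."""
    buf = []
    pending = 0  # items in buf not yet emitted in any window
    for v in values:
        buf.append(v)
        pending += 1
        if len(buf) == size:
            yield list(buf)
            # keep the size - step overlapping items for the next window
            buf = buf[step:] if step < size else []
            pending = 0  # everything still buffered has already been emitted
    if pending:
        yield buf[-pending:]
```

With size=2, step=1 this reproduces the [1,2], [2,3] example, and with size == step it degenerates into _nwise-style chunking with a trailing partial chunk.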
If the array is too large to fit comfortably in memory, then I'd adopt the strategy suggested by @CharlesDuffy -- that is, stream the array elements into a second invocation of jq using a stream-oriented version of nwise, such as:
def nwise(stream; $n):
  foreach (stream, nan) as $x ([];
    if length == $n then [$x] else . + [$x] end;
    if (.[-1] | isnan) and length > 1 then .[:-1]
    elif length == $n then .
    else empty
    end);
The "driver" for the above would be:
nwise(inputs; 3)
But please remember to use the -n command-line option.
To create the stream from an arbitrary array:
$ jq -cn --stream '
fromstream( inputs | (.[0] |= .[1:])
| select(. != [[]]) )' huge.json
So the shell pipeline might look like this:
$ jq -cn --stream '
fromstream( inputs | (.[0] |= .[1:])
| select(. != [[]]) )' huge.json |
jq -n -f nwise.jq
This approach is quite performant. For grouping a stream of 3 million items into groups of 3 using nwise/2,
/usr/bin/time -lp
for the second invocation of jq gives:
user 5.63
sys 0.04
1261568 maximum resident set size
Caveat: this definition uses nan as an end-of-stream marker. Since nan is not a JSON value, this cannot be a problem for handling JSON streams.
here's a simple one that worked for me:
def chunk(n):
range(length/n|ceil) as $i | .[n*$i:n*$i+n];
example usage:
jq -n \
'def chunk(n): range(length/n|ceil) as $i | .[n*$i:n*$i+n];
[range(5)] | chunk(2)'
[
0,
1
]
[
2,
3
]
[
4
]
bonus: it doesn't use recursion and doesn't rely on _nwise, so it also works with jaq.
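The slicing idea is easy to port; a Python version of the same chunking, for illustration (helper name chunk is mine):

```python
import math

def chunk(arr, n):
    """Slice-based chunking, same shape as the jq chunk(n) above:
    the i-th chunk is arr[n*i : n*i + n]."""
    return [arr[n * i : n * i + n] for i in range(math.ceil(len(arr) / n))]

print(chunk(list(range(5)), 2))  # [[0, 1], [2, 3], [4]]
```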
The below is hackery, to be sure -- but memory-efficient hackery, even with an arbitrarily long list:
jq -c --stream 'select(length==2)|.[1]' <huge.json \
| jq -nc 'foreach inputs as $i (null; null; [$i,try input,try input])'
The first piece of the pipeline streams in your input JSON file, emitting one line per element, assuming the array consists of atomic values (where [] and {} are here included as atomic values). Because it runs in streaming mode it doesn't need to store the entire content in memory, despite being a single document.
The second piece of the pipeline repeatedly reads up to three items and assembles them into a list.
This should avoid needing more than three pieces of data in memory at a time.
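The "read up to three items at a time" idea in the second pipeline stage can be sketched in Python as a generator that never buffers more than one chunk (function name chunks_from_stream is mine):

```python
def chunks_from_stream(items, n):
    """Group an iterable into lists of up to n items without
    buffering the whole input in memory."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == n:
            yield batch
            batch = []
    if batch:  # trailing partial chunk, like the [10] above
        yield batch
```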

Comparing pairs of arrays in Powershell

I've got multiple pairs of Arrays.
The first of the pair contains multiple strings.
The second contains a timestamp belonging to the string at the same position of the first array.
Here are two pairs of arrays for an example:
#String array
M681
T9997
E61
H717
K700
#Timestamp Array
11:00
11:05
11:05
11:10
11:15
The second pair of Arrays would look like this:
#String array
B722
T732
L999
M681
I125
#Timestamp Array
11:00
11:00
11:05
11:10
11:15
I want to find matches where a string is in two pairs, and then I want to measure the time difference between both entries.
Comparing the string arrays was simply done with (Compare-Object -IncludeEqual -ReferenceObject $a_string -DifferenceObject $b_string -ExcludeDifferent).InputObject, but with this I couldn't compare the corresponding timestamps.
So my next idea was merging each pair together in a hashtable:
$hash = @{}
$hash.Add($a_string[$i],$a_timestamp[$i])
But I noticed that this would not work in my environment, since both the strings and the timestamps can reoccur in one array, so using one of them as the key of the hash is not possible.
Next I tried creating a hashtable with an array as value and an index as key:
$hash.Add($indexNumber,@($a_string[$i],$a_timestamp[$i]))
Even though $hash.indexNumber prints out both values, I can't seem to get one of the values with $hash.indexNumber[0] / $hash.indexNumber[1].
I would appreciate it if someone could tell me the best practice for this kind of situation, or how I can get at the separate values of the array inside the hash and compare them to array values in other hashes.
Kind regards,
Paxz.
So this is a relatively naive solution and will probably need some work to properly handle edge cases, but hopefully it will get you going in the right direction:
$arrayNames1 = @('M681','T9997','E61','H717','K700')
$arrayTimes1 = @('11:00','11:05','11:05','11:10','11:15')
$arrayNames2 = @('B722','T732','L999','M681','I125')
$arrayTimes2 = @('11:00','11:00','11:05','11:10','11:15')
function Find-AllIndexesOf{
    param(
        $array,
        $term
    )
    $i = 0
    foreach($item in $array){
        if($item -eq $term){
            $i
        }
        $i++
    }
}
foreach($name in $arrayNames1){
    if($arrayNames2.Contains($name)){
        $array1Locations = Find-AllIndexesOf $arrayNames1 $name
        $array2Locations = Find-AllIndexesOf $arrayNames2 $name
        $i = 0
        foreach($location in $array1Locations){
            $thisTime1 = [datetime]$arrayTimes1[$location]
            $thisTime2 = [datetime]$arrayTimes2[$array2Locations[$i]]
            Write-Host ('I found a time difference for {0} that started at {1} and ended at {2} for {3} total minutes' -f $name, $thisTime1, $thisTime2, ($thisTime2 - $thisTime1).TotalMinutes)
            $i++
        }
    }
}
But I noticed that this would not work in my environment, since both the strings and the timestamps can reoccur in one array, so using one of them as the key of the hash is not possible.
I am unclear on why a simple hashtable does not work in your use case.
$table = @{}
for ($i = 0; $i -lt $a_string.count; $i++){
    if(!$table.Contains($a_string[$i])){
        $table.Add($a_string[$i], @())
    }
    $table[$a_string[$i]] += $a_timestamp[$i]
}
This would generate a table mapped with Strings to times which could then be used to compare against other pairs.
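The same string-to-times mapping is easy to illustrate in Python (the sample data is from the question; the variable names are mine):

```python
from datetime import datetime

names = ["M681", "T9997", "E61", "H717", "K700"]
times = ["11:00", "11:05", "11:05", "11:10", "11:15"]

# map each string to the list of its timestamps, so repeated strings are fine
table = {}
for name, stamp in zip(names, times):
    table.setdefault(name, []).append(stamp)

# M681 appears in the second pair at 11:10; measure the difference
t1 = datetime.strptime(table["M681"][0], "%H:%M")
t2 = datetime.strptime("11:10", "%H:%M")
minutes = (t2 - t1).total_seconds() / 60
print(minutes)  # 10.0
```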

PowerShell - Create an array that ignores duplicate values

Curious if there a construct in PowerShell that does this?
I know you can do this:
$arr = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
$arr = $arr | Get-Unique
But it seems like, performance-wise, it would be better to ignore the value as you are entering it into the array instead of filtering it out after the fact.
If you are inserting a large number of items into an array (thousands), performance does drop, because the array needs to be recreated every time you add to it, so it may be better in your case, performance-wise, to use something else.
A Dictionary or HashTable could be a way. Your one-dimensional unique array can then be retrieved with $hash.Keys. For example:
$hash = @{}
$hash.Set_Item(1,1)
$hash.Set_Item(2,1)
$hash.Set_Item(1,1)
$hash.Keys
1
2
If you use Set_Item, the key will be created or updated but never duplicated. Put anything for the value if you're not using it, but maybe you'll have a need for a value with your problem too.
You could also use an Arraylist:
Measure-Command -Expression {
    $bigarray = $null
    $bigarray = [System.Collections.ArrayList]@()
    $bigarray = (1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
    $bigarray | select -Unique
}
Time passed:
TotalSeconds : 0,0006581
TotalMilliseconds : 0,6581
Measure-Command -Expression {
    $array = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
    $array | select -Unique
}
Time passed:
TotalSeconds : 0,0009261
TotalMilliseconds : 0,9261
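The dedupe-on-insert idea maps onto a set (or a dict, when insertion order plus an associated value matter); a Python illustration of skipping duplicates as they arrive:

```python
values = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]

seen = set()
unique = []
for v in values:
    # skip the value if we've already stored it, instead of filtering afterwards
    if v not in seen:
        seen.add(v)
        unique.append(v)

print(unique)  # [1, 2, 3, 4]
```

Membership tests against the set are O(1), which is the same reason the HashTable approach above scales better than rebuilding an array.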

Powershell Leading zeros are trimmed in array using Measure-Object cmdlet

When using PowerShell to find the maximum or minimum value in a string array, the leading zeros of the resulting string are trimmed. How do I retain the zeros?
$arr = @("0001", "0002", "0003")
($arr | Measure-Object -Maximum).Maximum
>>> 3
Enumerating the array is the fastest method:
$max = ''
foreach ($el in $arr) {
    if ($el -gt $max) {
        $max = $el
    }
}
$max
Or use SortedSet from the .NET 4 framework (built in since Windows 8); it's 2 times faster than Measure-Object but two times slower than the manual enumeration above. Still, it might be useful if you plan to quickly sort the data without duplicates: it's faster than the built-in Sort-Object.
([Collections.Generic.SortedSet[string]]$arr).max
Obviously, it'll allocate some memory for the array index, but not the actual data as it'll be reused from the existing array. If you're concerned about it, just force garbage collection with [gc]::Collect()
try this
$arr = @("0001", "0002", "0003")
$arr | sort -Descending | select -First 1
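The underlying issue is that Measure-Object converts the strings to numbers; any comparison that stays in string space keeps the zeros. A Python illustration of the same point:

```python
arr = ["0001", "0002", "0003"]

# string comparison is lexicographic, so the leading zeros survive;
# note this matches numeric order only because the strings are
# zero-padded to equal length
maximum = max(arr)
print(maximum)  # 0003
```

The same caveat applies to the PowerShell -gt and sort approaches above: they rely on the values being padded to the same width.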
