PowerShell - Create an array that ignores duplicate values - arrays

Is there a construct in PowerShell that does this?
I know you can do this:
$arr = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
$arr = $arr | Get-Unique
But it seems like it would perform better to ignore a duplicate value as it is entered into the array, instead of filtering it out after the fact.

If you are inserting a large number of items (thousands) into an array, performance does drop, because the array has to be recreated every time you add to it. In your case it may be better, performance-wise, to use something else.
A Dictionary or Hashtable could be a way. Your single-dimensional unique array can then be retrieved with $hash.Keys. For example:
$hash = @{}
$hash.Set_Item(1,1)
$hash.Set_Item(2,1)
$hash.Set_Item(1,1)
$hash.Keys
1
2
If you use Set_Item, the key will be created or updated but never duplicated. Put anything for the value if you're not using it, though you may find a use for the value in your problem too.
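If only the keys matter, a set type is a close fit for "ignore the value as it is entered". The following is a minimal sketch that swaps in the .NET HashSet class, which the answer above does not mention; it assumes PowerShell 3.0 or later and integer values, and the variable names are illustrative only.

```powershell
# Sketch: a HashSet rejects values it already contains at insert time.
$set = New-Object 'System.Collections.Generic.HashSet[int]'

foreach ($value in 1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4) {
    # Add() returns $true for a new value and $false for a duplicate;
    # casting to [void] discards that return value.
    [void]$set.Add($value)
}

# Copy the set into a plain array if an array is needed afterwards.
# HashSet does not promise any particular ordering of its elements.
$unique = [int[]]@($set)
$unique
```

Unlike the hashtable, there is no dummy value to supply, and Add() reports whether the item was new, which can be handy for counting duplicates.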

You could also use an ArrayList:
Measure-Command -Expression {
$bigarray = $null
$bigarray = [System.Collections.ArrayList]@()
$bigarray.AddRange(@(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4))
$bigarray | select -Unique
}
Time passed:
TotalSeconds : 0,0006581
TotalMilliseconds : 0,6581
Measure-Command -Expression {
$array = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
$array | select -Unique
}
Time passed:
TotalSeconds : 0,0009261
TotalMilliseconds : 0,9261

Related

Powershell Compare 2 Arrays of Hashtables based on a property value

I have one array of hashtables like the one below:
$hashtable1 = @{}
$hashtable1.name = "aaa"
$hashtable1.surname = @()
$hashtable1.surname += "bbb"
$hashtable2 = @{}
$hashtable2.name = "aaa"
$hashtable2.surname = @()
$hashtable2.surname += "ccc"
$hashtable3 = @{}
$hashtable3.name = "bbb"
$hashtable3.surname = @()
$hashtable3.surname += "xxx"
$A = @($hashtable1; $hashtable2; $hashtable3)
I need to iterate through the array and find duplicates based on hashtable[].name.
Then I need to merge the hashtable[].surname values of those duplicates, so that the result is an array of hashtables with one entry per name holding all of its surnames:
$hashtable1.name = "aaa"
$hashtable1.surname = ("bbb","ccc")
$hashtable3.name = "bbb"
$hashtable3.surname = ("xxx")
I was looking into iterating over the array and adding the items to an empty array.
I have also found this link:
powershell compare 2 arrays output if match
but I am not sure how to reach into the elements of the hashtable.
My options:
I was wondering if -contains can do it.
I have read about Compare-Object but I am not sure it can be done like that.
(It looks a bit scary at the moment.)
I am on PS5.
Thanks for your help,
Aster
You can group your array items by the names using a scriptblock like so.
Once grouped, you can easily build your output to do what you seek.
# In PS 7.0+ you can use Name directly, but earlier versions require a scriptblock when dealing with arrays of hashtables.
$Output = $A | Group-Object -Property {$_.Name} | % {
    [PSCustomObject]@{
        Name = $_.Name
        Surname = $_.Group.Surname | Sort-Object -Unique
    }
}
Here is the output variable content.
Name Surname
---- -------
aaa  {bbb, ccc}
bbb  xxx
Note
Improvements made in PS 7.0 allow you to use just the property name (e.g. Name) in Group-Object for arrays of hashtables, as you would for any other array type. In earlier versions, the property of these particular arrays must be accessed through a scriptblock, like so: {$_.Name}
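The question asked for an array of hashtables rather than custom objects. As a rough sketch of that variant, a plain loop can merge the surnames into one hashtable per name. This is an alternative to the Group-Object approach, not part of the original answer; $merged and $result are hypothetical names, and the sample data mirrors the question's $A.

```powershell
# Sample input mirroring the question's $A (an array of hashtables).
$A = @(
    @{ name = 'aaa'; surname = @('bbb') }
    @{ name = 'aaa'; surname = @('ccc') }
    @{ name = 'bbb'; surname = @('xxx') }
)

# $merged holds one hashtable per distinct name, in first-seen order.
$merged = [ordered]@{}
foreach ($item in $A) {
    if (-not $merged.Contains($item.name)) {
        # First time this name is seen: start a new entry.
        $merged[$item.name] = @{ name = $item.name; surname = @() }
    }
    # Append this item's surnames to the entry for its name.
    $merged[$item.name].surname += $item.surname
}

# The values are the merged hashtables.
$result = @($merged.Values)
```

This sketch does not remove repeated surnames within one name; piping the surname list through Sort-Object -Unique, as in the answer above, would handle that.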

Efficient way to remove duplicates from large 2D arrays in PowerShell

I have a large data set of roughly 10 million items that I need to process quickly and efficiently, removing duplicate items based on two of the six columns.
I have tried grouping and sorting items but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSerialID,ComputerID
$p2 = foreach ($group in $p1) {
    # Keep only the oldest record of each group
    $group.Group | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12
You might want to clean up your question, because it's a bit hard to read, but I'll answer as best I can based on what I understand you're trying to do.
Unfortunately, with so much data there's no way to do this quickly. String comparison and sorting are done by brute force: comparing two strings means checking them one character at a time, and there is no way to reduce that cost any further.
(Honestly, if this were me, I'd just use Export-Csv on $object and perform this operation in Excel. For a one-off, the time spent scripting something like this just wouldn't be worth it.)
By "items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You've already got the basic idea of Sort-Object down; you can apply it to the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download that Join-Object command from the powershell gallery with Install-Script -Name Join and use it in the way described. If you want to do this step yourself, the easiest way would be to squish the two tables together and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique
Does this do it? It keeps the one it finds first.
$test | sort -u ComputerSerialID, ComputerID
I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of each record and add the result to a hash.
If the concatenated text already exists in the hash, the record is not added to the array that is returned.
Function DeDupe_Array
{
    param
    (
        $Data
    )
    $Return_Array = @()
    $Check_Hash = @{}
    Foreach($Line in $Data)
    {
        $Concatenated = ''
        $Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
        foreach($Element in $Elements)
        {
            $Concatenated += $line.$Element
        }
        If($Check_Hash.$Concatenated -ne 1)
        {
            $Check_Hash.add($Concatenated,1)
            $Return_Array += $Line
        }
    }
    return $Return_Array
}
Try the following script.
It should be as fast as possible because it avoids any piping in PowerShell.
$hashT = @{}
foreach ($item in $csvData) {
    # Build the hashtable key from the two columns that define a duplicate
    $key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID
    # Store the item if $key doesn't exist yet, OR if $key exists and "FirstObserved" is earlier than the stored one
    # (only valid when the date is provided in a sortable / international format)
    if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
        $hashT[$key] = $item
    }
}
$result = $hashT.Values
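To see the hashtable approach above on a small data set, here is a minimal sketch with made-up rows. It varies the idea slightly by parsing FirstObserved into a DateTime, which is an assumption for cases where the text format is not guaranteed to sort correctly; the sample values and the names $oldest, Seen, and Row are hypothetical.

```powershell
# Hypothetical sample rows; the property names follow the question's data.
$csvData = @(
    [PSCustomObject]@{ ComputerSerialID = 4; ComputerID = 2; FirstObserved = '2019-06-03T20:29:01' }
    [PSCustomObject]@{ ComputerSerialID = 4; ComputerID = 2; FirstObserved = '2019-06-01T08:00:00' }
    [PSCustomObject]@{ ComputerSerialID = 8; ComputerID = 6; FirstObserved = '2019-06-03T20:29:01' }
)

$oldest = @{}
foreach ($item in $csvData) {
    $key = '{0}|{1}' -f $item.ComputerSerialID, $item.ComputerID
    # Parse the timestamp so the comparison is chronological, not textual.
    $seen = [datetime]::Parse($item.FirstObserved, [cultureinfo]::InvariantCulture)
    if ((-not $oldest.ContainsKey($key)) -or ($seen -lt $oldest[$key].Seen)) {
        # Remember the parsed date next to the row to avoid re-parsing later.
        $oldest[$key] = @{ Seen = $seen; Row = $item }
    }
}

# Unwrap the stored rows: one row per key, the earliest FirstObserved.
$result = @($oldest.Values | ForEach-Object { $_.Row })
```

Parsing every row adds work, so on 10 million rows it is worth skipping when the timestamps are already in a sortable format such as the ISO 8601 strings shown in the question.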

Why cant I store a hashtable in an arraylist?

This is my code
$allTests = New-Object System.Collections.ArrayList
$singleTest = @{}
$singleTest.add("Type", "Human")
1..10 | foreach {
$singleTest.add("Count", $_)
$singleTest.add("Name", "FooBar...whatever..$_")
$singleTest.add("Age", $_)
$allTests.Add($singleTest) | out-null
$singleTest.remove("Count")
$singleTest.remove("Name")
$singleTest.remove("Age")
}
From my understanding, my loop should be adding a copy of the hashtable to the ArrayList every time it gets to
$allTests.Add($singleTest) | out-null
The loop then continues, removes some keys, and this paves the way for the next iteration. That's not what is happening; it's as if the Add method only adds a reference to the hashtable.
If I check the final value of
$allTests
this gets returned
Name Value
---- -----
Type Human
Type Human
Type Human
Type Human
Type Human
Type Human
Type Human
Type Human
Type Human
Type Human
How do I fix this so an actual copy of the hashtable is stored in the ArrayList?
I'm looking for an output like
$allTests[0]
Name Value
---- -----
Count 1
Name FooBar...whatever..1
Age 1
Type Human
$allTests[1]
Name Value
---- -----
Count 2
Name FooBar...whatever..2
Age 2
Type Human
Hashtables are reference types: once you create one object, all further operations act on that one hashtable, including retrieving its contents later.
You can declare a new hashtable on each run of the loop to get around this.
$allTests = New-Object System.Collections.ArrayList
1..10 | foreach {
$singleTest = @{}
$singleTest.add("Type", "Human")
$singleTest.add("Count", $_)
$singleTest.add("Name", "FooBar...whatever..$_")
$singleTest.add("Age", $_)
$allTests.Add($singleTest) | Out-Null
}
or even this to cut out some bloat.
$allTests = New-Object System.Collections.ArrayList
1..10 | foreach {
$allTests.Add(@{
Type = "Human"
Count = $_
Name = "FooBar...Whatever..$_"
Age = $_
}) | Out-Null
}
Both of these answers will give you the expected output.
@ConnorLSW's answer is spot on functionally.
I have another solution that gives you more flexibility. I often find myself building custom objects that share some fields, so instead of creating a new object on every run of the loop, you can define the base object outside the loop, just as you do now, then change a property value for that instance inside the loop and add a copy of it to your collection like this:
$allTests.Add($singleTest.Psobject.Copy())
This copies the contents to a new object before inserting it. Now you are no longer referencing the same object that you change during the next iteration of the loop.
Since hash tables are passed by reference, you're just adding multiple references to the same hash table to your arraylist. You need to create a new copy of the hash table and then add that to your array list.
One option is to use the hash table .clone() method when you want to save a copy to the arraylist.
$allTests.Add($singleTest.clone()) | out-null
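The difference between a reference and a copy can be seen in a few lines. This is only an illustrative sketch with made-up variable names; note that Clone() makes a shallow copy, so nested objects stored as values would still be shared between the original and the copy.

```powershell
$original = @{ Type = 'Human'; Age = 1 }

$sameObject = $original          # a second name for the same hashtable
$copy       = $original.Clone()  # a new hashtable with the same entries

# Change the original after both assignments.
$original['Age'] = 2

$sameObject['Age']   # follows the change, because it is the same object
$copy['Age']         # keeps the value it had when Clone() was called
```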

Powershell Leading zeros are trimmed in array using Measure-Object cmdlet

When using Powershell to find out the maximum or minimum value in a string array, the leading zeros of the outcome string are trimmed. How to retain the zeros?
$arr = @("0001", "0002", "0003")
($arr | Measure-Object -Maximum).Maximum
>>> 3
Enumerating the array is the fastest method:
$max = ''
foreach ($el in $arr) {
if ($el -gt $max) {
$max = $el
}
}
$max
Or use SortedSet from the .NET 4 framework (built in since Windows 8). It's two times faster than Measure-Object but two times slower than the manual enumeration above. It might still be useful if you plan to sort the data without duplicates quickly: it's faster than the built-in Sort-Object.
([Collections.Generic.SortedSet[string]]$arr).max
Obviously, it'll allocate some memory for the array index, but not the actual data as it'll be reused from the existing array. If you're concerned about it, just force garbage collection with [gc]::Collect()
Try this:
$arr = @("0001", "0002", "0003")
$arr | sort -Descending | select -First 1
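The manual enumeration can also be wrapped in a small function. The sketch below uses a hypothetical name, Get-MaxString, and swaps in an ordinal comparison (character code by character code) instead of the -gt operator, so the values are never converted to numbers. It assumes all strings are zero-padded to the same width; otherwise textual order and numeric order can disagree.

```powershell
function Get-MaxString {
    param([string[]]$Values)

    $max = $null
    foreach ($value in $Values) {
        # CompareOrdinal compares character codes, so the strings stay strings.
        if ($null -eq $max -or [string]::CompareOrdinal($value, $max) -gt 0) {
            $max = $value
        }
    }
    return $max
}

$arr = @('0001', '0002', '0003')
$max = Get-MaxString -Values $arr
$max
```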

create a new variable for each loop in a foreach-loop

How can I put $org into an array together with $count?
Like this example array:
$myArray = @{
1="SampleOrg";
2="AnotherSampleOrg"
}
Another example:
$myArray = @{
$count="$org";
$count="$org"
}
Example foreach:
$count=0;get-organization | foreach {$count++; $org = $_.Name.ToString();write-host $count -nonewline;write-host " $org"}
$answer = read-host "Select 1-$count"
The above will display:
1 SampleOrg
2 AnotherSampleOrg
Select 1-2:
What I would like to do afterwards is to put the array to use in a switch.
Example:
switch ($answer)
{
1 {$org=$myArray[1]} #<-- or whatever that corresponds to "SampleOrg"
2 {$org=$myArray[2]} #<-- or whatever that corresponds to "AnotherSampleOrg"
}
You have to initialize your hashtable somewhere before the loop:
$myArray = @{}
and add a
$myArray.Add($count, $org)
to your foreach-loop.
EDIT: For the discussion about hashtable/array, see the whole thread ;) I just kept the name of the variable from the original posting.
Looks like you're confusing arrays and Hashtables. Arrays are ordered, and indexed by a numeric value. Hashtables are associative, and indexed by any value that has equality defined.
This is array syntax
$arr = @(1,2,3)
and this is Hashtable syntax
$ht = @{red=1;blue=2;}
For your question, the following will work
$orgs = @(get-organization | % { $_.Name })
this will create a 0 based array, mapping int -> OrgName, so
$orgs[$answer]
will get the correct name. Or if you're using 1 based indexing
$orgs[$answer-1]
Note, I removed the switch, as there's no reason for it.
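Putting the array approach together with the numbered menu from the question might look like the sketch below. Get-Organization is replaced by a hard-coded list and Read-Host by a fixed answer so that the example runs on its own; both substitutions, and the range check, are assumptions added for illustration.

```powershell
# Hypothetical stand-in for: Get-Organization | ForEach-Object { $_.Name }
$orgs = @('SampleOrg', 'AnotherSampleOrg')

# Print a 1-based menu from the 0-based array.
for ($i = 0; $i -lt $orgs.Count; $i++) {
    Write-Host ('{0} {1}' -f ($i + 1), $orgs[$i])
}

# Hard-coded so the sketch runs unattended; interactively this would be:
# $answer = Read-Host "Select 1-$($orgs.Count)"
$answer = '2'

# Read-Host returns a string, so convert it before using it as an index.
$index = [int]$answer - 1
if ($index -ge 0 -and $index -lt $orgs.Count) {
    $org = $orgs[$index]
} else {
    # Out-of-range selections leave $org empty instead of picking a wrong entry.
    $org = $null
}
```

The range check matters because a negative index in PowerShell counts from the end of the array, so an answer of 0 would otherwise silently select the last organization.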
