PowerShell array initialization

What's the best way to initialize an array in PowerShell?
For example, the code
$array = @()
for($i=0; $i -lt 5;$i++)
{
$array[$i] = $FALSE
}
generates the error
Array assignment failed because index '0' was out of range.
At H:\Software\PowerShell\TestArray.ps1:4 char:10
+ $array[$ <<<< i] = $FALSE

Here are two more ways, both very concise.
$arr1 = @(0) * 20
$arr2 = ,0 * 20

You can also rely on the default value of the constructor if you wish to create a typed array:
> $a = new-object bool[] 5
> $a
False
False
False
False
False
The default value of a bool is apparently false so this works in your case. Likewise if you create a typed int[] array, you'll get the default value of 0.
Another cool way that I use to initialize arrays is with the following shorthand:
> $a = ($false, $false, $false, $false, $false)
> $a
False
False
False
False
False
Or if you want to initialize a range, I've sometimes found this useful:
> $a = (1..5)
> $a
1
2
3
4
5
Hope this was somewhat helpful!

Yet another alternative:
for ($i = 0; $i -lt 5; $i++)
{
$arr += @($false)
}
This one works if $arr isn't defined yet.
NOTE - there are better (and more performant) ways to do this... see https://stackoverflow.com/a/234060/4570 below as an example.

The original example returns an error because the array is created empty, then you try to access the nth element to assign it a value.
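To make the cause concrete, here is a minimal sketch (variable names are illustrative, not from the question) showing that assignment into an empty array fails, while a pre-sized array accepts the same assignment:

```powershell
# An empty array has no slots, so assigning to index 0 raises an out-of-range error.
$empty = @()
$failed = $false
try {
    $empty[0] = $false
} catch {
    $failed = $true
}

# A pre-sized array already has five slots that can be assigned.
$sized = New-Object object[] 5
for ($i = 0; $i -lt $sized.Length; $i++) {
    $sized[$i] = $false
}

"Assignment into empty array failed: $failed"
"Sized array length: $($sized.Length)"
```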
There are a number of creative answers here, many I didn't know before reading this post. All are fine for a small array, but as n0rd points out, there are significant differences in performance.
Here I use Measure-Command to find out how long each initialization takes. As you might guess, any approach that uses an explicit PowerShell loop is slower than those that use .Net constructors or PowerShell operators (which would be compiled in IL or native code).
Summary
New-Object and @(somevalue)*n are fast (around 20k ticks for 100k elements).
Creating an array with the range operator n..m is 10x slower (200k ticks).
Using an ArrayList with the Add() method is 1000x slower than the baseline (20M ticks), as is looping through an already-sized array using for() or ForEach-Object (a.k.a. foreach,%).
Appending with += is the worst (2M ticks for just 1000 elements).
Overall, I'd say array*n is "best" because:
It's fast.
You can use any value, not just the default for the type.
You can create repeating values (to illustrate, type this at the powershell prompt: (1..10)*10 -join " " or ('one',2,3)*3)
Terse syntax.
The only drawback:
Non-obvious. If you haven't seen this construct before, it's not apparent what it does.
But keep in mind that for many cases where you would want to initialize the array elements to some value, then a strongly-typed array is exactly what you need. If you're initializing everything to $false, then is the array ever going to hold anything other than $false or $true? If not, then New-Object type[] n is the "best" approach.
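A small sketch of the difference between the two "best" candidates, assuming nothing beyond the two constructs already shown: repetition yields an untyped Object[], while New-Object with a typed array yields a Boolean[] that converts whatever is assigned to it.

```powershell
# Repetition builds an Object[] that can later hold any value.
$generic = ,$false * 5

# New-Object with a typed array builds a Boolean[] filled with the type's default.
$typed = New-Object bool[] 5

$generic.GetType().Name   # Object[]
$typed.GetType().Name     # Boolean[]

# Assigning into the typed array converts the value to [bool].
$typed[0] = 1
$typed[0]                 # True
```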
Testing
Create and size a default array, then assign values:
PS> Measure-Command -Expression {$a = new-object object[] 100000} | Format-List -Property "Ticks"
Ticks : 20039
PS> Measure-Command -Expression {for($i=0; $i -lt $a.Length;$i++) {$a[$i] = $false}} | Format-List -Property "Ticks"
Ticks : 28866028
Creating an array of Boolean is a little slower than an array of Object:
PS> Measure-Command -Expression {$a = New-Object bool[] 100000} | Format-List -Property "Ticks"
Ticks : 130968
It's not obvious what this does; the documentation for New-Object just says that the second parameter is an argument list which is passed to the .Net object constructor. In the case of arrays, the parameter evidently is the desired size.
Appending with +=
PS> $a=@()
PS> Measure-Command -Expression { for ($i=0; $i -lt 100000; $i++) {$a+=$false} } | Format-List -Property "Ticks"
I got tired of waiting for that to complete, so ctrl+c then:
PS> $a=@()
PS> Measure-Command -Expression { for ($i=0; $i -lt 100; $i++) {$a+=$false} } | Format-List -Property "Ticks"
Ticks : 147663
PS> $a=@()
PS> Measure-Command -Expression { for ($i=0; $i -lt 1000; $i++) {$a+=$false} } | Format-List -Property "Ticks"
Ticks : 2194398
Just as (6 * 3) is conceptually similar to (6 + 6 + 6), so ($somearray * 3) ought to give the same result as ($somearray + $somearray + $somearray). But with arrays, + is concatenation rather than addition.
If $array+=$element is slow, you might expect $array*$n to also be slow, but it's not:
PS> Measure-Command -Expression { $a = @($false) * 100000 } | Format-List -Property "Ticks"
Ticks : 20131
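As a quick illustration of the analogy (a sketch with an illustrative two-element array), multiplying an array gives the same elements as concatenating it with itself that many times:

```powershell
$pair = 1, 2

$viaMultiply = $pair * 3
$viaConcat   = $pair + $pair + $pair

$viaMultiply -join ','   # 1,2,1,2,1,2
$viaConcat -join ','     # 1,2,1,2,1,2
```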
Just like Java has a StringBuilder class to avoid creating multiple objects when appending, so it seems PowerShell has an ArrayList.
PS> $al = New-Object System.Collections.ArrayList
PS> Measure-Command -Expression { for($i=0; $i -lt 1000; $i++) {$al.Add($false)} } | Format-List -Property "Ticks"
Ticks : 447133
PS> $al = New-Object System.Collections.ArrayList
PS> Measure-Command -Expression { for($i=0; $i -lt 10000; $i++) {$al.Add($false)} } | Format-List -Property "Ticks"
Ticks : 2097498
PS> $al = New-Object System.Collections.ArrayList
PS> Measure-Command -Expression { for($i=0; $i -lt 100000; $i++) {$al.Add($false)} } | Format-List -Property "Ticks"
Ticks : 19866894
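One detail worth knowing when using ArrayList this way: Add() returns the index of the new element, which ends up in the output unless it is discarded. A minimal sketch, assuming you want a plain array at the end:

```powershell
$al = New-Object System.Collections.ArrayList
for ($i = 0; $i -lt 5; $i++) {
    # Add() returns the new element's index; [void] keeps it out of the output.
    [void]$al.Add($false)
}

# Convert to a fixed-size array once the list is complete.
$array = $al.ToArray()
$array.Length   # 5
```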
Range operator, and ForEach-Object loop:
PS> Measure-Command -Expression { $a = 1..100000 } | Format-List -Property "Ticks"
Ticks : 239863
PS> Measure-Command -Expression { $a | % {$false} } | Format-List -Property "Ticks"
Ticks : 102298091
Notes:
I nulled the variable between each run ($a=$null).
Testing was on a tablet with Atom processor; you would probably see faster speeds on other machines. [edit: About twice as fast on a desktop machine.]
There was a fair bit of variation when I tried multiple runs. Look for the orders of magnitude rather than exact numbers.
Testing was with PowerShell 3.0 in Windows 8.
Acknowledgements
Thanks to @halr9000 for array*n, @Scott Saad and Lee Desmond for New-Object, and @EBGreen for ArrayList.
Thanks to @n0rd for getting me to think about performance.

$array = 1..5 | foreach { $false }

Here's another idea. You have to remember, that it's .NET underneath:
$arr = [System.Array]::CreateInstance([System.Object], 5)
$arr.GetType()
$arr.Length
$arr = [Object[]]::new(5)
$arr.GetType()
$arr.Length
Result:
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
5
True True Object[] System.Array
5
Using new() has one distinct advantage: when you're programming in ISE and want to create an object, ISE will give you a hint with all parameter combinations and their types. You don't have that with New-Object, where you have to remember the types and order of arguments.

$array = @()
for($i=0; $i -lt 5; $i++)
{
$array += $i
}

The solution I found was to use the New-Object cmdlet to initialize an array of the proper size.
$array = new-object object[] 5
for($i=0; $i -lt $array.Length;$i++)
{
$array[$i] = $FALSE
}

If I don't know the size up front, I use an arraylist instead of an array.
$al = New-Object System.Collections.ArrayList
for($i=0; $i -lt 5; $i++)
{
$al.Add($i)
}

Here's another typical way:
$array = for($i = 0; $i -le 4; $i++) { $false }

Or try this idea. Works with PowerShell 5.0+.
[bool[]]$tf=((,$False)*5)

$array = foreach($i in 1..5) { $false }

Related

Appending objects to arrays in Powershell

I have the following code:
$DataType = "X,Y,Z"
$Data = "1,2,3"
$Table = @()
for ($i = 0; $i -le ($DataType.Count-1); $i++)
{
$Properties = @{$DataType[$i]=$Data[$i]}
$Object = New-Object -TypeName PSCustomObject -Property $Properties
$Table += $Object
}
$Table | Format-Table -AutoSize
I get this output:
X
-
1
What I would like to get is:
X Y Z
- - -
1 2 3
Thanks for your help!
Cutting a long story short:
$DataType, $Data | ConvertFrom-Csv
X Y Z
- - -
1 2 3
Ok, it needs a little explanation:
PowerShell will automatically unroll the array of strings ($DataType, $Data) and supply it as individual line items to the pipeline. The ConvertFrom-Csv cmdlet supports supplying the input table through the pipeline as separate lines (strings).
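A short sketch of what the one-liner returns, using the two strings from the question. The resulting object has one property per header field, and the values are strings:

```powershell
$DataType = "X,Y,Z"
$Data     = "1,2,3"

# The first string is treated as the header row, the second as a data row.
$row = $DataType, $Data | ConvertFrom-Csv

$row.X   # 1
$row.Z   # 3
```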
You can do the following instead:
$DataType = "X","Y","Z"
$Data = 1,2,3
$hash = [ordered]@{}
for ($i = 0; $i -lt $DataType.Count; $i++) {
$hash.Add($DataType[$i],$Data[$i])
}
$table = [pscustomobject]$hash
Explanation:
The code creates two collections, $DataType and $Data, of three items each. $hash is an ordered hash table; [ordered] preserves the order in which key-value pairs are added. Like a regular hash table, it provides the .Add(key,value) method for adding key-value pairs.
Since the [pscustomobject] type accelerator can be cast on a hash table, we can simply use the syntax [pscustomobject]$hash to create a new object.
If we consider your attempt, your variables are actually single strings rather than collections. Surrounding a value with quotes causes PowerShell to expand the inner contents as a string. When you index a string rather than a collection, you index the characters in the string rather than the entire item. You need to quote the individual elements between the commas so that the , acts as a separator rather than part of the string. You can see this behavior in the code below:
# DataType as a string
$DataType = "X,Y,Z"
$DataType[1]
,
# DataType as an array or collection
$DataType = "X","Y","Z"
$DataType[1]
Y
If you receive your data from another output in the current format, you can manipulate using $DataType = $DataType.Split(',') in order to create a collection. Alternatively you can treat the data as comma-separated and use the Import-Csv or ConvertFrom-Csv commands as in iRon's answer provided you order your strings properly.
Inside of your loop, you are adding three new objects to your collection $table rather than creating one object with three properties. $table += $Object creates an array called $table that appends a new item to the previous list from $table. If this was your original intention, you can view your collection by running $table | Format-List once you fix your $DataType and $Data variables.
When a collection is enumerated, the default table view displays the properties of the first object in a collection. Any succeeding objects will only display values for the first object's matching properties. So if object1 has properties X and Y and object2 has properties Y and Z, the console will only display values for properties X and Y for both objects. Format-List overrides this view and displays all properties of all objects. See below for an example of this behavior:
$obj1
X Y
- -
1 2
$obj2
Y Z
- -
3 4
$array = $obj1,$obj2
# Table View
$array
X Y
- -
1 2
3
# List View
$array | Format-List
X : 1
Y : 2
Y : 3
Z : 4
It seems that you want to create a single object with a property for each value in the arrays $DataType/$Data, but the problems are...
Neither $DataType nor $Data are arrays.
By creating your object inside the for loop you will create one object per iteration.
Since $DataType is a scalar variable $DataType.Count returns 1. Ordinarily, testing for $DataType.Count-1 would mean the loop never gets entered, but by the grace of using -le (so 0 -le 0 returns $true) instead of -lt, it does for exactly one iteration. Thus, you do get your single result object, but with only the first property created.
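The Count behaviour described above can be checked directly. This sketch assumes PowerShell 3.0 or later, where scalars expose a Count property:

```powershell
$scalar = "X,Y,Z"
$split  = "X,Y,Z" -split ','

$scalar.Count   # 1 (a single string)
$split.Count    # 3 (an array of three strings)
```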
To fix this, let's create $DataType and $Data as arrays, as well as creating one set of properties before the loop to be used to create one result object after the loop...
...
$DataType = "X,Y,Z" -split ','
$Data = "1,2,3" -split ','
$Properties = @{}
for ($i = 0; $i -lt $DataType.Count; $i++)
{
$Properties[$DataType[$i]] = $Data[$i]
}
New-Object -TypeName PSCustomObject -Property $Properties | Format-Table -AutoSize
You'll also notice that $i -le ($DataType.Count-1) has been simplified to $i -lt $DataType.Count. On my system the above code outputs...
Y Z X
- - -
2 3 1
The properties are correct, but the order is not what you wanted. This is because Hashtable instances, such as $Properties, have no ordering among their keys. To ensure that the properties are in the order you specified in the question, on PowerShell 3.0 and above you can use this to preserve insertion order...
$Properties = [Ordered] @{}
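A minimal sketch of the ordered variant, with literal keys standing in for the loop, to show that both the dictionary and an object created from it keep insertion order:

```powershell
$Properties = [ordered]@{}
$Properties['X'] = 1
$Properties['Y'] = 2
$Properties['Z'] = 3

$Properties.Keys -join ' '                    # X Y Z

$object = [pscustomobject]$Properties
$object.PSObject.Properties.Name -join ' '    # X Y Z
```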
What if you initialized $Table as an appendable like so:
$Table = New-Object System.Collections.ArrayList
for ($i = 0; $i -le ($DataType.Count-1); $i++)
{
$Properties = @{$DataType[$i]=$Data[$i]}
$Object = New-Object -TypeName PSCustomObject -Property $Properties
[void]$Table.Add($Object)
}
Reformat your logic as needed.
One solution to this problem (if the inputs were two separate arrays):
$DataType = @( 'X','Y','Z' )
$Data = @( '1','2','3' )
$Table = New-Object psobject
for ($i = 0; $i -le ( $DataType.Count-1 ); $i++)
{
$Table | Add-Member -Name "$( $DataType[$i] )" -Value ( $Data[$i] ) -MemberType NoteProperty
}
$Table

Using ForEach-Object on Array of Structs - Powershell

I'm evolving my surveillance script so I can choose a service/maintenance window, where all errors are ignored between two times.
This is what I've got:
Add-Type -TypeDefinition @"
public struct ServiceWindow
{
public int SWStart;
public int SWEnd;
}
"@
[array]$SWArray = New-Object ServiceWindow
$time = Get-Date -Format HHMM
$time
$ActiveBatchVar = "1000-1005;1306-1345;2300-2305"
$ActiveBatchVar = $ActiveBatchVar.Split(";")
For ($i = 0; $i -lt $ActiveBatchVar.Length; $i++)
{
$tempSW = New-Object ServiceWindow
$tempSW.SWStart = $ActiveBatchVar[$i].Split("-")[0]
$tempSW.SWEnd = $ActiveBatchVar[$i].Split("-")[1]
If ($i -eq 0) { $SWArray = $tempSW } else { $SWArray += $tempSW }
}
Write-Host Complete array...
$SWArray
ForEach-Object ($SWArray) {
Get-Date -Format HHMM
If ($time -ge $_.SWStart -and $time -lt $_.SWEnd) {Write-Host Wohoo we have hit a service window service window...}
}
I get an error in my last ForEach-Object loop and can't figure out what is wrong.
The point is that I would like to check if the current time is between two given times, like "1000-1005".
Anyone got a clue what’s missing, or maybe a way to simplify the whole thing ;)
Ok, a few things here... You really seem to like the Split() method. You may want to look into some alternatives, like this:
$ActiveBatchVar = @(@("1000","1005"),@("1306","1345"),@("2300","2305"))
See what we did there? It's an array of arrays. @() is the array notation. So I have an array, with 3 arrays in it.
I'm not really familiar with structs, but I am familiar with custom objects, so I would use that if it were me. Then you could do something like:
$SWArray = @() # That's an empty array, we'll add things to it now that it exists
ForEach ($Batch in $ActiveBatchVar){
$SWArray += New-Object PSObject -Property @{
SWStart = $Batch[0]
SWEnd = $Batch[1]
}
}
So then we change the last bit so that you are assigning $time just before your next loop to keep it as accurate as possible, and correct the ForEach just a little and the whole thing would look like this:
$ActiveBatchVar = @(@("1000","1005"),@("1306","1345"),@("2300","2305"))
$SWArray = @()
ForEach ($Batch in $ActiveBatchVar){
$SWArray += New-Object PSObject -Property @{
SWStart = $Batch[0]
SWEnd = $Batch[1]
}
}
Write-Host Complete array...
$SWArray
$time = date -f HHmm
ForEach($SW in $SWArray) {
If ($time -ge $SW.SWStart -and $time -lt $SW.SWEnd) {
Write-Host "Wohoo we have hit a service window service window..."
}
}
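The window check can also be pulled into a function so it is easy to try with fixed times. This is only a sketch: Test-InServiceWindow and its parameters are hypothetical names, it parses the question's "start-end;start-end" string, and it does not handle windows that span midnight.

```powershell
# Hypothetical helper: returns $true if $Time (HHmm) falls inside any window.
function Test-InServiceWindow {
    param(
        [string]$Time,
        [string]$Windows   # e.g. "1000-1005;1306-1345"
    )
    foreach ($window in $Windows.Split(';')) {
        $startTime, $endTime = $window.Split('-')
        # Zero-padded HHmm strings of equal length compare in the same order as the times.
        if ($Time -ge $startTime -and $Time -lt $endTime) { return $true }
    }
    return $false
}

Test-InServiceWindow -Time '1310' -Windows '1000-1005;1306-1345;2300-2305'   # True
Test-InServiceWindow -Time '1200' -Windows '1000-1005;1306-1345;2300-2305'   # False
```

In a real script, the time would come from Get-Date -Format HHmm, as in the answer above.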
Minimum changes:
ForEach-Object ($SWArray) {
to
$SWArray | % {
Also your last Write-Host should enclose the message in quotes, i.e.
{Write-Host "Wohoo..."}
ForEach-Object ($SWArray) {}
This is the wrong syntax. ForEach-Object is a cmdlet that takes pipeline input; to loop with the in keyword, use the foreach statement:
foreach ($item in $SWArray) {}
if you have a small array...
($SWArray).foreach({
Get-Date -Format HHmm
If ($time -ge $_.SWStart -and $time -lt $_.SWEnd)
{Write-Host Wohoo we have hit a service window service window...}
})

How to fill an array efficiently in Powershell

I want to fill up a dynamic array with the same integer value as fast as possible using Powershell.
The Measure-Command shows that it takes 7 seconds on my system to fill it up.
My current code (snipped) looks like:
$myArray = @()
$length = 16385
for ($i=1;$i -le $length; $i++) {$myArray += 2}
(Full code can be seen on gist.github.com or on superuser)
Consider that $length can change. But for better understanding I chose a fixed length.
Q: How do I speed up this Powershell code?
You can repeat arrays, just as you can do with strings:
$myArray = ,2 * $length
This means »Take the array with the single element 2 and repeat it $length times, yielding a new array.«.
Note that you cannot really use this to create multidimensional arrays because the following:
$some2darray = ,(,2 * 1000) * 1000
will just create 1000 references to the inner array, making them useless for manipulation. In that case you can use a hybrid strategy. I have used
$some2darray = 1..1000 | ForEach-Object { ,(,2 * 1000) }
in the past, but the performance measurements below suggest that
$some2darray = foreach ($i in 1..1000) { ,(,2 * 1000) }
would be a much faster way.
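The shared-reference problem is easy to demonstrate on a small scale. A sketch using 3x3 instead of 1000x1000:

```powershell
# Repeating the outer array copies references, so every row is the same object.
$shared = ,(,2 * 3) * 3
$shared[0][0] = 99
$shared[1][0]        # 99 (row 1 changed too)

# Building each row inside a loop gives independent inner arrays.
$independent = foreach ($i in 1..3) { ,(,2 * 3) }
$independent[0][0] = 99
$independent[1][0]   # 2
```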
Some performance measurements:
Command                                                   Average Time (ms)
-------                                                   -----------------
$a = ,2 * $length                                         0,135902     # my own
[int[]]$a = [System.Linq.Enumerable]::Repeat(2, $length)  7,15362      # JPBlanc
$a = foreach ($i in 1..$length) { 2 }                     14,54417
[int[]]$a = -split "2 " * $length                         24,867394
$a = for ($i = 0; $i -lt $length; $i++) { 2 }             45,771122    # Ansgar
$a = 1..$length | %{ 2 }                                  431,70304    # JPBlanc
$a = @(); for ($i = 0; $i -lt $length; $i++) { $a += 2 }  10425,79214  # original code
Taken by running each variant 50 times through Measure-Command, each with the same value for $length, and averaging the results.
Positions 3 and 5 are a bit of a surprise, actually. Apparently it's much better to foreach over a range instead of using a normal for loop.
Code to generate above chart:
$length = 16384
$tests = '$a = ,2 * $length',
'[int[]]$a = [System.Linq.Enumerable]::Repeat(2, $length)',
'$a = for ($i = 0; $i -lt $length; $i++) { 2 }',
'$a = foreach ($i in 1..$length) { 2 }',
'$a = 1..$length | %{ 2 }',
'$a = @(); for ($i = 0; $i -lt $length; $i++) { $a += 2 }',
'[int[]]$a = -split "2 " * $length'
$tests | ForEach-Object {
$cmd = $_
$timings = 1..50 | ForEach-Object {
Remove-Variable i,a -ErrorAction Ignore
[GC]::Collect()
Measure-Command { Invoke-Expression $cmd }
}
[pscustomobject]@{
Command = $cmd
'Average Time (ms)' = ($timings | Measure-Object -Average TotalMilliseconds).Average
}
} | Sort-Object Ave* | Format-Table -AutoSize -Wrap
Avoid appending to an array in a loop. It's copying the existing array to a new array with each iteration. Do this instead:
$MyArray = for ($i=1; $i -le $length; $i++) { 2 }
Using PowerShell 3.0 you can use (needs .NET Framework 3.5 or later):
[int[]]$MyArray = ([System.Linq.Enumerable]::Repeat(2, 65000))
Using PowerShell 2.0
$AnArray = 1..65000 | % {2}
It is not clear what you are trying to do. I tried looking at your code, but $myArray += 2 means you are just adding 2 as an element. For example, here is the output from my test code:
$myArray = @()
$length = 4
for ($i=1;$i -le $length; $i++) {
Write-Host $myArray
$myArray += 2
}
2
2 2
2 2 2
Why do you need to add 2 as the array element so many times?
If all you want is just fill the same value, try this:
$myArray = 1..$length | % { 2 }
If you need it really fast, then go with ArrayLists and Tuples:
$myArray = New-Object 'Collections.ArrayList'
$myArray = foreach($i in 1..$length) {
[tuple]::create(2)
}
and if you need to sort it later then use this (normally a bit slower):
$myArray = New-Object 'Collections.ArrayList'
foreach($i in 1..$length) {
$myArray.add(
[tuple]::create(2)
)
}
both versions are in the 20ms range for me ;-)

Print Array Elements on one line

I'm populating an array variable $array at some point in my code, for example like below
this
is
an
array
variable
What if I wanted to print the array variable on one line, like thisisanarrayvariable?
I took the approach below, but I'm not getting any output and the program hangs:
for ($i=0;$i -le $array.length; $i++)
{ $array[$i] }
Obviously, I don't want to glue them together like $array[0]+$array[1]+$array[2]..
Hope i can get a better answer.
Joining array elements with no separator
Use the -join operator...
$array -join ''
...or the static String.Join method...
[String]::Join('', $array)
...or the static String.Concat method...
[String]::Concat($array)
For all of the above the result will be a new [String] instance with each element in $array concatenated together.
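A quick sketch, using the sample words from the question, confirming that the three approaches produce the same string:

```powershell
$array = 'this', 'is', 'an', 'array', 'variable'

$viaOperator = $array -join ''
$viaJoin     = [String]::Join('', $array)
$viaConcat   = [String]::Concat($array)

$viaOperator   # thisisanarrayvariable
```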
Fixing the for loop
Your for loop will output each element of $array individually, which will be rendered on separate lines. To fix this you can use Write-Host to write to the console, passing -NoNewline to keep the output of each iteration all on one line...
for ($i = 0; $i -lt $array.Length; $i++)
{
Write-Host -NoNewline $array[$i]
}
Write-Host
The additional invocation of Write-Host moves to a new line after the last array element is output.
If it's not console output but a new [String] instance you want you can concatenate the elements yourself in a loop...
$result = ''
for ($i = 0; $i -lt $array.Length; $i++)
{
$result += $array[$i]
}
The += operator will produce a new intermediate [String] instance for each iteration of the loop where $array[$i] is neither $null nor empty, so a [StringBuilder] is more efficient, especially if $array.Length is large...
$initialCapacity = [Int32] ($array | Measure-Object -Property 'Length' -Sum).Sum
$resultBuilder = New-Object -TypeName 'System.Text.StringBuilder' -ArgumentList $initialCapacity
for ($i = 0; $i -lt $array.Length; $i++)
{
$resultBuilder.Append($array[$i]) | Out-Null # Suppress [StringBuilder] method returning itself
}
$result = $resultBuilder.ToString()
Just use
-join $array
which will glue all elements together.

Comparing two arrays & get the values which are not common

I want some simple logic to compare the contents of two arrays and get the values that are not common to both, using PowerShell.
For example, if
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
$c, the output, should give me the value 6, which is the value not common to both arrays.
Can someone help me out with this? Thanks!
PS > $c = Compare-Object -ReferenceObject (1..5) -DifferenceObject (1..6) -PassThru
PS > $c
6
$a = 1..5
$b = 4..8
$Yellow = $a | Where {$b -NotContains $_}
$Yellow contains all the items in $a except the ones that are in $b:
PS C:\> $Yellow
1
2
3
$Blue = $b | Where {$a -NotContains $_}
$Blue contains all the items in $b except the ones that are in $a:
PS C:\> $Blue
6
7
8
$Green = $a | Where {$b -Contains $_}
Not in the question, but anyway: $Green contains the items that are in both $a and $b.
PS C:\> $Green
4
5
Note: Where is an alias of Where-Object. Aliases can introduce problems and make scripts hard to maintain.
Addendum 12 October 2019
As commented by @xtreampb and @mklement0: although not shown from the example in the question, the task that the question implies (values "not in common") is the symmetric difference between the two input sets (the union of yellow and blue).
Union
The symmetric difference between the $a and $b can be literally defined as the union of $Yellow and $Blue:
$NotGreen = $Yellow + $Blue
Which is written out:
$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})
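Here is the union worked through with the sample ranges. One assumption added in this sketch: each side is wrapped in @() so that + still concatenates when a side has only a single element (otherwise + on two scalars would add them).

```powershell
$a = 1..5
$b = 4..8

$Yellow = $a | Where-Object { $b -notcontains $_ }   # 1, 2, 3
$Blue   = $b | Where-Object { $a -notcontains $_ }   # 6, 7, 8

# @() guarantees array concatenation even if one side is a single value.
$NotGreen = @($Yellow) + @($Blue)

$NotGreen -join ' '   # 1 2 3 6 7 8
```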
Performance
As you might notice, there are quite some (redundant) loops in this syntax: all items in list $a iterate (using Where) through items in list $b (using -NotContains) and vice versa. Unfortunately the redundancy is difficult to avoid as it is difficult to predict the result of each side. A Hash Table is usually a good solution to improve the performance of redundant loops. For this, I like to redefine the question: Get the values that appear once in the sum of the collections ($a + $b):
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
By using the ForEach statement instead of the ForEach-Object cmdlet and the Where method instead of Where-Object, you might increase the performance by a factor of 2.5:
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})
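The counting idea can be wrapped in a function for reuse. This is a sketch only: Get-SymmetricDifference is a hypothetical name, and it assumes neither input contains duplicates of its own (a value repeated within one list would be counted twice and dropped).

```powershell
# Hypothetical helper wrapping the counting approach.
function Get-SymmetricDifference {
    param([object[]]$Left, [object[]]$Right)
    $count = @{}
    foreach ($item in $Left + $Right) { $count[$item] += 1 }
    # Hashtable keys are unordered, so sort for a predictable result.
    $count.Keys | Where-Object { $count[$_] -eq 1 } | Sort-Object
}

(Get-SymmetricDifference -Left (1..5) -Right (4..8)) -join ' '   # 1 2 3 6 7 8
```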
LINQ
But Language Integrated Query (LINQ) will easily beat any native PowerShell and native .Net methods (see also High Performance PowerShell with LINQ and mklement0's answer for Can the following Nested foreach loop be simplified in PowerShell?):
To use LINQ you need to explicitly define the array types:
[Int[]]$a = 1..5
[Int[]]$b = 4..8
And use the static methods of [Linq.Enumerable]:
$Yellow = [Int[]][Linq.Enumerable]::Except($a, $b)
$Blue = [Int[]][Linq.Enumerable]::Except($b, $a)
$Green = [Int[]][Linq.Enumerable]::Intersect($a, $b)
$NotGreen = [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
SymmetricExceptWith
(Added 2022-05-02)
There is actually another way to get the symmetric difference, which is to use the SymmetricExceptWith method of the HashSet class; for details, see the specific answer from mklement0 on Find what is different in two very large lists:
$a = [System.Collections.Generic.HashSet[int]](1..5)
$b = [System.Collections.Generic.HashSet[int]](4..8)
$a.SymmetricExceptWith($b)
$NotGreen = $a # note that the result will be stored back in $a
Benchmark
(Updated 2022-05-02, thanks @Santiago for the improved benchmark script)
Benchmark results highly depend on the sizes of the collections and how many items are actually shared. Besides, there is a caveat with drawing conclusions on methods that use lazy evaluation (also called deferred execution), as with LINQ and SymmetricExceptWith, where actually pulling the result (e.g. @($a)[0]) causes the expression to be evaluated and might therefore take longer than expected, as nothing has been done yet other than defining what should be done. See also: Fastest Way to get a uniquely index item from the property of an array
Anyways, as an "average", I am presuming that half of each collection is shared with the other.
Test            TotalMilliseconds
----            -----------------
Compare-Object  118.5942
Where-Object    275.6602
ForEach-Object  52.8875
foreach         25.7626
Linq            14.2044
SymmetricExce…  7.6329
To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.
[Int[]]$arrA = 1..1000
[Int[]]$arrB = 500..1500
Measure-Command {&{
$a = $arrA
$b = $arrB
Compare-Object -ReferenceObject $a -DifferenceObject $b -PassThru
}} |Select-Object @{N='Test';E={'Compare-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
($a | Where {$b -NotContains $_}), ($b | Where {$a -NotContains $_})
}} |Select-Object @{N='Test';E={'Where-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
}} |Select-Object @{N='Test';E={'ForEach-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1}) # => should be foreach($key in $Count.Keys) {if($Count[$key] -eq 1) { $key }} for fairness
}} |Select-Object @{N='Test';E={'foreach'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
[Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
}} |Select-Object @{N='Test';E={'Linq'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
($r = [System.Collections.Generic.HashSet[int]]::new($a)).SymmetricExceptWith($b)
}} |Select-Object @{N='Test';E={'SymmetricExceptWith'}}, TotalMilliseconds
Look at Compare-Object
Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }
Or if you would like to know where the object belongs to, then look at SideIndicator:
$a1=@(1,2,3,4,5,8)
$b1=@(1,2,3,4,5,6)
Compare-Object $a1 $b1
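As a sketch of how SideIndicator can be used to split the result by origin (the variable names $onlyInA and $onlyInB are illustrative):

```powershell
$a1 = @(1,2,3,4,5,8)
$b1 = @(1,2,3,4,5,6)

$diff = Compare-Object -ReferenceObject $a1 -DifferenceObject $b1

# '<=' marks values found only in the reference array, '=>' only in the difference array.
$onlyInA = ($diff | Where-Object { $_.SideIndicator -eq '<=' }).InputObject   # 8
$onlyInB = ($diff | Where-Object { $_.SideIndicator -eq '=>' }).InputObject   # 6
```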
Try:
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
(Compare-Object $a1 $b1).InputObject
Or, you can use:
(Compare-Object $b1 $a1).InputObject
The order doesn't matter.
Your results will not be helpful unless the arrays are first sorted.
To sort an array, run it through Sort-Object.
$x = @(5,1,4,2,3)
$y = @(2,4,6,1,3,5)
Compare-Object -ReferenceObject ($x | Sort-Object) -DifferenceObject ($y | Sort-Object)
This should help; it uses a simple hash table.
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
$hash= @{}
#storing elements of $a1 in hash
foreach ($i in $a1)
{$hash.Add($i, "present")}
#define blank array $c
$c = @()
#adding uncommon ones in second array to $c and removing common ones from hash
foreach($j in $b1)
{
if(!$hash.ContainsKey($j)){$c = $c+$j}
else {$hash.Remove($j)}
}
#now hash is left with uncommon ones in first array, so add them to $c
foreach($k in $hash.keys)
{
$c = $c + $k
}

Resources