Comparing two arrays & get the values which are not common - arrays

I want a small piece of logic to compare the contents of two arrays and get the values that are not common to both, using PowerShell.
For example, if
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
then $c, the output, should give me the value "6", which is the value that is not common to both arrays.
Can someone help me out with this? Thanks!

PS > $c = Compare-Object -ReferenceObject (1..5) -DifferenceObject (1..6) -PassThru
PS > $c
6

$a = 1..5
$b = 4..8
$Yellow = $a | Where {$b -NotContains $_}
$Yellow contains all the items in $a except the ones that are in $b:
PS C:\> $Yellow
1
2
3
$Blue = $b | Where {$a -NotContains $_}
$Blue contains all the items in $b except the ones that are in $a:
PS C:\> $Blue
6
7
8
$Green = $a | Where {$b -Contains $_}
Not in question, but anyways; Green contains the items that are in both $a and $b.
PS C:\> $Green
4
5
Note: Where is an alias of Where-Object. Alias can introduce possible problems and make scripts hard to maintain.
Addendum 12 October 2019
As commented by @xtreampb and @mklement0: although not shown from the example in the question, the task that the question implies (values "not in common") is the symmetric difference between the two input sets (the union of yellow and blue).
Union
The symmetric difference between the $a and $b can be literally defined as the union of $Yellow and $Blue:
$NotGreen = $Yellow + $Blue
Which is written out:
$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})
Performance
As you might notice, there are quite a few (redundant) loops in this syntax: all items in list $a iterate (using Where) through the items in list $b (using -NotContains) and vice versa. Unfortunately, the redundancy is difficult to avoid, as it is difficult to predict the result of each side. A hash table is usually a good solution to improve the performance of redundant loops. For this, I like to redefine the question: get the values that appear once in the sum of the collections ($a + $b):
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
By using the ForEach statement instead of the ForEach-Object cmdlet and the Where method instead of Where-Object, you might increase the performance by a factor of 2.5:
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})
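One caveat of counting occurrences in $a + $b is that a value repeated inside a single array is counted more than once and is then treated as shared. The following is a minimal sketch, not part of the original answer, that records which side each value was seen on instead. The function name Get-SymmetricDifference and the flag values 1 and 2 are assumptions made for illustration:

```powershell
# Hypothetical helper (the name is not from the answer above).
function Get-SymmetricDifference {
    param(
        [object[]]$Left,
        [object[]]$Right
    )
    $side = @{}
    # Flag 1 = seen on the left, flag 2 = seen on the right, 3 = seen on both.
    foreach ($item in $Left)  { $side[$item] = 1 }
    foreach ($item in $Right) { $side[$item] = $side[$item] -bor 2 }
    # Emit every value that was seen on exactly one side.
    foreach ($key in $side.Keys) {
        if ($side[$key] -ne 3) { $key }
    }
}

Get-SymmetricDifference -Left 1,1,2,3 -Right 3,4,4
```

The call should emit 1, 2 and 4. A hash table does not guarantee key order, so pipe the result to Sort-Object if the order matters.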
LINQ
But Language Integrated Query (LINQ) will easily beat any native PowerShell and native .Net methods (see also High Performance PowerShell with LINQ and mklement0's answer for Can the following Nested foreach loop be simplified in PowerShell?):
To use LINQ you need to explicitly define the array types:
[Int[]]$a = 1..5
[Int[]]$b = 4..8
And use the [Linq.Enumerable]:: operator:
$Yellow = [Int[]][Linq.Enumerable]::Except($a, $b)
$Blue = [Int[]][Linq.Enumerable]::Except($b, $a)
$Green = [Int[]][Linq.Enumerable]::Intersect($a, $b)
$NotGreen = [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
SymmetricExceptWith
(Added 2022-05-02)
There is actually another way to get the symmetric difference, which is using the SymmetricExceptWith method of the HashSet class; for details, see the specific answer from mklement0 on Find what is different in two very large lists:
$a = [System.Collections.Generic.HashSet[int]](1..5)
$b = [System.Collections.Generic.HashSet[int]](4..8)
$a.SymmetricExceptWith($b)
$NotGreen = $a # note that the result will be stored back in $a
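If the original collection must stay untouched, a copy can be made first. A minimal sketch, assuming PowerShell 5 or later for the ::new() syntax; the variable name $set is illustrative:

```powershell
$a = 1..5
$b = 4..8
# Copy $a into a new HashSet so that $a itself is not modified.
$set = [System.Collections.Generic.HashSet[int]]::new([int[]]$a)
# Keep only the values that are in exactly one of the two collections.
$set.SymmetricExceptWith([int[]]$b)
$NotGreen = @($set)
```

Here $a still holds 1 through 5 afterwards, while $NotGreen holds the symmetric difference. The enumeration order of a HashSet is not guaranteed.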
Benchmark
(Updated 2022-05-02, thanks @Santiago for the improved benchmark script)
Benchmark results highly depend on the sizes of the collections and how many items are actually shared. Besides, there is a caveat with drawing conclusions on methods that use lazy evaluation (also called deferred execution), as with LINQ: nothing is done when the expression is defined, so actually pulling the result (e.g. @($a)[0]) causes the expression to be evaluated and might therefore take longer than expected. Note that the HashSet SymmetricExceptWith method, by contrast, modifies the set immediately. See also: Fastest Way to get a uniquely index item from the property of an array
Anyway, as an "average", I am presuming that half of each collection is shared with the other.
Test           TotalMilliseconds
----           -----------------
Compare-Object          118.5942
Where-Object            275.6602
ForEach-Object           52.8875
foreach                  25.7626
Linq                     14.2044
SymmetricExce…            7.6329
To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.
[Int[]]$arrA = 1..1000
[Int[]]$arrB = 500..1500
Measure-Command {&{
$a = $arrA
$b = $arrB
Compare-Object -ReferenceObject $a -DifferenceObject $b -PassThru
}} |Select-Object @{N='Test';E={'Compare-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
($a | Where {$b -NotContains $_}), ($b | Where {$a -NotContains $_})
}} |Select-Object @{N='Test';E={'Where-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
}} |Select-Object @{N='Test';E={'ForEach-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1}) # => should be foreach($key in $Count.Keys) {if($Count[$key] -eq 1) { $key }} for fairness
}} |Select-Object @{N='Test';E={'foreach'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
[Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
}} |Select-Object @{N='Test';E={'Linq'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
($r = [System.Collections.Generic.HashSet[int]]::new($a)).SymmetricExceptWith($b)
}} |Select-Object @{N='Test';E={'SymmetricExceptWith'}}, TotalMilliseconds

Look at Compare-Object
Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }
Or if you would like to know where the object belongs to, then look at SideIndicator:
$a1=@(1,2,3,4,5,8)
$b1=@(1,2,3,4,5,6)
Compare-Object $a1 $b1
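The SideIndicator property can then be used to split the result per array. A small sketch based on the arrays above; the variable names $diff, $onlyInA and $onlyInB are illustrative:

```powershell
$a1 = @(1,2,3,4,5,8)
$b1 = @(1,2,3,4,5,6)
$diff = Compare-Object -ReferenceObject $a1 -DifferenceObject $b1
# '<=' marks values found only in the reference object ($a1).
$onlyInA = ($diff | Where-Object SideIndicator -eq '<=').InputObject
# '=>' marks values found only in the difference object ($b1).
$onlyInB = ($diff | Where-Object SideIndicator -eq '=>').InputObject
```

With this input, $onlyInA holds 8 and $onlyInB holds 6.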

Try:
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
(Compare-Object $a1 $b1).InputObject
Or, you can use:
(Compare-Object $b1 $a1).InputObject
The order doesn't matter.

Your results will not be helpful unless the arrays are first sorted.
To sort an array, run it through Sort-Object.
$x = @(5,1,4,2,3)
$y = @(2,4,6,1,3,5)
Compare-Object -ReferenceObject ($x | Sort-Object) -DifferenceObject ($y | Sort-Object)

This should help, uses simple hash table.
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
$hash= @{}
#storing elements of $a1 in hash
foreach ($i in $a1)
{$hash.Add($i, "present")}
#define blank array $c
$c = @()
#adding uncommon ones in second array to $c and removing common ones from hash
foreach($j in $b1)
{
if(!$hash.ContainsKey($j)){$c = $c+$j}
else {$hash.Remove($j)}
}
#now hash is left with uncommon ones in first array, so add them to $c
foreach($k in $hash.keys)
{
$c = $c + $k
}

Related

How to declare a multidimensional 2D array in Powershell? [duplicate]

I have a way of doing arrays in other languages like this:
$x = "David"
$arr = @()
$arr[$x]["TSHIRTS"]["SIZE"] = "M"
This generates an error.
You are trying to create an associative array (hash). Try out the following
sequence of commands
$arr=@{}
$arr["david"] = @{}
$arr["david"]["TSHIRTS"] = @{}
$arr["david"]["TSHIRTS"]["SIZE"] ="M"
$arr.david.tshirts.size
Note the difference between hashes and arrays
$a = @{} # hash
$a = @() # array
Arrays can only have non-negative integers as indexes
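The same nested structure can also be written as a single literal, which may be easier to read for small, fixed data. A minimal sketch using the keys from the example above:

```powershell
# Nested hash table literal: name -> article -> property
$arr = @{
    david = @{
        TSHIRTS = @{ SIZE = 'M' }
    }
}
$arr['david']['TSHIRTS']['SIZE']   # index syntax
$arr.david.tshirts.size            # property syntax; keys are case-insensitive
```

Both lookups return the same value, M.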
from powershell.com:
PowerShell supports two types of multi-dimensional arrays: jagged arrays and true multidimensional arrays.
Jagged arrays are normal PowerShell arrays that store arrays as elements. This is very cost-effective storage because dimensions can be of different size:
$array1 = 1,2,(1,2,3),3
$array1[0]
$array1[1]
$array1[2]
$array1[2][0]
$array1[2][1]
True multi-dimensional arrays always resemble a square matrix. To create such an array, you will need to access .NET. The next line creates a two-dimensional array with 10 and 20 elements resembling a 10x20 matrix:
$array2 = New-Object 'object[,]' 10,20
$array2[4,8] = 'Hello'
$array2[9,16] = 'Test'
$array2
For a 3-dimensional array 10*20*10:
$array3 = New-Object 'object[,,]' 10,20,10
To extend on what manojlds said above is that you can nest Hashtables. It may not be a true multi-dimensional array but give you some ideas about how to structure the data. An example:
$hash = @{}
$computers | %{
$hash.Add(($_.Name),(@{
"Status" = ($_.Status)
"Date" = ($_.Date)
}))
}
What's cool about this is that you can reference things like:
($hash."Name1").Status
Also, it is far faster than arrays for finding stuff. I use this to compare data rather than use matching in Arrays.
$hash.ContainsKey("Name1")
Hope some of that helps!
-Adam
Knowing that PowerShell pipes objects between cmdlets, it is more common in PowerShell to use an array of PSCustomObjects:
$arr = @(
[PSCustomObject]@{Name = 'David'; Article = 'TShirt'; Size = 'M'}
[PSCustomObject]@{Name = 'Eduard'; Article = 'Trouwsers'; Size = 'S'}
)
Or for older PowerShell Versions (PSv2):
$arr = @(
New-Object PSObject -Property @{Name = 'David'; Article = 'TShirt'; Size = 'M'}
New-Object PSObject -Property @{Name = 'Eduard'; Article = 'Trouwsers'; Size = 'S'}
)
And grep your selection like:
$arr | Where {$_.Name -eq 'David' -and $_.Article -eq 'TShirt'} | Select Size
Or in newer PowerShell (Core) versions:
$arr | Where Name -eq 'David' | Where Article -eq 'TShirt' | Select Size
Or (just get the size):
$arr.Where{$_.Name -eq 'David' -and $_.Article -eq 'TShirt'}.Size
Addendum 2020-07-13
Syntax and readability
As mentioned in the comments, using an array of custom objects is more straightforward and saves typing. If you want to take this further, you might even use the ConvertFrom-Csv (or the Import-Csv) cmdlet for building the array:
$arr = ConvertFrom-Csv @'
Name,Article,Size
David,TShirt,M
Eduard,Trouwsers,S
'@
Or more readable:
$arr = ConvertFrom-Csv @'
Name, Article, Size
David, TShirt, M
Eduard, Trouwsers, S
'@
Note: values that contain spaces or special characters need to be double quoted
Or use an external cmdlet like ConvertFrom-SourceTable which reads fixed width table formats:
$arr = ConvertFrom-SourceTable '
Name Article Size
David TShirt M
Eduard Trouwsers S
'
Indexing
The disadvantage of using an array of custom objects is that searching it is slower than a hash table lookup, which finds a key by its hash instead of scanning every element.
Note that the advantage of using an array of custom objects is that you can easily search for anything else, e.g. everybody that wears a TShirt with size M:
$arr | Where Article -eq 'TShirt' | Where Size -eq 'M' | Select Name
To build a hash table index from the array of objects:
$h = @{}
$arr | ForEach-Object {
If (!$h.ContainsKey($_.Name)) { $h[$_.Name] = @{} }
If (!$h[$_.Name].ContainsKey($_.Article)) { $h[$_.Name][$_.Article] = @{} }
$h[$_.Name][$_.Article] = $_ # Or: $h[$_.Name][$_.Article]['Size'] = $_.Size
}
$h.david.tshirt.size
M
Note: referencing a hash table key that doesn't exist will cause an error under Set-StrictMode:
Set-StrictMode -Version 2
$h.John.tshirt.size
PropertyNotFoundException: The property 'John' cannot be found on this object. Verify that the property exists.
Here is a simple multidimensional array of strings.
$psarray = @(
('Line' ,'One' ),
('Line' ,'Two')
)
foreach($item in $psarray)
{
$item[0]
$item[1]
}
Output:
Line
One
Line
Two
Two-dimensional arrays can be defined this way too as jagged array:
$array = New-Object system.Array[][] 5,5
This has the nice feature that
$array[0]
outputs a one-dimensional array, containing $array[0][0] to $array[0][4].
Depending on your situation you might prefer it over $array = New-Object 'object[,]' 5,5.
(I would have commented to CB above, but stackoverflow does not let me yet)
You could also use System.Collections.ArrayList to make an array of arrays, or whatever you want.
Here is an example:
$resultsArray= New-Object System.Collections.ArrayList
[void] $resultsArray.Add(@(@('$hello'),2,0,0,0,0,0,0,1,1))
[void] $resultsArray.Add(@(@('$test', '$testagain'),3,0,0,1,0,0,0,1,2))
[void] $resultsArray.Add("ERROR")
[void] $resultsArray.Add(@(@('$var', '$result'),5,1,1,0,1,1,0,2,3))
[void] $resultsArray.Add(@(@('$num', '$number'),3,0,0,0,0,0,1,1,2))
One problem, if you would call it a problem, you cannot set a limit. Also, you need to use [void] or the script will get mad.
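As an alternative sketch, not taken from the answer above: the generic List class has an Add() method that returns nothing, so the [void] cast is not needed. This assumes PowerShell 5 or later for the ::new() syntax; the variable name $results is illustrative:

```powershell
$results = [System.Collections.Generic.List[object]]::new()
# List[T].Add() has no return value, so nothing leaks into the output.
$results.Add(@(@('$hello'), 2, 0))
$results.Add('ERROR')
$results.Add(@(@('$var', '$result'), 5, 1))
$results.Count
$results[2][0][1]   # second name in the third row
```

The count is 3, and the last line returns the literal string $result, because each row is stored as a single element.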
Using the .NET syntax (as CB pointed out above), you also add coherence to your 'tabular' array: if you define a typed array and try to store a different type, PowerShell will 'alert' you:
$a = New-Object 'byte[,]' 4,4
$a[0,0] = 111   # OK
$a[0,1] = 1111  # Error
Of course, PowerShell will 'help' you with the obvious conversions:
$a = New-Object 'string[,]' 2,2
$a[0,0] = "1111"  # OK
$a[0,1] = 111     # OK also
Another thread pointed here about how to add to a multidimensional array in Powershell. I don't know if there is some reason not to use this method, but it worked for my purposes.
$array = @()
$array += ,@( "1", "test1","a" )
$array += ,@( "2", "test2", "b" )
$array += ,@( "3", "test3", "c" )
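The unary comma is what keeps each row together. A short sketch, with illustrative variable names $flat and $nested, of the difference it makes:

```powershell
$flat = @()
$flat += @('1', 'test1', 'a')      # elements are appended one by one

$nested = @()
$nested += ,@('1', 'test1', 'a')   # the whole array is appended as one row

$flat.Count      # 3
$nested.Count    # 1
$nested[0][1]    # test1
```

Without the comma, += concatenates the two arrays, so the row structure is lost.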
I found a pretty cool solution for making arrays within an array.
$GroupArray = @()
foreach ( $Array in $ArrayList ){
$GroupArray += @($Array , $null)
}
$GroupArray = $GroupArray | Where-Object {$_ -ne $null}
Lent from above:
$arr = ConvertFrom-Csv @'
Name,Article,Size
David,TShirt,M
Eduard,Trouwsers,S
'@
Print the $arr:
$arr
Name Article Size
---- ------- ----
David TShirt M
Eduard Trouwsers S
Now select 'David'
$arr.Where({$_.Name -eq "david"})
Name Article Size
---- ------- ----
David TShirt M
Now if you want to know the Size of 'David'
$arr.Where({$_.Name -eq "david"}).size
M

Correct usage of ForEach -parallel in Powershell?

I want to process a large number of URLs and grab the *.jpg file locations.
The problem is that the $entry in the second foreach is not threadsafe.
The script is firing hundreds of errors because the $entry is getting overwritten over and over.
When I move the inner foreach outside of the ForEach-Object, then its working fine but very slowly.
How can I process the split output properly within my ForEach-Object without getting these errors?
$array just contains a huge amount of URLs
$clean_img_array is the output array of the operation
$tmpArray is the reference to $clean_img_array in order to use it within a parallel ForEach
Errors:
InvalidOperation:
Line |
14 | [void]$tmpArray:clean_img_array.Add($entry);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| You cannot call a method on a null-valued expression.
Snippet:
$clean_img_array = [System.Collections.ArrayList]@();
$array | ForEach-Object -Parallel {
$web = Invoke-RestMethod $_;
$i=1;
foreach($entry in $web.Split("`"")){
echo $entry;
if($entry.IndexOf(".jpg") -ne -1 -And $entry.IndexOf("http") -ne -1){
if($entry.IndexOf("?") -ne -1){
$tmpArray = $using:clean_img_array;
[void]$tmpArray.Add($entry.Substring(0, $entry.IndexOf('?')));
}else{
$tmpArray = $using:Clean_img_array;
[void]$tmpArray:clean_img_array.Add($entry);
}
}
}
} -ThrottleLimit 20
Here's a simple example. Both $a and $b are arrays. $b is the result of the parallel loop. It's like example 12 in the docs.
$a = 1..10
$b = $a | foreach-object -parallel { $_ + 1 }
$b
2
3
4
5
6
7
8
9
10
11
Thanks for the support!
I combined the answer from @js2012 with something of my own.
The return alone did not solve the thread-unsafe behavior of $entry, but it clearly untangled the workflow.
Instead, I used the inline .foreach() method with the pipeline variable $_, which appears to be thread safe. It now runs like a charm and is also very fast for more than 2 million entries.
$array holds the URLs to be processed
$clean_img_array returns the grabbed image URLs
$clean_img_array = $array | ForEach-Object -Parallel {
$web = Invoke-RestMethod $_;
$web.Split("`"").foreach({
if($_ -ne $null){
if($_.IndexOf(".jpg") -ne -1 -And $_.IndexOf("http") -ne -1){
if($_.IndexOf("?") -ne -1){
return $_.Substring(0, $_.IndexOf('?'));
}else{
return $_;
}
}
}
});
} -ThrottleLimit 25
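If a shared collection is still wanted inside the parallel block, a thread-safe type avoids the race on a plain ArrayList. A minimal sketch, assuming PowerShell 7 or later (where ForEach-Object -Parallel is available); it uses numbers instead of URLs, and the variable names $bag and $localBag are illustrative:

```powershell
# ConcurrentBag allows Add() from several threads at once.
$bag = [System.Collections.Concurrent.ConcurrentBag[string]]::new()
1..10 | ForEach-Object -Parallel {
    # $using: passes a reference to the same bag into this runspace.
    $localBag = $using:bag
    $localBag.Add("item $_")
} -ThrottleLimit 4
$bag.Count
```

After the pipeline completes, the bag should hold all ten items, in no particular order. Returning values from the script block, as in the answer above, remains the simpler option when nothing else needs to share the collection.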

Powershell - remove duplicates from 2 arrays using wildcards

I am trying to remove duplicates and leave only unique entries from the output of 2 queries.
I am pulling a list of installed Windows Updates using the following (also stripping 12 chars of whitespace and dropping to lower case):
$A = @(Get-HotFix | select-object @{Expression={$_.HotFixID.ToLower()}} | ft -hidetableheaders | Out-String) -replace '\s{12}',''
I am then querying a list of available files in a folder and stripping 3 trailing whitespace chars using:
$B = @(Get-ChildItem D:\y | select-object 'Name' | ft -hidetableheaders | Out-String) -replace '\s{3}',''
The problem I have is that the first query ($A) returns output like:
kb4040981
kb4041693
kb2345678
kb8765432
While the second query ($B) returns output like:
windows8.1-kb4040981-x64_d1eb05bc8c55c7632779086079c7759f40d7386f.cab
windows8.1-kb4041687-x64_3bdf264bcfc0dda01c2eaf2135e322d2d6ce6c64.cab
windows8.1-kb4041693-x64_359b7ac71a48e5af003d67e3e4b80120a2f5b570.cab
windows8.1-kb4049179-x64_e6ec21d5d16fa6d8ff890c0c6042c2ba38a1f7c4.cab
I need to compare the 2 outputs using wildcards around each entry in the $A array (I think), and where it exists in $B remove the entire line from $B array.
I cannot truncate the output of $B as I need to use the full filenames in a subsequent process.
IE in the example output above, the entire FIRST and THIRD lines would be remove from the $B array and other lines left intact.
I have tried numerous methods from online searches, and used foreach loops, all to no avail.
Thank you in advance for any assistance.
What did you try with foreach loops that didn't work? Unless your output is huge, this method is pretty straightforward.
$a = "kb4040981","kb4041693","kb2345678","kb8765432","test"
[System.Collections.ArrayList]$b = "windows8.1-kb4040981-x64_d1eb05bc8c55c7632779086079c7759f40d7386f.cab","windows8.1-kb4041687-x64_3bdf264bcfc0dda01c2eaf2135e322d2d6ce6c64.cab","windows8.1-kb4041693-x64_359b7ac71a48e5af003d67e3e4b80120a2f5b570.cab","windows8.1-kb4049179-x64_e6ec21d5d16fa6d8ff890c0c6042c2ba38a1f7c4.cab"
$toRemove = New-Object System.Collections.ArrayList
foreach($kb in $a)
{
foreach($line in $b)
{
if($line -match $kb)
{
write-host "$kb found in: $line" -ForegroundColor Green
$toRemove.add($line) | out-null
}
}
}
foreach($line in $toRemove)
{
$b.Remove($line)
}
$b
Hope it helps.
I would recommend that you take a little time to learn the very basics of PowerShell. When you use format cmdlets and text files instead of objects, you cut yourself off from the good stuff. ;-)
Here is how I would start the task:
$A = Get-HotFix
$B = Get-ChildItem D:\y | Select-Object -Property Name,@{Name='HotFixID';Expression={($_.BaseName -split '-')[1]}}
Compare-Object -ReferenceObject $A -DifferenceObject $B -Property 'HotFixID' -PassThru
Sincere thanks to sambardo for his patience and input! The final working solution based on his excellent recommendation is:
$a = (Get-Hotfix).hotfixID
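# ArrayList, because a plain array is fixed-size and does not support .Remove()
[System.Collections.ArrayList]$b = @((Get-ChildItem D:\y\ -file *.cab).name)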
$toRemove = New-Object System.Collections.ArrayList
foreach($kb in $a)
{
foreach($line in $b)
{
if($line -match $kb)
{
# write-host "$kb found in: $line" -ForegroundColor Green
$toRemove.add($line) | out-null
}
}
}
foreach($line in $toRemove)
{
$b.Remove($line)
}
$b
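The nested loops can also be replaced by a single regular expression built from all the IDs. A minimal sketch, not the accepted solution; the shortened file names are made up for illustration, and the variable names $pattern and $remaining are assumptions:

```powershell
$a = 'kb4040981', 'kb4041693', 'kb2345678'
$b = 'windows8.1-kb4040981-x64.cab',
     'windows8.1-kb4041687-x64.cab',
     'windows8.1-kb4041693-x64.cab',
     'windows8.1-kb4049179-x64.cab'
# Escape each ID and join them into one alternation: kb4040981|kb4041693|...
$pattern = ($a | ForEach-Object { [regex]::Escape($_) }) -join '|'
# With an array on the left, -notmatch returns only the non-matching elements.
$remaining = $b -notmatch $pattern
$remaining
```

This leaves the kb4041687 and kb4049179 files. Note that an empty $a would give an empty pattern, which matches every name, so that case needs a guard.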

PowerShell - Not creating Jagged Array within forEach loop

So, I'm having an issue enumerating through a forEach loop in PowerShell (v3) and adding the variable being evaluated, as well as a Test-Connection result into an array. I'm trying to make $arrPing a multi-dimensional array as this will make it easier for me to filter and process the objects in there later in the script, but I'm encountering issues with the code.
My code looks like the following:
$arrPing= @();
$strKioskIpAddress= (Get-WmiObject Win32_NetworkAdapterConfiguration | Where-Object { $_.IPAddress -ne $null }).ipaddress
...FURTHER DOWN THE CODE...
$tmpIpAddress= Select-Xml -Path $dirKioskIpAddresses -XPath '//kiosks/kiosk' | Select-Object -ExpandProperty Node
forEach ( $entry in $tmpIpAddress )
{
if ( $entry -ne $strKioskIpAddress )
{
$result= Test-Connection -ComputerName $entry -Count 1 -BufferSize 16 -Quiet -ErrorAction SilentlyContinue
$arrPing+= @($entry,$result);
}
}
But I'm getting the following output when I display the contents of the $arrPing variable:
PS H:\Documents\PowerShell Scripts> $arrPing
10.216.1.134
True
10.216.1.139
True
10.216.23.230
True
10.216.23.196
False
10.216.23.23
False
Can anyone tell me where I'm going wrong? I have a feeling that this is happening because I'm in a forEach loop but I just can't say for sure...
I would simplify it a bit by using a PSCustomObject:
$Ping = foreach ($Entry in $tmpIpAddress) {
if ($Entry -ne $strKioskIpAddress) {
$TestParams = @{
ComputerName = $Entry
Count = '1'
BufferSize = '16'
Quiet = $true
ErrorAction = 'SilentlyContinue'
}
$Result = Test-Connection @TestParams
[PSCustomObject]@{
Entry = $Entry
Result = $Result
}
}
}
$Ping
To avoid a long row of parameters I've used a technique called splatting.
You are seeing how PowerShell unrolls arrays. The variable is as designed: a large array. However, PowerShell, when displaying it, puts each element on its own line. If you do not want that, and especially if the data will be used to filter out computers which are not on the network, then you should use PowerShell objects.
if ( $entry -ne $strKioskIpAddress ){
$objPing += New-Object -TypeName psobject -Property @{
Entry = $entry
Result = Test-Connection -ComputerName $entry -Count 1 -BufferSize 16 -Quiet -ErrorAction SilentlyContinue
}
}
Instead of that, I would go further and use a different foreach construct which is more pipeline friendly. That way you can use other cmdlets like Export-CSV if you need this output in other locations. Also, as PetSerAl says:
[Y]ou should not use array addition operator and add elements one by one. It [will] create [a] new array (as arrays are not resizable) and copy elements from [the] old one on each operation.
$tmpIpAddress | Where-Object{$_ -ne $strKioskIpAddress} | ForEach-Object{
New-Object -TypeName psobject -Property @{
Entry = $_
Result = Test-Connection -ComputerName $_ -Count 1 -BufferSize 16 -Quiet -ErrorAction SilentlyContinue
}
} | Export-CSV -NoTypeInformation $path
The if is redundant now that we have moved that logic into Where-Object, since you were using it to filter out certain records anyway. That is what Where-Object is good for.
The above code is good for PowerShell 2.0. If you have 3.0 or later, then use [pscustomobject] and [ordered]:
$tmpIpAddress | Where-Object{$_ -ne $strKioskIpAddress} | ForEach-Object{
[pscustomobject][ordered] @{
Entry = $_
Result = Test-Connection -ComputerName $_ -Count 1 -BufferSize 16 -Quiet -ErrorAction SilentlyContinue
}
} | Export-CSV -NoTypeInformation $path

PowerShell array initialization

What's the best way to initialize an array in PowerShell?
For example, the code
$array = @()
for($i=0; $i -lt 5;$i++)
{
$array[$i] = $FALSE
}
generates the error
Array assignment failed because index '0' was out of range.
At H:\Software\PowerShell\TestArray.ps1:4 char:10
+ $array[$ <<<< i] = $FALSE
Here's two more ways, both very concise.
$arr1 = @(0) * 20
$arr2 = ,0 * 20
You can also rely on the default value of the constructor if you wish to create a typed array:
> $a = new-object bool[] 5
> $a
False
False
False
False
False
The default value of a bool is apparently false so this works in your case. Likewise if you create a typed int[] array, you'll get the default value of 0.
Another cool way that I use to initialize arrays is with the following shorthand:
> $a = ($false, $false, $false, $false, $false)
> $a
False
False
False
False
False
Or if you want to initialize a range, I've sometimes found this useful:
> $a = (1..5)
> $a
1
2
3
4
5
Hope this was somewhat helpful!
Yet another alternative:
for ($i = 0; $i -lt 5; $i++)
{
$arr += @($false)
}
This one works if $arr isn't defined yet.
NOTE - there are better (and more performant) ways to do this... see https://stackoverflow.com/a/234060/4570 below as an example.
The original example returns an error because the array is created empty, then you try to access the nth element to assign it a value.
There are a number of creative answers here, many I didn't know before reading this post. All are fine for a small array, but as n0rd points out, there are significant differences in performance.
Here I use Measure-Command to find out how long each initialization takes. As you might guess, any approach that uses an explicit PowerShell loop is slower than those that use .Net constructors or PowerShell operators (which would be compiled in IL or native code).
Summary
New-Object and @(somevalue)*n are fast (around 20k ticks for 100k elements).
Creating an array with the range operator n..m is 10x slower (200k ticks).
Using an ArrayList with the Add() method is 1000x slower than the baseline (20M ticks), as is looping through an already-sized array using for() or ForEach-Object (a.k.a. foreach,%).
Appending with += is the worst (2M ticks for just 1000 elements).
Overall, I'd say array*n is "best" because:
It's fast.
You can use any value, not just the default for the type.
You can create repeating values (to illustrate, type this at the powershell prompt: (1..10)*10 -join " " or ('one',2,3)*3)
Terse syntax.
The only drawback:
Non-obvious. If you haven't seen this construct before, it's not apparent what it does.
But keep in mind that for many cases where you would want to initialize the array elements to some value, then a strongly-typed array is exactly what you need. If you're initializing everything to $false, then is the array ever going to hold anything other than $false or $true? If not, then New-Object type[] n is the "best" approach.
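To make the trade-off concrete, here is a short sketch comparing the two approaches. It assumes PowerShell 5 or later for the ::new() syntax, and the variable names $untyped and $typed are illustrative:

```powershell
$untyped = @($false) * 5       # Object[] holding five $false values
$typed   = [bool[]]::new(5)    # Boolean[]; elements default to $false

$untyped.GetType().Name        # Object[]
$typed.GetType().Name          # Boolean[]

$typed[0] = 1                  # converted to a Boolean on assignment
$typed[0]                      # True
```

The untyped array accepts any value in any slot, while the typed array converts (or rejects) whatever is assigned to it.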
Testing
Create and size a default array, then assign values:
PS> Measure-Command -Expression {$a = new-object object[] 100000} | Format-List -Property "Ticks"
Ticks : 20039
PS> Measure-Command -Expression {for($i=0; $i -lt $a.Length;$i++) {$a[$i] = $false}} | Format-List -Property "Ticks"
Ticks : 28866028
Creating an array of Boolean is a little slower than an array of Object:
PS> Measure-Command -Expression {$a = New-Object bool[] 100000} | Format-List -Property "Ticks"
Ticks : 130968
It's not obvious what this does, the documentation for New-Object just says that the second parameter is an argument list which is passed to the .Net object constructor. In the case of arrays, the parameter evidently is the desired size.
Appending with +=
PS> $a=@()
PS> Measure-Command -Expression { for ($i=0; $i -lt 100000; $i++) {$a+=$false} } | Format-List -Property "Ticks"
I got tired of waiting for that to complete, so ctrl+c then:
PS> $a=@()
PS> Measure-Command -Expression { for ($i=0; $i -lt 100; $i++) {$a+=$false} } | Format-List -Property "Ticks"
Ticks : 147663
PS> $a=@()
PS> Measure-Command -Expression { for ($i=0; $i -lt 1000; $i++) {$a+=$false} } | Format-List -Property "Ticks"
Ticks : 2194398
Just as (6 * 3) is conceptually similar to (6 + 6 + 6), so ($somearray * 3) ought to give the same result as ($somearray + $somearray + $somearray). But with arrays, + is concatenation rather than addition.
If $array+=$element is slow, you might expect $array*$n to also be slow, but it's not:
PS> Measure-Command -Expression { $a = @($false) * 100000 } | Format-List -Property "Ticks"
Ticks : 20131
Just like Java has a StringBuilder class to avoid creating multiple objects when appending, so it seems PowerShell has an ArrayList.
PS> $al = New-Object System.Collections.ArrayList
PS> Measure-Command -Expression { for($i=0; $i -lt 1000; $i++) {$al.Add($false)} } | Format-List -Property "Ticks"
Ticks : 447133
PS> $al = New-Object System.Collections.ArrayList
PS> Measure-Command -Expression { for($i=0; $i -lt 10000; $i++) {$al.Add($false)} } | Format-List -Property "Ticks"
Ticks : 2097498
PS> $al = New-Object System.Collections.ArrayList
PS> Measure-Command -Expression { for($i=0; $i -lt 100000; $i++) {$al.Add($false)} } | Format-List -Property "Ticks"
Ticks : 19866894
Range operator, and ForEach-Object loop:
PS> Measure-Command -Expression { $a = 1..100000 } | Format-List -Property "Ticks"
Ticks : 239863
PS> Measure-Command -Expression { $a | % {$false} } | Format-List -Property "Ticks"
Ticks : 102298091
Notes:
I nulled the variable between each run ($a=$null).
Testing was on a tablet with Atom processor; you would probably see faster speeds on other machines. [edit: About twice as fast on a desktop machine.]
There was a fair bit of variation when I tried multiple runs. Look for the orders of magnitude rather than exact numbers.
Testing was with PowerShell 3.0 in Windows 8.
Acknowledgements
Thanks to @halr9000 for array*n, @Scott Saad and Lee Desmond for New-Object, and @EBGreen for ArrayList.
Thanks to @n0rd for getting me to think about performance.
$array = 1..5 | foreach { $false }
Here's another idea. You have to remember, that it's .NET underneath:
$arr = [System.Array]::CreateInstance([System.Object], 5)
$arr.GetType()
$arr.Length
$arr = [Object[]]::new(5)
$arr.GetType()
$arr.Length
Result:
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
5
True True Object[] System.Array
5
Using new() has one distinct advantage: when you're programming in ISE and want to create an object, ISE will give you a hint with all parameter combinations and their types. You don't have that with New-Object, where you have to remember the types and order of arguments.
$array = @()
for($i=0; $i -lt 5; $i++)
{
$array += $i
}
The solution I found was to use the New-Object cmdlet to initialize an array of the proper size.
$array = new-object object[] 5
for($i=0; $i -lt $array.Length;$i++)
{
$array[$i] = $FALSE
}
If I don't know the size up front, I use an arraylist instead of an array.
$al = New-Object System.Collections.ArrayList
for($i=0; $i -lt 5; $i++)
{
$al.Add($i)
}
Here's another typical way:
$array = for($i = 0; $i -le 4; $i++) { $false }
Or try this idea. Works with PowerShell 5.0+.
[bool[]]$tf=((,$False)*5)
$array = foreach($i in 1..5) { $false }

Resources