Better "Split-ArrayInChunks" for PowerShell - arrays

My array-based "Split-ArrayInChunks" method takes ages to process 190,000+ records. My initial version was based on this code (see "Split up an array into chunks and start a job on each one"):
$computers = gc c:\somedir\complist.txt
$n = 6
$complists = @{}
$count = 0
$computers |% {$complists[$count % $n] += @($_);$count++}
0..($n-1) |% {
start-job -scriptblock {gwmi win32_operatingsystem -computername $args} -argumentlist $complists[$_]
}
I found the article "Performance: The += Operator (and When to Avoid It)", whose author basically recommends using "System.Collections.Generic.List" or "System.Collections.ArrayList" instead of arrays. So I came up with this implementation:
function Split-ArrayInChunks_UsingGenericList($inArray, $numberOfChunks) {
$list = New-Object System.Collections.Generic.List[System.Collections.Generic.List[PSCustomObject]]
$count = 0
# populate with empty lists
0..($numberOfChunks-1) | % {
$list.Add((New-Object System.Collections.Generic.List[PSCustomObject]))
}
# create packages
$inArray | % {
$list[$count % $numberOfChunks].Add($_);
$count++
}
return $list.ToArray()
}
I also tried to use "System.Collections.ArrayList", but this function returns a flat array. Inside the function, $arryList is a nested array, but outside the function I get a flat array (192169 items instead of 10 chunks).
function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {
$arryList = New-Object System.Collections.ArrayList
$count = 0
# populate
0..($numberOfChunks-1) | % {
$arryList.Add((New-Object System.Collections.ArrayList))
}
$inArray | % {
$arryList[$count % $numberOfChunks].Add($_);
$count++
}
Write-Host 'Number of arryList:'$arryList.Count
Write-Host 'Number of items in first arryList:' $arryList[0].Count
return $arryList
}
To illustrate the "flat" problem, the following code generates...
Write-Host '-------------------------------'
$packages1 = Split-ArrayInChunks_UsingGenericList $data.CrmRecords 10
Write-Host 'Number of packages1:'$packages1.Count
Write-Host 'Number of items in first package1:' $packages1[0].Count
Write-Host '-------------------------------'
$packages2 = Split-ArrayInChunks_UsingArrayList $data.CrmRecords 10
Write-Host 'Number of packages2:'$packages2.Count
Write-Host 'Number of items in first package2:' $packages2[0].Count
...this output:
-------------------------------
Number of packages1: 10
Number of items in first package1: 19215
-------------------------------
Number of arryList: 10
Number of items in first arryList: 19215
Number of packages2: 192169
Number of items in first package2: 1
So I have two questions:
Is there any option to improve my "Split-ArrayInChunks_UsingArrayList" version (e.g. faster, more readable)?
Why is the return value of "Split-ArrayInChunks_UsingArrayList" a flat array, when $arryList is a nested array inside the function?
Update 2016-02-04: I updated my code based on the feedback (use [void] to prevent polluting the output) and it works. The only strange thing is that when I use | Format-Table, my version (Split-ArrayInChunks_UsingArrayList) is again printed as a flat list:
function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {
$arryList = New-Object System.Collections.ArrayList
$count = 0
# populate
0..($numberOfChunks-1) | % {
[void]$arryList.Add((New-Object System.Collections.ArrayList))
}
$inArray | % {
[void]$arryList[$count % $numberOfChunks].Add($_);
$count++
}
return $arryList
}
function Split-ArrayInChunks_CommunityVersion($inArray, $numberOfChunks) {
$Lists = @{}
$count = 0
# populate
0..($numberOfChunks-1) | % {
$Lists[$_] = New-Object System.Collections.ArrayList
}
$inArray | % {
[void]$Lists[$count % $numberOfChunks].Add($_);
$count++
}
return $Lists
}
When I execute this code...
Write-Host 'CommunityVersion'
Write-Host '-------------------------------'
Split-ArrayInChunks_CommunityVersion $list 6 | Format-Table -AutoSize
Write-Host 'ArrayInChunks_UsingArrayList'
Write-Host '-------------------------------'
Split-ArrayInChunks_UsingArrayList $list 6 | Format-Table -AutoSize
...this is the output in the console:
CommunityVersion
-------------------------------
Name Value
---- -----
5 {denn, getan, verhaftet}
4 {haben, Böses, Morgens, war}
3 {verleumdet, etwas, eines, es}
2 {Josef K., er, er, er}
1 {musste, dass, wurde, sagte}
0 {Jemand, ohne, hätte, »Wie ein Hund!«}
ArrayInChunks_UsingArrayList
-------------------------------
Jemand
ohne
hätte
»Wie ein Hund!«
musste
dass
wurde
sagte
Josef K.
er
er
er
verleumdet
etwas
eines
es
haben
Böses
Morgens
war
denn
getan
verhaftet
I do not understand why "ArrayInChunks_UsingArrayList" is printed as a flat list; it is a nested array, just like "ArrayInChunks_CommunityVersion".
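One possible explanation, offered as a sketch and not as a confirmed diagnosis of the output above: PowerShell enumerates a collection that a function writes to the pipeline, so the caller receives the inner lists one by one, whereas a hashtable travels as a single object. The function names below are hypothetical. The unary comma wraps the list in a one-element array, so only the wrapper is enumerated and the list itself arrives intact.

```powershell
# Hypothetical helpers, for illustration only.
function Get-NestedUnrolled {
    $outer = New-Object System.Collections.ArrayList
    [void]$outer.Add((New-Object System.Collections.ArrayList))
    [void]$outer.Add((New-Object System.Collections.ArrayList))
    return $outer      # the pipeline enumerates $outer: two objects are emitted
}

function Get-NestedWrapped {
    $outer = New-Object System.Collections.ArrayList
    [void]$outer.Add((New-Object System.Collections.ArrayList))
    [void]$outer.Add((New-Object System.Collections.ArrayList))
    return ,$outer     # unary comma: only the wrapper array is enumerated
}

# Count how many objects each function sends down the pipeline.
$unrolledCount = (Get-NestedUnrolled | Measure-Object).Count   # 2 (the inner lists)
$wrappedCount  = (Get-NestedWrapped  | Measure-Object).Count   # 1 (the outer list)
```

A cmdlet such as Format-Table only sees what arrives on the pipeline, so the two variants can be rendered quite differently.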

Okay, here's how I'd do that:
function Split-ArrayInChunks_UsingArrayList($inArray, $numberOfChunks) {
$Lists = @{}
$count = 0
# populate
0..($numberOfChunks-1) | % {
$Lists[$_] = New-Object System.Collections.ArrayList
}
$inArray | % {
[void]$Lists[$count % $numberOfChunks].Add($_);
$count++
}
Write-Host 'Number of arryList:'$Lists.Count
Write-Host 'Number of items in first arryList:' $Lists[0].Count
return $Lists
}
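The [void] casts matter because ArrayList.Add() returns the index of the new element, and anything a function does not capture or discard becomes part of its output. A minimal sketch with hypothetical function names:

```powershell
# Hypothetical helpers, for illustration only.
function Get-ListLeaky {
    $list = New-Object System.Collections.ArrayList
    $list.Add('a')           # Add() returns the new index (0), which leaks into the output
    return $list
}

function Get-ListClean {
    $list = New-Object System.Collections.ArrayList
    [void]$list.Add('a')     # the returned index is discarded
    return $list
}

$leaky = @(Get-ListLeaky)    # contains 0 and 'a'
$clean = @(Get-ListClean)    # contains only 'a'
```

Assigning the result to $null or piping it to Out-Null would discard the index as well; the cast is simply the shortest spelling.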

It turns out that using "$inArray | %" is what makes the operation so slow. With a normal foreach loop it takes less than 2 seconds to create the chunks; with the "$inArray | %" based version it takes 20 seconds:
function Split-ArrayInChunks_Fast($inArray, $numberOfChunks) {
$arrayList = New-Object System.Collections.ArrayList
$count = 0
# populate
0..($numberOfChunks-1) | % {
[void]$arrayList.Add((New-Object System.Collections.ArrayList))
}
foreach($elem in $inArray) {
[void]$arrayList[$count % $numberOfChunks].Add($elem)
$count++
}
return $arrayList.ToArray()
}
function Split-ArrayInChunks_Slow($inArray, $numberOfChunks) {
$arrayList = New-Object System.Collections.ArrayList
$count = 0
# populate
0..($numberOfChunks-1) | % {
[void]$arrayList.Add((New-Object System.Collections.ArrayList))
}
$inArray | % {
[void]$arrayList[$count % $numberOfChunks].Add($_);
$count++
}
return $arrayList.ToArray()
}
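To check the difference on your own machine, a rough timing sketch along these lines may help. The input size and variable names are assumptions, and the absolute numbers will vary with the PowerShell version and hardware.

```powershell
# Assumed test input; any large array will do.
$data = 1..50000

# Pipeline version: every element passes through ForEach-Object.
$pipelineTime = Measure-Command {
    $count = 0
    $data | ForEach-Object { $count++ }
}

# Statement version: the foreach loop iterates the array directly.
$foreachTime = Measure-Command {
    $count = 0
    foreach ($elem in $data) { $count++ }
}

'{0:N0} ms via pipeline, {1:N0} ms via foreach' -f $pipelineTime.TotalMilliseconds, $foreachTime.TotalMilliseconds
```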

Related

Import CSV into a variable-size two-dimensional array with PowerShell

Background
I am trying to number each item in a WBS with PowerShell. The WBS is on a spreadsheet. For example, if you have the following WBS (4-level depth) from Wikipedia:
The result should be:
1
1.1
1.1.1
1.1.1.1
1.1.1.2
1.1.1.3
1.1.1.4
1.1.1.5
1.1.1.6
1.1.2
1.1.3
1.1.4
1.2
1.3
1.4
1.5
1.6
1.7
1.8
1.9
1.10
1.11
Problem
I decided to export the WBS to CSV and read it with PowerShell:
Import-Csv -LiteralPath .\WBS.csv -Header Lv1,Lv2,Lv3,Lv4 |
ForEach-Object {
$_ | Add-Member -MemberType NoteProperty -Name Lv1i -Value $null
$_ | Add-Member -MemberType NoteProperty -Name Lv2i -Value $null
$_ | Add-Member -MemberType NoteProperty -Name Lv3i -Value $null
$_ | Add-Member -MemberType NoteProperty -Name Lv4i -Value $null
$_
} | Set-Variable wbs
$Lv1i = 0;
$wbs | ForEach-Object {
if ($_.Lv1 -ne "") {
$Lv1i = $Lv1i + 1;
$_.Lv1i = $Lv1i;
} else {
$_.Lv1i = $Lv1i;
}
}
$Lv2i = 0;
$wbs | ForEach-Object {
if ($_.Lv2 -ne "") {
$Lv2 = $_.Lv2;
$Lv2i = $Lv2i + 1;
$_.Lv2i = $Lv2i;
} else {
if ($_.Lv1 -eq "") {
$_.Lv2i = $Lv2i;
} else {
$Lv2i = 0;
}
}
}
$Lv3i = 0;
$wbs | ForEach-Object {
if ($_.Lv3 -ne "") {
$Lv3 = $_.Lv3;
$Lv3i = $Lv3i + 1;
$_.Lv3i = $Lv3i;
} else {
if (($_.Lv1 -ne "") -or ($_.Lv2 -ne "")) {
$Lv3i = 0;
} else {
$_.Lv3i = $Lv3i;
}
}
}
$Lv4i = 0;
$wbs | ForEach-Object {
if ($_.Lv4 -ne "") {
$Lv4 = $_.Lv4;
$Lv4i = $Lv4i + 1;
$_.Lv4i = $Lv4i;
} else {
if (($_.Lv1 -ne "") -or ($_.Lv2 -ne "") -or ($_.Lv3 -ne "")) {
$Lv4i = 0;
} else {
$_.Lv4i = $Lv4i;
}
}
}
$wbs | ForEach-Object { "{0} {1} {2} {3} `t {4}.{5}.{6}.{7}" -F $_.Lv1, $_.Lv2, $_.Lv3, $_.Lv4,$_.Lv1i, $_.Lv2i, $_.Lv3i, $_.Lv4i } `
| ForEach-Object { $_.Trim(".") }
The code above works for me, but it only supports a WBS that is 4 levels deep. I want to improve it to handle any depth. To implement this requirement, I think it has to read the CSV file into a variable-size two-dimensional array. But I could not find a robust way (one that supports commas and line breaks in cells) to do it in PowerShell.
Question
Is there any way to import a CSV into a variable-size two-dimensional array with PowerShell? Cells in the CSV could contain commas, double quotes, or line breaks.
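As a small check of the quoting part of the question: the built-in CSV cmdlets already understand quoted cells, so a cell may contain the delimiter or a doubled double quote. The sample rows and the header names Lv1 and Lv2 below are made up for illustration.

```powershell
# Single-quoted here-string so nothing inside is expanded.
$csv = @'
"Wing, left","He said ""hi"""
,"Fuselage"
'@

# -Header supplies column names because the sample has no header row.
$rows = @($csv | ConvertFrom-Csv -Header 'Lv1', 'Lv2')

$rows[0].Lv1   # Wing, left
$rows[0].Lv2   # He said "hi"
$rows[1].Lv2   # Fuselage
```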
Given this sample input data (sample.csv):
Aircraft System;;;
;Air Vehicle;;
;;Airframe;
;;;Airfram Integration
;;;Fuselage
;;;Wing
;;Propulsion;
;;Vehicle Subsystems;
;;Avionics;
;System Engineering;;
Other;;;
the following PowerShell script
$cols = 5
$data = Import-Csv .\sample.csv -Delimiter ";" -Encoding UTF8 -Header (1..$cols)
$stack = @()
$prev = 0
foreach ($row in $data) {
for ($i = 0; $i -lt $cols; $i++) {
$value = $row.$i
if (-not $value) { continue }
if ($i -eq $prev) {
$stack[$stack.Count-1]++
} elseif ($i -eq $prev + 1) {
$stack += 1
} elseif ($i -lt $prev) {
$stack = $stack[0..($i-1)]
$stack[$stack.Count-1]++
}
$prev = $i
Write-Host $($stack -join ".") $value
}
}
outputs
1 Aircraft System
1.1 Air Vehicle
1.1.1 Airframe
1.1.1.1 Airfram Integration
1.1.1.2 Fuselage
1.1.1.3 Wing
1.1.2 Propulsion
1.1.3 Vehicle Subsystems
1.1.4 Avionics
1.2 System Engineering
2 Other
To save the result instead of printing it to the console, something like this:
$result = foreach ($row in $data) {
for ($i = 0; $i -lt $cols; $i++) {
# ...
[pscustomobject]@{outline = $($stack -join "."); text = $value}
}
}
would give $result as
outline text
------- ----
1 Aircraft System
1.1 Air Vehicle
1.1.1 Airframe
1.1.1.1 Airfram Integration
1.1.1.2 Fuselage
1.1.1.3 Wing
1.1.2 Propulsion
1.1.3 Vehicle Subsystems
1.1.4 Avionics
1.2 System Engineering
2 Other
Not quite as succinct as @Tomalek's answer, but doesn't use an inner loop and accumulates the results into a variable...
Given:
$csv = @"
"Aircraft System"
, "Air Vehicle"
,, "Airframe"
,,, "Airframe Integration, Assembly, Test and Checkout"
,,, "Fuselage"
,,, "Wing"
,,, "Empennage"
,,, "Nacelle"
,,, "Other Airframe Components 1..n (Specify)"
,, "Propulsion"
,, "Vehicle Subsystems"
,, "Avionics"
, "System Engineering"
, "Program Management"
, "System Test and Evaluation"
, "Training"
, "Data"
, "Peculiar Support Equipment"
, "Common Support Equipment"
, "Operational/Site Activation"
, "Industrial Facilities"
, "Initial Spares and Repair Parts"
"@
the code:
# increase "1..9" to e.g. "1..99" if you want to handle deeper hierarchies
$headers = 1..9;
$data = $csv | ConvertFrom-Csv -Header $headers;
# this variable does the magic - it tracks the index of the current node
# at each level in the hierarchy - e.g. 1.1.1.5 => @( 1, 1, 1, 5 ). each
# time we find a sibling or a new child we edit this array to append or
# increment the last item.
$indexes = new-object System.Collections.ArrayList;
$depth = 0;
$results = new-object System.Collections.ArrayList;
foreach( $item in $data )
{
# we can't nest by more than one level at a time, so this row must have
# a value at either the same depth as the previous if it's a sibling,
# the next depth if it's the first child, or a shallower index if we've
# reached the end of a nested list.
if( $item.($depth + 1) )
{
# this is the first child node of the previous node
$null = $indexes.Add(1);
$depth += 1;
}
elseif( $item.$depth )
{
# this is a sibling of the previous node, so increment the last index
$indexes[$depth - 1] += 1;
}
else
{
# this is the first item after a list of siblings (e.g. 1.1.2), so we
# need to look at shallower properties until we find a value
while( ($depth -gt 0) -and -not $item.$depth )
{
$indexes.RemoveAt($depth - 1);
$depth -= 1;
}
if( $depth -lt 1 )
{
throw "error - no shallower values found"
}
# don't forget this item is a sibling of the previous node at this level
# since it's not the *first* child, so we need to increment the counter
$indexes[$depth - 1] += 1;
}
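# use Add() so $results stays an ArrayList ("+=" would copy it into a new array each time)
$null = $results.Add($indexes -join ".");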
}
produces output:
$results
#1
#1.1
#1.1.1
#1.1.1.1
#1.1.1.2
#1.1.1.3
#1.1.1.4
#1.1.1.5
#1.1.1.6
#1.1.2
#1.1.3
#1.1.4
#1.2
#1.3
#1.4
#1.5
#1.6
#1.7
#1.8
#1.9
#1.10
#1.11

Powershell how to count all elements in a multidimensional array

I've been trying to figure out how to count all elements in a multidimensional array, but .Count only returns the size of the first dimension.
After I gave up on finding a proper solution, I created this loop to move all elements into the first dimension and count them. But this is really only a hack.
$mdarr = @((0,1,2,3,4),(5,6,7,8,9),(10,11,12,13,14))
$filecount = New-Object System.Collections.ArrayList
for($i = 0; $i -lt $mdarr.Length; ++$i) {
$filecount += $mdarr[$i]
}
$filecount.Count
How would this be done properly without processing the array first?
In the loop you are adding the elements of $mdarr[$i], and later you count the elements of the merged result. Instead of adding to an ArrayList, you could keep a count:
$xs = @((0,1,2,3,4),(5,6,7,8,9),(10,11,12,13,14))
$sum = 0;
foreach ($x in $xs) { $sum += $x.Count }
$sum # 15
# alternatively
$xs | % { $sum += $_.Count }
# or
($xs | % { $_.Count } | Measure-Object -Sum).Sum
# or
$xs | % { $_.Count } | Measure-Object -Sum | select -Expand Sum
One-line code: you can flatten the multidimensional array into an anonymous array and count that:
$xs = @((0,1,2,3,4),(5,6,7,8,9),(10,11,12,13,14))
@($xs | ForEach-Object {$_}).Count #result 15
Or multiline, which is more readable:
$xs = @((0,1,2,3,4),(5,6,7,8,9),(10,11,12,13,14))
$xs_flatten = @($xs | ForEach-Object {$_})
$xs_flatten_count = $xs_flatten.Count
echo $xs_flatten_count #result 15
Put an index in front of .Count,
e.g. $xs[0].Count
This way, instead of returning the number of sub-arrays, it returns the number of elements in the given sub-array.
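It may also help to separate the two kinds of "multidimensional" array. The arrays above are jagged (arrays of arrays), so .Count reports the number of sub-arrays. A true rectangular .NET array reports the total number of elements through .Length. A short sketch, with variable names chosen for illustration:

```powershell
# Jagged array: an array whose elements are arrays.
$jagged = @((0,1,2,3,4),(5,6,7,8,9),(10,11,12,13,14))
$jagged.Count        # 3  (number of sub-arrays)

# Rectangular 3 x 5 array of int, created through the .NET array type.
$rect = New-Object 'int[,]' 3, 5
$rect.Rank           # 2  (number of dimensions)
$rect.GetLength(0)   # 3  (size of the first dimension)
$rect.Length         # 15 (total number of elements)
```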

PowerShell foreach() multidimensional array based on first element

I have a fairly basic multidimensional array which looks something like this:
2017,123
2017,25
2018,5
2018,60
2017,11
I wish to run a ForEach() loop or similar function to total the numbers in the second element based on the year indicated in the first so that I end up with an output like this:
2017,159
2018,65
How do I best accomplish this?
The following solution is concise, but not fast:
# input array
$arr =
(2017,123),
(2017,25),
(2018,5),
(2018,60),
(2017,11)
# Group the sub-arrays by their 1st element and sum all 2nd elements
# in each resulting group.
$arr | Group-Object -Property { $_[0] } | ForEach-Object {
, ($_.Name, (($_.Group | ForEach-Object { $_[1] } | Measure-Object -Sum).Sum))
}
Assuming your array looks like $array below, this will give you what you need:
$2017total = 0
$2018total = 0
$array = "2017,123",
"2017,25",
"2018,5",
"2018,60",
"2017,11" | % {
if ($_ -match '2017') {
$2017 = ($_ -split ',')[1]
$2017total += $2017
}
else {
$2018 = ($_ -split ',')[1]
$2018total += $2018
}
}
Write-Host "2017,$2017total"
Write-Host "2018,$2018total"
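If the set of years is not known in advance, a hashtable keyed by year avoids one variable per year and needs only a single pass. This is a sketch of that alternative, not taken from the answers above; $totals and $lines are illustrative names.

```powershell
$arr =
    (2017,123),
    (2017,25),
    (2018,5),
    (2018,60),
    (2017,11)

# Accumulate the second element under the key given by the first element.
# A missing key reads as $null, and $null + number yields the number.
$totals = @{}
foreach ($pair in $arr) {
    $totals[$pair[0]] += $pair[1]
}

# Emit "year,total" lines in ascending year order.
$lines = $totals.GetEnumerator() | Sort-Object Key | ForEach-Object {
    '{0},{1}' -f $_.Key, $_.Value
}
$lines   # 2017,159 and 2018,65
```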

Reference Elements in PowerShell Multidimensional Array

I have a CSV file.
My PowerShell script attempts to store SourceIP, DestinationIP, and Traffic in a multidimensional array:
$source = @((Import-Csv D:\Script\my.csv).SourceIP)
$dest = @((Import-Csv D:\Script\my.csv).DestinationIP)
$t = @((Import-Csv D:\Script\my.csv).Traffic)
$multi = @($source),@($dest),@($t)
When I try to read the first element of $multi, I expect to get a list of SourceIP values:
foreach ($q in $multi){
write-host $q[0]
write-host `n
}
But instead, I get SourceIP, DestinationIP, Traffic, i.e.
10.153.128.110
10.251.68.80
3.66 GB
And if I try
foreach ($q in $multi){
write-host $q[0][0][0]
write-host `n
}
I get
1
1
3
How to troubleshoot?
UPDATE
Ultimate goal is to
Count total traffic
Count traffic if SourceIP or Destination IP fits into certain pattern, i.e. 10.251.22.x
Get percentage
UPDATE II
I am able to get code to import CSV and tally total bandwidth only, but I also need bandwidth from SourceIP and DestinationIP with certain pattern.
$t = @((Import-Csv D:\Script\my.csv).Traffic)
foreach ($k in $t){
write-host $k
}
foreach ($i in $t){
$j += ,@($i.split(" "))
}
foreach ($m in $j){
switch ($m[1]){
GB {
$m[0] = [int]($m[0]) * 1000
$m[1] = 'MB'
}
MB {}
KB {
$m[0] = [int]($m[0]) / 1000
$m[1] = 'MB'
}
}
$total_bandwidth += $m[0]
}
write-host Total bandwidth is ("{0:N2}" -f $total_bandwidth) MB
You should not split an array of objects into multiple parallel arrays of properties. It is much easier to work with the data when the objects are kept whole.
$Scale=@{
B=1e00
KB=1e03
MB=1e06
GB=1e09
TB=1e12
}
$TrafficBytes={
$a=-split$_.Traffic
[double]$a[0]*$Scale[$a[1]]
}
Import-Csv D:\Script\my.csv|
ForEach-Object $TrafficBytes|
Measure-Object -Sum #total traffic
Import-Csv D:\Script\my.csv|
Where-Object {$_.DestinationIP-like'10.*'}| #condition
ForEach-Object $TrafficBytes|
Measure-Object -Sum #traffic by condition
PetSerAl has a good idea for the conversion, but here is a way to do this that requires iterating the CSV only once and will give your percentages.
$filter = "10.251.22.*"
$Scale=@{
B=1e00
KB=1e03
MB=1e06
GB=1e09
TB=1e12
}
$myCsv = Import-Csv D:\Script\my.csv | Select-Object *, @{ Name = "TrafficBytes"; Expression = { $a = -split $_.Traffic; [double] $a[0] * $Scale[$a[1]] } }
$trafficFiltered = $myCsv | Group-Object { $_.SourceIP -like $filter -or $_.DestinationIP -like $filter } | Select-Object @{ Name = "IPFilter"; Expression = { if ($_.Name -eq $true) { $filter } else { "Other" } } }, @{ Name = "TrafficBytes"; Expression = { ($_.Group | Measure-Object -Sum "TrafficBytes").Sum } }
$trafficTotal = $myCsv | Measure-Object -Sum TrafficBytes
# -f binds tighter than /, so the division needs parentheses; the P format already multiplies by 100
$trafficReport = $trafficFiltered | Select-Object IPFilter, TrafficBytes, @{ Name = "Percent"; Expression = { "{0:P}" -f ($_.TrafficBytes / $trafficTotal.Sum) } }
$trafficReport
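The same idea can be tried without the CSV file. The sketch below uses in-memory sample rows: the first row reuses the values quoted in the question, the other two are invented, and ConvertTo-Bytes is a hypothetical helper name. It converts each Traffic cell to bytes and sums the total and the filtered share in one loop.

```powershell
$Scale = @{ B = 1e0; KB = 1e3; MB = 1e6; GB = 1e9; TB = 1e12 }

# Sample data: only the first row comes from the question, the rest is made up.
$rows = @'
SourceIP,DestinationIP,Traffic
10.153.128.110,10.251.68.80,3.66 GB
10.251.22.7,10.0.0.5,500 MB
10.0.0.9,10.0.0.5,250 KB
'@ | ConvertFrom-Csv

# Hypothetical helper: "3.66 GB" -> number of bytes, using the $Scale table.
function ConvertTo-Bytes([string]$Traffic) {
    $parts = -split $Traffic
    [double]$parts[0] * $Scale[$parts[1]]
}

$filter  = '10.251.22.*'
$total   = 0.0
$matched = 0.0
foreach ($row in $rows) {
    $bytes = ConvertTo-Bytes $row.Traffic
    $total += $bytes
    if ($row.SourceIP -like $filter -or $row.DestinationIP -like $filter) {
        $matched += $bytes
    }
}

# The P format specifier multiplies by 100 and appends the percent sign.
'{0:P2} of {1:N0} bytes matched {2}' -f ($matched / $total), $total, $filter
```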

How to fill an array efficiently in Powershell

I want to fill up a dynamic array with the same integer value as fast as possible using Powershell.
The Measure-Command shows that it takes 7 seconds on my system to fill it up.
My current code (snipped) looks like:
$myArray = @()
$length = 16385
for ($i=1;$i -le $length; $i++) {$myArray += 2}
(Full code can be seen on gist.github.com or on superuser)
Consider that $length can change. But for better understanding I chose a fixed length.
Q: How do I speed up this Powershell code?
You can repeat arrays, just as you can do with strings:
$myArray = ,2 * $length
This means »Take the array with the single element 2 and repeat it $length times, yielding a new array.«.
Note that you cannot really use this to create multidimensional arrays because the following:
$some2darray = ,(,2 * 1000) * 1000
will just create 1000 references to the inner array, making them useless for manipulation. In that case you can use a hybrid strategy. I have used
$some2darray = 1..1000 | ForEach-Object { ,(,2 * 1000) }
in the past, but below performance measurements suggest that
$some2darray = foreach ($i in 1..1000) { ,(,2 * 1000) }
would be a much faster way.
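The shared-reference pitfall is easy to see on a small scale. A sketch with 3 x 3 arrays and illustrative variable names:

```powershell
# Repeating the outer array copies references, not the inner array itself.
$shared = ,(,2 * 3) * 3
$shared[0][0] = 99
$shared[1][0]        # 99, because every row is the same inner array

# Building each row inside a loop creates a fresh inner array per iteration.
$independent = foreach ($i in 1..3) { ,(,2 * 3) }
$independent[0][0] = 99
$independent[1][0]   # 2, because the rows are separate arrays
```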
Some performance measurements:
Command                                                   Average Time (ms)
-------                                                   -----------------
$a = ,2 * $length                                         0,135902          # my own
[int[]]$a = [System.Linq.Enumerable]::Repeat(2, $length)  7,15362           # JPBlanc
$a = foreach ($i in 1..$length) { 2 }                     14,54417
[int[]]$a = -split "2 " * $length                         24,867394
$a = for ($i = 0; $i -lt $length; $i++) { 2 }             45,771122         # Ansgar
$a = 1..$length | %{ 2 }                                  431,70304         # JPBlanc
$a = @(); for ($i = 0; $i -lt $length; $i++) { $a += 2 }  10425,79214       # original code
Taken by running each variant 50 times through Measure-Command, each with the same value for $length, and averaging the results.
Position 3 and 4 are a bit of a surprise, actually. Apparently it's much better to foreach over a range instead of using a normal for loop.
Code to generate above chart:
$length = 16384
$tests = '$a = ,2 * $length',
'[int[]]$a = [System.Linq.Enumerable]::Repeat(2, $length)',
'$a = for ($i = 0; $i -lt $length; $i++) { 2 }',
'$a = foreach ($i in 1..$length) { 2 }',
'$a = 1..$length | %{ 2 }',
'$a = @(); for ($i = 0; $i -lt $length; $i++) { $a += 2 }',
'[int[]]$a = -split "2 " * $length'
$tests | ForEach-Object {
$cmd = $_
$timings = 1..50 | ForEach-Object {
Remove-Variable i,a -ErrorAction Ignore
[GC]::Collect()
Measure-Command { Invoke-Expression $cmd }
}
[pscustomobject]@{
Command = $cmd
'Average Time (ms)' = ($timings | Measure-Object -Average TotalMilliseconds).Average
}
} | Sort-Object Ave* | Format-Table -AutoSize -Wrap
Avoid appending to an array in a loop. It's copying the existing array to a new array with each iteration. Do this instead:
$MyArray = for ($i=1; $i -le $length; $i++) { 2 }
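When the collection really has to grow step by step, a generic list is another way to avoid the repeated copying. This is a sketch of that alternative, not something the answer above prescribes; the initial capacity argument is optional.

```powershell
$length = 16385

# List[int] grows in place; passing $length reserves the capacity up front.
$list = New-Object 'System.Collections.Generic.List[int]' $length
for ($i = 1; $i -le $length; $i++) {
    $list.Add(2)     # List[T].Add() returns nothing, so no [void] is needed
}

# Convert to a fixed-size int[] if a plain array is required afterwards.
$myArray = $list.ToArray()
$myArray.Count
```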
With PowerShell 3.0 you can use (requires .NET Framework 3.5 or later):
[int[]]$MyArray = ([System.Linq.Enumerable]::Repeat(2, 65000))
Using PowerShell 2.0
$AnArray = 1..65000 | % {2}
It is not clear what you are trying to do. I looked at your code, but $myArray += 2 means you are just adding 2 as an element. For example, here is the output from my test code:
$myArray = @()
$length = 4
for ($i=1;$i -le $length; $i++) {
Write-Host $myArray
$myArray += 2
}
2
2 2
2 2 2
Why do you need to add 2 as the array element so many times?
If all you want is just fill the same value, try this:
$myArray = 1..$length | % { 2 }
If you need it really fast, then go with ArrayLists and Tuples:
$myArray = New-Object 'Collections.ArrayList'
$myArray = foreach($i in 1..$length) {
[tuple]::create(2)
}
and if you need to sort it later, then use this (normally a bit slower):
$myArray = New-Object 'Collections.ArrayList'
foreach($i in 1..$length) {
# [void] discards the index that Add() returns
[void]$myArray.Add(
[tuple]::create(2)
)
}
both versions are in the 20ms range for me ;-)

Resources