I am having a difficult time understanding the most efficient way to process large datasets/arrays in PowerShell. I have arrays with several million items that I need to process and group. The list is always different in size, meaning it could be 3.5 million items or 10 million items.
Example: with 3.5 million items, they group by fours, like the following:
Items 0,1,2,3 group together, items 4,5,6,7 group together, and so on.
I have tried processing the array on a single thread by looping through the list and assigning to a PSCustomObject, which works; it just takes 45-50+ minutes to complete.
I have also attempted to break up the array into smaller arrays, but that causes the process to run even longer.
$i = 0
$d_array = @()
$item_array # Large dataset
While ($i -lt $item_array.Length) {
    $o  = "Test"
    $oo = "Test"
    $n  = $item_array[$i]; $i++
    $id = $item_array[$i]; $i++
    $ir = $item_array[$i]; $i++
    $cs = $item_array[$i]; $i++
    $items = [PSCustomObject]@{
        'field1' = $o
        'field2' = $oo
        'field3' = $n
        'field4' = $id
        'field5' = $ir
        'field6' = $cs
    }
    $d_array += $items
}
I would imagine that applying a job scheduler to run multiple jobs would cut the processing time down by a significant amount, but I wanted to get others' takes on a quick, effective way to tackle this.
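For illustration, a rough sketch of that job-splitting idea (this assumes PowerShell 7+ for ForEach-Object -Parallel, keeps the chunk size a multiple of 4 so no group is split across chunks, and note that output order across chunks is not guaranteed):
$chunkSize = 400000   # multiple of 4 so no group is split across chunks
$chunks = for ($s = 0; $s -lt $item_array.Length; $s += $chunkSize) {
    $e = [Math]::Min($s + $chunkSize, $item_array.Length) - 1
    ,($item_array[$s..$e])
}
$d_array = $chunks | ForEach-Object -Parallel {
    $chunk = $_
    for ($i = 0; $i -lt $chunk.Length; $i += 4) {
        [PSCustomObject]@{
            field1 = 'Test'
            field2 = 'Test'
            field3 = $chunk[$i]
            field4 = $chunk[$i + 1]
            field5 = $chunk[$i + 2]
            field6 = $chunk[$i + 3]
        }
    }
} -ThrottleLimit 4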
If you are working with large data, using C# is also effective.
Add-Type -TypeDefinition @"
using System.Collections.Generic;
public static class Test
{
    public static List<object> Convert(object[] src)
    {
        var result = new List<object>();
        for (var i = 0; i <= src.Length - 4; i += 4)
        {
            result.Add(new {
                field1 = "Test",
                field2 = "Test",
                field3 = src[i + 0],
                field4 = src[i + 1],
                field5 = src[i + 2],
                field6 = src[i + 3]
            });
        }
        return result;
    }
}
"@
$item_array = 1..10000000
$result = [Test]::Convert($item_array)
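As a quick sanity check, for the 1..10000000 input above the converted list looks like this:
$result.Count   # 2500000 (one object per group of four)
$result[0]      # field1=Test, field2=Test, field3=1, field4=2, field5=3, field6=4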
While rokumaru's version is unsurpassed, here is my try, together with my local measurements of the other versions, including js2010's.
The same $item_array = 1..100000 was applied to all versions.
> .\SO_56406847.ps1
measuring...BDups
measuring...LotPings
measuring...Theo
measuring...js2010
measuring...rokumaru
BDups = 75,9949897 TotalSeconds
LotPings = 2,3663763 TotalSeconds
Theo = 2,4469917 TotalSeconds
js2010 = 2,9198114 TotalSeconds
rokumaru = 0,0109287 TotalSeconds
## Q:\Test\2019\06\01\SO_56406847.ps1
$i=0
$item_array = 1..100000 # Large dataset
'measuring...LotPings'
$LotPings = Measure-Command {
    $d_array = for ($i = 0; $i -lt $item_array.Length; $i += 4) {
        [PSCustomObject]@{
            'field1' = "Test"
            'field2' = "Test"
            'field3' = $item_array[$i]
            'field4' = $item_array[$i+1]
            'field5' = $item_array[$i+2]
            'field6' = $item_array[$i+3]
        }
    }
} # Measure-Command
How's this? 32.5x faster. Making arrays with += kills puppies. It copies the whole array every time.
$i=0
$item_array = 1..100000 # Large dataset
'measuring...'
# original 1 min 5 sec
# mine 2 sec
# other answer, 2 or 3 sec
# c# version 0.029 sec, 2241x faster!
Measure-Command {
    $d_array =
    While ($i -lt $item_array.Length) {
        $o  = "Test"
        $oo = "Test"
        $n  = $item_array[$i]; $i++
        $id = $item_array[$i]; $i++
        $ir = $item_array[$i]; $i++
        $cs = $item_array[$i]; $i++
        # $items =
        [PSCustomObject]@{
            'field1' = $o
            'field2' = $oo
            'field3' = $n
            'field4' = $id
            'field5' = $ir
            'field6' = $cs
        }
        # $d_array += $items
    }
}
You could optimize this somewhat by using an ArrayList, or perhaps even better a strongly typed List, but going through millions of elements in an array will still take time.
As for your code: there is no need to capture the array item values in a variable first and use that later to add to the PSCustomObject.
$item_array = 'a','b','c','d','e','f','g','h' # Large dataset
$result = New-Object System.Collections.Generic.List[PSCustomObject]
# or use an ArrayList: $result = New-Object System.Collections.ArrayList
$i = 0
While ($i -lt $item_array.Count) {
    [void]$result.Add(
        [PSCustomObject]@{
            'field1' = "Test"             # $o
            'field2' = "Test"             # $oo
            'field3' = $item_array[$i++]  # $n
            'field4' = $item_array[$i++]  # $id
            'field5' = $item_array[$i++]  # $ir
            'field6' = $item_array[$i++]  # $cs
        }
    )
}
# save to a CSV file maybe ?
$result | Export-Csv 'D:\blah.csv' -NoTypeInformation
If you need the result to become a 'normal' array again, use $result.ToArray()
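For example:
$normalArray = $result.ToArray()
$normalArray.Count -eq $result.Count   # True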
I've been trying to get some scripts together to automate some of the manual reporting tasks that we still do. This one I'm trying to coax into connecting to each of the remote servers specified (I can AD link and filter it later), pulling disk information, doing a basic calculation and some formatting, and then sticking it in an array to pull from later.
I'm currently stuck with errors stating that I'm "attempting to divide by 0", and my array returns no data (I'm assuming because of the above). There has to be something small I'm missing. Well, hopefully small. Here's where I've gotten to:
#Variable listing servers to check. Can convert to a csv, or direct connection to AD using OU's.
$ServersToScan = @('x, y, z')
#Blank Array for Report
$finalReport = @()
#Threshold Definition %
$Critical = 5
$Warning = 15
#Action for each server
foreach ($i in $ServersToScan)
{
    Enter-PSSession -ComputerName $i
    #Fixed Disk Info Gather
    $diskObj = Get-CimInstance -ClassName Win32_LogicalDisk | Where-Object { $_.DriveType -eq 3 }
    #Iterate each disk information - rewrite as foreach ($x in $diskObj) - rewritten 3/31/22
    foreach ($diskObj in $diskObj)
    {
        # Calculate the free space percentage
        $percentFree = [int](($_.FreeSpace / $_.Size) * 100)
        # Determine the "Status"
        if ($percentFree -gt $Warning) {
            $Status = 'Normal'
        }
        elseif ($percentFree -gt $Critical) {
            $Status = 'Warning'
        }
        elseif ($percentFree -le $Critical) {
            $Status = 'Critical'
        }
        # Compose the properties of the object to add to the report
        $tempObj = [ordered]@{
            'Computer Name'    = $i
            'Drive Letter'     = $_.DeviceID
            'Drive Name'       = $_.VolumeName
            'Total Space (GB)' = [int]($_.Size / 1gb)
            'Free Space (GB)'  = [int]($_.FreeSpace / 1gb)
            'Free Space (%)'   = "{0}{1}" -f [int]$percentFree, '%'
            'Status'           = $Status
        }
        # Add the object to the final report
        $finalReport += New-Object psobject -property $tempObj
    }
    Exit-PSSession
}
return $finalReport
Any insight would be great - thank you very much!!
There are two main problems. The first one, as Jeff Zeitlin pointed out, is that the automatic variable $_ ($PSItem) has no meaning in a foreach loop; it is effectively $null:
[int](($null / $null) * 100)
# => RuntimeException: Attempted to divide by zero.
The second problem is your use of Enter-PSSession, which is meant exclusively for interactive sessions. For an unattended script you would use Invoke-Command instead; however, in this case we can also rely on Get-CimInstance -ComputerName to query the remote computers (note that this operation is performed in parallel and does not require a loop over the $ServersToScan array).
$ServersToScan = 'x', 'y', 'z'
#Threshold Definition (as a fraction of total size)
$Critical = 0.05
$Warning = 0.15
$params = @{
    ClassName    = 'Win32_LogicalDisk'
    Filter       = "DriveType = '3'"
    ComputerName = $ServersToScan
}
$finalReport = Get-CimInstance @params | ForEach-Object {
    $free = $_.FreeSpace / $_.Size
    $status = switch ($free) {
        { $_ -gt $Warning }  { 'Normal'; break }
        { $_ -gt $Critical } { 'Warning'; break }
        Default              { 'Critical' }
    }
    [pscustomobject]@{
        'Computer Name'    = $_.PSComputerName
        'Drive Letter'     = $_.DeviceID
        'Drive Name'       = $_.VolumeName
        'Total Space (GB)' = $_.Size / 1Gb
        'Free Space (GB)'  = $_.FreeSpace / 1Gb
        'Free Space (%)'   = $free.ToString('P0')
        'Status'           = $status
    }
}
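If you then want to view or persist the report, something along these lines works (the CSV path is only a placeholder):
$finalReport | Format-Table -AutoSize
$finalReport | Export-Csv -Path 'C:\temp\DiskReport.csv' -NoTypeInformation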
I have a PowerShell script that imports a CSV into several arrays, and I need it to populate a table in Word. I am able to get the data into the arrays and create a table with headers and the correct number of rows, but cannot get the data from the arrays into the table. Lots of Google searches led me to the following code. Any help is greatly appreciated.
Sample of My_File.txt
Number of rows will vary, but the header row is always there.
component,id,iType,
VCT,AD-1234,Story,
VCT,Ad-4567,DR,
$component = @()
$id = @()
$iType = @()
$vFile = Import-CSV ("H:\My_file.txt")
$word = New-Object -ComObject "Word.Application"
$vFile | ForEach-Object {
$component += $_.components
$id += $_.id
$iType +=_.iType
}
$template = $word.Documents.Open ("H:\Test.docx")
$template = $word.Document.Add()
$word.Visible = $True
$Number_rows = ($vFile.count +1)
$Number_cols = 3
$range = $template.range()
$template.Tables.add($range, $Number_rows, $Number_cols) | out-null
$table = $template.Tables.Item(1)
$table.cell(1,1).Range.Text = "Component"
$table.cell(1,2).Range.Text = "ID"
$table.cell(1,3).Range.text = "Type"
for ($i=0; $i -lt; $vFile.count+2, $i++){
$table.cell(($i+2),1).Range.Text = $component[$i].components
$table.cell(($i+2),2).Range.Text = $id[$i].id
$table.cell(($i+2),3).Range.Text = $iType[$i].iType
}
$Table.Style = "Medium Shading 1 - Accent 1"
$template.SaveAs("H:\New_Doc.docx")
Don't separate the rows in the parsed CSV object array into three arrays, but leave the collection as-is and use the data to fill the table using the properties of that object array directly.
I took the liberty of renaming your variable $vFile to $data, as to me at least that is more descriptive of what it contains.
Try
$data = Import-Csv -Path "H:\My_file.txt"
$word = New-Object -ComObject "Word.Application"
$word.Visible = $True
$template = $word.Documents.Open("H:\Test.docx")
$Number_rows = $data.Count +1 # +1 for the header
$Number_cols = 3
$range = $template.Range()
[void]$template.Tables.Add($range, $Number_rows, $Number_cols)
$table = $template.Tables.Item(1)
$table.Style = "Medium Shading 1 - Accent 1"
# write the headers
$table.cell(1,1).Range.Text = "Component"
$table.cell(1,2).Range.Text = "ID"
$table.cell(1,3).Range.text = "Type"
# next, add the data rows
for ($i = 0; $i -lt $data.Count; $i++) {
    $table.cell(($i+2),1).Range.Text = $data[$i].component
    $table.cell(($i+2),2).Range.Text = $data[$i].id
    $table.cell(($i+2),3).Range.Text = $data[$i].iType
}
$template.SaveAs("H:\New_Doc.docx")
When done, do not forget to close the document, quit word and clean up the used COM objects:
$template.Close()
$word.Quit()
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($template)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($word)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
I have two arrays in Powershell:
$headings = ['h1', 'h2']
$values = [3, 4]
It is guaranteed that both arrays have the same length. How can I create an array where the values of $headings become the headings of the $values array?
I want to be able to do something like this:
$result['h2'] #should return 4
Update:
The arrays $headings and $values are of type System.Array.
As stated above, you'll need a PowerShell hashtable. By the way, arrays in PowerShell are defined via @(); see about_Arrays for further information.
$headings = @('h1', 'h2')
$values = @(3, 4)
$combined = @{ }
if ($headings.Count -eq $values.Count) {
    for ($i = 0; $i -lt $headings.Count; $i++) {
        $combined[$headings[$i]] = $values[$i]
    }
}
else {
    Write-Error "Lengths of arrays are not the same"
}
$combined
Dumping the content of $combined returns:
$combined
Name Value
---- -----
h2 4
h1 3
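Indexing into the hashtable then behaves the way the question asks for:
$combined['h2']   # returns 4
$combined['h1']   # returns 3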
Try something like this:
$hash = [ordered]@{ h1 = 3; h2 = 4 }
$hash["h1"] # -- Will give you 3
## Next Approach
$headings = @('h1', 'h2') #array
$values = @(3, 4) #array
If ($headings.Length -eq $values.Length)
{
    For ($i = 0; $i -lt $headings.Length; $i++)
    {
        # Formulate your hashtable here like the above, so that you end up with
        # something like $hash[$headings[$i]] = $values[$i]
    }
}
PS: I am just giving you a logical head start and helping you with the hashtable part. The code is up to you.
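For completeness, a minimal sketch of the loop body hinted at above (using a hypothetical $hash variable) could look like this:
$hash = [ordered]@{ }
If ($headings.Length -eq $values.Length)
{
    For ($i = 0; $i -lt $headings.Length; $i++)
    {
        $hash[$headings[$i]] = $values[$i]
    }
}
$hash['h2']   # 4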
I have two arrays: array1 [POP1, POP2, POP3 .... POP30] and array2 [61,61,62 ... 61]. I need to create a new object with value 62 and its POP.
In this example:
POP3 62.
I am simplifying the explanation because I've already been able to get the value from the database.
Can someone help me?
Code:
$target = @( )
$ini = 0 | foreach {
    $apiurl = "http://xxxxxxxxx:8080/fxxxxp/events_xxxx.xml"
    [xml]$ini = (New-Object System.Net.WebClient).downloadstring($apiurl)
    $target = $ini.events.event.name
    $nodename = $target
    $target = $ini.events.event.statuscode
    $statuscode = $target
}
$column1 = @($nodename)
$column2 = @($statuscode)
$i = 0
($column1,$column2)[0] | foreach {
    New-Object PSObject -Property @{
        POP = $Column1[$i]
        Status = $column2[$i++]
    } | ft -AutoSize
I really couldn't figure out what you were trying to do, but you definitely overcomplicated it. Here is what I thought of your code:
# Here you have an empty array
$target = @( )
# Here you call a Foreach, but you don't even need it
$ini = 0 | foreach {
    $apiurl = "http://xxxxxxxxx:8080/fxxxxp/events_xxxx.xml"
    [xml]$ini = (New-Object System.Net.WebClient).downloadstring($apiurl)
    # You duplicated variables here. Just set $nodename = $ini.events.event.name
    $target = $ini.events.event.name
    $nodename = $target
    # You duplicated variables here. Just set $statuscode = $ini.events.event.statuscode
    $target = $ini.events.event.statuscode
    $statuscode = $target
}
# You should already have arrays, so now you're making more arrays, duplicating variables again
$column1 = @($nodename)
$column2 = @($statuscode)
# counter, but you won't need it
$i = 0
# So here, you're making a new array again, but this contains two nested arrays. I don't get it.
($column1,$column2)[0] | foreach {
    New-Object PSObject -Property @{
        POP = $Column1[$i]
        Status = $column2[$i++]
    } | ft -AutoSize
} # You were missing a closing bracket for your foreach loop
Here is a solution that should probably work for you:
# Download the file
$apiurl = "http://xxxxxxxxx:8080/fxxxxp/events_xxxx.xml"
[xml]$ini = (New-Object System.Net.WebClient).DownloadString($apiurl)
# Set arrays
$nodename = $ini.events.event.name
$statuscode = $ini.events.event.statuscode
# Create $TableValues by looping through one array
$TableValues = foreach ( $node in $nodename )
{
    [pscustomobject] @{
        # The current node
        POP = $node
        # Use the array method IndexOf
        # This should return the position of the current node
        # Then use that index to get the matching value of $statuscode
        Status = $statuscode[$nodename.IndexOf($node)]
    }
}
# Add a custom value
$TableValues += [pscustomobject] @{
    POP = 'POP100'
    Status = 100
}
$TableValues | Format-Table -AutoSize
Assuming that your intent is to create an array of custom objects constructed from the pairs of corresponding elements of 2 arrays of the same size:
A concise pipeline-based solution (PSv3+; a for / foreach solution would be faster):
$arr1 = 'one', 'two', 'three'
$arr2 = 1, 2, 3
0..$($arr1.Count-1) | % { [pscustomobject] @{ POP = $arr1[$_]; Status = $arr2[$_] } }
This yields:
POP Status
--- ------
one 1
two 2
three 3
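As mentioned above, a foreach loop over the indices avoids the per-item pipeline overhead; a minimal sketch of that variant:
$result = foreach ($i in 0..($arr1.Count - 1)) {
    [pscustomobject] @{ POP = $arr1[$i]; Status = $arr2[$i] }
}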
I want to fill up a dynamic array with the same integer value as fast as possible using Powershell.
The Measure-Command shows that it takes 7 seconds on my system to fill it up.
My current code (snipped) looks like:
$myArray = @()
$length = 16385
for ($i=1;$i -le $length; $i++) {$myArray += 2}
(Full code can be seen on gist.github.com or on superuser)
Consider that $length can change. But for better understanding I chose a fixed length.
Q: How do I speed up this Powershell code?
You can repeat arrays, just as you can do with strings:
$myArray = ,2 * $length
This means »Take the array with the single element 2 and repeat it $length times, yielding a new array.«.
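For illustration, with a hypothetical $length of 5:
$length = 5
$myArray = ,2 * $length
$myArray -join ' '   # 2 2 2 2 2
$myArray.Count       # 5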
Note that you cannot really use this to create multidimensional arrays because the following:
$some2darray = ,(,2 * 1000) * 1000
will just create 1000 references to the inner array, making them useless for manipulation. In that case you can use a hybrid strategy. I have used
$some2darray = 1..1000 | ForEach-Object { ,(,2 * 1000) }
in the past, but the performance measurements below suggest that
$some2darray = foreach ($i in 1..1000) { ,(,2 * 1000) }
would be a much faster way.
Some performance measurements:
Command Average Time (ms)
------- -----------------
$a = ,2 * $length 0,135902 # my own
[int[]]$a = [System.Linq.Enumerable]::Repeat(2, $length) 7,15362 # JPBlanc
$a = foreach ($i in 1..$length) { 2 } 14,54417
[int[]]$a = -split "2 " * $length 24,867394
$a = for ($i = 0; $i -lt $length; $i++) { 2 } 45,771122 # Ansgar
$a = 1..$length | %{ 2 } 431,70304 # JPBlanc
$a = @(); for ($i = 0; $i -lt $length; $i++) { $a += 2 } 10425,79214 # original code
Taken by running each variant 50 times through Measure-Command, each with the same value for $length, and averaging the results.
Positions 3 and 4 are a bit of a surprise, actually. Apparently it's much better to foreach over a range than to use a normal for loop.
Code to generate above chart:
$length = 16384
$tests = '$a = ,2 * $length',
         '[int[]]$a = [System.Linq.Enumerable]::Repeat(2, $length)',
         '$a = for ($i = 0; $i -lt $length; $i++) { 2 }',
         '$a = foreach ($i in 1..$length) { 2 }',
         '$a = 1..$length | %{ 2 }',
         '$a = @(); for ($i = 0; $i -lt $length; $i++) { $a += 2 }',
         '[int[]]$a = -split "2 " * $length'
$tests | ForEach-Object {
    $cmd = $_
    $timings = 1..50 | ForEach-Object {
        Remove-Variable i,a -ErrorAction Ignore
        [GC]::Collect()
        Measure-Command { Invoke-Expression $cmd }
    }
    [pscustomobject]@{
        Command = $cmd
        'Average Time (ms)' = ($timings | Measure-Object -Average TotalMilliseconds).Average
    }
} | Sort-Object Ave* | Format-Table -AutoSize -Wrap
Avoid appending to an array in a loop. It's copying the existing array to a new array with each iteration. Do this instead:
$MyArray = for ($i=1; $i -le $length; $i++) { 2 }
Using PowerShell 3.0 you can use the following (needs .NET Framework 3.5 or higher):
[int[]]$MyArray = ([System.Linq.Enumerable]::Repeat(2, 65000))
Using PowerShell 2.0
$AnArray = 1..65000 | % {2}
It is not clear what you are trying to do. I tried looking at your code, but $myArray += 2 means you are just adding 2 as an element. For example, here is the output from my test code:
$myArray = @()
$length = 4
for ($i=1; $i -le $length; $i++) {
    Write-Host $myArray
    $myArray += 2
}
2
2 2
2 2 2
Why do you need to add 2 as the array element so many times?
If all you want is just fill the same value, try this:
$myArray = 1..$length | % { 2 }
If you need it really fast, then go with ArrayLists and Tuples:
$myArray = New-Object 'Collections.ArrayList'
$myArray = foreach ($i in 1..$length) {
    [tuple]::create(2)
}
and if you need to sort it later then use this (normally a bit slower):
$myArray = New-Object 'Collections.ArrayList'
foreach ($i in 1..$length) {
    $myArray.add(
        [tuple]::create(2)
    )
}
both versions are in the 20ms range for me ;-)
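If you want to verify the timing on your own machine, a quick check with the $length value from the question could look like this:
$length = 16385
(Measure-Command {
    $myArray = foreach ($i in 1..$length) { [tuple]::create(2) }
}).TotalMilliseconds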