Validate members of array - arrays

I have a string I am pulling from XML that SHOULD contain comma separated integer values. Currently I am using this to convert the string to an array and test each member of the array to see if it is an Int. Ultimately I still want an array in the end, as I also have an array of default success codes and I want to combine them. That said, I have never found this pattern of setting the test condition true then looping and potentially setting it to false to be all that elegant. So, I am wondering if there is a better approach. I mean, this works, and it's fast, and the code is easy to read, so in a sense there is no reason to change it, but if there is a better way...
$supplamentalSuccessCode = ($string.Split(',')).Trim()
$validSupplamentalSuccessCode = $true
foreach ($code in $supplamentalSuccessCode) {
if ($code -as [int] -isNot [int]) {
$validSupplamentalSuccessCode = $false
}
}
EDIT: To clarify, this example is fairly specific, but I am curious about a more generic solution. So imagine the array could contain values that need to be checked against a lookup table, or local drive paths that need to be checked with Test-Path. So more generically, I wonder if there is a better solution than the Set variable true, foreach, if test fails set variable false logic.
Also, I have played with a While loop, but in most situations I want to find ALL bad values, not exit validation on the first bad one, so I can provide the user with a complete error in a log. Thus the ForEach loop approach I have been using.

In PSv4+ you can enlist the help of the .Where() collection "operator" to determine all invalid values:
Here's a simplified example:
# Sample input.
$string = '10, no, 20, stillno, -1'
# Split the list into an array.
$codes = ($string.Split(',')).Trim()
# Test all array members with a script block passed to .Where()
# As usual, $_ refers to the element at hand.
# You can perform whatever validation is necessary inside the block.
$invalidCodes = $codes.Where({ $null -eq ($_ -as [int]) })
$invalidCodes # output the invalid codes, if any
The above yields:
no
stillno
Note that what .Where() returns is not a regular PowerShell array ([object[]]), but an instance of [System.Collections.ObjectModel.Collection[PSObject]], but in most situations the difference shouldn't matter.
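If both the valid and the invalid values are needed, for example to log every bad value and still merge the good ones with the default success codes, .Where() also accepts a 'Split' mode (PSv4+) that returns the two groups in one pass. A minimal sketch; $defaultSuccessCodes and its values are placeholders introduced here, not taken from the question:

```powershell
# Sample input, as above.
$string = '10, no, 20, stillno, -1'
$codes = ($string.Split(',')).Trim()

# Hypothetical default success codes (placeholder values).
$defaultSuccessCodes = 0, 3010

# 'Split' mode returns two collections: the elements that pass the test,
# followed by the elements that fail it.
$valid, $invalid = $codes.Where({ $null -ne ($_ -as [int]) }, 'Split')

if ($invalid.Count -gt 0) {
    # Build one message that lists ALL bad values.
    $message = "Invalid codes: $($invalid -join ', ')"
}

# Combine the defaults with the codes that passed validation.
$allCodes = $defaultSuccessCodes + ([int[]] $valid)
```

The script block is the only part specific to integers; a lookup-table check or a Test-Path call could take its place.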
A PSv2-compatible solution is a bit more cumbersome:
# Sample input.
$string = '10, no, 20, stillno, -1'
# Split the list into an array.
# Note: In PSv*3* you could use the simpler $codes = ($string.Split(',')).Trim()
# as in the PSv4+ solution.
$codes = foreach ($code in $string.Split(',')) { $code.Trim() }
# Emulate the behavior of .Where() with a foreach loop:
# Note that you do get an [object[]] instance back this time.
$invalidCodes = foreach ($code in $codes) { if ($null -eq ($code -as [int])) { $code } }
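The same filter can also be written with the Where-Object cmdlet, which is available in PSv2 as well. In this sketch the result is wrapped in @() so that .Count is reliable even when zero or one value fails the test; $isValid is a name introduced here for illustration:

```powershell
# Sample input.
$string = '10, no, 20, stillno, -1'
$codes = foreach ($code in $string.Split(',')) { $code.Trim() }

# Collect every element that fails the test; @() guarantees an array.
$invalidCodes = @($codes | Where-Object { $null -eq ($_ -as [int]) })

# The overall pass/fail flag follows from the number of failures.
$isValid = $invalidCodes.Count -eq 0
```

This replaces the "set to true, loop, set to false" pattern with a single derived value, while still keeping the full list of bad entries for logging.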

Related

Perl: grep from multiple arrays at once

I have multiple arrays (~32). I want to remove all blank elements from them. How can it be done in a short way (may be via one foreach loop or 1-2 command lines)?
I tried the below, but it's not working:
my @refreshArrayList=("@list1", "@list2", "@list3","@list4", "@list5", "@list6" , "@list7");
foreach my $i (@refreshArrayList) {
$i = grep (!/^\s*$/, $i);
}
Let's say, @list1 = ("abc","def","","ghi"); @list2 = ("qwe","","rty","uy", "iop"), and similarly for other arrays. Now, I want to remove all blank elements from all the arrays.
Desired Output shall be: @list1 = ("abc","def","ghi"); @list2 = ("qwe","rty","uy", "iop") ### All blank elements are removed from all the arrays.
How can it be done?
You can create a list of list references and then iterate over these, like
for my $list (\@list1, \@list2, \@list3) {
@$list = grep (!/^\s*$/, @$list);
}
Of course, you could create this list of list references also dynamically, i.e.
my @list_of_lists;
push @list_of_lists, \@list1;
push @list_of_lists, \@list2;
...
for my $list (@list_of_lists) {
@$list = grep (!/^\s*$/, @$list);
}
@$_ = grep /\S/, @$_ for @AoA; # @AoA = (\@ary1, \@ary2, ...)
Explanation
First, this uses the statement modifier, "inverting" the usual for loop syntax into the form STMT for LIST
The for(each) modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).
It is mostly equivalent to a "normal" for loop, with the notable difference being that no scope is set and so there is no need to tear it down either, adding a small measure of efficiency.† We can have only one statement; but then again, that can be a do block. Having no scope means that we cannot declare lexical variables for the statement (unless a do block is used).
So the statement is @$_ = grep /\S/, @$_, executed for each element of the list.
In a for(each) loop, the variable that is set to each element in turn as the list is iterated over ("topicalizer") is an alias to those elements. So changing it changes elements. From perlsyn
If VAR is omitted, $_ is set to each value.
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop.
In our case $_ is always an array reference, and then the underlying array is rewritten by dereferencing it (@$_) and assigning to that the output list of grep, which consists only of elements that have at least one non-space character (/\S/).
† I ran a three-way benchmark, of the statement-modifier loop against a "normal" loop with and without a topical variable.
For adding 100e6 numbers I get 8-11% speedup (on both a desktop and a server) and with a more involved calculation ($r = ($r + $_ ) / sqrt($_)) it's 4-5%.
A side observation: In both cases the full for loop without a variable (using default $_ for the topicalizer) is 1-2% faster than the one with a lexical topical variable set.

IF contains more than one string do "this" else "this"

Trying to make a script that request more info (group Id) if there are SCOM groups with identical names:
function myFunction {
[CmdletBinding()]
Param(
[Parameter(Mandatory=$true)]
[string[]]$ObjectName
)
foreach ($o in $ObjectName) {
$p = Get-SCOMGroup -DisplayName "$o" | select DisplayName
<#
if ($p contains more than one string) {
"Request group Id"
} else {
"do this"
}
#>
}
}
Need help with the functionality in the comment block.
Wrap the value in an array subexpression @() and count how many entries it has:
if(@($p).Count -gt 1){"Request group Id"}
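Put into the function from the question, the check might look like the following sketch. Get-DemoGroup is a hypothetical stand-in for Get-SCOMGroup so that the example runs without the SCOM module, and Test-GroupName and the group names are made up for illustration:

```powershell
# Hypothetical stand-in for Get-SCOMGroup: emits two objects for the
# name 'Dup' and a single object for any other name.
function Get-DemoGroup {
    param([string] $DisplayName)
    if ($DisplayName -eq 'Dup') {
        [pscustomobject] @{ DisplayName = 'Dup'; Id = 1 }
        [pscustomobject] @{ DisplayName = 'Dup'; Id = 2 }
    } else {
        [pscustomobject] @{ DisplayName = $DisplayName; Id = 3 }
    }
}

function Test-GroupName {
    param([Parameter(Mandatory = $true)] [string[]] $ObjectName)
    foreach ($o in $ObjectName) {
        # @() makes .Count reliable whether 0, 1 or more objects come back.
        $p = @(Get-DemoGroup -DisplayName $o)
        if ($p.Count -gt 1) {
            "Request group Id for '$o'"
        } else {
            "Single match for '$o'"
        }
    }
}

$result = Test-GroupName -ObjectName 'Dup', 'Unique'
```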
Note: This answer complements Mathias R. Jessen's helpful answer.
Counting the number of objects returned by a command:
Mathias' answer shows a robust, PowerShell v2-compatible solution based on the array sub-expression operator, @().
# @() ensures that the output of command ... is treated as an array,
# even if the command emits only *one* object.
# You can safely call .Count (or .Length) on the result to get the count.
@(...).Count
In PowerShell v3 or higher, you can treat scalars like collections, so that using just (...).Count is typically enough. (A scalar is a single object, as opposed to a collection of objects.)
# Even if command ... returns only *one* object, it is safe
# to call .Count on the result in PSv3+
(...).Count
These methods are typically, but not always interchangeable, as discussed below.
Choose @(...).Count, if:
you must remain PSv2-compatible
you want to count output from multiple commands (separated with ; or newlines)
for commands that output entire collections as a single object (which is rare), you want to count such collections as 1 object.[1]
more generally, if you need to ensure that the command output is returned as a bona fide array, though note that it is invariably of type [object[]]; if you need a specific element type, use a cast (e.g., [int[]]), but note that you then don't strictly need the @(...); e.g., [int[]] (...) will do - unless you want to prevent enumeration of collections output as single objects.
Choose (...).Count, if:
only one command's output must be counted
for commands that output entire collections as a single object, you want to count the individual elements of such collections; that is, (...) forces enumeration of command output.[2]
for counting the elements of a command's output already stored in a variable - though, of course, you can then simply omit the (...) and use $var.Count
Caveat: Due to a longstanding bug (still present as of PowerShell Core 6.2.0), accessing .Count on a scalar fails while Set-StrictMode -Version 2 or higher is in effect - use @(...) in that case, but note that you may have to force enumeration.
To demonstrate the difference in behavior with respect to (rare) commands that output collections as single objects:
PS> @(Write-Output -NoEnumerate (1..10)).Count
1 # Array-as-single-object was counted as *1* object
PS> (Write-Output -NoEnumerate (1..10)).Count
10 # Elements were enumerated.
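The same difference can be reproduced without Write-Output -NoEnumerate by emitting an array through the unary comma operator. This is a sketch; Get-WholeArray is a hypothetical function defined only for the demonstration:

```powershell
# Hypothetical function that outputs an entire array as ONE object,
# by wrapping it with the unary comma operator.
function Get-WholeArray { , (1..10) }

# @(...) collects the output objects: the array counts as a single object.
$asOneObject = @(Get-WholeArray).Count

# (...) uses the single output object as-is: its elements are counted.
$enumerated = (Get-WholeArray).Count
```

Here $asOneObject is 1 and $enumerated is 10, matching the Write-Output results above.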
Performance considerations:
If a command's output is directly counted, (...) and @(...) perform about the same:
$arr = 1..1e6 # Create an array of 1 million integers.
{ (Write-Output $arr).Count }, { @(Write-Output $arr).Count } | ForEach-Object {
[pscustomobject] @{
Command = "$_".Trim()
Seconds = '{0:N3}' -f (Measure-Command $_).TotalSeconds
}
}
Sample output, from a single-core Windows 10 VM (the absolute timings aren't important, only that the numbers are virtually the same):
Command Seconds
------- -------
(Write-Output $arr).Count 0.352
@(Write-Output $arr).Count 0.365
By contrast, for large collections already stored in a variable, @(...) introduces substantial overhead, because the collection is recreated as a (new) array (as noted, you can just use $arr.Count):
$arr = 1..1e6 # Create an array of 1 million integers.
{ ($arr).Count }, { @($arr).Count } | ForEach-Object {
[pscustomobject] @{
Command = "$_".Trim()
Seconds = '{0:N3}' -f (Measure-Command $_).TotalSeconds
}
}
Sample output; note how the @(...) solution is about 7 times slower:
Command Seconds
------- -------
($arr).Count 0.009
@($arr).Count 0.067
Coding-style considerations:
The following applies in situations where @(...) and (...) are functionally equivalent (and either perform the same or when performance is secondary), i.e., when you're free to choose which construct to use.
Mathias recommends @(...).Count, stating in a comment:
There's another reason to explicitly wrap it in this context - conveying intent, i.e., "We don't know if $p is a scalar or not, hence this construct".
My vote is for (...).Count:
Once you understand that PowerShell (v3 or higher) treats scalars as collections with count 1 on demand, you're free to take advantage of that knowledge without needing to reflect the distinction between a scalar and an array in the syntax:
When writing code, this means you needn't worry about whether a given command situationally may return a scalar rather than a collection (which is common in PowerShell, where capturing output from a command with a single output object captures that object as-is, whereas 2 or more output objects result in an array).
As a beneficial side effect, the code becomes more concise (and sometimes faster).
Example:
# Call Get-ChildItem twice, and, via Select-Object, limit the
# number of output objects to 1 and 2, respectively.
1..2 | ForEach-Object {
# * In the 1st iteration, $var becomes a *scalar* of type [System.IO.DirectoryInfo]
# * In the 2nd iteration, $var becomes an *array* with
# 2 elements of type [System.IO.DirectoryInfo]
$var = Get-ChildItem -Directory / | Select-Object -First $_
# Treat $var as a collection, which in PSv3+ works even
# if $var is a scalar:
[pscustomobject] @{
Count = $var.Count
FirstElement = $var[0]
DataType = $var.GetType().Name
}
}
The above yields:
Count FirstElement DataType
----- ------------ --------
1 /Applications DirectoryInfo
2 /Applications Object[]
That is, even the scalar object of type System.IO.DirectoryInfo reported its .Count sensibly as 1 and allowed access to "its first element" with [0].
For more about the unified handling of scalars and collections, see this answer.
[1] E.g., @(Write-Output -NoEnumerate 1, 2).Count is 1, because the Write-Output command outputs a single object - the array 1, 2 - as a whole. Because only a single object is output, @(...) wraps that object in an array, resulting in , (1, 2), i.e. a single-element array whose first and only element is itself an array.
[2] E.g., (Write-Output -NoEnumerate 1, 2).Count is 2, because even though the Write-Output command outputs the array as a single object, that array is used as-is. That is, the whole expression is equivalent to (1, 2).Count. More generally, if a command inside (...) outputs just one object, that object is used as-is; if it outputs multiple objects, they are collected in a regular PowerShell array (of type [object[]]) - this is the same behavior you get when capturing command output via a variable assignment ($captured = ...).

Bulk regex removals against large array very slow in PowerShell

I am trying to find the quickest / most efficient way to run many regex removals against an array.
My $hosts array contains tens of thousands of individual items, in domain format. E.g:
test.domain.xyz
domain.xyz
something.com
anotherdomain.net
My $local_regex array contains ~1000 individual regexes, in multi-line format. E.g:
^ad. (ad.*)
domain.xyz$ (*domain.xyz)
I am currently trying to exclude any regex matches in the following way, but it is EXTREMELY slow with a large array and many regexes to match:
Function Regex-Remove
{
Param
(
[Parameter(Mandatory=$true)]
$local_regex,
[Parameter(Mandatory=$true)]
$hosts
)
# Loop through each regex and select only non-matching items
foreach($regex in $local_regex)
{
# Multi line, case insensitive
$regex = "(?im)$regex"
# Select hosts that do not match regex
$hosts = $hosts -notmatch $regex
}
return $hosts
}
Is there a better way to do this?
Reassigning a large array is going to be costly. Changing an array's size requires allocating a new array and copying the contents into it. If you have, say, 10 000 hostnames and 1 000 regexes, that can add up to as many as 10 000 000 copy operations, which is going to have a measurable effect. The Measure-Command cmdlet can be used to time execution.
As an alternative approach, try indexing into the array and overwriting undesired values with $null. Like so,
foreach($regex in $local_regex) {
$regex = "(?im)$regex"
for($i=0;$i -lt $hosts.length; ++$i) {
if( $hosts[$i] -match $regex) {
$hosts[$i] = $null
}
}
}
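After the loops finish, the array still contains the $null placeholders, so one final filtering pass is needed. A runnable sketch of the whole approach; the sample hosts and regexes are made up, and $hosts must be an array for the comparison operator to act as a filter:

```powershell
# Made-up sample data.
$hosts = 'test.domain.xyz', 'domain.xyz', 'something.com', 'ad.server.net'
$local_regex = '^ad\.', 'domain\.xyz$'

foreach ($regex in $local_regex) {
    $regex = "(?im)$regex"
    for ($i = 0; $i -lt $hosts.Length; ++$i) {
        # Skip slots that were already cleared by an earlier regex.
        if ($null -ne $hosts[$i] -and $hosts[$i] -match $regex) {
            $hosts[$i] = $null
        }
    }
}

# With an array on the left, -ne returns the elements that are not $null,
# so the array is rebuilt only once, at the very end.
$hosts = @($hosts -ne $null)
```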
You can use System.Collections.ArrayList objects instead of arrays. This will make the process much faster, and you get methods to add / remove items without rebuilding the whole array:
$var = New-Object System.Collections.ArrayList
$var.Add()
$var.AddRange()
$var.Remove()
$var.RemoveRange()
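The methods listed above take arguments; a short sketch with made-up host names shows typical calls. Note that .Add() returns the index of the new element, which is discarded here so it doesn't end up in the output:

```powershell
$list = New-Object System.Collections.ArrayList

# Add a single item; assign to $null to suppress the returned index.
$null = $list.Add('domain.xyz')

# Append several items at once.
$list.AddRange(@('something.com', 'ad.example.net'))

# Remove the first occurrence of a specific item.
$list.Remove('domain.xyz')

# Remove 1 item, starting at index 0.
$list.RemoveRange(0, 1)
```

After these calls the list holds a single element, 'ad.example.net'.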
As suggested by @Roberto, I switched the $hosts array to a New-Object System.Collections.ArrayList
The ability to remove from the ArrayList on the fly is exactly what I needed, and the while loop makes sure to remove duplicate values.
Function Regex-Remove
{
Param
(
[Parameter(Mandatory=$true)]
$local_regex,
[Parameter(Mandatory=$true)]
$hosts
)
# Loop through each regex and select only non-matching items
foreach($regex in $local_regex)
{
# Multi line, case insensitive
$regex = "(?i)$regex"
# Select hosts that do not match regex
$hosts -match $regex | % {
while($hosts.Contains($_))
{
$hosts.Remove($_)
}
}
}
return $hosts
}
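A different technique, not taken from the answers above, is to join the individual regexes into one alternation so that each host is tested only once. This is only a sketch with made-up sample data, and it assumes the patterns can be combined safely (for example, that none of them relies on numbered backreferences):

```powershell
# Made-up sample data.
$hosts = 'test.domain.xyz', 'domain.xyz', 'something.com', 'anotherdomain.net', 'ad.server.net'
$local_regex = '^ad\.', 'domain\.xyz$'

# Wrap each pattern in a non-capturing group and join with '|'.
$combined = ($local_regex | ForEach-Object { "(?:$_)" }) -join '|'

# -notmatch is case-insensitive by default and, with a collection on the
# left, returns the elements that do NOT match.
$remaining = @($hosts -notmatch $combined)
```

Whether this is faster than the per-regex loop depends on the patterns and the data, so it is worth timing both with Measure-Command.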

Compare Elements in Array for While Loop - Powershell

I am attempting to create a script to read a CSV, then perform some operations on the contents where the first field are similar. Right now I'm stuck on trying to set up the second While loop to compare the current element to the next one.
I'm fairly new to this, because I wasn't getting anywhere trying this in Java. I can't seem to find a combination of commands that will let the loop work.
Some things I've tried are:
While($csv[$count].ip -eq $csv[$count++].ip)
While((diff $csv[count].ip $csv[$count++].ip) = true)
While($csv[$count].ip = $csv[$count++].ip)
Don't use $count++ unless you actually want to change the value of $count itself. Instead, use $count + 1 as the array index:
$count = 0
# Stop at the second-to-last element, so that $count + 1 stays inside the array.
while($count -lt $csv.Count - 1){
if($csv[$count].ip -eq $csv[$count + 1].ip){
# Do your stuff here
}
$count++
}
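To see the loop in action without a file, the CSV can be simulated with ConvertFrom-Csv. The column names and values below are invented for the sketch; only the ip property comes from the question:

```powershell
# Invented sample rows; Import-Csv on a real file yields the same shape.
$csv = 'ip,port', '10.0.0.1,80', '10.0.0.1,443', '10.0.0.2,22' | ConvertFrom-Csv

$count = 0
# Collect one message per pair of neighboring rows with the same ip.
$adjacentPairs = @(
    while ($count -lt ($csv.Count - 1)) {
        if ($csv[$count].ip -eq $csv[$count + 1].ip) {
            "Rows $count and $($count + 1) share ip $($csv[$count].ip)"
        }
        $count++
    }
)
```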

How can I initialize an array of arrays (multidimensional) in PowerShell 5.0

I'm trying to partition out chunks of work using PowerShell (5.0) and am having a hard time instantiating a multidimensional array.
$n = 456;
$MaxChunks = 6;
$Chunks = @();
for($x = 0; $x -lt $MaxChunks; $x++)
{
Write-Host "Creating chunk $x"
$Chunks += @();
}
$Chunks.Count always returns 0 and I cannot access anything in $Chunks by index (i.e. $Chunks[0] is null).
Ultimately, my goal is to access the array located at $Chunks[$i] and add multiple System.Data.DataRow objects to it. However, as I've said, I'm not able to access the array at that index because that array is never instantiated.
I've read through this and this but am not quite able to translate the hashtable scenario to my situation.
Alternatively:
[System.Array] $chunks = [System.Array]::CreateInstance( [Int32], 3, 3 );
$chunks[0,0];
0
Not native PS, but works.
Replicate an empty array.
The same object is referenced in each element, but in this particular case it's not a problem, because standard arrays are fixed-size and are recreated when elements are added via +=.
$Chunks = ,@() * $MaxChunks
Collect foreach output:
$Chunks = @(foreach ($x in 1..$MaxChunks) { ,@() })
The outer @() handles a theoretically possible case when $MaxChunks = 1.
You can use a more verbose (arguably slower) Write-Output -NoEnumerate:
$Chunks = @(foreach ($x in 1..$MaxChunks) { echo -NoEnumerate @() })
And in case the sub-arrays will be modified a lot, use ArrayList instead of +=:
$Chunks = @(foreach ($x in 1..$MaxChunks) { ,[Collections.ArrayList]@() })
and later:
$Chunks[0].Add($something) >$null
P.S. Don't use += to generate entire arrays from scratch regardless of its size as it's a terrible method that recreates and copies the entire array each time; there's a much faster and simpler output collection via loop statements such as foreach, for, while; there's ArrayList object for the case of frequent non-sequential modifications of an array.
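Tying this back to the question's goal of partitioning work, the following sketch distributes items round-robin over the ArrayList-based chunks. The strings stand in for the System.Data.DataRow objects mentioned in the question, and the counts are chosen only to keep the example small:

```powershell
$MaxChunks = 3

# One independent ArrayList per chunk; the unary comma keeps each
# (empty) list from being enumerated away when it is output.
$Chunks = @(foreach ($x in 1..$MaxChunks) { , [Collections.ArrayList] @() })

# Distribute 7 placeholder items across the chunks.
foreach ($i in 0..6) {
    # .Add() returns the new index; discard it.
    $null = $Chunks[$i % $MaxChunks].Add("item$i")
}
```

Afterwards $Chunks[0] holds item0, item3 and item6, while the other two chunks hold two items each.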
