PowerShell: leading zeros are trimmed in array using Measure-Object cmdlet

When using PowerShell to find the maximum or minimum value in a string array, the leading zeros of the result are trimmed. How can I retain the zeros?
$arr = @("0001", "0002", "0003")
($arr | Measure-Object -Maximum).Maximum
>>> 3

Enumerating the array is the fastest method:
$max = ''
foreach ($el in $arr) {
    if ($el -gt $max) {
        $max = $el
    }
}
$max
Or use SortedSet from the .NET 4 framework (built in since Windows 8). It's about twice as fast as Measure-Object but about half as fast as the manual enumeration above. It might still be useful if you plan to quickly sort the data without duplicates, since it's faster than the built-in Sort-Object.
([Collections.Generic.SortedSet[string]]$arr).Max
It will allocate some memory for the set's index, but not for the actual data, since the existing strings are reused. If you're concerned about it, force garbage collection with [gc]::Collect().
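As a minimal sketch building on the enumeration approach, a single pass can track both the minimum and the maximum. The function name Get-StringMinMax is hypothetical. Because the values are compared as strings, this ordering only matches numeric order when all strings have the same width, as in the question.

```powershell
# Hypothetical helper: one pass over the array, comparing values as strings
# so leading zeros are preserved in the result.
function Get-StringMinMax([string[]] $Values) {
    $min = $null
    $max = $null
    foreach ($v in $Values) {
        # The first element initializes both; afterwards compare as strings.
        if ($null -eq $min -or $v -lt $min) { $min = $v }
        if ($null -eq $max -or $v -gt $max) { $max = $v }
    }
    [pscustomobject]@{ Minimum = $min; Maximum = $max }
}

$arr = @("0001", "0002", "0003")
$result = Get-StringMinMax $arr
$result.Minimum   # 0001
$result.Maximum   # 0003
```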

Try this:
$arr = @("0001", "0002", "0003")
$arr | sort -Descending | select -First 1
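If the strings can have different widths (for example "9" and "010"), a plain string sort no longer matches numeric order: sorted descending as strings, "9" comes before "010". A sketch, assuming every element parses as an [int], is to sort on a numeric key while still outputting the original strings:

```powershell
$arr = @("0001", "0002", "9", "010")
# The script block is the sort key: values are compared as integers,
# but Sort-Object still outputs the original strings, zeros included.
$sorted = @($arr | Sort-Object { [int] $_ })
$sorted[0]    # 0001 (minimum)
$sorted[-1]   # 010  (maximum)
```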

Related

In PowerShell is there a way to return the index of an array with substring of a string

Is there another way to return the indices of every instance of a substring in an array, besides old-fashioned looping through the indices?
$myarray = @('herp','dederp','dedoo','flerp')
$substring = 'erp'
$indices = @()
for ($i=0; $i -lt $myarray.length; $i++) {
    if ($myarray[$i] -match $substring){
        $indices = $indices + $i
    }
}
$thisiswrong = @($myarray.IndexOf($substring))
The conditional inside that kind of for loop is kinda cumbersome, and $thisiswrong only ever gets a value of [-1]
You can use LINQ (adapted from this C# answer):
$myarray = 'herp', 'dederp', 'dedoo', 'flerp'
$substring = 'erp'
[int[]] $indices = [Linq.Enumerable]::Where(
  [Linq.Enumerable]::Range(0, $myarray.Count),
  [Func[int, bool]] { param($i) $myarray[$i] -match $substring }
)
$indices receives 0, 1, 3.
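One caveat with -match here is that $substring is treated as a regular expression. A sketch of the same LINQ approach, assuming you want a literal substring match, escapes the search text with [regex]::Escape() first (the sample data below is made up to show the difference):

```powershell
$myarray = 'v1.2', 'v102', 'v1.3'
$substring = '1.'   # '.' is a regex metacharacter
# Escape the search text so -match treats it literally ('1\.').
$pattern = [regex]::Escape($substring)
[int[]] $indices = [Linq.Enumerable]::Where(
    [Linq.Enumerable]::Range(0, $myarray.Count),
    [Func[int, bool]] { param($i) $myarray[$i] -match $pattern }
)
$indices   # 0, 2 (without escaping, 'v102' would match as well)
```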
As for what you tried:
$thisiswrong = @($myarray.IndexOf($substring))
System.Array.IndexOf only ever finds one index and matches entire elements, literally and case-sensitively in the case of strings.
There's a more PowerShell-like, but much slower, alternative, as hinted at by js2010's answer: you can take advantage of the fact that the match-information objects that Select-String outputs have a .LineNumber property reflecting the 1-based position in the input collection, even if the input doesn't come from a file:
$myarray = 'herp', 'dederp', 'dedoo', 'flerp'
$substring = 'erp'
[int[]] $indices =
($myarray | Select-String $substring).ForEach({ $_.LineNumber - 1 })
Note the need to subtract 1 from each .LineNumber value to get the 0-based array indices, and the use of the .ForEach() array method, which performs better than the ForEach-Object cmdlet.
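Select-String also interprets its pattern as a regex and matches case-insensitively by default. If you want a literal, case-sensitive search, its -SimpleMatch and -CaseSensitive switches can be added; a sketch with illustrative data:

```powershell
$myarray = 'herp', 'dederp', 'dedoo', 'Flerp', 'ERP'
$substring = 'erp'
# -SimpleMatch: search for the literal text rather than a regex.
# -CaseSensitive: opt out of Select-String's default case-insensitive matching,
# so 'ERP' is not reported.
[int[]] $indices =
    ($myarray | Select-String -SimpleMatch -CaseSensitive -Pattern $substring).ForEach({ $_.LineNumber - 1 })
$indices   # 0, 1, 3
```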
If it was a file...
get-content file | select-string erp | select line, linenumber
Line LineNumber
---- ----------
herp 1
dederp 2
flerp 4

Efficient way to remove duplicates from large 2D arrays in PowerShell

I have a large set of data, roughly 10 million items, that I need to process efficiently and quickly, removing duplicate items based on two of the six columns.
I have tried grouping and sorting items but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSerialID,ComputerID
$p2 = foreach ($group in $p1) {
    $group.Group | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12
You might want to clean up your question a little bit, because it's a little bit hard to read, but I'll try to answer the best I can with what I can understand about what you're trying to do.
Unfortunately, with this much data there's no way to do this quickly. String comparison and sorting are done by brute force; there is no way to reduce the cost of comparing two strings below checking their characters one at a time to see if they're the same.
(Honestly, if this were me, I'd just use Export-Csv and perform this operation in Excel. For a one-off task, the time spent scripting something like this just wouldn't be worth it.)
By "items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You've already got the basic idea of Select-Object down; you can do the same for the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download a Join-Object command from the PowerShell Gallery with Install-Script -Name Join and use it as documented. If you want to do this step yourself, the easiest way would be to squish the two tables together and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique
Does this do it? It keeps the one it finds first.
$test | sort -u ComputerSerialID, ComputerID
I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of each record and add the result to a hash table.
If the concatenated text already exists in the hash table, the record isn't added to the array to be returned.
Function DeDupe_Array
{
    param
    (
        $Data
    )
    $Return_Array = @()
    $Check_Hash = @{}
    Foreach($Line in $Data)
    {
        $Concatenated = ''
        $Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
        foreach($Element in $Elements)
        {
            $Concatenated += $line.$Element
        }
        If($Check_Hash.$Concatenated -ne 1)
        {
            $Check_Hash.add($Concatenated,1)
            $Return_Array += $Line
        }
    }
    return $Return_Array
}
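Note that this function compares every property and concatenates values without a separator, so different records can produce the same text. A sketch of a variant (the function name Remove-DuplicateByKey is hypothetical) keys only on the two columns from the question, uses a HashSet instead of a hash table, and collects results in a generic List to avoid the cost of += on arrays. It keeps the first occurrence of each key, not necessarily the oldest, and assumes PowerShell 5 or later for ::new():

```powershell
# Hypothetical helper: keep the first record seen for each
# (ComputerSerialID, ComputerID) pair.
function Remove-DuplicateByKey {
    param([object[]] $Data)
    $seen   = [System.Collections.Generic.HashSet[string]]::new()
    $result = [System.Collections.Generic.List[object]]::new()
    foreach ($row in $Data) {
        # The '|' separator keeps ('1','23') and ('12','3') from producing the same key.
        $key = '{0}|{1}' -f $row.ComputerSerialID, $row.ComputerID
        # HashSet.Add() returns $false if the key was already present.
        if ($seen.Add($key)) { $result.Add($row) }
    }
    $result
}

# Made-up sample rows for illustration.
$sample = @(
    [pscustomobject]@{ ComputerSerialID = '4'; ComputerID = '2'; FirstObserved = '2019-06-03T20:29:01' }
    [pscustomobject]@{ ComputerSerialID = '4'; ComputerID = '2'; FirstObserved = '2019-06-01T08:00:00' }
    [pscustomobject]@{ ComputerSerialID = '8'; ComputerID = '6'; FirstObserved = '2019-06-03T20:29:01' }
)
$deduped = @(Remove-DuplicateByKey $sample)
$deduped.Count   # 2
```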
Try the following script. It should be as fast as possible, because it avoids piping in PowerShell entirely.
$hashT = @{}
foreach ($item in $csvData) {
    # Build the hash table key from the two identifying columns
    $key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID
    # Store $item if $key doesn't exist yet, or if its FirstObserved is earlier than the one already stored
    # (only valid when the dates are in a sortable format, such as ISO 8601)
    if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
        $hashT[$key] = $item
    }
}
$result = $hashT.Values
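As the comment in that script notes, comparing FirstObserved as strings only works for a sortable date format. A sketch that parses the values to [datetime] instead, assuming every FirstObserved value is a parseable date string, together with some made-up sample rows:

```powershell
# Made-up sample rows; in practice these would come from Import-Csv or similar.
$csvData = @(
    [pscustomobject]@{ ComputerSerialID = '4'; ComputerID = '2'; FirstObserved = '2019-06-03T20:29:01' }
    [pscustomobject]@{ ComputerSerialID = '4'; ComputerID = '2'; FirstObserved = '2019-06-01T08:00:00' }
    [pscustomobject]@{ ComputerSerialID = '8'; ComputerID = '6'; FirstObserved = '2019-06-03T20:29:01' }
)

$hashT = @{}
foreach ($item in $csvData) {
    $key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID
    # Compare real [datetime] values rather than strings (assumes parseable dates).
    if ((-not $hashT.ContainsKey($key)) -or
        ([datetime]($item.FirstObserved) -lt [datetime]($hashT[$key].FirstObserved))) {
        $hashT[$key] = $item
    }
}
$result = $hashT.Values
$result.Count   # 2
```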

PowerShell - Create an array that ignores duplicate values

I'm curious whether there is a construct in PowerShell that does this.
I know you can do this:
$arr = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
$arr = $arr | Get-Unique
But it seems like, performance-wise, it would be better to ignore the value as you enter it into the array instead of filtering duplicates out after the fact.
If you are inserting a large number of items (thousands) into an array, performance does drop, because the array has to be recreated every time you add to it. So in your case it may be better, performance-wise, to use something else.
A Dictionary or HashTable could be one way. Your single-dimensional unique array can then be retrieved with $hash.Keys. For example:
$hash = @{}
$hash.Set_Item(1,1)
$hash.Set_Item(2,1)
$hash.Set_Item(1,1)
$hash.Keys
1
2
If you use Set_Item, the key will be created or updated but never duplicated. Put anything you like for the value if you're not using it, but maybe you'll need a value for your problem too.
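Along the same lines, a minimal sketch using a generic HashSet, which stores only unique values and needs no dummy value (assuming PowerShell 5 or later for ::new()):

```powershell
# A HashSet silently ignores values that are already present.
$set = [System.Collections.Generic.HashSet[int]]::new()
foreach ($n in 1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4) {
    # Add() returns $true for a new value and $false for a duplicate; [void] discards it.
    [void] $set.Add($n)
}
$set.Count   # 4
```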
You could also use an Arraylist:
Measure-Command -Expression {
    $bigarray = $null
    $bigarray = [System.Collections.ArrayList]@()
    $bigarray = (1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
    $bigarray | select -Unique
}
Time passed:
TotalSeconds : 0,0006581
TotalMilliseconds : 0,6581
Measure-Command -Expression {
    $array = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
    $array | select -Unique
}
Time passed:
TotalSeconds : 0,0009261
TotalMilliseconds : 0,9261
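Note that in the first measurement $bigarray is reassigned on the next line, so the ArrayList is replaced by a regular array before Select-Object runs. A sketch that really skips duplicates while adding, using a generic List (PowerShell 5+ for ::new()) and checking Contains() before each Add(), looks like this; Contains() scans the whole list, so it is only suitable for small collections:

```powershell
$list = [System.Collections.Generic.List[int]]::new()
foreach ($n in 1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4) {
    # Contains() is a linear scan, so each insert costs O(n).
    if (-not $list.Contains($n)) { $list.Add($n) }
}
$list   # 1, 2, 3, 4 in insertion order
```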

Array.Find and IndexOf for multiple elements that are exactly the same object

I'm having trouble getting the index of the current element when multiple elements are exactly the same object:
$b = "A","D","B","D","C","E","D","F"
$b | ? { $_ -contains "D" }
Alternative version:
$b = "A","D","B","D","C","E","D","F"
[Array]::FindAll($b, [Predicate[String]]{ $args[0] -contains "D" })
This will return:
D
D
D
But this code:
$b | % { $b.IndexOf("D") }
Alternative version:
[Array]::FindAll($b, [Predicate[String]]{ $args[0] -contains "D" }) | % { $b.IndexOf($_) }
Returns:
1
1
1
So it's pointing at the index of the first element. How can I get the indexes of the other elements?
You can do this:
$b = "A","D","B","D","C","E","D","F"
(0..($b.Count-1)) | where {$b[$_] -eq 'D'}
1
3
6
mjolinor's answer is conceptually elegant, but slow with large arrays, presumably due to having to build a parallel array of indices first (which is also memory-inefficient).
It is conceptually similar to the following LINQ-based solution (PSv3+), which is more memory-efficient and about twice as fast, but still slow:
$arr = 'A','D','B','D','C','E','D','F'
[Linq.Enumerable]::Where(
[Linq.Enumerable]::Range(0, $arr.Length),
[Func[int, bool]] { param($i) $arr[$i] -eq 'D' }
)
While any PowerShell looping solution is ultimately slow compared to a compiled language, the following alternative, while more verbose, is still much faster with large arrays:
PS C:\> & { param($arr, $val)
$i = 0
foreach ($el in $arr) { if ($el -eq $val) { $i } ++$i }
} ('A','D','B','D','C','E','D','F') 'D'
1
3
6
Note:
Perhaps surprisingly, this solution is even faster than Matt's solution, which calls [array]::IndexOf() in a loop instead of enumerating all elements.
Use of a script block (invoked with the call operator & and arguments), while not strictly necessary, prevents polluting the enclosing scope with the helper variable $i.
The foreach statement is faster than the ForEach-Object cmdlet (whose built-in aliases are % and, confusingly, also foreach).
Simply (implicitly) outputting $i for each match makes PowerShell collect multiple results in an array.
If only one index is found, you'll get a scalar [int] instance instead; wrap the whole command in @(...) to ensure that you always get an array.
While $i by itself outputs the value of $i, ++$i by design does NOT (though you could use (++$i) to achieve that, if needed).
Unlike Array.IndexOf(), PowerShell's -eq operator is case-insensitive by default; for case-sensitivity, use -ceq instead.
It's easy to turn the above into a (simple) function (note that the parameters are purposely untyped, for flexibility):
function get-IndicesOf($Array, $Value) {
$i = 0
foreach ($el in $Array) {
if ($el -eq $Value) { $i }
++$i
}
}
# Sample call
PS C:\> get-IndicesOf ('A','D','B','D','C','E','D','F') 'D'
1
3
6
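Since -eq is case-insensitive, a hypothetical extension of this function could expose a -CaseSensitive switch that uses -ceq instead; this is a sketch, not part of the original answer:

```powershell
# Hypothetical variant of get-IndicesOf with an opt-in -CaseSensitive switch.
function Get-IndicesOf {
    param($Array, $Value, [switch] $CaseSensitive)
    $i = 0
    foreach ($el in $Array) {
        # -ceq is the case-sensitive counterpart of -eq
        $isMatch = if ($CaseSensitive) { $el -ceq $Value } else { $el -eq $Value }
        if ($isMatch) { $i }
        ++$i
    }
}

@(Get-IndicesOf ('a','D','A','d') 'D')                 # 1, 3
@(Get-IndicesOf ('a','D','A','d') 'D' -CaseSensitive)  # 1
```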
You would still need to loop with the static methods from [array], but if you're curious, something like this would work:
$b = "A","D","B","D","C","E","D","F"
$results = @()
$singleIndex = -1
Do {
    $singleIndex = [array]::IndexOf($b,"D",$singleIndex + 1)
    If ($singleIndex -ge 0) { $results += $singleIndex }
} While ($singleIndex -ge 0)
$results
1
3
6
Loop until no match is found. Start by setting $singleIndex to -1 (which is what a non-match returns), so that the first search begins at index 0. Each time a match is found, add its index to the results array.

In Powershell how can I check if all items from one array exist in a second array?

So let's say I have this array:
$requiredFruit= @("apple","pear","nectarine","grape")
And I'm given a second array called $fruitIHave. How can I check that $fruitIHave has everything in $requiredFruit? It doesn't matter if there are more items in $fruitIHave, as long as everything in $requiredFruit is there.
I know I could just iterate over the list, but that seems inefficient. Is there a built-in method for doing this?
Have you tried Compare-Object?
$requiredFruit= @("apple","pear","nectarine","grape")
$HaveFruit= @("apple","pin","nectarine","grape")
Compare-Object $requiredFruit $haveFruit
InputObject SideIndicator
----------- -------------
pin =>
pear <=
Compare-Object $requiredFruit $haveFruit | where {$_.sideindicator -eq "<="} | % {$_.inputobject}
pear
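To turn the Compare-Object output into a yes/no answer, one sketch is to collect the '<=' entries (items that only appear in the reference array) and check that none are left. Wrapping the pipeline in @(...) makes .Count reliable even for a single result:

```powershell
$requiredFruit = @("apple","pear","nectarine","grape")
$haveFruit     = @("apple","pin","nectarine","grape")
# '<=' marks items found only in the reference (first) array, i.e. missing fruit.
$missing = @(Compare-Object $requiredFruit $haveFruit |
    Where-Object SideIndicator -eq '<=' |
    ForEach-Object InputObject)
$hasAll = $missing.Count -eq 0
$hasAll   # False, because 'pear' is missing
```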
If you have the arrays:
$requiredFruit= @("apple","pear","nectarine","grape")
$someFruit= @("apple","banana","pear","nectarine","orange","grape")
$moreFruit= @("apple","banana","nectarine","grape")
You can get a boolean result with:
'Check $someFruit for $requiredFruit'
-not @($requiredFruit| where {$someFruit -notcontains $_}).Count
'Check $moreFruit for $requiredFruit'
-not @($requiredFruit| where {$moreFruit -notcontains $_}).Count
Using the count of an array protects against a wrong result when a single missing value is itself falsy (such as 0): without @(...).Count, -not is applied to that value instead of to the number of missing items. For example:
# Incorrect result
-not (0| where {(1,2) -notcontains $_})
# Correct result
-not @(0| where {(1,2) -notcontains $_}).Count
With PowerShell v3, you can use select -first 1 to stop the pipeline when the first mismatch is found (in v2 select -first 1 allows only one object through, but previous elements of the pipeline continue to process).
-not @($requiredFruit| where {$moreFruit -notcontains $_}| select -first 1).Count
Not exactly "builtin" but:
[regex] $RF_regex = '(?i)^(' + (($requiredFruit | foreach {[regex]::escape($_)}) -join "|") + ')$'
($fruitIHave -match $RF_regex).count -eq $requiredFruit.count
That creates an alternation regex from the elements of $requiredFruit. Matched against $fruitIHave, it returns all the items that matched. If $fruitIHave could contain duplicates of the same fruit, you may need to de-duplicate the match result (for example with Select-Object -Unique; Get-Unique only removes adjacent duplicates, so it needs sorted input) before you do the count. It may be slower than iterating over the list for a single comparison, but once you have the regex built, it will do repetitive matches very efficiently.
One way or the other, you're going to have to iterate through one or both arrays. Here's a one-liner approach:
$hasAllRequiredFruit = @($requiredFruit | Where-Object { $fruitIHave -contains $_ }).Count -eq $requiredFruit.Count;
A foreach loop would be better because you can stop iterating as soon as you find a required fruit that is missing:
$hasAllRequiredFruit = $true;
foreach ($f in $requiredFruit)
{
if ($fruitIHave -notcontains $f)
{
$hasAllRequiredFruit = $false;
break;
}
}
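If you're comfortable using .NET's generic collections, a sketch using HashSet.IsSubsetOf() expresses the check directly. Using StringComparer.OrdinalIgnoreCase is an assumption here, chosen so the comparison is case-insensitive like -contains (PowerShell 5+ for ::new()):

```powershell
$requiredFruit = @("apple","pear","nectarine","grape")
$fruitIHave    = @("apple","banana","pear","nectarine","orange","grape")
# Build a case-insensitive set from the required items.
$required = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] $requiredFruit,
    [System.StringComparer]::OrdinalIgnoreCase)
# IsSubsetOf() is true when every element of $required also appears in $fruitIHave.
$hasAllRequiredFruit = $required.IsSubsetOf([string[]] $fruitIHave)
$hasAllRequiredFruit   # True
```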
