Find missing rows in Powershell 2D arrays

I have 2 arrays, and each array has 2 fields ('item' and 'price' for example).
Here is the Get-Member output for one of the arrays (both arrays have the same structure):
   TypeName: System.Management.Automation.PSCustomObject
Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
item        NoteProperty System.String field1=computer
price       NoteProperty System.String field2=2000
I need to find the items in array $shopA that are not found in array $shopB. I am currently using 2 loops to find the missing items.
$missing = @()
foreach ($itemA in $shopA) {
    $found = 0
    foreach ($itemB in $shopB) {
        if ($itemB.item -eq $itemA.item) {
            $found = 1
        }
    }
    # -eq compares; a single = would assign 0 and the test would never pass
    if ($found -eq 0) {
        $missing += $itemA
    }
}
This method works for me, but my 2 arrays are quite large and I want something quicker than looping through the whole array...
I have been looking for a better way to do this; Compare-Object almost does the job, but all the examples seem to work on single-dimension arrays only.
Thanks

From what I can see you do have two 1D arrays, despite you claiming the opposite.
A naïve way of finding the missing items would be
$missing = $shopA | ? { $x = $_; !($shopB | ? {$_.item -eq $x.item})}
However, this will always be O(n²). To make it quicker, you can collect all items from $shopB in a hashtable first, which makes checking for existence O(1) instead of O(n):
$hash = @{}
$shopB | %{ $hash[$_.item] = 1 }
$missing = $shopA | ?{ !$hash.ContainsKey($_.item) }
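As a rough, self-contained sketch of the hashtable approach, the snippet below runs it against two small sample arrays. The item names and prices are invented for illustration, and $inShopB is just a name picked for the lookup table; only the 'item' and 'price' property names come from the question.

```powershell
# Sample data; the items and prices are made up for this sketch.
$shopA = @(
    [PSCustomObject]@{ item = 'computer'; price = '2000' }
    [PSCustomObject]@{ item = 'monitor';  price = '300' }
    [PSCustomObject]@{ item = 'keyboard'; price = '50' }
)
$shopB = @(
    [PSCustomObject]@{ item = 'computer'; price = '1900' }
    [PSCustomObject]@{ item = 'mouse';    price = '20' }
)

# Index $shopB by item name once; each later lookup is a single key check.
$inShopB = @{}
foreach ($entry in $shopB) { $inShopB[$entry.item] = $true }

# Keep the rows of $shopA whose item is not a key in the index.
$missing = @($shopA | Where-Object { -not $inShopB.ContainsKey($_.item) })

$missing | ForEach-Object { $_.item }
```

With this sample data the result contains the 'monitor' and 'keyboard' rows. A hashtable created with @{} compares string keys case-insensitively, which matches the behaviour of -eq in the original loops.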

Something like this?
$shopA = @(1, 2, 3, 4)
$shopB = @(4, 3, 5, 6)
$shopB | where { $shopA -notcontains $_ }

Related

Query PSCustomObject Array for row with largest value

I'm trying to find the row with an attribute that is larger than the other row's attributes. Example:
$Array
Name  Value
----  -----
test1   105
test2   101
test3   512  <--- Selects this row as it is the largest value
Here is my attempt to do this in one line, but it doesn't work.
$Array | % { If($_.value -gt $Array[0..($Array.Count)].value){write-host "$_.name is the largest row"}}
Currently it outputs nothing.
Desired Output:
"test3 is the largest row"
I'm having trouble visualizing how to do this efficiently without some serious spaghetti code.

You could take advantage of Sort-Object to rank them by the property "Value" like this
$array = @(
    [PSCustomObject]@{Name='test1';Value=105}
    [PSCustomObject]@{Name='test2';Value=101}
    [PSCustomObject]@{Name='test3';Value=512}
)
$array | Sort-Object -Property value -Descending | Select-Object -First 1
Output
Name Value
---- -----
test3 512
To incorporate your Write-Host call, pipe the selected object through ForEach-Object:
$array | Sort-Object -Property value -Descending |
    Select-Object -First 1 | ForEach-Object {Write-Host $_.name,"has the highest value"}
test3 has the highest value
Or capture to a variable
$Largest = $array | Sort-Object -Property value -Descending | Select-Object -First 1
Write-host $Largest.name,"has the highest value"
test3 has the highest value

PowerShell has many built in features to make tasks like this easier.
If this is really an array of PSCustomObjects you can do something like:
$Array =
@(
    [PSCustomObject]@{ Name = 'test1'; Value = 105 }
    [PSCustomObject]@{ Name = 'test2'; Value = 101 }
    [PSCustomObject]@{ Name = 'test3'; Value = 512 }
)
$Largest = ($Array | Sort-Object Value)[-1].Name
Write-host $Largest,"has the highest value"
This sorts the array by the Value property, references the last element with the [-1] syntax, and returns the Name property of that object.
Or if you're a purist you can assign the variable like:
$Largest = $Array | Sort-Object Value | Select-Object -Last 1 -ExpandProperty Name
If you want the whole object just remove .Name & -ExpandProperty Name respectively.
Update:
As noted PowerShell has some great tools to help with common tasks like sorting & selecting data. However, that doesn't mean there's never a need for looping constructs. So, I wanted to make a couple of points about the OP's own answer.
First, if you do need to reference array elements by index use a traditional For loop, which might look something like:
For( $i = 0; $i -lt $Array.Count; ++$i )
{
    If( $array[$i].Value -gt $LargestValue )
    {
        $LargestName = $array[$i].Name
        $LargestValue = $array[$i].Value
    }
}
$i is commonly used as an iteration variable, and within the script block is used as the array index.
Second, even the traditional loop is unnecessary in this case. You can stick with the ForEach loop and track the largest value as and when it's encountered. That might look something like:
ForEach( $Row in $array )
{
    If( $Row.Value -gt $LargestValue )
    {
        $LargestName = $Row.Name
        $LargestValue = $Row.Value
    }
}
Strictly speaking you don't need to assign the variables beforehand, though it may be a good practice to precede either of these with:
$LargestName = ""
$LargestValue = 0
In these examples you'd have to follow with a slightly modified Write-Host command
Write-host $LargestName,"has the highest value"
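Put together, a minimal runnable sketch of the ForEach variant might look like the following. It is one possible arrangement rather than the only one: it keeps the whole row in a single $Largest variable and seeds it with the first row instead of 0, which is an assumption made here so that data containing only negative values would still work.

```powershell
# Sample rows matching the table in the question.
$Array = @(
    [PSCustomObject]@{ Name = 'test1'; Value = 105 }
    [PSCustomObject]@{ Name = 'test2'; Value = 101 }
    [PSCustomObject]@{ Name = 'test3'; Value = 512 }
)

# Seed with the first row, then replace it whenever a larger Value shows up.
$Largest = $Array[0]
foreach ($Row in $Array) {
    if ($Row.Value -gt $Largest.Value) {
        $Largest = $Row
    }
}

Write-Host $Largest.Name, "has the highest value"
```

This makes a single pass over the data and does not sort anything, which can matter when the array is large.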
Note: Borrowed some of the test code from Doug Maurer's Fine Answer. Considering our answers were similar, this was just to make my examples more clear to the question and easier to test.

Figured it out, hopefully this isn't awful:
$Count = 1
$CurrentLargest = 0
Foreach($Row in $Array) {
    # Compare this iteration vs the next to find the largest
    If($Row.value -gt $Array.Value[$Count]){$CurrentLargest = $Row}
    Else {$CurrentLargest = $Array[$Count]}
    # Replace the existing largest value with the new one if it is larger than it.
    If($CurrentLargest.Value -gt $Largest.Value){ $Largest = $CurrentLargest }
    $Count += 1
}
Write-host $Largest.name,"has the highest value"
Edit: it's awful, look at the other answers for a better way.

Efficient way to remove duplicates from large 2D arrays in PowerShell

I have a large data set of roughly 10 million items that I need to process quickly, removing duplicate items based on two of the six columns.
I have tried grouping and sorting items but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSerialID,ComputerID
# Take the oldest record from each group
$p2 = foreach ($group in $p1) {
    $group.Group | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12

Your question is a little hard to read and could use some cleanup, but I'll try to answer as best I can based on what I understand you're trying to do.
Unfortunately, with this much data no approach will be instant: every row has to be read and its key fields compared at least once, and sorting 10 million rows adds to that.
(Honestly, if this were me, I'd just use Export-Csv on $object and perform this operation in Excel. For a one-off task, the time spent scripting it just wouldn't be worth it.)
By "items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You already have the basic idea with Sort-Object and Select-Object; you can apply it to the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download the Join-Object command from the PowerShell Gallery with Install-Script -Name Join and use it in the way described. If you want to do this step yourself, the easiest way would be to concatenate the two tables and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique

Does this do it? It keeps the one it finds first.
$test | Sort-Object -Unique -Property ComputerSerialID, ComputerID

I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of each record and add the result to a hash.
If the concatenated text already exists in the hash, the record is not added to the array to be returned.
Function DeDupe_Array
{
    param
    (
        $Data
    )
    $Return_Array = @()
    $Check_Hash = @{}
    Foreach($Line in $Data)
    {
        $Concatenated = ''
        $Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
        foreach($Element in $Elements)
        {
            $Concatenated += $line.$Element
        }
        If($Check_Hash.$Concatenated -ne 1)
        {
            $Check_Hash.add($Concatenated,1)
            $Return_Array += $Line
        }
    }
    return $Return_Array
}

Try the following script.
It should be about as fast as possible, because it avoids any piping in PowerShell.
$hashT = @{}
foreach ($item in $csvData) {
    # Build the hash table key from the two columns that define a duplicate
    $key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID
    # Store the item if $key is new, or if its FirstObserved is earlier than the stored one
    # (the string comparison is only valid for dates in a sortable format such as ISO 8601)
    if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
        $hashT[$key] = $item
    }
}
$result = $hashT.Values
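To see the keep-the-oldest behaviour on something small, here is a hedged sketch with three invented rows. Only the property names are taken from the question; the values are made up. As a variation on the script above, it casts FirstObserved to [datetime] before comparing, which is an assumption that the column always holds parseable timestamps.

```powershell
# Invented sample rows; two of them share the same ComputerSerialID/ComputerID pair.
$csvData = @(
    [PSCustomObject]@{ ComputerSerialID = 4; ComputerID = 2; FirstObserved = '2019-06-03T20:29:01' }
    [PSCustomObject]@{ ComputerSerialID = 4; ComputerID = 2; FirstObserved = '2019-06-01T08:00:00' }
    [PSCustomObject]@{ ComputerSerialID = 8; ComputerID = 6; FirstObserved = '2019-06-03T20:29:01' }
)

$oldest = @{}
foreach ($row in $csvData) {
    $key = '{0}###{1}' -f $row.ComputerSerialID, $row.ComputerID
    # Compare as dates rather than as strings.
    $seen = [datetime]$row.FirstObserved
    if ((-not $oldest.ContainsKey($key)) -or ($seen -lt [datetime]$oldest[$key].FirstObserved)) {
        $oldest[$key] = $row
    }
}

$result = @($oldest.Values)
$result.Count
```

Two rows remain, and the one kept for the duplicated pair is the record first observed on 2019-06-01. The order of $oldest.Values is not guaranteed, so sort the result afterwards if order matters.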

PowerShell - Create an array that ignores duplicate values

I'm curious whether there is a construct in PowerShell that does this.
I know you can do this:
$arr = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
$arr = $arr | Get-Unique
But it seems like it would be better, performance-wise, to ignore a duplicate value as you are entering it into the array instead of filtering it out after the fact.

If you are inserting a large number of items into an array (thousands), performance does drop, because a new array has to be created every time you add to it, so it may be better in your case to use something else.
A Dictionary or Hashtable could be a way. Your single-dimensional unique array can then be retrieved with $hash.Keys. For example:
$hash = @{}
$hash.Set_Item(1,1)
$hash.Set_Item(2,1)
$hash.Set_Item(1,1)
$hash.Keys
1
2
If you use Set_Item, the key will be created or updated but never duplicated. Put anything for the value if you're not using it, though you may find a use for the value in your problem too.
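If the value side of the hashtable is not needed at all, a .NET HashSet is a close alternative. This is a different collection from the one used above, shown only as a sketch; the variable names are arbitrary.

```powershell
# The same values as in the question.
$values = 1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4

# A HashSet ignores values it already holds; Add() returns $false for those.
$set = New-Object 'System.Collections.Generic.HashSet[int]'
foreach ($n in $values) {
    [void]$set.Add($n)
}

# Copy the distinct values out into a regular array when needed.
$unique = @($set)
$unique.Count
```

The [void] cast discards the Boolean that Add() returns, so it does not leak into the output.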

You could also use an Arraylist:
Measure-Command -Expression {
    $bigarray = $null
    $bigarray = [System.Collections.ArrayList]@()
    $bigarray = (1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
    $bigarray | select -Unique
}
Time passed:
TotalSeconds : 0,0006581
TotalMilliseconds : 0,6581
Measure-Command -Expression {
    $array = @(1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
    $array | select -Unique
}
Time passed:
TotalSeconds : 0,0009261
TotalMilliseconds : 0,9261

Quick File to Hashtable in PowerShell

Given an array of key-value pairs (for example read in through ConvertFrom-StringData), is there a streamlined way of turning this into a Hashtable or similar to allow quick lookup? I.e. a way not requiring me to loop through the array and manually build up the hashtable myself.
Example data
10.0.0.1=alice.example.com
10.0.0.2=bob.example.com
Example usage
$names = gc .\data.txt | ConvertFrom-StringData
# $names is now Object[]
$map = ?
# $map should now be a Hashtable or equivalent
echo $map['10.0.0.2']
# Output should be bob.example.com
Basically what I'm looking for is a, preferably, built-in file-to-hashtable function. Or an array-to-hashtable function.
Note: As @mjolnior explained, I actually got hash tables, just an array of single-entry ones. Reading the file with -Raw fixed this, so no array-to-hashtable conversion was needed. I updated the question title to match.

ConvertFrom-StringData does create a hash table.
You need to give it the key-value pairs as a single multi-line string (not a string array):
$map = Get-Content -raw .\data.txt | ConvertFrom-StringData
$map['10.0.0.2']
bob.example.com
When you use Get-Content without the -Raw switch, you're giving ConvertFrom-StringData an array of single-line strings, and it's giving you back an array of single-element hash tables:
$map = Get-Content .\data.txt | ConvertFrom-StringData
$map.gettype()
$Map[0].GetType()
$map[0]
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array
True     True     Hashtable                                System.Object
Key   : 10.0.0.1
Value : alice.example.com
Name  : 10.0.0.1
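The difference can be reproduced without a file. In this small sketch the two lines from the question are held in a string array ($lines and $perLine are names chosen for the example), and joining them with a newline stands in for what -Raw does:

```powershell
# The example data from the question, as an array of single-line strings.
$lines = '10.0.0.1=alice.example.com', '10.0.0.2=bob.example.com'

# Piped line by line: one single-entry hashtable per input string.
$perLine = @($lines | ConvertFrom-StringData)

# Joined into one multi-line string first: a single hashtable with both entries.
$map = ($lines -join "`n") | ConvertFrom-StringData

$perLine.Count
$map['10.0.0.2']
```

$perLine.Count is 2 (two separate hashtables), while $map['10.0.0.2'] returns bob.example.com.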

I usually do the following to create a hashtable from a list of key/value pairs:
$hash = @{}
Get-Content 'C:\input.txt' | Where-Object {
    $_ -like '*=*'
} | ForEach-Object {
    $key, $value = $_ -split '\s*=\s*', 2
    $hash[$key] = $value
}

This might not be what you're looking for, but it avoids converting the whole thing into a hash.
$content = @("10.0.0.1=alice.example.com","10.0.0.2=bob.example.com");
$content | ForEach-Object {
    $keyval = $_.split("=");
    if ($keyval[0] -eq "10.0.0.2") {
        $keyval[1]
    }
}
The output will be every value on the right side of the = where the left side matches that IP.

PowerShell function for adding elements to an array

I'm still quite new to PowerShell and am trying to create a few functions that work together to create and manage an array. I'm having trouble getting one of them to work as intended.
I need the second function (AddToArray) to add an element at the specified index. None of the existing elements can be overwritten or removed.
For example, if I have an array with four elements that all have the value 5 and I call AddToArray 2 4, I need the function to write 4 at the third index and move the existing elements one step down, so the array now looks like this:
5
5
4
5
5
This is my code so far that shows my CreateArray function and the little code piece for AddToArray function. I've been trying for a while now, but I just can't see the solution.
function CreateArray($Item1, $Item2)
{
    $arr = New-Object Array[] $Item1;
    # Checks whether $Item2 was given a value and writes it into the array
    if ($Item2)
    {
        for($i = 0; $i -lt $arr.length; $i++)
        {
            $arr[$i] = $Item2;
        }
    }
    # Default value for the array's elements if no value is given when the function is called
    else
    {
        $Item2 = "Hi $env:username and welcome to our script!";
        for($i = 0; $i -lt $arr.length; $i++)
        {
            $arr[$i] = $Item2;
        }
    }
    $script:MainArray = $arr;
}
function AddToArray ($index, $add)
{
    $MainArray[$index] = $add;
}

Arrays in .NET don't directly support insertion and they are normally fixed size. PowerShell does allow for easy array resizing but if the array gets large and you're appending (causing a resize) a lot, the performance can be bad.
One easy way to do what you want is to create a new array from the pieces e.g.:
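if ($index -eq 0) {
    # Prepend: + concatenates, whereas a comma would nest the old array
    $MainArray = @($add) + $MainArray
}
elseif ($index -ge $MainArray.Count) {
    # Index is past the last element: append
    $MainArray += $add
}
else {
    # Elements before the index, the new element, then the rest
    $MainArray = @($MainArray[0..($index-1)]) + $add + $MainArray[$index..($MainArray.Length-1)]
}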
But that is kind of a spew. I would use a List for this, which supports insertion and is more efficient than an array.
$list = new-object 'System.Collections.Generic.List[object]'
$list.AddRange((1,2,3,4,5))
$list.Insert(2,10)
$list
And if you really need an array, call the $list.ToArray() method when you're done manipulating the list.
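Applied to the question, a hypothetical rewrite of the two functions on top of a List could look roughly like this. The function names are kept from the question; the $script:MainList variable and the parameter names are assumptions made for the sketch.

```powershell
# Creates a list of $Size elements, each set to $Value.
function CreateArray($Size, $Value) {
    $script:MainList = New-Object 'System.Collections.Generic.List[object]'
    for ($i = 0; $i -lt $Size; $i++) {
        $script:MainList.Add($Value)
    }
}

# Inserts $Add at $Index; the element at $Index and everything after it
# move one position down, and nothing is overwritten.
function AddToArray($Index, $Add) {
    $script:MainList.Insert($Index, $Add)
}

CreateArray 4 5
AddToArray 2 4

# Convert back to a plain array once the edits are done.
$result = $script:MainList.ToArray()
$result -join ','
```

This prints 5,5,4,5,5, matching the example in the question.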

Arrays don't have an .insert() method, but collections do. An easy way to produce a collection from an array is to use the .invoke() method of scriptblock:
$array = 5,5,4,5,5
$collection = {$array}.invoke()
$collection
$collection.GetType()
5
5
4
5
5
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Collection`1                             System.Object
Now you can use the .insert() method to insert an element at an arbitrary index:
$collection.Insert(2,3)
$collection
5
5
3
4
5
5
If you need it to be an array again, an easy way to convert it back to an array is to use the pipeline:
$collection | set-variable array
$array
$array.GetType()
5
5
3
4
5
5
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array
