PowerShell: Remove duplicated items from array

PowerShell: Remove duplicated items from array - arrays

I am trying to locate discrepancies in BIND DNS records. I would like to output a CSV file that only has those discrepancies. I have a CSV file that has all records from all locations in BIND (ns.prvt, ns.pub, common, includes). What I'm trying to figure out is how to output a CSV that only shows the discrepancies. For 2 records to be considered a discrepancy, they must meet the following criteria:
Both records have the same RecordName and RecordType.
Both records have different Data or TTL.
Both records come from different locations.
I am almost there with the following script but it keeps showing me a couple of rows that don't necessarily meet the above criteria.
$Records = Import-Csv C:\Temp\Domain_ALL.csv | Select * | Sort Data,Location
$RecordsRev = #()
$Records | % {
$Record = $_
$Records | % {
$DataFE = $_
If (
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).RecordName -eq $DataFE.RecordName) -and
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).RecordType -eq $DataFE.RecordType) -and
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).Location -ne $DataFE.Location) -and
(([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).Data -ne $DataFE.Data) -or
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).TTL -ne $DataFE.TTL))
) {
$RecordsRev += $_
}
}
}
$RecordsRev | Export-Csv C:\Temp\Domain_Discrepancies.csv -NoType
The results that I get are:
RecordName RecordType Data TTL Location
---------- ---------- ---- --- --------
domain.com TXT "MS=abc1234566" 600 Includes
domain.com TXT "MS=abc1234566" 600 Common
domain.com TXT "site-verification=abcd1234" 600 Includes
domain.com TXT "site-verification=abcd1234" 600 Common
www CNAME somedomain.com.test. 600 Includes
www CNAME somedomain.com. 600 Common
The results that I expect are:
RecordName RecordType Data TTL Location
---------- ---------- ---- --- --------
www CNAME somedomain.com.test. 600 Includes
www CNAME somedomain.com. 600 Common
How do I delete all duplicated rows in the array? This is different from "Select * -unique" as I don't want to keep any row that contains the duplicated information.
EDIT: I think the main problem is that, since the script checks each record against every record in the CSV, it technically is a discrepancy. For example, in the below table, record 1 meets the criteria to be a discrepancy because it differs from record 4. However, since record 1 is the same as record 2, it should actually be omitted from the results.
RecordNumber RecordName RecordType Data TTL Location
------------ ---------- ---------- ---- --- --------
1 domain.com TXT "MS=abc1234566" 600 Includes
2 domain.com TXT "MS=abc1234566" 600 Common
3 domain.com TXT "site-verification=abcd1234" 600 Includes
4 domain.com TXT "site-verification=abcd1234" 600 Common
5 www CNAME somedomain.com.test. 600 Includes
6 www CNAME somedomain.com. 600 Common
Any help would be greatly appreciated.
Kyle

I was able to figure this out with the help of someone who deleted their post... Here is the script that I am using now to find all records that meet ALL of the following criteria:
Both records have the same RecordName and RecordType. -AND
Both records have different Data or TTL. -AND
Both records come from different locations.
$Records = Import-Csv C:\Temp\Domain_ALL.csv | Select * | Sort Data,Location
$Discrepancies = #()
$GoodRecords = #()
$BadRecords = #()
$Records | ForEach-Object {
# for each record $_, compare it against every other record..
foreach ($R in $Records) {
# if Both records have the same RecordName and RecordType..
if (($_.RecordName -eq $R.RecordName) -and ($_.RecordType -eq $R.RecordType)) {
# and if Both records come from different locations..
if ($_.Location -ne $R.Location) {
# if Both records have the same Data and TTL then they are considered good:
if (($_.Data -eq $R.Data) -and ($_.TTL -eq $R.TTL)) {
$GoodRecords += $_
}
Else{
# if Both records have different Data or TTL then they are considered bad:
$BadRecords += $_
}
}
}
}
}
ForEach ($BadRecord in $BadRecords){
If (($GoodRecords -notcontains $BadRecord)){
$Discrepancies += $BadRecord
}
}
$Discrepancies | Select * -Unique | Sort RecordName,Location,Data | ft

Related

Efficient way to remove duplicates from large 2D arrays in PowerShell

I have a large set of data roughly 10 million items that I need to process efficiently and quickly removing duplicate items based on two of the six column headers.
I have tried grouping and sorting items but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSeriaID,ComputerID
$p2 = foreach ($object in $p1.group) {
$object | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12

You might want to clean up your question a little bit, because it's a little bit hard to read, but I'll try to answer the best I can with what I can understand about what you're trying to do.
Unfortunately, with so much data there's no way to do this quickly. String Comparison and sorting are done by brute force; there is no way to reduce the complexity of comparing each character in one string against another any further than measuring them one at a time to see if they're the same.
(Honestly, if this were me, I'd just use export-csv $object and perform this operation in excel. The time tradeoff to scripting something like this only once just wouldn't be worth it.)
By "Items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You've already got the basic idea of select-object down, you can do that for the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download that Join-Object command from the powershell gallery with Install-Script -Name Join and use it in the way described. If you want to do this step yourself, the easiest way would be to squish the two tables together and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique

Does this do it? It keeps the one it finds first.
$test | sort -u ComputerSeriaID, ComputerID

I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of the record, add this to a hash.
If the concatenate text already exists in the hash, don't add it to the array to be returned.
Function DeDupe_Array
{
param
(
$Data
)
$Return_Array = #()
$Check_Hash = #{}
Foreach($Line in $Data)
{
$Concatenated = ''
$Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
foreach($Element in $Elements)
{
$Concatenated += $line.$Element
}
If($Check_Hash.$Concatenated -ne 1)
{
$Check_Hash.add($Concatenated,1)
$Return_Array += $Line
}
}
return $Return_Array
}

Try the following script.
Should be as fast as possible due to avoiding any pipe'ing in PS.
$hashT = #{}
foreach ($item in $csvData) {
# Building hash table key
$key = '{0}###{1}' -f $item.ComputerSeriaID, $item.ComputerID
# if $key doesn't exist yet OR when $key exists and "FirstObserverd" is less than existing one in $hashT (only valid when date provided in sortable format / international format)
if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
$hashT[$key] = $item
}
}
$result = $hashT.Values

Splitting groups of CSV ROW's into separate CSV's OR Variables

$groups = Get-Content c:\devices.csv | Group {$_.Substring(0,3)}| %{$_.Group; ""}
AAAGroup1,192.168.1.1
AAAGroup1,192.168.1.2
BBBGroup2,192.168.2.1
BBBGroup2,192.168.2.2
CCCGroup3,192.168.3.1
CCCGroup3,192.168.3.2
I have searched far and wide and can only find solutions based on column selections To output the above data to either a separate variable per group or separate CSV file. Technically, there's a space in between each group in terms of a spare row, that's it as far as i can see.
$groups = Get-Content c:\devices.csv | Group {$_.Substring(0,3)}| %{$_.Group; ""}
ForEach ($Group in $Groups)...

Provided you have a file devices.csv with headers Device,IPv4
The following script will add a calculated property named Group you can use to sort, output to a table and -GroupBy or whatever.
## Q:\Test\2018\07\10\SO_51267179.ps1
$data = Import-Csv '.\devices.csv' |
Select-Object Device,IPv4,#{n='Group';e={$_.Device.Substring(0,3)}}
$data | Format-Table -GroupBy Group
Sample output
Group: AAA
Device IPv4 Group
------ ---- -----
AAAGroup1 192.168.1.1 AAA
AAAGroup1 192.168.1.2 AAA
...snip...
To output each group to it's own .csv
$data = Import-Csv '.\devices.csv' |
| Group-Object {$_.Device.Substring(0,3)}| ForEach-Object {
$_.Group | Export-Csv "$($_.Name).csv" -NoTypeInformation
}
Sample output:
> gc .\AAA.csv
"Device","IPv4"
"AAAGroup1","192.168.1.1"
"AAAGroup1","192.168.1.2"

Powershell match properties and then selectively combine objects to create a third

I have a solution for this but I believe it is not the best method as it takes forever so I am looking for a faster/better/smarter way.
I have multiple pscustomObject objects pulled from .csv files. Each object has at least one common property. One is relatively small (around 200-300 items/lines in the object) but the other is sizable (around 60,000-100,000 items). The contents of one may or may not match the contents of the other.
I need to find where the two objects match on a specific property and then combine the properties of each object into one object with all or most properties.
An example snippet of the code (not exact but for this it should work - see the image for the sample data):
DataTables
Write-Verbose "Pulling basic Fruit data together"
$Purchase = import-csv "C:\Purchase.csv"
$Selling = import-csv "C:\Selling.csv"
Write-Verbose "Combining Fruit names and removing duplicates"
$Fruits = $Purchase.Fruit
$Fruits += $Selling.Fruit
$Fruits = $Fruits | Sort-Object -Unique
$compareData = #()
Foreach ($Fruit in $Fruits) {
$IndResults = #()
$IndResults = [pscustomobject]#{
#Adding Purchase and Selling data
Farmer = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Farmer
Region = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Region
Water = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Water
Market = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Market
Cost = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Cost
Tax = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Tax
}
Write-Verbose "Loading Individual results into response"
$CompareData += $IndResults
}
Write-Output $CompareData
I believe the issue is in lines like these:
Farmer = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Farmer
If I understand this it is looking through the $Purchase object each time it goes through this line. I am looking for a way to speed that whole process up instead of having it look through the entire object for each match attempt.

Using this Join-Object:
$Purchase | Join $Selling -On Fruit | Format-Table
Result (using Simon Catlin's data):
Fruit Farmer Region Water Market Cost Tax
----- ------ ------ ----- ------ ---- ---
Apple Adam Alabama 1 MarketA 10 0.1
Cherry Charlie Cincinnati 2 MarketC 20 0.2
Damson Daniel Derby 3 MarketD 30 0.3
Elderberry Emma Eastbourne 4 MarketE 40 0.4
Fig Freda Florida 5 MarketF 50 0.5

using Join-Object
http://ramblingcookiemonster.github.io/Join-Object/
Join-Object -Left $purchase -Right $selling -LeftJoinProperty fruit -RightJoinProperty fruit -Type OnlyIfInBoth | ft

I had this very problem when trying to consolidate employee data from our HR system against employee data in our AD forest. With many thousands of rows, the process was taking an age.
I eventually walked away from custom objects and reverted to old school hash tables.
The hash tables entries themselves then held a sub-hash table with the data. In your instance, the outer hash would be keyed on $fruit, with the sub-hash containing the various attributes, e.g.: farmer, region, Etc.
Hash tables are lightning quick in comparison. It's a shame that PowerShell is slow in this regard.
Shout if you need more info.
26/01 Example code... assuming I'm correctly understanding the requirement:
PURCHASE.CSV:
Fruit,Farmer,Region,Water
Apple,Adam,Alabama,1
Cherry,Charlie,Cincinnati,2
Damson,Daniel,Derby,3
Elderberry,Emma,Eastbourne,4
Fig,Freda,Florida,5
SELLING.CSV
Fruit,Market,Cost,Tax
Apple,MarketA,10,0.1
Cherry,MarketC,20,0.2
Damson,MarketD,30,0.3
Elderberry,MarketE,40,0.4
Fig,MarketF,50,0.5
CODE
[String] $Local:strPurchaseFile = 'c:\temp\purchase.csv';
[String] $Local:strSellingFile = 'c:\temp\selling.csv';
[HashTable] $Local:objFruitHash = #{};
[System.Array] $Local:objSelectStringHit = $null;
[String] $Local:strFruit = '';
if ( (Test-Path -LiteralPath $strPurchaseFile -PathType Leaf) -and (Test-Path -LiteralPath $strSellingFile -PathType Leaf) ) {
#
# Populate data from purchase file.
#
foreach ( $objSelectStringHit in (Select-String -LiteralPath $strPurchaseFile -Pattern '^([^,]+),([^,]+),([^,]+),([^,]+)$' | Select-Object -Skip 1) ) {
$objFruitHash[ $objSelectStringHit.Matches[0].Groups[1].Value ] = #{ 'Farmer' = $objSelectStringHit.Matches[0].Groups[2].Value;
'Region' = $objSelectStringHit.Matches[0].Groups[3].Value;
'Water' = $objSelectStringHit.Matches[0].Groups[4].Value;
};
} #foreach-purchase-row
#
# Populate data from selling file.
#
foreach ( $objSelectStringHit in (Select-String -LiteralPath $strSellingFile -Pattern '^([^,]+),([^,]+),([^,]+),([^,]+)$' | Select-Object -Skip 1) ) {
$objFruitHash[ $objSelectStringHit.Matches[0].Groups[1].Value ] += #{ 'Market' = $objSelectStringHit.Matches[0].Groups[2].Value;
'Cost' = [Convert]::ToDecimal( $objSelectStringHit.Matches[0].Groups[3].Value );
'Tax' = [Convert]::ToDecimal( $objSelectStringHit.Matches[0].Groups[4].Value );
};
} #foreach-selling-row
#
# Output data. At this point, you could now build a PSCustomObject.
#
foreach ( $strFruit in ($objFruitHash.Keys | Sort-Object) ) {
Write-Host -Object ( '{0,-15}{1,-15}{2,-15}{3,-10}{4,-10}{5,10:C}{6,10:P}' -f
$strFruit,
$objFruitHash[$strFruit]['Farmer'],
$objFruitHash[$strFruit]['Region'],
$objFruitHash[$strFruit]['Water'],
$objFruitHash[$strFruit]['Market'],
$objFruitHash[$strFruit]['Cost'],
$objFruitHash[$strFruit]['Tax']
);
} #foreach
} else {
Write-Error -Message 'File error.';
} #else-if

I needed to do this myself for something similar. I wanted to take two system array objects and compare them pulling out the matches without having to manipulate the input data each time. Here's the method I used, which although I appreciate this is inefficient, it was instantaneous for the 200 or so records I had to work with.
I tried to translate what I was doing (users and their old and new home directories) into farmers, fruit and markets etc so I hope it makes sense!
$Purchase = import-csv "C:\Purchase.csv"
$Selling = import-csv "C:\Selling.csv"
$compareData = #()
foreach ($iPurch in $Purchase) {
foreach ($iSell in $Selling) {
if ($iPurch.fruit -match $iSell.fruit) {
write-host "Match found between $($iPurch.Fruit) and $($iSell.Fruit)"
$hash = #{
Fruit = $iPurch.Fruit
Farmer = $iPurch.Farmer
Region = $iPurch.Region
Water = $iPurch.Water
Market = $iSell.Market
Cost = $iSell.Cost
Tax = $iSell.Tax
}
$Build = New-Object PSObject -Property $hash
$Total = $Total + 1
$compareData += $Build
}
}
}
Write-Host "Processed $Total records"

Powershell Array Export not equal operator not working

I have a .csv with a few hundred records that I need to dissect into several different files. There is a part of the code that takes an array of objects and filters the file based on the array. It works great for the part that finds things equal to whats in the array, but when I try to filter based on whats not contained in the array it ignores any version of a "not equal" operator I can find. I think it has something to do with the data type, but can't figure why it would make a difference when the equal operator works.
CSV File
"Number","Ticket","Title","Customer User","CustomerID","Accounted time","Billing"
"1","2014041710000096","Calendar issues","george.jetson","Widget, Inc","0.25","Labor",""
"2","2014041710000087","Redirected Folder permission","jane.smith","Mars Bars, Inc.","1","Labor",""
"3","2014041610000203","Completed with No Files Changed ""QB Data""","will.smith","Dr. Smith","0","Labor",""
PowerShell Code
$msaClients = #("Widget, Inc","Johns Company")
$billingList = import-csv "c:\billing\billed.csv"
$idZero = "0"
$msaArray = foreach ($msa in $msaClients) {$billingList | where-object {$_.CustomerID -eq $msa -and $_."Accounted time" -ne $idZero}}
$laborArray = foreach ($msa in $msaClients) {$billingList | where-object {$_.CustomerID -ne $msa -and $_."Accounted time" -ne $idZero}}
$msaArray | export-csv c:\billing\msa.csv -notypeinformation
$laborArray | export-csv c:\billing\labor.csv -notypeinformation
I have tried all the different logical operators for the not equal and it just seems to ignore that part. I have much more to the code if something doesn't seem right.
What am I missing, and Thanks in advance for any help!

If i understand this correct, you want the values in $msaArray, where $billingList contains customerIDs which are present in $msaClients but their corresponding Accounted time should not be eual to $idzero( 0 in this case)
PS C:\> $msaArray = ($billingList | where {(($msaclients -contains $_.customerid)) -and ($_.'accounted time' -ne $idzero)})
PS C:\> $msaArray | ft -auto
Number Ticket Title Customer User CustomerID Accounted time Billing
------ ------ ----- ------------- ---------- -------------- -------
1 2014041710000096 Calendar issues george.jetson Widget, Inc 0.25 Labor
And for $laborArray, where $billingList does not contain customerIDs which are present in $msaClients and their corresponding Accounted time should not be eual to $idzero as well( 0 in this case)
PS C:\> $laborArray = ($billingList | where {(!($msaclients -contains $_.customerid)) -and ($_.'accounted time' -ne $idZero)})
PS C:\> $laborArray | ft -auto
Number Ticket Title Customer User CustomerID Accounted time Billing
------ ------ ----- ------------- ---------- -------------- -------
2 2014041710000087 Redirected Folder permission jane.smith Mars Bars, Inc. 1 Labor
Your -ne operator is working, but you are looping too many times in $msaclients to get $laborArray.i.e, when $msa = "Widget, Inc", you got "Mars Bars, Inc." as output, but foreach loop ran again and $msa value changed to "Johns Company" and in this case you got "Mars Bars, Inc." and "Widget, Inc" as output too. Hence you ended up with three outputs.

How can I import a .csv into multiple arrays in a simpler way?

I am new to powershell and am writing my first somewhat complicated script. I would like to import a .csv file and create multiple text arrays with it. I think that I have found a way that will work but it will be time consuming to generate all of the lines that I need. I assume I can do it more simply using foreach-object but I can't seem to get the syntax right.
See my current code...
$vmimport = Import-Csv "gss_prod.csv"
$gssall = $vmimport | ForEach-Object {$_.vmName}`
$gssweb = $vmimport | Where-Object {$_.tier -eq web} | ForEach-Object {$_.vmName}
$gssapp = $vmimport | Where-Object {$_.tier -eq app} | ForEach-Object {$_.vmName}
$gsssql = $vmimport | Where-Object {$_.tier -eq sql} | ForEach-Object {$_.vmName}
The goal is to make 1 group with all entries containing only the vmName value, and then 3 separate groups containing only the vmName value but using the tier value to sort them.
Can anyone help me with an easier way to do this?
Thanks!

For the last three you can group the object by the Tier property and have the result as a hasthable. Then you can reference the Tier name to get its VMs.
#group objects by tier
$gs = $vmimport | Group-Object tier -AsHashTable
# get web VMs
$gs['web']
# get sql VMs
$gs['app']

You may want to use a dictionary for storing the data:
$vmimport = Import-Csv "gss_prod.csv"
$gssall = $vmimport | % { $_.vmName }
$categories = "web", "app", "sql", ...
$gss = #{}
foreach ($cat in $categories) {
$gss[$cat] = $vmimport | ? { $_.tier -eq $cat } | % { $_.vmName }
}

I like the Shay Levy way, but the values of hash tables remain hash tables. Here is an other more efficient approach where values are jagged arrays, and categories are made automatically (contrary to Ansgar Wiechers solution):
# define hashtable
$gs = #{};
# fill it
$vmimport | foreach {$gs[$_.tier]+=, $_.vmName};
# get web VMs
$gs['web'] # the result is an array of 'web' vmNames.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight