Quick File to Hashtable in PowerShell - arrays

Given an array of key-value pairs (for example read in through ConvertFrom-StringData), is there a streamlined way of turning this into a Hashtable or similar to allow quick lookup? I.e. a way not requiring me to loop through the array and manually build up the hashtable myself.
Example data
10.0.0.1=alice.example.com
10.0.0.2=bob.example.com
Example usage
$names = gc .\data.txt | ConvertFrom-StringData
// $names is now Object[]
$map = ?
// $map should now be Hashtable or equivalent
echo $map['10.0.0.2']
// Output should be bob.example.com
Basically what I'm looking for is a, preferably, built-in file-to-hashtable function. Or an array-to-hashtable function.
Note: As #mjolnior explained, I actually got hash tables, but an array of single value ones. So this was fixed by reading the file -raw and hence didn't require any array to hashtable conversion. Updated the question title to match that.

Convertfrom-Stringdata does create a hash table.
You need to give it the key-value pairs as a single multi-line string (not a string array)
$map = Get-Content -raw .\data.txt | ConvertFrom-StringData
$map['10.0.0.2']
bob.example.com
When you use Get-Content without the -Raw switch, you're giving ConvertFrom-StringData an array of single-line strings, and it's giving you back an array of single-element hash tables:
$map = Get-Content .\data.txt | ConvertFrom-StringData
$map.gettype()
$Map[0].GetType()
$map[0]
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
True True Hashtable System.Object
Key : 10.0.0.1
Value : alice.example.com
Name : 10.0.0.1

I usually do the following to create a hashtable from a list of key/value pairs:
$hash = #{}
Get-Content 'C:\input.txt' | Where-Object {
$_ -like '*=*'
} | ForEach-Object {
$key, $value = $_ -split '\s*=\s*', 2
$hash[$key] = $value
}

This might not be what you're looking for, but it avoids converting the whole thing into a hash.
$content = #("10.0.0.1=alice.example.com","10.0.0.2=bob.example.com");
$content | ForEach-Object {
$keyval = $_.split("=");
if ($keyval[0] -eq "10.0.0.2") {
$keyval[1]
}
}
The output will be every value on the right side of the = where the left side matches that IP.

Related

Powershell Compare 2 Arrays of Hashtables based on a property value

I have one array of hashtables like the one below:
$hashtable1 = #{}
$hashtable1.name = "aaa"
$hashtable1.surname =#()
$hashtable1.surname += "bbb"
$hashtable2 = #{}
$hashtable2.name = "aaa"
$hashtable2.surname =#()
$hashtable2.surname += "ccc"
$hashtable3 = #{}
$hashtable3.name = "bbb"
$hashtable3.surname = #()
$hashtable3.surname += "xxx"
$A = #($hashtable1; $hashtable2; $hashtable3)
I need to iterate though the array and I need to find out duplicates based on hashtable[].name
Then I need to group those hashtable.surname to hashtable[].surname so that the result will be an array of hashtables that will group all for name all the surnames:
$hashtable1.name = "aaa"
$hashtable1.surname = ("bbb","ccc")
$hashtable3.name = "bbb"
$hashtable3.surname = ("xxx")
I was looking into iterating to empty array
+
I have found this link:
powershell compare 2 arrays output if match
but I am not sure on how to reach into the elements of the hashtable.
My options:
I was wondering if -contain can do it.
I have read about compare-object but I am not sure it can be done like that.
(It looks a bit scary in the moment)
I am on PS5.
Thanks for your help,
Aster
You can group your array items by the names using a scriptblock like so.
Once grouped, you can easily build your output to do what you seek.
#In PS 7.0+ you can use Name directly but earlier version requires the use of the scriptblock when dealing with arrays of hashtables.
$Output = $A | Group-Object -Property {$_.Name} | % {
[PSCustomObject]#{
Name = $_.Name
Surname = $_.Group.Surname | Sort-Object -Unique
}
}
Here is the output variable content.
Name Surname
---- -------
aaa {bbb, ccc}
bbb xxx
Note
Improvements have been made in PS 7.0 that allows you to use simply the property name (eg: Name) in Group-Object for arrays of hashtables, just like you would do for any other arrays type. For earlier version though, these particular arrays must be accessed by passing the property in a scriptblock, like so: {$_.Name}
References
MSDN - Group_Object
SS64 - Group Object
Dr Scripto - Use a Script block to create custom groupings in PowerShell

Format an array of hashtables as table with Powershell

I wish to display an array of hastables as a Table. I found several threads dealing with this issue, but so far no solution has worked for me (see here and here).
My array consists of 18 hashtables in this form:
Array = #(
#{Company=1}
#{Company=2}
#{Company=3}
#{Contact=X}
#{Contact=Y}
#{Contact=Y}
#{Country=A}
#{Country=B}
#{Country=C}
)
I would like to get the following output:
Company Contact Country
------------------------------
1 X A
2 Y B
3 Z C
I've tried the following:
$Array | ForEach-Object { New-Object -Type PSObject -Property $_ } | Format-Table
That displays the following:
Company
----------
1
2
3
A Format-List works better; I then get this:
Company: 1 2 3
Contact: X Y Z
Country: A B C
Is there any way to accomplish my desired output?
You can do the following if you want to work with CSV data and custom objects:
$Array = #(
#{Company=1}
#{Company=2}
#{Company=3}
#{Contact='X'}
#{Contact='Y'}
#{Contact='Z'}
#{Country='A'}
#{Country='B'}
#{Country='C'}
)
$keys = $Array.Keys | Get-Unique
$data = for ($i = 0; $i -lt $Array.($Keys[0]).Count; $i++) {
($Keys | Foreach-Object { $Array.$_[$i] }) -join ','
}
($Keys -join ','),$data | ConvertFrom-Csv | Format-Table
Explanation:
Since $Array is an array of hash tables, then the .Keys property will return all the keys of those hash tables. Since we only care about unique keys when building our object, Get-Unique is used to remove duplicates.
$Array.($Keys[0]).Count counts the number of items in a group. $Keys[0] here will be Company. So it returns the number of hash tables (3) that contain the key Company.
$Array.Company for example returns all values from the hash tables that contain the key Company. The Foreach-Object loops through each unique key's value at a particular index ($i). Once each key-value is read at a particular index, the values are joined by a comma.
When the loop completes, ($Keys -join ','),$data outputs the data in CSV format, which is piped into ConvertFrom-Csv to create a custom object.
Note: that if your data contains commas, you may want to consider the alternative method below.
Alernatively, you can work with hash tables and custom objects with the following:
$Array = #(
#{Company=1}
#{Company=2}
#{Company=3}
#{Contact='X'}
#{Contact='Y'}
#{Contact='Z'}
#{Country='A'}
#{Country='B'}
#{Country='C'}
)
$keys = $Array.Keys | Get-Unique
$data = for ($i = 0; $i -lt $Array.($Keys[0]).Count; $i++) {
$hash = [ordered]#{}
$Keys | Foreach-Object { $hash.Add($_,$Array.$_[$i]) }
[pscustomobject]$hash
}
$data | Format-Table

Efficient way to remove duplicates from large 2D arrays in PowerShell

I have a large set of data roughly 10 million items that I need to process efficiently and quickly removing duplicate items based on two of the six column headers.
I have tried grouping and sorting items but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSeriaID,ComputerID
$p2 = foreach ($object in $p1.group) {
$object | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12
You might want to clean up your question a little bit, because it's a little bit hard to read, but I'll try to answer the best I can with what I can understand about what you're trying to do.
Unfortunately, with so much data there's no way to do this quickly. String Comparison and sorting are done by brute force; there is no way to reduce the complexity of comparing each character in one string against another any further than measuring them one at a time to see if they're the same.
(Honestly, if this were me, I'd just use export-csv $object and perform this operation in excel. The time tradeoff to scripting something like this only once just wouldn't be worth it.)
By "Items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You've already got the basic idea of select-object down, you can do that for the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download that Join-Object command from the powershell gallery with Install-Script -Name Join and use it in the way described. If you want to do this step yourself, the easiest way would be to squish the two tables together and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique
Does this do it? It keeps the one it finds first.
$test | sort -u ComputerSeriaID, ComputerID
I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of the record, add this to a hash.
If the concatenate text already exists in the hash, don't add it to the array to be returned.
Function DeDupe_Array
{
param
(
$Data
)
$Return_Array = #()
$Check_Hash = #{}
Foreach($Line in $Data)
{
$Concatenated = ''
$Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
foreach($Element in $Elements)
{
$Concatenated += $line.$Element
}
If($Check_Hash.$Concatenated -ne 1)
{
$Check_Hash.add($Concatenated,1)
$Return_Array += $Line
}
}
return $Return_Array
}
Try the following script.
Should be as fast as possible due to avoiding any pipe'ing in PS.
$hashT = #{}
foreach ($item in $csvData) {
# Building hash table key
$key = '{0}###{1}' -f $item.ComputerSeriaID, $item.ComputerID
# if $key doesn't exist yet OR when $key exists and "FirstObserverd" is less than existing one in $hashT (only valid when date provided in sortable format / international format)
if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
$hashT[$key] = $item
}
}
$result = $hashT.Values

Removing Strings if Substring of the String is Present in Same Array

Noob here.
I'm trying to pare down a list of domains by eliminating all subdomains if the parent domain is present in the list. I've managed to cobble together a script that somewhat does this with PowerShell after some searching and reading. The output is not exactly what I want, but will work OK. The problem with my solution is that it takes so long to run because of the size of my initial list (tens of thousands of entries).
UPDATE: I've updated my example to clarify my question.
Example "parent.txt" list:
adk2.co
adk2.com
adobe.com
helpx.adobe.com
manage.com
list-manage.com
graph.facebook.com
Example output "repeats.txt" file:
adk2.com (different top level domain than adk2.co but that's ok)
helpx.adobe.com
list-manage.com (not subdomain of manage.com but that's ok)
I would then take and eliminate the repeats from the parent, leaving a list of "unique" subdomains and domains. I have this in a separate script.
Example final list with my current script:
adk2.co
adobe.com
manage.com
graph.facebook.com (it's not facebook.com because facebook.com wasn't in the original list.)
Ideal final list:
adk2.co
adk2.com (since adk2.co and adk2.com are actually distinct domains)
adobe.com
manage.com
graph.facebook.com
Below is my code:
I've taken my hosts list (parent.txt) and checked it against itself, and spit out any matches into a new file.
$parent = Get-Content("parent.txt")
$hosts = Get-Content("parent.txt")
$repeats =#()
$out_file = "$PSScriptRoot\repeats.txt"
$hosts | where {
$found = $FALSE
foreach($domains in $parent){
if($_.Contains($domains) -and $_ -ne $domains){
$found = $TRUE
$repeats += $_
}
if($found -eq $TRUE){
break
}
}
$found
}
$repeats = $repeats -join "`n"
[System.IO.File]::WriteAllText($out_file,$repeats)
This seems like a really inefficient way to do it since I'm going through each element of the array. Any suggestions on how to best optimize this? I have some ideas like putting more conditions on what elements to check and check against, but I feel like there's a drastically different approach that would be far better.
First, a solution based strictly on shared domain names (e.g., helpx.adobe.com and adobe.com are considered to belong to the same domain, but list-manage.com and manage.com are not).
This is not what you asked for, but perhaps more useful to future readers:
Get-Content parent.txt | Sort-Object -Unique { ($_ -split '\.')[-2,-1] -join '.' }
Assuming list.manage.com rather than list-manage.com in your sample input, the above command yields:
adk2.co
adk2.com
adobe.com
graph.facebook.com
manage.com
{ ($_ -split '\.')[-2,-1] -join '.' } sorts the input lines by the last 2 domain components (e.g., adobe.com):
-Unique discards duplicates.
A shared-suffix solution, as requested:
# Helper function for (naively) reversing a string.
# Note: Does not work properly with Unicode combining characters
# and surrogate pairs.
function reverse($str) { $a = $str.ToCharArray(); [Array]::Reverse($a); -join $a }
# * Sort the reversed input lines, which effectively groups them by shared suffix
# with the shortest entry first (e.g., the reverse of 'manage.com' before the
# reverse of 'list-manage.com').
# * It is then sufficient to output only the first entry in each group, using
# wildcard matching with -notlike to determine group boundaries.
# * Finally, sort the re-reversed results.
Get-Content parent.txt | ForEach-Object { reverse $_ } | Sort-Object |
ForEach-Object { $prev = $null } {
if ($null -eq $prev -or $_ -notlike "$prev*" ) {
reverse $_
$prev = $_
}
} | Sort-Object
One approach is to use a hash table to store all your parent values, then for each repeat, remove it from the table. The value 1 when adding to the hash table does not matter since we only test for existence of the key.
$parent = #(
'adk2.co',
'adk2.com',
'adobe.com',
'helpx.adobe.com',
'manage.com',
'list-manage.com'
)
$repeats = (
'adk2.com',
'helpx.adobe.com',
'list-manage.com'
)
$domains = #{}
$parent | % {$domains.Add($_, 1)}
$repeats | % {if ($domains.ContainsKey($_)) {$domains.Remove($_)}}
$domains.Keys | Sort

Filtering Search Results Against a CSV in Powershell

Im interested in some ideas on how one would approach coding a search of a filesystem for files that match any entries contained in a master CSV file. I have a function to search the filesystem, but filtering against the CSV is proving harder than I expect. I have a csv with headers in it for Name & IPaddr:
#create CSV object
$csv = import-csv filename.csv
#create filter object containing only Name column
$filter = $csv | select-object Name
#Now run the search function
SearchSubfolders | where {$_.name -match $filter} #returns no results
I guess my question is this: Can I filter against an array within a pipeline like this???
You need a pair of loops:
#create CSV object
$csv = import-csv filename.csv
#Now run the search function
#loop through the folders
foreach ($folder in (SearchSubfolders)) {
#check that folder against each item in the csv filter list
#this sets up the loop
foreach ($Filter in $csv.Name) {
#and this does the checking and outputs anything that is matched
If ($folder.name -match $Filter) { "$filter" }
}
}
Usually CSVs are 2-dimensional data structures, so you can't use them directly for filtering. You can convert the 2-dimensional array into a 1-dimensional array, though:
$filter = Import-Csv 'C:\path\to\some.csv' | % {
$_.PSObject.Properties | % { $_.Value }
}
If the CSV has just a single column, the "mangling" can be simplified to this (replace Name with the actual column name):
$filter = Import-Csv 'C:\path\to\some.csv' | % { $_.Name }
or this:
$filter = Import-Csv 'C:\path\to\some.csv' | select -Expand Name
Of course, if the CSV has just a single column, it would've been better to make it a flat list right away, so it could've been imported like this:
$filter = Get-Content 'C:\path\to\some.txt'
Either way, with the $filter prepared, you can apply it to your input data like this:
SearchSubFolders | ? { $filter -contains $_.Name } # ARRAY -contains VALUE
The -match operator won't work, because it compares a value (left operand) against a regular expression (right operand).
See Get-Help about_Comparison_Operators for more information.
Another option is to create a regex from the filename collection and use that to filter for all the filenames at once:
$filenames = import-csv filename.csv |
foreach { $_.name }
[regex]$filename_regex = ‘(?i)^(‘ + (($filenames | foreach {[regex]::escape($_)}) –join “|”) + ‘)$’
$SearchSubfolders |
where { $_.name -match $filename_regex }
You can use Compare-Object to do this pretty easily if you are matching the actual Names of the files to names in the list. An example:
$filter = import-csv files.csv
ls | Compare-Object -ReferenceObject $filter -IncludeEqual -ExcludeDifferent -Property Name
This will print the files in the current directory that match the any Name in files.csv. You could also print only the different ones by dropping -IncludeEqual and -ExcludeDifferent flags. If you need full regex matching you will have to loop through each regex in the csv and see if it is a match.
Here's any alternate solution that uses regular expression filters. Note that we will create and cache the regex instances so we don't have to rely on the runtime's internal cache (which defaults to 15 items). First we have a useful helper function, Test-Any that will loop through an array of items and stop if any of them satisfies a criteria:
function Test-Any() {
param(
[Parameter(Mandatory=$True,ValueFromPipeline=$True)]
[object[]]$Items,
[Parameter(Mandatory=$True,Position=2)]
[ScriptBlock]$Predicate)
begin {
$any = $false
}
process {
foreach($item in $items) {
if ($predicate.Invoke($item)) {
$any = $true
break
}
}
}
end { $any }
}
With this, the implementation is relatively simple:
$filters = import-csv files.csv | foreach { [regex]$_.Name }
ls -recurse | where { $name = $_.Name; $filters | Test-Any { $_.IsMatch($name) } }
I ended up using a 'loop within a loop' construct to get this done after much trial and error:
#the SearchSubFolders function was amended to force results in a variable, SearchResults
$SearchResults2 = #()
foreach ($result in $SearchResults){
foreach ($line in $filter){
if ($result -match $line){
$SearchResults2 += $result
}
}
}
This works great after collapsing my CSV file down to a text-based array containing only the necessary column data from that CSV. Much thanks to Ansgar Wiechers for assisting me with that particular thing!!!
All of you presented viable solutions, some more complex than I cared for, nevertheless if I could mark multiple answers as correct, I would!! I chose the correct answer based on not only correctness but also simplicity.....

Resources