Create an array from a CSV list - arrays

I have a list in orders.csv like so:
Order
1025405008
1054003899
1055003868
1079004365
I wish to add the unit number (2nd-5th chars) and the entire order number into an array, so it will be like:
"0254","1025405008"
"0540","1054003899"
etc
etc
I wish to ignore the prefix "1". So far, with my limited PS knowledge, I have created the variables:
$Orders = Import-csv c:\Orderlist.csv
$Units = $Orders | Select @{LABEL="Unit";EXPRESSION={$_.Order.Substring(1,4)}}
So I wish to combine the two into an array. I have tried
$array = $Units,Orders
Any help will be appreciated.

For a big CSV file that has just this one column, using a regex is much faster than Select:
$combined = [IO.File]::ReadAllText('c:\Orderlist.csv') `
-replace '(?m)^\d(\d{4})\d+', '"$1","$&"' `
-replace '^Order', 'Unit, Order' | ConvertFrom-Csv
~6x faster on 100k records in a 2MB file (700ms vs 4100ms)
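To see what the first -replace does, here is a minimal sketch applied to a single line. The variable names $line and $converted are only for illustration, and the (?m) flag is dropped because there is just one line of input:

```powershell
# Hypothetical single line standing in for one row of the order file.
$line = '1025405008'

# ^\d      skips the leading "1"
# (\d{4})  captures the four-digit unit as $1
# $&       is the whole matched order number
$converted = $line -replace '^\d(\d{4})\d+', '"$1","$&"'

$converted   # "0254","1025405008"
```

The replacement string is single-quoted so that PowerShell passes $1 and $& through to the regex engine instead of trying to expand them as variables.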

You can just select the Order within your Select statement and use the ConvertTo-Csv cmdlet to get the desired output:
$Orders = Import-csv c:\Orderlist.csv
$unitOrderArray = $Orders | Select @{LABEL="Unit";EXPRESSION={$_.Order.Substring(1,4)}}, Order
$unitOrderArray | ConvertTo-Csv -NoTypeInformation
Output:
"Unit","Order"
"0254","1025405008"
"0540","1054003899"
"0550","1055003868"
"0790","1079004365"
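If you would rather keep the result as an in-memory array of objects, a rough sketch along the same lines is below. The inline sample data stands in for c:\Orderlist.csv and is only an assumption for the example:

```powershell
# Inline sample data replaces the real file so the sketch is self-contained.
$Orders = @'
Order
1025405008
1054003899
'@ | ConvertFrom-Csv

# One object per order: Unit is the four characters after the leading "1".
$unitOrderArray = @(foreach ($row in $Orders) {
    [pscustomobject]@{
        Unit  = $row.Order.Substring(1, 4)
        Order = $row.Order
    }
})

$unitOrderArray[0].Unit    # 0254
$unitOrderArray[1].Order   # 1054003899
```

Each element can then be indexed or piped to ConvertTo-Csv / Export-Csv as in the answer above.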

Related

Efficient way to remove duplicates from large 2D arrays in PowerShell

I have a large set of data roughly 10 million items that I need to process efficiently and quickly removing duplicate items based on two of the six column headers.
I have tried grouping and sorting items but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSerialID,ComputerID
$p2 = foreach ($object in $p1.group) {
$object | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12
You might want to clean up your question, because it's a little hard to read, but I'll answer as best I can based on what I understand you're trying to do.
Unfortunately, with so much data there's no way to do this quickly. String Comparison and sorting are done by brute force; there is no way to reduce the complexity of comparing each character in one string against another any further than measuring them one at a time to see if they're the same.
(Honestly, if this were me, I'd just use export-csv $object and perform this operation in excel. The time tradeoff to scripting something like this only once just wouldn't be worth it.)
By "Items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You've already got the basic idea of select-object down, you can do that for the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download that Join-Object command from the powershell gallery with Install-Script -Name Join and use it in the way described. If you want to do this step yourself, the easiest way would be to squish the two tables together and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique
Does this do it? It keeps the one it finds first.
$test | sort -u ComputerSerialID, ComputerID
I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of the record, add this to a hash.
If the concatenate text already exists in the hash, don't add it to the array to be returned.
Function DeDupe_Array
{
param
(
$Data
)
$Return_Array = @()
$Check_Hash = @{}
Foreach($Line in $Data)
{
$Concatenated = ''
$Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
foreach($Element in $Elements)
{
$Concatenated += $line.$Element
}
If($Check_Hash.$Concatenated -ne 1)
{
$Check_Hash.add($Concatenated,1)
$Return_Array += $Line
}
}
return $Return_Array
}
Try the following script.
Should be as fast as possible due to avoiding any pipe'ing in PS.
$hashT = @{}
foreach ($item in $csvData) {
# Building hash table key
$key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID
# if $key doesn't exist yet OR when $key exists and "FirstObserved" is less than the existing one in $hashT (only valid when the date is provided in a sortable / international format)
if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
$hashT[$key] = $item
}
}
$result = $hashT.Values
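As a small worked example of the same idea, here is a sketch with three made-up records. The sample values and the name $oldest are hypothetical, and as a variation it casts FirstObserved to [datetime] before comparing rather than relying on string order:

```powershell
# Made-up sample rows; the first two share the same serial/ID pair.
$csvData = @(
    [pscustomobject]@{ ComputerSerialID = 'A'; ComputerID = 1; FirstObserved = '2019-06-03T20:29:01' }
    [pscustomobject]@{ ComputerSerialID = 'A'; ComputerID = 1; FirstObserved = '2019-06-01T08:00:00' }
    [pscustomobject]@{ ComputerSerialID = 'B'; ComputerID = 2; FirstObserved = '2019-06-02T12:00:00' }
)

$oldest = @{}
foreach ($item in $csvData) {
    # Composite key built from the two columns that define a duplicate.
    $key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID

    # Keep the row if the key is new, or if this row was observed earlier.
    if (-not $oldest.ContainsKey($key) -or
        [datetime]$item.FirstObserved -lt [datetime]$oldest[$key].FirstObserved) {
        $oldest[$key] = $item
    }
}

$result = @($oldest.Values)   # one row per key, each the earliest observed
```

This is a single pass with one hash lookup per row, so the work grows roughly linearly with the number of rows. The order of $result is not guaranteed, since a plain hashtable does not preserve insertion order.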

Getting "System.Collections.Generic.List`1[System.String]" in CSV File export when data is okay on screen

I am new to PowerShell and trying to get a list of VM names and their associated IP Addresses from within Hyper-V.
I am getting the information fine on the screen but when I try to export to csv all I get for the IP Addresses is System.Collections.Generic.List`1[System.String] on each line.
There are suggestions about "joins" or "ConvertTo-CSV" but I don't understand the syntax for these.
Can anyone help?
This is the syntax I am using...
Get-VM | Select -ExpandProperty VirtualNetworkAdapters | select name, IPV4Addresses | Export-Csv -Path "c:\Temp\VMIPs.csv"
If an object you export as CSV with Export-Csv or ConvertTo-Csv has property values that contain a collection (array) of values, these values are stringified via their .ToString() method, which results in an unhelpful representation, as in the case of your array-valued .IPV4Addresses property.
To demonstrate this with the ConvertTo-Csv cmdlet (which works analogously to Export-Csv, but returns the CSV data instead of saving it to a file):
PS> [pscustomobject] @{ col1 = 1; col2 = 2, 3 } | ConvertTo-Csv
"col1","col2"
"1","System.Object[]"
That is, the array 2, 3 stored in the .col2 property was unhelpfully stringified as System.Object[], which is what you get when you call .ToString() on a regular PowerShell array; other .NET collection types - such as [System.Collections.Generic.List[string]] in your case - stringify analogously; that is, by their type name.
Assuming you want to represent all values of an array-valued property in a single CSV column, to fix this problem you must decide on a meaningful string representation for the collection as a whole and implement it using Select-Object with a calculated property:
E.g., you can use the -join operator to create a space-separated list of the elements:
PS> [pscustomobject] @{ col1 = 1; col2 = 2, 3 } |
Select-Object col1, @{ n='col2'; e={ $_.col2 -join ' ' } } |
ConvertTo-Csv
"col1","col2"
"1","2 3"
Note how array 2, 3 was turned into string '2 3'.
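Applied to a property that holds a [System.Collections.Generic.List[string]], the same technique might look like the sketch below. The $adapter object is a hypothetical stand-in for one network adapter (it is not real Hyper-V output), and the ';' separator is an arbitrary choice:

```powershell
# Hypothetical stand-in for one adapter object with a list-valued property.
$ips = [System.Collections.Generic.List[string]]::new()
$ips.Add('10.0.0.5')
$ips.Add('10.0.0.6')
$adapter = [pscustomobject]@{ Name = 'VM01'; IPV4Addresses = $ips }

# Replace the list with a joined string before converting to CSV.
$csv = $adapter |
    Select-Object Name, @{ n = 'IPV4Addresses'; e = { $_.IPV4Addresses -join ';' } } |
    ConvertTo-Csv -NoTypeInformation

$csv[1]   # "VM01","10.0.0.5;10.0.0.6"
```

Swapping ConvertTo-Csv for Export-Csv -Path ... -NoTypeInformation writes the same rows to a file.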
OtherObjectPipedStuff | Select-object name,IPV4Addresses | export-csv PP.csv -NoTypeinformation

Export array with "sub-array"

I am currently trying to automate license counting in Office 365 across multiple partner tenants using PowerShell.
My current code (acquired from the internet) with some modifications gives me this output:
Column A Column B Column C
-------- -------- --------
CustA LicA,LicB 1,3
CustB LicA,LicB 7,3
CustC LicA 4
But the output I want from this code is:
Column A Column B Column C
-------- -------- --------
CustA LicA 1
LicB 3
CustB LicA 7
LicB 3
Here is my current code which is exported using Export-Csv -NoType:
$tenantID = (Get-MsolPartnerContract).tenantid
foreach($i in $tenantID){
$tenantName = Get-MsolPartnerInformation -TenantId $i
$tenantLicense = Get-MsolSubscription -TenantId $i
$properties = [ordered]@{
'Company' = ($tenantName.PartnerCompanyName -join ',')
'License' = ($tenantLicense.SkuPartNumber -join ',')
'LicenseCount' = ($tenantLicense.TotalLicenses -join ',')
}
$obj = New-Object -TypeName psobject -Property $properties
Write-Output $obj
}
I have tried this along with several other iterations of code which all fail catastrophically:
$properties = [ordered]@{
'Company' = ($tenantName.PartnerCompanyName -join ','),
@{'License' = ($tenantLicense.SkuPartNumber -join ',')},
@{'LicenseCount' = ($tenantLicense.TotalLicenses -join',')}
}
I was thinking about making a "sub-array" $tenantLicense.SkuPartnumber and $tenantLicense.TotalLicenses, but I'm not quite sure how to approach this with appending it to the object or "main-array".
A second loop for each $tenantLicense should do the trick for you. I don't have access to an environment like yours so I cannot test this.
$tenantID | ForEach-Object{
$tenantName = Get-MsolPartnerInformation -TenantId $_
$tenantLicense = Get-MsolSubscription -TenantId $_
# Make an object for each $tenantLicense
$tenantLicense | ForEach-Object{
$properties = [ordered]@{
'Company' = $tenantName.PartnerCompanyName
'License' = $_.SkuPartNumber
'LicenseCount' = $_.TotalLicenses
}
# Send the new object down the pipe.
New-Object -TypeName psobject -Property $properties
}
}
Since you have multiple $tenantLicenses that share the same company name, let's just loop over those and use the same company name in the output. Assuming this works, it would not produce exactly the output you asked for, since there is no logic to omit the company in subsequent rows. I would argue that it is better this way, since you can now sort the data without loss of data or understanding.
Notice I changed foreach() to ForEach-Object. This makes it simpler to send objects down the pipe.
Without providing the code solution I can say that you need to build up the array.
In terms of programming, you will need to iterate over the array ARRAY1 and populate another one, ARRAY2, with the extra rows. For example, if columns A and B are simple values and C is an array of 3 items, then you would add 3 rows to the new table: A,B,C1; A,B,C2; and A,B,C3. On each iteration of the loop you need to calculate all the permutations, for example in your case the ones generated by columnB and columnC.
This should also be possible with pipelining using the ForEach-Object cmdlet, but that is more difficult, and given that you are relatively new to PowerShell I would not pursue this path, unless of course you want to learn.
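A minimal sketch of that row expansion is below, using made-up data in place of the Msol cmdlet output. It assumes License and LicenseCount are parallel arrays, so the values are paired by index rather than combined in every possible way:

```powershell
# Made-up rows: License and LicenseCount are parallel arrays per company.
$rows = @(
    [pscustomobject]@{ Company = 'CustA'; License = 'LicA', 'LicB'; LicenseCount = 1, 3 }
    [pscustomobject]@{ Company = 'CustC'; License = @('LicA');      LicenseCount = @(4) }
)

# Emit one flat row per license, repeating the company name on each row.
$expanded = foreach ($row in $rows) {
    for ($i = 0; $i -lt $row.License.Count; $i++) {
        [pscustomobject]@{
            Company      = $row.Company
            License      = $row.License[$i]
            LicenseCount = $row.LicenseCount[$i]
        }
    }
}

$expanded | Format-Table
```

Each element of $expanded is a simple object with three scalar properties, so it exports cleanly with Export-Csv -NoTypeInformation.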

Filtering Search Results Against a CSV in Powershell

I'm interested in some ideas on how one would approach coding a search of a filesystem for files that match any entries contained in a master CSV file. I have a function to search the filesystem, but filtering against the CSV is proving harder than I expected. I have a csv with headers in it for Name & IPaddr:
#create CSV object
$csv = import-csv filename.csv
#create filter object containing only Name column
$filter = $csv | select-object Name
#Now run the search function
SearchSubfolders | where {$_.name -match $filter} #returns no results
I guess my question is this: Can I filter against an array within a pipeline like this???
You need a pair of loops:
#create CSV object
$csv = import-csv filename.csv
#Now run the search function
#loop through the folders
foreach ($folder in (SearchSubfolders)) {
#check that folder against each item in the csv filter list
#this sets up the loop
foreach ($Filter in $csv.Name) {
#and this does the checking and outputs anything that is matched
If ($folder.name -match $Filter) { "$filter" }
}
}
Usually CSVs are 2-dimensional data structures, so you can't use them directly for filtering. You can convert the 2-dimensional array into a 1-dimensional array, though:
$filter = Import-Csv 'C:\path\to\some.csv' | % {
$_.PSObject.Properties | % { $_.Value }
}
If the CSV has just a single column, the "mangling" can be simplified to this (replace Name with the actual column name):
$filter = Import-Csv 'C:\path\to\some.csv' | % { $_.Name }
or this:
$filter = Import-Csv 'C:\path\to\some.csv' | select -Expand Name
Of course, if the CSV has just a single column, it would've been better to make it a flat list right away, so it could've been imported like this:
$filter = Get-Content 'C:\path\to\some.txt'
Either way, with the $filter prepared, you can apply it to your input data like this:
SearchSubFolders | ? { $filter -contains $_.Name } # ARRAY -contains VALUE
The -match operator won't work, because it compares a value (left operand) against a regular expression (right operand).
See Get-Help about_Comparison_Operators for more information.
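To make the difference concrete, here is a small sketch with made-up file names. $filter and $files are plain string arrays standing in for the CSV column and the search results:

```powershell
$filter = 'report.txt', 'notes.md'   # hypothetical names from the CSV
$files  = 'report.txt', 'image.png', 'notes.md', 'old-report.txt'

# -contains: exact (case-insensitive) membership test, ARRAY -contains VALUE.
$exact = @($files | Where-Object { $filter -contains $_ })

# -match needs ONE regex on the right, so escape each name and join with "|".
$pattern = ($filter | ForEach-Object { [regex]::Escape($_) }) -join '|'
$partial = @($files | Where-Object { $_ -match $pattern })

$exact     # report.txt, notes.md
$partial   # report.txt, notes.md, old-report.txt
```

Because the pattern is not anchored, -match also picks up old-report.txt, which is a substring match; -contains only returns names that equal an entry in the list.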
Another option is to create a regex from the filename collection and use that to filter for all the filenames at once:
$filenames = import-csv filename.csv |
foreach { $_.name }
[regex]$filename_regex = '(?i)^(' + (($filenames | foreach {[regex]::escape($_)}) -join "|") + ')$'
SearchSubfolders |
where { $_.name -match $filename_regex }
You can use Compare-Object to do this pretty easily if you are matching the actual Names of the files to names in the list. An example:
$filter = import-csv files.csv
ls | Compare-Object -ReferenceObject $filter -IncludeEqual -ExcludeDifferent -Property Name
This will print the files in the current directory that match any Name in files.csv. You could also print only the different ones by dropping the -IncludeEqual and -ExcludeDifferent flags. If you need full regex matching you will have to loop through each regex in the csv and see if it is a match.
Here's an alternate solution that uses regular expression filters. Note that we will create and cache the regex instances so we don't have to rely on the runtime's internal cache (which defaults to 15 items). First we have a useful helper function, Test-Any, that will loop through an array of items and stop if any of them satisfies a criterion:
function Test-Any() {
param(
[Parameter(Mandatory=$True,ValueFromPipeline=$True)]
[object[]]$Items,
[Parameter(Mandatory=$True,Position=2)]
[ScriptBlock]$Predicate)
begin {
$any = $false
}
process {
foreach($item in $items) {
if ($predicate.Invoke($item)) {
$any = $true
break
}
}
}
end { $any }
}
With this, the implementation is relatively simple:
$filters = import-csv files.csv | foreach { [regex]$_.Name }
ls -recurse | where { $name = $_.Name; $filters | Test-Any { $_.IsMatch($name) } }
I ended up using a 'loop within a loop' construct to get this done after much trial and error:
#the SearchSubFolders function was amended to force results in a variable, SearchResults
$SearchResults2 = @()
foreach ($result in $SearchResults){
foreach ($line in $filter){
if ($result -match $line){
$SearchResults2 += $result
}
}
}
This works great after collapsing my CSV file down to a text-based array containing only the necessary column data from that CSV. Much thanks to Ansgar Wiechers for assisting me with that particular thing!!!
All of you presented viable solutions, some more complex than I cared for, nevertheless if I could mark multiple answers as correct, I would!! I chose the correct answer based on not only correctness but also simplicity.....

How can I import a .csv into multiple arrays in a simpler way?

I am new to powershell and am writing my first somewhat complicated script. I would like to import a .csv file and create multiple text arrays with it. I think that I have found a way that will work but it will be time consuming to generate all of the lines that I need. I assume I can do it more simply using foreach-object but I can't seem to get the syntax right.
See my current code...
$vmimport = Import-Csv "gss_prod.csv"
$gssall = $vmimport | ForEach-Object {$_.vmName}
$gssweb = $vmimport | Where-Object {$_.tier -eq 'web'} | ForEach-Object {$_.vmName}
$gssapp = $vmimport | Where-Object {$_.tier -eq 'app'} | ForEach-Object {$_.vmName}
$gsssql = $vmimport | Where-Object {$_.tier -eq 'sql'} | ForEach-Object {$_.vmName}
The goal is to make 1 group with all entries containing only the vmName value, and then 3 separate groups containing only the vmName value but using the tier value to sort them.
Can anyone help me with an easier way to do this?
Thanks!
For the last three you can group the objects by the Tier property and get the result as a hashtable. Then you can reference the Tier name to get its VMs.
#group objects by tier
$gs = $vmimport | Group-Object tier -AsHashTable
# get web VMs
$gs['web']
# get app VMs
$gs['app']
You may want to use a dictionary for storing the data:
$vmimport = Import-Csv "gss_prod.csv"
$gssall = $vmimport | % { $_.vmName }
$categories = "web", "app", "sql", ...
$gss = @{}
foreach ($cat in $categories) {
$gss[$cat] = $vmimport | ? { $_.tier -eq $cat } | % { $_.vmName }
}
I like the Shay Levy way, but the values of hash tables remain hash tables. Here is another, more efficient approach where the values are jagged arrays and the categories are created automatically (unlike in Ansgar Wiechers' solution):
# define hashtable
$gs = @{};
# fill it
$vmimport | foreach {$gs[$_.tier]+=, $_.vmName};
# get web VMs
$gs['web'] # the result is an array of 'web' vmNames.
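Putting the grouping idea together, here is a self-contained sketch. The inline CSV text is made up and stands in for gss_prod.csv; the name $byTier is only for illustration. It groups once, then projects just the vmName values per tier:

```powershell
# Made-up sample data in place of gss_prod.csv.
$vmimport = @'
vmName,tier
web01,web
app01,app
web02,web
sql01,sql
'@ | ConvertFrom-Csv

# All VM names, regardless of tier.
$gssall = @($vmimport.vmName)

# Group once; -AsString makes the hashtable keys plain strings.
$byTier = $vmimport | Group-Object -Property tier -AsHashTable -AsString

# Each value is a collection of the original rows; take only the names.
$gssweb = @($byTier['web'].vmName)
$gssapp = @($byTier['app'].vmName)
$gsssql = @($byTier['sql'].vmName)

$gssweb   # web01, web02
```

The .vmName shorthand on a collection relies on member enumeration, which needs PowerShell 3.0 or later; on older versions, pipe to ForEach-Object { $_.vmName } instead.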
