Extracting values from strings in an array using PowerShell

I've got an issue trying to extract particular values from an array. The array contains 40010 rows, each of which is a string of pipe-separated values (64 on each line).
I need to extract values 7, 4, 22, 23, 24, 52 and 62 from each row and write them into a new array, so that I end up with 40010 rows of only 7 pipe-separated (or comma-separated) values each.
I've looked at split and can't seem to get my head around it to even get close to what I need.
I'd also be open to doing this from a file as I'm currently creating my 1st array with
$data = (Get-content $statement_file|Select-String "^01")
If I can add to that command to do the split on the input so I only have one array and don't need the intermediate array that would be even better.
I know if I was in Linux I could do the split with AWK quite easily, but I'm fairly new to PowerShell so would appreciate any suggestions.

# create an array of header columns (assuming your pipe separated file doesn't have headers)
$header = 1..64 | ForEach-Object { "h$_" }
# import the file as 'csv' but with pipes as separators, use the above header, then select columns 7,4,22,23,24,52,62
# edit 1: then only return rows that start with 01
# edit 2: then join these into a pipe separated string
$smallerArray = $statement_file |
Import-Csv -Delimiter '|' -Header $header |
Where-Object { $_.h1.StartsWith('01') } |
Select-Object @{Name="piped"; Expression={ @($_.h7,$_.h4,$_.h22,$_.h23,$_.h24,$_.h52,$_.h62) -join '|' }} |
Select-Object -ExpandProperty piped
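
If you would rather stay with plain strings instead of going through Import-Csv, the same idea can be sketched with the -split operator. This is only a minimal sketch: the two sample rows and the variable name $wanted are made up for illustration, and the field numbers are assumed to be 1-based as in the question.

```powershell
# Hypothetical sample data: two rows of 64 pipe-separated fields each.
$line1 = (1..64 | ForEach-Object { if ($_ -eq 1) { '01' } else { "a$_" } }) -join '|'
$line2 = (1..64 | ForEach-Object { if ($_ -eq 1) { '02' } else { "b$_" } }) -join '|'
$data = @($line1, $line2)

# 1-based field numbers from the question, converted to 0-based array indexes.
$wanted = 7, 4, 22, 23, 24, 52, 62 | ForEach-Object { $_ - 1 }

$smallerArray = @(
    $data |
        Where-Object { $_ -match '^01' } |                       # keep rows starting with 01
        ForEach-Object { ($_ -split '\|')[$wanted] -join '|' }   # pick fields, re-join with pipes
)
$smallerArray
```

With the real file, $data could be replaced by Get-Content $statement_file, so no intermediate array is needed.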

Related

Powershell creating an array from a csv of column A based on contents of column B

I have a csv file that contains a host of columns but only 2 are relevant: "Emails" and "Followup".
The Emails column contains a list of email addresses, and Followup contains either Y's or N's; I have 1 Y and 7 N's.
I'm working with a header row and then 8 rows of data.
So far, I have this script:
$File = Import-Csv -Path "C:\file.csv"
$Addresses = @($File.email)
$Followup = @($File.followup)
$To = @($Addresses | Where-Object {$Followup -eq "Y"})
$Addresses
$Followup
$To
I'm trying to create another array ($To) that contains only those email addresses that are paired with a "Y" in the Followup column.
For my output, I'm getting 8 email addresses (good), 7 N's and 1 Y (good), and 8 email addresses again (bad, desired outcome is 1).
What am I fouling up here?
Instead of splitting the email and followup columns into separate arrays, keep them together, so that Where-Object can operate on the followup property and output the object with the corresponding email:
$File = Import-Csv -Path "C:\file.csv"
# Filter on the `followup` column
$ShouldFollowup = $File |Where-Object followup -eq 'Y'
# Grab ONLY the value of the `email` column
$Addresses = $ShouldFollowup.email
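
As for why the original attempt returned all 8 addresses: with an array on the left-hand side, -eq returns the matching elements rather than a Boolean, so $Followup -eq "Y" produces the same non-empty result for every address and every address passes the filter. Here is a small self-contained sketch of the approach above; the in-memory CSV text is a made-up stand-in for C:\file.csv.

```powershell
# Hypothetical in-memory data standing in for C:\file.csv.
$File = @'
email,followup
a@example.com,N
b@example.com,Y
c@example.com,N
'@ | ConvertFrom-Csv

# Filter the rows first, then take only the email value of the rows that remain.
$To = @($File | Where-Object followup -eq 'Y' | Select-Object -ExpandProperty email)
$To
```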

Getting "System.Collections.Generic.List`1[System.String]" in CSV File export when data is okay on screen

I am new to PowerShell and trying to get a list of VM names and their associated IP Addresses from within Hyper-V.
I am getting the information fine on the screen but when I try to export to csv all I get for the IP Addresses is System.Collections.Generic.List`1[System.String] on each line.
There are suggestions about "joins" or "ConvertTo-CSV" but I don't understand the syntax for these.
Can anyone help?
This is the syntax I am using...
Get-VM | Select -ExpandProperty VirtualNetworkAdapters | select name, IPV4Addresses | Export-Csv -Path "c:\Temp\VMIPs.csv"
If an object you export as CSV with Export-Csv or ConvertTo-Csv has property values that contain a collection (array) of values, these values are stringified via their .ToString() method, which results in an unhelpful representation, as in the case of your array-valued .IPV4Addresses property.
To demonstrate this with the ConvertTo-Csv cmdlet (which works analogously to Export-Csv, but returns the CSV data instead of saving it to a file):
PS> [pscustomobject] @{ col1 = 1; col2 = 2, 3 } | ConvertTo-Csv
"col1","col2"
"1","System.Object[]"
That is, the array 2, 3 stored in the .col2 property was unhelpfully stringified as System.Object[], which is what you get when you call .ToString() on a regular PowerShell array; other .NET collection types - such as [System.Collections.Generic.List[string]] in your case - stringify analogously; that is, by their type name.
Assuming you want to represent all values of an array-valued property in a single CSV column, to fix this problem you must decide on a meaningful string representation for the collection as a whole and implement it using Select-Object with a calculated property:
E.g., you can use the -join operator to create a space-separated list of the elements:
PS> [pscustomobject] @{ col1 = 1; col2 = 2, 3 } |
Select-Object col1, @{ n='col2'; e={ $_.col2 -join ' ' } } |
ConvertTo-Csv
"col1","col2"
"1","2 3"
Note how array 2, 3 was turned into string '2 3'.
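
The same technique should carry over to a list-valued property like the one in the question. The sketch below does not call Get-VM; it uses a made-up object with a [System.Collections.Generic.List[string]] property (the VM name and addresses are invented) to show the calculated property at work.

```powershell
# Hypothetical stand-in for a network adapter object; real ones would come from Get-VM.
$ips = [System.Collections.Generic.List[string]]::new()
$ips.Add('10.0.0.5')
$ips.Add('10.0.0.6')
$adapter = [pscustomobject]@{ Name = 'vm01'; IPV4Addresses = $ips }

# Replace the list with a space-separated string before converting to CSV.
$csv = $adapter |
    Select-Object Name, @{ n = 'IPV4Addresses'; e = { $_.IPV4Addresses -join ' ' } } |
    ConvertTo-Csv -NoTypeInformation
$csv
```

Applied to the command in the question, the same calculated property would take the place of IPV4Addresses in the second select, assuming that property name is the right one in your environment.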
OtherObjectPipedStuff | Select-object name,IPV4Addresses | export-csv PP.csv -NoTypeinformation

Can't write character array to file in Powershell

OK, Powershell may not be the best tool for the job but it's the only one available to me.
I have a bunch of 600K+ row .csv data files. Some of them have delimiter errors e.g. " in the middle of a text field or "" at the start of one. They are too big to edit (even in UltraEdit) and fix manually even if I wanted to which I don't!
Because of the double-"" delimiter at the start of some text fields and the rogue " delimiter in the middle of others, I haven't used a header row to define the columns: these rows appear to have an extra column due to the extra delimiter.
I need to parse the file looking for "" instead of " at the start of a text-field and also to look for " in the middle of a text field and remove them.
I have managed to write the code to do this (after a fashion) by basically reading the whole file into an array, looping through it and adding output characters to an output array.
What I haven't managed to do is successfully write this output array to a file.
I have read every part of https://learn.microsoft.com/en-us/powershell/module/Microsoft.PowerShell.Utility/out-file?view=powershell-5.1 that seemed relevant. I've also trawled through about 10 similar questions on this site and attempted various code gleaned from them.
The output array prints perfectly to screen using a Write-Host but I can't get the data back into a file for love or money. I have a total of 1.5 days of PowerShell experience so far! All suggestions gratefully received.
Here is my code to read/identify rogue delimiters (not pretty (at all), refer previous explanation of data and available technology constraints):
$ContentToCheck=get-content 'myfile.csv' | foreach { $_.ToCharArray()}
$ContentOutputArray=@()
for ($i = 0; $i -lt $ContentToCheck.count; $i++)
{
if (!($ContentToCheck[$i] -match '"')) {#not a quote
if (!($ContentToCheck[$i] -match ',')) {#not a comma i.e. other char that could be enclosed in ""
if ($ContentToCheck[$i-1] -match '"' ) {#check not rogue " delimiter in previous char allow for start of file exception i>1?
if (!($ContentToCheck[$i-2] -match ',') -and !($ContentToCheck[$i-3] -match '"')){
Write-Host 'Delimiter error' $i
$ContentOutputArray+= ''
}#endif not preceded by ",
}#endif"
else{#previous char not a " so move on
$ContentOutputArray+= $ContentToCheck[$i]
}
}#endifnotacomma
else
{#a comma, include it
$ContentOutputArray+= $ContentToCheck[$i]
}#endacomma
}#endifnotaquote
else
{#a quote so just append it to the output array
$ContentOutputArray+= $ContentToCheck[$i]
}#endaquote
}#endfor
So far so good, if inelegant. If I do a simple
Write-Host $ContentOutputArray
data displays nicely " 6 5 " , " 652 | | 999 " , " 99 " , " " , " 678 | | 1 " ..... furthermore when I check the size of the array (based on a cut-down version of one of the problem files)
$ContentOutputArray.count
I get 2507 character length of array. Happy out. However, then variously using:
$ContentOutputArray | Set-Content 'myfile_FIXED.csv'
creates blank file
$ContentOutputArray | out-file 'myfile_FIXED.csv' -encoding ASCII
creates blank file
$ContentOutputArray | export-csv 'myfile_FIXED.csv'
gives only '#TYPE System.Char' in file
$ContentOutputArray | Export-Csv 'myfile_FIXED.csv' -NoType
gives empty file
$ContentOutputArray >> 'myfile_FIXED.csv'
gives blanks separated by ,
What else can I try to write an array of characters to a flat file? It seems such a basic question but it has me stumped. Thanks for reading.
Convert (or cast) the char array to a string before exporting it.
(New-Object string (,$ContentOutputArray)) |Set-Content myfile_FIXED.csv
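
Another way to do the conversion, sketched here as an alternative rather than as the method above, is the unary -join operator, which concatenates the elements of an array into one string. The sample characters and the temp-file location are assumptions for illustration only.

```powershell
# Hypothetical sample standing in for $ContentOutputArray from the question.
$ContentOutputArray = '"65","652||999","99"'.ToCharArray()

# Unary -join concatenates all elements into a single string.
$text = -join $ContentOutputArray

# Assumed output location; any writable path would do.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'myfile_FIXED.csv'
$text | Set-Content -Path $outFile
Get-Content -Path $outFile
```

One thing to watch for: Get-Content returns each line without its line terminator, so a flat character array built the way the question does it no longer contains any line breaks, and the joined string will come out as one long line.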

PowerShell regex to extract SID from filename

I have an array $vhdlist with contents similar to the following filenames:
UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx
UVHD-template.vhdx
I want to use a regex and be left with an array containing only SID portion of the filenames.
I am using the following:
$sids = foreach ($file in $vhdlist)
{
[regex]::split($file, '^UVHD-(?:([(\d)(\w)-]+)).vhdx$')
}
There are 2 problems with this: in the resulting array there are 3 blank lines for every SID; and the "template" filename matches (the resulting line in the output is just "template"). How can I get an array of SIDs as the output and not include the "template" line?
You seem to want to filter the list down to those filenames that contain an SID. Filtering is done with Where-Object (where for short); you don't need a loop.
An SID could be described as "S- and then a bunch of digits and dashes" for this simple case. That leaves us with ^UVHD-S-[\d-]*\.vhdx$ for the filename.
In combination we get:
$vhdlist | where { $_ -Match "^UVHD-S-[\d-]*\.vhdx$" }
When you don't really have an array of strings, but actually an array of files, use them directly.
dir C:\some\folder | where { $_.Name -Match "^UVHD-S-[\d-]*\.vhdx$" }
Or, possibly you can even make it as simple as:
dir C:\some\folder\UVHD-S-*.vhdx
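
As a quick check of the filter step, here is a minimal sketch run against the sample names from the question. The variable names $pattern and $withSid are only for illustration.

```powershell
# Sample file names copied from the question.
$vhdlist = @(
    'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx'
    'UVHD-template.vhdx'
)

# Single quotes keep PowerShell from treating $ as the start of a variable.
$pattern = '^UVHD-S-[\d-]*\.vhdx$'

# Keep only the names that contain an SID; the template name is dropped.
$withSid = @($vhdlist | Where-Object { $_ -match $pattern })
$withSid
```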
EDIT
Extracting the SIDs from a list of strings can be thought of as a combined transformation (for each element, extract the SID) and filter (remove non-matches) operation.
PowerShell's ForEach-Object cmdlet (foreach for short) works like map() in other languages. It takes every input element and returns a new value. In effect it transforms a list of input elements into output elements. Together with the -replace operator you can extract SIDs this way.
$vhdlist | foreach { $_ -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" } | where { $_ -gt "" }
The regex back-reference for .NET languages is $1. The $ is a special character in PowerShell strings, so it needs to be escaped, except when there is no ambiguity. The backtick is the PS escape character. You can escape the $ in the regex as well, but there it's not necessary.
As a final step we use where to remove empty strings (i.e. non-matches). Doing it this way around means we only need to apply the regex once, instead of two times when filtering first and replacing second.
PowerShell operators can also work on lists directly. So the above could even be shortened:
$vhdlist -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" | where { $_ -gt "" }
The shorter version only works on lists of actual strings or objects that produce the right thing when .ToString() is called on them.
Regex breakdown:
^ # start-of-string anchor
(?: # begin non-capturing group (either...)
UVHD- # 'UVHD-'
( # begin group 1
S-[\d-]* # 'S-' and however many digits and dashes
) # end group 1
\.vhdx # '.vhdx'
| # ...or...
.* # anything else
) # end non-capturing group
$ # end-of-string anchor
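
To see the pattern broken down above in action, here is a minimal sketch against the sample names from the question. It uses single-quoted strings, so $1 needs no backtick; the variable names $pattern and $sids are only for illustration.

```powershell
# Sample file names copied from the question.
$vhdlist = @(
    'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx'
    'UVHD-template.vhdx'
)

# Either capture the SID in group 1, or match anything else and leave group 1 empty.
$pattern = '^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$'

# Replace each name with group 1, then drop the empty strings left by non-matches.
$sids = @($vhdlist -replace $pattern, '$1' | Where-Object { $_ -ne '' })
$sids
```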

Create an array from a CSV list

I have a list in orders.csv like so:
Order
1025405008
1054003899
1055003868
1079004365
I wish to add the unit number (2nd-4th chars) and the entire order number into an array, so it will be like:
"0254","1025405008"
"0540","1054003899"
etc
etc
I wish to ignore the prefix "1". So far, with my limited PS knowledge, I have created the variables:
$Orders = Import-csv c:\Orderlist.csv
$Units = $Orders | Select @{LABEL="Unit";EXPRESSION={$_.Order.Substring(1,4)}}
So I wish to combine the two into an array. I have tried
$array = $Units,Orders
Any help will be appreciated.
In case of a big CSV file that has just this one column using regexp is much faster than Select:
$combined = [IO.File]::ReadAllText('c:\Orderlist.csv') `
-replace '(?m)^\d(\d{4})\d+', '"$1","$&"' `
-replace '^Order', 'Unit, Order' | ConvertFrom-Csv
~6x faster on 100k records in a 2MB file (700ms vs 4100ms)
You can just select the Order within your Select statement and use the ConvertTo-Csv cmdlet to get the desired output:
$Orders = Import-csv c:\Orderlist.csv
$unitOrderArray = $Orders | Select @{LABEL="Unit";EXPRESSION={$_.Order.Substring(1,4)}}, Order
$unitOrderArray | ConvertTo-Csv -NoTypeInformation
Output:
"Unit","Order"
"0254","1025405008"
"0540","1054003899"
"0550","1055003868"
"0790","1079004365"

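If the goal is an array you can index into rather than CSV text, a minimal sketch along the same lines is to build one object per order. The in-memory order list replaces the CSV file here, and the name $pairs is only for illustration.

```powershell
# Hypothetical in-memory stand-in for the CSV file; the first string is the header.
$Orders = 'Order', '1025405008', '1054003899' | ConvertFrom-Csv

# One object per order: Unit is the 4 characters after the leading "1".
$pairs = @($Orders | ForEach-Object {
    [pscustomobject]@{ Unit = $_.Order.Substring(1, 4); Order = $_.Order }
})

$pairs[0].Unit
$pairs[0].Order
```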