Parsing a String with quoted Fields like a CSV-line in Powershell - arrays

I have to parse a variable input-string into a string-array.
The input is a CSV-style comma-separated field-list where each field has its own quoted string.
Because I don't want to write my own full-blown CSV parser, the only working solution I have come up with so far is this one:
$input = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic"""'
Add-Type -AssemblyName Microsoft.VisualBasic
$enc = [System.Text.Encoding]::UTF8
$bytes = $enc.GetBytes($input)
$stream = [System.IO.MemoryStream]::new($bytes)
$parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($stream)
$parser.Delimiters = ','
$parser.HasFieldsEnclosedInQuotes = $true
$list = $parser.ReadFields()
$list
Output looks like this:
Miller, Steve
Zappa, Frank
Johnson, Earvin "Magic"
Is there a better solution available via another .NET library for PowerShell?
Ideally, I would like to avoid the extra byte array and stream.
I am also not sure whether this VisualBasic assembly will remain available in the long term.
Any ideas?

With some extra precautions for security and to prevent inadvertent string expansion (interpolation), you can combine Invoke-Expression with Write-Output, though note that Invoke-Expression should generally be avoided:
$fieldList = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic""", "Honey, I''m $HOME"'
# Parse into array.
$fields = (
Invoke-Expression ("Write-Output -- " + ($fieldList -replace '\$', "`0"))
) -replace "`0", '$$'
Note:
-replace '\$', "`0" temporarily replaces literal $ chars. in the input with NUL chars. to prevent accidental (or malicious) string expansion (interpolation); the second -replace operation restores the original $ chars.
See this answer for more information about the regex-based -replace operator.
Prepending Write-Output -- to the resulting string and interpreting the result as a PowerShell command via Invoke-Expression causes Write-Output to parse the remainder of the string as individual arguments and output them as such. -- ensures that any arguments that happen to look like Write-Output's own parameters are not interpreted as such.
If and only if the input string is guaranteed to never contain embedded $ characters, the solution can be simplified to:
$fields = Invoke-Expression "Write-Output -- $fieldList"
Outputting $fields yields the following:
Miller, Steve
Zappa, Frank
Johnson, Earvin "Magic"
Honey, I'm $HOME
Explanation and list of constraints:
The solution relies on making the input string part of a string whose content is a syntactically valid Write-Output call, with the input string serving as the latter's arguments. Invoke-Expression then evaluates this string as if its content had directly been submitted as a command and therefore executes the Write-Output command. Based on how PowerShell parses command arguments, this implies the following constraints:
Supported field separators:
Either: ,-separated (with per-field (unquoted) leading and/or trailing whitespace getting removed, as shown above).
Or: whitespace-separated, using one or more whitespace characters between the fields.
Non-/quoting of embedded fields:
Fields can be quoted:
If single-quoted ('...'), field-internal ' characters must be escaped as ''.
If double-quoted, field-internal " characters must be escaped as either "" or `".
Fields can also be unquoted:
However, such fields mustn't contain any PowerShell argument-mode metacharacters (of these, < > @ # are only metacharacters at the start of a token):
<space> ' " ` , ; ( ) { } | & < > @ #
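As a rough illustration of the constraints above, the following sketch feeds a hypothetical whitespace-separated list (the sample values are invented for illustration) through the simplified Invoke-Expression form. It assumes the input is trusted and contains no $ characters:

```powershell
# Hypothetical input: a single-quoted, an unquoted, and a double-quoted field,
# separated by whitespace only. The literal here-string keeps the quotes as-is.
$fieldList = @'
'O''Neil, Shaq'   plain   "Zappa, Frank"
'@

# Only safe for trusted input without $ characters (see the constraints above).
$fields = Invoke-Expression "Write-Output -- $fieldList"

$fields.Count   # 3
$fields[0]      # O'Neil, Shaq
$fields[1]      # plain
$fields[2]      # Zappa, Frank
```

The doubled '' inside the single-quoted field collapses to one ', just as it would if the fields had been typed as arguments on the command line.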
Alternative, via ConvertFrom-Csv:
iRon's helpful answer shows a solution based on ConvertFrom-Csv, given that the field list embedded in the input string is comma-separated (,):
On the one hand, it is more limited in that it only supports "..."-quoting of fields and ""-escaping of field-internal ", and doesn't support fields separated by varying amounts of whitespace (only).
On the other hand, it is more flexible, in that it supports any single-character separator between the fields (irrespective of incidental leading/trailing per-field whitespace), which can be specified via the -Delimiter parameter.
What makes the solution awkward is the need to anticipate the max. number of embedded fields and to provide dummy headers (column names) for them (-Header (0..99)) in order to make ConvertFrom-Csv work, which is both fragile and potentially wasteful.
However, a simple trick can bypass this problem: Submit the input string twice, in which case ConvertFrom-Csv treats the fields in the input string as both the column names and as the column values of the one and only output row (object), whose values can then be queried:
$fieldList = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic""", "Honey, I''m $HOME"'
# Creates the same array as the solution at the top.
$fields = ($fieldList, $fieldList | ConvertFrom-Csv).psobject.Properties.Value
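The same trick should carry over to other separators via -Delimiter. The sketch below uses a hypothetical semicolon-separated list (sample values invented for illustration) with every field double-quoted. Because the fields double as column names, duplicate or empty fields are likely to cause trouble, so treat this as a sketch for lists with distinct, non-empty values:

```powershell
# Hypothetical input: semicolon-separated, all fields double-quoted.
$fieldList = '"Miller, Steve"; "Zappa, Frank"; "Johnson; Earvin"'

# First copy of the line becomes the header row, second copy the single data row.
$fields = ($fieldList, $fieldList |
    ConvertFrom-Csv -Delimiter ';').psobject.Properties.Value

$fields
```

With this input, $fields is expected to hold the three unquoted field values, including the one with an embedded semicolon.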

If the list is limited, you might use the parser of the ConvertFrom-Csv cmdlet, like:
$List = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic""", "Honey, I''m $HOME"'
($List | ConvertFrom-Csv -Header (0..99)).PSObject.Properties.Value.Where{ $Null -ne $_ }
Miller, Steve
Zappa, Frank
Johnson, Earvin "Magic"
Honey, I'm $HOME
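Regarding the wish to avoid the byte array and stream in the original TextFieldParser approach: TextFieldParser also has a constructor that accepts a TextReader, so a StringReader can wrap the string directly. This is a minimal sketch under that assumption, not a statement about the assembly's long-term availability; the variable names are illustrative:

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

$line = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic"""'

# Wrap the string in a StringReader instead of encoding it to bytes first.
$reader = [System.IO.StringReader]::new($line)
$parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
try {
    $parser.Delimiters = ','
    $parser.HasFieldsEnclosedInQuotes = $true
    $fields = $parser.ReadFields()
}
finally {
    # Close() releases the parser and the underlying reader.
    $parser.Close()
}

$fields
```

The parser settings are the same as in the question, so the resulting fields should match the output shown there.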

Related

How do I remove characters like "?" from an array powershell

I am trying to validate strings of text taken from PC descriptions in Active Directory.
But I want to remove rogue characters like a single value of "??" from any text before validating any text.
I have this test code as an example. But whenever it hits the random character "??"
It throws this error:
Error:
parsing "??" - Quantifier {x,y} following nothing.
At C:\Users\#####\OneDrive\Workingscripts\testscripts\removeingfromarray.ps1:11 char:5
+ If ($charigmorematch -match $descstr)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], ArgumentException
+ FullyQualifiedErrorId : System.ArgumentException
When all I want to do is remove it from the array!
Any help greatly appreciated.
This is the example code I have.
##Type characters to remove in array.
$charigmorematch = @("?"; "#"; "$")
##array declare
$userdesc = @()
###Where this would be an AD description from AD.
$ADUser = "Offline - ?? - test"
###Split AD Descrip into individual strings
$userdesc = $ADUser.Split("-").Trim()
###Run through them to check for rogue characters to remove
ForEach($descstr in $userdesc)
{
###If match found try and take it out
If ($charigmorematch -match $descstr)
{
###Store match in variable.
$strmatch = ($charigmorematch -match $descstr)
###Get the index of the string
$indexstr = $userdesc.indexof($descstr)
Write-Host "Match: $strmatch Index: $indexstr"
###Once found a match of a rogue character then remove from the array!
##But I haven't figured out that code yet.
###Then a command to remove the string from the array with the index number.
###In this case it's likely to be [1] to remove. But the code has to work that out.
}
}
# Sample input.
$ADUser = "Offline - ?? - test"
# Split into tokens by "-", potentially surrounded by spaces,
# and filter out tokens that contain '?', '#', or '$'.
($ADUser -split ' *- *') -notmatch '[?#$]'
The result is the following array of tokens: 'Offline', 'test'
Note that -notmatch, like all comparison operators that (also) operate on strings, acts as a filter with an array as the LHS, as is the case here (-split always returns an array).
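The difference between a scalar and an array LHS can be seen in a small sketch (the sample strings are taken from or modeled on the question):

```powershell
# Scalar LHS: -match returns a Boolean.
$isRogue = 'Location ?? not set' -match '[?#$]'
$isRogue          # True

# Array LHS: -match / -notmatch act as filters and return elements.
$tokens = 'Offline', '??', 'test'
$rogue = $tokens -match '[?#$]'
$clean = $tokens -notmatch '[?#$]'

$rogue            # ??
$clean            # Offline, test
```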
Based on the additional requirements you mentioned in later comments, you're probably looking for something like this (splitting by - or |, trimming of surrounding (...)):
# Sample input
$ADUser = "Notebook PC | (Win 10) | E1234567 - simple ^^ user | Location ?? not # set"
($ADUser -split ' *[-|] *') -notmatch '[?#$]' -replace '^\(|\)$'
This results in the following array of tokens: 'Notebook PC', 'Win 10', 'E1234567', 'simple ^^ user'
Note that unless your input strings have leading or trailing spaces, there is no need to call .Trim().
As for what you tried:
$charigmorematch -match $descstr
The -match operator:
requires the input string(s) to be the LHS (left-hand side) operand.
requires a regex (regular expression) as the RHS (right-hand side) operand, to formulate a pattern that the input is matched against.
By contrast, your attempted operation:
mistakenly reversed the order of operands ($descstr, as the string in which to look for regex patterns, must be the LHS).
mistakenly used an array as the comparison pattern ($charigmorematch), instead of a (single) regex (expressed as a string) that uses a character set ([...]) to specify the characters of interest.
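If the characters to reject need to stay in an array, as in the question, one possible approach is to build the character set from that array. This is a sketch only: the variable names are illustrative, and characters such as ] or - would need extra handling because [regex]::Escape() does not escape them:

```powershell
# Characters to reject, kept as an array as in the question.
$charsToReject = '?', '#', '$'

# Escape each character and wrap the result in [...] to form a character set.
$escaped = $charsToReject | ForEach-Object { [regex]::Escape($_) }
$pattern = '[' + ($escaped -join '') + ']'
$pattern          # [\?\#\$]

# Input string on the LHS, regex on the RHS.
$found = foreach ($descstr in 'Offline', '??', 'test') {
    if ($descstr -match $pattern) { "Rogue token: $descstr" }
}
$found            # Rogue token: ??
```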

Powershell to split a string into an array by start and end characters

Given values to extract from a string, where each value is surrounded by a starting character and ending character, what would be the most effective way to achieve this?
eg, to get an array containing values: a b c
$mystring = "=a; =b; =c;"
$CharArray = $mystring.Split("=").Split(";")
There are numerous combinations of -replace, -split, .Split(), and .Replace() that could be used for this task. Here are some examples:
# Since each element is separated by a space, you can replace extraneous characters first
# The unary -split operator splits by runs of whitespace
# This can have issues if your values contain spaces too
($mystring -replace '=|;').Split(' ')
-split ($mystring -replace '=|;')
# Since splitting creates white space at times, `Trim()` handles that.
# Because you expect an array after splitting, -ne '' will only return non-empty elements
$mystring.Split("=").Split(";").Trim() -ne ''
# Creates an array of System.Char elements. Take note of the type here as it may not be what you want.
($mystring -replace ' ?=|;').ToCharArray()
# Uses Split(String[], StringSplitOptions) method
($myString -replace ' ?=').Split(';',[StringSplitOptions]::RemoveEmptyEntries)
David, what you have looks good, but here is another way to do it. The -replace operator handles the space (" ") and the equals sign (=), and -ne '' drops the empty element left behind by the trailing ;.
$mystring = "=a; =b; =c;"
$CharArray = ($mystring -split ";" -replace " |=","") -ne ''
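Besides the split/replace combinations above, a regex with lookarounds can pick out the values between the start and end characters directly. This is an alternative sketch rather than something proposed in the answers; it also keeps values that contain spaces intact:

```powershell
$mystring = "=a; =b; =c;"

# (?<==) requires a preceding '=', (?=;) requires a following ';';
# [^;]+ captures everything in between without consuming the delimiters.
$values = [regex]::Matches($mystring, '(?<==)[^;]+(?=;)').Value

$values           # a, b, c
```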

how to handle $ inside parameters in powershell [duplicate]

The following code (at the bottom) produces one of the following outputs in the file
4/12/2019 = (get-date).AddDays(2).ToShortDateString();
4/13/2019 = (get-date).AddDays(2 + 1).ToShortDateString();
or if I haven't initialized the variable
= (get-date).AddDays(2).ToShortDateString();
= (get-date).AddDays(2 + 1).ToShortDateString();
This is the code block, I would like the parent ps1 file to write the child ps1 file verbatim.
$multiLineScript2 = @"
$startDate2 = (get-date).AddDays($resultOfSubtraction).ToShortDateString();
$endDate2 = (get-date).AddDays($resultOfSubtraction + 1).ToShortDateString();
"@
$multiLineScript2 | Out-File "c:\file2.ps1";
tl;dr:
To create a verbatim multi-line string (i.e., a string with literal contents), use a single-quoted here-string:
$multiLineScript2 = @'
$startDate2 = (get-date).AddDays($resultOfSubtraction).ToShortDateString();
$endDate2 = (get-date).AddDays($resultOfSubtraction + 1).ToShortDateString();
'@
Note the use of @' and '@ as the delimiters.
Use a double-quoted here-string only if string expansion (interpolation) is needed; to selectively suppress expansion, escape $ chars. to be included verbatim as `$, as shown in your own answer.
String Literals in PowerShell
Get-Help about_Quoting_Rules discusses the types of string literals supported by PowerShell:
To get a string with literal content (no interpolation, what C# would call a verbatim string), use single quotes: '...'
To embed ' chars. inside a '...' string, double them (''); all other chars. can be used as-is.
To get an expandable string (string interpolation), i.e., a string in which variable references (e.g., $var or ${var}) and expressions (e.g., $($var.Name)) can be embedded that are replaced with their values, use double quotes: "..."
To selectively suppress expansion, backtick-escape $ chars.; e.g., to prevent $var from being interpolated (expanded to its value) inside a "..." string, use `$var; to embed a literal backtick, use ``
For an overview of the rules of string expansion, see this answer.
Both fundamental types are also available as here-strings - in the forms @'<newline>...<newline>'@ and @"<newline>...<newline>"@ respectively (<newline> stands for an actual newline (line break)) - which make defining multi-line strings easier.
Important:
Nothing (except whitespace) may follow the opening delimiter - @' or @" - on the same line - the string's content must be defined on the following lines.
The closing delimiter - '@ or "@ (matching the opening delimiter) - must be at the very start of a line.
Here-strings defined in files invariably use the newline format of their enclosing file (CRLF vs. LF), whereas interactively defined ones always use LF only.
Examples:
# Single-quoted: literal:
PS> 'I am $HOME'
I am $HOME
# Double-quoted: expandable
PS> "I am $HOME"
I am C:\Users\jdoe
# Here-strings:
# Literal
PS> @'
I am
$HOME
'@
I am
$HOME
# Expandable
PS> @"
I am
$HOME
"@
I am
C:\Users\jdoe
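To complement the examples above, here is a short sketch of the escaping rules described earlier (doubling ' inside '...', backtick-escaping $ inside "...", and embedding expressions). The variable $name and the sample strings are illustrative only:

```powershell
$name = 'World'

$literal   = 'It''s $name'               # It's $name          ('' yields one ')
$expanded  = "Hello, $name"              # Hello, World
$escaped   = "Cost: `$5 for $name"       # Cost: $5 for World  (`$ stays literal)
$subexpr   = "Length: $($name.Length)"   # Length: 5
$delimited = "${name}ly"                 # Worldly             (${...} delimits the name)

$literal, $expanded, $escaped, $subexpr, $delimited
```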
I couldn't find this anywhere, but it appears every single variable in the script (string literal) has to be escaped with a tick like so. Instead of deleting the question I'll leave it up for a search hit.
$multiLineScript2 = @"
`$startDate2 = (get-date).AddDays($resultOfSubtraction).ToShortDateString();
`$endDate2 = (get-date).AddDays($resultOfSubtraction + 1).ToShortDateString();
"@

Out-File output is missing line feeds between lines of data

I am passing in an array of $users.
PS C:\> $users | ft
ID DisplayName AdminID first last Password
---- ----------- ------- ----- ---- --------
Axyz Axyz, Bill NBX_Admin Bill Axyz Secret
The code:
$y = @()
$y = "Create Users process. Run started at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
foreach ($x in $users) {
$y += "User $($x.DisplayName) with NNN of $($x.ID)"
}
$y += "Completed at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
$y | Out-File "Log.txt"
$y is now an unformatted string array. When I type $y to the screen, it looks great.
If I direct it to Format-Table, it looks great (no headings).
When I output it to a file, and type that file at a Command Prompt (cmd.exe), it looks great.
However, when I pull it up in Notepad, all the output appears on a single line. To be precise, all the data is there, there are no lines of data missing, but there are no CR/LF so all of the data appears on a single line within the file when viewed with Notepad.exe.
As AdminOfThings correctly points out:
While $y = @() assigns an empty array to $y, it doesn't type-constrain that variable, so your very next assignment - $y = "Create Users process ..." - changes the variable type to a string.
Simply using += instead of = in that subsequent assignment would have prevented the problem: $y += "Create Users process ...".
Alternatively, type-constraining the variable creation - [array] $y = @() - i.e., placing a type literal to the left of the variable being assigned (akin to a cast) - would have prevented the problem too.
Subsequent use of += therefore performs simple string concatenation rather than the desired gradual building of an array, with no separators between the "lines" added.[1]
By contrast, had you used an array as intended, both Out-File and Set-Content would automatically insert platform-appropriate newlines[2] between the elements, plus one at the end, on saving (in PSv5+ you can use the -NoNewline switch to opt out).
That said, using += to "extend" an array is inefficient, because what PowerShell must do behind the scenes is create a new array containing the old elements plus the new one(s), given that arrays are fixed-size data structures.
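When lines really do have to be appended one at a time, a resizable collection is one way to avoid re-creating an array on each +=. The following is a sketch using System.Collections.Generic.List[string] (the ::new() syntax assumes PSv5+); the sample strings are placeholders:

```powershell
# A List[string] grows in place, so Add() does not copy the existing elements.
$lines = [System.Collections.Generic.List[string]]::new()

$lines.Add('Run started')
foreach ($i in 1..3) {
    $lines.Add("User $i")
}
$lines.Add('Completed')

$lines.Count      # 5
$lines            # enumerates like an array, e.g. when piped to Out-File
```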
While the performance penalty for use of += to "extend" arrays in a loop only really matters with high iteration counts, it is more concise, convenient and efficient to let PowerShell create arrays for you implicitly, by using your foreach loop as an expression:
# Initialize the array and assign the first element.
# Due to the type constraint ([array]), the RHS string implicitly becomes
# the array's 1st element.
[array] $y = "Create Users process. Run started at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
# Add the strings output by the foreach loop to the array.
# PowerShell implicitly collects foreach output in an array when
# you use it as an expression.
$y += foreach ($x in $users)
{
"User $($x.displayname) with NNN of $($x.ID)"
}
# Add the final string to the array.
$y += "Completed at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
# Send the array to a file with Out-File, which separates
# the elements with newlines and adds a trailing one.
# Windows PowerShell:
# Out-File creates UTF-16LE-encoded files.
# Set-Content, which can alternatively be used, creates "ANSI"-encoded files.
# PowerShell Core:
# Both cmdlets create UTF-8-encoded files without BOM.
$y | Out-File "Log.txt"
Note that you can similarly use for, if, do / while / switch statements as expressions.
In all cases, however, as of PowerShell 7.0, these statements can only serve as expressions by themselves; regrettably, using them as the first segment of a pipeline or embedding them in larger expressions does not work - see this GitHub issue.
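A small sketch of statements used as expressions, plus one common workaround for the pipeline limitation: wrapping the statement in $(...). The values are arbitrary:

```powershell
# if as an expression: the output of the executed branch is assigned.
$size = if (5 -gt 3) { 'big' } else { 'small' }
$size             # big

# foreach as an expression: all loop output is collected.
$evens = foreach ($n in 1..6) { if ($n % 2 -eq 0) { $n } }
$evens            # 2, 4, 6

# Workaround: $(...) turns the statement into an expression
# that can start a pipeline.
$sorted = $(foreach ($n in 3, 1, 2) { $n * 10 }) | Sort-Object
$sorted           # 10, 20, 30
```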
[1] A simple demonstration of your problem:
# The initialization of $y as @() is overridden by $y = 'first'.
PS> $y = @(); $y = 'first'; $y += 'second'; $y
firstsecond # !! $y contains a single string built with string concatenation
The description of your symptoms is therefore not consistent with your code, as you should have seen a single-line output string in all scenarios (printing directly to the screen / via Format-Table, sending to a file and type-ing that from cmd.exe).
[2] The platform-appropriate newline is reflected in [Environment]::NewLine, and it is "`r`n" (CRLF) on Windows, and just "`n" (LF) on Unix-like platforms (in PowerShell Core).
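For contrast with the demonstration in [1], this sketch shows the type-constrained variant, where the variable stays an array across the plain assignment:

```powershell
# The [array] constraint stays attached to $y, so later assignments are converted.
[array] $y = @()
$y = 'first'      # becomes a 1-element array rather than a plain string
$y += 'second'    # appends an element instead of concatenating strings

$y.Count          # 2
$joined = $y -join [Environment]::NewLine   # one element per line
```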
As using += recreates the array on every iteration, I'd suggest assigning the output of a ForEach-Object call with its -Begin, -Process and -End sections to a variable, and also using the more common approach of the format operator:
$Log = $users | ForEach-Object -Begin {
"Create Users process. Run started at [{0:MM/dd/yyyy} {0:HH:mm:ss}]" -f (Get-Date)
} -Process {
"User {0} with NNN of {1}" -f $_.DisplayName,$_.ID
} -End {
"Completed at [{0:MM/dd/yyyy} {0:HH:mm:ss}]" -f (Get-Date)
}
$Log | Set-Content "Log.txt"

PowerShell regex to extract SID from filename

I have an array $vhdlist with contents similar to the following filenames:
UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx
UVHD-template.vhdx
I want to use a regex and be left with an array containing only SID portion of the filenames.
I am using the following:
$sids = foreach ($file in $vhdlist)
{
[regex]::split($file, '^UVHD-(?:([(\d)(\w)-]+)).vhdx$')
}
There are 2 problems with this: in the resulting array there are 3 blank lines for every SID; and the "template" filename matches (the resulting line in the output is just "template"). How can I get an array of SIDs as the output and not include the "template" line?
You seem to want to filter the list down to those filenames that contain an SID. Filtering is done with Where-Object (where for short); you don't need a loop.
An SID could be described as "S- and then a bunch of digits and dashes" for this simple case. That leaves us with ^UVHD-S-[\d-]*\.vhdx$ for the filename.
In combination we get:
$vhdlist | where { $_ -Match "^UVHD-S-[\d-]*\.vhdx$" }
When you don't really have an array of strings, but actually an array of files, use them directly.
dir C:\some\folder | where { $_.Name -Match "^UVHD-S-[\d-]*\.vhdx$" }
Or, possibly you can even make it as simple as:
dir C:\some\folder\UVHD-S-*.vhdx
EDIT
Extracting the SIDs from a list of strings can be thought of as a combined transformation (for each element, extract the SID) and filter (remove non-matches) operation.
PowerShell's ForEach-Object cmdlet (foreach for short) works like map() in other languages. It takes every input element and returns a new value. In effect it transforms a list of input elements into output elements. Together with the -replace operator you can extract SIDs this way.
$vhdlist | foreach { $_ -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" } | where { $_ -gt "" }
The regex back-reference for .NET languages is $1. The $ is a special character in PowerShell strings, so it needs to be escaped, except when there is no ambiguity. The backtick is the PS escape character. You can escape the $ in the regex as well, but there it's not necessary.
As a final step we use where to remove empty strings (i.e. non-matches). Doing it this way around means we only need to apply the regex once, instead of two times when filtering first and replacing second.
PowerShell operators can also work on lists directly. So the above could even be shortened:
$vhdlist -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" | where { $_ -gt "" }
The shorter version only works on lists of actual strings or objects that produce the right thing when .ToString() is called on them.
Regex breakdown:
^ # start-of-string anchor
(?: # begin non-capturing group (either...)
UVHD- # 'UVHD-'
( # begin group 1
S-[\d-]* # 'S-' and however many digits and dashes
) # end group 1
\.vhdx # '.vhdx'
| # ...or...
.* # anything else
) # end non-capturing group
$ # end-of-string anchor
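As an alternative sketch (not part of the answer above), the capture group can also be read from the automatic $Matches variable after a successful -match, which likewise applies the regex only once per filename and needs no "anything else" branch. The two-element sample list is taken from the question:

```powershell
$vhdlist = @(
    'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx'
    'UVHD-template.vhdx'
)

$sids = foreach ($file in $vhdlist) {
    # On success, -match fills $Matches; index 1 is capture group 1 (the SID).
    if ($file -match '^UVHD-(S-[\d-]*)\.vhdx$') { $Matches[1] }
}

$sids             # S-1-5-21-8746256374-654813465-374012747-4533
```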

Resources