PowerShell regex to extract SID from filename - arrays

I have an array $vhdlist with contents similar to the following filenames:
UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx
UVHD-template.vhdx
I want to use a regex and be left with an array containing only the SID portion of each filename.
I am using the following:
$sids = foreach ($file in $vhdlist)
{
[regex]::split($file, '^UVHD-(?:([(\d)(\w)-]+)).vhdx$')
}
There are 2 problems with this: in the resulting array there are 3 blank lines for every SID; and the "template" filename matches (the resulting line in the output is just "template"). How can I get an array of SIDs as the output and not include the "template" line?

You seem to want to filter the list down to those filenames that contain an SID. Filtering is done with Where-Object (where for short); you don't need a loop.
An SID could be described as "S- and then a bunch of digits and dashes" for this simple case. That leaves us with ^UVHD-S-[\d-]*\.vhdx$ for the filename.
In combination we get:
$vhdlist | where { $_ -Match "^UVHD-S-[\d-]*\.vhdx$" }
When you don't really have an array of strings, but actually an array of files, use them directly.
dir C:\some\folder | where { $_.Name -Match "^UVHD-S-[\d-]*\.vhdx$" }
Or, possibly you can even make it as simple as:
dir C:\some\folder\UVHD-S-*.vhdx
EDIT
Extracting the SIDs from a list of strings can be thought of as a combined transformation (for each element, extract the SID) and filter (remove non-matches) operation.
PowerShell's ForEach-Object cmdlet (foreach for short) works like map() in other languages. It takes every input element and returns a new value. In effect it transforms a list of input elements into output elements. Together with the -replace operator you can extract SIDs this way.
$vhdlist | foreach { $_ -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" } | where { $_ -gt "" }
The regex back-reference for .NET languages is $1. The $ is a special character in double-quoted PowerShell strings, so it needs to be escaped there, except when there is no ambiguity; the backtick is the PS escape character. You can backtick-escape the $ in the regex pattern as well, but there it's not necessary.
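A minimal illustration of the two quoting options (made-up sample string):
"abc123" -replace '(\d+)', "`$1!"   # double-quoted replacement string: backtick-escape the $
"abc123" -replace '(\d+)', '$1!'    # single-quoted replacement string: no escaping needed
Both return abc123!.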
As a final step we use where to remove empty strings (i.e. non-matches). Doing it this way around means the regex is applied only once, instead of twice as it would be if we filtered first and replaced second.
PowerShell operators can also work on lists directly. So the above could even be shortened:
$vhdlist -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" | where { $_ -gt "" }
The shorter version only works on lists of actual strings or objects that produce the right thing when .ToString() is called on them.
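For example, applied to real files (the folder path below is just a placeholder), a sketch:
(Get-ChildItem C:\some\folder -Filter UVHD-*.vhdx).Name -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" | where { $_ -gt "" }
.Name yields plain strings, so the list form of -replace behaves exactly as described above.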
Regex breakdown:
^ # start-of-string anchor
(?: # begin non-capturing group (either...)
UVHD- # 'UVHD-'
( # begin group 1
S-[\d-]* # 'S-' and however many digits and dashes
) # end group 1
\.vhdx # '.vhdx'
| # ...or...
.* # anything else
) # end non-capturing group
$ # end-of-string anchor
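To see the two alternatives in action, using sample names from the question:
'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx' -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1"   # -> S-1-5-21-8746256374-654813465-374012747-4533
'UVHD-template.vhdx' -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1"                                       # -> '' (empty; removed by the where filter)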

Related

Parsing a String with quoted Fields like a CSV-line in Powershell

I have to parse a variable input-string into a string-array.
The input is a CSV-style comma-separated field list where each field is a quoted string.
Because I don't want to write my own full-blown CSV parser, the only working solution I have come up with so far is this one:
$input = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic"""'
Add-Type -AssemblyName Microsoft.VisualBasic
$enc = [System.Text.Encoding]::UTF8
$bytes = $enc.GetBytes($input)
$stream = [System.IO.MemoryStream]::new($bytes)
$parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($stream)
$parser.Delimiters = ','
$parser.HasFieldsEnclosedInQuotes = $true
$list = $parser.ReadFields()
$list
Output looks like this:
Miller, Steve
Zappa, Frank
Johnson, Earvin "Magic"
Is there any better solution available via another .NET library for PowerShell?
Ideally I would like to avoid the extra byte array and stream.
I am also not sure whether this VisualBasic assembly will be available in the long term.
Any ideas here?
With some extra precautions for security and to prevent inadvertent string extrapolation, you can combine Invoke-Expression with Write-Output, though note that Invoke-Expression should generally be avoided:
$fieldList = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic""", "Honey, I''m $HOME"'
# Parse into array.
$fields = (
Invoke-Expression ("Write-Output -- " + ($fieldList -replace '\$', "`0"))
) -replace "`0", '$$'
Note:
-replace '\$', "`0" temporarily replaces literal $ chars. in the input with NUL chars. to prevent accidental (or malicious) string expansion (interpolation); the second -replace operation restores the original $ chars.
See this answer for more information about the regex-based -replace operator.
Prepending Write-Output -- to the resulting string and interpreting the result as a PowerShell command via Invoke-Expression causes Write-Output to parse the remainder of the string as individual arguments and output them as such. -- ensures that any arguments that happen to look like Write-Output's own parameters are not interpreted as such.
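To see why the first -replace is needed, consider what happens without it (a sketch; the exact home path will differ on your system):
Invoke-Expression 'Write-Output -- "Honey, I''m $HOME"'   # -> Honey, I'm C:\Users\<you>  ($HOME was expanded)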
If and only if the input string is guaranteed to never contain embedded $ characters, the solution can be simplified to:
$fields = Invoke-Expression "Write-Output -- $fieldList"
Outputting $fields yields the following:
Miller, Steve
Zappa, Frank
Johnson, Earvin "Magic"
Honey, I'm $HOME
Explanation and list of constraints:
The solution relies on making the input string part of a string whose content is a syntactically valid Write-Output call, with the input string serving as the latter's arguments. Invoke-Expression then evaluates this string as if its content had directly been submitted as a command and therefore executes the Write-Output command. Based on how PowerShell parses command arguments, this implies the following constraints:
Supported field separators:
Either: ,-separated (with per-field (unquoted) leading and/or trailing whitespace getting removed, as shown above).
Or: whitespace-separated, using one or more whitespace characters between the fields.
Non-/quoting of embedded fields:
Fields can be quoted:
If single-quoted ('...'), field-internal ' characters must be escaped as ''.
If double-quoted, field-internal " characters must be escaped as either "" or `".
Fields can also be unquoted:
However, such fields mustn't contain any PowerShell argument-mode metacharacters (of these, < > @ # are only metacharacters at the start of a token):
<space> ' " ` , ; ( ) { } | & < > @ #
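A small demonstration of the unquoted, whitespace-separated case (made-up field values):
Invoke-Expression 'Write-Output -- alpha   beta gamma'
# -> alpha
#    beta
#    gamma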
Alternative, via ConvertFrom-Csv:
iRon's helpful answer shows a solution based on ConvertFrom-Csv, given that the field list embedded in the input string is comma-separated (,):
On the one hand, it is more limited in that it only supports "..."-quoting of fields and ""-escaping of field-internal ", and doesn't support fields separated by whitespace alone.
On the other hand, it is more flexible, in that it supports any single-character separator between the fields (irrespective of incidental leading/trailing per-field whitespace), which can be specified via the -Delimiter parameter.
What makes the solution awkward is the need to anticipate the max. number of embedded fields and to provide dummy headers (column names) for them (-Header (0..99)) in order to make ConvertFrom-Csv work, which is both fragile and potentially wasteful.
However, a simple trick can bypass this problem: Submit the input string twice, in which case ConvertFrom-Csv treats the fields in the input string as both the column names and as the column values of the one and only output row (object), whose values can then be queried:
$fieldList = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic""", "Honey, I''m $HOME"'
# Creates the same array as the solution at the top.
$fields = ($fieldList, $fieldList | ConvertFrom-Csv).psobject.Properties.Value
If the list is limited, you might use the parser of the ConvertFrom-Csv cmdlet, like:
$List = '"Miller, Steve", "Zappa, Frank", "Johnson, Earvin ""Magic""", "Honey, I''m $HOME"'
($List | ConvertFrom-Csv -Header (0..99)).PSObject.Properties.Value.Where{ $Null -ne $_ }
Miller, Steve
Zappa, Frank
Johnson, Earvin "Magic"
Honey, I'm $HOME

Compare Two Arrays, Email Results in PowerShell

I'm attempting to compare two arrays: one contains a list of usernames (dynamic, sometimes more usernames, sometimes fewer) and the other contains a list of file names (also dynamic). Every file name contains the username along with other text, e.g. "Username report [date].xlsx". The goal is to match the elements between Array A and Array B.
Array A is just usernames.
Output of Array A, contained in $Username is just:
PersonA
PersonB
PersonC
etc...
Array B contains filepaths, but I can narrow it down to just filenames like so $ArrayB.Name (the full path would be $ArrayB.FullName). The naming format for Array B is "Username report [date].xlsx".
Output of Array B, contained within $LatestFiles.Name (for the file name) is:
PersonA Report 1-1-21.xlsx
PersonB Report 1-1-21.xlsx
PersonC Report 1-1-21.xlsx
After matching, the final piece would be: if an element in Array A matches an element in Array B, attach ArrayB.FullName to the corresponding username + "@domain.com".
Unfortunately I can't even get the matching to work properly.
I've tried:
foreach ($elem in $UserName) { if ($LatestFiles.Name -contains $elem) { "there is a match" } }
and
foreach ($elem in $UserName) {
if($LatestFiles.Name -contains $elem) {
"There is a match"
} else {
"There is no match"
}
}
and a couple different variations, but I can't get them to output the matches. Any assistance is appreciated.
Short answer to why you can't get matches:
-Contains tests whether a collection contains an element that is exactly equal to the value on the right; it does not do substring matching against a String. You would be better off using -Like as your comparison operator. Put differently, your test asks whether any file name is exactly equal to a user name, when in fact the user name is only part of the file name.
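A quick illustration of the difference, using a file name from the question:
@('PersonA Report 1-1-21.xlsx') -contains 'PersonA'   # False: no element is exactly equal to 'PersonA'
'PersonA Report 1-1-21.xlsx' -like 'PersonA*'         # True: wildcard string comparison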
It sounds like you are not simply comparing the arrays, but dealing with the more complicated matter of what to do with the matching elements.
$LatestFiles |
ForEach-Object {
# first, let's take the formatted file name and make it more usable
$fileName = $_.BaseName -split ' ' # BaseName over Name so that it strips the extension
Write-Output @{
User = $fileName[0]
Date = $fileName[2]
File = $_
}
} -PipelineVariable FileData |
# next, only accept valid users
Where-Object User -In $UserName |
# note that we don't need the value from $UserName because we already have it in $FileData.User (what we matched)
ForEach-Object {
# finally do your thing with it.
$userWithDomain = "$($FileData.User)@domain.com"
Write-Output $userWithDomain # or whatever you want
}
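If the end goal is the mapping described in the question (email address plus full file path), here is a sketch along the same lines; domain.com is a placeholder:
$matched = $LatestFiles | ForEach-Object {
    [pscustomobject]@{ User = ($_.BaseName -split ' ')[0]; File = $_.FullName }
} | Where-Object User -In $UserName | ForEach-Object {
    [pscustomobject]@{ Email = "$($_.User)@domain.com"; Report = $_.File }
}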

Out-File output is missing line feeds between lines of data

I am passing in an array of $users.
PS C:\> $users | ft
ID   DisplayName AdminID   first last Password
---- ----------- -------   ----- ---- --------
Axyz Axyz, Bill  NBX_Admin Bill  Axyz Secret
The code:
$y = @()
$y = "Create Users process. Run started at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
foreach ($x in $users) {
$y += "User $($x.DisplayName) with NNN of $($x.ID)"
}
$y += "Completed at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
$y | Out-File "Log.txt"
$y is now an unformatted string array. When I type $y to the screen, it looks great.
If I direct it to Format-Table, it looks great (no headings).
When I output it to a file, and type that file at a Command Prompt (cmd.exe), it looks great.
However, when I pull it up in Notepad, all the output appears on a single line. To be precise, all the data is there, there are no lines of data missing, but there are no CR/LF so all of the data appears on a single line within the file when viewed with Notepad.exe.
As AdminOfThings correctly points out:
While $y = @() assigns an empty array to $y, it doesn't type-constrain that variable, so your very next assignment - $y = "Create Users process ..." - changes the variable type to a string.
Simply using += instead of = in that subsequent assignment would have prevented the problem: $y += "Create Users process ...".
Alternatively, type-constraining the variable creation - [array] $y = @() - i.e., placing a type literal to the left of the variable being assigned (akin to a cast) - would have prevented the problem too.
Subsequent use of += therefore performs simple string concatenation rather than the desired gradual building of an array, with no separators between the "lines" added.[1]
By contrast, had you used an array as intended, both Out-File and Set-Content would automatically insert platform-appropriate newlines[2] between the elements, plus one at the end, on saving (in PSv5+ you can use the -NoNewline switch to opt out).
That said, using += to "extend" an array is inefficient, because what PowerShell must do behind the scenes is create a new array containing the old elements plus the new one(s), given that arrays are fixed-size data structures.
While the performance penalty for use of += to "extend" arrays in a loop only really matters with high iteration counts, it is more concise, convenient and efficient to let PowerShell create arrays for you implicitly, by using your foreach loop as an expression:
# Initialize the array and assign the first element.
# Due to the type constraint ([array]), the RHS string implicitly becomes
# the array's 1st element.
[array] $y = "Create Users process. Run started at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
# Add the strings output by the foreach loop to the array.
# PowerShell implicitly collects foreach output in an array when
# you use it as an expression.
$y += foreach ($x in $users)
{
"User $($x.displayname) with NNN of $($x.ID)"
}
# Add the final string to the array.
$y += "Completed at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
# Send the array to a file with Out-File, which separates
# the elements with newlines and adds a trailing one.
# Windows PowerShell:
# Out-File creates UTF-16LE-encoded files.
# Set-Content, which can alternatively be used, creates "ANSI"-encoded files.
# PowerShell Core:
# Both cmdlets create UTF-8-encoded files without BOM.
$y | Out-File "Log.txt"
Note that you can similarly use for, if, do / while / switch statements as expressions.
In all cases, however, as of PowerShell 7.0, these statements can only serve as expressions by themselves; regrettably, using them as the first segment of a pipeline or embedding them in larger expressions does not work - see this GitHub issue.
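For example (a small sketch; $n is assumed to hold a number):
$evens = foreach ($i in 1..10) { if ($i % 2 -eq 0) { $i } }   # 2, 4, 6, 8, 10
$abs = if ($n -lt 0) { -$n } else { $n }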
[1] A simple demonstration of your problem:
# The initialization of $y as @() is overridden by $y = 'first'.
PS> $y = @(); $y = 'first'; $y += 'second'; $y
firstsecond # !! $y contains a single string built with string concatenation
The description of your symptoms is therefore not consistent with your code, as you should have seen a single-line output string in all scenarios (printing directly to the screen / via Format-Table, sending to a file and type-ing that from cmd.exe).
[2] The platform-appropriate newline is reflected in [Environment]::NewLine, and it is "`r`n" (CRLF) on Windows, and just "`n" (LF) on Unix-like platforms (in PowerShell Core).
As using += recreates the array on every iteration, I'd suggest assigning the output of ForEach-Object with its -Begin, -Process and -End blocks to a variable, also using the more common approach of the format operator:
$Log = $users | ForEach-Object -Begin {
"Create Users process. Run started at [{0:MM/dd/yyyy} {0:HH:mm:ss}]" -f (Get-Date)
} -Process {
"User {0} with NNN of {1}" -f $_.DisplayName,$_.ID
} -End {
"Completed at [{0:MM/dd/yyyy} {0:HH:mm:ss}]" -f (Get-Date)
}
$Log | Set-Content "Log.txt"
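With the sample $users shown above, Log.txt would then contain one record per line, along the lines of (timestamps will differ):
Create Users process. Run started at [01/01/2021 10:00:00]
User Axyz, Bill with NNN of Axyz
Completed at [01/01/2021 10:00:01]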

Powershell: Delete Every n Files

I just imported a bunch of pictures, and realized that there's 3 copies of each pictures, but they're named sequentially.
Basically these three files are the same:
P5240901.dng
P5240902.dng
P5240903.dng
And that, for about 1600 pictures.
I was looking into writing a simple PowerShell script (I use Windows) that would look into the directory of these files, and keep 1 file out of three, just looping through a range of files.
I didn't find something that would deal with the 'P' character before my file, and I'm not familiar with PowerShell language.
Any ideas?
Thank you!
Assuming everything in the directory follows the naming convention and comes in sets of 3, something like this should work:
$mydir = 'C:\path\to\files'
[int]$idx = 1
Get-ChildItem $mydir | Sort-Object Name | ForEach-Object {
if ($idx % 3 -ne 1) { # keep only the 1st file of each set of 3
$_ | Remove-Item
}
$idx++
}
Try the following, which will keep only the 1st file in each group of files whose names are the same except for the last character before the filename extension, assuming that character is a digit (syntax assumes PSv3+):
'P5240901.dng', 'P5240902.dng', 'P5240903.dng', 'A1.dng', 'A2.dng', 'singleton.dng' |
Group-Object { $_ -replace '^(.+)\d\.', '$1' } |
? Count -gt 1 |
% { $_.Group[1..$($_.Group.Count)] }
yields:
P5240902.dng
P5240903.dng
A2.dng
Replace the sample input array with a call to Get-ChildItem -File, and prepend Remove-Item to $_.Group[1..$($_.Group.Count)] to perform actual deletion.
The above command uses a string array with input filenames, but the [System.IO.FileInfo] instances output by Get-ChildItem will effectively act the same in a string context: they will expand to their respective filenames.
The advantage of this solution is that it doesn't rely on input files appearing strictly in groups of 3:
Any group of input files sharing the same name except for a digit before the filename extension that has at least 2 members (and any number beyond that) will have every member but the 1st deleted.
Any other files are left untouched.
Explanation:
Group-Object { $_ -replace '^(.+)\d\.', '$1' }
effectively groups the input files by the portion of the filename they share (but only if they share everything but the last char. before the filename extension, and if that char. is a digit).
? Count -gt 1
only passes on those resulting groups that have at least 2 members.
% { $_.Group[1..$($_.Group.Count)] }
processes each group's files, except the 1st.
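Putting it together with real files, a cautious sketch (the folder path is a placeholder; keep -WhatIf for a dry run, then drop it to actually delete):
Get-ChildItem C:\path\to\pics -File -Filter *.dng |
Group-Object { $_.Name -replace '^(.+)\d\.', '$1' } |
? Count -gt 1 |
% { Remove-Item -LiteralPath $_.Group[1..$($_.Group.Count)].FullName -WhatIf }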
Update: Here's a variation prompted by the OP's later comments:
The following, given input filenames such as P5240901.dng, P5240902.dng, ..., P5240910.dng, P5240911.dng, ..., P5240990.dng, P5240991.dng, ..., P5240999.dng, will consider each group of 10 files a group (based on the tens place), and within each group only retain the 1st file:
1..99 | % { "P52409$('{0:00}' -f $_).dng" } |
Group-Object { $_ -replace '^(.+\d)\d\.', '$1' } |
? Count -gt 1 |
% { $_.Group[1..$($_.Group.Count)]}
yields:
# tens place of 0; skips ...01.dng
P5240902.dng
P5240903.dng
... # up to ...09.dng
# tens place of 1; skips ...10.dng
P5240911.dng
P5240912.dng
... # skips ...20.dng, ...30.dng, ...
# tens place of 9; skips ...90.dng
P5240991.dng
P5240992.dng
...
P5240999.dng
In order to only pass the files of interest to the command, replace the sample input array with
Get-ChildItem P52409[0-9][0-9].dng

How do I modify elements in a Perl array inside a foreach loop?

My goal with this piece of code is to sanitize an array of elements (a list of URLs, some with special characters like %) so that I can eventually compare it to another file of URLs and output which ones match. The list of URLs is from a .csv file whose first field is the URL that I want (with some other entries that I skip over with a quick if() statement).
foreach my $var (@input_1) {
#Skip anything that doesn't start with http:
if ((/^[#U]/ ) || !(/^h/)) {
next;
}
#Split the .csv into the relevant field:
my @fields = split /\s?\|\s?/, $_;
$var = uri_unescape($fields[0]);
}
My delimiter is a | in the csv. In its current setup, and also when I change the $_ to $var, it only returns blank lines. When I remove the $var declaration at the beginning of the loop and use $_, it outputs the URLs in the correct format. But in that case, how can I assign the output to the same element in the array? Would this require a second array to output the values to?
I'm relatively new to Perl, so I'm sure there is some stuff that I'm missing. I have no clue at this moment why the $var in the foreach declaration breaks the parsing of the @fields line, but removing it and using $_ doesn't. Reading the perlsyn documentation did not help as much as I would have liked. Any help appreciated!
/^h/ is not bound to anything, so the match happens against $_. If you want to match $var, you have to bind it:
if ($var =~ /^[#U]/ || $var !~ /^h/) {
Using || with two matches could probably be incorporated into a single regular expression with an alternative:
next if $var =~ /^(?: [#U] | [^h] | $ )/x;
i.e. The line has to start with #, U, something else than h, or be empty.
You can populate a new array with the results by using push:
push @results, $var;
Also note that if your data can contain | quoted or escaped (or newlines etc.), you should use Text::CSV instead of split.
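A minimal sketch of the corrected loop, putting both pieces together (@input_1 is assumed to be populated as in the question; uri_unescape comes from URI::Escape):
use URI::Escape qw(uri_unescape);

my @results;
foreach my $var (@input_1) {
    # skip comment/header lines and anything that doesn't start with 'h' (http/https)
    next if $var =~ /^(?: [#U] | [^h] | $ )/x;
    my @fields = split /\s?\|\s?/, $var;   # split the |-delimited record
    push @results, uri_unescape($fields[0]);
}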
