Powershell: Delete Every n Files - loops

I just imported a bunch of pictures and realized that there are 3 copies of each picture, but they're named sequentially.
Basically these three files are the same:
P5240901.dng
P5240902.dng
P5240903.dng
And that's the case for about 1600 pictures.
I was looking into writing a simple PowerShell script (I use Windows) that would look into the directory of these files and keep 1 file out of every three, just looping through a range of files.
I didn't find anything that deals with the 'P' character at the start of my file names, and I'm not familiar with the PowerShell language.
Any ideas?
Thank you!

Assuming everything in the dir follows the naming convention and comes in sets of 3, something like this should work:
$mydir = 'C:\path\to\files'
[int]$idx = 1
Get-ChildItem $mydir | Sort-Object { $_.Name } | ForEach-Object {
    # keep the 1st file of every 3 (idx 1, 4, 7, ...); delete the other two
    if ($idx % 3 -ne 1) {
        $_ | Remove-Item
    }
    $idx++
}

Try the following, which will keep only the 1st file in each group of files whose names are the same except for the last character before the filename extension, assuming that character is a digit (syntax assumes PSv3+):
'P5240901.dng', 'P5240902.dng', 'P5240903.dng', 'A1.dng', 'A2.dng', 'singleton.dng' |
Group-Object { $_ -replace '^(.+)\d\.', '$1' } |
? Count -gt 1 |
% { $_.Group[1..$($_.Group.Count)] }
yields:
P5240902.dng
P5240903.dng
A2.dng
Replace the sample input array with a call to Get-ChildItem -File, and prepend Remove-Item to $_.Group[1..$($_.Group.Count)] to perform actual deletion.
The above command uses a string array with input filenames, but the [System.IO.FileInfo] instances output by Get-ChildItem will effectively act the same in a string context: they will expand to their respective filenames.
The advantage of this solution is that it doesn't rely on input files appearing strictly in groups of 3:
Any group of input files sharing the same name except for a digit before the filename extension that has at least 2 members (and any number beyond that) will have every member but the 1st deleted.
Any other files are left untouched.
Explanation:
Group-Object { $_ -replace '^(.+)\d\.', '$1' }
effectively groups the input files by the portion of the filename they share (but only if they share everything but the last char. before the filename extension, and if that char. is a digit).
? Count -gt 1
only passes on those resulting groups that have at least 2 members.
% { $_.Group[1..$($_.Group.Count)] }
processes each group's files, except the 1st.
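Putting the pieces above together against a real folder might look like the sketch below. The function name Get-ExtraCopy and its -Path parameter are made up for illustration, and the explicit Sort-Object is an added assumption so that "the 1st file" in each group is the one that sorts first by name:

```powershell
# Hypothetical helper (not from the answer above): returns every file except
# the first in each group of names that differ only in the last digit
# before the filename extension. Syntax assumes PSv3+.
function Get-ExtraCopy {
    param([Parameter(Mandatory)][string]$Path)

    Get-ChildItem -LiteralPath $Path -File |
        Sort-Object Name |
        Group-Object { $_.Name -replace '^(.+)\d\.', '$1' } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Group | Select-Object -Skip 1 }
}
```

Piping the result to Remove-Item -WhatIf first, e.g. Get-ExtraCopy -Path 'C:\path\to\files' | Remove-Item -WhatIf, previews what would be deleted without touching anything.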
Update: Here's a variation prompted by the OP's later comments:
The following, given input filenames such as P5240901.dng, P5240902.dng, ..., P5240910.dng, P5240911.dng, ..., P5240990.dng, P5240991.dng, ..., P5240999.dng, will consider each group of 10 files a group (based on the tens place), and within each group only retain the 1st file:
1..99 | % { "P52409$('{0:00}' -f $_).dng" } |
Group-Object { $_ -replace '^(.+\d)\d\.', '$1' } |
? Count -gt 1 |
% { $_.Group[1..$($_.Group.Count)]}
yields:
# tens place of 0; skips ...01.dng
P5240902.dng
P5240903.dng
... # up to ...09.dng
# tens place of 1; skips ...10.dng
P5240911.dng
P5240912.dng
... # skips ...20.dng, ...30.dng, ...
# tens place of 9; skips ...90.dng
P5240991.dng
P5240992.dng
...
P5240999.dng
In order to only pass the files of interest to the command, replace the sample input array with
Get-ChildItem P52515[0-9][0-9].dng.

Related

How can I use PowerShell to copy a range of files where the file name is a sequence of numbers

For example, say I have a bunch of files where the names are numbers starting at 23540987577 to 27495847547388. However I only want to copy files where the middle 5 numbers are between 43565 and 43769. I have made a few attempts, but it either copies everything, or errors out.
So far I have the following :
$START = read-host -prompt "Enter starting number"
$END = read-host -Prompt "Enter ending number"
$Files = Get-ChildItem
$Files = $Files.name
$i = 1
foreach ($i in $Files) {
if ($Files[$i] -ge "*$START*" -and $Files[$i] -le "*$END*") {
/
Copy-Item $Files[$i] .\pulled
$i++
}
else {
Write-Host "no"
}
}
I have a list of files where the file name is a large sequence of numbers. Within that sequence (somewhere towards the middle) is a transaction number. I need to find and copy a small subset of transaction numbers that fall within a specific range.
If I search for these files manually in Windows Explorer, I have to search for each number in the range as follows:
*43565*
*43566*
*43567*
*43568*
and so on...
I want to automate this process as it takes a long time to search for each transaction number with larger batches.
What about the following?
It's not regex, but I don't see any advantage in using regex here.
I haven't used PowerShell much yet, so I'll stick to pseudocode, but you should be able to adapt this to a PowerShell script easily:
CopyFiles(int lowerBound, int upperBound)
{
    foreach (file in fileList)
    {
        // you have to know the position and length X of your substring; maybe pass them as parameters too
        int number = (int)file.filename.substring(1, X)
        if (number >= lowerBound && number <= upperBound)
        {
            copy(file.filename, new location)
        }
    }
}
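For what it's worth, a PowerShell adaptation of that idea could look roughly like this. It is only a sketch: the function name and the -Offset/-Length parameters are hypothetical, and it assumes the transaction number always sits at the same position in the file name, which the question does not confirm:

```powershell
# Hypothetical helper: emits the names whose digits at a fixed position
# fall inside [LowerBound, UpperBound].
# $Offset and $Length are assumptions about where the transaction number sits.
function Select-FileInRange {
    param(
        [string[]]$Name,
        [int]$Offset,
        [int]$Length,
        [long]$LowerBound,
        [long]$UpperBound
    )
    foreach ($n in $Name) {
        if ($n.Length -lt $Offset + $Length) { continue }   # name too short
        $part = $n.Substring($Offset, $Length)
        if ($part -notmatch '^\d+$') { continue }            # not purely digits
        $value = [long]$part                                 # numeric, not string, comparison
        if ($value -ge $LowerBound -and $value -le $UpperBound) { $n }
    }
}
```

The names it returns can then be piped to Copy-Item -Destination .\pulled. Comparing as numbers rather than as wildcard strings is the main difference from the attempt in the question.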

Can't write character array to file in Powershell

OK, Powershell may not be the best tool for the job but it's the only one available to me.
I have a bunch of 600K+ row .csv data files. Some of them have delimiter errors e.g. " in the middle of a text field or "" at the start of one. They are too big to edit (even in UltraEdit) and fix manually even if I wanted to which I don't!
Because of the doubled "" delimiter at the start of some text fields and the rogue " delimiter in the middle of others, I haven't used a header row to define the columns, because these rows appear to have an extra column in them due to the extra delimiter.
I need to parse the file looking for "" instead of " at the start of a text-field and also to look for " in the middle of a text field and remove them.
I have managed to write the code to do this (after a fashion) by basically reading the whole file into an array, looping through it and adding output characters to an output array.
What I haven't managed to do is successfully write this output array to a file.
I have read every part of https://learn.microsoft.com/en-us/powershell/module/Microsoft.PowerShell.Utility/out-file?view=powershell-5.1 that seemed relevant. I've also trawled through about 10 similar questions on this site and attempted various code gleaned from them.
The output array prints perfectly to screen using Write-Host, but I can't get the data back into a file for love or money. I have a total of 1.5 days of PowerShell experience so far! All suggestions gratefully received.
Here is my code to read/identify rogue delimiters (not pretty (at all), refer previous explanation of data and available technology constraints):
$ContentToCheck=get-content 'myfile.csv' | foreach { $_.ToCharArray()}
$ContentOutputArray=@()
for ($i = 0; $i -lt $ContentToCheck.count; $i++)
{
    if (!($ContentToCheck[$i] -match '"')) {#not a quote
        if (!($ContentToCheck[$i] -match ',')) {#not a comma i.e. other char that could be enclosed in ""
            if ($ContentToCheck[$i-1] -match '"' ) {#check not rogue " delimiter in previous char allow for start of file exception i>1?
                if (!($ContentToCheck[$i-2] -match ',') -and !($ContentToCheck[$i-3] -match '"')){
                    Write-Host 'Delimiter error' $i
                    $ContentOutputArray+= ''
                }#endif not preceded by ",
            }#endif"
            else{#previous char not a " so move on
                $ContentOutputArray+= $ContentToCheck[$i]
            }
        }#endifnotacomma
        else
        {#a comma, include it
            $ContentOutputArray+= $ContentToCheck[$i]
        }#endacomma
    }#endifnotaquote
    else
    {#a quote so just append it to the output array
        $ContentOutputArray+= $ContentToCheck[$i]
    }#endaquote
}#endfor
So far so good, if inelegant. If I do a simple
Write-Host $ContentOutputArray
data displays nicely " 6 5 " , " 652 | | 999 " , " 99 " , " " , " 678 | | 1 " ..... furthermore when I check the size of the array (based on a cut-down version of one of the problem files)
$ContentOutputArray.count
I get 2507 character length of array. Happy out. However, then variously using:
$ContentOutputArray | Set-Content 'myfile_FIXED.csv'
creates blank file
$ContentOutputArray | out-file 'myfile_FIXED.csv' -encoding ASCII
creates blank file
$ContentOutputArray | export-csv 'myfile_FIXED.csv'
gives only '#TYPE System.Char' in file
$ContentOutputArray | Export-Csv 'myfile_FIXED.csv' -NoType
gives empty file
$ContentOutputArray >> 'myfile_FIXED.csv'
gives blanks separated by ,
What else can I try to write an array of characters to a flat file? It seems such a basic question but it has me stumped. Thanks for reading.
Convert (or cast) the char array to a string before exporting it.
(New-Object string (,$ContentOutputArray)) |Set-Content myfile_FIXED.csv
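Another way to do the same conversion, sketched here on a small sample array rather than the real data, is the unary -join operator, which concatenates the elements into one string and also tolerates the empty-string entries the loop above adds. The output path in the temp folder is just a placeholder:

```powershell
# Sample character array standing in for $ContentOutputArray.
$chars = 'a,"b",c'.ToCharArray()

# Unary -join (no separator) concatenates all elements into a single string.
$text = -join $chars

# Hypothetical output location; adjust as needed.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'myfile_FIXED.csv'
Set-Content -LiteralPath $outFile -Value $text
```

Piping the array itself makes Set-Content treat every character as a separate item, whereas here it receives one string.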

PowerShell regex to extract SID from filename

I have an array $vhdlist with contents similar to the following filenames:
UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx
UVHD-template.vhdx
I want to use a regex and be left with an array containing only SID portion of the filenames.
I am using the following:
$sids = foreach ($file in $vhdlist)
{
[regex]::split($file, '^UVHD-(?:([(\d)(\w)-]+)).vhdx$')
}
There are 2 problems with this: in the resulting array there are 3 blank lines for every SID; and the "template" filename matches (the resulting line in the output is just "template"). How can I get an array of SIDs as the output and not include the "template" line?
You seem to want to filter the list down to those filenames that contain an SID. Filtering is done with Where-Object (where for short); you don't need a loop.
An SID could be described as "S- and then a bunch of digits and dashes" for this simple case. That leaves us with ^UVHD-S-[\d-]*\.vhdx$ for the filename.
In combination we get:
$vhdlist | where { $_ -Match "^UVHD-S-[\d-]*\.vhdx$" }
When you don't really have an array of strings, but actually an array of files, use them directly.
dir C:\some\folder | where { $_.Name -Match "^UVHD-S-[\d-]*\.vhdx$" }
Or, possibly you can even make it as simple as:
dir C:\some\folder\UVHD-S-*.vhdx
EDIT
Extracting the SIDs from a list of strings can be thought of as a combined transformation (for each element, extract the SID) and filter (remove non-matches) operation.
PowerShell's ForEach-Object cmdlet (foreach for short) works like map() in other languages. It takes every input element and returns a new value. In effect it transforms a list of input elements into output elements. Together with the -replace operator you can extract SIDs this way.
$vhdlist | foreach { $_ -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" } | where { $_ -gt "" }
The regex back-reference for .NET languages is $1. The $ is a special character in PowerShell strings, so it needs to be escaped, except when there is no ambiguity. The backtick is the PS escape character. You can escape the $ in the regex as well, but there it's not necessary.
As a final step we use where to remove empty strings (i.e. non-matches). Doing it this way around means we only need to apply the regex once, instead of two times when filtering first and replacing second.
PowerShell operators can also work on lists directly. So the above could even be shortened:
$vhdlist -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" | where { $_ -gt "" }
The shorter version only works on lists of actual strings or objects that produce the right thing when .ToString() is called on them.
Regex breakdown:
^ # start-of-string anchor
(?: # begin non-capturing group (either...)
UVHD- # 'UVHD-'
( # begin group 1
S-[\d-]* # 'S-' and however many digits and dashes
) # end group 1
\.vhdx # '.vhdx'
| # ...or...
.* # anything else
) # end non-capturing group
$ # end-of-string anchor
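As a quick check of the regex broken down above, here it is run against a shortened copy of the sample list from the question. Single quotes are used so that $1 needs no escaping; the variable names are only for this example:

```powershell
# Shortened sample list from the question.
$vhdlist = @(
    'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx',
    'UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx',
    'UVHD-template.vhdx'
)

# Matches become the captured SID; non-matches become '' via the '.*' branch
# and are then filtered out.
$sids = @($vhdlist -replace '^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$', '$1' |
    Where-Object { $_ -ne '' })
```

With this input, $sids should hold the two SID strings and nothing for the template file.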

How to write an array to a file?

I'm trying to write a short script that calculates the average CPU value over the last XY minutes.
I wrote something like this (just a short overview). The first part of the script stores the values in a temporary file, and the average is calculated from those values:
$CPU= ........ Add-Content "myfile.txt" "$CPU"
$array=(Get-Content -Path myfile.txt);
$AVG=($array | Measure-Object -Average).average;
Then I added a first-in, first-out step:
if ($array.length -gt XY) {$array=($array[1..($array.Length-0)])>myfile.txt}.
Once this condition is met, the next execution writes a "strange" character to the file instead of a number. The type command shows "?" as the last character in the file instead of a number, so the average calculation can't work with it.
This happens in PowerShell version 2. I don't have this issue in version 3.
"FIFO" snippet :
(get-content c:\temp\test.txt) |select -skip 1 | set-content c:\temp\test.txt
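A variation on that snippet, sketched as a small function, appends the new sample and trims the file to a fixed number of lines in one go. The function name and parameters are hypothetical, and the window size stands in for the XY value from the question:

```powershell
# Hypothetical helper: appends one sample, then keeps only the newest $Keep lines.
function Add-Sample {
    param([string]$Path, [string]$Value, [int]$Keep)

    Add-Content -Path $Path -Value $Value
    # The parentheses make Get-Content finish reading before the file is rewritten.
    (Get-Content -Path $Path) |
        Select-Object -Last $Keep |
        Set-Content -Path $Path
}
```

Because reading and writing both go through Get-Content and Set-Content, the file stays in one encoding, unlike mixing Add-Content with the > redirection operator. The average can then be taken as before with Measure-Object -Average.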

Unique Combos from powershell array - No duplicate combos

I'm trying to figure out the best way to get unique combinations from a powershell array. For instance, my array might be
@(B,C,D,E)
I would be hoping for an output like this :
B
C
D
E
B,C
B,D
B,E
C,D
C,E
D,E
B,C,D
C,D,E
B,C,D,E
I do not want re-arranged combos. If combo C,D exists already then I do not want combo D,C. It's redundant for my purposes.
I looked into the functions here : Get all combinations of an array
But they aren't what I want. I've been working on figuring this out myself, but have spent quite a bit of time without success. I thought I'd ask the question here so that if someone else already knows, I'm not wasting my time.
Thanks!
This is an adaptation from a solution for a C# class I took that asked this same question. For any set find all subsets, including the empty set.
function Get-Subsets ($a){
    #uncomment following to ensure only unique inputs are parsed
    #e.g. 'B','C','D','E','E' would become 'B','C','D','E'
    #$a = $a | Select-Object -Unique
    #create an array to store output
    $l = @()
    #for any set of length n the maximum number of subsets is 2^n
    for ($i = 0; $i -lt [Math]::Pow(2,$a.Length); $i++)
    {
        #temporary array to hold output
        [string[]]$out = New-Object string[] $a.length
        #iterate through each element
        for ($j = 0; $j -lt $a.Length; $j++)
        {
            #start at the end of the array take elements, work your way towards the front
            if (($i -band (1 -shl ($a.Length - $j - 1))) -ne 0)
            {
                #store the subset in a temp array
                $out[$j] = $a[$j]
            }
        }
        #stick subset into an array
        $l += -join $out
    }
    #group the subsets by length, iterate through them and sort
    $l | Group-Object -Property Length | %{$_.Group | sort}
}
Use like so:
PS C:\> Get-Subsets @('b','c','d','e')
b
c
d
e
bc
bd
be
cd
ce
de
bcd
bce
bde
cde
bcde
Note that computational costs go up exponentially with the length of the input array.
Elements  Seconds to complete
15        46.3488228
14        13.4836299
13        3.6316713
12        1.2542701
11        0.4472637
10        0.1942997
9         0.0867832
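If the comma-separated form from the question is wanted, the same bitmask idea can be sketched as follows. This is an illustrative variant, not the code timed above; the function name is made up, it skips the empty set, and it assumes PSv3+ (for -shl and [pscustomobject]):

```powershell
# Hypothetical variant: every non-empty combination, comma-joined,
# ordered by size and then alphabetically.
function Get-Combination {
    param([string[]]$Items)

    $n = $Items.Length
    $result = for ($mask = 1; $mask -lt [Math]::Pow(2, $n); $mask++) {
        # Bit j of the mask decides whether element j is part of this combination.
        $picked = for ($j = 0; $j -lt $n; $j++) {
            if ($mask -band (1 -shl $j)) { $Items[$j] }
        }
        [pscustomobject]@{ Size = @($picked).Count; Text = @($picked) -join ',' }
    }
    $result | Sort-Object Size, Text | ForEach-Object { $_.Text }
}
```

Called as Get-Combination 'B','C','D','E', it yields 15 entries. Since each mask picks elements in their original order, a pair such as C,D can never reappear as D,C. The exponential cost noted above applies here just the same.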
Here is my tired attempt at this. I did manage to get it to produce the expected results, but the way it does so is not as elegant. It uses recursion.
Function Get-Permutations{
    Param(
        $theInput
    )
    $theInput | ForEach-Object{
        $element = $_
        $sansElement = ($theInput | Where-Object{$_ -ne $element})
        If($sansElement.Count -gt 1){
            # Build a collection of permutations using the remaining elements that were not isolated in this pass.
            # Use the single element since it is a valid permutation
            $perms = ,$element
            For($elementIndex = 0;$elementIndex -le ($sansElement.Count - 1);$elementIndex++){
                $perms += ,@(,$element + $sansElement[0..$elementIndex] | sort-object)
            }
            # For loop does not send to output properly so that is the purpose of collecting the results of this pass in $perms
            $perms
            # If there are more than 2 elements in $sansElement then we need to be sure they are accounted for
            If($sansElement.Count -gt 2){Get-Permutations $sansElement}
        }
    }
}
Get-Permutations B,C,D,E | %{$_ -join ","} | Sort-Object -Unique
I hope I can explain myself clearly. Each pass of the function takes an array. Each individual element of that array is isolated from the rest of the array; the two parts are represented by the variables $element and $sansElement.
Using those variables we build individual and progressively larger arrays composed of those elements. Let this example show it using the array 1,2,3,4
1
1,2
1,2,3
1,2,3,4
The above is done for each "number"
2
2,1
2,1,3
2,1,3,4
and so forth. If the returned array contains more than two elements (1,2 would be the same as 2,1 in your example, so we don't care about pairs beyond one match), we take that array and run it through the same function.
The real issue is that the logic here (I know this might be hard to swallow) creates several duplicates. I suppose you could create a hashtable instead, which I will explore, but it does not remove the logic flaw.
Regardless of me beating myself up, as long as you don't have thousands of elements the process will still produce results.
Get-Permutations returns an array of arrays. PowerShell would display that one element per line. You asked for comma-delimited output, which is where -join comes in. Sort-Object -Unique takes those sorted strings and discards the duplicates.
Sample Output
B
B,C
B,C,D
B,C,D,E
B,C,E #< Missing from your example output.
B,D
B,D,E #< Missing from your example output.
B,E
C
C,D
C,D,E
C,E
D
E
