Use txt file as list in PowerShell array/variable

I've got a script that searches for a string ("End program" in this case). It then goes through each file within the folder and outputs any files not containing the string.
It works perfectly when the phrase is hard-coded, but I want to make it more dynamic by creating a text file to hold the string. In the future, I want to be able to add to the list of strings in the text file. I can't find this online anywhere, so any help is appreciated.
Current code:
$Folder = "\\test path"
$Files = Get-ChildItem $Folder -Filter "*.log" |
? {$_.LastWriteTime -gt (Get-Date).AddDays(-31)}
# String to search for within the file
$SearchTerm = "*End program*"
foreach ($File in $Files) {
$Text = Get-Content "$Folder\$File" | select -Last 1
if ($Text | WHERE {$Text -inotlike $SearchTerm}) {
$Arr += $File
}
}
if ($Arr.Count -eq 0) {
break
}
This is a simplified version of the code displaying only the problematic area. I'd like to put "End program" and another string "End" in a text file.
The following is what the contents of the file look like:
*End program*,*Start*

If you want to check whether a file contains (or doesn't contain) any of a number of given terms, you're better off using a regular expression. Read the terms from a file, escape them, and join them into an alternation:
$terms = Get-Content 'C:\path\to\terms.txt' |
ForEach-Object { [regex]::Escape($_) }
$pattern = $terms -join '|'
Each term in the file should be on a separate line with no leading or trailing wildcard characters. Like this:
End program
Start
With that you can check if the files in a folder don't contain any of the terms like this:
Get-ChildItem $folder | Where-Object {
-not $_.PSIsContainer -and
(Get-Content $_.FullName | Select-Object -Last 1) -notmatch $pattern
}
If you want to check each file's entire content instead of just its last line, change
Get-Content $_.FullName | Select-Object -Last 1
to
Get-Content $_.FullName | Out-String
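To see how the escaped alternation behaves, here is a minimal sketch on in-memory data. The sample terms and last lines are made up for illustration and only stand in for terms.txt and the log files.

```powershell
# Hypothetical terms, standing in for the lines of terms.txt.
$terms = 'End program', 'Start'

# Escape each term so regex metacharacters are matched literally, then join with '|'.
$pattern = ($terms | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Hypothetical last lines of three log files.
$lastLines = @{
    'a.log' = '12:00 End program'
    'b.log' = '12:01 Start'
    'c.log' = '12:02 Crashed'
}

# Names of the files whose last line contains none of the terms.
$missing = $lastLines.Keys |
    Where-Object { $lastLines[$_] -notmatch $pattern } |
    Sort-Object
```

Note that -match and -notmatch are case-insensitive by default, which lines up with the -inotlike comparison in the original script.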

Related

Powershell Set-Content corrupts file formatting after -Replace is used

Bond.out file example (looking to replace what is highlighted):
Out.csv file (data to be used):
Code:
#set paths up
$filepath= 'C:\folder\path\bond.out'
$filepath2= 'C:\folder\path\temp.txt'
$Ticklist='C:\folder\path\tick.txt'
$ratelist='C:\folder\path\rate.txt'
#Import needed data from an excel file which creates an array
$csv = Import-CSV C:\folder\path\RateIDTable.csv | Where { $_.'Rate' -ne "" } | Export-Csv C:\folder\path\out.csv -NoTypeInformation
$bond = Import-CSV C:\folder\path\out.csv | select -Property TickerID, Rate
#Put array from Excel file into two text files
$Tick = $bond | foreach-object {$_.TickerID} | set-content $Ticklist
$replace = $bond | foreach-object {$_.rate} | set-content $Ratelist
#Create two separate arrays from the new text files
$Tickdata = (Get-content $Ticklist ) -join ','
foreach ($t in $Tickdata)
{
$t = $t -split(",")
$First = $t[0]}
$Ratedata = (Get-content $Ratelist ) -join ','
foreach ($r in $Ratedata)
{
$r = $r -split(",")
$First = $r[0]}
#Get main file to search (bond.out) and search for the word that is in the first line from "t" array file
$data = Select-String $filepath -pattern $t[0] | Select-Object -ExpandProperty Line
$data
#Once found, split the line, replace the rate in the 3rd field with the rate in the first line from the "r" array file, then put the line back together
$split=$data.split("{|}")
$split[3]=$r[0]
$join = $split -join "|"
$join
#Put the updated line back into the "bond.out" file from whence it came
(get-content $filepath) -replace($data,$join) | set-content $filepath
#computer says no :(
Output:
As you can see, it actually replaces the rate and puts it all back like I need it to. But that last line doesn't seem to work. Instead I get the file back like so:
It appears to repeat the same line from the $join variable, adding letters to the beginning of each iteration.
I believe it has something to do with the '|' at the end of the line, and remember reading something about marking the beginning and end of lines some time ago, but can't find it anywhere.
Here's an idea. Instead of using regular expressions ...
The Import-Csv command has a -Delimiter parameter. Can you just import bond.out as a "CSV" (but with a pipe delimiter), and update it just like you would a CSV file?
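Pseudo-code (assumes bond.out has a header row with TickerID and Rate columns; $SomeSource is a placeholder for your row from out.csv)
### Convert bond.out to objects
$BondOut = Import-Csv -Delimiter '|' -Path $FilePath
### Get the line you want to update (take a single object so its property can be assigned)
$LineToUpdate = $BondOut | Where-Object { $_.TickerID -eq 'BBG0019K2QZ5' } | Select-Object -First 1
### Update the Rate property from your source (out.csv)
$LineToUpdate.Rate = $SomeSource.Rate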
### Export the modified objects to a new bond.out.modified file
$BondOut | Export-Csv -Delimiter '|' -Path 'bond.out.modified' -NoTypeInformation
As per PetSerAI's clue:
#set paths up
$filepath= 'C:\folder\path\bond.out'
$filepath2= 'C:\folder\path\temp.txt'
$Ticklist='C:\folder\path\tick.txt'
$ratelist='C:\folder\path\rate.txt'
#Import needed data from an excel file which creates an array
$csv = Import-CSV C:\folder\path\RateIDTable.csv | Where { $_.'Rate' -ne "" } | Export-Csv C:\folder\path\out.csv -NoTypeInformation
$bond = Import-CSV C:\folder\path\out.csv | select -Property TickerID, Rate
#Put array from Excel file into two text files
$Tick = $bond | foreach-object {$_.TickerID} | set-content $Ticklist
$replace = $bond | foreach-object {$_.rate} | set-content $Ratelist
#Create two separate arrays from the new text files
$Tickdata = (Get-content $Ticklist ) -join ','
foreach ($t in $Tickdata)
{
$t = $t -split(",")
}
$Ratedata = (Get-content $Ratelist ) -join ','
foreach ($r in $Ratedata)
{
$r = $r -split(",")
}
#Get main file to search (bond.out) and search for the word that is in the first line from "t" array file
###Replace all pipes with a comma
(get-content $filepath) -replace('\|', ',') | set-content $filepath
$data = Select-String $filepath -pattern $t[0] | Select-Object -ExpandProperty Line
$data
#Once found, split the line, replace the rate in the 3rd field with the rate in the first line from the "r" array file, then put the line back together
$split=$data.split("{,}")
$split[3]=$r[0]
$join = $split -join ","
#Put the updated line back into the "bond.out" file from whence it came
###change all commas back to pipes
(get-content $filepath) -replace($data,$join) | set-content $filepath
(get-content $filepath) -replace(',', '|') | set-content $filepath
#computer says yay :D
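The underlying cause is that -replace treats its first operand as a regular expression, where an unescaped | means "or" and a trailing | adds an empty alternative that matches at every position. A minimal sketch of two ways to replace the line literally, without swapping pipes for commas first; the sample lines and field layout are hypothetical and only imitate bond.out:

```powershell
# Hypothetical pipe-delimited lines standing in for the contents of bond.out.
$line    = 'BBG0019K2QZ5|Bond|X|1.25|'
$content = 'AAA0000000000|Bond|Y|0.75|', $line

# Rebuild the line with a new rate in field index 3, as in the original script.
$fields    = $line.Split('|')
$fields[3] = '2.50'
$updated   = $fields -join '|'

# Option 1: escape the search text so every '|' is matched literally.
$viaRegex = $content -replace [regex]::Escape($line), $updated

# Option 2: String.Replace() does a plain literal replacement with no regex involved.
$viaLiteral = $content | ForEach-Object { $_.Replace($line, $updated) }
```

Either result can then be piped to Set-Content in place of the original final line.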

Recursing Through Multiple Text Files with References

I have hundreds of text files in a folder which often reference each other, going several levels deep. I'm not sure I am explaining this well, so here is an example.
Let's say folder "A" contains 500 .txt files. The first one could be called A.txt and somewhere in there it mentions B.txt, which in turn mentions C.txt and so on. I believe the number of levels down is no more than 10.
Now, I want to find certain text strings which relate to A.txt by programmatically going through that file, then, if it references other .txt files, going through them as well, and so on. The resulting output would be something like A_out.txt, which contains everything found based on a regex.
I started out with this using Powershell but am now a little stuck:
$files = Get-ChildItem "C:\TEST\" -Filter *.txt
$regex = 'PCB.*;'
for ($i=0; $i -lt $files.Count; $i++) {
$infile = $files[$i].FullName
$outfile = $files[$i].BaseName + "_out.txt"
select-string $infile -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $outfile
}
It goes through every .txt file and outputs everything that matches the PCB.*; expression to its corresponding _out.txt file.
I have absolutely no idea how to now expand this to include references to the other files. I'm not even sure if this is possible in PowerShell or whether I need to use another language to achieve what I want.
I could get some office monkeys to do all this manually, but if this is relatively simple to code then it would save us a lot of time. Any help would be greatly appreciated :)
/Edit
Whilst running through this in my head, I thought I could build up an array for every time another one of the files is mentioned, and then repeat the process for those as well. However, back to my original problem, I have no idea how I would go about this.
/Edit 2:
Sorry, had been away for a few days and am only just picking this up. I have been using what I've learnt from this question and a few others to come up with the following:
function Get-FileReference
{
Param($FileName, $OutputFileName='')
if ($OutputFileName -eq '')
{
Get-FileReference $FileName ($FileName -replace '.xml$', '_out.xml')
}
else
{
Select-String $FileName -Pattern 'BusinessObject.[^"\r\n\s][\w.]*' -AllMatches | % { $_.Matches } | % { $_.Value } | Add-Content $OutputFileName
Set-Location C:\TEST
$References = (Select-String -Pattern '(?<=resid=")\d+' -AllMatches -path $FileName | % { $_.Matches } | % { $_.Value })
Write "SC References: $References" | Out-File OUTPUT.txt -Append
foreach ($Ref in $References)
{
$count
Write "$count" | Out-File OUTPUT.txt -Append
$count++
Write "SC Reference: $Ref" | Out-File OUTPUT.txt -Append
$xml = [xml](Get-Content 'C:\TEST\package.xml')
$res = $xml.SelectSingleNode('//res[@id = $Ref]/child::resver[last()]')
$resource = $res.id + ".xml"
Write "File to Check $resource" | Out-File OUTPUT.txt -Append
Get-FileReference $resource $OutputFileName
}
}
}
$files = gci "C:\TEST" *.xml
ForEach ($file in $files) {
Get-FileReference $file.FullName
}
Following my original question, I realised that this was a little bit more extensive than I originally thought and therefore had to tinker.
These are the notable points:
All the parent files are .xml, and the code that matches on "BusinessObject" etc. works as expected.
The references to other files are not simply .txt names but require a pattern match of '(?<=resid=")\d+'.
This pattern match needs to be cross-referenced with another file, package.xml, and based on the value it returns, the next file to look into is [newname].xml.
As before, those child .xml files could reference some of the other .xml files.
The code I have pasted above seems to be getting stuck in endless loops (hence why I have debugging in there at the moment) and it is not liking the use of $Ref in:
$res = $xml.SelectSingleNode('//res[@id = $Ref]/child::resver[last()]')
That results in the following error:
Exception calling "SelectSingleNode" with "1" argument(s): "Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function."
Since there could be hundreds of files it dies when it gets over 1000+.
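The SelectSingleNode error arises because single-quoted PowerShell strings are not expanded, so the XPath engine receives the literal text $Ref and reads it as an XPath variable. A minimal sketch of the usual fix, a double-quoted string with the value wrapped in XPath quotes; the XML below is a made-up stand-in for package.xml, whose real layout is not shown in the question:

```powershell
# Hypothetical miniature of package.xml; the real structure may differ.
$xml = [xml]@'
<package>
  <res id="101">
    <resver id="a1" />
    <resver id="a2" />
  </res>
  <res id="102">
    <resver id="b1" />
  </res>
</package>
'@

$Ref = '101'

# Double quotes let PowerShell substitute $Ref before the XPath engine sees the query.
$res = $xml.SelectSingleNode("//res[@id = '$Ref']/child::resver[last()]")
$resource = $res.id + '.xml'
```

If no res element has the given id, SelectSingleNode returns $null, so a guard before building $resource may be worthwhile.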
A recursive function which tries to do what you want.
function Get-FileReference
{
Param($FileName, $OutputFileName='')
if ($OutputFileName -eq '')
{
Get-FileReference $FileName ($FileName -replace '\.txt$', '_out.txt')
}
else
{
Select-String -Pattern 'PCB.*;' -Path $FileName -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object { $_.Value } | Add-Content $OutputFileName
$References = (Select-String -Pattern '^.*\.txt' -AllMatches -path $FileName).Matches.Value
foreach ($Ref in $References)
{
Get-FileReference $Ref $OutputFileName
}
}
}
$files = gci *.txt
ForEach ($file in $files) { Get-FileReference $file.FullName }
It takes two parameters - a filename and an output filename. If called without an output filename, it assumes it's at the top of a new recursion tree and generates an output filename to append to.
If called with an output filename (i.e. by itself) it searches for PCB patterns, appends to the output, then calls itself on any file references, with the same output filename.
This assumes that each file reference sits on its own line with no spaces, e.g. xyz.txt.
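If files can reference each other in a cycle (A mentions B, B mentions A), a plain recursive function never terminates. One possible guard is to pass along a set of files already visited. The sketch below is a variation under assumptions, not the answer's original code: the function name Get-FileReferenceOnce and the -Visited parameter are hypothetical, and references are assumed to be bare file names on their own line, located in the same folder as the referencing file.

```powershell
function Get-FileReferenceOnce {
    param(
        [Parameter(Mandatory)] [string] $FileName,
        [Parameter(Mandatory)] [string] $OutputFileName,
        # Hypothetical extra parameter: full paths already processed in this tree.
        [System.Collections.Generic.HashSet[string]] $Visited =
            (New-Object 'System.Collections.Generic.HashSet[string]')
    )

    $fullPath = (Resolve-Path -LiteralPath $FileName).Path

    # Add() returns $false when the path is already in the set, so a cycle stops here.
    if (-not $Visited.Add($fullPath)) { return }

    # Append every PCB match from this file to the shared output file.
    Select-String -Path $fullPath -Pattern 'PCB.*;' -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Value } |
        Add-Content -Path $OutputFileName

    # Assumption: a reference is a line consisting only of a .txt file name.
    $folder = Split-Path -Parent $fullPath
    $references = Select-String -Path $fullPath -Pattern '^\S+\.txt$' |
        ForEach-Object { $_.Matches[0].Value }

    foreach ($ref in $references) {
        $refPath = Join-Path $folder $ref
        if (Test-Path -LiteralPath $refPath) {
            Get-FileReferenceOnce -FileName $refPath -OutputFileName $OutputFileName -Visited $Visited
        }
    }
}
```

The set compares paths case-sensitively by default; on a case-insensitive file system the same file spelled two ways would be visited twice, though still only a finite number of times.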

How to compare substrings within a folder and an array with PowerShell?

I have a folder with 100,000 files (pictures) which are named by their UPC code (8 to 14 numerical digits) followed by an underscore and other digits:
000012345678_00_1
And I have a list of 20,000 unique UPC codes in a Word document (separated by commas), which should match a fifth of these pictures (I also have this list in an Excel table).
000000000000, 000000000001, 000000000011
What I'm trying to do, is to find matches between my array (the 20,000 elements list) and files in my folder so as to extract only those 20,000 pictures from the folder.
I've started by cutting the file name at the "_" so as to get only the relevant part of the file name:
$FName = ($File -split '_')[0]
To make things harder, I also need to add a wildcard "*" to the elements in the array, since some extra leading "0"s may have been added to the file names that are not present in our array. For example, the UPC "05713901" in the array refers to the file name "00005713901_00.png", so to find matches I will have to use the -like operator.
Then when I've found those matches, I'll just have to use Move-Item to a new folder or subfolder.
This is what I've started to code without any result:
$Directory = "C:\path_to_my_folder";
$AllFiles = Get-ChildItem $Directory
$FileNames = New-Object System.Collections.ArrayList;
foreach($File in $AllFiles)
{
$FName = ($File -split '_')[0]
$FileNames.Add($FName)
}
$Upc = Get-Content C:\path_to_my_word.docx
Compare-Object $FileNames $Upc
You can't read a .docx file with Get-Content, and even if you could, Compare-Object wouldn't work, because your Word file holds a list of UPC codes separated by commas (a single string in PowerShell), while $FileNames is an array (multiple objects).
Copy the UPC codes from Excel to Notepad so you get a simple text file with one code per line, similar to this sample.
UPC.txt - Content:
000000000000
000000000001
000000000011
....
Running each of 100,000 files through a loop of 20,000 -like tests would take a long time. I would instead create a regex pattern that matches any of the codes followed by an underscore. For example:
$Directory = "C:\path_to_my_folder";
$AllFiles = Get-ChildItem $Directory
#Generate regex that matches 00001_ or 00002_ etc. Trimming leading and trailing whitespace just to be safe.
$regex = ((Get-Content -Path "c:\UPC.txt") | ForEach-Object { "$($_.Trim())_" }) -join '|'
#Get files that match
$AllFiles | Where-Object { $_.Name -match $regex } | ForEach-Object {
#Do something, ex. Move file.
Move-Item -Path $_.FullName -Dest C:\Destination
}
Or simply
$AllFiles | Where-Object { $_.Name -match $regex } | Move-Item -Destination "C:\Destination"
Save your UPC codes as a plain text file. As Frode F. suggested, copying them from Excel to Notepad is probably the easiest way to do it. Save that list. Then we will load that list into PowerShell, and for each file we will split at the underscore like you did, and trim any leading zeros, then check if it is in the list of known codes. Move any files that are in the list of known UPCs with Move-Item
#Import Known UPC List
$UPCList = Get-Content C:\Path\To\UPCList.txt
#Remove Leading Zeros From List
$UPCList = $UPCList | ForEach{$_.TrimStart('0')}
$Directory = "C:\path_to_my_folder"
Get-ChildItem $Directory | Where{$_.Name.Split('_')[0].TrimStart('0') -in $UPCList} | Move-Item -Dest C:\Destination
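With 100,000 files and 20,000 codes, the -in test scans the whole list for every file. A possible refinement, sketched here on in-memory sample data, is to load the trimmed codes into a hash set so each lookup is a constant-time check. The sample codes and file names are hypothetical; in practice they would come from the UPC text file and Get-ChildItem.

```powershell
# Hypothetical sample data standing in for UPCList.txt and the picture folder.
$upcList   = '05713901', '000012345678'
$fileNames = '00005713901_00.png', '000012345678_00_1.png', '000099999999_00.png'

# Normalise the known codes once (no padding zeros) and store them in a hash set.
$known = New-Object 'System.Collections.Generic.HashSet[string]'
foreach ($upc in $upcList) {
    [void]$known.Add($upc.Trim().TrimStart('0'))
}

# Keep only the names whose UPC part, with leading zeros removed, is a known code.
$matched = $fileNames | Where-Object {
    $known.Contains(($_ -split '_')[0].TrimStart('0'))
}
```

With real files, the same Where-Object filter can be applied to $_.Name and the result piped to Move-Item as in the answers above.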

Script/tool to delete specified filename

I need a batch file, script, or tool to delete specific files in a folder.
I have a folder with a lot of .xml files. Some file names differ by only a few characters (the part indicating the date).
aa_bb_000000001_2015_9_1.xml
aa_bb_000000001_2015_9_15.xml
aa_bb_000000001_2015_10_1.xml
aa_bb_000000002_2015_5_5.xml
aa_bb_000000002_2015_8_14.xml
aa_bb_000000002_2015_10_1.xml
aa_bb_000000005_2015_7_7.xml
.
.
This part is 15 characters long:
aa_bb_000000001
This part represents a date
2015_10_1
For each prefix, I need to delete every file except the one with the latest date.
After the batch runs, only these files should remain:
aa_bb_000000001_2015_10_1.xml
aa_bb_000000002_2015_10_1.xml
aa_bb_000000005_2015_7_7.xml
.
.
Here's one solution that's fairly short. To understand how the code works, it would be best to focus on what the Group-Object command does, what regular expressions are, and how they interact with the -match operator:
$Groups = Get-ChildItem "C:\XMLFiles\*.xml" | Group-Object {$_.Name.Substring(0, 15)}
$FilesToKeep = @{}
foreach ($Group in $Groups) {
$MaxDate = "00000000"
foreach ($FileInfo in $Group.Group) {
$FileInfo.name -match "(\d{4})_(\d{1,2})_(\d{1,2})\.xml$" | Out-Null
$Date = $Matches[1]+([int]$Matches[2]).ToString("00")+([int]$Matches[3]).ToString("00")
if ($Date -gt $MaxDate) {
$MaxDate = $Date
$FilesToKeep[$Group.Name] = $FileInfo.FullName
}
}
}
Get-ChildItem "C:\XMLFiles\*.xml" | Where-Object {-not $FilesToKeep.ContainsValue($_.FullName)} | Remove-Item
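To illustrate how Group-Object, the regular expression, and -match work together, here is a small sketch that runs the same idea on the sample file names from the question, using plain strings instead of files. It is a compact variation rather than the answer's exact code: each group is sorted by a zero-padded yyyyMMdd key and the last entry is kept.

```powershell
# The sample names from the question, as plain strings.
$names = @(
    'aa_bb_000000001_2015_9_1.xml'
    'aa_bb_000000001_2015_9_15.xml'
    'aa_bb_000000001_2015_10_1.xml'
    'aa_bb_000000002_2015_5_5.xml'
    'aa_bb_000000002_2015_8_14.xml'
    'aa_bb_000000002_2015_10_1.xml'
    'aa_bb_000000005_2015_7_7.xml'
)

# Group by the 15-character prefix, then keep the name with the latest date per group.
$keep = $names | Group-Object { $_.Substring(0, 15) } | ForEach-Object {
    $_.Group | Sort-Object {
        # -match fills $Matches with the year, month, and day capture groups.
        $null = $_ -match '(\d{4})_(\d{1,2})_(\d{1,2})\.xml$'
        # Zero-pad to yyyyMMdd so that string order equals date order.
        '{0:D4}{1:D2}{2:D2}' -f [int]$Matches[1], [int]$Matches[2], [int]$Matches[3]
    } | Select-Object -Last 1
}
```

The padding matters: without it, "2015_9_15" would sort after "2015_10_1" as text.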

Adding row of data to an array via iterator

I am iterating through a directory full of sub directories, looking for the newest file at each level.
The code below does this, but I need to be able to add each line/loop of the iterator to an array so that at the end I can output all the data in tabular format for use in Excel.
Any advice on how I can do this?
$arr = get-childItem -Path "\\network location\directory" | select FullName
$res = @()
foreach($fp in $arr)
{
get-childItem -Path $fp.FullName | sort LastWriteTime | select -last 1 Directory, FullName, Name, LastWriteTime
}
Here's a one-liner for you, split onto multiple lines for readability with the backtick escape character. You can copy paste this and it will run as is. The csv file will be created in the folder where you run this from.
dir -rec -directory | `
foreach {
dir $_.fullname -file | `
sort -Descending lastwritetime | `
select -first 1
} | `
export-csv newestfiles.csv
dir is an alias for Get-ChildItem, and foreach is an alias for ForEach-Object. gci and ls are further aliases for Get-ChildItem, and % is an even shorter alias for ForEach-Object. Note that I am avoiding storing things in arrays, as this doubles the work required. There is no need to enumerate the folders and then enumerate the array afterwards as two separate operations.
Hope this helps.
If I understand you correctly, you just need to pipe the results into $res. So adding | %{$res += $_} should do the trick
$arr = get-childItem -Path "\\network location\directory" | select FullName
$res = @()
foreach($fp in $arr)
{
get-childItem -Path $fp.FullName | sort LastWriteTime | select -last 1 Directory, FullName, Name, LastWriteTime | % {$res += $_}
}
$res | % {write-host $_}
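Another option, sketched below, is to assign the foreach statement itself to a variable: PowerShell collects everything the loop body emits, so no += is needed, and the result can go straight to Export-Csv for Excel. The temporary folder tree and its file names are made up so the example is self-contained; with real data, $root would be the network directory.

```powershell
# Build a throwaway directory tree (hypothetical names) to run the sketch against.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
foreach ($sub in 'alpha', 'beta') {
    $dir = New-Item -ItemType Directory -Path (Join-Path $root $sub) -Force
    $old = New-Item -ItemType File -Path (Join-Path $dir.FullName 'old.txt')
    $new = New-Item -ItemType File -Path (Join-Path $dir.FullName 'new.txt')
    $old.LastWriteTime = (Get-Date).AddDays(-2)
    $new.LastWriteTime = (Get-Date).AddDays(-1)
}

$arr = Get-ChildItem -Path $root

# Assigning the foreach statement to a variable collects every object it emits.
$res = foreach ($fp in $arr) {
    Get-ChildItem -Path $fp.FullName |
        Sort-Object LastWriteTime |
        Select-Object -Last 1 -Property Directory, FullName, Name, LastWriteTime
}

# One row per subdirectory, ready to open in Excel.
$csvPath = Join-Path $root 'newestfiles.csv'
$res | Export-Csv -Path $csvPath -NoTypeInformation
```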

Resources