Powershell read file for filename and path - file

I have the following script which runs on .zip files in a directory which have a whole directory structure with many files. These files then have 7zip run on them to extract them and then .eml is added to the extracted file.
& "c:\program files\7-zip\7z.exe" x c:\TestZip -r -oC:\TestRestore 2> c:\TestLog\ziplog.txt
& "c:\program files\7-zip\7z.exe" x c:\TestRestore -r -aos -oc:\TestExtract 2> c:\TestLog\sevenzip.txt
gci -path "c:\TestExtract" -file | rename-item -newname {$PSItem.name + ".eml"}
My problem is that out of these files sometimes the final extraction cannot be done by 7zip as it does not see it as an archive. I have found that these particular files if I just put .eml on them they are accessible as emails. So when the archive fails to extract I write the output to the sevenzip.txt file.
What I need help with is how do I read this file to get the filenames and place them in a directory so I can add the .eml extension.
An example of the output in the sevenzip.txt file is as follows
ERROR: c:\TestRestore\0\0\54\3925ccb78f80d28b7569b6759554d.0_4011
Can not open the file as archive
ERROR: c:\TestRestore\0\0\5b\6fa7acb219dec5d9e55d4eadd6eb1.0_3958
Can not open the file as archive
Any help would be greatly appreciated on how to do this.
Sorry for all the comments but I am working on this
$SourceFile = 'c:\testlog\sevenzip.txt'
$DestinationFile = 'c:\testlog\testlogextractnew.txt'
$Pattern = 'c:\\TestRestore\\'
(Get-Content $SourceFile) | % {if ($_ -match $Pattern){$_}} | Set-Content $DestinationFile
(Get-Content $DestinationFile).replace('ERROR: ', '') | Set-Content $DestinationFile
(Get-Content $DestinationFile).replace('7z.exe : ', '') | Set-Content $DestinationFile

If there are no other errors, then you can pick those filenames out of the file with a regex pattern match:
$filesToFix = Select-String -Pattern "ERROR: (.*)" -LiteralPath 'c:\testlog\sevenzip.txt' |
ForEach-Object {$_.Matches.Groups[1].Value}
And probably rename them with
$filesToFix | ForEach-Object {
Rename-Item -LiteralPath $_ -NewName ((Split-Path $_ -Leaf) + '.eml')
}
If there might be other errors and you only want these, you'll need more work on the matching:
$fileContent = Get-Content -LiteralPath 'c:\testlog\sevenzip.txt' -Raw
$filesToFix = [regex]::Matches($fileContent,
"ERROR: (.*?)\r?\nCan not open the file as archive",
'Multiline') |
ForEach-Object {$_.Groups[1].Value}
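As a minimal sketch of the stricter match, the snippet below runs the pattern against sample text modelled on the log excerpt in the question. The function name Get-FailedArchivePath is hypothetical, and the pattern assumes each "ERROR:" line is followed directly by the "Can not open the file as archive" line.

```powershell
# Hypothetical helper (not part of 7-Zip or PowerShell): returns the paths that
# the log reports as "Can not open the file as archive".
function Get-FailedArchivePath {
    param([Parameter(Mandatory)][string]$LogText)

    # (.*?) captures the path lazily; \r?\n tolerates Windows and Unix line endings.
    $pattern = 'ERROR: (.*?)\r?\nCan not open the file as archive'
    foreach ($m in [regex]::Matches($LogText, $pattern)) {
        $m.Groups[1].Value.Trim()
    }
}

# Sample text modelled on the sevenzip.txt excerpt shown in the question.
$sample = @(
    'ERROR: c:\TestRestore\0\0\54\3925ccb78f80d28b7569b6759554d.0_4011'
    'Can not open the file as archive'
    'ERROR: c:\TestRestore\0\0\5b\6fa7acb219dec5d9e55d4eadd6eb1.0_3958'
    'Can not open the file as archive'
) -join "`r`n"

$failed = @(Get-FailedArchivePath -LogText $sample)
$failed
```

Against the real log, the text could be supplied with Get-Content -LiteralPath 'c:\testlog\sevenzip.txt' -Raw.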

This is what I ended up going with. It was originally posted in the comments because I didn't know how to format it, but I have worked that out now, so hopefully this is easier to understand in case it is useful.
This section creates a list of all the files that failed to extract with their directory location. For use in next section.
$SourceFile = 'c:\testlog\sevenzip.txt'
$DestinationFile = 'c:\testlog\testlogextractnew.txt'
$Pattern = 'c:\\TestRestore\\'
(Get-Content $SourceFile) | % {if ($_ -match $Pattern){$_}} | Set-Content $DestinationFile
(Get-Content $DestinationFile).replace('ERROR: ', '') | Set-Content $DestinationFile
(Get-Content $DestinationFile).replace('7z.exe : ', '') | Set-Content $DestinationFile
This is just some result checking
Get-Content C:\testlog\testlogextractnew.txt | Measure-Object -Line > c:\testlog\numberoffilesthatfailed.txt
This copies all files listed in the txt file to another directory
Get-Content c:\testlog\testlogextractnew.txt | copy-item -destination "c:\TestCopy"
This renames the current extensions to .eml
gci -path "C:\TestCopy" -file | rename-item -newname { [io.path]::changeextension($_.name, "eml")}
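The copy and rename steps could also be folded into one, sketched below under some assumptions: Copy-AsEml is a hypothetical helper name, it appends ".eml" (as the original script in the question did) rather than replacing the extension, and it does not guard against two source files that share the same name. The demo uses a throwaway temp folder, not the real paths.

```powershell
# Hypothetical helper: copies each listed file into $Destination,
# appending ".eml" to the original file name.
function Copy-AsEml {
    param(
        [Parameter(Mandatory)][string[]]$Path,
        [Parameter(Mandatory)][string]$Destination
    )
    foreach ($p in $Path) {
        if (-not (Test-Path -LiteralPath $p -PathType Leaf)) {
            Write-Warning "Skipping missing file: $p"
            continue
        }
        $name = (Split-Path -Path $p -Leaf) + '.eml'
        Copy-Item -LiteralPath $p -Destination (Join-Path $Destination $name)
    }
}

# Demo against a throwaway folder (illustrative paths only).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$src  = New-Item -ItemType Directory -Path (Join-Path $root 'src')
$dst  = New-Item -ItemType Directory -Path (Join-Path $root 'dst')
$file = Join-Path $src.FullName 'abc.0_4011'
Set-Content -LiteralPath $file -Value 'test'

Copy-AsEml -Path $file -Destination $dst.FullName
Get-ChildItem -LiteralPath $dst.FullName | Select-Object -ExpandProperty Name
```

With the real list, the paths could come from Get-Content c:\testlog\testlogextractnew.txt.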

Related

Powershell - Iterate files in a directory and split each file into 80% and 20%

I have large TSV files (>1 GB each) in a directory, and I need to split each file 80/20. With my limited knowledge of PowerShell I wrote the script below, but it is extremely slow. I know I can do this in milliseconds with Cygwin/bash, but I need to automate the process through batch files. I am sure there is a better and faster solution.
$DataSourceFolder="D:\Data"
$files = Get-ChildItem "$DataSourceFolder" -Filter *".tsv"
foreach ($file in $files)
{
$outputTrainfile="$DataSourceFolder\partitions\"+ $file.BaseName + "-train.tsv"
$outputTestfile="$DataSourceFolder\partitions\"+ $file.BaseName + "-test.tsv"
$filepath = "$DataSourceFolder\"+ $file
# Get number of rows in the file
Get-Content $filepath | Measure-Object | ForEach-Object { $sourcelinecount = $_.Count }
# Get top and tail count to be fetched from source file
$headlinecount = ($sourcelinecount * 80) /100
$taillinecount = $sourcelinecount - $headlinecount
# Create the files
New-Item -ItemType file $outputTrainfile -force
New-Item -ItemType file $outputTestfile -force
#set content to the files
Get-Content $filepath -TotalCount $headlinecount | Set-Content $outputTrainfile
Get-Content $filepath -Tail $taillinecount | Set-Content $outputTestfile
}
Sorry for posting the answer late; hopefully it will save others some effort:
I used bash.exe to split the files from PowerShell. Fast and furious.
Create a bash file and call it from PowerShell to split the files into the desired partitions.
Bash file (for example, name it "Partition.sh"):
foldername=$1
filenamePrefix=$2
echo "$foldername"
echo "$filenamePrefix"
for filename in "$foldername"/"$filenamePrefix"*.tsv
do
echo "Partitioning $filename"
# Shuffle the rows into a temporary file
shuf "$filename" > tmp
lines=$(wc -l < tmp)
echo "Read file successfully"
# The first 80% of the shuffled rows become the training set
head -n "$(echo "$lines*0.8/1" | bc)" tmp > "$filename.train.tsv"
# Rows from the end of the shuffled file become the test set
tail -n "$(echo "$lines*0.2/1-1" | bc)" tmp > "$filename.test.tsv"
rm tmp
done
Call from PowerShell:
bash.exe /mnt/c/Partition.sh /mnt/c/trainingData/ "FilePrefix"
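For setups where calling bash is not an option, a PowerShell-only split might look like the sketch below. It is an assumption-laden alternative, not the method used above: Split-FileByPercent is a hypothetical name, the split is sequential (no shuffling), and the .NET reader and writer are used instead of Get-Content and Set-Content because they stream lines without the per-line pipeline overhead. Full paths are assumed, since .NET resolves relative paths against the process working directory.

```powershell
# Hypothetical helper: writes the first $TrainPercent percent of lines to
# $TrainPath and the remaining lines to $TestPath. No shuffling is done.
function Split-FileByPercent {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$TrainPath,
        [Parameter(Mandatory)][string]$TestPath,
        [int]$TrainPercent = 80
    )
    # First pass: count lines without building an array in memory.
    $total = 0
    foreach ($line in [System.IO.File]::ReadLines($Path)) { $total++ }
    $trainCount = [math]::Floor($total * $TrainPercent / 100)

    # Second pass: route each line to the matching writer.
    $train = New-Object System.IO.StreamWriter -ArgumentList $TrainPath
    $test  = New-Object System.IO.StreamWriter -ArgumentList $TestPath
    try {
        $i = 0
        foreach ($line in [System.IO.File]::ReadLines($Path)) {
            if ($i -lt $trainCount) { $train.WriteLine($line) } else { $test.WriteLine($line) }
            $i++
        }
    }
    finally {
        $train.Dispose()
        $test.Dispose()
    }
}

# Demo on a small generated file in a throwaway folder.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$in       = Join-Path $dir 'sample.tsv'
$trainOut = Join-Path $dir 'sample-train.tsv'
$testOut  = Join-Path $dir 'sample-test.tsv'
1..10 | ForEach-Object { "row$_`tvalue$_" } | Set-Content -LiteralPath $in

Split-FileByPercent -Path $in -TrainPath $trainOut -TestPath $testOut
```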

Use txt file as list in PowerShell array/variable

I've got a script that searches for a string ("End program" in this case). It then goes through each file within the folder and outputs any files not containing the string.
It works perfectly when the phrase is hard coded, but I want to make it more dynamic by creating a text file to hold the string. In the future, I want to be able to add to the list of string in the text file. I can't find this online anywhere, so any help is appreciated.
Current code:
$Folder = "\\test path"
$Files = Get-ChildItem $Folder -Filter "*.log" |
? {$_.LastWriteTime -gt (Get-Date).AddDays(-31)}
# String to search for within the file
$SearchTerm = "*End program*"
foreach ($File in $Files) {
$Text = Get-Content "$Folder\$File" | select -Last 1
if ($Text | WHERE {$Text -inotlike $SearchTerm}) {
$Arr += $File
}
}
if ($Arr.Count -eq 0) {
break
}
This is a simplified version of the code displaying only the problematic area. I'd like to put "End program" and another string "End" in a text file.
The following is what the contents of the file look like:
*End program*,*Start*
If you want to check whether a file contains (or doesn't contain) a number of given terms you're better off using a regular expression. Read the terms from a file, escape them, and join them to an alternation:
$terms = Get-Content 'C:\path\to\terms.txt' |
ForEach-Object { [regex]::Escape($_) }
$pattern = $terms -join '|'
Each term in the file should be in a separate line with no leading or trailing wildcard characters. Like this:
End program
Start
With that you can check if the files in a folder don't contain any of the terms like this:
Get-ChildItem $folder | Where-Object {
-not $_.PSIsContainer -and
(Get-Content $_.FullName | Select-Object -Last 1) -notmatch $pattern
}
If you want to check the entire files instead of just their last line change
Get-Content $_.FullName | Select-Object -Last 1
to
Get-Content $_.FullName | Out-String
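Put together, the approach could be wrapped roughly as follows. This is a sketch with assumed names: Get-FileMissingTerm, its parameters, and the demo file names are hypothetical, and the terms file is expected to hold one literal term per line as described above.

```powershell
# Hypothetical helper: returns files whose last line matches none of the terms.
function Get-FileMissingTerm {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [Parameter(Mandatory)][string]$TermFile,
        [string]$Filter = '*.log'
    )
    # One literal term per line; blank lines are ignored, each term is escaped.
    $terms = Get-Content -LiteralPath $TermFile |
        Where-Object { $_.Trim() } |
        ForEach-Object { [regex]::Escape($_.Trim()) }
    $pattern = $terms -join '|'

    Get-ChildItem -LiteralPath $Folder -Filter $Filter |
        Where-Object { -not $_.PSIsContainer } |
        Where-Object {
            $last = Get-Content -LiteralPath $_.FullName | Select-Object -Last 1
            $last -notmatch $pattern
        }
}

# Demo in a throwaway folder: a.log ends normally, b.log does not.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -LiteralPath (Join-Path $dir 'terms.txt') -Value 'End program', 'Start'
Set-Content -LiteralPath (Join-Path $dir 'a.log') -Value 'working', 'End program'
Set-Content -LiteralPath (Join-Path $dir 'b.log') -Value 'working', 'crashed'

$missing = @(Get-FileMissingTerm -Folder $dir -TermFile (Join-Path $dir 'terms.txt'))
$missing | Select-Object -ExpandProperty Name
```

Note that an empty terms file would produce an empty pattern, which matches every line, so no files would be reported in that case.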

Loop Through Subfolders And Combine Text files Within - Output Combined File To a Folder

I have a directory of subfolders. Each one contains text files within it. I am trying to combine the files found in each subfolder.
Example:
SubFolder 1 → a.txt + b.txt + c.txt → SubFolder1Merged.txt
SubFolder 2 → x.txt + y.txt + z.txt → SubFolder2Merged.txt
I have referenced this thread.
This is what I have so far:
$startingDir = "C:\Users\WP\Desktop\TextFiles"
function CombineLogs {
param([string]$startingDir)
dir $startingDir -Filter *.txt | Get-Content |
Out-File (Join-Path $startingDir COMBINED.txt)
dir $startingDir | ?{ $_.PsIsContainer } | %{ CombineLogs $_.FullName }
}
CombineLogs 'C:\Users\WP\Desktop\CombinedTextFiles' #output the combined text files here
I get a combined.txt generated in CombinedTextFiles - but not individual files merged.
Also the file is empty.
I simply want to loop through each subfolder, merge the text files within each folder, then output to my CombinedTextfiles Folder.
function CombineLogs
{
param([string] $startingDir)
$outputFile = (Split-Path $startingDir -Leaf) + "COMBINED.txt"
dir $startingDir -Filter *.txt |
Get-Content |
Out-File (Join-Path $outputDir $outputFile)
dir $startingDir |?{ $_.PsIsContainer } | %{ CombineLogs $_.FullName }
}
$outputDir ='C:\Users\WP\Desktop\CombinedTextFiles' # output the combined text files here
CombineLogs "C:\Users\WP\Desktop\TextFiles"
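The code snippet above produces TextFilesCOMBINED.txt and NewCOMBINED.txt correctly. However, because the output name is built from the leaf folder name only, it does not handle ABCCOMBINED.txt or xyzCOMBINED.txt in the following scenario, where several folders share the same name: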
C:\Users\WP\Desktop\TextFiles\ABC\ABC
C:\Users\WP\Desktop\TextFiles\ABC\xyz\ABC
C:\Users\WP\Desktop\TextFiles\New
C:\Users\WP\Desktop\TextFiles\xyz\ABC
Recursion can be tricky if you don't know how to handle it. And in this case you don't need to implement recursion yourself anyway. Just let PowerShell do the heavy lifting for you:
$startingDir = 'C:\Users\WP\Desktop\TextFiles'
$combinedDir = 'C:\Users\WP\Desktop\CombinedTextFiles'
Get-ChildItem $startingDir -Recurse | Where-Object {
$txtfiles = Join-Path $_.FullName '*.txt'
$_.PSIsContainer -and (Test-Path $txtfiles)
} | ForEach-Object {
$merged = Join-Path $combinedDir ($_.Name + '_Merged.txt')
Get-Content $txtfiles | Set-Content $merged
}
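If several subfolders share a leaf name (the ABC and xyz scenario listed earlier), naming the merged file after the leaf folder alone would let one folder overwrite another's output. One possible workaround, sketched here with a hypothetical Merge-TextPerFolder function, builds the output name from the folder path relative to the starting directory. The "_" separator is an arbitrary choice, and the output directory is assumed to exist outside the starting directory.

```powershell
# Hypothetical helper: merges the *.txt files of every subfolder into one file
# per subfolder, named after the subfolder's path relative to $StartingDir.
function Merge-TextPerFolder {
    param(
        [Parameter(Mandatory)][string]$StartingDir,
        [Parameter(Mandatory)][string]$CombinedDir
    )
    $root = (Resolve-Path -LiteralPath $StartingDir).ProviderPath.TrimEnd([char[]]'\/')
    Get-ChildItem -LiteralPath $root -Recurse |
        Where-Object { $_.PSIsContainer } |
        ForEach-Object {
            $txt = @(Get-ChildItem -LiteralPath $_.FullName -Filter *.txt |
                Where-Object { -not $_.PSIsContainer } | Sort-Object Name)
            if ($txt.Count -gt 0) {
                # ABC\xyz becomes ABC_xyz, so same-named leaf folders get distinct files.
                $relative = $_.FullName.Substring($root.Length + 1)
                $name = ($relative -replace '[\\/]', '_') + '_Merged.txt'
                $txt | Get-Content | Set-Content -LiteralPath (Join-Path $CombinedDir $name)
            }
        }
}

# Demo in a throwaway folder with two folders that are both named ABC.
$base = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$in   = New-Item -ItemType Directory -Path (Join-Path $base 'TextFiles')
$out  = New-Item -ItemType Directory -Path (Join-Path $base 'CombinedTextFiles')
$abc1 = New-Item -ItemType Directory -Path (Join-Path $in.FullName 'ABC')
$abc2 = New-Item -ItemType Directory -Path (Join-Path (Join-Path $in.FullName 'xyz') 'ABC')
Set-Content -LiteralPath (Join-Path $abc1.FullName 'a.txt') -Value 'first'
Set-Content -LiteralPath (Join-Path $abc2.FullName 'b.txt') -Value 'second'

Merge-TextPerFolder -StartingDir $in.FullName -CombinedDir $out.FullName
Get-ChildItem -LiteralPath $out.FullName | Select-Object -ExpandProperty Name
```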

Zipping older files with powershell

I have the below Powershell script which tries to archive the logs.
The 1st step is to move all the files whose names contain the Spotfire.Dxp...* string out of the ProjectLogsDir directory.
The 2nd step is to identify the Spotfire.Dxp...* files in the RotatedLogsDir directory that are older than 60 days and put them together in a zip file in the ArchivedLogsDir directory.
The 3rd step is to delete zipped files older than 120 days.
The 2nd step isn't working; it fails with this error:
E:\TIBCO\logsArchival\rotatedDir\C0005749_2014-05-09-12-53-47_Spotfire.Dxp.Web.231.log E:\TIBCO\logsArchival\rotatedDir\C0005749_2014-05-09-12-53-47_Spotfire.Dxp.Web.232.log: WARNING: The filename, directory name, or volume label syntax is incorrect.
$sysname=$env:COMPUTERNAME
$Date = Get-Date
$Now = Get-Date -format yyyy-MM-dd-HH-mm-ss
$host_date=$sysname +"_"+ $Now
$RotateDays = "60"
$ArchiveDays="120"
$ProjectLogsDir = "E:\TIBCO\*\6.0.0\Logfiles"
$RotatedLogsDir = "E:\TIBCO\logsArchival\rotatedDir"
$ArchivedLogsDir= "E:\TIBCO\logsArchival\archiveDir"
$psLogsDir= "E:\TIBCO\logsArchival\shLogsDir"
$LastRotate = $Date.AddDays(-$RotateDays)
$LastArchive = $Date.AddDays(-$ArchiveDays)
$RenameLogFiles = Get-Childitem $ProjectLogsDir -include Spotfire.Dxp.*.*.* -exclude spotfire.Dxp.web.KeepAlive.* -Recurse
$RenameLogFiles
$RenameLogFiles | Where-Object {!$_.PSIsContainer} | Rename-Item -NewName { $host_date +"_"+ $_.Name.Replace(' ','_') };
$RotatedLogFiles = Get-Childitem $ProjectLogsDir -include *_Spotfire.Dxp.*.*.* -Recurse
$RotatedLogFiles
$RotatedLogFiles | move-item -destination "$RotatedLogsDir"
$ZippedLogFiles = Get-Childitem $RotatedLogsDir -include *_Spotfire.Dxp.*.*.* -Recurse | Where {$_.LastWriteTime -le "$LastRotate"}
$ZippedLogFiles
function create-7zip([String] $aDirectory, [String] $aZipfile) {
[string]$pathToZipExe = "C:\Program Files\7-zip\7z.exe";
[Array]$arguments = "a", "-tzip", "$aZipfile", "$aDirectory";
& $pathToZipExe $arguments;
}
create-7zip "$ZippedLogFiles" "$ArchivedLogsDir\$host_date.zip"
$ZippedLogFiles | Remove-Item -Force
$DeleteZippedFiles = Get-Childitem $ArchivedLogsDir\*.zip -Recurse | Where {$_.LastWriteTime -le "$LastArchive"}
$DeleteZippedFiles
$DeleteZippedFiles | Remove-Item -Force
$DeletePsFiles = Get-Childitem $psLogsDir\*.log -Recurse | Where {$_.LastWriteTime -le "$LastRotate"}
$DeletePsFiles
$DeletePsFiles | Remove-Item -Force
Please provide the help here to get the files zipped.
The issue is that you are calling your 7-Zip function incorrectly. Look at what the function takes:
function create-7zip([String] $aDirectory, [String] $aZipfile) {
It has 2 strings as parameters. Then look at what you are calling it with:
create-7zip "$ZippedLogFiles" "$ArchivedLogsDir\$host_date.zip"
"$ZippedLogFiles" was defined just before this by running $ZippedLogFiles = Get-Childitem $RotatedLogsDir with a few parameters to filter the results. So it holds an array of FileInfo objects, not a string. Putting that array in double quotes expands it into a single string of all the paths separated by spaces, which 7-Zip then treats as one (invalid) file name, as the error message shows.
What you really want to include is "$RotatedLogsDir\*_Spotfire.Dxp*.*" so try calling it with that instead of "$ZippedLogFiles".
Edit: Comment moved to answer for zipping only files over 60 days.
You can use built in .Net calls to archive things and not have to use 7-Zip. Even easier in my opinion would be to install the PowerShell Community Extensions and use their Write-Zip cmdlet like this:
$ZippedLogFiles = Get-Childitem $RotatedLogsDir -include *_Spotfire.Dxp.*.*.* -Recurse | Where {$_.LastWriteTime -le "$LastRotate"}
$ZippedLogFiles
$ZippedLogFiles | Write-Zip "$ArchivedLogsDir\$host_date.zip"
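For the built-in .NET route mentioned above, a rough sketch follows. It assumes .NET Framework 4.5 or later (where System.IO.Compression.ZipFile is available); Compress-OldFile and its parameters are hypothetical names, and the demo uses throwaway temp paths instead of the real log directories. The zip path should be a full path, because .NET resolves relative paths against the process working directory.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: adds files last written on or before $OlderThan to a new
# zip and returns them, so the caller can delete them after a successful run.
function Compress-OldFile {
    param(
        [Parameter(Mandatory)][string]$SourceDir,
        [Parameter(Mandatory)][string]$ZipPath,
        [Parameter(Mandatory)][datetime]$OlderThan
    )
    $old = @(Get-ChildItem -LiteralPath $SourceDir |
        Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime -le $OlderThan })
    if ($old.Count -eq 0) { return }

    $zip = [System.IO.Compression.ZipFile]::Open($ZipPath, 'Create')
    try {
        foreach ($file in $old) {
            # Each file is stored at the root of the archive under its own name.
            [void][System.IO.Compression.ZipFileExtensions]::CreateEntryFromFile(
                $zip, $file.FullName, $file.Name)
        }
    }
    finally {
        $zip.Dispose()
    }
    $old
}

# Demo: one file backdated 90 days, one fresh file, 60-day cutoff.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$zipPath = Join-Path ([System.IO.Path]::GetTempPath()) ('demo-{0}.zip' -f [guid]::NewGuid())
Set-Content -LiteralPath (Join-Path $dir 'old.log') -Value 'old'
Set-Content -LiteralPath (Join-Path $dir 'new.log') -Value 'new'
(Get-Item -LiteralPath (Join-Path $dir 'old.log')).LastWriteTime = (Get-Date).AddDays(-90)

$archived = @(Compress-OldFile -SourceDir $dir -ZipPath $zipPath -OlderThan (Get-Date).AddDays(-60))
$archived | Select-Object -ExpandProperty Name
```

Returning the archived files keeps the deletion step separate, so nothing is removed unless the archive was written without an exception.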

How to delete empty subfolders with PowerShell?

I have a share that is a "junk drawer" for end-users. They are able to create folders and subfolders as they see fit. I need to implement a script to delete files that were created more than 31 days ago.
I have that started with Powershell. I need to follow up the file deletion script by deleting subfolders that are now empty. Because of the nesting of subfolders, I need to avoid deleting a subfolder that is empty of files, but has a subfolder below it that contains a file.
For example:
FILE3a is 10 days old. FILE3b is 45 days old.
I want to clean up the structure removing files older than 30 days, and delete empty subfolders.
C:\Junk\subfolder1a\subfolder2a\FILE3a
C:\Junk\subfolder1a\subfolder2a\subfolder3a
C:\Junk\subfolder1a\subfolder2B\FILE3b
Desired result:
Delete: FILE3b, subfolder2B & subfolder3a.
Leave: subfolder1a, subfolder2a, and FILE3a.
I can recursively clean up the files. How do I clean up the subfolders without deleting subfolder1a? (The "Junk" folder will always remain.)
I would do this in two passes - deleting the old files first and then the empty dirs:
Get-ChildItem -recurse | Where {!$_.PSIsContainer -and `
$_.LastWriteTime -lt (get-date).AddDays(-31)} | Remove-Item -whatif
Get-ChildItem -recurse | Where {$_.PSIsContainer -and `
@(Get-ChildItem -Lit $_.Fullname -r | Where {!$_.PSIsContainer}).Length -eq 0} |
Remove-Item -recurse -whatif
This type of operation demonstrates the power of nested pipelines in PowerShell: the second command uses a nested pipeline to recursively determine whether a directory has zero files under it.
In the spirit of the first answer, here is the shortest way to delete the empty directories:
ls -recurse | where {!@(ls -force $_.fullname)} | rm -whatif
The -force flag is needed for the cases when the directories have hidden folders, like .svn
This will sort subdirectories before parent directories working around the empty nested directory problem.
dir -Directory -Recurse |
%{ $_.FullName} |
sort -Descending |
where { !@(ls -force $_) } |
rm -WhatIf
Adding on to the last one:
while (Get-ChildItem $StartingPoint -recurse | where {!@(Get-ChildItem -force $_.fullname)} | Test-Path) {
Get-ChildItem $StartingPoint -recurse | where {!@(Get-ChildItem -force $_.fullname)} | Remove-Item
}
This makes the cleanup complete: the loop keeps searching until no empty folders remain under $StartingPoint.
I needed some enterprise-friendly features. Here is my take.
I started with code from other answers, then added a JSON file with the original folder list (including the file count per folder). The script removes the empty directories and logs them.
https://gist.github.com/yzorg/e92c5eb60e97b1d6381b
param (
[switch]$Clear
)
# if you want to reload a previous file list
#$stat = ConvertFrom-Json (gc dir-cleanup-filecount-by-directory.json -join "`n")
if ($Clear) {
$stat = @()
} elseif ($stat.Count -ne 0 -and (-not "$($stat[0].DirPath)".StartsWith($PWD.ProviderPath))) {
Write-Warning "Path changed, clearing cached file list."
Read-Host -Prompt 'Press -Enter-'
$stat = @()
}
$lineCount = 0
if ($stat.Count -eq 0) {
$stat = gci -Recurse -Directory | %{ # -Exclude 'Visual Studio 2013' # test in 'Documents' folder
if (++$lineCount % 100 -eq 0) { Write-Warning "file count $lineCount" }
New-Object psobject -Property @{
DirPath=$_.FullName;
DirPathLength=$_.FullName.Length;
FileCount=($_ | gci -Force -File).Count;
DirCount=($_ | gci -Force -Directory).Count
}
}
$stat | ConvertTo-Json | Out-File dir-cleanup-filecount-by-directory.json -Verbose
}
$delelteListTxt = 'dir-cleanup-emptydirs-{0}-{1}.txt' -f ((date -f s) -replace '[-:]','' -replace 'T','_'),$env:USERNAME
$stat |
? FileCount -eq 0 |
sort -property @{Expression="DirPathLength";Descending=$true}, @{Expression="DirPath";Descending=$false} |
select -ExpandProperty DirPath | #-First 10 |
?{ @(gci $_ -Force).Count -eq 0 } | %{
Remove-Item $_ -Verbose # -WhatIf # uncomment to see the first pass of folders to be cleaned**
$_ | Out-File -Append -Encoding utf8 $delelteListTxt
sleep 0.1
}
# ** - The list you'll see from -WhatIf isn't a complete list because parent folders
# might also qualify after the first level is cleaned. The -WhatIf list will
# show correct breadth, which is what I want to see before running the command.
To remove files older than 30 days:
get-childitem -recurse |
? {$_.GetType() -match "FileInfo"} |
?{ $_.LastWriteTime -lt [datetime]::now.adddays(-30) } |
rm -whatif
(Just remove the -whatif to actually perform.)
Follow up with:
get-childitem -recurse |
? {$_.GetType() -match "DirectoryInfo"} |
?{ $_.GetFiles().Count -eq 0 -and $_.GetDirectories().Count -eq 0 } |
rm -whatif
This worked for me.
$limit = (Get-Date).AddDays(-15)
$path = "C:\Some\Path"
Delete files older than the $limit:
Get-ChildItem -Path $path -Recurse -Force | Where-Object { !$_.PSIsContainer -and $_.CreationTime -lt $limit } | Remove-Item -Force
Delete any empty directories left behind after deleting the old files:
Get-ChildItem -Path $path -Recurse -Force | Where-Object { $_.PSIsContainer -and (Get-ChildItem -Path $_.FullName -Recurse -Force | Where-Object { !$_.PSIsContainer }) -eq $null } | Remove-Item -Force -Recurse
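The deepest-first idea from the answers above can be sketched as a small function. Remove-EmptyDirectory is a hypothetical name, and ordering by path length is just one way to make sure children are visited before their parents. The demo rebuilds the folder layout from the question in a throwaway temp folder, with the old file FILE3b assumed to be deleted already.

```powershell
# Hypothetical helper: removes empty directories under $Path, deepest first,
# so a parent that becomes empty is removed in the same pass.
function Remove-EmptyDirectory {
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)][string]$Path)

    Get-ChildItem -LiteralPath $Path -Recurse -Force |
        Where-Object { $_.PSIsContainer } |
        Sort-Object { $_.FullName.Length } -Descending |
        ForEach-Object {
            # Re-check at removal time: children may have just been deleted.
            $isEmpty = @(Get-ChildItem -LiteralPath $_.FullName -Force).Count -eq 0
            if ($isEmpty -and $PSCmdlet.ShouldProcess($_.FullName, 'Remove empty directory')) {
                Remove-Item -LiteralPath $_.FullName -Force
            }
        }
}

# Demo: layout from the question after the old file FILE3b has been deleted.
$junk = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$sub1 = Join-Path $junk 'subfolder1a'
$sub2a = New-Item -ItemType Directory -Path (Join-Path $sub1 'subfolder2a')
$sub3a = New-Item -ItemType Directory -Path (Join-Path $sub2a.FullName 'subfolder3a')
$sub2b = New-Item -ItemType Directory -Path (Join-Path $sub1 'subfolder2B')
Set-Content -LiteralPath (Join-Path $sub2a.FullName 'FILE3a') -Value 'recent file'

Remove-EmptyDirectory -Path $junk
Get-ChildItem -LiteralPath $junk -Recurse | Select-Object -ExpandProperty FullName
```

Because the function declares SupportsShouldProcess, it can be previewed with -WhatIf, with the caveat that a preview only lists folders that are empty right now, not parents that would become empty afterwards.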
