How to split a CSV file depending on row values

Below is only an example. I have seen a lot of scripts that break a .CSV file down into smaller files, but I am struggling with this one.
How can we, with PowerShell, find the header indicated by ALPH, take each subsequent line, stop when it reaches ALPT (inclusive), and put this text into another file?
The operation will need to run through the whole file, and the number of ALPD or ALPC lines will vary.
ALPH can be considered a header, but the information it contains is needed because some field values can differ. The only constants are ALPH and ALPT.
ALPH;8102014
ALPC;PK
ALPD;50
ALPD;40
ALPT;5
ALPH;15102014
ALPC;PK
ALPD;50
ALPD;50
ALPD;70
ALPD;70
ALPD;71
ALPD;72
ALPD;40
ALPT;6
ALPH;15102014
ALPC;PK
ALPD;50
ALPD;50
ALPD;40
ALPT;6

If I understood your question correctly, something like this should work:
$csv = 'C:\path\to\your.csv'
$pattern = 'ALPH[\s\S]*?ALPT.*'
$cnt = 0
[IO.File]::ReadAllText($csv) | Select-String $pattern -AllMatches |
select -Expand Matches | select -Expand Groups |
% {
$cnt++
$outfile = Join-Path (Split-Path $csv -Parent) "split${cnt}.csv"
[IO.File]::WriteAllText($outfile, $_.Value)
}

Here is a way using switch. Your original file is in C:\temp\ALPH.CSV. Here is how I would find the beginning and the end of each block.
$n = 1
switch -File 'C:\temp\ALPH.CSV' -Regex
{
'^ALPH.*' {
Write-Host "Begin $n"
}
'^ALPT.*' {
Write-Host "End $n"
$n++
}
}
Now save the lines to a variable and export the files:
$n = 1
$csvTmp = @()
switch -File 'C:\temp\ALPH.CSV' -Regex
{
'^ALPH.*' {
Write-Host "Begin $n"
$csvTmp += $_
}
'^ALPT.*' {
Write-Host "End $n"
$csvTmp += $_
$csvTmp | Set-Content "c:\temp\file$n.csv"
$csvTmp = @()
$n++
}
default {
$csvTmp += $_
}
}
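The same switch approach can be tried end to end against a throwaway copy of the sample data. This is only a sketch under assumptions: the scratch folder, the `part$n.csv` file names and the generic list (used here instead of `+=` on an array) are illustrative choices, not part of the answer above.

```powershell
# Build a small sample file in a scratch folder (hypothetical location).
$dir = Join-Path ([IO.Path]::GetTempPath()) ("alph-split-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
$source = Join-Path $dir 'ALPH.csv'
@'
ALPH;8102014
ALPC;PK
ALPD;50
ALPT;5
ALPH;15102014
ALPC;PK
ALPD;50
ALPD;70
ALPT;6
'@ | Set-Content $source

$n = 1
$block = New-Object System.Collections.Generic.List[string]
switch -Regex -File $source {
    '^ALPT' {
        # Trailer line: keep it, write the block to its own file, start over.
        $block.Add($_)
        $block | Set-Content (Join-Path $dir "part$n.csv")
        $block.Clear()
        $n++
    }
    default {
        # ALPH, ALPC, ALPD and anything else between header and trailer.
        $block.Add($_)
    }
}

# One output file per ALPH..ALPT block.
$parts = Get-ChildItem $dir -Filter 'part*.csv' | Sort-Object Name
```

Because the ALPH line needs no special handling beyond being collected, it can fall through to `default` here; only the trailer triggers a write.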

Related

Replacing lines of a text file with powershell, based on lines of a different text file

So I have 2 files with the same style of listed content: Font ID, Font Def, and a Timestamp. I want to take a 2nd file of new fonts and replace the lines of the first file that have matching font IDs, using PowerShell (no database, which would be massively easier).
File2 text line = [FontIDA01] 5,5,5,5, randomtext, 11/10/2001
This should replace the line of File1 where [FontIDA01] matches up, replacing the 5,5,5,5 with 6,6,6,6 and the date with the date on that line.
$content = Get-Content $fileSelected #(path chosen by user)
$masterContent = Get-Content $masterContentPath #(hardcoded path)
foreach($line in content)
{
$fontID = $line.SubString($startFontID, $endFontID)#this just sets font id = 23jkK instead of [23jkK]
foreach($masterLine in $masterContent)
{
if ($masterLine.Contains($fontID))
{
$masterContent -replace $masterLine, $line where-Object{$_.Name -contains $fontID} | Set-Content $masterContent -raw
}
}
}
Am I even close?
Collect the new data in a dictionary and use it for replacements:
# get new data in a dictionary
$newData = @{}
Get-Content 2.txt | %{
$parts = $_ -split ' '
$newData[$parts[0]] = @{numbers=$parts[1]; date=$parts[3]}
}
#patch original data using the new data dictionary
Get-Content 1.txt | %{
$parts = $_ -split ' '
$id = $parts[0]
$new = $newData[$id]
if ($new) {
$id, $new.numbers, $parts[2], $new.date -join ' '
} else {
$_
}
} | Out-File 3.txt -Encoding utf8
This code assumes the fields are separated by spaces. If that's not the case, you'll have to extract the parts some other way, such as Select-String or regexp matching: if ($_ -match '(.+?) ([\d,]+) and so on') { $id = $matches[1] }.
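To make the regexp alternative concrete, here is a minimal sketch run against the sample line from the question. The pattern and the group names (`id`, `numbers`, `text`, `date`) are assumptions made for illustration; real files may need a looser pattern.

```powershell
# Sample line in the format shown in the question.
$line = '[FontIDA01] 5,5,5,5, randomtext, 11/10/2001'

# Named groups: bracketed id, comma-separated digits, free text, trailing date.
$pattern = '^(?<id>\[[^\]]+\])\s+(?<numbers>[\d,]+?),\s+(?<text>.+?),\s+(?<date>\d{1,2}/\d{1,2}/\d{4})$'

if ($line -match $pattern) {
    # $Matches is filled by -match; named groups are read by name.
    $id      = $Matches['id']
    $numbers = $Matches['numbers']
    $date    = $Matches['date']
}
```

Named groups keep the extraction readable when the line layout changes, since the code that uses `$Matches['date']` does not depend on the group's position.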

Get all unique substrings from array in Powershell

I have a set of strings gathered from logs that I'm trying to parse into unique entries:
function Scan ($path, $logPaths, $pattern)
{
$logPaths | % `
{
$file = $_.FullName
Write-Host "`n[$file]"
Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
{
$regexDateTime = New-Object System.Text.RegularExpressions.Regex "((?:\d{4})-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(,\d{3})?)"
$matchDate = $regexDateTime.match($_)
if($matchDate.success)
{
$loglinedate = [System.DateTime]::ParseExact($matchDate, "yyyy-MM-dd HH:mm:ss,FFF", [System.Globalization.CultureInfo]::InvariantCulture)
if ($loglinedate -gt $laterThan)
{
$date = $($_.toString().TrimStart() -split ']')[0]
$message = $($_.toString().TrimStart() -split ']')[1]
$messageArr += ,$date,$message
}
}
}
$messageArr | sort $message -Unique | foreach { Write-Host -f Green $date$message}
}
}
So for this input:
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:16 [20] WARN Brighter - The message abc123 has been marked as obsolete by the consumer as the entity has a higher version on the consumer side.
Only the second two entries should be returned
I'm having trouble filtering out duplicates of $message: currently all entries are being returned (sort -Unique is not behaving as I would expect it to). I also need the correct $date to be returned against the filtered $message.
I'm pretty stuck with this, can anyone help?
We can do what you want, but first let's back up just a little bit to help us do this better. Right now you have an array of arrays, and that's difficult to work with in general. What would be better is if you had an array of objects, and those objects had properties such as Date and Message. Let's start there.
if ($loglinedate -gt $laterThan)
{
$date = $($_.toString().TrimStart() -split ']')[0]
$message = $($_.toString().TrimStart() -split ']')[1]
$messageArr += ,$date,$message
}
is going to become...
if ($loglinedate -gt $laterThan)
{
[Array]$messageArr += [PSCustomObject]@{
'date' = $($_.toString().TrimStart() -split ']')[0]
'message' = $($_.toString().TrimStart() -split ']')[1]
}
}
That produces an array of objects, and each object has two properties, Date and Message. That will be much easier to work with.
If you only want the latest version of any message that's easily done with the Group-Object command as such:
$FilteredArr = $messageArr | Group Message | ForEach{$_.Group|sort Date|Select -Last 1}
Then if you want to display it to screen like you are, you could do:
$FilteredArr|ForEach{Write-Host -f Green ("{0}`t{1}" -f $_.Date, $_.Message)}
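The grouping step can be checked on its own with a few hand-made objects. The dates and messages below are made-up sample values, and `$latest` is a name chosen for this sketch.

```powershell
# Hand-made sample objects with the two properties described above.
$messageArr = @(
    [PSCustomObject]@{ Date = '2015-09-04 07:50:06'; Message = 'A DocumentLibrary 3 could not be found.' }
    [PSCustomObject]@{ Date = '2015-09-04 07:50:09'; Message = 'A DocumentLibrary 3 could not be found.' }
    [PSCustomObject]@{ Date = '2015-09-04 07:50:16'; Message = 'The message abc123 has been marked as obsolete.' }
)

# One group per distinct message; keep the entry with the greatest date in each group.
$latest = $messageArr | Group-Object Message | ForEach-Object {
    $_.Group | Sort-Object Date | Select-Object -Last 1
}
```

Sorting the dates as strings works here only because the `yyyy-MM-dd HH:mm:ss` layout orders the same way as the underlying times.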
My take (not tested) :
function Scan ($path, $logPaths, $pattern)
{
$regex = '(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.+)'
$ht = @{}
$logPaths | % `
{
$file = $_.FullName
Write-Host "`n[$file]"
Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
{
if ($_.line -match $regex -and $matches[1] -gt $ht[$matches[2]])
{ $ht[$matches[2]] = $matches[1] }
}
$ht.GetEnumerator() |
sort Value |
foreach { Write-Host -f Green "$($_.Value)$($_.Name)" }
}
}
This splits the file at the timestamp, and loads the parts into a hash table, using the error message as the key and the timestamp as the data (this will de-dupe the messages in-stream).
The timestamps are already in string-sortable format (yyyy-MM-dd HH:mm:ss), so there's really no need to cast them to [datetime] to find the latest one. Just do a straight string compare, and if the incoming timestamp is greater than an existing value for that message, replace the existing value with the new one.
When you're done, you should have a hash table with a key for each unique message found, having a value of the latest timestamp found for that message.
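The hash table idea can be exercised without any log files. This is a sketch with invented sample lines; `$latest`, `$stamp` and `$text` are names introduced here, and the explicit `ContainsKey` check is an illustrative choice.

```powershell
# Invented sample lines in the same timestamp-first layout.
$lines = @(
    '2015-09-04 07:50:06 [20] WARN Brighter - message A'
    '2015-09-04 07:50:16 [20] WARN Brighter - message A'
    '2015-09-04 07:50:10 [20] WARN Brighter - message B'
)

$regex  = '^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.+)$'
$latest = @{}

foreach ($line in $lines) {
    if ($line -match $regex) {
        $stamp, $text = $Matches[1], $Matches[2]
        # Store the timestamp if the message is new or this timestamp is later.
        if (-not $latest.ContainsKey($text) -or $stamp -gt $latest[$text]) {
            $latest[$text] = $stamp
        }
    }
}
```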

One element containing hashes instead of multiple elements - how to fix?

I am trying to parse robocopy log files to get file size, path, and date modified. I am getting the information via regex with no issues. However, for some reason, I am getting an array with a single element, and that element contains 3 hashes. My terminology might be off; I am still learning about hashes. What I want is a regular array with multiple elements.
Output that I am getting:
FileSize FilePath DateTime
-------- -------- --------
{23040, 36864, 27136, 24064...} {\\server1\folder\Test File R... {2006/03/15 21:08:01, 2010/12...
As you can see, there is only one row, but that row contains multiple items. I want multiple rows.
Here is my code:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent = New-Object System.Collections.Generic.List[PSCustomObject]
Get-Content $Path\$InFile -ReadCount $Batch | ForEach-Object {
$FileSize = $_ -match $Match_Regex -replace $Replace_Regex,('$1').Trim()
$DateTime = $_ -match $Match_Regex -replace $Replace_Regex,('$2').Trim()
$FilePath = $_ -match $Match_Regex -replace $Replace_Regex,('$3').Trim()
$Props = @{
FileSize = $FileSize;
DateTime = $DateTime;
FilePath = $FilePath
}
$Obj = [PSCustomObject]$Props
$MainContent.Add($Obj)
}
$MainContent | % {
$_
}
What am I doing wrong? I am just not getting it. Thanks.
Note: This needs to be as fast as possible because I have to process millions of lines, which is why I am trying System.Collections.Generic.List.
I think the problem is that for what you're doing you actually need two foreach-object loops. Using Get-Content with -Readcount is going to give you an array of arrays. Use the -Match in the first Foreach-Object to filter out the records that match in each array. That's going to give you an array of the matched records. Then you need to foreach through that array to create one object for each record:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent =
Get-Content $Path\$InFile -ReadCount $Batch |
ForEach-Object {
$_ -match $Match_Regex |
ForEach-Object {
$FileSize = $_ -replace $Replace_Regex,('$1').Trim()
$DateTime = $_ -replace $Replace_Regex,('$2').Trim()
$FilePath = $_ -replace $Replace_Regex,('$3').Trim()
[PSCustomObject]@{
FileSize = $FileSize
DateTime = $DateTime
FilePath = $FilePath
}
}
}
You don't really need to use the collection as an accumulator, just output PSCustomObjects, and let them accumulate in the result variable.
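The batching behaviour is easy to see on a tiny file. The following is a sketch only: the temp file, its contents and the `^keep` pattern are invented for the demonstration.

```powershell
# Six sample lines in a temp file.
$tmp = [IO.Path]::GetTempFileName()
'keep 1', 'skip', 'keep 2', 'keep 3', 'skip', 'skip' | Set-Content $tmp

# With -ReadCount 3 each pipeline item is a batch of lines, not a single line.
$batchSizes = Get-Content $tmp -ReadCount 3 | ForEach-Object { @($_).Count }

# -match applied to an array filters it; the inner loop then emits one object per line.
# @($_) guards against a batch that holds a single string.
$rows = Get-Content $tmp -ReadCount 3 | ForEach-Object {
    @($_) -match '^keep' | ForEach-Object {
        [PSCustomObject]@{ Line = $_ }
    }
}

Remove-Item $tmp
```

With a single loop, each property would receive a whole batch at once, which is what produces the one-row output shown in the question.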

PowerShell: Set-Content having issues with "file already in use"

I'm working on a PowerShell script that finds all the files with PATTERN within a given DIRECTORY, prints out the relevant lines of the document with the PATTERN highlighted, and then replaces the PATTERN with a provided REPLACE word, then saves the file back. So it actually edits the file.
Except I can't get it to alter the file, because Windows complains about the file already being open. I tried several methods to solve this, but keep running into the issue. Perhaps someone can help:
param(
[string] $pattern = ""
,[string] $replace = ""
,[string] $directory ="."
,[switch] $recurse = $false
,[switch] $caseSensitive = $false)
if($pattern -eq $null -or $pattern -eq "")
{
Write-Error "Please provide a search pattern." ; return
}
if($directory -eq $null -or $directory -eq "")
{
Write-Error "Please provide a directory." ; return
}
if($replace -eq $null -or $replace -eq "")
{
Write-Error "Please provide a string to replace." ; return
}
$regexPattern = $pattern
if($caseSensitive -eq $false) { $regexPattern = "(?i)$regexPattern" }
$regex = New-Object System.Text.RegularExpressions.Regex $regexPattern
function Write-HostAndHighlightPattern([string] $inputText)
{
$index = 0
$length = $inputText.Length
while($index -lt $length)
{
$match = $regex.Match($inputText, $index)
if($match.Success -and $match.Length -gt 0)
{
Write-Host $inputText.SubString($index, $match.Index - $index) -nonewline
Write-Host $match.Value.ToString() -ForegroundColor Red -nonewline
$index = $match.Index + $match.Length
}
else
{
Write-Host $inputText.SubString($index) -nonewline
$index = $inputText.Length
}
}
}
Get-ChildItem $directory -recurse:$recurse |
Select-String -caseSensitive:$caseSensitive -pattern:$pattern |
foreach {
$file = ($directory + $_.FileName)
Write-Host "$($_.FileName)($($_.LineNumber)): " -nonewline
Write-HostAndHighlightPattern $_.Line
%{ Set-Content $file ((Get-Content $file) -replace ([Regex]::Escape("[$pattern]")),"[$replace]")}
Write-Host "`n"
Write-Host "Processed: $($file)"
}
The issue is located within the final block of code, right at the Get-ChildItem call. Of course, some of the code in that block is now a bit mangled due to me trying to fix the problem then stopping, but keep in mind the intent of that part of the script. I want to get the content, replace the words, then save the altered text back to the file I got it from.
Any help at all would be greatly appreciated.
Removed my previous answer, replacing it with this:
Get-ChildItem $directory -recurse:$recurse |
foreach {
$file = $_.FullName
(Get-Content $file) | Foreach-object {
$_ -replace ([Regex]::Escape("[$pattern]")),"[$replace]"
} | Set-Content $file
}
Note:
The parentheses around Get-Content to ensure the file is slurped in one go (and therefore closed).
The piping to subsequent commands rather than inlining.
Some of your commands have been removed to ensure it's a simple test.
Just a suggestion but you might try looking at the documentation for the parameters code block. There is a more efficient way to ensure that a parameter is entered if you require it and to throw an error message if the user doesn't.
About_throw: http://technet.microsoft.com/en-us/library/dd819510.aspx
About_functions_advanced_parameters: http://technet.microsoft.com/en-us/library/dd347600.aspx
And then about using Write-Host all the time: http://powershell.com/cs/blogs/donjones/archive/2012/04/06/2012-scripting-games-commentary-stop-using-write-host.aspx
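As a rough illustration of the parameter suggestion above, the manual null/empty checks can be replaced by parameter attributes. `Invoke-ReplaceCheck` is a hypothetical function name used only for this sketch, and its body just echoes what it was given.

```powershell
function Invoke-ReplaceCheck {
    [CmdletBinding()]
    param(
        # Mandatory: PowerShell refuses to run the function without a value.
        [Parameter(Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [string] $Pattern,

        [Parameter(Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [string] $Replace,

        # Optional, but an explicitly passed empty value is rejected.
        [ValidateNotNullOrEmpty()]
        [string] $Directory = '.',

        [switch] $Recurse,
        [switch] $CaseSensitive
    )

    "Would replace '$Pattern' with '$Replace' under '$Directory'"
}

$result = Invoke-ReplaceCheck -Pattern 'foo' -Replace 'bar'
```

Passing an empty string for `-Pattern` fails at parameter binding, before any of the function body runs, so the three `if` blocks at the top of the script are no longer needed.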
Alright, I finally sat down and just typed everything sequentially in PowerShell, then used that to make my script.
It was actually really simple;
$items = Get-ChildItem $directory -recurse:$recurse
$items |
foreach {
$file = $_.FullName
$content = get-content $file
$newContent = $content -replace $pattern, $replace
Set-Content $file $newcontent
}
Thanks for all your help guys.

Comparing folders and content with PowerShell

I have two different folders with xml files. One folder (folder2) contains updated and new xml files compared to the other (folder1). I need to know which files in folder2 are new/updated compared to folder1 and copy them to a third folder (folder3). What's the best way to accomplish this in PowerShell?
OK, I'm not going to code the whole thing for you (what's the fun in that?) but I'll get you started.
First, there are two ways to do the content comparison. The lazy/mostly right way, which is comparing the length of the files; and the accurate but more involved way, which is comparing a hash of the contents of each file.
For simplicity's sake, let's do the easy way and compare file size.
Basically, you want two objects that represent the source and target folders:
$Folder1 = Get-childitem "C:\Folder1"
$Folder2 = Get-childitem "C:\Folder2"
Then you can use Compare-Object to see which items are different...
Compare-Object $Folder1 $Folder2 -Property Name, Length
which will list for you everything that is different by comparing only name and length of the file objects in each collection.
You can pipe that to a Where-Object filter to pick stuff that is different on the right side, i.e. files in Folder2 that are new or differ from Folder1...
Compare-Object $Folder1 $Folder2 -Property Name, Length | Where-Object {$_.SideIndicator -eq "=>"}
And then pipe that to a ForEach-Object to copy where you want:
Compare-Object $Folder1 $Folder2 -Property Name, Length | Where-Object {$_.SideIndicator -eq "=>"} | ForEach-Object {
Copy-Item "C:\Folder2\$($_.name)" -Destination "C:\Folder3" -Force
}
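For the more accurate route mentioned earlier, comparing a hash of each file's contents, here is a minimal sketch. It assumes `Get-FileHash` is available (PowerShell 4 or later); the scratch folders, the sample files and the helper `Get-NameAndHash` are invented for the example. With Folder1 as the reference set, items that are new or changed in Folder2 carry the `=>` side indicator.

```powershell
# Scratch folders standing in for Folder1, Folder2 and Folder3.
$root = Join-Path ([IO.Path]::GetTempPath()) ("cmp-" + [guid]::NewGuid())
$f1, $f2, $f3 = 'Folder1', 'Folder2', 'Folder3' | ForEach-Object {
    (New-Item -ItemType Directory -Path (Join-Path $root $_)).FullName
}
'<a/>'     | Set-Content (Join-Path $f1 'same.xml')
'<a/>'     | Set-Content (Join-Path $f2 'same.xml')
'<b>1</b>' | Set-Content (Join-Path $f1 'changed.xml')
'<b>2</b>' | Set-Content (Join-Path $f2 'changed.xml')   # same length, different content
'<c/>'     | Set-Content (Join-Path $f2 'new.xml')

# Hypothetical helper: one object per file with its name and content hash.
function Get-NameAndHash($folder) {
    Get-ChildItem $folder | Where-Object { -not $_.PSIsContainer } | ForEach-Object {
        [PSCustomObject]@{
            Name = $_.Name
            Hash = (Get-FileHash $_.FullName -Algorithm SHA256).Hash
        }
    }
}

# Copy everything that exists only in Folder2 or whose hash differs there.
$copied = Compare-Object (Get-NameAndHash $f1) (Get-NameAndHash $f2) -Property Name, Hash |
    Where-Object { $_.SideIndicator -eq '=>' } |
    ForEach-Object {
        Copy-Item (Join-Path $f2 $_.Name) -Destination $f3 -Force
        $_.Name
    }
```

Note that `changed.xml` has the same length in both folders, so a Name/Length comparison would miss it while the hash comparison does not.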
Recursive Directory Diff Using MD5 Hashing (Compares Content)
Here is a pure PowerShell v3+ recursive file diff (no dependencies) that calculates an MD5 hash of each file's contents in both directories (left/right). It can optionally export CSVs along with a summary text file. By default it outputs results to stdout. You can either drop the rdiff.ps1 file into your path or copy the contents into your script.
USAGE: rdiff path/to/left,path/to/right [-s path/to/summary/dir]
Here is the gist. Recommended to use version from gist as it may have additional features over time. Feel free to send pull requests.
#########################################################################
### USAGE: rdiff path/to/left,path/to/right [-s path/to/summary/dir] ###
### ADD LOCATION OF THIS SCRIPT TO PATH ###
#########################################################################
[CmdletBinding()]
param (
[parameter(HelpMessage="Stores the execution working directory.")]
[string]$ExecutionDirectory=$PWD,
[parameter(Position=0,HelpMessage="Compare two directories recursively for differences.")]
[alias("c")]
[string[]]$Compare,
[parameter(HelpMessage="Export a summary to path.")]
[alias("s")]
[string]$ExportSummary
)
### FUNCTION DEFINITIONS ###
# SETS WORKING DIRECTORY FOR .NET #
function SetWorkDir($PathName, $TestPath) {
$AbsPath = NormalizePath $PathName $TestPath
Set-Location $AbsPath
[System.IO.Directory]::SetCurrentDirectory($AbsPath)
}
# RESTORES THE EXECUTION WORKING DIRECTORY AND EXITS #
function SafeExit() {
SetWorkDir /path/to/execution/directory $ExecutionDirectory
Exit
}
function Print {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Message to print.")]
[string]$Message,
[parameter(HelpMessage="Specifies a success.")]
[alias("s")]
[switch]$SuccessFlag,
[parameter(HelpMessage="Specifies a warning.")]
[alias("w")]
[switch]$WarningFlag,
[parameter(HelpMessage="Specifies an error.")]
[alias("e")]
[switch]$ErrorFlag,
[parameter(HelpMessage="Specifies a fatal error.")]
[alias("f")]
[switch]$FatalFlag,
[parameter(HelpMessage="Specifies a info message.")]
[alias("i")]
[switch]$InfoFlag = !$SuccessFlag -and !$WarningFlag -and !$ErrorFlag -and !$FatalFlag,
[parameter(HelpMessage="Specifies blank lines to print before.")]
[alias("b")]
[int]$LinesBefore=0,
[parameter(HelpMessage="Specifies blank lines to print after.")]
[alias("a")]
[int]$LinesAfter=0,
[parameter(HelpMessage="Specifies if program should exit.")]
[alias("x")]
[switch]$ExitAfter
)
PROCESS {
if($LinesBefore -ne 0) {
foreach($i in 0..$LinesBefore) { Write-Host "" }
}
if($InfoFlag) { Write-Host "$Message" }
if($SuccessFlag) { Write-Host "$Message" -ForegroundColor "Green" }
if($WarningFlag) { Write-Host "$Message" -ForegroundColor "Yellow" }
if($ErrorFlag) { Write-Host "$Message" -ForegroundColor "Red" }
if($FatalFlag) { Write-Host "$Message" -ForegroundColor "Red" -BackgroundColor "Black" }
if($LinesAfter -ne 0) {
foreach($i in 0..$LinesAfter) { Write-Host "" }
}
if($ExitAfter) { SafeExit }
}
}
# VALIDATES STRING MIGHT BE A PATH #
function ValidatePath($PathName, $TestPath) {
If([string]::IsNullOrWhiteSpace($TestPath)) {
Print -x -f "$PathName is not a path"
}
}
# NORMALIZES RELATIVE OR ABSOLUTE PATH TO ABSOLUTE PATH #
function NormalizePath($PathName, $TestPath) {
ValidatePath "$PathName" "$TestPath"
$TestPath = [System.IO.Path]::Combine((pwd).Path, $TestPath)
$NormalizedPath = [System.IO.Path]::GetFullPath($TestPath)
return $NormalizedPath
}
# VALIDATES STRING MIGHT BE A PATH AND RETURNS ABSOLUTE PATH #
function ResolvePath($PathName, $TestPath) {
ValidatePath "$PathName" "$TestPath"
$ResolvedPath = NormalizePath $PathName $TestPath
return $ResolvedPath
}
# VALIDATES STRING RESOLVES TO A PATH AND RETURNS ABSOLUTE PATH #
function RequirePath($PathName, $TestPath, $PathType) {
ValidatePath $PathName $TestPath
If(!(Test-Path $TestPath -PathType $PathType)) {
Print -x -f "$PathName ($TestPath) does not exist as a $PathType"
}
$ResolvedPath = Resolve-Path $TestPath
return $ResolvedPath
}
# Like mkdir -p -> creates a directory recursively if it doesn't exist #
function MakeDirP {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path create.")]
[string]$Path
)
PROCESS {
New-Item -path $Path -itemtype Directory -force | Out-Null
}
}
# GETS ALL FILES IN A PATH RECURSIVELY #
function GetFiles {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path to get files for.")]
[string]$Path
)
PROCESS {
ls $Path -r | where { !$_.PSIsContainer }
}
}
# GETS ALL FILES WITH CALCULATED HASH PROPERTY RELATIVE TO A ROOT DIRECTORY RECURSIVELY #
# RETURNS LIST OF @{RelativePath, Hash, FullName}
function GetFilesWithHash {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path to get directories for.")]
[string]$Path,
[parameter(HelpMessage="The hash algorithm to use.")]
[string]$Algorithm="MD5"
)
PROCESS {
$OriginalPath = $PWD
SetWorkDir path/to/diff $Path
GetFiles $Path | select @{N="RelativePath";E={$_.FullName | Resolve-Path -Relative}},
@{N="Hash";E={(Get-FileHash $_.FullName -Algorithm $Algorithm | select Hash).Hash}},
FullName
SetWorkDir path/to/original $OriginalPath
}
}
# COMPARE TWO DIRECTORIES RECURSIVELY #
# RETURNS LIST OF @{RelativePath, Hash, FullName}
function DiffDirectories {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Directory to compare left.")]
[alias("l")]
[string]$LeftPath,
[parameter(Mandatory=$TRUE,Position=1,HelpMessage="Directory to compare right.")]
[alias("r")]
[string]$RightPath
)
PROCESS {
$LeftHash = GetFilesWithHash $LeftPath
$RightHash = GetFilesWithHash $RightPath
diff -ReferenceObject $LeftHash -DifferenceObject $RightHash -Property RelativePath,Hash
}
}
### END FUNCTION DEFINITIONS ###
### PROGRAM LOGIC ###
if($Compare.length -ne 2) {
Print -x "Compare requires passing exactly 2 path parameters separated by comma, you passed $($Compare.length)." -f
}
Print "Comparing $($Compare[0]) to $($Compare[1])..." -a 1
$LeftPath = RequirePath path/to/left $Compare[0] container
$RightPath = RequirePath path/to/right $Compare[1] container
$Diff = DiffDirectories $LeftPath $RightPath
$LeftDiff = $Diff | where {$_.SideIndicator -eq "<="} | select RelativePath,Hash
$RightDiff = $Diff | where {$_.SideIndicator -eq "=>"} | select RelativePath,Hash
if($ExportSummary) {
$ExportSummary = ResolvePath path/to/summary/dir $ExportSummary
MakeDirP $ExportSummary
$SummaryPath = Join-Path $ExportSummary summary.txt
$LeftCsvPath = Join-Path $ExportSummary left.csv
$RightCsvPath = Join-Path $ExportSummary right.csv
$LeftMeasure = $LeftDiff | measure
$RightMeasure = $RightDiff | measure
"== DIFF SUMMARY ==" > $SummaryPath
"" >> $SummaryPath
"-- DIRECTORIES --" >> $SummaryPath
"`tLEFT -> $LeftPath" >> $SummaryPath
"`tRIGHT -> $RightPath" >> $SummaryPath
"" >> $SummaryPath
"-- DIFF COUNT --" >> $SummaryPath
"`tLEFT -> $($LeftMeasure.Count)" >> $SummaryPath
"`tRIGHT -> $($RightMeasure.Count)" >> $SummaryPath
"" >> $SummaryPath
$Diff | Format-Table >> $SummaryPath
$LeftDiff | Export-Csv $LeftCsvPath -f
$RightDiff | Export-Csv $RightCsvPath -f
}
$Diff
SafeExit
Further to @JNK's answer, you might want to ensure that you are always working with files rather than the less-intuitive output from Compare-Object. You just need to use the -PassThru switch...
$Folder1 = Get-ChildItem "C:\Folder1"
$Folder2 = Get-ChildItem "C:\Folder2"
$Folder3 = "C:\Folder3\"
# Get all differences, i.e. from both "sides"
$AllDiffs = Compare-Object $Folder1 $Folder2 -Property Name,Length -PassThru
# Filter for new/updated files from C:\Folder2
$Changes = $AllDiffs | Where-Object {$_.Directory.FullName -eq "C:\Folder2"}
# Copy to $Folder3
$Changes | Copy-Item -Destination $Folder3
This at least means you don't have to worry about which way the SideIndicator arrow points!
Also, bear in mind that you might want to compare on LastWriteTime as well.
Sub-folders
Looping through the sub-folders recursively is a little more complicated as you probably will need to strip off the respective root folder paths from the FullName field before comparing lists.
You could do this by adding a new ScriptProperty to your Folder1 and Folder2 lists:
$Folder1 | Add-Member -MemberType ScriptProperty -Name "RelativePath" `
-Value {$this.FullName -replace [Regex]::Escape("C:\Folder1"),""}
$Folder2 | Add-Member -MemberType ScriptProperty -Name "RelativePath" `
-Value {$this.FullName -replace [Regex]::Escape("C:\Folder2"),""}
You should then be able to use RelativePath as a property when comparing the two objects and also use that to join on to "C:\Folder3" when copying to keep the folder structure in place.
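A possible end-to-end version of the sub-folder case is sketched below. It swaps the `Add-Member` script property for a calculated property in `Select-Object`, which is a different technique with the same goal; the scratch folders, sample files and the helper `Get-RelativeFiles` are assumptions made for the example.

```powershell
# Scratch folders standing in for Folder1, Folder2 (with a sub-folder) and Folder3.
$root = Join-Path ([IO.Path]::GetTempPath()) ("rel-" + [guid]::NewGuid())
$f1 = Join-Path $root 'Folder1'
$f2 = Join-Path $root 'Folder2'
$f3 = Join-Path $root 'Folder3'
New-Item -ItemType Directory -Path $f1, (Join-Path $f2 'sub'), $f3 | Out-Null
'old'       | Set-Content (Join-Path $f1 'a.xml')
'old'       | Set-Content (Join-Path $f2 'a.xml')
'brand new' | Set-Content (Join-Path (Join-Path $f2 'sub') 'b.xml')

# Hypothetical helper: files below $base with their path relative to $base.
function Get-RelativeFiles($base) {
    Get-ChildItem $base -Recurse | Where-Object { -not $_.PSIsContainer } |
        Select-Object @{ N = 'RelativePath'; E = { $_.FullName.Substring($base.Length).TrimStart('\', '/') } },
                      Length, FullName
}

# Items on the "=>" side exist only in Folder2 or differ in length there.
$new = Compare-Object (Get-RelativeFiles $f1) (Get-RelativeFiles $f2) -Property RelativePath, Length -PassThru |
    Where-Object { $_.SideIndicator -eq '=>' }

foreach ($item in $new) {
    # Recreate the sub-folder under Folder3 before copying the file into it.
    $target = Join-Path $f3 $item.RelativePath
    New-Item -ItemType Directory -Path (Split-Path $target -Parent) -Force | Out-Null
    Copy-Item $item.FullName -Destination $target -Force
}
```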
Here's an approach which will find files which are missing or differ in content.
First, a quick-and-dirty one-liner (see caveat below).
dir -r | rvpa -Relative |%{ if (Test-Path $right\$_) { if (Test-Path -Type Leaf $_) { if ( diff (cat $_) (cat $right\$_ ) ) { $_ } } } else { $_ } }
Run the above in one of the directories, with $right set to (or replaced with) the path to the other directory. Things missing from $right, or which differ in content, will be reported. No output means no differences found. CAVEAT: Things existing in $right but missing from the left will not be found/reported.
This doesn't bother calculating hashes; it just compares the file contents directly. Hashing makes sense when you want to reference something in another context (later date, on another machine, etc.), but when we're comparing things directly, it adds nothing but overhead. (It's also theoretically possible for two files to have the same hash, although that's basically impossible to happen by accident. Deliberate attack, on the other hand...)
Here's a more proper script, which handles more corner cases and errors.
[CmdletBinding()]
Param(
[Parameter(Mandatory=$true,Position=0)][string]$Left,
[Parameter(Mandatory=$True,Position=1)][string]$Right
)
# throw errors on undefined variables
Set-StrictMode -Version 1
# stop immediately on error
$ErrorActionPreference = [System.Management.Automation.ActionPreference]::Stop
# init counters
$Items = $MissingRight = $MissingLeft = $Contentdiff = 0
# make sure the given parameters are valid paths
$left = Resolve-Path $left
$right = Resolve-Path $right
# make sure the given parameters are directories
if (-Not (Test-Path -Type Container $left)) { throw "not a container: $left" }
if (-Not (Test-Path -Type Container $right)) { throw "not a container: $right" }
# Starting from $left as relative root, walk the tree and compare to $right.
Push-Location $left
try {
Get-ChildItem -Recurse | Resolve-Path -Relative | ForEach-Object {
$rel = $_
$Items++
# make sure counterpart exists on the other side
if (-not (Test-Path $right\$rel)) {
Write-Output "missing from right: $rel"
$MissingRight++
return
}
# compare contents for files (directories just have to exist)
if (Test-Path -Type Leaf $rel) {
if ( Compare-Object (Get-Content $left\$rel) (Get-Content $right\$rel) ) {
Write-Output "content differs : $rel"
$ContentDiff++
}
}
}
}
finally {
Pop-Location
}
# Check items in $right for counterparts in $left.
# Something missing from $left of course won't be found when walking $left.
# Don't need to check content again here.
Push-Location $right
try {
Get-ChildItem -Recurse | Resolve-Path -Relative | ForEach-Object {
$rel = $_
if (-not (Test-Path $left\$rel)) {
Write-Output "missing from left : $rel"
$MissingLeft++
return
}
}
}
finally {
Pop-Location
}
Write-Verbose "$Items items, $ContentDiff differed, $MissingLeft missing from left, $MissingRight from right"
Handy version using script parameter
Simple file-level comparison
Call it like PS > .\DirDiff.ps1 -a .\Old\ -b .\New\
Param(
[string]$a,
[string]$b
)
$fsa = Get-ChildItem -Recurse -path $a
$fsb = Get-ChildItem -Recurse -path $b
Compare-Object -Referenceobject $fsa -DifferenceObject $fsb
Possible output:
InputObject SideIndicator
----------- -------------
appsettings.Development.json <=
appsettings.Testing.json <=
Server.pdb =>
ServerClientLibrary.pdb =>
Do this:
compare (Get-ChildItem D:\MyFolder\NewFolder) (Get-ChildItem \\RemoteServer\MyFolder\NewFolder)
And even recursively:
compare (Get-ChildItem -r D:\MyFolder\NewFolder) (Get-ChildItem -r \\RemoteServer\MyFolder\NewFolder)
which is even hard to forget :)
gci -path 'C:\Folder' -recurse |where{$_.PSIsContainer}
-recurse will explore all subtrees below the root path given and the .PSIsContainer property is the one you want to test for to grab all folders only. You can use where{!$_.PSIsContainer} for just files.
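On PowerShell 3 and later, `Get-ChildItem` also offers `-Directory` and `-File` switches for the file system, which give the same split without a `where` filter. A small sketch, using an invented scratch folder:

```powershell
# Scratch folder with one sub-folder and one file.
$root = Join-Path ([IO.Path]::GetTempPath()) ("gci-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path (Join-Path $root 'sub') | Out-Null
'x' | Set-Content (Join-Path $root 'file.txt')

# Folders only, via the property test and via the switch.
$viaProperty = Get-ChildItem $root -Recurse | Where-Object { $_.PSIsContainer }
$viaSwitch   = Get-ChildItem $root -Recurse -Directory

# Files only.
$filesOnly   = Get-ChildItem $root -Recurse -File
```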

Resources