Get all unique substrings from array in Powershell - arrays

I have a set of strings gathered from logs that I'm trying to parse into unique entries:
function Scan ($path, $logPaths, $pattern)
{
$logPaths | % `
{
$file = $_.FullName
Write-Host "`n[$file]"
Get-Content $file | Select-String -Pattern $pattern -CaseSensitive - AllMatches | % `
{
$regexDateTime = New-Object System.Text.RegularExpressions.Regex "((?:\d{4})-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(,\d{3})?)"
$matchDate = $regexDateTime.match($_)
if($matchDate.success)
{
$loglinedate = [System.DateTime]::ParseExact($matchDate, "yyyy-MM-dd HH:mm:ss,FFF", [System.Globalization.CultureInfo]::InvariantCulture)
if ($loglinedate -gt $laterThan)
{
$date = $($_.toString().TrimStart() -split ']')[0]
$message = $($_.toString().TrimStart() -split ']')[1]
$messageArr += ,$date,$message
}
}
}
$messageArr | sort $message -Unique | foreach { Write-Host -f Green $date$message}
}
}
So for this input:
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:16 [20] WARN Brighter - The message abc123 has been marked as obsolete by the consumer as the entity has a higher version on the consumer side.
Only the second two entries should be returned
I'm having trouble filtering out duplicates of $message: currently all entries are being returned (sort -Unique is not behaving as I would expect it to). I also need the correct $date to be returned against the filtered $message.
I'm pretty stuck with this, can anyone help?

We can do what you want, but first let's backup just a little bit to help us do this better. Right now you have an array of arrays, and that's difficult to work with in general. What would be better is if you had an array of objects, and those objects had properties such as Date and Message. Let's start there.
if ($loglinedate -gt $laterThan)
{
$date = $($_.toString().TrimStart() -split ']')[0]
$message = $($_.toString().TrimStart() -split ']')[1]
$messageArr += ,$date,$message
}
is going to become...
if ($loglinedate -gt $laterThan)
{
[Array]$messageArr += [PSCustomObject]#{
'date' = $($_.toString().TrimStart() -split ']')[0]
'message' = $($_.toString().TrimStart() -split ']')[1]
}
}
That produces an array of objects, and each object has two properties, Date and Message. That will be much easier to work with.
If you only want the latest version of any message that's easily done with the Group-Object command as such:
$FilteredArr = $messageArr | Group Message | ForEach{$_.Group|sort Date|Select -Last 1}
Then if you want to display it to screen like you are, you could do:
$Filtered|ForEach{Write-Host -f Green ("{0}`t{1}" -f $_.Date, $_.Message)}

My take (not tested) :
function Scan ($path, $logPaths, $pattern)
{
$regex = '(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.+)'
$ht = #{}
$logPaths | % `
{
$file = $_.FullName
Write-Host "`n[$file]"
Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
{
if ($_.line -match $regex -and $ht[$matches[2]] -gt $matches[1])
{ $ht[$matches[2]] = $matches[1] }
}
$ht.GetEnumerator() |
sort Value |
foreach { Write-Host -f Green "$($_.Value)$($_.Name)" }
}
}
This splits the file at the timestamp, and loads the parts into a hash table, using the error message as the key and the timestamp as the data (this will de-dupe the messages in-stream).
The timestamps are already in string-sortable format (yyyy-MM-dd HH:mm:ss), so there's really no need to cast them to [datetime] to find the latest one. Just do a straight string compare, and if the incoming timestamp is greater than an existing value for that message, replace the existing value with the new one.
When you're done, you should have a hash table with a key for each unique message found, having a value of the latest timestamp found for that message.

Related

Return Values of Single Column in PowerShell Array

I'm creating a PowerShell script for DPM which outputs offsite ready tapes. The below code takes a few seconds to complete the DPM connection and to return the query, which is fine.
# Connect to local DPM server
$DPMServer = $env:COMPUTERNAME
Connect-DPMServer -DPMServerName $DPMServer | Out-Null
# Get the DPM libary
$DPMLib = Get-DPMLibrary
#Format output
$Formatting = #{Expression={$_.Barcode}; Label="Barcode "; Width=12},
#{Expression={"{0:MM/dd/yyyy HH:mm tt}" -f $_.CreationDate}; Label="Creation Date "; Width=23},
#{Expression={$_.DisplayString}; Label="Tape Label "; Width=28},
#{Expression={"{0,15:N0}" -f $_.DataWrittenDisplayString}; Label="Data Written "}
#Calculate Monday at midnight of this week
$Monday = (Get-Date).AddDays((-1 * (Get-Date).DayOfWeek.Value__) + 1).Date
# Get specific tapes
$Tapes = Get-DPMTape -DPMLibrary $DPMLib |
Where-Object {$_.Barcode -notlike "CLN*"} |
Where-Object {$_.DisplayString -notlike "Free"} |
Where-Object {$_.CreationDate -gt $Monday} |
Sort-Object Barcode |
Format-Table -Property $Formatting
Write-Host "`nOffsite Ready Tapes:" -ForegroundColor Cyan
#Output tapes
$Tapes
This outputs like so:
My question is how do I now select just the barcodes listed within the $Tapes array, and output them (below what I already have) as a comma delimited list, without running the DPM query again. I've tried all sorts of things with no luck, I'm missing something obvious.
If I do a second connection to DPM, I can do it like this, but I'm trying to avoid doubling the time it takes to run. This is part of my original newbie script that I'm trying to better.
# Define DPM server to connect to
$DPMServer = $env:COMPUTERNAME
Connect-DPMServer -DPMServerName $DPMServer | Out-Null
# Get the DPM libary
$DPMLib = Get-DPMLibrary
# Get tape display strings and barcodes, sort by barcode
$Tapes = Get-DPMTape -DPMLibrary $DPMLib | Select-Object CreationDate, DisplayString, Barcode | Sort-Object Barcode
# Create empty array
$DPMTapesForOffsite = #()
Write-Host "`nOffsite Ready Tapes:`n" -ForegroundColor Cyan
foreach ($_ in $Tapes) {
# Exclude cleaning tapes
if ($_.Barcode -notlike "CLN*") {
# Exclude marked as free
if ($_.DisplayString -notlike "Free") {
$TimeStamp = Get-Date $_.CreationDate
# Timestamp is from this week
if ($Timestamp -gt $Monday) {
$DPMTapesForOffsite = $DPMTapesForOffsite + $_.barcode
Write-Host $_.barcode
}
}
}
}
# Format tape list as comma delimited
$DPMTapesForOffsite = $DPMTapesForOffsite -join ","
I'm missing some obvious, any help would greatly be appreciated.
# Omit Format-Table initially, so as to store actual *data* in $tapes,
# not *formatting instructions*, which is what Format-* cmdlets return.
$tapes = Get-DPMTape -DPMLibrary $DPMLib |
Where-Object {$_.Barcode -notlike "CLN*"} |
Where-Object {$_.DisplayString -notlike "Free"} |
Where-Object {$_.CreationDate -gt $Monday} |
Sort-Object Barcode
# Now you can apply the desired formatting.
Write-Host "`nOffsite Ready Tapes:" -ForegroundColor Cyan
$tapes | Format-Table -Property $Formatting
# Thanks to PowerShell's member-access enumeration feature,
# accessing property .Barcode on the entire $tapes *collection*
# conveniently returns its *elements'* property values.
# -join ',' joins them with commas.
$tapes.Barcode -join ','
Format-* cmdlets output objects whose sole purpose is to provide formatting instructions to PowerShell's output-formatting system - see this answer.
In short: only ever use Format-* cmdlets to format data for display, never for subsequent programmatic processing.
This answer explains member-access enumeration.

Adding objects to an array in a hashtable

I want to create a Hashtable which groups files with the same name in arrays so I can later on work with those to list some properties of those files, like the folders where they're stored.
$ht = #{}
gci -recurse -file | % {
try{
$ht.Add($_.Name,#())
$ht[$_.Name] += $_
}
catch{
$ht[$_.Name] += $_
}
}
All I'm getting is:
Index operation failed; the array index evaluated to null.
At line:8 char:13
+ $ht[$_.Name] += $_
+ ~~~~~~~~~~~~~~~~~~
I'm not sure why this isn't working, I'd appreciate any help.
Don't reinvent the wheel. You want to group files with the same name, use the Group-Object cmdlet:
$groupedFiles = Get-ChildItem -recurse -file | Group-Object Name
Now you can easy retrieve all file names that are present at least twice using the Where-Object cmdlet:
$groupedFiles | Where-Object Count -gt 1
You are getting this error because if your code hits the catch block, the current pipeline variable ($_) represents the last error and not the current item. You can fix that by either storing the current item an a variable, or you use the -PipelineVariable advanced cmdlet parameter:
$ht = #{}
gci -recurse -file -PipelineVariable item | % {
try{
$ht.Add($item.Name,#())
$ht[$item.Name] += $item
}
catch{
$ht[$item.Name] += $item
}
}

Display all values in PSCustomObject array [duplicate]

Is it possible to display the results of a PowerShell Compare-Object in two columns showing the differences of reference vs difference objects?
For example using my current cmdline:
Compare-Object $Base $Test
Gives:
InputObject SideIndicator
987654 =>
555555 <=
123456 <=
In reality the list is rather long. For easier data reading is it possible to format the data like so:
Base Test
555555 987654
123456
So each column shows which elements exist in that object vs the other.
For bonus points it would be fantastic to have a count in the column header like so:
Base(2) Test(1)
555555 987654
123456
Possible? Sure. Feasible? Not so much. PowerShell wasn't really built for creating this kind of tabular output. What you can do is collect the differences in a hashtable as nested arrays by input file:
$ht = #{}
Compare-Object $Base $Test | ForEach-Object {
$value = $_.InputObject
switch ($_.SideIndicator) {
'=>' { $ht['Test'] += #($value) }
'<=' { $ht['Base'] += #($value) }
}
}
then transpose the hashtable:
$cnt = $ht.Values |
ForEach-Object { $_.Count } |
Sort-Object |
Select-Object -Last 1
$keys = $ht.Keys | Sort-Object
0..($cnt-1) | ForEach-Object {
$props = [ordered]#{}
foreach ($key in $keys) {
$props[$key] = $ht[$key][$_]
}
New-Object -Type PSObject -Property $props
} | Format-Table -AutoSize
To include the item count in the header name change $props[$key] to $props["$key($($ht[$key].Count))"].

One element containing hashes instead of multiple elements - how to fix?

I am trying to parse robocopy log files to get file size, path, and date modified. I am getting the information via regex with no issues. However, for some reason, I am getting an array with a single element, and that element contains 3 hashes. My terminology might be off; I am still learning about hashes. What I want is a regular array with multple elements.
Output that I am getting:
FileSize FilePath DateTime
-------- -------- --------
{23040, 36864, 27136, 24064...} {\\server1\folder\Test File R... {2006/03/15 21:08:01, 2010/12...
As you can see, there is only one row, but that row contains multiple items. I want multiple rows.
Here is my code:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent = New-Object System.Collections.Generic.List[PSCustomObject]
Get-Content $Path\$InFile -ReadCount $Batch | ForEach-Object {
$FileSize = $_ -match $Match_Regex -replace $Replace_Regex,('$1').Trim()
$DateTime = $_ -match $Match_Regex -replace $Replace_Regex,('$2').Trim()
$FilePath = $_ -match $Match_Regex -replace $Replace_Regex,('$3').Trim()
$Props = #{
FileSize = $FileSize;
DateTime = $DateTime;
FilePath = $FilePath
}
$Obj = [PSCustomObject]$Props
$MainContent.Add($Obj)
}
$MainContent | % {
$_
}
What am I doing wrong? I am just not getting it. Thanks.
Note: This needs to be as fast as possible because I have to process millions of lines, which is why I am trying System.Collections.Generic.List.
I think the problem is that for what you're doing you actually need two foreach-object loops. Using Get-Content with -Readcount is going to give you an array of arrays. Use the -Match in the first Foreach-Object to filter out the records that match in each array. That's going to give you an array of the matched records. Then you need to foreach through that array to create one object for each record:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent =
Get-Content $Path\$InFile -ReadCount $Batch |
ForEach-Object {
$_ -match $Match_Regex |
ForEach-Object {
$FileSize = $_ -replace $Replace_Regex,('$1').Trim()
$DateTime = $_ -replace $Replace_Regex,('$2').Trim()
$FilePath = $_ -replace $Replace_Regex,('$3').Trim()
[PSCustomObject]#{
FileSize = $FileSize
DateTime = $DateTime
FilePath = $FilePath
}
}
}
You don't really need to use the collection as an accumulator, just output PSCustomObjects, and let them accumulate in the result variable.

How to remove item from an array in PowerShell?

I'm using Powershell 1.0 to remove an item from an Array. Here's my script:
param (
[string]$backupDir = $(throw "Please supply the directory to housekeep"),
[int]$maxAge = 30,
[switch]$NoRecurse,
[switch]$KeepDirectories
)
$days = $maxAge * -1
# do not delete directories with these values in the path
$exclusionList = Get-Content HousekeepBackupsExclusions.txt
if ($NoRecurse)
{
$filesToDelete = Get-ChildItem $backupDir | where-object {$_.PsIsContainer -ne $true -and $_.LastWriteTime -lt $(Get-Date).AddDays($days)}
}
else
{
$filesToDelete = Get-ChildItem $backupDir -Recurse | where-object {$_.PsIsContainer -ne $true -and $_.LastWriteTime -lt $(Get-Date).AddDays($days)}
}
foreach ($file in $filesToDelete)
{
# remove the file from the deleted list if it's an exclusion
foreach ($exclusion in $exclusionList)
{
"Testing to see if $exclusion is in " + $file.FullName
if ($file.FullName.Contains($exclusion)) {$filesToDelete.Remove($file); "FOUND ONE!"}
}
}
I realize that Get-ChildItem in powershell returns a System.Array type. I therefore get this error when trying to use the Remove method:
Method invocation failed because [System.Object[]] doesn't contain a method named 'Remove'.
What I'd like to do is convert $filesToDelete to an ArrayList and then remove items using ArrayList.Remove. Is this a good idea or should I directly manipulate $filesToDelete as a System.Array in some way?
Thanks
The best way to do this is to use Where-Object to perform the filtering and use the returned array.
You can also use #splat to pass multiple parameters to a command (new in V2). If you cannot upgrade (and you should if at all possible, then just collect the output from Get-ChildItems (only repeating that one CmdLet) and do all the filtering in common code).
The working part of your script becomes:
$moreArgs = #{}
if (-not $NoRecurse) {
$moreArgs["Recurse"] = $true
}
$filesToDelete = Get-ChildItem $BackupDir #moreArgs |
where-object {-not $_.PsIsContainer -and
$_.LastWriteTime -lt $(Get-Date).AddDays($days) -and
-not $_.FullName.Contains($exclusion)}
In PSH arrays are immutable, you cannot modify them, but it very easy to create a new one (operators like += on arrays actually create a new array and return that).
I agree with Richard, that Where-Object should be used here. However, it's harder to read.
What I would propose:
# get $filesToDelete and #exclusionList. In V2 use splatting as proposed by Richard.
$res = $filesToDelete | % {
$file = $_
$isExcluded = ($exclusionList | % { $file.FullName.Contains($_) } )
if (!$isExcluded) {
$file
}
}
#the files are in $res
Also note that generally it is not possible to iterate over a collection and change it. You would get an exception.
$a = New-Object System.Collections.ArrayList
$a.AddRange((1,2,3))
foreach($item in $a) { $a.Add($item*$item) }
An error occurred while enumerating through a collection:
At line:1 char:8
+ foreach <<<< ($item in $a) { $a.Add($item*$item) }
+ CategoryInfo : InvalidOperation: (System.Collecti...numeratorSimple:ArrayListEnumeratorSimple) [], RuntimeException
+ FullyQualifiedErrorId : BadEnumeration
This is ancient. But, I wrote these a while ago to add and remove from powershell lists using recursion. It leverages the ability of powershell to do multiple assignment . That is, you can do $a,$b,$c=#('a','b','c') to assign a b and c to their variables. Doing $a,$b=#('a','b','c') assigns 'a' to $a and #('b','c') to $b.
First is by item value. It'll remove the first occurrence.
function Remove-ItemFromList ($Item,[array]$List(throw"the item $item was not in the list"),[array]$chckd_list=#())
{
if ($list.length -lt 1 ) { throw "the item $item was not in the list" }
$check_item,$temp_list=$list
if ($check_item -eq $item )
{
$chckd_list+=$temp_list
return $chckd_list
}
else
{
$chckd_list+=$check_item
return (Remove-ItemFromList -item $item -chckd_list $chckd_list -list $temp_list )
}
}
This one removes by index. You can probably mess it up good by passing a value to count in the initial call.
function Remove-IndexFromList ([int]$Index,[array]$List,[array]$chckd_list=#(),[int]$count=0)
{
if (($list.length+$count-1) -lt $index )
{ throw "the index is out of range" }
$check_item,$temp_list=$list
if ($count -eq $index)
{
$chckd_list+=$temp_list
return $chckd_list
}
else
{
$chckd_list+=$check_item
return (Remove-IndexFromList -count ($count + 1) -index $index -chckd_list $chckd_list -list $temp_list )
}
}
This is a very old question, but the problem is still valid, but none of the answers fit my scenario, so I will suggest another solution.
I my case, I read in an xml configuration file and I want to remove an element from an array.
[xml]$content = get-content $file
$element = $content.PathToArray | Where-Object {$_.name -eq "ElementToRemove" }
$element.ParentNode.RemoveChild($element)
This is very simple and gets the job done.

Resources