One element containing hashes instead of multiple elements - how to fix? - arrays

I am trying to parse robocopy log files to get file size, path, and date modified. I am getting the information via regex with no issues. However, for some reason, I am getting an array with a single element, and that element contains 3 hashes. My terminology might be off; I am still learning about hashes. What I want is a regular array with multiple elements.
Output that I am getting:
FileSize FilePath DateTime
-------- -------- --------
{23040, 36864, 27136, 24064...} {\\server1\folder\Test File R... {2006/03/15 21:08:01, 2010/12...
As you can see, there is only one row, but that row contains multiple items. I want multiple rows.
Here is my code:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent = New-Object System.Collections.Generic.List[PSCustomObject]
Get-Content $Path\$InFile -ReadCount $Batch | ForEach-Object {
    $FileSize = $_ -match $Match_Regex -replace $Replace_Regex,('$1').Trim()
    $DateTime = $_ -match $Match_Regex -replace $Replace_Regex,('$2').Trim()
    $FilePath = $_ -match $Match_Regex -replace $Replace_Regex,('$3').Trim()
    $Props = @{
        FileSize = $FileSize;
        DateTime = $DateTime;
        FilePath = $FilePath
    }
    $Obj = [PSCustomObject]$Props
    $MainContent.Add($Obj)
}
$MainContent | % {
    $_
}
What am I doing wrong? I am just not getting it. Thanks.
Note: This needs to be as fast as possible because I have to process millions of lines, which is why I am trying System.Collections.Generic.List.

I think the problem is that for what you're doing you actually need two ForEach-Object loops. Get-Content with -ReadCount gives you an array of arrays, so use -match in the first ForEach-Object to filter out the matching records in each batch array; that gives you an array of the matched records. Then you need to foreach through that array to create one object for each record:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent =
    Get-Content $Path\$InFile -ReadCount $Batch |
    ForEach-Object {
        # $_ is a whole batch (string[]), so -match acts as an array filter here
        $_ -match $Match_Regex |
            ForEach-Object {
                # Trim the captured value, not the '$1' pattern string
                $FileSize = ($_ -replace $Replace_Regex,'$1').Trim()
                $DateTime = ($_ -replace $Replace_Regex,'$2').Trim()
                $FilePath = ($_ -replace $Replace_Regex,'$3').Trim()
                [PSCustomObject]@{
                    FileSize = $FileSize
                    DateTime = $DateTime
                    FilePath = $FilePath
                }
            }
    }
You don't really need to use the collection as an accumulator, just output PSCustomObjects, and let them accumulate in the result variable.
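If it helps to see the batching behavior, here is a minimal sketch (demo.log is a made-up file) showing that each pipeline item from -ReadCount is a whole array of lines rather than a single line:

# Hypothetical demo: with -ReadCount 2, each $_ is a string[] batch
Set-Content -Path .\demo.log -Value 'line1','line2','line3'
Get-Content .\demo.log -ReadCount 2 | ForEach-Object {
    "batch of $($_.Count): $($_ -join ' | ')"
}
# batch of 2: line1 | line2
# batch of 1: line3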

Add Column to CSV file Using Other CSV file

I am working with two CSV files. One holds the names of users and the other holds their corresponding email addresses. What I want to do is combine them so that users are column 1 and emails are column 2, and output the result to one file. So far, I've managed to add a second column from the email CSV file to the user CSV file, but with blank row data. Below is the code that I am using:
$emailCol = Import-Csv "C:\files\temp\emailOnly.csv" | Select-Object -Skip 1
$emailArr = @{}
$i = 0
$nameCol = Import-Csv "C:\files\temp\nameOnly.csv"
foreach ($item in $emailCol){
    $nameCol | Select *, @{
        Name = "email"; Expression = { $emailArr[$i] }
    } | Export-Csv -Path C:\files\temp\revised.csv -NoTypeInformation
}
Updated: Below is what worked for me. Thanks BenH!
function combineData {
    # This function will combine the user CSV file and
    # email CSV file into a single file
    $emailCol = Get-Content "C:\files\temp\emailOnly.csv" |
        Select-Object -Skip 1
    $nameCol = Get-Content "C:\files\temp\nameOnly.csv" |
        Select-Object -Skip 1
    # Max function to find the larger count of the two
    # CSVs to use as the boundary for the counter.
    $count = [math]::Max($emailCol.Count, $nameCol.Count)
    $CombinedArray = for ($i = 0; $i -lt $count; $i++) {
        [PSCustomObject]@{
            fullName = $nameCol[$i]
            email    = $emailCol[$i]
        }
    }
    $CombinedArray | Export-Csv C:\files\temp\revised.csv -NoTypeInformation
}
To head off some additional questions on this theme, let me show an alternative approach. If both CSV files have the same number of lines and each line of the first file corresponds to the same line of the second file, you can do the following. For example, users.csv:
User
Name1
Name2
Name3
Name4
Name5
and email.csv (with a blank line standing in for the missing fourth address):
Email
mail1@gmail.com
mail2@gmail.com
mail3@gmail.com

mail5@gmail.com
Our purpose:
"User","Email"
"Name1","mail1#gmail.com"
"Name2","mail2#gmail.com"
"Name3","mail3#gmail.com"
"Name4",
"Name5","mail5#gmail.com"
What we do?
$c1 = 'C:\path\to\user.csv'
$c2 = 'C:\path\to\email.csv'
[Linq.Enumerable]::Zip(
    (Get-Content $c1), (Get-Content $c2),
    [Func[Object, Object, Object[]]]{ $args -join ',' }
) | ConvertFrom-Csv | Export-Csv C:\path\to\output.csv
If our purpose is:
"User","Email"
"Name1","mail1#gmail.com"
"Name2","mail2#gmail.com"
"Name3","mail3#gmail.com"
"Name5","mail5#gmail.com"
then:
$c1 = 'C:\path\to\user.csv'
$c2 = 'C:\path\to\email.csv'
([Linq.Enumerable]::Zip(
    (Get-Content $c1), (Get-Content $c2),
    [Func[Object, Object, Object[]]]{ $args -join ',' }
) | ConvertFrom-Csv).Where{$_.Email} | Export-Csv C:\path\to\output.csv
Hope this helps you in the future.
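One caveat to keep in mind: [Linq.Enumerable]::Zip pairs items positionally and stops at the end of the shorter sequence, which is why the line counts must match (hence the blank placeholder line in email.csv above). A tiny sketch with made-up arrays:

# 'c' has no partner, so Zip silently drops it
[Linq.Enumerable]::Zip(
    [object[]]('a','b','c'), [object[]]('1','2'),
    [Func[Object, Object, Object[]]]{ $args -join ',' }
)
# a,1
# b,2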
A for loop would be better suited for this. Then use the counter as the index into each of the arrays to build your new object:
$emailCol = Get-Content "C:\files\temp\emailOnly.csv" | Select-Object -Skip 2
$nameCol = Get-Content "C:\files\temp\nameOnly.csv" | Select-Object -Skip 1
# Max function to find the larger count of the two CSVs to use as the boundary for the counter.
$count = [math]::Max($emailCol.Count, $nameCol.Count)
$CombinedArray = for ($i = 0; $i -lt $count; $i++) {
    [PSCustomObject]@{
        Name  = $nameCol[$i]
        Email = $emailCol[$i]
    }
}
$CombinedArray | Export-Csv C:\files\temp\revised.csv -NoTypeInformation
Answer edited to use Get-Content with an extra skip added to skip the header line in order to handle blank lines.

Adding objects to an array in a hashtable

I want to create a Hashtable which groups files with the same name in arrays so I can later on work with those to list some properties of those files, like the folders where they're stored.
$ht = @{}
gci -recurse -file | % {
    try {
        $ht.Add($_.Name, @())
        $ht[$_.Name] += $_
    }
    catch {
        $ht[$_.Name] += $_
    }
}
All I'm getting is:
Index operation failed; the array index evaluated to null.
At line:8 char:13
+ $ht[$_.Name] += $_
+ ~~~~~~~~~~~~~~~~~~
I'm not sure why this isn't working, I'd appreciate any help.
Don't reinvent the wheel. If you want to group files with the same name, use the Group-Object cmdlet:
$groupedFiles = Get-ChildItem -recurse -file | Group-Object Name
Now you can easily retrieve all file names that are present at least twice using the Where-Object cmdlet:
$groupedFiles | Where-Object Count -gt 1
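From there you can pull whatever properties you need from each group; for example, a small sketch (assuming you want the folders that contain each duplicated name):

$groupedFiles | Where-Object Count -gt 1 | ForEach-Object {
    # One line per duplicated file name, listing every folder holding a copy
    '{0}: {1}' -f $_.Name, (($_.Group.DirectoryName | Sort-Object -Unique) -join '; ')
}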
You are getting this error because if your code hits the catch block, the current pipeline variable ($_) represents the last error and not the current item. You can fix that by either storing the current item in a variable, or by using the common -PipelineVariable parameter:
$ht = @{}
gci -recurse -file -PipelineVariable item | % {
    try {
        $ht.Add($item.Name, @())
        $ht[$item.Name] += $item
    }
    catch {
        $ht[$item.Name] += $item
    }
}
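Alternatively, you can sidestep the try/catch (and with it the $_ pitfall) by testing for the key first; a small sketch of that variant:

$ht = @{}
Get-ChildItem -Recurse -File | ForEach-Object {
    # No exception path, so $_ is always the current FileInfo
    if (-not $ht.ContainsKey($_.Name)) { $ht[$_.Name] = @() }
    $ht[$_.Name] += $_
}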

Iterate through Rows in SQL to Output to Text File

I have a SQL table that contains several hundred rows of data. One of the columns in this table contains text reports that were stored as plain text within the column.
Essentially, I need to iterate through each row of data in SQL and output the contents of each row's report column to its own individual text file with a unique name pulled from another column.
I am trying to accomplish this via PowerShell and I seem to be hung up. Below is what I have thus far.
foreach ($i=0; $i -le $Reports.Count; $i++)
{
    $SDIR = "C:\harassmentreports"
    $FILENAME = $Reports | Select-Object FILENAME
    $FILETEXT = $Reports | Select-Object TEXT
    $NAME = "$SDIR\$FILENAME.txt"
    if (!([System.IO.File]::Exists($NAME))) {
        Out-File $NAME | Set-Content -Path $FULLFILE -Value $FILETEXT
    }
}
Assuming that $Reports is a list of the records from your SQL query, you'll want to fix the following issues:
In an indexed loop use indexed access to the elements of your array:
$FILENAME = $Reports[$i] | Select-Object FILENAME
$FILETEXT = $Reports[$i] | Select-Object TEXT
Define variables outside the loop if their value doesn't change inside the loop (note also that PowerShell's C-style counter loop uses the for keyword, not foreach, and the condition should be -lt so you don't run one past the last index):
$SDIR = "C:\harassmentreports"
for ($i = 0; $i -lt $Reports.Count; $i++) {
    ...
}
Expand properties if you want to use their value:
$FILENAME = $Reports[$i] | Select-Object -Expand FILENAME
$FILETEXT = $Reports[$i] | Select-Object -Expand TEXT
Use Join-Path for constructing paths:
$NAME = Join-Path $SDIR "$FILENAME.txt"
Use Test-Path for checking the existence of a file or folder:
if (-not (Test-Path -LiteralPath $NAME)) {
...
}
Use either Out-File
Out-File -FilePath $NAME -InputObject $FILETEXT
or Set-Content
Set-Content -Path $NAME -Value $FILETEXT
but not both of them. The basic difference between the two cmdlets is their default encoding: the former uses Unicode (UTF-16), the latter ASCII. Both allow you to change the encoding via the -Encoding parameter.
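For example, to force UTF-8 with either cmdlet (using the variables from above):

Out-File -FilePath $NAME -InputObject $FILETEXT -Encoding UTF8
Set-Content -Path $NAME -Value $FILETEXT -Encoding UTF8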
You may also want to reconsider using a for loop in the first place. A pipeline with a ForEach-Object loop might be a better approach:
$SDIR = "C:\harassmentreports"
$Reports | ForEach-Object {
$file = Join-Path $SDIR ($_.FILENAME + '.txt')
if (-not (Test-Path $file)) { Set-Content -Path $file -Value $_.TEXT }
}

Can't access values in an array that's part of a foreach loop in powershell

I'm relatively new to PowerShell and coding and am having issues accessing the values in an array. I'm trying to loop through a set of files using foreach and count the number of messages in each file, and then have the count for each file put into an array so I can assign it to a variable. When I do write-host $data[0] it returns all the values. If I do write-host $data[1] it returns nothing. It seems like these values are all being stored as one instead of as individual numbers. How do I get each value and then assign it to a variable? Any help would be appreciated.
$FilePath = 'some file path here'
$TodaysDate = (Get-Date -Format "MM-dd-yyyy")
ForEach ($file in Get-ChildItem $FilePath -Exclude *.ps1,*.xml,*.xls | Where-Object {$_.LastWriteTime -ge $TodaysDate})
{
    $data = ,@(Get-Content $file | Where-Object {$_.Contains("MSH|")}).Count
    write-host $data[0]
}
exit
In this line:
$data = ,@(Get-Content $file | Where-Object {$_.Contains("MSH|")}).Count
you are creating an array of a single element (the count). What you want to do is add to $data each time:
$data += ,@(Get-Content $file | Where-Object {$_.Contains("MSH|")}).Count
But given your description, I think you may want a hashtable, using the filename as a key:
$data = @{} # init hashtable
ForEach ($file in Get-ChildItem $FilePath -Exclude *.ps1,*.xml,*.xls | Where-Object {$_.LastWriteTime -ge $TodaysDate})
{
    $data[$file] = @(Get-Content $file | Where-Object {$_.Contains("MSH|")}).Count
}
write-output $data
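Note that using $file as the key stores the whole FileInfo object; use $file.Name if you only want the name. Either way you can read the counts back out afterwards, for example:

# Print one line per file with its message count, largest first
$data.GetEnumerator() | Sort-Object Value -Descending | ForEach-Object {
    '{0}: {1}' -f $_.Key, $_.Value
}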

Get all unique substrings from array in Powershell

I have a set of strings gathered from logs that I'm trying to parse into unique entries:
function Scan ($path, $logPaths, $pattern)
{
    $logPaths | % `
    {
        $file = $_.FullName
        Write-Host "`n[$file]"
        Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
        {
            $regexDateTime = New-Object System.Text.RegularExpressions.Regex "((?:\d{4})-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(,\d{3})?)"
            $matchDate = $regexDateTime.match($_)
            if ($matchDate.success)
            {
                $loglinedate = [System.DateTime]::ParseExact($matchDate, "yyyy-MM-dd HH:mm:ss,FFF", [System.Globalization.CultureInfo]::InvariantCulture)
                if ($loglinedate -gt $laterThan)
                {
                    $date = $($_.toString().TrimStart() -split ']')[0]
                    $message = $($_.toString().TrimStart() -split ']')[1]
                    $messageArr += ,$date,$message
                }
            }
        }
        $messageArr | sort $message -Unique | foreach { Write-Host -f Green $date$message }
    }
}
}
So for this input:
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:16 [20] WARN Brighter - The message abc123 has been marked as obsolete by the consumer as the entity has a higher version on the consumer side.
Only the second two entries should be returned
I'm having trouble filtering out duplicates of $message: currently all entries are being returned (sort -Unique is not behaving as I would expect it to). I also need the correct $date to be returned against the filtered $message.
I'm pretty stuck with this, can anyone help?
We can do what you want, but first let's back up just a little bit to help us do this better. Right now you have an array of arrays, and that's difficult to work with in general. What would be better is an array of objects, where those objects have properties such as Date and Message. Let's start there.
if ($loglinedate -gt $laterThan)
{
    $date = $($_.toString().TrimStart() -split ']')[0]
    $message = $($_.toString().TrimStart() -split ']')[1]
    $messageArr += ,$date,$message
}
is going to become...
if ($loglinedate -gt $laterThan)
{
    [Array]$messageArr += [PSCustomObject]@{
        'date'    = $($_.toString().TrimStart() -split ']')[0]
        'message' = $($_.toString().TrimStart() -split ']')[1]
    }
}
That produces an array of objects, and each object has two properties, Date and Message. That will be much easier to work with.
If you only want the latest version of any message that's easily done with the Group-Object command as such:
$FilteredArr = $messageArr | Group Message | ForEach{$_.Group|sort Date|Select -Last 1}
Then if you want to display it to screen like you are, you could do:
$FilteredArr | ForEach{Write-Host -f Green ("{0}`t{1}" -f $_.Date, $_.Message)}
My take (not tested):
function Scan ($path, $logPaths, $pattern)
{
    $regex = '(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.+)'
    $ht = @{}
    $logPaths | % `
    {
        $file = $_.FullName
        Write-Host "`n[$file]"
        Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
        {
            # Replace the stored timestamp when the incoming one is later
            # (a missing key compares as $null, which is less than any string)
            if ($_.line -match $regex -and $ht[$matches[2]] -lt $matches[1])
            { $ht[$matches[2]] = $matches[1] }
        }
        $ht.GetEnumerator() |
            sort Value |
            foreach { Write-Host -f Green "$($_.Value)$($_.Name)" }
    }
}
This splits the file at the timestamp, and loads the parts into a hash table, using the error message as the key and the timestamp as the data (this will de-dupe the messages in-stream).
The timestamps are already in string-sortable format (yyyy-MM-dd HH:mm:ss), so there's really no need to cast them to [datetime] to find the latest one. Just do a straight string compare, and if the incoming timestamp is greater than an existing value for that message, replace the existing value with the new one.
When you're done, you should have a hash table with a key for each unique message found, having a value of the latest timestamp found for that message.
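A quick illustration of why plain string comparison is safe for these timestamps:

# yyyy-MM-dd HH:mm:ss sorts lexically in chronological order
'2015-09-04 07:50:16' -gt '2015-09-04 07:50:06'   # True  (later timestamp wins)
'2015-09-03 23:59:59' -gt '2015-09-04 00:00:00'   # False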
