Maybe the title is wrong, but I don't know how else to explain this.
I have 4 CSV files with approximately 15,000 rows in each, looking like this:
number,"surname","forename","emailAddress","taxIdentifier"
100238963,"Smith","John","john.smith#gmail.com","xxxxxxxxxxxx"
I'm reading in 9999 of the rows at a time and creating a JSON file we use on a site to check every person. We then get a response back for most of the users, and that response contains each person's "number".
Then I need to find all those people in the first array.
I've done it like this today, but it takes too much time to check every person this way. Is there a better way of doing this?
This is the code that reads the people from the file and creates the JSON body:
$Files = Get-ChildItem -Path "$Folders\*" -Include *.csv -Force
foreach ($File in $Files) {
    $fname = (Split-Path $File.Name -Leaf).ToString().Replace(".csv", "")
    $Savefile = $fname + "_Cleaned.csv"
    $users = Import-Csv $File
    $body = "{`"requestId`": `"144x25`",`"items`": ["
    $batchSize = 9999
    $batchNum = 0
    $row = 0
    while ($row -lt $users.Count) {
        $test = $users[$row..($row + $batchSize - 1)]
        foreach ($user in $test) {
            $nr = $user.number
            $tax = $user.taxIdentifier
            $body += "{`"itemId`": `"$nr`",`"subjectId`": `"$tax`"},"
        }
And then this is the code that deals with the response:
$Result = @()
foreach ($1 in $response.allowedItemIds)
{
    foreach ($2 in $Users)
    {
        if ($2.number -like $1)
        {
            $Result += [pscustomobject]@{
                number        = $2.number
                Surname       = $2.surname
                Forename      = $2.forename
                Email         = $2.emailAddress
                TaxIdentifier = $2.taxIdentifier
            }
        }
    }
}
$Result | Export-Csv -Path "$Folders\$Savefile" -NoTypeInformation -Append
$row += $batchSize
$batchNum++
Hope someone has some ideas.
Cheers
I think you can just do this:
# read the original data file
$originalCsv = @"
number,"surname","forename","emailAddress","taxIdentifier"
1000,"Smith","Mel","mel.smith@example.org","xxxxxxxxxxxx"
3000,"Wilde","Kim","kim.wilde@example.org","xxxxxxxxxxxx"
2000,"Jones","Gryff Rhys","gryff.jones@example.org","xxxxxxxxxxxx"
"@
$originalData = $originalCsv | ConvertFrom-Csv
# get a response from the api
$responseJson = @"
{
    "requestId": "144x25",
    "responseId": "2efb8b47-d693-46ac-96b1-a31288567cf3",
    "allowedItemIds": [ 1000, 2000 ]
}
"@
$responseData = $responseJson | ConvertFrom-Json
# filter original data for matches to the response
# (renamed from $matches, which collides with PowerShell's automatic variable)
$matched = $originalData | Where-Object { $_.number -in $responseData.allowedItemIds }

# number surname forename   emailAddress            taxIdentifier
# ------ ------- --------   ------------            -------------
# 1000   Smith   Mel        mel.smith@example.org   xxxxxxxxxxxx
# 2000   Jones   Gryff Rhys gryff.jones@example.org xxxxxxxxxxxx

# write the data out
$matched | Export-Csv -Path ".\myfile.csv" -NoTypeInformation -Append
I don't know if that will perform better than your example, but it should, as it doesn't have a nested loop that runs (original row count) × (response row count) times.
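If the lookup still feels slow at ~15,000 rows per file, a hashtable index is the usual next step. This is just a sketch built on the same $originalData and $responseData as above:
# index the allowed ids once for constant-time membership tests
# (keys are stored as strings to match the imported CSV's "number" field)
$allowed = @{}
foreach ($id in $responseData.allowedItemIds) {
    $allowed["$id"] = $true
}
# single pass over the original rows
$matched = foreach ($row in $originalData) {
    if ($allowed.ContainsKey($row.number)) { $row }
}
$matched | Export-Csv -Path ".\myfile.csv" -NoTypeInformation -Append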
I have the following code:
function realtest
{
    $files = Get-ChildItem -Path 'D:\data\' -Filter *.csv
    $tester = [PSCustomObject]@{}
    foreach ($file in $files)
    {
        $tempName = $file.BaseName
        $temp = Import-Csv $file
        $tester | Add-Member -MemberType NoteProperty -Name $tempName -Value $temp.$tempName
    }
    $tester
    $tester | Export-Csv "D:\result.csv" -NoTypeInformation
}
I am trying to export a bunch of data to CSV; however, when the data is written to the CSV it is shown as below:
"E0798T102","E0798T103"
"System.Object[]","System.Object[]"
but when I print it to the console it displays as below:
E0798T102 E0798T103
--------- ---------
{0, 0, 0, 0...} {0, 0, 0, 0...}
Ultimately, I want E0798T102 and E0798T103 as separate columns in result.csv.
Just to note, I will have 50 CSVs to loop through, and each should display as its own column.
Here is an incredibly inefficient answer to your question. As written, it assumes each CSV file already has a header matching that file's basename:
$CSVs = Get-ChildItem -Path 'D:\data\' -Filter "*.csv" -File
$headers = $CSVs.BaseName
$table = [System.Data.DataTable]::new("Files")

foreach ($header in $headers) {
    $table.Columns.Add($header) | Out-Null
}

foreach ($CSV in $CSVs) {
    #$contents = Import-Csv $CSV -Header $CSV.BaseName # If CSV has no header
    $contents = Import-Csv $CSV # If CSV contains header
    $rowNumber = 0
    foreach ($line in $contents) {
        $rowCount = $table.Rows.Count
        if ($rowNumber -ge $rowCount) {
            $row = $table.NewRow()
            $row[$CSV.BaseName] = $line.$($CSV.BaseName)
            $table.Rows.Add($row)
        }
        else {
            $row = $table.Rows[$rowNumber]
            $row[$CSV.BaseName] = $line.$($CSV.BaseName)
        }
        $rowNumber++
    }
}
$table | Export-Csv output.csv -NoTypeInformation
You can uncomment the first $contents line if your CSV files do not have a header; just comment out the second $contents assignment in that case.
Based on your snippet, this can be significantly simplified:
function Get-Csv {
    $col = foreach ($file in Get-ChildItem -Path D:\data -Filter *.csv) {
        $csv = Import-Csv -Path $file.FullName
        [pscustomobject]@{
            $file.BaseName = $csv.($file.BaseName)
        }
    }
    $col | Export-Csv D:\result.csv -NoTypeInformation
    return $col
}
However, a CSV file seems like the wrong approach, because you're trying to embed objects under a header. That doesn't really work in a tabular format, as you only get one layer of depth. You should either expand all the properties on your objects, or use a different format that can represent depth, like JSON.
The reason for your formatting woes is how serialization works: you're getting a string representation of your objects.
Converting to JSON isn't difficult; you just swap out your Export-Csv call:
$col | ConvertTo-Json -Depth 100 | Out-File -FilePath D:\result.json
Note: I specify -Depth 100 because the cmdlet's default Depth is 2.
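If you need the objects back in PowerShell later, the round trip is equally short (using the same hypothetical D:\result.json path):
# read the json back into objects
$col = Get-Content -Path D:\result.json -Raw | ConvertFrom-Json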
I am working with two CSV files. One holds the names of users and the other holds their corresponding email addresses. What I want to do is combine them so that user is column 1 and email is column 2, and output to one file. So far, I've managed to add a second column from the email CSV file to the user CSV file, but with blank row data. Below is the code I am using:
$emailCol = Import-Csv "C:\files\temp\emailOnly.csv" | Select-Object -Skip 1
$emailArr = @{}
$i = 0
$nameCol = Import-Csv "C:\files\temp\nameOnly.csv"
foreach ($item in $emailCol) {
    $nameCol | Select-Object *, @{
        Name = "email"; Expression = { $emailArr[$i] }
    } | Export-Csv -Path C:\files\temp\revised.csv -NoTypeInformation
}
Updated: Below is what worked for me. Thanks BenH!
function combineData {
    # This function combines the user CSV file and
    # email CSV file into a single file
    $emailCol = Get-Content "C:\files\temp\emailOnly.csv" |
        Select-Object -Skip 1
    $nameCol = Get-Content "C:\files\temp\nameOnly.csv" |
        Select-Object -Skip 1
    # Max function to find the larger count of the two
    # csvs to use as the boundary for the counter.
    $count = [math]::Max($emailCol.Count, $nameCol.Count)
    $CombinedArray = for ($i = 0; $i -lt $count; $i++) {
        [PSCustomObject]@{
            fullName = $nameCol[$i]
            email    = $emailCol[$i]
        }
    }
    $CombinedArray | Export-Csv C:\files\temp\revised.csv -NoTypeInformation
}
To forestall some follow-up questions on this theme, let me show you an alternative approach. If both CSV files have the same number of lines, and each line of the first file corresponds to the same line of the second file, then you can do the following. For example, users.csv:
User
Name1
Name2
Name3
Name4
Name5
and email.csv:
Email
mail1@gmail.com
mail2@gmail.com
mail3@gmail.com
mail5@gmail.com
Our target output:
"User","Email"
"Name1","mail1#gmail.com"
"Name2","mail2#gmail.com"
"Name3","mail3#gmail.com"
"Name4",
"Name5","mail5#gmail.com"
What do we do?
$c1 = 'C:\path\to\user.csv'
$c2 = 'C:\path\to\email.csv'
[Linq.Enumerable]::Zip(
(Get-Content $c1), (Get-Content $c2),[Func[Object, Object, Object[]]]{$args -join ','}
) | ConvertFrom-Csv | Export-Csv C:\path\to\output.csv
If our purpose is:
"User","Email"
"Name1","mail1#gmail.com"
"Name2","mail2#gmail.com"
"Name3","mail3#gmail.com"
"Name5","mail5#gmail.com"
then:
$c1 = 'C:\path\to\user.csv'
$c2 = 'C:\path\to\email.csv'
([Linq.Enumerable]::Zip(
(Get-Content $c1), (Get-Content $c2),[Func[Object, Object, Object[]]]{$args -join ','}
) | ConvertFrom-Csv).Where{$_.Email} | Export-Csv C:\path\to\output.csv
Hope this helps you in the future.
A for loop is better suited here. Then use the counter as the index into each of the arrays to build your new object.
$emailCol = Get-Content "C:\files\temp\emailOnly.csv" | Select-Object -Skip 2
$nameCol = Get-Content "C:\files\temp\nameOnly.csv" | Select-Object -Skip 1
# Max function to find the larger count of the two csvs to use as the boundary for the counter.
$count = [math]::Max($emailCol.count,$nameCol.count)
$CombinedArray = for ($i = 0; $i -lt $count; $i++) {
    [PSCustomObject]@{
        Name  = $nameCol[$i]
        Email = $emailCol[$i]
    }
}
$CombinedArray | Export-Csv C:\files\temp\revised.csv -NoTypeInformation
Answer edited to use Get-Content with an extra skip added to skip the header line in order to handle blank lines.
I have a file that contains campaign names and IDs. The two fields are separated by a pipe |. The IDs are separated by spaces. I want to find all rows in a second file (thorn þ delimited) that contain those IDs, and output those rows into a separate file per name. This file is usually 4-7 GB, sometimes larger.
campaigns.txt:
Name|NameID
FirstName|123 212 445 39
SecondName|313 939
ThirdName|219
Data ID File:
DateþIDþCode
10-22-14þ123þAbc
10-24-16þ212þPow
09-18-15þ219
So I would want 3 files created. FirstName.txt contains 2 rows. SecondName.txt contains 0 rows. ThirdName.txt contains 1 row.
I cobbled together some code from various sources and came up with this. However, I'm wondering if there's a better way than reading through the data file multiple times. Any thoughts out there?
$campaigns = Import-Csv "campaigns.txt" -Delimiter "|"
$datafile = "5282_10-19-2016"
$encoding = [Text.Encoding]::GetEncoding('iso-8859-1')
echo "Starting.."
Get-Date -Format g
foreach ($campaign in $campaigns) {
    $campaignname = $campaign.Name
    $campaignids = $campaign.NameID.Split(" ")
    echo "Looking for $campaignname - $campaignids"
    $writer = New-Object System.IO.StreamWriter($campaignname + "_filtered.txt")
    foreach ($campaignid in $campaignids) {
        $datareader = New-Object System.IO.StreamReader($datafile, $encoding)
        while ($dataline = $datareader.ReadLine()) {
            if ($dataline -match $campaignid) {
                $data = $dataline.Split("þ")
                $writer.WriteLine('{0}|{1}|{2}|{3}|{4}|{5}|{6}|{7}', $data[0], $data[3], $data[5], $data[8], $data[12], $data[14], $data[19], $data[20])
            }
        }
    }
    $writer.Close()
}
echo "Done!"
Get-Date -Format g
Process the huge datafile just once.
Pick the campaign names from a hashtable built from campaigns.txt.
Assuming there are not many campaigns (say, fewer than 1000), write to as many StreamWriters.
$campaignByID = @{}
foreach ($c in (Import-Csv 'campaigns.txt' -Delimiter '|')) {
    foreach ($id in ($c.NameID -split ' ')) {
        $campaignByID[$id] = $c.Name
    }
}

$campaignWriters = @{}
$datareader = New-Object IO.StreamReader($datafile, $encoding)
while (!$datareader.EndOfStream) {
    $data = $datareader.ReadLine().Split('þ')
    $campaignName = $campaignByID[$data[1]]
    if ($campaignName) {
        $writer = $campaignWriters[$campaignName]
        if (!$writer) {
            $writer = $campaignWriters[$campaignName] =
                New-Object IO.StreamWriter($campaignName + '_filtered.txt')
        }
        $writer.WriteLine(($data[0,3,5,8,12,14,19,20] -join '|'))
    }
}
$datareader.Close()
foreach ($writer in $campaignWriters.Values) {
    $writer.Close()
}
To display progress, use Write-Progress based on $datareader.BaseStream.Position / $datareader.BaseStream.Length * 100, but don't do it for every line of the datafile, because that will slow down the processing. Update it every second or so instead, using a datetime variable: when a second has elapsed since the last update, display the progress and reset the variable.
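Here's a minimal sketch of that throttling, slotted into the read loop above (variable names are just illustrative):
$lastProgress = [datetime]::Now
while (!$datareader.EndOfStream) {
    # ... read and dispatch the line exactly as above ...

    # refresh the progress bar at most once per second
    if (([datetime]::Now - $lastProgress).TotalSeconds -ge 1) {
        $lastProgress = [datetime]::Now
        $percent = $datareader.BaseStream.Position / $datareader.BaseStream.Length * 100
        Write-Progress -Activity 'Filtering datafile' -Status ('{0:N1} %' -f $percent) -PercentComplete $percent
    }
}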
try this ;)
$campaigns = Import-Csv C:\temp\campaigns.txt -Delimiter "|"
$datafile = Import-Csv C:\temp\5282_10-19-2016.txt -Delimiter "þ" -Encoding Default
$DirResult = "C:\temp\root"
$campaigns |
    ForEach-Object { foreach ($item in ($_.NameID.Split(" "))) { New-Object PSObject -Property @{ Name = $_.Name; ValID = $item } } } |
    ForEach-Object { $datafile | Where-Object ID -eq $_.ValID | Export-Csv -Append -Delimiter "|" -Path ("$DirResult\" + $_.Name + "_filtered.txt") -NoTypeInformation }
I have a set of strings gathered from logs that I'm trying to parse into unique entries:
function Scan ($path, $logPaths, $pattern)
{
    $logPaths | % `
    {
        $file = $_.FullName
        Write-Host "`n[$file]"
        Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
        {
            $regexDateTime = New-Object System.Text.RegularExpressions.Regex "((?:\d{4})-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}(,\d{3})?)"
            $matchDate = $regexDateTime.Match($_)
            if ($matchDate.Success)
            {
                $loglinedate = [System.DateTime]::ParseExact($matchDate, "yyyy-MM-dd HH:mm:ss,FFF", [System.Globalization.CultureInfo]::InvariantCulture)
                if ($loglinedate -gt $laterThan)
                {
                    $date = $($_.ToString().TrimStart() -split ']')[0]
                    $message = $($_.ToString().TrimStart() -split ']')[1]
                    $messageArr += ,$date,$message
                }
            }
        }
        $messageArr | sort $message -Unique | foreach { Write-Host -f Green $date$message }
    }
}
So for this input:
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:06 [20] WARN Core.Ports.Services.ReferenceDataCheckers.SharedCheckers.DocumentLibraryMustExistService - A DocumentLibrary 3 could not be found.
2015-09-04 07:50:16 [20] WARN Brighter - The message abc123 has been marked as obsolete by the consumer as the entity has a higher version on the consumer side.
Only the second and third entries should be returned.
I'm having trouble filtering out duplicates of $message: currently all entries are being returned (sort -Unique is not behaving as I would expect it to). I also need the correct $date to be returned with each filtered $message.
I'm pretty stuck with this; can anyone help?
We can do what you want, but first let's back up just a little bit to help us do this better. Right now you have an array of arrays, and that's difficult to work with in general. What would be better is an array of objects with properties such as Date and Message. Let's start there.
if ($loglinedate -gt $laterThan)
{
    $date = $($_.toString().TrimStart() -split ']')[0]
    $message = $($_.toString().TrimStart() -split ']')[1]
    $messageArr += ,$date,$message
}
is going to become...
if ($loglinedate -gt $laterThan)
{
    [Array]$messageArr += [PSCustomObject]@{
        'date'    = $($_.toString().TrimStart() -split ']')[0]
        'message' = $($_.toString().TrimStart() -split ']')[1]
    }
}
That produces an array of objects, and each object has two properties, Date and Message. That will be much easier to work with.
If you only want the latest version of any message, that's easily done with the Group-Object command, like so:
$FilteredArr = $messageArr | Group-Object Message | ForEach-Object { $_.Group | Sort-Object Date | Select-Object -Last 1 }
Then if you want to display it to screen like you are, you could do:
$FilteredArr | ForEach-Object { Write-Host -f Green ("{0}`t{1}" -f $_.Date, $_.Message) }
My take (not tested):
function Scan ($path, $logPaths, $pattern)
{
    $regex = '(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(.+)'
    $ht = @{}
    $logPaths | % `
    {
        $file = $_.FullName
        Write-Host "`n[$file]"
        Get-Content $file | Select-String -Pattern $pattern -CaseSensitive -AllMatches | % `
        {
            if ($_.Line -match $regex -and $matches[1] -gt $ht[$matches[2]])
            { $ht[$matches[2]] = $matches[1] }
        }
        $ht.GetEnumerator() |
            sort Value |
            foreach { Write-Host -f Green "$($_.Value)$($_.Name)" }
    }
}
This splits the file at the timestamp, and loads the parts into a hash table, using the error message as the key and the timestamp as the data (this will de-dupe the messages in-stream).
The timestamps are already in string-sortable format (yyyy-MM-dd HH:mm:ss), so there's really no need to cast them to [datetime] to find the latest one. Just do a straight string compare, and if the incoming timestamp is greater than an existing value for that message, replace the existing value with the new one.
When you're done, you should have a hash table with a key for each unique message found, having a value of the latest timestamp found for that message.
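A quick console check shows why the plain string compare is safe (example literals only):
# zero-padded 'yyyy-MM-dd HH:mm:ss' strings compare in chronological order
'2015-09-04 07:50:16' -gt '2015-09-04 07:50:06'   # True
'2015-12-01 00:00:00' -gt '2015-09-04 07:50:16'   # True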
I am trying to parse robocopy log files to get file size, path, and date modified. I am getting the information via regex with no issues. However, for some reason, I am getting an array with a single element, and that element contains 3 hashes. My terminology might be off; I am still learning about hashes. What I want is a regular array with multiple elements.
Output that I am getting:
FileSize FilePath DateTime
-------- -------- --------
{23040, 36864, 27136, 24064...} {\\server1\folder\Test File R... {2006/03/15 21:08:01, 2010/12...
As you can see, there is only one row, but that row contains multiple items. I want multiple rows.
Here is my code:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"

$MainContent = New-Object System.Collections.Generic.List[PSCustomObject]

Get-Content $Path\$InFile -ReadCount $Batch | ForEach-Object {
    $FileSize = $_ -match $Match_Regex -replace $Replace_Regex, ('$1').Trim()
    $DateTime = $_ -match $Match_Regex -replace $Replace_Regex, ('$2').Trim()
    $FilePath = $_ -match $Match_Regex -replace $Replace_Regex, ('$3').Trim()
    $Props = @{
        FileSize = $FileSize
        DateTime = $DateTime
        FilePath = $FilePath
    }
    $Obj = [PSCustomObject]$Props
    $MainContent.Add($Obj)
}

$MainContent | % {
    $_
}
What am I doing wrong? I am just not getting it. Thanks.
Note: This needs to be as fast as possible because I have to process millions of lines, which is why I am trying System.Collections.Generic.List.
I think the problem is that for what you're doing you actually need two ForEach-Object loops. Using Get-Content with -ReadCount is going to give you an array of arrays. Use -match in the first ForEach-Object to filter out the records that match in each array; that gives you an array of the matched records. Then you need to loop through that array to create one object for each record:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"

$MainContent =
    Get-Content $Path\$InFile -ReadCount $Batch |
    ForEach-Object {
        $_ -match $Match_Regex |
        ForEach-Object {
            $FileSize = $_ -replace $Replace_Regex, ('$1').Trim()
            $DateTime = $_ -replace $Replace_Regex, ('$2').Trim()
            $FilePath = $_ -replace $Replace_Regex, ('$3').Trim()
            [PSCustomObject]@{
                FileSize = $FileSize
                DateTime = $DateTime
                FilePath = $FilePath
            }
        }
    }
You don't really need to use the collection as an accumulator; just output the PSCustomObjects and let them accumulate in the result variable.
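For illustration only, that accumulation pattern boils down to this (toy data, not your log format):
# pipeline output is captured directly by the assignment;
# no explicit $list.Add($obj) calls are needed
$result = 1..3 | ForEach-Object {
    [PSCustomObject]@{ Value = $_; Square = $_ * $_ }
}
$result.Count   # 3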