I have a file that contains campaign names and IDs. The two fields are separated by a pipe (|), and the IDs within a row are separated by spaces. I want to find all rows in a data file (thorn, þ, delimited) that contain those IDs, and write the matching rows to a separate file per campaign name. The data file is usually 4-7 GB, sometimes larger.
campaigns.txt:
Name|NameID
FirstName|123 212 445 39
SecondName|313 939
ThirdName|219
Data ID File:
DateþIDþCode
10-22-14þ123þAbc
10-24-16þ212þPow
09-18-15þ219
So I would want 3 files created. FirstName.txt contains 2 rows. SecondName.txt contains 0 rows. ThirdName.txt contains 1 row.
I cobbled together some code from various sources and came up with this. However, I'm wondering if there's a better way than having to read through the data file multiple times. Any thoughts out there?
$campaigns = Import-Csv "campaigns.txt" -Delimiter "|"
$datafile = "5282_10-19-2016"
$encoding = [Text.Encoding]::GetEncoding('iso-8859-1')
echo "Starting.."
Get-Date -Format g
foreach ($campaign in $campaigns) {
    $campaignname = $campaign.CampaignName
    $campaignids = $campaign.CampaignID.split(" ")
    echo "Looking for $campaignname - $campaignids"
    $writer = New-Object System.IO.StreamWriter($campaignname + "_filtered.txt")
    foreach ($campaignid in $campaignids) {
        $datareader = New-Object System.IO.StreamReader($datafile, $encoding)
        while ($dataline = $datareader.ReadLine()) {
            if ($dataline -match $campaignid) {
                $data = $dataline.Split("þ")
                $writer.WriteLine('{0}|{1}|{2}|{3}|{4}|{5}|{6}|{7}', $data[0], $data[3], $data[5], $data[8], $data[12], $data[14], $data[19], $data[20])
            }
        }
    }
    $writer.Close()
}
echo "Done!"
Get-Date -Format g
Process the huge data file just once.
Look up the campaign name for each row in a hashtable built from campaigns.txt.
Assuming there are not many campaigns (say, fewer than 1000), keep one StreamWriter open per campaign.
$campaignByID = @{}
foreach ($c in (Import-Csv 'campaigns.txt' -Delimiter '|')) {
    foreach ($id in ($c.CampaignID -split ' ')) {
        $campaignByID[$id] = $c.CampaignName
    }
}
$campaignWriters = @{}
$datareader = New-Object IO.StreamReader($datafile, $encoding)
while (!$datareader.EndOfStream) {
    $data = $datareader.ReadLine().Split('þ')
    $campaignName = $campaignByID[$data[1]]
    if ($campaignName) {
        $writer = $campaignWriters[$campaignName]
        if (!$writer) {
            $writer = $campaignWriters[$campaignName] =
                New-Object IO.StreamWriter($campaignName + '_filtered.txt')
        }
        $writer.WriteLine(($data[0,3,5,8,12,14,19,20] -join '|'))
    }
}
$datareader.Close()
foreach ($writer in $campaignWriters.Values) {
    $writer.Close()
}
To display progress, use Write-Progress with a percentage computed from $datareader.BaseStream.Position / $datareader.BaseStream.Length * 100. Don't call it for every line of the data file, because that slows the processing down. Update it about once per second instead: keep a datetime variable, and whenever a second has elapsed, refresh the progress display and reset the variable.
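A minimal sketch of that throttled progress display, wrapped in a function so it can be tried on its own. The function name Read-LinesWithProgress, the activity text and the $nextUpdate variable are made up for illustration; the per-line work is reduced to counting lines, so the real filtering would go where the comment indicates. Because StreamReader reads ahead in buffered chunks, BaseStream.Position runs slightly ahead of the line being processed, which should be close enough for a progress bar.

```powershell
# Hypothetical helper: reads a file line by line and refreshes
# Write-Progress at most once per second.
function Read-LinesWithProgress {
    param(
        [string]$Path,
        [Text.Encoding]$Encoding = [Text.Encoding]::GetEncoding('iso-8859-1')
    )

    $reader = New-Object IO.StreamReader($Path, $Encoding)
    $count = 0
    try {
        $length = $reader.BaseStream.Length
        $nextUpdate = [datetime]::Now   # time at which progress is next refreshed

        while (!$reader.EndOfStream) {
            $line = $reader.ReadLine()
            $count++
            # ... split $line and write it to the matching campaign file here ...

            $now = [datetime]::Now
            if ($now -ge $nextUpdate) {
                # Position is buffered, so this is an approximation.
                $percent = [int]($reader.BaseStream.Position / $length * 100)
                Write-Progress -Activity 'Filtering data file' -Status "$percent% read" -PercentComplete $percent
                $nextUpdate = $now.AddSeconds(1)
            }
        }
    }
    finally {
        $reader.Close()
        Write-Progress -Activity 'Filtering data file' -Completed
    }

    # Return the number of lines read.
    $count
}
```

Calling Read-LinesWithProgress -Path $datafile returns the number of lines read; comparing timestamps once per line is cheap next to redrawing the progress bar on every line.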
try this ;)
$campaigns=import-csv C:\temp\campaigns.txt -Delimiter "|"
$datafile=import-csv C:\temp\5282_10-19-2016.txt -Delimiter "þ" -Encoding Default
$DirResult="C:\temp\root"
$campaigns |
    %{ foreach ($item in ($_.NameID.Split(" "))) { New-Object PSObject -Property @{ Name=$_.Name ; ValID=$item } } } |
    %{ $datafile | where id -eq $_.ValID | export-csv -Append -Delimiter "|" -Path ("$dirresult\" + $_.ValID + "_filtered.txt") -NoTypeInformation }
Related
The title may be misleading, but I don't know how else to describe this.
I have 4 CSV files with approximately 15000 rows each, looking like this:
number,"surname","forename","emailAddress","taxIdentifier"
100238963,"Smith","John","john.smith@gmail.com","xxxxxxxxxxxx"
I read in 9999 of the rows and build a JSON body that we send to a site to check every person. We get a response back for most of the users, and that response contains their "number".
I then need to find all of those persons in the first array.
This is how I do it today, but checking every person this way takes too much time. Is there a better way of doing this?
This is the code that reads the persons from the file and creates the JSON:
$Files = Get-ChildItem -Path "$Folders\\*" -Include *.csv -Force
foreach ($File in $Files){
    $fname = $file
    $fname = (Split-Path $File.name -leaf).ToString().Replace(".csv", "")
    $Savefile = $fname+ "_Cleaned.csv"
    $users = Import-Csv $File
    $body = "{`"requestId`": `"144x25`",`"items`": ["
    $batchSize = 9999
    $batchNum = 0
    $row = 0
    while ($row -lt $users.Count) {
        $test = $users[$row..($row + $batchSize - 1)]
        foreach ($user in $test) {
            $nr = $user.number
            $tax = $user.taxIdentifier
            $body += "{`"itemId`": `"$nr`",`"subjectId`": `"$tax`"},"
        }
And this is the code that deals with the response:
$Result = @()
foreach ($1 in $response.allowedItemIds)
{
    foreach ($2 in $Users){
        If ($2.number -like $1)
        {
            $Result += [pscustomobject]@{
                number = $2.number
                Surname = $2.surname
                Forename = $2.forename
                Email = $2.emailaddress
                Taxidendifier = $2.taxIdentifier
            }
        }
    }
}
$Result | Export-Csv -path "$folders\$savefile" -NoTypeInformation -Append
$row += $batchSize
$batchNum++
Hope someone has any ideas
Cheers
I think you can just do this:
# read the original data file
$originalCsv = @"
number,"surname","forename","emailAddress","taxIdentifier"
1000,"Smith","Mel","mel.smith@example.org","xxxxxxxxxxxx"
3000,"Wilde","Kim","kim.wilde@example.org","xxxxxxxxxxxx"
2000,"Jones","Gryff Rhys","gryff.jones@example.org","xxxxxxxxxxxx"
"@
$originalData = $originalCsv | ConvertFrom-Csv
# get a response from the api
$responseJson = @"
{
"requestId": "144x25",
"responseId": "2efb8b47-d693-46ac-96b1-a31288567cf3",
"allowedItemIds": [ 1000, 2000 ]
}
"@
$responseData = $responseJson | ConvertFrom-Json
# filter original data for matches to the response
$matches = $originalData | where-object { $_.number -in $responseData.allowedItemIds }
# number surname forename   emailAddress            taxIdentifier
# ------ ------- --------   ------------            -------------
# 1000   Smith   Mel        mel.smith@example.org   xxxxxxxxxxxx
# 2000   Jones   Gryff Rhys gryff.jones@example.org xxxxxxxxxxxx
# write the data out
$matches | Export-Csv -Path ".\myfile.csv" -NoTypeInformation -Append
I don't know if that will perform better than your example, but it should, as it doesn't have a nested script loop that runs (original row count * response row count) times.
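If the filter is still too slow, one option (a sketch, not something from the answer above) is to put the allowed IDs into a HashSet first. The -in operator scans the ID array for every row, whereas HashSet.Contains is a constant-time lookup. The variable names $users, $allowedItemIds, $allowed and $result are chosen for illustration, and the sample rows stand in for the output of Import-Csv and for $response.allowedItemIds.

```powershell
# Sample rows standing in for Import-Csv output (same columns as the question).
$users = @'
number,"surname","forename","emailAddress","taxIdentifier"
1000,"Smith","Mel","mel.smith@example.org","xxxxxxxxxxxx"
3000,"Wilde","Kim","kim.wilde@example.org","xxxxxxxxxxxx"
2000,"Jones","Gryff Rhys","gryff.jones@example.org","xxxxxxxxxxxx"
'@ | ConvertFrom-Csv

# Stand-in for $response.allowedItemIds.
$allowedItemIds = 1000, 2000

# Build the set once. CSV fields are strings, so store the IDs as strings too.
$allowed = New-Object 'System.Collections.Generic.HashSet[string]'
foreach ($id in $allowedItemIds) {
    [void]$allowed.Add([string]$id)
}

# Single pass over the rows; each membership test is a hash lookup.
$result = foreach ($user in $users) {
    if ($allowed.Contains($user.number)) { $user }
}
```

Here $result holds the Smith and Jones rows in file order and can be piped to Export-Csv as before. Collecting the loop output in a variable also avoids growing an array with += on every match.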
I am working with two CSV files. One holds the name of users and the other one holds their corresponding email address. What I want to do is to combine them both so that users is column 1 and email is column 2 and output it to one file. So far, I've managed to add a second column from the email csv file to the user csv file, but with blank row data. Below is the code that I am using:
$emailCol = import-csv "C:\files\temp\emailOnly.csv" | Select-Object -skip 1
$emailArr = @{}
$i = 0
$nameCol = import-csv "C:\files\temp\nameOnly.csv"
foreach ($item in $emailCol){
    $nameCol | Select *, @{
        Name = "email"; Expression = { $emailArr[$i] }
    } | Export-Csv -path C:\files\temp\revised.csv -NoTypeInformation
}
Updated: Below is what worked for me. Thanks BenH!
function combineData {
    # This function will combine the user CSV file and
    # email CSV file into a single file
    $emailCol = Get-Content "C:\files\temp\emailOnly.csv" |
        Select-Object -skip 1
    $nameCol = Get-Content "C:\files\temp\nameOnly.csv" |
        Select-Object -skip 1
    # Max function to find the larger count of the two
    # csvs to use as the boundary for the counter.
    $count = [math]::Max($emailCol.count,$nameCol.count)
    $CombinedArray = for ($i = 0; $i -lt $count; $i++) {
        [PSCustomObject]@{
            fullName = $nameCol[$i]
            email = $emailCol[$i]
        }
    }
    $CombinedArray | Export-Csv C:\files\temp\revised.csv -NoTypeInformation
}
To head off further questions on this topic, here is an alternative approach. If both CSV files have the same number of lines, and each line of the first file corresponds to the same line of the second file, you can do the following. For example, users.csv:
User
Name1
Name2
Name3
Name4
Name5
and email.csv:
Email
mail1@gmail.com
mail2@gmail.com
mail3@gmail.com

mail5@gmail.com
Desired output:
"User","Email"
"Name1","mail1@gmail.com"
"Name2","mail2@gmail.com"
"Name3","mail3@gmail.com"
"Name4",
"Name5","mail5@gmail.com"
How do we get it?
$c1 = 'C:\path\to\user.csv'
$c2 = 'C:\path\to\email.csv'
[Linq.Enumerable]::Zip(
(Get-Content $c1), (Get-Content $c2),[Func[Object, Object, Object[]]]{$args -join ','}
) | ConvertFrom-Csv | Export-Csv C:\path\to\output.csv
If the desired output is:
"User","Email"
"Name1","mail1@gmail.com"
"Name2","mail2@gmail.com"
"Name3","mail3@gmail.com"
"Name5","mail5@gmail.com"
then:
$c1 = 'C:\path\to\user.csv'
$c2 = 'C:\path\to\email.csv'
([Linq.Enumerable]::Zip(
(Get-Content $c1), (Get-Content $c2),[Func[Object, Object, Object[]]]{$args -join ','}
) | ConvertFrom-Csv).Where{$_.Email} | Export-Csv C:\path\to\output.csv
Hope this helps you in the future.
A for loop would be better suited for your loop. Then use the counter as the index for each of the arrays to build your new object.
$emailCol = Get-Content "C:\files\temp\emailOnly.csv" | Select-Object -Skip 1
$nameCol = Get-Content "C:\files\temp\nameOnly.csv" | Select-Object -Skip 1
# Max function to find the larger count of the two csvs to use as the boundary for the counter.
$count = [math]::Max($emailCol.count,$nameCol.count)
$CombinedArray = for ($i = 0; $i -lt $count; $i++) {
    [PSCustomObject]@{
        Name = $nameCol[$i]
        Email = $emailCol[$i]
    }
}
$CombinedArray | Export-Csv C:\files\temp\revised.csv -NoTypeInformation
Answer edited to use Get-Content, so that blank lines are handled, with a Select-Object -Skip 1 added to skip the header line.
I have imported a .csv file and listed its first column in a combobox on my form. I am trying to match the item selected in the combobox with the corresponding row. For example:
Office,Server
Chicago,chicago1
New York, newyork1
Los Angeles, la1
When they select the $office, I'd like to create the next object, the $server, and reference it somewhere else.
$Offices = @(Import-CSV "C:\source\PrinterTable.csv")
$Array = $Offices.office | Sort-Object
ForEach ($Choice in $Array) {
    [void] $objListBox.Items.Add($Choice)
}
$handler_Office_Click=
{
    $officeSelected = $objListBox.SelectedItem
    $row = $officeSelected | where { $_.office -eq $officeSelected }
    $server = $row.server
    explorer.exe \\$server
}
I've been googling for hours... please help!
When you have a selected office name, find a row in $offices that has a matching office field. Then select server field from this row.
$row = $offices | where { $_.office -eq $office }
$server = $row.server
Run a foreach loop over the rows from the CSV until you find the $office you want.
Foreach ($line in $offices) {
    If ($line.office -eq $office) {
        $server = $line.server
    }
}
So I figured out the fix: I had to reference my imported CSV inside the $handler click block. The final code is:
$handler_Office_Click=
{
    $officeSelected = $objListBox.SelectedItem
    $OffServer= ($PrinterTables | where {$_.office -eq $officeSelected}).server
    explorer.exe \\$OffServer
}
$PrinterTables = @(Import-CSV "C:\Program Files (x86)\Helpdesk 2.0\PrinterTables.csv")
$ListedOffices = $PrinterTables.office | Sort-Object
ForEach ($Choice in $ListedOffices) {
    [void] $objListBox.Items.Add($Choice)
}
Below is only an example. I have seen a lot of scripts that break a .CSV file down into smaller files, but I am struggling with this one.
How can we, with PowerShell, find the header line indicated by ALPH, take each subsequent line up to and including the ALPT line, and put that text into another file?
The operation needs to run through the whole file, and the number of ALPD or ALPC lines will vary.
ALPH can be considered a header, but the information it contains is needed, as some field values can differ. The only constants are ALPH and ALPT.
ALPH;8102014
ALPC;PK
ALPD;50
ALPD;40
ALPT;5
ALPH;15102014
ALPC;PK
ALPD;50
ALPD;50
ALPD;70
ALPD;70
ALPD;71
ALPD;72
ALPD;40
ALPT;6
ALPH;15102014
ALPC;PK
ALPD;50
ALPD;50
ALPD;40
ALPT;6
If I understood your question correctly, something like this should work:
$csv = 'C:\path\to\your.csv'
$pattern = 'ALPH[\s\S]*?ALPT.*'
$cnt = 0
[IO.File]::ReadAllText($csv) | Select-String $pattern -AllMatches |
    select -Expand Matches | select -Expand Groups |
    % {
        $cnt++
        $outfile = Join-Path (Split-Path $csv -Parent) "split${cnt}.csv"
        [IO.File]::WriteAllText($outfile, $_.Value)
    }
Here is a way using switch. Assuming your original file is C:\temp\ALPH.CSV, this is how I would find the beginning and the end of each block.
$n = 1
switch -File 'C:\temp\ALPH.CSV' -Regex
{
    '^ALPH.*' {
        Write-Host "Begin $n"
    }
    '^ALPT.*' {
        Write-Host "End $n"
        $n++
    }
}
Now saving lines to a variable and exporting files:
$n = 1
$csvTmp = @()
switch -File 'C:\temp\ALPH.CSV' -Regex
{
    '^ALPH.*' {
        Write-Host "Begin $n"
        $csvTmp += $_
    }
    '^ALPT.*' {
        Write-Host "End $n"
        $csvTmp += $_
        $csvTmp | Set-Content "c:\temp\file$n.csv"
        $csvTmp = @()
        $n++
    }
    default {
        $csvTmp += $_
    }
}
I am trying to parse robocopy log files to get file size, path, and date modified. I am getting the information via regex with no issues. However, for some reason, I am getting an array with a single element, and that element contains 3 hashes. My terminology might be off; I am still learning about hashes. What I want is a regular array with multiple elements.
Output that I am getting:
FileSize FilePath DateTime
-------- -------- --------
{23040, 36864, 27136, 24064...} {\\server1\folder\Test File R... {2006/03/15 21:08:01, 2010/12...
As you can see, there is only one row, but that row contains multiple items. I want multiple rows.
Here is my code:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent = New-Object System.Collections.Generic.List[PSCustomObject]
Get-Content $Path\$InFile -ReadCount $Batch | ForEach-Object {
    $FileSize = $_ -match $Match_Regex -replace $Replace_Regex,('$1').Trim()
    $DateTime = $_ -match $Match_Regex -replace $Replace_Regex,('$2').Trim()
    $FilePath = $_ -match $Match_Regex -replace $Replace_Regex,('$3').Trim()
    $Props = @{
        FileSize = $FileSize;
        DateTime = $DateTime;
        FilePath = $FilePath
    }
    $Obj = [PSCustomObject]$Props
    $MainContent.Add($Obj)
}
$MainContent | % {
    $_
}
What am I doing wrong? I am just not getting it. Thanks.
Note: This needs to be as fast as possible because I have to process millions of lines, which is why I am trying System.Collections.Generic.List.
I think the problem is that for what you're doing you actually need two foreach-object loops. Using Get-Content with -Readcount is going to give you an array of arrays. Use the -Match in the first Foreach-Object to filter out the records that match in each array. That's going to give you an array of the matched records. Then you need to foreach through that array to create one object for each record:
[regex]$Match_Regex = "^.{13}\s\d{4}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}\s.*$"
[regex]$Replace_Regex = "^\s*([\d\.]*\s{0,1}\w{0,1})\s(\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2})\s(.*)$"
$MainContent =
    Get-Content $Path\$InFile -ReadCount $Batch |
    ForEach-Object {
        $_ -match $Match_Regex |
        ForEach-Object {
            $FileSize = $_ -replace $Replace_Regex,('$1').Trim()
            $DateTime = $_ -replace $Replace_Regex,('$2').Trim()
            $FilePath = $_ -replace $Replace_Regex,('$3').Trim()
            [PSCustomObject]@{
                FileSize = $FileSize
                DateTime = $DateTime
                FilePath = $FilePath
            }
        }
    }
You don't really need to use the collection as an accumulator, just output PSCustomObjects, and let them accumulate in the result variable.