I have a .csv with a few hundred records that I need to dissect into several different files. Part of the code takes an array of objects and filters the file based on that array. It works great for finding things equal to what's in the array, but when I try to filter based on what's not contained in the array, it ignores any version of a "not equal" operator I can find. I think it has something to do with the data type, but I can't figure out why it would make a difference when the equal operator works.
CSV File
"Number","Ticket","Title","Customer User","CustomerID","Accounted time","Billing"
"1","2014041710000096","Calendar issues","george.jetson","Widget, Inc","0.25","Labor",""
"2","2014041710000087","Redirected Folder permission","jane.smith","Mars Bars, Inc.","1","Labor",""
"3","2014041610000203","Completed with No Files Changed ""QB Data""","will.smith","Dr. Smith","0","Labor",""
PowerShell Code
$msaClients = @("Widget, Inc","Johns Company")
$billingList = import-csv "c:\billing\billed.csv"
$idZero = "0"
$msaArray = foreach ($msa in $msaClients) {$billingList | where-object {$_.CustomerID -eq $msa -and $_."Accounted time" -ne $idZero}}
$laborArray = foreach ($msa in $msaClients) {$billingList | where-object {$_.CustomerID -ne $msa -and $_."Accounted time" -ne $idZero}}
$msaArray | export-csv c:\billing\msa.csv -notypeinformation
$laborArray | export-csv c:\billing\labor.csv -notypeinformation
I have tried all the different comparison operators for "not equal" and it just seems to ignore that part. There is much more to the code if something doesn't seem right.
What am I missing? Thanks in advance for any help!
If I understand this correctly, you want $msaArray to hold the rows of $billingList whose CustomerID is present in $msaClients and whose corresponding Accounted time is not equal to $idZero (0 in this case):
PS C:\> $msaArray = ($billingList | where {(($msaclients -contains $_.customerid)) -and ($_.'accounted time' -ne $idzero)})
PS C:\> $msaArray | ft -auto
Number Ticket Title Customer User CustomerID Accounted time Billing
------ ------ ----- ------------- ---------- -------------- -------
1 2014041710000096 Calendar issues george.jetson Widget, Inc 0.25 Labor
And for $laborArray, you want the rows of $billingList whose CustomerID is not present in $msaClients and whose corresponding Accounted time is likewise not equal to $idZero (0 in this case):
PS C:\> $laborArray = ($billingList | where {(!($msaclients -contains $_.customerid)) -and ($_.'accounted time' -ne $idZero)})
PS C:\> $laborArray | ft -auto
Number Ticket Title Customer User CustomerID Accounted time Billing
------ ------ ----- ------------- ---------- -------------- -------
2 2014041710000087 Redirected Folder permission jane.smith Mars Bars, Inc. 1 Labor
Your -ne operator is working, but you are looping over $msaClients too many times when building $laborArray. That is, when $msa = "Widget, Inc", you got "Mars Bars, Inc." as output; then the foreach loop ran again, $msa changed to "Johns Company", and that pass produced "Mars Bars, Inc." and "Widget, Inc" as output too. Hence you ended up with three outputs.
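As an aside, the negated containment check can also be written with the -notcontains operator; this is just an equivalent sketch of the filter above, not a change in behavior:
$laborArray = $billingList | where {($msaClients -notcontains $_.CustomerID) -and ($_.'Accounted time' -ne $idZero)}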
Related
I have a large set of data, roughly 10 million items, that I need to process efficiently and quickly, removing duplicate items based on two of the six column headers.
I have tried grouping and sorting the items, but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSeriaID,ComputerID
$p2 = foreach ($object in $p1.group) {
$object | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12
You might want to clean up your question a little bit, because it's a little hard to read, but I'll try to answer as best I can with what I understand about what you're trying to do.
Unfortunately, with so much data there's no way to do this quickly. String comparison and sorting are done by brute force; there is no way to reduce the complexity of comparing each character in one string against another any further than checking them one at a time to see if they're the same.
(Honestly, if this were me, I'd just use Export-Csv $object and perform this operation in Excel. The time tradeoff of scripting something like this only once just wouldn't be worth it.)
By "items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You've already got the basic idea of Select-Object down; you can do that for the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download the Join-Object command from the PowerShell Gallery with Install-Script -Name Join and use it in the way described. If you want to do this step yourself, the easiest way would be to squish the two tables together and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique
Does this do it? It keeps the one it finds first.
$test | sort -u ComputerSeriaID, ComputerID
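Spelled out without the aliases, that one-liner is the following (ComputerSeriaID is kept as spelled in the question's code):
$test | Sort-Object -Unique -Property ComputerSeriaID, ComputerID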
I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of each record and add the result to a hash table.
If the concatenated text already exists in the hash, the record is not added to the array to be returned.
Function DeDupe_Array
{
param
(
$Data
)
$Return_Array = @()
$Check_Hash = @{}
Foreach($Line in $Data)
{
$Concatenated = ''
$Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
foreach($Element in $Elements)
{
$Concatenated += $line.$Element
}
If($Check_Hash.$Concatenated -ne 1)
{
$Check_Hash.add($Concatenated,1)
$Return_Array += $Line
}
}
return $Return_Array
}
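A minimal usage sketch, assuming the rows come from Import-Csv (the variable name and path below are placeholders):
$csvData = Import-Csv 'C:\data\computers.csv'   # placeholder path
$deduped = DeDupe_Array -Data $csvData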
Try the following script.
It should be about as fast as possible, since it avoids any piping in PowerShell.
$hashT = @{}
foreach ($item in $csvData) {
# Building hash table key
$key = '{0}###{1}' -f $item.ComputerSeriaID, $item.ComputerID
# if $key doesn't exist yet, OR $key exists and FirstObserved is less than the existing one in $hashT (only valid when the date is provided in a sortable / international format)
if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
$hashT[$key] = $item
}
}
$result = $hashT.Values
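If needed, the deduplicated rows could then be written back out to CSV; a minimal sketch with a placeholder output path:
$result | Export-Csv -NoTypeInformation 'C:\data\deduplicated.csv'   # placeholder path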
I am trying to locate discrepancies in BIND DNS records. I would like to output a CSV file that only has those discrepancies. I have a CSV file that has all records from all locations in BIND (ns.prvt, ns.pub, common, includes). What I'm trying to figure out is how to output a CSV that only shows the discrepancies. For 2 records to be considered a discrepancy, they must meet the following criteria:
Both records have the same RecordName and RecordType.
Both records have different Data or TTL.
Both records come from different locations.
I am almost there with the following script but it keeps showing me a couple of rows that don't necessarily meet the above criteria.
$Records = Import-Csv C:\Temp\Domain_ALL.csv | Select * | Sort Data,Location
$RecordsRev = @()
$Records | % {
$Record = $_
$Records | % {
$DataFE = $_
If (
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).RecordName -eq $DataFE.RecordName) -and
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).RecordType -eq $DataFE.RecordType) -and
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).Location -ne $DataFE.Location) -and
(([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).Data -ne $DataFE.Data) -or
([string]($Record | ? {($_.RecordName -eq $DataFE.RecordName)}).TTL -ne $DataFE.TTL))
) {
$RecordsRev += $_
}
}
}
$RecordsRev | Export-Csv C:\Temp\Domain_Discrepancies.csv -NoType
The results that I get are:
RecordName RecordType Data TTL Location
---------- ---------- ---- --- --------
domain.com TXT "MS=abc1234566" 600 Includes
domain.com TXT "MS=abc1234566" 600 Common
domain.com TXT "site-verification=abcd1234" 600 Includes
domain.com TXT "site-verification=abcd1234" 600 Common
www CNAME somedomain.com.test. 600 Includes
www CNAME somedomain.com. 600 Common
The results that I expect are:
RecordName RecordType Data TTL Location
---------- ---------- ---- --- --------
www CNAME somedomain.com.test. 600 Includes
www CNAME somedomain.com. 600 Common
How do I delete all duplicated rows in the array? This is different from "Select * -unique" as I don't want to keep any row that contains the duplicated information.
EDIT: I think the main problem is that, since the script checks each record against every other record in the CSV, almost every record technically counts as a discrepancy. For example, in the below table, record 1 meets the criteria to be a discrepancy because it differs from record 4. However, since record 1 is the same as record 2, it should actually be omitted from the results.
RecordNumber RecordName RecordType Data TTL Location
------------ ---------- ---------- ---- --- --------
1 domain.com TXT "MS=abc1234566" 600 Includes
2 domain.com TXT "MS=abc1234566" 600 Common
3 domain.com TXT "site-verification=abcd1234" 600 Includes
4 domain.com TXT "site-verification=abcd1234" 600 Common
5 www CNAME somedomain.com.test. 600 Includes
6 www CNAME somedomain.com. 600 Common
Any help would be greatly appreciated.
Kyle
I was able to figure this out with the help of someone who deleted their post... Here is the script that I am using now to find all records that meet ALL of the following criteria:
Both records have the same RecordName and RecordType. -AND
Both records have different Data or TTL. -AND
Both records come from different locations.
$Records = Import-Csv C:\Temp\Domain_ALL.csv | Select * | Sort Data,Location
$Discrepancies = @()
$GoodRecords = @()
$BadRecords = @()
$Records | ForEach-Object {
# for each record $_, compare it against every other record..
foreach ($R in $Records) {
# if Both records have the same RecordName and RecordType..
if (($_.RecordName -eq $R.RecordName) -and ($_.RecordType -eq $R.RecordType)) {
# and if Both records come from different locations..
if ($_.Location -ne $R.Location) {
# if Both records have the same Data and TTL then they are considered good:
if (($_.Data -eq $R.Data) -and ($_.TTL -eq $R.TTL)) {
$GoodRecords += $_
}
Else{
# if Both records have different Data or TTL then they are considered bad:
$BadRecords += $_
}
}
}
}
}
ForEach ($BadRecord in $BadRecords){
If (($GoodRecords -notcontains $BadRecord)){
$Discrepancies += $BadRecord
}
}
$Discrepancies | Select * -Unique | Sort RecordName,Location,Data | ft
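Since the original goal was a CSV containing only the discrepancies, that last line could just as well export instead of formatting to the console, reusing the output path from the question:
$Discrepancies | Select * -Unique | Sort RecordName,Location,Data | Export-Csv C:\Temp\Domain_Discrepancies.csv -NoTypeInformation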
I'm having a bit of trouble with my current project. I have two arrays; the first array contains reference values for disk size:
$RequiredDisks0 = New-Object System.Object
$RequiredDisks0 | Add-Member -Type NoteProperty -Name "DeviceID" -Value "C:"
$RequiredDisks0 | Add-Member -Type NoteProperty -Name "SizeGB" -Value "80"
The second array contains the disk information of the underlying system:
$SystemDisks = Get-WmiObject Win32_LogicalDisk |
Where {$_.DriveType -eq 3} |
select DeviceID,
#{Name="Size(GB)";Expression={[decimal]("{0:N0}" -f ($_.Size/1gb))}}
What I would like to do is check the given array against the reference array to see if any of the given disks are smaller than required. I've found out that I can compare the arrays by using
Compare-Object -ReferenceObject $RequiredDisks -DifferenceObject $SystemDisks -Property SizeGB,DeviceID
And I indeed receive the differences as follows:
SizeGB DeviceID SideIndicator
------ -------- -------------
99 C: =>
15 H: =>
100 I: =>
80 C: <=
25 H: <=
200 I: <=
Where I'm having trouble is working with the output. The result I'd like to achieve is an output stating "Disk n is smaller than required!". I know that everything with the side indicator "<=" is the required value and everything with the "=>" side indicator is the given value. I've tried a foreach statement but I am unable to process the data as needed - I need to check the given value against the required value and if it's smaller, tell me so. How can I again compare these values as required? Basically a "foreach object where SideIndicator is <= compare to object where SideIndicator is => and DeviceID equals DeviceID". How do I translate that into proper code?
It looks to me like the Compare-Object is doing a double comparison on both properties. The documentation or another StackOverflow soul may be able to help with that command.
My approach would be to translate your pseudo-code into code:
foreach ($disk in $SystemDisks){
$ref = $RequiredDisks | Where-object {$_.DeviceID -eq $disk.DeviceID}
if ([int]($disk.SizeGB) -lt [int]($ref.SizeGB)) {
Write-Output "Disk $($disk.DeviceID) is smaller than required!"
}
}
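For completeness, the question only builds $RequiredDisks0, while the loop above assumes a $RequiredDisks collection; one way to assemble it, sketched here using the reference values visible in the Compare-Object output above:
$RequiredDisks1 = New-Object System.Object
$RequiredDisks1 | Add-Member -Type NoteProperty -Name "DeviceID" -Value "H:"
$RequiredDisks1 | Add-Member -Type NoteProperty -Name "SizeGB" -Value "25"
$RequiredDisks2 = New-Object System.Object
$RequiredDisks2 | Add-Member -Type NoteProperty -Name "DeviceID" -Value "I:"
$RequiredDisks2 | Add-Member -Type NoteProperty -Name "SizeGB" -Value "200"
$RequiredDisks = @($RequiredDisks0, $RequiredDisks1, $RequiredDisks2)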
I've got a PowerShell script that uses Get-EventLog to search for events 6005, 6006, and 6008 on remote servers; after a little manipulation, it returns the results in an array.
$eventData += Get-EventLog -computerName $server_name -LogName system `
-After $prevMonthBegin -Before $prevMonthEnd |
Where-Object `
{ $_.eventid -eq 6005 -OR $_.eventID -eq 6006 -OR $_.eventID -eq 6008 } |
Select timegenerated, eventid, message, index, recordid | sort-object timegenerated
I loop through the results, and assign either down or up to the $eventData.message field and sort it again by timegenerated. This returns an array like so (apologies for the formatting):
$eventData | Sort-Object timegenerated | format-table
TimeGenerated EventID Message Index
------------- ------- ------- -----
8/3/2014 5:30:02 AM 6006 down 0
8/3/2014 5:30:47 AM 6005 up 0
8/24/2014 5:31:00 AM 6005 up 0
8/31/2014 2:34:59 AM 6008 down 1
8/31/2014 5:30:04 AM 6006 down 0
8/31/2014 5:30:59 AM 6005 up 0
8/31/2014 5:36:26 AM 6005 up 0
How can I create a new array from these results and arrange the TimeGenerated values into up/down pairs? I would prefer to disregard the first event of the array if it is an up (so there's always down then up in the pairs), but I can work around this. It's more important to never have two consecutive down or up events, as in the example output above.
I was thinking of maybe iterating through the events and toggling a temp variable between 1 and 0 (or up and down) but that seems clunky and I just can't make anything work. It shouldn't be this hard, but that's why I'm asking for help. Let me know if providing more code would be useful; there's a quite a bit of it and I didn't want to just dump pages of it.
Here's an example of the format for the array that I would like to have. The method that I'm using pairs these events as 12121212 without looking at what types of events they are. So in the above example, the last event pair would calculate downtime where it did not exist because there were two up (6005) events.
Downtime Uptime CrashEvent?
8/3/2014 5:30:02 AM 8/3/2014 5:30:47 AM 0
8/24/2014 5:31:00 AM 8/31/2014 2:34:59 AM 0
8/31/2014 2:34:59 AM 8/31/2014 5:30:04 AM 1
8/31/2014 5:30:59 AM 8/31/2014 5:36:26 AM 0
OK, this should select the last of any Down or Up event before the next event of the opposite type (the last Down before an Up, and the last Up before a Down), which I think is what you wanted. Basically, it matches each time the system goes down with the time it came back up, using the last uptime/downtime notification if there are multiple in a row. I think your whole 'discard the first of any sequential duplicate event type' idea is in error, but this does exactly that.
The event collection is duplicated from Noah Sparks' answer with the Select argument trimmed down a bit. The events then go into a Switch that creates a custom object whenever there is a downtime event, sets the uptime on each uptime event, and outputs the record (and starts over) whenever the next downtime event happens.
$Events = Get-EventLog -LogName system |
Where-Object { $_.eventid -eq 6005 -OR $_.eventID -eq 6006 -OR $_.eventID -eq 6008 } |
Select timegenerated, eventid | sort-object timegenerated
$record=$null
$Output = Switch($Events){
{$_.EventID -match "600(6|8)"}{if(![string]::IsNullOrEmpty($Record.up)){$record}
$Record=[pscustomobject][ordered]@{
'Down'=$_.timegenerated
'Up'=$null
'Expected'=If($_.EventID -eq 6006){$true}else{$false}
}}
{$_.EventID -eq "6005"}{$Record.Up = $_.TimeGenerated}
}
$Output += $record
$Output
Personally I'd go with the first uptime event after a downtime event happens, but that's just me. That would take some different coding. Anyway, the results of that look like this (using my own event log entries):
Down Up Expected
---- -- --------
11/20/2013 8:47:42 AM 11/20/2013 8:49:11 AM True
11/20/2013 3:50:14 PM 12/13/2013 9:14:52 AM True
12/13/2013 9:26:21 AM 12/13/2013 9:27:42 AM False
12/13/2013 3:40:07 PM 12/13/2013 3:41:31 PM True
1/9/2014 1:13:31 PM 1/16/2014 12:38:08 PM True
1/16/2014 12:39:21 PM 1/16/2014 12:48:44 PM True
If you look at the second entry there, I know my computer was not down from 11/20 to 12/13, but when discarding the first of sequential duplicates, that is how it comes across. If we didn't do that, it would show the system coming back up just one minute after going down, which is much more likely for my computer.
Here is an answer for how to do it that should skip duplicate entries. However, you will likely want to adjust the events you are querying; otherwise you end up with down/up conditions occurring at the same time.
$Events = Get-EventLog -LogName system |
Where-Object { $_.eventid -eq 6005 -OR $_.eventID -eq 6006 -OR $_.eventID -eq 6008 } |
Select timegenerated, eventid, message, index, recordid | sort-object timegenerated
[array]$FullObj = [pscustomobject] @{
'Time' = ''
'Status' = ''
'Message' = ''
}
Foreach ($Event in $Events)
{
If ($Event.Message -eq 'The Event log service was started.' -and $fullobj.status[-1] -ne 'UP')
{
[array]$FullObj += [pscustomobject] @{
'Time' = $Event.TimeGenerated
'Status' = 'UP'
'Message' = $Event.Message
}
}
Elseif ($fullobj.status[-1] -eq 'UP' -and $fullobj.Message[-1] -notlike "The previous system shutdown*")
{
[array]$FullObj += [pscustomobject] @{
'Time' = $Event.TimeGenerated
'Status' = 'DOWN'
'Message' = $Event.Message
}
}
}
$FullObj | select time,status
I have the following PowerShell script:
$grouped_TPR_Test1=Import-Csv c:\TPR.csv | group UPC -AsHashTable -AsString
Import-Csv c:\HQ.csv | foreach{
$tpr_Sales=($grouped_TPR_Test1."$($_.UPC)" | foreach {$_.TPR_Sales}) -join ","
$_ | Add-Member -MemberType NoteProperty -Name TPR_SALES -Value $tpr_Sales -PassThru
} | Export-Csv -NoTypeInformation c:\HQ_TPR_sales.csv
It finds/matches the UPC value in the file c:\TPR.csv with the same value in the file c:\HQ.csv, and outputs the corresponding sales data to a third file that includes all fields in c:\HQ.csv as well as the additional ones that match on UPC from c:\TPR.csv.
This works.
However, I am not sure how to add a second field to check ("Zone") to narrow down the results that are sent to the third output file. Both files have the Zone field as well.
I read a bit on this and an array seems better suited for multiple criteria, rather than a hashtable, but I'm not having much luck.
c:\HQ.csv looks essentially like this:
UPC ZONE column1 column2 column3
1234567890123 3 blah1 blah2 blah3
c:\TPR.csv looks essentially like this:
UPC ZONE sales
1234567890123 3 5.00
1234567890123 2 4.00
3210987654321 2 3.00
Any help is appreciated.
Thanks!
You could simply use Where-Object on the resulting file to pull out the zone(s) you are interested in:
Import-CSV c:\HQ_TPR_sales.csv | WHERE-OBJECT {$_.Zone -eq 2}
Or put the Where-Object into your pipeline before you Export-Csv:
... | ? {$_.Zone -eq 2} | Export-CSV ...
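Put together with the question's original pipeline, that second option might look like the sketch below; the zone value 2 is only an example:
$grouped_TPR_Test1 = Import-Csv c:\TPR.csv | group UPC -AsHashTable -AsString
Import-Csv c:\HQ.csv | foreach {
    $tpr_Sales = ($grouped_TPR_Test1."$($_.UPC)" | foreach {$_.TPR_Sales}) -join ","
    $_ | Add-Member -MemberType NoteProperty -Name TPR_SALES -Value $tpr_Sales -PassThru
} | Where-Object {$_.ZONE -eq 2} | Export-Csv -NoTypeInformation c:\HQ_TPR_sales.csv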