I have a sample array and I am trying to create a rule that will filter out either of 2 items that are present in the array.
Sample array:
$d_c_arr = @('idata', 'quanthouse', 'reuters', 'bloomberg', 'fidelity', 'nasdaq')
The items that I want to filter out are 'idata' and 'quanthouse'.
The filtering rule should adapt to the situations where either both 'idata' and 'quanthouse' exist or only one of them is present in an array.
My attempt to define the rule:
$d_c_arr | Where-Object { $_ -notlike 'idata' -or $_ -notlike 'quanthouse' }
My expected output would be:
reuters
bloomberg
fidelity
nasdaq
However, the actual output is:
idata
quanthouse
reuters
bloomberg
fidelity
nasdaq
How to properly define the rule which would filter either of the 2 selected items in the array?
$d_c_arr | sls -pattern 'idata|quanthouse' -notmatch
Your objects are .NET strings, so you can use the Select-String cmdlet.
Select-String takes a regex as its pattern.
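The original attempt returns everything because no single string can equal both 'idata' and 'quanthouse', so at least one of the two -notlike tests is always true and the -or never filters anything out. A minimal sketch of two equivalent fixes follows; the $excluded variable is a name introduced here for illustration, and -notin assumes PowerShell 3.0 or later.

```powershell
$d_c_arr  = @('idata', 'quanthouse', 'reuters', 'bloomberg', 'fidelity', 'nasdaq')
$excluded = @('idata', 'quanthouse')   # hypothetical name, not from the question

# Fix 1: an item is kept only if BOTH tests hold, so combine them with -and.
$viaAnd = $d_c_arr | Where-Object { $_ -notlike 'idata' -and $_ -notlike 'quanthouse' }

# Fix 2: test membership in the exclusion list instead.
$viaNotIn = $d_c_arr | Where-Object { $_ -notin $excluded }

$viaNotIn   # reuters, bloomberg, fidelity, nasdaq (one per line)
```

Both variants return plain strings and compare whole items. The Select-String approach emits MatchInfo objects instead, and its regex matches substrings, so it would also drop an item such as 'idata2'.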
Is there a non-for-loop way to remove some items from an ArrayList?
$remotesumerrors = $remoteFiles | Select-String -Pattern '^[a-f0-9]{32}( )' -NotMatch
I want to remove the output of the above from the $remoteFiles var.. is there some pipe way to remove them?
Assuming all of the following:
you do need the results captured in $remotesumerrors separately
that $remoteFiles is a collection of System.IO.FileInfo instances, as output by Get-ChildItem, for instance
it is acceptable to save the result as an invariably new collection back to $remoteFiles,
you can use the .Where() array method as follows (this outperforms a pipeline-based solution based on the Where-Object cmdlet):
# Get the distinct set of the full paths of the files of origin
# from the Select-String results stored in $remotesumerrors
# as a hash set, which allows efficient lookup.
$errorFilePaths =
[System.Collections.Generic.HashSet[string]] $remotesumerrors.Path
# Get those file-info objects from $remoteFiles
# whose paths aren't in the list of the paths obtained above.
$remoteFiles = $remoteFiles.Where({ -not $errorFilePaths.Contains($_.FullName) })
As an aside:
Casting a collection to [System.Collections.Generic.HashSet[T]] is a fast and convenient way to get a set of distinct values (duplicates removed), but note that the resulting hash set's elements are invariably unordered and that, with strings, lookups are by default case-sensitive - see this answer for more information.
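As a sketch of the case-sensitivity point in the aside: a hash set built with an explicit comparer ignores case on lookup. The sample paths below are made up for illustration, and the ::new() constructor syntax assumes PowerShell 5.0 or later.

```powershell
# Hypothetical sample paths standing in for $remotesumerrors.Path.
$samplePaths = [string[]] @('C:\logs\a.md5', 'C:\logs\a.md5', 'C:\logs\b.md5')

# Plain cast: duplicates are removed, lookups use the default (case-sensitive) comparer.
$caseSensitive = [System.Collections.Generic.HashSet[string]] $samplePaths

# Explicit comparer: lookups ignore case.
$ignoreCase = [System.Collections.Generic.HashSet[string]]::new(
    $samplePaths, [System.StringComparer]::OrdinalIgnoreCase)

$caseSensitive.Count                      # 2
$caseSensitive.Contains('C:\LOGS\A.MD5')  # False
$ignoreCase.Contains('C:\LOGS\A.MD5')     # True
```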
Use the Where-Object cmdlet to filter the list:
$remoteFiles = $remoteFiles |Where-Object { $_ |Select-String -Pattern '^[a-f0-9]{32}( )' -NotMatch }
If it truly was a [collections.arraylist], you could remove an element by value. There's also .RemoveAt(), to remove by array index.
[System.Collections.ArrayList]$array = 'a','b','c','d','e'
$array.remove
OverloadDefinitions
-------------------
void Remove(System.Object obj)
void IList.Remove(System.Object value)
$array.remove('c')
$array
a
b
d
e
Let's assume that $remoteFiles is a file object of type System.IO.FileInfo. I also assume that you want to filter based on the file name.
$remotesumerrors = $remoteFiles.name | Select-String -Pattern '^[a-f0-9]{32}' -NotMatch
What are you trying to do with "( )", or what is the query that you want to run?
edit: corrected answer based on comment
First of all, I've got a reliable search (thanks to some help on Stack Overflow) that checks for occurrences of different strings in a line over many log files.
I've now been tasked to include multiple searches, and since there are about 20 files and about a dozen search criteria, I don't want to have to access these files over 200 times. I believe the best way of doing this is with an array, but so far all the methods I've tried have failed.
The search criteria are made up of the date, which obviously changes every day, a fixed string (ERROR) and a unique Java class name. Here is what I have:
$dateStr = Get-Date -Format "yyyy-MM-dd"
$errword = 'ERROR'
$word01 = [regex]::Escape('java.util.exception')
$pattern01 = "${dateStr}.+${errword}.+${word01}"
$count01 = (Get-ChildItem -Filter $logdir -Recurse | Select-String -Pattern $pattern01 -AllMatches |ForEach-Object Matches |Measure-Object).Count
Add-Content $outfile "$dateStr,$word01,$count01"
The easy way to expand this is to have a separate three-command entry (set word, set pattern and then search) for each class I want to search against. I've done that and it works, but it's not elegant, and we then process more than 200 files to run the search. I've tried to read the Java classes in from a simple text file, with mixed results, but it's the only thing I've been able to get to work in order to simplify the search for 12 different patterns.
iRon provided an important pointer: Select-String can accept an array of patterns to search for, and reports matches for lines that match any one of them.
You can then get away with a single Select-String call, combined with a Group-Object call that allows you to group all matching lines by which pattern matched:
# Create the input file with class names to search for.
@'
java.util.exception
java.util.exception2
'@ > classNames.txt
# Construct the array of search patterns,
# and add them to a map (hashtable) that maps each
# pattern to the original class name.
$dateStr = Get-Date -Format 'yyyy-MM-dd'
$patternMap = [ordered] @{}
Get-Content classNames.txt | ForEach-Object {
$patternMap[('{0}.+{1}.+{2}' -f $dateStr, 'ERROR', [regex]::Escape($_))] = $_
}
# Search across all files, using multiple patterns.
Get-ChildItem -File -Recurse $logdir | Select-String @($patternMap.Keys) |
# Group matches by the matching pattern.
Group-Object Pattern |
# Output the result; send to `Set-Content` as needed.
ForEach-Object { '{0},{1},{2}' -f $dateStr, $patternMap[$_.Name], $_.Count }
Note:
$logDir, as the name suggests, is presumed to refer to a directory in which to (recursively) search for log files; passing that to -Filter wouldn't work, so I've removed it (which then positionally binds $logDir to the -Path parameter); -File limits the results to files; if other types of files are also present, add a -Filter argument as needed, e.g. -Filter *.log
Select-String's -AllMatches switch is generally not required - you only need it if any of the patterns can match multiple times per line and you want to capture all of those matches.
Using @(...), the array-subexpression operator, around the collection of the hashtable's keys, $patternMap.Keys, i.e. the search patterns, is required purely for technical reasons: it forces the collection to be convertible to an array of strings ([string[]]), which is how the -Pattern parameter is typed.
The need for @(...) is surprising, and may be indicative of a bug, as of PowerShell 7.2; see GitHub issue #16061.
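The multi-pattern behavior can be tried without any files. The sketch below pipes made-up log lines (an assumption for illustration, as is the $counts hashtable) to a single Select-String call and groups the results by the Pattern property of each match.

```powershell
# Hypothetical log lines; real input would come from Get-ChildItem as above.
$lines = @(
    '2024-01-01 10:00 ERROR java.util.exception: boom'
    '2024-01-01 10:05 ERROR java.io.failure: disk'
    '2024-01-01 10:07 INFO  java.util.exception: ignored'
    '2024-01-01 10:09 ERROR java.util.exception: again'
)

# One regex per class name, with the class name escaped so that '.' is literal.
$patterns = 'java.util.exception', 'java.io.failure' |
    ForEach-Object { 'ERROR.+' + [regex]::Escape($_) }

# A single pass over the input; each match records which pattern it satisfied.
$counts = @{}
$lines | Select-String -Pattern $patterns |
    Group-Object Pattern |
    ForEach-Object { $counts[$_.Name] = $_.Count }

$counts
```

The INFO line is not counted because both patterns require ERROR earlier in the line.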
I am trying to filter out the lines of a CSV that contain any of the values in an array.
Using this post as reference:
Use -notlike to filter out multiple strings in PowerShell
I managed to get it working with this format:
Import-Csv "$LocalPath\Stripped1Acct$abbrMonth$Year.csv" |
where {$_."SubmitterName" -notlike "*${Report2a}*"
-and $_."SubmitterName" -notlike "*${Report2b}*"
-and $_."SubmitterName" -notlike "*${Report2c}*"} |
Export-Csv "$LocalPath\Stripped2Acct$abbrMonth$Year.csv" -NoTypeInformation
Eventually I plan to rewrite the script so it will pull the exclusion list from a text file generated by an end user. In order to do that, I'll have to have it access values in an array. I tried doing that with the following syntax, but it didn't work as intended:
Import-Csv "$LocalPath\Stripped1Acct$abbrMonth$Year.csv" |
where {$_."SubmitterName" -notlike "*${Report2[0]}*"
-and $_."SubmitterName" -notlike "*${Report2[1]}*"
-and $_."SubmitterName" -notlike "*${Report2[2]}*"} |
Export-Csv "$LocalPath\Stripped2Acct$abbrMonth$Year.csv" -NoTypeInformation
I have a feeling it's just a syntax issue, but after playing around with it for far too long, I've run out of ideas.
I have a feeling it's a syntax issue
This is a syntax issue. The ${Name} syntax is used primarily for names that contain odd characters, like ${A ~*Strange*~ Variable Name}. It's not an expression though, so you can't index into it with [0] inside the braces; that would be taken as a literal part of the variable name.
Instead you can use a sub-expression $(...) to do this:
"*$($Report2[0])*"
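A quick way to see the difference, using made-up values for $Report2 (the $wrong and $right names are only for illustration, and the default output field separator is assumed):

```powershell
$Report2 = @('alpha', 'beta')   # hypothetical sample values

$wrong = "*$Report2[0]*"      # expands the whole array, then appends the literal '[0]'
$right = "*$($Report2[0])*"   # evaluates the index expression first

$wrong   # *alpha beta[0]*
$right   # *alpha*
```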
As an alternative approach, I might convert your whole array into a single regular expression and then use the -match (or -notmatch) operator:
$regex = $Report2.ForEach({ [RegEx]::Escape($_) }) -join '|'
Import-Csv "$LocalPath\Stripped1Acct$abbrMonth$Year.csv" |
where {$_."SubmitterName" -notmatch $regex} |
Export-Csv "$LocalPath\Stripped2Acct$abbrMonth$Year.csv" -NoTypeInformation
This takes the $Report2 array, then builds an array of the same values, but escaped for Regular Expressions (so that any special characters are matched literally), and then builds a regex that looks like:
Item1|Item2|Item3
In RegEx, a pipe is alternation, so it looks for a match of Item1 or Item2, etc. Regex finds it anywhere in the string so it doesn't need a wildcard character the way that -like does.
So with that built to pre-contain all items in your array, then you can use -notmatch to achieve the same thing, and you don't have to hardcode a bunch of indices.
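To see why the escaping step matters, here is a small sketch with in-memory rows instead of a CSV. The names and values are invented for illustration.

```powershell
# Hypothetical exclusion list containing regex metacharacters.
$Report2 = @('Smith, J.', 'R&D (East)')
$regex = ($Report2 | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Hypothetical rows standing in for the output of Import-Csv.
$rows = @(
    [pscustomobject]@{ SubmitterName = 'Smith, J. (temp)' }
    [pscustomobject]@{ SubmitterName = 'Jones, A.' }
    [pscustomobject]@{ SubmitterName = 'R&D (East) team' }
)

$kept = $rows | Where-Object { $_.SubmitterName -notmatch $regex }
$kept.SubmitterName   # Jones, A.
```

Without [regex]::Escape, the parentheses in 'R&D (East)' would be read as a capture group rather than as literal characters, and the third row would slip through the filter.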
You can also use the Contains() method, like this:
short version
[string[]]$listexludevalue=Get-Content "C:\temp\exludevalue.txt"
Import-Csv "$LocalPath\Stripped1Acct$abbrMonth$Year.csv" | %{$valcontain=$true; $col=$_.SubmitterName; $listexludevalue.ForEach({$valcontain=$valcontain -and !$col.Contains($_)}); if ($valcontain) {$_} } | Export-Csv "$LocalPath\Stripped2Acct$abbrMonth$Year.csv" -NoTypeInformation
detailed version :
$listexludevalue=Get-Content "C:\temp\exludevalue.txt"
Import-Csv "$LocalPath\Stripped1Acct$abbrMonth$Year.csv" |
% {
$valcontain=$true
foreach ($valuetotest in $listexludevalue) {$valcontain=$valcontain -and !$_.SubmitterName.Contains($valuetotest)}
if ($valcontain) {$_}
} | Export-Csv "$LocalPath\Stripped2Acct$abbrMonth$Year.csv" -NoTypeInformation
I'm looking for the negative intersection of two arrays. Each array has about 20k elements. I'm using a foreach loop over one array and looking each value up in the other array. I'm only keeping elements in the first array not found in the second array:
$deadpaths=@()
$ix=0
ForEach ($f in $FSBuildIDs)
{
if (-not($blArray -like $f)) {$deadpaths+=$paths[$ix]}
$ix++
}
$blArray contains valid IDs. $FSBuildIDs contains the IDs corresponding to the file system paths in $paths. The intent is to only keep the elements in $paths where the corresponding ID in $FSBuildIDS is NOT in $blArray.
Is there a better way to do this? The processing here takes an extremely long time. Both $blArray and $FSBuildIDs have about 20k elements and I suspect I'm looking at O(n^2) comparisons.
I thought about using a Dictionary with the elements of $FSBuildIDs as the keys and $paths as the values, but I can't figure out from the docs how to initialize and load the Dictionary (assuming this approach would speed things up). Obviously negative set intersection would be best but this isn't TSQL and I'm painfully aware that even V4 of PS doesn't support set operations.
Would using a dictionary in this problem speed up the comparisons? If so how do I create it from $FSBuildIDs and $paths? Any other techniques that might give me a performance boost vs. just iterating over these large(ish) lists?
Sample data for $blArray:
51012
51044
51049
51055
51058
51060
51073
51074
51077
51085
Sample data for $FSBuildIDs:
51001
51003
51005
51009
51013
51017
51018
51020
51021
51024
51026
Sample data for $paths:
\\server1\d$\software\anthill\var\artifacts\0000\3774\0000\3792\0005\2335
\\server1\d$\software\anthill\var\artifacts\0000\3774\0000\3792\0005\2336
\\server1\d$\software\anthill\var\artifacts\0000\3774\0000\3792\0005\2337
\\server1\d$\software\anthill\var\artifacts\0000\3774\0000\3792\0005\2338
\\server1\d$\software\anthill\var\artifacts\0000\3774\0000\3792\0005\2339
\\server1\d$\software\anthill\var\artifacts\0000\3774\0000\3792\0005\2340
\\server1\d$\software\anthill\var\artifacts\0000\3774\0000\3792\0005\2341
This is similar to the question posed previously, but different in some aspects. I'm essentially looking for guidance on constructing a dictionary from two existing arrays. I realized after posting that I really need a dictionary from $blarray as the keys and maybe $True as the value. The value is irrelevant. The important test is whether or not the current value in $FSBuildIDs is found in $blarray. That could be a dictionary lookup based on the ID as the key. That should speed up the processing, right?
I'm not clear on the comment that I'm destroying and recreating the array each time. Is that the $deadPaths array? Simply adding to it causes that? If so would I be better using a .Net ArrayList?
You could achieve a significant improvement by using the -contains operator instead of -like.
When the left-hand side of a -like operation is an array, PowerShell will iterate the array and perform a -like comparison against each and every entry.
-contains, on the other hand, returns as soon as a match is found.
Consider the following example:
$array1 = 1..2000
$array2 = 2..2001
$like = Measure-Command {
foreach($i in $array2){
$array1 -like $i
}
} |Select -Expand TotalMilliseconds
$contains = Measure-Command {
foreach($i in $array2){
$array1 -contains $i
}
} |Select -Expand TotalMilliseconds
Write-Host "Operation with -like took: $($like)ms"
Write-Host "Operation with -contains took: $($contains)ms"
Just like in your real-world example, we have 2 integer arrays with a large overlap. Let's see how it performs on my Windows 7 laptop (PowerShell 4.0):
I think the result speaks for itself :-)
That being said, you could, as you seem to anticipate, achieve an even greater improvement by populating a hashtable, using the values from the first array as keys:
$hashtable = $array1 |ForEach-Object -Begin {$t = @{}} -Process {
$t[$_] = $null
# the value doesn't matter, we're only interested in the key lookup
} -End { $t }
and then use the ContainsKey() method on the hashtable instead of -like:
foreach($i in $array2){
    if($hashtable.ContainsKey($i)) {
        # do stuff
    }
}
You'll need to bump up the size of the array to see the actual difference (here using 20K items in the first array):
Final test script can be found here
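Applied to the arrays from the question, the same idea might look like the sketch below. It swaps the hashtable for a generic HashSet and collects the loop output directly rather than appending with +=, which matters because PowerShell arrays have a fixed size and every += builds a new array. The sample values and the $validIds name are assumptions for illustration, and ::new() assumes PowerShell 5.0 or later.

```powershell
# Hypothetical sample data, shaped like the arrays in the question.
$blArray    = 51012, 51044, 51049
$FSBuildIDs = 51001, 51012, 51005
$paths      = 'path-A', 'path-B', 'path-C'

# A hash set of the valid IDs gives constant-time lookups.
$validIds = [System.Collections.Generic.HashSet[int]]::new([int[]] $blArray)

# Emit each path whose ID is not valid; the loop output is collected once.
$deadpaths = for ($ix = 0; $ix -lt $FSBuildIDs.Count; $ix++) {
    if (-not $validIds.Contains($FSBuildIDs[$ix])) { $paths[$ix] }
}

$deadpaths   # path-A, path-C (one per line)
```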
I think this would be the start of what you are looking for. As discussed in the comments, we are going to do two comparisons. First, to get the build IDs we need, we compare $FSBuildIDs and $blArray; then we take the result of that and compare it against the list of $paths. I am going to assume that it is just a string array of paths for now. Note there is room for error prevention and correction here. Still just testing for now.
$parsedIDs = Compare-Object $blArray $FSBuildIDs | Where{$_.SideIndicator -eq "=>"} | Select-Object -ExpandProperty InputObject
$paths = $paths | ForEach-Object{
$_ | Add-Member -MemberType NoteProperty -Name BuildID -Value (($_.Parent.Name + $_.Name) -as [int32]) -PassThru
}
$paths | Where-Object{$_.BuildID -in $parsedIDs}
First we compare the two ID arrays and keep the unique elements of $FSBuildIDs.
Next we go through the $paths. For each one we add a property that contains the build ID, which is the last two path elements concatenated and converted to an integer.
Once we have that, a simple Where-Object gives us the paths that have an ID present in the result of the first comparison.
To answer the question about building a hashtable:
$keyEnumerator = $FSBuildIDs.GetEnumerator()
$valEnumerator = $paths.GetEnumerator()
$idPathHash = @{}
foreach ($key in $keyEnumerator ) {
$null = $valEnumerator.movenext()
$idPathHash[$key] = $valEnumerator.current
}
Running this code on my system with a 20000 element array of fake data took 138ms.
To build the list of build ids not in the $idPathHash:
$buildIDsNotIn =
foreach ($buildId in $blArray) {
if (!$idPathHash.ContainsKey($buildId )) {
$buildId
}
}
This took 50ms on my system, with 20000 items in $blArray, again with fake data.
PS noob here (as will be obvious shortly) but trying hard to get better. In my Exchange 2010 environment I import and export huge numbers of .pst files. Many will randomly fail to queue up, and once they're not in the queue it's very tedious to sort through the source files to determine which ones need to be run again, so I'm trying to write a script to do it.
First, I run a dir on the list of .pst files and fill a variable with the associated aliases of the accounts:
$vInputlist = dir $vPath -Filter *.pst |%{ get-mailbox -Identity $_.basename| select alias}
Then I fill a variable with the aliases of all the files/accounts that successfully queued:
$vBatch = foreach ($a in (Get-MailboxImportRequest -BatchName $vBatchname)) {get-mailbox $a.mailbox | select alias}
Then I compare the two arrays to see which files I need to queue up again:
foreach($should in $vInputlist){if ($vBatch -notcontains $should){Write-Host $should ""}}
It seems simple enough yet the values in the arrays never match, or not match, as the case may be. I've tried both -contains and -notcontains. I have put in a few sanity checks along the way like exporting the variables to the screen and/or to csv files and the data looks fine.
For instance, when $vInputlist is first filled I send it to the screen and it looks like this:
Alias
MapiEnableTester1.psiloveyou.com
MapiEnableTester2.psiloveyou.com
MapiEnableTester3.psiloveyou.com
MapiEnableTester4.psiloveyou.com
Yet that last line of code I displayed above (..write-host $should,"") will output this:
@{Alias=MapiEnableTester1.psiloveyou.com}
@{Alias=MapiEnableTester2.psiloveyou.com}
@{Alias=MapiEnableTester3.psiloveyou.com}
@{Alias=MapiEnableTester4.psiloveyou.com}
(those all display as a column, not sure why they won't show that way here)
I've tried declaring the arrays like this: $vInputlist = @()
Instead of searching for the alias, I've also tried just stripping .pst off the $_.basename using .replace
I've searched on comparing arrays til I'm blue in the fingers and I don't think my comparison is wrong, I believe that somehow no matter how I fill these variables I am corrupting or changing the data so that seemingly matching data simply doesn't.
Any help would be greatly appreciated. TIA
Using -contains to compare objects isn't easy, because the objects are never identical even though they have the same property with the same value. When you use select alias you get an array of pscustomobjects with the property alias.
Try using the -expand parameter in select, like
select -expand alias
Using -expand will extract the value of the alias property, and your lists will be two arrays of strings instead, which can be compared using -contains and -notcontains.
UPDATE: I've added a sample to show you what happens with your code.
#I'm creating objects that are EQUAL to the ones you have in your code
#This will simulate the objects that get through the "$vbatch -notcontains $should" test
PS > $arr = @()
PS > $arr += New-Object psobject -Property @{ Alias="MapiEnableTester1.psiloveyou.com" }
PS > $arr += New-Object psobject -Property @{ Alias="MapiEnableTester2.psiloveyou.com" }
PS > $arr += New-Object psobject -Property @{ Alias="MapiEnableTester3.psiloveyou.com" }
PS > $arr | ForEach-Object { Write-Host $_ }
@{Alias=MapiEnableTester1.psiloveyou.com}
@{Alias=MapiEnableTester2.psiloveyou.com}
@{Alias=MapiEnableTester3.psiloveyou.com}
#Now this is what you will get if you use "... | select -expand alias" instead of "... | select alias"
PS > $arrWithExpand = $arr | select -expand alias
PS > $arrWithExpand | ForEach-Object { Write-Host $_ }
MapiEnableTester1.psiloveyou.com
MapiEnableTester2.psiloveyou.com
MapiEnableTester3.psiloveyou.com
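The comparison itself can be checked in isolation. In this sketch two separately created objects carry the same Alias value; the variable names are made up for illustration, and the [pscustomobject] literal assumes PowerShell 3.0 or later.

```powershell
# Two distinct objects with identical property values.
$a = [pscustomobject]@{ Alias = 'MapiEnableTester1.psiloveyou.com' }
$b = [pscustomobject]@{ Alias = 'MapiEnableTester1.psiloveyou.com' }

# Objects are compared by identity, so the lookup fails.
$objectsMatch = @($a) -contains $b              # False

# Strings are compared by value, so the lookup succeeds.
$stringsMatch = @($a.Alias) -contains $b.Alias  # True
```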