I'm new to PowerShell and I'm sure I'm not using best practices here. I've been working on this PowerShell script to compare two XML files. I start out by looping through each of the XML files and throwing the data into PS objects:
Here are some samples of the XML data:
XML file 1
<RESULTS>
<ROW>
<COLUMN NAME="ATTR1"><![CDATA[123456ABCDEF]]></COLUMN>
<COLUMN NAME="ATTR2"><![CDATA[1.0.4.0]]></COLUMN>
<COLUMN NAME="ATTR3"><![CDATA[Google.com]]></COLUMN>
<COLUMN NAME="ATTR4"><![CDATA[Lorem ipsum]]></COLUMN>
<COLUMN NAME="ATTR5"><![CDATA[This is some text]]></COLUMN>
</ROW>
<ROW>
<COLUMN NAME="ATTR1"><![CDATA[123456ABCDEF]]></COLUMN>
<COLUMN NAME="ATTR2"><![CDATA[2.0.0.1]]></COLUMN>
<COLUMN NAME="ATTR3"><![CDATA[HelloWorld.com]]></COLUMN>
<COLUMN NAME="ATTR4"><![CDATA[Lorem ipsum]]></COLUMN>
<COLUMN NAME="ATTR5"><![CDATA[This is some text]]></COLUMN>
</ROW>
<ROW>
<COLUMN NAME="ATTR1"><![CDATA[123456ABCDEF]]></COLUMN>
<COLUMN NAME="ATTR2"><![CDATA[5.6.7.0]]></COLUMN>
<COLUMN NAME="ATTR3"><![CDATA[foo_foo_6 (2).org]]></COLUMN>
<COLUMN NAME="ATTR4"><![CDATA[Lorem ipsum]]></COLUMN>
<COLUMN NAME="ATTR5"><![CDATA[This is some text]]></COLUMN>
</ROW>
</RESULTS>
XML File 2
<applications xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<application>
<name>Google.com</name>
<version>1.2.0.0</version>
</application>
<application>
<name>HelloWorld.com</name>
<version>2.0.0.1</version>
</application>
<application>
<name>FOO_FOO.org</name>
<version>6.2.0.1</version>
</application>
</applications>
Creating arrays/objects filled with XML data
# assign all output from `foreach` loop to `$array1` - XML file 1
$array1 = foreach($row in $xmldata1.RESULTS.ROW){
# create new object with the pertinent details as property values
[pscustomobject]@{
Name = $row.COLUMN.Where{ $_.NAME -eq "ATTR3"}.'#cdata-section'
Version = $row.COLUMN.Where{ $_.NAME -eq "ATTR2"}.'#cdata-section'
}
}
# assign all output from `foreach` loop to `$array2` - XML file 2
$array2 = foreach($row in $xmldata2.applications.application){
# create new object with the pertinent details as property values
[pscustomobject]@{
Name = $row.name
Version = $row.version
}
}
This is the script I'm wondering how to write more effectively. It simply loops through $array1 and compares it with the data in $array2. If there is a match in the name, and a mismatch in the version, then it will store those values in a PS object.
Script I want to improve
#loop through array 1
for($i = 0; $i -le $array1.Length; $i++)
{
#loop through array 2
for($j = 0; $j -le $array2.Length; $j++)
{
#if file name in array 1 matches a name in array 2...
if (($array1.name[$i] -eq $array2.name[$j]) -or ($array1.name[$i].Substring(0, [Math]::Min($array1.name[$i].Length, 7)) -eq $array2.name[$j].Substring(0, [Math]::Min($array2.name[$i].Length, 7))))
{
#then, if that file names version does not match the version found in array 2...
if($array1.version[$i] -ne $array2.version[$j])
{
#create new object
[pscustomobject]@{
Name = $array1.name[$i]
Name2 = $array2.name[$j]
Version = $array1.version[$i]
Version2 = $array2.version[$j]
}
}
}
}
}
However, there are some names that don't match perfectly. So I use the -or operator and throw this line in my first if-statement to compare the first 7 characters of the file name in each array to see if there's some kind of match (which, I know there are):
($array1.name[$i].Substring(0, [Math]::Min($array1.name[$i].Length, 7)) -eq $array2.name[$j].Substring(0, [Math]::Min($array2.name[$i].Length, 7)))
Whenever I add that line though I get the following error for only some of the data objects in the arrays. The script will return some objects, but most of the time my console pane will be filled with the following error:
Error
You cannot call a method on a null-valued expression.
At line:8 char:13
+ if (($array1.name[$i] -eq $array2.name[$j]) -or ($array1 ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
I don't even know what it's talking about. Cause when I extract that line and put actual indices in it, it works fine.
Example
if($array1.name[1020].Substring(0, [Math]::Min($array1.name[1020].Length, 7)) -eq $array2.name[2500].Substring(0, [Math]::Min($array2.name[2500].Length, 7))){
So, I'm stumped. Is there a better way to compare these two arrays and get a similar output?
I believe this could work and might be a more direct way to do it. This method does not require you to construct objects from the first XML file. Hopefully the inline comments explain the logic.
:outer foreach($i in $xml1.results.row) {
$name = $i.Column.Where{ $_.NAME -eq 'ATTR3' }.'#cdata-section'
$version = $i.Column.Where{ $_.NAME -eq 'ATTR2' }.'#cdata-section'
foreach($z in $xml2.applications.application) {
# check if they have the same version
$sameVersion = $version -eq $z.Version
# check if they have the same name
$sameName = $name -eq $z.Name
# if both conditions are `$true` we can skip this and continue with
# next item of outer loop
if($sameVersion -and $sameName) {
continue outer
}
# if their first 7 characters are the same but they're NOT the same version
if([string]::new($name[0..6]) -eq [string]::new($z.Name[0..6]) -and -not $sameVersion) {
[pscustomobject]@{
Name = $name
Name2 = $z.Name
Version = $version
Version2 = $z.Version
}
}
}
}
The result of this would be:
Name Name2 Version Version2
---- ----- ------- --------
Google.com Google.com 1.0.4.0 1.2.0.0
foo_foo_6 (2).org FOO_FOO.org 5.6.7.0 6.2.0.1
See Using a labeled continue in a loop which describes and explains the use of continue outer in this example.
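A side note on the error in the question: both loops use -le against .Length, so the final pass indexes one element past the end, where $array1.name[$i] (or $array2.name[$j]) is $null. Calling .Substring() on that $null is one likely source of the InvokeMethodOnNull message. The sketch below shows a null-safe way to take the first 7 characters; Get-Prefix is a hypothetical helper name, not something from the question or the answer above.

```powershell
# Get-Prefix is a hypothetical helper; it returns up to $Length leading characters.
function Get-Prefix {
    param(
        [string]$Text,      # a $null argument is converted to an empty string
        [int]$Length = 7
    )
    $Text.Substring(0, [Math]::Min($Text.Length, $Length))
}

# -eq on strings is case-insensitive, so these two prefixes compare equal
$samePrefix = (Get-Prefix 'foo_foo_6 (2).org') -eq (Get-Prefix 'FOO_FOO.org')

# Short and missing names no longer throw
$short   = Get-Prefix 'abc'
$missing = Get-Prefix $null
```

If you keep the index-based loops, changing -le to -lt in both of them should also stop them from reading past the last element.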
I have a large data set of roughly 10 million items that I need to process quickly and efficiently, removing duplicate items based on two of the six column headers.
I have tried grouping and sorting items but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSerialID,ComputerID
$p2 = foreach ($object in $p1.group) {
$object | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal would be to remove duplicates by assessing two columns while maintaining the oldest record based on first observed.
The data looks something like this:
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 1
ComputerID : 2
Virtual : 3
ComputerSerialID : 4
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 5
ComputerID : 6
Virtual : 7
ComputerSerialID : 8
LastObserved : 2019-06-05T15:40:37
FirstObserved : 2019-06-03T20:29:01
ComputerName : 9
ComputerID : 10
Virtual : 11
ComputerSerialID : 12
Your question is a little hard to read and could use some cleanup, but I'll answer as best I can based on what I understand you're trying to do.
Unfortunately, with so much data there's no way to do this quickly. String comparison and sorting are done by brute force; there is no way to reduce the complexity of comparing each character in one string against another any further than checking them one at a time to see if they're the same.
(Honestly, if this were me, I'd just use export-csv $object and perform this operation in excel. The time tradeoff to scripting something like this only once just wouldn't be worth it.)
By "Items" I'm going to assume that you mean rows in your table, and that you're not trying to retrieve only the strings in the rows you're looking for. You've already got the basic idea of select-object down, you can do that for the whole table:
$outputFirstObserved = $inputData | Sort-Object -Property FirstObserved -Unique
$outputLastObserved = $inputData | Sort-Object -Property LastObserved -Unique
Now you have ~20 million rows in memory, but I guess that beats doing it by hand. All that's left is to join the two tables. You can download that Join-Object command from the powershell gallery with Install-Script -Name Join and use it in the way described. If you want to do this step yourself, the easiest way would be to squish the two tables together and sort them again:
$output = $outputFirstObserved + $outputLastObserved
$return = $output | Sort-Object | Get-Unique
Does this do it? It keeps the one it finds first.
$test | sort -u ComputerSerialID, ComputerID
I created this function to de-duplicate my multi-dimensional arrays.
Basically, I concatenate the contents of each record and add the result to a hash.
If the concatenated text already exists in the hash, the record is not added to the array that gets returned.
Function DeDupe_Array
{
param
(
$Data
)
$Return_Array = @()
$Check_Hash = @{}
Foreach($Line in $Data)
{
$Concatenated = ''
$Elements = ($Line | Get-Member -MemberType NoteProperty | % {"$($_.Name)"})
foreach($Element in $Elements)
{
$Concatenated += $line.$Element
}
If($Check_Hash.$Concatenated -ne 1)
{
$Check_Hash.add($Concatenated,1)
$Return_Array += $Line
}
}
return $Return_Array
}
Try the following script.
It should be fast because it avoids the PowerShell pipeline entirely.
$hashT = @{}
foreach ($item in $csvData) {
# Building hash table key
$key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID
# if $key doesn't exist yet OR $key exists and "FirstObserved" is earlier than the one stored in $hashT (only valid when the date is in a sortable / international format)
if ((! $hashT.ContainsKey($key)) -or ( $item.FirstObserved -lt $hashT[$key].FirstObserved )) {
$hashT[$key] = $item
}
}
$result = $hashT.Values
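To see the idea on a small scale, here is a minimal sketch of the same hashtable technique run against three made-up rows. The property names come from the sample data in the question; the values, and the variable names $csvData and $oldest, are assumptions for illustration only.

```powershell
# Hypothetical sample rows; the first two share the same ID pair
$csvData = @(
    [pscustomobject]@{ ComputerSerialID = 4; ComputerID = 2; FirstObserved = '2019-06-03T20:29:01' }
    [pscustomobject]@{ ComputerSerialID = 4; ComputerID = 2; FirstObserved = '2019-06-01T08:00:00' }
    [pscustomobject]@{ ComputerSerialID = 8; ComputerID = 6; FirstObserved = '2019-06-03T20:29:01' }
)

$oldest = @{}
foreach ($item in $csvData) {
    # One key per (ComputerSerialID, ComputerID) pair
    $key = '{0}###{1}' -f $item.ComputerSerialID, $item.ComputerID
    # ISO 8601 timestamps sort correctly as plain strings
    if ((-not $oldest.ContainsKey($key)) -or ($item.FirstObserved -lt $oldest[$key].FirstObserved)) {
        $oldest[$key] = $item
    }
}
$result = @($oldest.Values)
```

Each row is touched once and each lookup is a hash lookup, so the work grows roughly linearly with the number of rows rather than with the cost of a sort.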
I have an array that contains different rows where one column identifies the "record" "type." I want to iterate through this array and sort each item based on that value into a new array so that I have one array per type.
Here's what I have so far:
$data = Get-ADObject -SearchBase $sb -filter * -properties * | select samaccountname,canonicalname,objectclass,distinguishedname | sort objectclass,samaccountname
$oct = $data | select objectclass -Unique
foreach ($o in $oct)
{
$oc = $o.objectclass
Remove-Variable -name "$oc"
New-Variable -name "$oc" -value @()
}
$d = @()
$user = @()
foreach ($d in $data)
{
$oc = $d.objectclass
foreach ($o in $oct)
{
$1 = $o.objectclass
if ($1 -eq $oc)
{
('$' + $oc) += $d
}
}
}
(the lines: Remove-Variable -name "$oc", $d = @(), and $user = @() are for testing purposes so ignore those)
This works great up to the line where I try to dynamically reference my new arrays. What am I doing wrong and how can I fix it?
The error text is:
('$' + $oc) += $d
~~~~~~~~~ The assignment expression is not valid. The input to an assignment operator must be an object that is able to accept
assignments, such as a variable or a property.
CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
FullyQualifiedErrorId : InvalidLeftHandSide
I have tried using $($oc), but that didn't work either. If I change it to the name of one of my dynamically created arrays like $user, the code works fine except that it loads everything into the $user array (obviously).
The reason I tried ('$' + $oc) is because this is the only way I could get ISE to output $user.
I also tried ('$' + $oc).add($d) but it appears to be seeing it as a string rather than the array.
Any pointers are appreciated.
Use the Get-Variable and Set-Variable cmdlets:
$curVal = Get-Variable -Name $oc -ValueOnly
Set-Variable -Name $oc -Value ($curVal+$d)
But note that you would be better off building this array in a local variable first, and then assigning it to your "runtime-named" variable once, as these get and set operations are going to be way slower.
Rather than fiddling around with dynamically named variables, I'd use a dictionary type, for example a hashtable:
# initialize an empty hashtable
$objectsByClass = @{}
# Define list of properties
$properties = 'samaccountname','canonicalname','objectclass','distinguishedname'
# Retrieve AD objects
$Data = Get-ADObject -SearchBase $sb -filter * -properties $properties | select $properties | sort objectclass,samaccountname
#Populate hashtable
$Data |ForEach-Object {
if(-not $objectsByClass.ContainsKey($_.objectClass)){
# Create entry in hashtable
$objectsByClass[$_.objectClass] = @()
}
# Add entry to dictionary
$objectsByClass[$_.objectClass] += $_
}
Now you can access the items by class name:
$users = $objectsByClass['user']
And you can easily discover all class names:
$classNames = $objectsByClass.Keys
As briantist points out, you can also have Group-Object build the hashtable for you if the above gets too verbose:
$objectsByClass = $Data |Group-Object objectClass -AsHashTable
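Since Get-ADObject needs a domain to run against, here is a small sketch of the Group-Object approach using made-up objects in place of the AD results. The sample names are assumptions; only the property names match the question.

```powershell
# Hypothetical stand-ins for the objects returned by Get-ADObject
$data = @(
    [pscustomobject]@{ samaccountname = 'alice'; objectclass = 'user' }
    [pscustomobject]@{ samaccountname = 'bob';   objectclass = 'user' }
    [pscustomobject]@{ samaccountname = 'pc01$'; objectclass = 'computer' }
)

# -AsString makes sure the hashtable keys are plain strings
$objectsByClass = $data | Group-Object -Property objectclass -AsHashTable -AsString

$users     = $objectsByClass['user']
$computers = $objectsByClass['computer']
```

Each value is a collection even when a class has only one member, so .Count and foreach behave the same for every key.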
There are quite a few posts on SO that address PowerShell transposition. However, most of the code is specific to the use case or addresses data being gathered from a text/CSV file and does me no good. I'd like to see a solution that can do this work without such specifics and works with arrays directly in PS.
Example data:
Customer Name: SomeCompany
Abbreviation: SC
Company Contact: Some Person
Address: 123 Anywhere St.
ClientID: XXXX
This data is much more complicated, but I can work with it using other methods if I can just get the rows and columns to cooperate. The array thinks that "Name:" and "SomeCompany" are column headers. This is a byproduct of how the data is gathered and cannot be changed. I'm importing the data from an Excel spreadsheet with PSExcel and the spreadsheet format is not changeable.
Desired output:
Customer Name:, Abbreviation:, Company Contact:, Address:, ClientID:
SomeCompany, SC, Some Person, 123 Anywhere St., XXXX
Example of things I've tried:
$CustInfo = Import-XLSX -Path "SomePath" -Sheet "SomeSheet" -RowStart 3 -ColumnStart 2
$b = @()
foreach ($Property in $CustInfo.Property | Select -Unique) {
$Props = [ordered]@{ Property = $Property }
foreach ($item in $CustInfo."Customer Name:" | Select -Unique){
$Value = ($CustInfo.where({ $_."Customer Name:" -eq $item -and
$_.Property -eq $Property })).Value
$Props += @{ $item = $Value }
}
$b += New-Object -TypeName PSObject -Property $Props
}
This does not work because of the "other" data I mentioned. There are many other sections in this particular workbook so the "Select -Unique" fails without error and the output is blank. If I could limit the input to only select the rows/columns I needed, this might have a shot. It appears that while there is a "RowStart" and "ColumnStart" to Import-XLSX, there are no properties for stopping either one.
I've tried methods from the above linked SO questions, but as I said, they are either too specific to the question's data or apply to importing CSV files and not working with arrays.
I was able to resolve this by doing two things:
Removed the extra columns by using the "-Header" switch on the Import-XLSX function to add fake header names and then only select those headers.
$CustInfo = Import-XLSX -Path "SomePath" -Sheet "SomeSheet" -RowStart 2 -ColumnStart 2 -Header 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18 | Select "1","2"
The downside to this is that I had to know how many columns the input data had -- Not dynamic. If anyone can provide a solution to this issue, I'd be grateful.
Flipped the columns and headers with a simple foreach loop:
$obj = [PSCustomObject]@{}
ForEach ($item in $CustInfo) {
$value = $null
$name = $null
if ($item."2") { [string]$value = $item."2" }
if ($item."1") { [string]$name = $item."1" }
if ($value -and $name) {
$obj | Add-Member -NotePropertyName $name -NotePropertyValue $value
}
}
I had to force the string type on the property names and values because the zip codes and CustID were being formatted as Int32. Otherwise, this does what I need.
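For anyone without PSExcel at hand, the same flip can be sketched against plain objects. This is a variation rather than the exact code above: it collects the pairs in an ordered hashtable and casts it once at the end instead of calling Add-Member per row. The sample rows, including the numeric ClientID, are made up for illustration.

```powershell
# Hypothetical rows shaped like the Import-XLSX output with fake headers "1" and "2"
$rows = @(
    [pscustomobject]@{ '1' = 'Customer Name:'; '2' = 'SomeCompany' }
    [pscustomobject]@{ '1' = 'Abbreviation:';  '2' = 'SC' }
    [pscustomobject]@{ '1' = 'ClientID:';      '2' = 1234 }
    [pscustomobject]@{ '1' = $null;            '2' = 'orphan value' }
)

$props = [ordered]@{}
foreach ($row in $rows) {
    # Casting to [string] turns $null into '' and numbers into text
    $name  = [string]$row.'1'
    $value = [string]$row.'2'
    if ($name -and $value) {
        $props[$name] = $value
    }
}

# One object whose property names are the former first-column values
$obj = [pscustomobject]$props
```

The ordered hashtable keeps the properties in the order the rows were read, which matters if the result is later exported to CSV.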
In the following Powershell example, how can I get $server to populate within line #5? I'm trying to save the results of a Get-EventLog query to an array for each $server loop iteration.
Example:
$serversArray = @("one", "two", "three")
ForEach ($server in $serversArray) {
echo $server "Variable array iteration populates here correctly"
$eventArray = @()
$eventArray += Get-EventLog -computerName $server #But not here, where I need it
$eventArray #Yet it populates from calling the second array correctly
}
I've tried assigning the scope of the $server variable to global or script.
I've researched other similar issues, but none that I could find had this circumstance
I've tried different combinations of piping, quotes, back ticks, etc.
As always, thanks in advance.
Edit Ok, ISE was caching some of the variables (or something strange), as the above example began working after I restarted ISE. Anyways, the issue is with the $servers array input, which in the full script is from a MySQL query. If I statically assign the array with server names (like in the example above) the script works. If I use the input from the MySQL query, Get-EventLog fails with The network path was not found. So even though the server values look correct (and something could be expanding), it could be a text encoding issue, etc. Sorry to waste time, but discussing it has helped narrow it down. Here's the pertinent part of the actual script:
#Open SQL Connection
[System.Reflection.Assembly]::LoadWithPartialName("MySql.Data")
$connectionString = "server=dbserver;uid=odbc;pwd=password;database=uptime;"
$connection = New-Object MySql.Data.MySqlClient.MySqlConnection
$connection.ConnectionString = $connectionString
$connection.Open()
#Get server names from db
$sqlGetServers = "select server_nm from servers limit 3;"
$command = New-Object MySql.Data.MySqlClient.MySqlCommand($sqlGetServers, $connection)
$dataAdapter = New-Object MySql.Data.MySqlClient.MySqlDataAdapter($command)
$dataSet = New-Object System.Data.DataSet
$recordCount = $dataAdapter.Fill($dataSet, "sample_data")
$server_names = @()
$server_names += $dataSet.Tables["sample_data"]
#$server_names = @("server-1","server-2") <-- this works
#loop through array of server names
foreach($server in $server_names) {
$eventData = @()
$eventData += Get-EventLog -computerName $server -LogName System -Newest 10
foreach($event in $eventdata) {Do-Stuff}
}
I am not 100% certain what the problem is that you're having. I am assuming that the issue is that your $eventArray ends up with just the last result.
If that is correct, then the reason is that you're reinitializing it as empty on every iteration in line 4: $eventArray = @()
Try this:
$serversArray = @("one", "two", "three")
$eventArray = @()
ForEach ($server in $serversArray) {
echo $server "Variable array iteration populates here correctly"
$eventArray += Get-EventLog -computerName $server #But not here, where I need it
$eventArray #Yet it populates from calling the second array correctly
}
or, alternatively, with ForEach-Object like this:
$serversArray = @("one", "two", "three")
$eventArray = $serversArray | ForEach-Object {
    # Write-Host prints to the console without adding the text to $eventArray
    Write-Host $_ "Variable array iteration populates here correctly"
    # The output of Get-EventLog is returned from the block and collected in $eventArray
    Get-EventLog -computerName $_
}
Explanation for Method 1:
This declares $eventArray as an empty array before the loop starts, then adds items to it within each iteration of the loop, rather than reinitializing it every time.
The way you had it, $eventArray would be reset to an empty array every time, and in the end it would just contain the last result.
Explanation for Method 2:
ForEach-Object is a pipeline cmdlet and returns the result of its code block (which is run once for each object piped into it).
In this case we use $_ to represent the individual object. Instead of assigning the result of Get-EventLog to $eventArray, we simply call it, allowing the return value to be returned from the block.
We assign the entire ForEach-Object call into $eventArray instead, which will end up with the collection of results from the entire call.
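The same "assign the whole loop" idea also works with the foreach statement. The sketch below can be run anywhere because it swaps Get-EventLog for Get-FakeEvent, a hypothetical stand-in that just emits two objects per server name.

```powershell
# Get-FakeEvent is a hypothetical stand-in for Get-EventLog
function Get-FakeEvent {
    param([string]$ComputerName)
    1..2 | ForEach-Object {
        [pscustomobject]@{ Computer = $ComputerName; Index = $_ }
    }
}

$serversArray = @('one', 'two', 'three')

# Everything the loop body emits is collected; @() guarantees an array
$eventArray = @(foreach ($server in $serversArray) {
    Get-FakeEvent -ComputerName $server
})
```

Here $eventArray ends up holding the results for every server, and no += is needed inside the loop.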
Instead....
$serversArray = @("one", "two", "three")
$eventArray = @()
$serversArray | ForEach-Object {
echo $_ "Variable array iteration populates here correctly"
$eventArray += Get-EventLog -computerName $_ #But not here, where I need it
$eventArray #Yet it populates from calling the second array correctly
}
Pipe the array into ForEach-Object; then you can use $_.
I'm using Powershell 1.0 to remove an item from an Array. Here's my script:
param (
[string]$backupDir = $(throw "Please supply the directory to housekeep"),
[int]$maxAge = 30,
[switch]$NoRecurse,
[switch]$KeepDirectories
)
$days = $maxAge * -1
# do not delete directories with these values in the path
$exclusionList = Get-Content HousekeepBackupsExclusions.txt
if ($NoRecurse)
{
$filesToDelete = Get-ChildItem $backupDir | where-object {$_.PsIsContainer -ne $true -and $_.LastWriteTime -lt $(Get-Date).AddDays($days)}
}
else
{
$filesToDelete = Get-ChildItem $backupDir -Recurse | where-object {$_.PsIsContainer -ne $true -and $_.LastWriteTime -lt $(Get-Date).AddDays($days)}
}
foreach ($file in $filesToDelete)
{
# remove the file from the deleted list if it's an exclusion
foreach ($exclusion in $exclusionList)
{
"Testing to see if $exclusion is in " + $file.FullName
if ($file.FullName.Contains($exclusion)) {$filesToDelete.Remove($file); "FOUND ONE!"}
}
}
I realize that Get-ChildItem in powershell returns a System.Array type. I therefore get this error when trying to use the Remove method:
Method invocation failed because [System.Object[]] doesn't contain a method named 'Remove'.
What I'd like to do is convert $filesToDelete to an ArrayList and then remove items using ArrayList.Remove. Is this a good idea or should I directly manipulate $filesToDelete as a System.Array in some way?
Thanks
The best way to do this is to use Where-Object to perform the filtering and use the returned array.
You can also use @splat to pass multiple parameters to a command (new in V2). If you cannot upgrade (and you should if at all possible), then just collect the output from Get-ChildItem (only repeating that one cmdlet) and do all the filtering in common code.
The working part of your script becomes:
$moreArgs = @{}
if (-not $NoRecurse) {
    $moreArgs["Recurse"] = $true
}
$filesToDelete = Get-ChildItem $BackupDir @moreArgs |
    Where-Object {
        $file = $_
        # keep only old files whose path contains none of the exclusions
        -not $file.PsIsContainer -and
        $file.LastWriteTime -lt (Get-Date).AddDays($days) -and
        -not ($exclusionList | Where-Object { $file.FullName.Contains($_) })
    }
In PowerShell, arrays have a fixed size: you cannot add or remove elements in place, but it is very easy to create a new one (operators like += on arrays actually create a new array and return that).
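To make the two options from the question concrete, here is a minimal sketch that filters into a new array and, alternatively, copies the items into an ArrayList and calls Remove. The paths and the exclusion entry are made-up sample values, and plain strings stand in for the file objects.

```powershell
# Hypothetical sample paths and exclusion list
$files = @('C:\backup\a.bak', 'C:\backup\keep\b.bak', 'C:\backup\c.bak')
$exclusionList = @('\keep\')

# Option 1: build a new, filtered array
$filtered = @($files | Where-Object {
    $path = $_
    -not ($exclusionList | Where-Object { $path.Contains($_) })
})

# Option 2: copy into an ArrayList, which does support Remove()
$list = New-Object System.Collections.ArrayList
$list.AddRange($files)
$list.Remove('C:\backup\keep\b.bak')
```

Option 1 never touches the collection being enumerated, which is why filtering is usually the safer choice inside a loop.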
I agree with Richard, that Where-Object should be used here. However, it's harder to read.
What I would propose:
# get $filesToDelete and $exclusionList. In V2 use splatting as proposed by Richard.
$res = $filesToDelete | % {
$file = $_
$isExcluded = ($exclusionList | ? { $file.FullName.Contains($_) } )
if (!$isExcluded) {
$file
}
}
#the files are in $res
Also note that generally it is not possible to iterate over a collection and change it. You would get an exception.
$a = New-Object System.Collections.ArrayList
$a.AddRange((1,2,3))
foreach($item in $a) { $a.Add($item*$item) }
An error occurred while enumerating through a collection:
At line:1 char:8
+ foreach <<<< ($item in $a) { $a.Add($item*$item) }
+ CategoryInfo : InvalidOperation: (System.Collecti...numeratorSimple:ArrayListEnumeratorSimple) [], RuntimeException
+ FullyQualifiedErrorId : BadEnumeration
This is ancient, but I wrote these a while ago to add and remove from PowerShell lists using recursion. They leverage PowerShell's ability to do multiple assignment. That is, you can do $a,$b,$c=@('a','b','c') to assign a, b and c to their variables. Doing $a,$b=@('a','b','c') assigns 'a' to $a and @('b','c') to $b.
First is by item value. It'll remove the first occurrence.
function Remove-ItemFromList ($Item,[array]$List=$(throw "the item $item was not in the list"),[array]$chckd_list=@())
{
if ($list.length -lt 1 ) { throw "the item $item was not in the list" }
$check_item,$temp_list=$list
if ($check_item -eq $item )
{
$chckd_list+=$temp_list
return $chckd_list
}
else
{
$chckd_list+=$check_item
return (Remove-ItemFromList -item $item -chckd_list $chckd_list -list $temp_list )
}
}
This one removes by index. You can probably mess it up good by passing a value to count in the initial call.
function Remove-IndexFromList ([int]$Index,[array]$List,[array]$chckd_list=@(),[int]$count=0)
{
if (($list.length+$count-1) -lt $index )
{ throw "the index is out of range" }
$check_item,$temp_list=$list
if ($count -eq $index)
{
$chckd_list+=$temp_list
return $chckd_list
}
else
{
$chckd_list+=$check_item
return (Remove-IndexFromList -count ($count + 1) -index $index -chckd_list $chckd_list -list $temp_list )
}
}
This is a very old question, but the problem is still valid. None of the answers fit my scenario, so I will suggest another solution.
In my case, I read in an XML configuration file and I want to remove an element from an array.
[xml]$content = get-content $file
$element = $content.PathToArray | Where-Object {$_.name -eq "ElementToRemove" }
$element.ParentNode.RemoveChild($element)
This is very simple and gets the job done.