PowerShell Reading text file word by word - arrays

I'm trying to count the words in my text file. However, when I use Get-Content, the array seems to be read letter by letter, so I can't compare its contents word by word. I hope you guys can help me out!
Clear-Host
#Functions
function Get-Articles (){
foreach($Word in $poem){
if($Articles -contains $Word){
$Counter++
}
}
write-host "The number of Articles in your sentence: $counter"
}
#Variables
$Counter = 0
$poem = $line
$Articles = "a","an","the"
#Logic
$fileExists = Test-Path "text.txt"
if($fileExists) {
$poem = Get-Content "text.txt"
}
else
{
Write-Output "The file SamMcGee does not exist"
exit(0)
}
$poem.Split(" ")
Get-Articles

What your script does, edited down a bit:
$poem = $line # set poem to $null (because $line is undefined)
$Articles = "a","an","the" # $Articles is an array of strings, ok
# check file exists (I skipped, it's fine)
$poem = Get-Content "text.txt" # Load content into $poem,
# also an array of strings, ok
$poem.Split(" ") # Apply .Split(" ") to the array.
# Powershell does that once for each line.
# You don't save it with $xyz =
# so it outputs the words onto the
# pipeline.
# You see them, but they are thrown away.
Get-Articles # Call a function (with no parameters)
function Get-Articles (){
# Poem wasn't passed in as a parameter, so
foreach($Word in $poem){ # Pull poem out of the parent scope.
# Still the original array of lines. unchanged.
# $word will then be _a whole line_.
if($Articles -contains $Word){ # $articles will never contain a whole line
$Counter++
}
}
write-host "The number of Articles in your sentence: $counter" # 0 every time
}
You probably wanted to do $poem = $poem.Split(" ") to make it an array of words instead of lines.
Or you could have passed $poem words into the function with
function Get-Articles ($poem) {
...
Get-Articles $poem.Split(" ")
And you could make use of the PowerShell pipeline with:
$Articles = "a","an","the"
$poemArticles = (Get-Content "text.txt").Split(" ") | Where {$_ -in $Articles}
$counter = $poemArticles | Measure | Select -Expand Count
write-host "The number of Articles in your sentence: $counter"
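As a minimal sketch of the parameter-passing suggestion above: the function name Get-ArticleCount, its parameter names, and the in-memory sample lines (standing in for the content of text.txt) are assumptions made for illustration. It returns the count instead of printing it, so the caller decides what to do with the number.

```powershell
# Hypothetical helper: counts how many of the given words are articles.
function Get-ArticleCount {
    param(
        [string[]] $Words,
        [string[]] $Articles = @('a', 'an', 'the')
    )
    $count = 0
    foreach ($word in $Words) {
        # -contains compares case-insensitively by default.
        if ($Articles -contains $word) { $count++ }
    }
    $count
}

# Sample lines standing in for (Get-Content "text.txt").
$lines = 'the cat sat on a mat', 'an owl saw the cat'

# .Split(' ') is applied to each line, producing one flat array of words.
$articleCount = Get-ArticleCount -Words $lines.Split(' ')
Write-Host "The number of articles in your sentence: $articleCount"
```

Because the words are passed in explicitly, the function no longer depends on variables from the parent scope.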

TessellatingHeckler's helpful answer explains the problem with your approach well.
Here's a radically simplified version of your command:
$counter = (-split (Get-Content -Raw text.txt) -match '^(a|an|the)$').count
write-host "The number of articles in your sentence: $counter"
The unary form of the -split operator is key here: it splits the input into words by any run of whitespace between words, resulting in an array of individual words.
-match then matches the resulting array of words against a regex that matches words a, an, or the: ^(a|an|the)$.
The result is the filtered subarray of the input array containing only the words of interest, and .count simply returns that subarray's count.
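To see the two steps separately, here is a small sketch that uses an in-memory string in place of the file content (the sample text and variable names are assumptions for illustration):

```powershell
# Sample text standing in for (Get-Content -Raw text.txt).
$text = "The cat and a dog saw an owl.`nThe end"

# Unary -split: split on any run of whitespace, including newlines.
$words = -split $text

# With an array on the left-hand side, -match acts as a filter and
# returns only the elements that match the regex (case-insensitively).
$articles = $words -match '^(a|an|the)$'

$articles.Count
```

Note that "and" is not counted, because the anchors ^ and $ require the whole word to be an article.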

Related

Out-File output is missing line feeds between lines of data

I am passing in an array of $users.
PS C:\> $users | ft
ID DisplayName AdminID first last Password
---- ----------- ------- ----- ---- --------
Axyz Axyz, Bill NBX_Admin Bill Axyz Secret
The code:
$y = @()
$y = "Create Users process. Run started at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
foreach ($x in $users) {
$y += "User $($x.DisplayName) with NNN of $($x.ID)"
}
$y += "Completed at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
$y | Out-File "Log.txt"
$y is now an unformatted string array. When I type $y to the screen, it looks great.
If I direct it to Format-Table, it looks great (no headings).
When I output it to a file, and type that file at a Command Prompt (cmd.exe), it looks great.
However, when I pull it up in Notepad, all the output appears on a single line. To be precise, all the data is there, there are no lines of data missing, but there are no CR/LF so all of the data appears on a single line within the file when viewed with Notepad.exe.
As AdminOfThings correctly points out:
While $y = @() assigns an empty array to $y, it doesn't type-constrain that variable, so your very next assignment - $y = "Create Users process ..." - changes the variable type to a string.
Simply using += instead of = in that subsequent assignment would have prevented the problem: $y += "Create Users process ...".
Alternatively, type-constraining the variable creation - [array] $y = @() - i.e., placing a type literal to the left of the variable being assigned (akin to a cast) - would have prevented the problem too.
Subsequent use of += therefore performs simple string concatenation rather than the desired gradual building of an array, with no separators between the "lines" added.[1]
By contrast, had you used an array as intended, both Out-File and Set-Content would automatically insert platform-appropriate newlines[2] between the elements, plus one at the end, on saving (in PSv5+ you can use the -NoNewline switch to opt out).
That said, using += to "extend" an array is inefficient, because what PowerShell must do behind the scenes is create a new array containing the old elements plus the new one(s), given that arrays are fixed-size data structures.
While the performance penalty for use of += to "extend" arrays in a loop only really matters with high iteration counts, it is more concise, convenient and efficient to let PowerShell create arrays for you implicitly, by using your foreach loop as an expression:
# Initialize the array and assign the first element.
# Due to the type constraint ([array]), the RHS string implicitly becomes
# the array's 1st element.
[array] $y = "Create Users process. Run started at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
# Add the strings output by the foreach loop to the array.
# PowerShell implicitly collects foreach output in an array when
# you use it as an expression.
$y += foreach ($x in $users)
{
"User $($x.displayname) with NNN of $($x.ID)"
}
# Add the final string to the array.
$y += "Completed at $('[{0:MM/dd/yyyy} {0:HH:mm:ss}]' -f (Get-Date))"
# Send the array to a file with Out-File, which separates
# the elements with newlines and adds a trailing one.
# Windows PowerShell:
# Out-File creates UTF-16LE-encoded files.
# Set-Content, which can alternatively be used, creates "ANSI"-encoded files.
# PowerShell Core:
# Both cmdlets create UTF-8-encoded files without BOM.
$y | Out-File "Log.txt"
Note that you can similarly use for, if, do / while / switch statements as expressions.
In all cases, however, as of PowerShell 7.0, these statements can only serve as expressions by themselves; regrettably, using them as the first segment of a pipeline or embedding them in larger expressions does not work - see this GitHub issue.
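A short sketch of statements used as expressions, with made-up sample data; each statement stands alone on the right-hand side of an assignment, which is the form described above as working:

```powershell
$numbers = 1..5

# foreach as an expression: the loop's output is collected in an array.
$squares = foreach ($n in $numbers) { $n * $n }

# if as an expression: the output of whichever branch runs is assigned.
$size = if ($squares.Count -gt 3) { 'large' } else { 'small' }

# switch as an expression: given an array, it evaluates each element in turn.
$labels = switch ($numbers) {
    { $_ % 2 -eq 0 } { "$_ is even" }
    default          { "$_ is odd" }
}
```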
[1] A simple demonstration of your problem:
# The initialization of $y as @() is overridden by $y = 'first'.
PS> $y = @(); $y = 'first'; $y += 'second'; $y
firstsecond # !! $y contains a single string built with string concatenation
The description of your symptoms is therefore not consistent with your code, as you should have seen a single-line output string in all scenarios (printing directly to the screen / via Format-Table, sending to a file and type-ing that from cmd.exe).
[2] The platform-appropriate newline is reflected in [Environment]::NewLine, and it is "`r`n" (CRLF) on Windows, and just "`n" (LF) on Unix-like platforms (in PowerShell Core).
Since using += recreates the array on every iteration, I'd suggest assigning the output of ForEach-Object, with its -Begin, -Process and -End blocks, to a variable, and using the more common approach of the format operator:
$Log = $users | ForEach-Object -Begin {
"Create Users process. Run started at [{0:MM/dd/yyyy} {0:HH:mm:ss}]" -f (Get-Date)
} -Process {
"User {0} with NNN of {1}" -f $_.DisplayName,$_.ID
} -End {
"Completed at [{0:MM/dd/yyyy} {0:HH:mm:ss}]" -f (Get-Date)
}
$Log | Set-Content "Log.txt"

Array to read different values

How can I input multiple values in PowerShell and have those values stored in a variable?
For example:
$value = Read-Host "Enter the value's" #I need 15 values to be entered.
And then recall them e.g:
$value[0] = 1233
$value[1] = 2345
To do this, you could declare an array with @() and then use a loop and the addition operator to add items to the array (stopping when a blank value is submitted):
$values = @()
Do{
$value = read-host "Enter a value"
if ($value) {$values += $value}
}Until (-not $value)
You can then retrieve values as you described via the index with square brackets []:
$values #returns all values
$values[3] #returns the fourth value (if you entered four or more)
Beware that arrays start from 0, so the first item is [0], second is [1] etc. With PowerShell you can also use negative numbers to work through the array backwards, so [-1] is the last item, [-2] the second to last, etc.
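A small sketch of the indexing rules just described, using made-up values instead of Read-Host input:

```powershell
# Sample values standing in for what the user typed.
$values = '1233', '2345', '3456', '4567'

$first        = $values[0]       # arrays are zero-based
$last         = $values[-1]      # negative indexes count from the end
$secondToLast = $values[-2]
$lastTwo      = $values[-2..-1]  # a range of indexes returns several items
```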
The following stores the values read in an array:
$values = @()
$i = $null
while ($i -ne "q") {
if ($null -ne $i) {
# Append the value to the array
$values += $i
}
$i = Read-Host "Enter value (stop with q)"
}
# Print each value on a separate line
$values | % { Write-Host $_}
# Print the type to show that it is an array
$values.GetType()
# Several values can be retrieved via index operator
$values[0]

PowerShell - How can I count occurrences in my array

I turned my Read-Host input into an array and figured that would let me count how many times a word from $Array appears in the sentence $a. However, $Count++ doesn't give me a total.
function Get-Sentence($a){
if($a -contains $array) {
$Count++
}
else {
return 0
}
}
Write-Host "There are $count words"
[array]$Array = @("a", "an", "the")
[array]$a = Read-Host "Enter a long sentence from a story book or novel: ").split(" ")
Preferred approach:
The easiest way to accurately count occurrences of multiple substrings is probably to:
Construct a regex pattern that matches any one of the substrings
Use the -split operator to split the string
Count the resulting number of strings and subtract 1:
# Define the substrings and a sentence to test against
$Substrings = "a","an","the"
$Sentence = "a long long sentence to test the -split approach, anticipating false positives"
# Construct the regex pattern
# The \b sequence ensures "word boundaries" on either side of a
# match so that "a" won't match the a in "man", for example
$Pattern = "\b(?:{0})\b" -f ($Substrings -join '|')
# Split the string, count result and subtract 1
$Count = ($Sentence -split $Pattern).Count - 1
Outputs:
C:\> $Count
2
As you can see it will have matched and split on "a" and "the", but not the "an" in "anticipating".
I'll leave converting this into a function as an exercise for the reader.
Note:
If you start feeding more than just simple ASCII strings as input, you may want to escape them before using them in the pattern:
$Pattern = "\b(?:{0})\b" -f (($Substrings |ForEach-Object {[regex]::Escape($_)}) -join '|')
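One possible way to wrap the split-and-count approach in a function, as a sketch only: the function name Get-WordOccurrenceCount and its parameter names are made up for illustration, and the substrings are always escaped as suggested in the note.

```powershell
# Hypothetical wrapper around the -split counting technique shown above.
function Get-WordOccurrenceCount {
    param(
        [Parameter(Mandatory)] [string]   $Sentence,
        [Parameter(Mandatory)] [string[]] $Substrings
    )
    # Escape each substring so regex metacharacters are taken literally.
    $escaped = $Substrings | ForEach-Object { [regex]::Escape($_) }
    $pattern = '\b(?:{0})\b' -f ($escaped -join '|')
    # N matches split the sentence into N + 1 pieces.
    ($Sentence -split $pattern).Count - 1
}

$sample = "a long long sentence to test the -split approach, anticipating false positives"
$total = Get-WordOccurrenceCount -Sentence $sample -Substrings "a", "an", "the"
```

With the sample sentence from above, $total holds the same count as before.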
Naive approach:
If you're uncomfortable with regular expressions, you can make the assumption that anything in between two spaces is "a word" (like in your original example), and then loop through the words in the sentence and check if the array contains the word in question (not the other way around):
$Substrings = "a","an","the"
$Sentence = (Read-Host "Enter a long sentence from a story book or novel: ").Split(" ")
$Counter = 0
foreach($Word in $Sentence){
if($Substrings -contains $Word){
$Counter++
}
}
As suggested by Jeroen Mostert, you could also utilize a HashTable. With this you could track occurrences of each word, instead of just a total count:
$Substrings = "a","an","the"
$Sentence = (Read-Host "Enter a long sentence from a story book or novel: ").Split(" ")
# Create hashtable from substrings
$Dictionary = @{}
$Substrings |ForEach-Object { $Dictionary[$_] = 0 }
foreach($Word in $Sentence){
if($Dictionary.ContainsKey($Word)){
$Dictionary[$Word]++
}
}
$Dictionary
$Substrings = "a","an","the"
("a long long sentence to test the -split approach, anticipating false positives" -split " " | where {$Substrings -contains $_}).Count

Using an array of strings to stop regex expression

I'm trying to split an email's subject line into key/value pairs and store them in a hashtable, which I will then use to send the data to a different program.
$message = "WHERE=bmc3423 ENVIRONMENT=WINDOWS WHO=DDD WHAT=CPU WHATVAR=PERCENTAGE WHATVAL=98 WHY=HAPPY WHEN=02/05/2015 4:34 pm SEVERITY=WARNING PRIORITY=5 STATUS=OPEN TYPE=Stuff CI=bmc3232 MNEMONIC=DAW MESSAGE=happy days are here for email"
$tokens = "what","whatvar","whatval","where","when","severity","status","type","CI","mnemonic","who","message"
$tokenhash = @{}
ForEach($item in $tokens)
{
#$item
$match= $message -match "$item=([\S\s]*)\S*=?"
$tokenhash.Add($item,"$Matches")
out-host -InputObject $Matches.1
}
I was not sure if there was a way to use the token list in the regex so that it checks each token word and stops collecting at any of the tokens.
Example:
$match= $message -match "$item=([\S\s]*)where="
$match= $message -match "$item=([\S\s]*)when="
I hope I explained that OK. I'm a horrible communicator. Right now I'm trying to use \S*=? to capture wherever the next pair starts. The pairs may not arrive in the same order.
If the key is always just a single word, you could do with two simple statements and a for loop:
# Split into alternating groups of key/value
$tokens = $message -split '([\S]*)=' |Where-Object { $_ }
# Populate dictionary:
for($i = 0; $i -lt $tokens.Count; $i += 2){
$tokenhash[$tokens[$i]] = $tokens[$i+1]
}
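A self-contained sketch of the same idea on a shortened sample message. It assumes that keys are single words and that values never contain "="; the .Trim() call is an addition, included because each value otherwise keeps the space that preceded the next key.

```powershell
# Shortened sample message (an assumption for illustration).
$message = "WHERE=bmc3423 WHEN=02/05/2015 4:34 pm SEVERITY=WARNING MESSAGE=happy days are here"
$tokenhash = @{}

# The capture group makes -split return the keys as well as the values;
# Where-Object drops the empty string produced before the first key.
$parts = $message -split '(\S*)=' | Where-Object { $_ }

# Elements alternate: key, value, key, value, ...
for ($i = 0; $i -lt $parts.Count; $i += 2) {
    $tokenhash[$parts[$i]] = $parts[$i + 1].Trim()
}

$tokenhash['WHEN']
```

Hashtable keys created from literals are case-insensitive, so $tokenhash['when'] retrieves the same value.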

Powershell Test if array in one line

I do a get-content on a file. Sometimes there are a lot of lines, but it can happen there is only one line (or even 0)
I was doing something like
$csv = (gc $FileIn)
$lastID = $csv[0].Split(' ')[1] #First Line,2nd column
But with only one line, gc returns a string, so $csv[0] returns the first character of the string instead of the complete line, and the following Split fails.
Is it possible to do something like :
$lastID = (is_array($csv)?$csv[0]:$csv).Split(' ')[1]
And to do that only if $csv contains at least a line?
Thx for your help,
Tim
There are type operators one can use to test the type of a variable. -is is the one you need. Like so,
$foo = @() # Array
$bar = "zof" # String
$foo -is [array] # Is foo an array?
True # Yes it is
$foo -is [string] # Is foo a string?
False # No it is not
$bar -is [array] # How about bar
False # Nope, not an array
$bar -is [string] # A string then?
True # You betcha!
So something like this could be used
if($csv -is [array]) {
# Stuff for array
} else {
# Stuff for string
}
Instead of doing:
$csv = (gc $FileIn)
you should do
$csv = @(gc $FileIn)
Now the output will always be an array of strings irrespective of the file having one line or not. The rest of the code will just have to treat $csv as an array of strings. This way is better than having to check if the output is an array etc., at the least in this situation.
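A brief sketch of the difference, using in-memory stand-ins rather than a real file (the sample line and variable names are assumptions). It also covers the "at least one line" part of the question with a Count check:

```powershell
# Stand-in for a one-line file: Get-Content would return a single string.
$singleLine = 'ROW 1234 other'
$firstChar = $singleLine[0]   # indexing a string yields a single character

# @() guarantees an array, whether there are zero, one or many lines.
$csv = @($singleLine)
$lastID = if ($csv.Count -gt 0) { $csv[0].Split(' ')[1] }

# Zero lines: the guard skips the Split call and $emptyID stays $null.
$empty = @()
$emptyID = if ($empty.Count -gt 0) { $empty[0].Split(' ')[1] }
```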

Resources