Powershell Parsing Help - Intermediate-sortaIntermediate Question - arrays

I have a text file with the following format:
Rtmp: 2a1234bzcde9 SL ID: 1549566 IP: 192.168.0.1
Rtmp: 45a1234b4erde9 SL ID: 1549566 IP: 192.168.0.1
Rtmp: a1254bcde9 SL ID: 1549566 IP: 192.168.0.1
Rtmp: 23a1ft4bcde9 SL ID: 1549566 IP: 192.168.0.1
Rtmp: a125egbcde9 SL ID: 1549566 IP: 192.168.0.1
I have several hundred entries involved here.
The problem is that I need to get the second entry in each line of text (in between Rtmp and SL). Each of those numbers is between 6 and 15 characters long, all random characters.
How would I pull it into an array of those numbers? I want to use these numbers to make them the names of user accounts in 2008r2 (not using AD) and create folders in the inetpub/ftproot and create the folders in there with aliases and link each account to its corresponding virtual folder.
The last bit I can do.. it's manipulating text files that I suck at!! Here's what I've got so far:
$items = Get-Content $HOME/desktop/info.txt
$splitItems = @()
$splitItems = foreach ($item in $items)
{
$item.split(" ")
}
That splits each line, so that splitItems[0] is the first line of text, now split into multiple lines of text because of the space delimiter.
When I tried to take the splitItems array and use the same type of foreach to split it further, it gave me back an array of chars. USEFUL LOL haha. Well.. each way I try, I keep getting mumbo jumbo or it's not a string type (though GetType seems to say it's a string). I think in the process the string type changes to a generic io.object type?
Any ideas or help would be immensely appreciated.!!!

The regex answers may be the right way to go, but if your input file is really as consistently formatted as you suggest, you may have been on the right track to begin with. Try something like this:
$accounts = gc info.txt | % { ($_ -split ' ')[1] }
$accounts

Given your text file is "c:\temp\t.txt", can you test this :
switch -regex -file "c:\temp\t.txt"
{
"^Rtmp: (.*) SL.*$" {$matches[1]}
}

$a = "Rtmp: 2a1234bzcde9 SL ID: 1549566 IP: 192.168.0.1 Rtmp: 45a1234b4erde9 SL ID: 1549566 IP: 192.168.0.1 Rtmp: a1254bcde9 SL ID: 1549566 IP: 192.168.0.1 Rtmp: 23a1ft4bcde9 SL ID: 1549566 IP: 192.168.0.1 Rtmp: a125egbcde9 SL ID: 1549566 IP: 192.168.0.1"
[regex]$regex = "Rtmp:\s(\S+)\sSL"
[regex]::matches($a,$regex) | foreach-object {$_.groups[1].value}
2a1234bzcde9
45a1234b4erde9
a1254bcde9
23a1ft4bcde9
a125egbcde9

If you want to extract one part from a string, and this string is following a defined pattern, simply use '-replace' and a regular expression.
$numbers = ($items -replace '^Rtmp: (\S+) SL.*$','$1')
This line will iterate through each item, extract the string and output it to the new array.
Some meanings:
\S+ means "One or more characters that are not whitespace"
$1 means "The part within the first set of brackets"

Related

Cache file content and then extract matches using regex

Please forgive a bash newbie for any silly questions.
I am really stuck here and I would love to know how this works and what I am doing wrong.
I have written this script which is supposed to capture syslog server based on protocol.
The input is as follows:
sys syslog {
include "destination remote_server {tcp(\"10.1.0.100\" port (514));tcp(\"192.168.1.5\" port (514));udp(\"192.168.1.60\" port (514));};filter f_alllogs {level (debug...emerg);};log {source(local);filter(f_alllogs);destination(remote_server);};"
remote-servers {
mysyslog {
host 192.168.1.1
}
remotesyslog1 {
host 192.168.1.2
}
remotesyslog2 {
host 192.168.1.3
local-ip 10.0.0.50
}
}
}
From this I would like to get something like in the end:
tcp=10.1.0.100
tcp=192.168.1.50
udp=192.168.1.60
udp=192.168.1.1
udp=192.168.1.2
udp=192.168.1.3
So I started with a bash script to parse the output.
#!/bin/bash
#Save output to file
syslogoutput=$(< /home/patrik/input)
echo "Testing variable:"
echo $syslogoutput
echo ""
#Declare array
tcpservers=()
echo $syslogoutput | while read line ; do
matches=($(echo $line | grep -Po '(tcp\("[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")'))
#If number of matches is greater than 0, save them to tcpservers
if [ ${#matches[@]} -gt 0 ]; then
tcpservers=("${matches[@]}")
#Echoing matches
echo "Testing matches in loop:"
for i in "${matches[@]}"; do
echo $i
done
fi
done
echo "Testing output:"
for i in "${tcpservers[@]}"; do
echo $i
done
I expected something like this:
...input file separated by line breaks
Testing matches in loop:
tcp("10.1.0.100"
tcp("192.168.1.5"
Testing output:
tcp("10.1.0.100"
tcp("192.168.1.5"
But instead I get:
sys syslog { include "destination remote_server {tcp(\"10.1.0.100\" port (514));tcp(\"192.168.1.5\" port (514));udp(\"192.168.1.60\" port (514));};filter f_alllogs {level (debug...emerg);};log {source(local);filter(f_alllogs);destination(remote_server);};" remote-servers { mysyslog { host 192.168.1.1 } remotesyslog1 { host 192.168.1.2 } remotesyslog2 { host 192.168.1.3 local-ip 10.0.0.50 } } }
Testing matches in loop:
tcp("10.1.0.100"
tcp("192.168.1.5"
Testing output:
So on to my questions:
Why isn't tcpservers=("${matches[@]}") working?
Why isn't the output cached with line breaks?
Why does bash scripting make me want to jump from a tall building every time I try it?
/Patrik
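Don't pipe into the loop: a pipe runs the loop in a subshell, and variables from a subshell don't propagate into the parent shell. Feed the loop from a here-string instead. Quoting "$syslogoutput" also preserves the line breaks, which the unquoted echo $syslogoutput collapses into spaces.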
while read line ; do
# ...
done <<< "$syslogoutput"
You also overwrite the tcpservers on each iteration. Change the assignment to
tcpservers+=("${matches[@]}")
# ^
# |
# add to an array, don't overwrite
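The subshell effect described above can be seen in isolation with a minimal sketch. The sample data and the array names piped and kept are made up for illustration; the behaviour shown assumes bash with its default settings (the lastpipe option not enabled).

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the cached file contents (three lines).
syslogoutput=$'tcp("10.1.0.100"\ntcp("192.168.1.5"\nudp("192.168.1.60"'

# Variant 1: piping into the loop. The loop body runs in a subshell,
# so the additions to the array are lost once the pipeline ends.
piped=()
echo "$syslogoutput" | while IFS= read -r line; do
  piped+=("$line")
done

# Variant 2: feeding the loop from a here-string. The loop runs in the
# current shell, so the array keeps its elements after the loop.
kept=()
while IFS= read -r line; do
  kept+=("$line")
done <<< "$syslogoutput"

echo "after pipe:        ${#piped[@]} element(s)"
echo "after here-string: ${#kept[@]} element(s)"
```

Because the variable is quoted in both variants, read sees one line per iteration rather than one long line.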

Getting "The second parameter of split is a string, not an array" when processing input data

I'm new to this (first post!) and have been searching for a similar question for a number of days but cannot for the life of me find something similar enough to help (or cannot recognise it if it is there). If I'm wrong to post this question, please direct me to the right place :)
So I've just started using Perl, and I need to take some input:
xxx00_xxxx_xxx00 and split the string into bits separated by the '_'. This I have done using arrays.
problem is now I want to assign the values from the arrays into variables so I can use them.
Here is my novice code:
#!/usr/bin/perl
use strict;
use warnings;
# take input
print "enter ID:\n";
my $input = <>;
#split input
my @values = split( '_', $input );
print "$values[0]\n";
print "$values[1]\n";
print "$values[2]\n";
#assign split components to variables
my $customer_ID = $values[0];
my $service_ID = $values[1];
my $ipac_ID = $values[2];
#print components
print "Customer ID: $customer_ID";
print "Service ID: $service_ID";
print "IPAC ID: $ipac_ID";
exit 0;
BTW I know I have errors on line 19 (I can't figure that out either...)
any help appreciated - thanks!
The message The second parameter of split is a string, not an array is from the Padre IDE you are using, not from Perl. It is produced by Padre's "Beginner's Mode", which naïvely checks whether there is an @ between the text split and the next semicolon, even across multiple lines, and if so then it assumes that you have made a mistake. Of course that approach doesn't allow for comments.
The only errors you have - if you can call them errors - are
You leave the newline in place at the end of your input string, so it will look like "xxx00_xxxx_xxx00\n" and split will produce "xxx00", "xxxx", "xxx00\n". A simple chomp $input will remove that newline
You don't print a newline after each scalar value, so your output is all on one line and your session looks like this
enter ID:
xxx00_xxxx_xxx00
xxx00
xxxx
xxx00
Customer ID: xxx00Service ID: xxxxIPAC ID: xxx00
And all you need to do there is add a newline on each print (like you did with your earlier print statements) like this
print "Customer ID: $customer_ID\n";
print "Service ID: $service_ID\n";
print "IPAC ID: $ipac_ID\n";
A tidier version of your program with more obvious choices would look like this
#!/usr/bin/perl
use strict;
use warnings;
# take input
print 'Enter ID: ';
my $input = <>;
chomp $input;
# Split input
my @values = split /_/, $input;
print "$_\n" for @values;
# Assign split components to variables
my ($customer_ID, $service_ID, $ipac_ID) = @values;
#print components
print "Customer ID: $customer_ID\n";
print "Service ID: $service_ID\n";
print "IPAC ID: $ipac_ID\n";
You can clean up your code a bit like this:
print "enter ID: ";
chomp(my $input = <STDIN>);
my ($customer_ID, $service_ID, $ipac_ID) = split(/_/, $input);
print "Customer ID: $customer_ID ";
print "Service ID: $service_ID ";
print "IPAC ID: $ipac_ID\n";
Customer ID: xxx00 Service ID: xxxx IPAC ID: xxx00
You may prefer this:
my ($customer_ID, $service_ID, $ipac_ID) = split('_', $input);
Are you taking input from the keyboard?
If yes, try replacing $input=<> with $input=<STDIN>.
By the way, what is the error you are receiving?

Shell script - awk extract block from file into array

I'm currently writing a shell script that reads a Vagrantfile and bootstraps it (in a nutshell ;) )
But I'm hitting a wall with the following piece of code:
TEST=()
while read result; do
TEST+=(`echo ${result}`)
done <<< `awk '/config.vm.define[ \s]\"[a-z]*\"[ \s]do[ \s]\|[a-zA-Z_]*\|/, /end/ { print }' Vagrantfile`
echo "${TEST[1]}"
When I pass a Vagrantfile into this awk pattern regex with two machines defined (config.vm.define) in it they are found.
The output
config.vm.define "web" do |web|
web.vm.box = "CentOs"
web.vm.box_url = "http://developer.nrel.gov/downloads/vagrant-boxes/CentOS-6.4-x86_64-v20130731.box"
web.vm.hostname = 'dev.local'
web.vm.network :forwarded_port, guest: 90, host: 9090
web.vm.network :private_network, ip: "22.22.22.11"
web.vm.provision :puppet do |puppet|
puppet.manifests_path = "puppet/manifests"
puppet.manifest_file = "web.pp"
puppet.module_path = "puppet/modules"
puppet.options = ["--verbose", "--hiera_config /vagrant/hiera.yaml", "--parser future"]
end
config.vm.define "db" do |db_mysql|
db_mysql.vm.box = "CentOs"
db_mysql.vm.box_url = "http://developer.nrel.gov/downloads/vagrant-boxes/CentOS-6.4-x86_64-v20130731.box"
db_mysql.vm.hostname = 'db.mysql.local'
db_mysql.vm.network :private_network, ip: "22.22.22.22"
db_mysql.vm.network :forwarded_port, guest: 3306, host: 3306
db_mysql.vm.provision :puppet do |puppet|
puppet.manifests_path = "puppet/manifests"
puppet.manifest_file = "db.pp"
puppet.module_path = "puppet/modules"
puppet.options = ["--verbose", "--hiera_config /vagrant/hiera.yaml", "--parser future"]
end
But I can't seem to pass them into a array nicely. What I want is that the TEST array contains two indexes with the machine config.vm.define block as their corresponding values.
E.g.
TEST[0] = 'config.vm.define "web" do |web|
.... [REST OF THE BLOCK CONTENT] ...
end'
TEST[1] = 'config.vm.define "db" do |db_mysql|
.... [REST OF THE BLOCK CONTENT] ...
end'
The output echo "${TEST[1]}" is nothing. echo "${TEST[0]}" returns the whole block as plotted above.
I played with IFS / RS / FS but I can't seem to get the output I want.
A solution might be to write the two blocks to two separate files (blk1 and blk2) as:
awk '
/config.vm.define[[:space:]]\"[a-z]*\"[[:space:]]do[[:space:]]\|[a-zA-Z_]*\|/{f=1; i++}
f{print $0 > "blk"i}
/end/ {f=0}' Vagrantfile
and then later read these two files into the bash array as
IFS= TEST=( $(cat <"blk1") $(cat <"blk2") )
Note:
The regex \s seems to work only in recent versions of gawk (it works with version 4.1, but not with version 3.1.8).
For gawk version 3.1.8, use [[:space:]] instead.
For gawk version 4.1, the regex \s does not work inside brackets ([\s]). Use either config.vm.define[[:space:]] or config.vm.define\s.
Update
An alternative could be to insert an artificial separator between the blocks, for instance the string ###. Then you could do
IFS= TEST=()
while IFS= read -r -d '#' line ; do
TEST+=($line)
done < <(awk '
/config.vm.define[[:space:]]\"[a-z]*\"[[:space:]]do[[:space:]]\|[a-zA-Z_]*\|/{f=1; i++}
f{print }
/end/ {f=0; print "###"}' Vagrantfile)
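If temporary files or an artificial separator are not wanted, the blocks can also be collected in pure bash. This is only a sketch under assumptions: the sample text stands in for a real Vagrantfile, the names blocks, current and inside are made up, and a block is taken to end at the first line that consists solely of "end" (slightly stricter than the /end/ pattern used above).

```bash
#!/usr/bin/env bash
# Hypothetical, shortened stand-in for a Vagrantfile.
sample='Vagrant.configure("2") do |config|
  config.vm.define "web" do |web|
    web.vm.box = "CentOs"
  end
  config.vm.define "db" do |db_mysql|
    db_mysql.vm.box = "CentOs"
  end
end'

blocks=()     # one array element per config.vm.define block
current=''    # text of the block being accumulated
inside=0      # 1 while between a define line and its closing "end"

while IFS= read -r line; do
  if [[ $line == *config.vm.define* ]]; then
    inside=1
    current=''
  fi
  if (( inside )); then
    current+="$line"$'\n'
    # Close the block at the first line that is just "end".
    if [[ $line =~ ^[[:space:]]*end[[:space:]]*$ ]]; then
      blocks+=("${current%$'\n'}")   # store without the trailing newline
      inside=0
    fi
  fi
done <<< "$sample"

printf 'Found %d blocks\n' "${#blocks[@]}"
printf '%s\n' "${blocks[1]}"
```

Each element keeps its embedded newlines, so "${blocks[1]}" must stay quoted wherever it is used.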

bash string to array with spaces and extra delimiters

I'm trying to create arrays from strings that have pipe ("|") as delimiters and include spaces. I've been looking around for a while and I've gotten close thanks to sources like How do I split a string on a delimiter in Bash?, Splitting string into array and a bunch more. I'm close but it's not quite working. The two main problems are that there are spaces in the strings, there are starting and ending delimiters, and some of the fields are blank. Also, instead of just echoing the values, I need to assign them to variables.
Here's the format of the source data:
|username|full name|phone1|phone2|date added|servers|comments|
Example:
|jdoe | John Doe| 555-1212 | |1/1/11 | workstation1, server1 | added by me |
Here's what I need:
Username: jdoe
Fullname: John Doe
Phone1: 555-1212
Phone2:
Date_added: 1/1/11
Servers: workstation1, server1
Comments: guest account
Edit: I use sed to strip out the first and last delimiter and spaces before and after each delimiter, input is now:
jdoe|John Doe|555-1212||1/1/11|workstation1, server1|added by me
Here's things I've tried:
oIFS="$IFS"; IFS='|'
for line in `cat $userList`; do
arr=("$line")
echo "Username: ${arr[0]}" #not assigning a variable, just testing the output
echo "Full Name: ${arr[1]}"
echo "Phone 1: ${arr[2]}"
echo "Phone 2: ${arr[3]}"
# etc..
done
IFS="$oIFS"
Output:
Username:
Full Name:
Phone 1:
Phone 2:
Username: jdoe
Full Name:
Phone 1:
Phone 2:
Username: John Doe
Full Name:
Phone 1:
Phone 2:
Another thing I tried:
for line in `cat $userList`; do
arr=(${line//|/ })
echo "Username: ${arr[0]}"
echo "Full Name: ${arr[1]}"
echo "Phone 1: ${arr[2]}"
echo "Phone 2: ${arr[3]}"
# etc
done
Output:
Username: jdoe
Full Name: John
Phone 1:
Phone 2:
Username: Doe
Full Name: 555-1212
Phone 1:
Phone 2:
Any suggestions? Thanks!!
Your first attempt is pretty close. The main problems are these:
for line in `cat $userList` splits the file by $IFS, not by line-breaks. So you should set IFS=$'\n' before the loop, and IFS='|' inside the loop. (By the way, it's worth noting that the for ... in `cat ...` approach reads out the entire file and then splits it up, so this isn't the best approach if the file can be big. A read-based approach would be better in that case.)
arr=("$line"), by wrapping $line in double-quotes, prevents word-splitting, and therefore renders $IFS irrelevant. It should just be arr=($line).
Since $line has a leading pipe, you either need to strip it off before you get to arr=($line) (by writing something like line="${line#|}"), or else you need to treat arr as a 1-based array (since ${arr[0]}, the part before the first pipe, will be empty).
Putting it together, you get something like this:
oIFS="$IFS"
IFS=$'\n'
for line in `cat $userList`; do
IFS='|'
arr=($line)
echo "Username: ${arr[1]}" #not assigning a variable, just testing the output
echo "Full Name: ${arr[2]}"
echo "Phone 1: ${arr[3]}"
echo "Phone 2: ${arr[4]}"
# etc..
done
IFS="$oIFS"
(Note: I didn't worry about the fields' leading and trailing spaces, because of the "I can do that step separately" part . . . or did I misunderstand that? Do you need help with that part as well?)
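The effect of the quotes and of the leading pipe can be checked on a single line. This is a small sketch with a made-up sample line; old_ifs, quoted and unquoted are illustrative names, and set -f is added only so that the unquoted expansion cannot be glob-expanded.

```bash
#!/usr/bin/env bash
# Made-up sample line with a leading pipe and one empty field.
line='|jdoe|John Doe|555-1212||1/1/11'

line="${line#|}"        # strip the leading pipe (no $ on the left-hand side)

old_ifs=$IFS
IFS='|'
quoted=("$line")        # quotes suppress word-splitting: a single element
set -f                  # turn off globbing while expanding unquoted
unquoted=($line)        # split on IFS: one element per field
set +f
IFS=$old_ifs

echo "quoted:   ${#quoted[@]} element(s)"
echo "unquoted: ${#unquoted[@]} element(s)"
printf '[%s]\n' "${unquoted[@]}"
```

The empty phone2 field survives as an empty element because "|" is not a whitespace character, so two adjacent pipes delimit an empty field.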
IFS='|'
while read username fullname phone1 phone2 dateadded servers comments; do
printf 'username: %s\n' "$username"
printf 'fullname: %s\n' "$fullname"
printf 'phone1: %s\n' "$phone1"
printf 'phone2: %s\n' "$phone2"
printf 'date added: %s\n' "$dateadded"
printf 'servers: %s\n' "$servers"
printf 'comments: %s\n' "$comments"
done < infile.txt
Another solution:
shopt -s extglob
infile='user.lst'
declare -a label=( "" "Username" "Full Name" "Phone 1" "Phone 2" )
while IFS='|' read -a fld ; do
for (( n=1; n<${#label[@]}; n+=1 )); do
item=${fld[n]}
item=${item##+([[:space:]])}
echo "${label[n]}: ${item%%+([[:space:]])}"
done
done < "$infile"
Leading and trailing blanks will be removed.
Using arrays and paste. Doesn't account for empty fields since OP said it's not a requirement.
userList='jdoe|John Doe|555-1212||1/1/11|workstation1, server1|added by me'
fields=("Username: " "Full Name: " "Phone 1: " "Phone 2: " "Date_added: " "Servers: " "Comments: ")
IFS='|' read -ra data <<<${userList}
paste <(IFS=$'\n'; echo "${fields[*]}") <(IFS=$'\n'; echo "${data[*]}")
Username: jdoe
Full Name: John Doe
Phone 1: 555-1212
Phone 2:
Date_added: 1/1/11
Servers: workstation1, server1
Comments: added by me
Use column if available to you.
readarray -t my_vals <<< $(seq 5)
echo "${my_vals[@]}" #1 2 3 4 5
column -to: <<< "${my_vals[@]}" #1:2:3:4:5
-t = Table Output
-o = Output Delimiter (set to ':' here)

Bash: can't read out strings with spaces after looping through array of strings

I am using a loop to read the contents of an array, which contains all of the directories and files in the directory hierarchy called 'music' (contents are strings from the previous output of 'find' command). The idea is to separate the full directory path of each array element in "directory_contents" into substrings according to genre, artist, and title. Since my music directory is sorted first by genre, then by artist, then by title, I am grabbing each relevant item using awk where the delimiter "/" shows up. For example, if the directory looks like this after using find "./Electronic/Squarepusher/My Red Hot Car.aif", I will separate "Electronic", "Squarepusher", and "My Red Hot Car", then store them each in separate arrays for genre, artist, and title. Later I will sort these data, then pipe the sorted output into another utility to print all the directory contents in a nice looking table (haven't done this yet). For now, I am just trying to view the results of the string separation with echo statements, and for the most part it seems to work. However, I can't seem to extract substrings which contain spaces, which isn't good:
-->./Hip-Hop/OutKast/title1.aif<--
Genre:
Hip-Hop
Artist:
OutKast
Title:
title1
-->./Hip-Hop/OutKast/title2.aif<--
Genre:
Hip-Hop
Artist:
OutKast
Title:
title2
-->./Hip-Hop/OutKast/title3.aif<--
Genre:
Hip-Hop
Artist:
OutKast
Title:
title3
-->./Jazz/John<--
Genre:
Jazz
Artist:
John
Title:
-->Coltrane/title1.aif<--
Genre:
title1.aif
Artist:
Title:
As you can see, when the loop reads in the string "John Coltrane", it is treating the space as a delimiter, and treating everything after "John" as a different filename. I tried looking for a solution in the bash manual under the section "Arrays" as well as other posts here, but couldn't find a solution that worked for my specific problem (sorry). If anyone has ideas, they would be greatly appreciated. The problematic code appears below, in the for loop (I didn't post the whole script because it is pretty lengthy, but let me know if it is needed):
#more before this...
#declare variables
declare -a genre_list
declare -a title_list
declare -a artist_list
declare -a directory_contents
#populate directory with contents
cd $directory
directory_contents=$(find . -mindepth 1 -type f)
cd ..
for music_file in ${directory_contents[*]}; do
if [[ $DEBUG = "true" ]] ; then
echo "-->$music_file<--"
fi
echo "Genre:"
echo $music_file | awk -F"/" '{print $2}'
echo "Artist:"
echo $music_file | awk -F"/" '{print $3}'
echo "Title:"
echo $music_file | awk -F"/" '{print $4}' | awk -F"." '{print $1}'
echo ""
done
Why don't you simply do it in single line:
cd $directory && \
find . -mindepth 3 -maxdepth 3 -type f | \
awk -F'/' '{split($4,A,".aif"); printf "Genre: %s\nArtist: %s\nTitle: %s\n\n",$2,$3,A[1];}'
Update: (removed the .aif from the Title part)
If you can, you should use Perl for this task:
#! /usr/bin/perl
foreach my $fname (<*/*/*>) {
next unless -f $fname;
next unless $fname =~ m"^([^/]+)/([^/]+)/([^/.]+)\.\w+$";
my ($genre, $artist, $title) = ($1, $2, $3);
printf("Genre: %s\nArtist: %s\nTitle: %s\n\n", $genre, $artist, $title);
}
It is faster and simpler than the shell, and it is immune against whitespace in file names.
MUSICDIR="/some/path"
cd "$MUSICDIR"
find . -type f -print | (IFS=/;while read dot genre artist title
do
echo =$genre= =$artist= =$title=
done)
You might try setting the delimiter that bash uses. Perhaps to a newline character?
IFS=$'\n'
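To fill the three arrays from the question without tripping over spaces, one possibility is to combine the ideas above: let find delimit names with NUL bytes and split each path on "/" with read. This is only a sketch; the temporary directory and its two files are made up so that it runs on its own, and it assumes a find that supports -print0.

```bash
#!/usr/bin/env bash
# Build a tiny throwaway music tree so the sketch is self-contained.
music_dir=$(mktemp -d)
mkdir -p "$music_dir/Jazz/John Coltrane" "$music_dir/Electronic/Squarepusher"
touch "$music_dir/Jazz/John Coltrane/title1.aif" \
      "$music_dir/Electronic/Squarepusher/My Red Hot Car.aif"

genre_list=()
artist_list=()
title_list=()

# -print0 and read -d '' delimit file names with NUL bytes,
# so spaces (and even newlines) in names are harmless.
while IFS= read -r -d '' music_file; do
  # Split "./Genre/Artist/Title.ext" on "/"; the leading "." goes to _.
  IFS=/ read -r _ genre artist title <<< "$music_file"
  genre_list+=("$genre")
  artist_list+=("$artist")
  title_list+=("${title%.*}")   # drop the file extension
done < <(cd "$music_dir" && find . -mindepth 3 -maxdepth 3 -type f -print0)

for i in "${!title_list[@]}"; do
  printf 'Genre: %s\nArtist: %s\nTitle: %s\n\n' \
    "${genre_list[i]}" "${artist_list[i]}" "${title_list[i]}"
done

rm -rf "$music_dir"
```

The order of the entries follows whatever order find reports, so sort the arrays afterwards if a fixed order matters.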
