bash script to write in specific part of xml file - arrays

My problem is that the code below appends the result to the end of the file, but I want to insert it in a specific place.
#!/bin/bash -x
USER_ID=( User1 User2 User3 )
USER_CONF=/opt/test/config.xml
for i in "${USER_ID[@]}"; do
printf '<user><id>ID</id><name><%s/name></user>\n' "$i" >> "$USER_CONF"
done
What I get now in config.xml is:
<company="external">
<enabled>true</enabled>
<users="allowed">
USER_TO_INSERT_HERE
</users>
</company>
<user><id>ID</id><name><User1/name></user>
<user><id>ID</id><name><User2/name></user>
<user><id>ID</id><name><User3/name></user>
What I want to get after the script execution in config.xml is:
<company="external">
<enabled>true</enabled>
<users="allowed">
<user><id>ID</id><name><User1/name></user>
<user><id>ID</id><name><User2/name></user>
<user><id>ID</id><name><User3/name></user>
</users>
</company>
Do you know how I can record the values from the for loop in a variable, and then just sed that variable into the file?
I know how to do the sed part, but I don't know how to record the values in the variable.

First of all, <users="allowed"> is not a valid XML node. This should probably be something like <users permission="allowed">.
Please use an XML parser like xidel to edit your 'config.xml'.
With "direct element constructors":
$ xidel -s config.xml -e '
x:replace-nodes(
//users,
<users permission="allowed">{
for $user in ("User1","User2","User3") return
<user><id>ID</id><name>{$user}</name></user>
}</users>
)
' --output-node-format=xml --output-node-indent
With "computed constructors":
$ xidel -s config.xml -e '
x:replace-nodes(
//users,
function($x){
element {$x/name()} {
$x/@*,
for $user in ("User1","User2","User3") return
element user {
element id {"ID"},
element name {$user}
}
}
}
)
' --output-node-format=xml --output-node-indent
Output:
<company="external">
<enabled>true</enabled>
<users permission="allowed">
<user>
<id>ID</id>
<name>User1</name>
</user>
<user>
<id>ID</id>
<name>User2</name>
</user>
<user>
<id>ID</id>
<name>User3</name>
</user>
</users>
</company="external">
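To actually update config.xml rather than just print the transformed document, a minimal sketch (assuming either xidel invocation above; the '…' stands for the query) is to redirect to a temporary file and move it into place:
$ xidel -s config.xml -e '…' --output-node-format=xml --output-node-indent > config.xml.tmp && mv config.xml.tmp config.xml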

Lots of great answers over at unix.stackexchange.com.
The canonical answer for this sort of case (NOTE: not for XML in general in all cases, but for a file where there's a TOKEN on a line on its own to be replaced - which is exactly the case you have given) is -
a) Output the part of the file "up to" the line before the line with the token
b) Output the replacement
c) Output the rest of the file "from" the line after the line with the token
e.g. here's the simple sed variant (which is nowhere near as elegant as the r option to sed; see the sketch below) -
sed -e '/USER_TO_INSERT_HERE/,$ d' source.xml
cat replacement.xml
sed -e '1,/USER_TO_INSERT_HERE/ d' source.xml
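For reference, a minimal sketch of that more elegant r variant (a standard sed idiom): when the token line matches, read in the replacement file and delete the token line itself -
sed '/USER_TO_INSERT_HERE/{
r replacement.xml
d
}' source.xml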

Related

Search and delete links in markdown files

I run a linkchecker over my site from time to time, and the external links that 404 are saved to a logfile.
Now I'm trying to delete those links from the markdown files automatically. I use multilingual websites, so I start by reading the logfile into an array.
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[@]}"; do
grep -r $i content/* | sed -e 's/([^()]*)//g'
done
This command deletes the link and title in (), but the [Example Text] remains. I'm looking for a way to also remove the [] so that in the end I only get Example Text.
Now:
[Example Text](http://example.com "Example Title")
Desired result:
Example Text
Assumptions
The i in for i in "${link[@]}" will evaluate to a link like "http://example.com" on each loop
The format of every section in your markdown file we care about will take on the form you described [Example Text](http://example.com "Example Title")
The code
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[@]}"; do
grep -ro "\[.*\].*${i}" content/* | grep -o '\[.*\]' | tr -d '[]'
done
Explanation
grep -ro "\[.*\].*${i}" content/*:
Recursive search to run on all files in a dir: grep -r ... content/*
Print only the text that applies to our regex: grep -o
Print anything that starts with [ followed by anything .* then a ] followed by the value of our loop variable ${i} (The current link): "\[.*\].*${i}"
From that output all we want is "Example Text" which lives between the brackets, so anything not between brackets needs to go grep -o '\[.*\]'
Finally, we want to remove those pesky brackets: tr -d '[]'
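As a quick sanity check, you can run the pipeline by hand on a file containing just the sample line (content/page.md is a hypothetical file name here):
$ echo '[Example Text](http://example.com "Example Title")' > content/page.md
$ grep -ro "\[.*\].*http://example.com" content/* | grep -o '\[.*\]' | tr -d '[]'
Example Text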
The immediate fix is to extend your sed regex.
sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
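For instance, on the sample line from the question:
$ echo '[Example Text](http://example.com "Example Title")' | sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
Example Text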
But probably a much better fix is to replace all the links from the Awk script across content in a single go.
find content -type f -exec sed -i 's%\[\([^][]*\)\](\('"$(
awk 'NR>1 { printf "\|" }
{ printf "%s", $7 }' "$ext")"'\))%\1%g' {} +
The Awk script produces a long regex like
http://one.example.net/nosuchpage\|http://two.exampe.org/404\|https://three.example.com/broken-link
from all the links in the input, and the sed script then replaces any links which match this regex in the parentheses after the square brackets. (Maybe you'll want to extend this to also permit a quoted string after the link before the closing round parenthesis, like in your example; I feel I am already guessing too many things about what you are actually hoping to accomplish.)
If you are on a *BSD platform (including MacOS) you'll need to add an empty string argument after the -i argument, like sed -i '' 's%...

How can I edit my code so that I can account for output that goes against my sed command

I am writing code to output the species name matched from a remote NCBI BLAST database, along with the file the matched name came from. I want to make my code more robust so that it can deal with files that do not get a match and that break my current sed command.
#!/bin/bash
for i in ./split.contigs.Parsed/*.csv ; do
sciname=$(head -1 $i | sed -E "s/([A-Z][a-z]+ [a-z]+) .+/\1/")
contigname=$(echo $i | sed -E "s/.fa.csv//" | sed -E "s/\.\/split.contigs.Parsed\///")
echo "$sciname,$contigname"
done
Expected
Drosophila melanogaster,contig_66:1.0-213512.0_pilon
Drosophila melanogaster,contig_67:1.0-138917.0_pilon
Drosophila sechellia,contig_67:139347.0-186625.0_pilon
Drosophila melanogaster,contig_68:3768.0-4712.0_pilon
Actual
Drosophila ananassae,contig_393:1.0-13214.0_pilon
,contig_393:13217.0-13563.0_pilon
Drosophila sp. pallidosa-like-Wau w,contig_393:14835.0-18553.0_pilon
Apteryx australis,contig_393:19541.0-21771.0_pilon
,contig_393:21780.0-22772.0_pilon
Drosophila sp. pallidosa-like-Wau w,contig_393:22776.0-31442.0_pilon
Drosophila melanogaster,contig_394:1.0-89663.0_pilon
Simply skip the loop if $sciname is null. Put this one line after defining $sciname:
[[ -z $sciname ]] && continue
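In context, a sketch of the loop from the question with the guard in place (paths exactly as in the original):
#!/bin/bash
for i in ./split.contigs.Parsed/*.csv ; do
sciname=$(head -1 "$i" | sed -E "s/([A-Z][a-z]+ [a-z]+) .+/\1/")
# skip files whose first line yields no genus/species match
[[ -z $sciname ]] && continue
contigname=$(echo "$i" | sed -E "s/.fa.csv//" | sed -E "s/\.\/split.contigs.Parsed\///")
echo "$sciname,$contigname"
done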

Using sed to remove line breaks after execute command and save it on array

I'm working on some inventory stuff and I'm trying to save all the AWS regions in one array, then show the elements one under another to use as an input menu.
The next command gives me the right output, but when I walk through the array with a for loop, the array length is just 1, because the result is:
aws ec2 describe-regions --output text|awk -F\t '{print $3}'| sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
eu-north-1 ap-south-1 eu-west-3 eu-west-2 eu-west-1 ap-northeast-2
ap-northeast-1 sa-east-1 ca-central-1 ap-southeast-1 ap-southeast-2
eu-central-1 us-east-1 us-east-2 us-west-1 us-west-2
This is how I'm filling the array:
# Get regions
declare -a regions=$(aws ec2 describe-regions --output text | awk -F\t '{print $3}' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /')
echo -e "\nPlease, select the region you would like to query: "
# Print Regions
len=${#regions[@]}
last=$((len+1))
for (( i=0; i<$len; i++ )); do
echo -e "$i.${regions[$i]}\n" ;
done
echo -e "$last All of them (this could take a while...O_o)\n"
read region_opt
if [${region_opt}!=${last}] then
region=(${regions[$region_opt]})
What i want to have in the output is something like
eu-north-1
ap-south-1
eu-west-3 ....
You're missing parentheses around your array values, e.g.,
declare -a ARRAY=(value1 value2 ... valueN)
(refs: https://www.tldp.org/LDP/Bash-Beginners-Guide/html/sect_10_02.html, https://www.gnu.org/software/bash/manual/bash.html)
The following forms also work, and the first (without declare -a) is given as an example in the GNU's Bash reference manual, the Bash guide for beginners, and the Advanced bash-scripting guide:
ARRAY=(value1 value2 ... valueN)
declare ARRAY=(value1 value2 ... valueN)
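A minimal sketch of the corrected assignment, reusing the question's own pipeline (note that -F'\t' is quoted here so the tab escape actually reaches awk):
regions=( $(aws ec2 describe-regions --output text | awk -F'\t' '{print $3}') )
len=${#regions[@]}
for (( i=0; i<len; i++ )); do
echo "$i. ${regions[$i]}"
done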
$() is command substitution: it just converts the command's stdout to a string and assigns it to a variable.
If, as you say, the result is
eu-north-1 ap-south-1 eu-west-3...
then to get an array out of it, make it syntactically appear as one, and tell Bash to evaluate it as such:
regions=($regions)
After expansion this would be the valid array syntax
regions=(eu-north-1 ap-south-1 eu-west-3)
which is then evaluated as a valid array once it's enclosed in "" and passed as the argument to Bash's eval:
$ eval "regions=($regions)"
$ echo ${regions[0]}
eu-north-1
From there I am sure you will be able to solve it on your own...

Bash one liner works but script does not

The following script produces no output for me when I run it. I am really confused as to why this isn't working.
#!/bin/bash
i=0
OLDIFS=$IFS
IFS=$'\n'
read -p 'Search history for? ' string
arr=( "$(history | grep "$string" | cut -c8-)" )
for item in ${arr[@]}
do
echo "$(( i++))) $item"
done
However this exact same thing (at least it seems the same to me) works fine when typed directly into my terminal in a single line:
i=0; OLDIFS=$IFS; IFS=$'\n'; read -p 'Search history for? ' string; arr=( "$(history | grep "$string" | cut -c8-)" ); for item in ${arr[@]}; do echo "$(( i++))) $item"; done
I've made the script executable. I've saved it as both a multi line and a single line script. Yet none of the saved scripts produce any output. Why doesn't this work when saved as a script but works fine typed directly into my terminal?
The line echo "$(( i++))) $item" has one closing parenthesis in excess.
echo "$(( i++ )) $item"
If you try to use history in a script, it will fail.
Try running this script:
#!/bin/bash
history
It will print nothing because there is no history stored (for this instance of the shell). To read the history you need to point at the file with the stored history, call the builtin history with -r to read it, and finally you can list the history from memory:
#!/bin/bash
HISTFILE="$HOME/.bash_history"
history -r
history
That doesn't mean that commands will be written to the file, that's controlled by a different option.
#!/bin/bash
read -p 'Search history for? ' string
i=0
OLDIFS=$IFS
IFS=$'\n'
HISTFILE="$HOME/.bash_history"
history -r
IFS=$'\n' read -d '' -a arr <<<"$(history | grep "$string" | cut -c8-)"
for item in ${arr[@]}
do echo "$(( i++ )) $item"
done
Have a look at this. Apparently the bash history command is disabled in shell programs. But you can get around it according to that link:
#!/bin/bash
#Add this line in to set the history file to your.bash_history
HISTFILE=~/.bash_history
set -o history
history

How can I use sed (or awk or maybe a perl one-liner) to get values from specific columns in file A and use it to find lines in file B?

OK, sedAwkPerl-fu-gurus. Here's one similar to these (Extract specific strings...) and (Using awk to...), except that I need to use the number extracted from columns 4-10 in each line of File A (a PO number from a sales order line item) and use it to locate all related lines from File B and print them to a new file.
File A (purchase order details) lines look like this:
xxx01234560000000000000000000 yyy zzzz000000
File B (vendor codes associated with POs) lines look like this:
00xxxxx01234567890123456789001234567890
Columns 4-10 in File A have a 7-digit PO number, which is found in columns 7-13 of file B. What I need to do is parse File A to get a PO number, and then create a new sub-file from File B containing only those lines in File B which have the POs found in File A. The sub-file created is essentially the sub-set of vendors from File B who have orders found in File A.
I have tried a couple of things, but I'm really spinning my wheels on trying to make a one-liner for this. I could work it out in a script by defining variables, etc., but I'm curious whether someone knows a slick one-liner to do a task like this. The two referenced methods put together ought to do it, but I'm not quite getting it.
Here's a one-liner:
egrep -f <(cut -c4-10 A | sed -e 's/^/^.{6}/') B
It looks like the POs in file B actually start at column 8, not 7, but I made my regex start at column 7 as you asked in the question.
And in case there's the possibility of duplicates in A, you could increase efficiency by weeding those out before scanning file B:
egrep -f <(cut -c4-10 A | sort -u | sed -e 's/^/^.{6}/') B
sed 's_^...\([0-9]\{7\}\).*_/^.\{6\}\1/p_' FIRSTFILE > FILTERLIST
sed -n -f FILTERLIST SECONDFILE > FILTEREDFILE
The first line generates a sed script from the first file, then the second line uses that script to filter the second file. This can be combined into one line too (see the sketch below)...
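A sketch of that combination, using bash process substitution so the generated filter script never touches the disk:
sed -n -f <(sed 's_^...\([0-9]\{7\}\).*_/^.\{6\}\1/p_' FIRSTFILE) SECONDFILE > FILTEREDFILE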
If the files are not that big you can do something like
awk 'NR==FNR { po[substr($0,4,7)]; next }   # read all FIRSTFILE PO numbers into an array
substr($0,7,7) in po { print $0 }' FIRSTFILE SECONDFILE > FILTERED
You can do it like (but it will find the PO numbers anywhere on a line)
fgrep -f <(cut -b 4-10 FIRSTFILE) SECONDFILE
Another way using only grep:
grep -f <(grep -Po '^.{3}\K.{7}' fileA) fileB
Explanation:
-P for perl regex
-o to select only the match
\K resets the start of the reported match (it acts like a variable-length positive lookbehind)
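A quick demonstration on the sample line from file A:
$ echo 'xxx01234560000000000000000000 yyy zzzz000000' | grep -Po '^.{3}\K.{7}'
0123456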
