Strange behaviour with Bash, Arrays and empty spaces

Problem:
Writing a bash script, I'm trying to import a list of products that are inside a CSV file into an array:
#!/bin/bash
PRODUCTS=(`csvprintf -f "/home/test/data/input.csv" -x | grep "col2" | sed 's/<col2>//g' | sed 's/<\/col2>//g' | sed -n '1!p' | sed '$ d' | sed 's/ //g'`)
echo ${PRODUCTS[@]}
In the interactive shell, the result/output looks perfect as following:
burger
special fries
juice - 300ml
When I use exactly the same commands in a bash script, even debugging with bash -x script.sh, at the echo ${PRODUCTS[@]} step, the array contains all the file names located at /home/test/data/ plus:
burger
special
fries
juice
-
300ml
The array is picking up the directory listing AND mangling the newlines. This doesn't happen in the interactive shell (single command line).
Anyone know how to fix that?

Looking at the docs for csvprintf, you're converting the csv into XML and then parsing it with regular expressions. This is generally a very bad idea.
You might want to install csvkit; then you can do
csvcut -c prod input.csv | sed 1d
Or you could use a language that comes with a CSV parser module. For example, ruby
ruby -rcsv -e 'CSV.read("input.csv", :headers=>true).each {|row| puts row["prod"]}'
Whichever method you use, read the results into a bash array with this construct
mapfile -t products < <(command to extract the product data)
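For example, a minimal sketch combining this with the csvkit route above (the column name prod and the file name are carried over from the examples here, so adjust them to your data):
mapfile -t products < <(csvcut -c prod input.csv | sed 1d)   # one array element per CSV row, spaces preserved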
Then, to print the array elements:
for prod in "${products[@]}"; do echo "$prod"; done
# or
printf "%s\n" "${products[@]}"
The quotes around the array expansion are critical. If missing, you'll see one word per line.
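A quick demonstration of the difference (hypothetical element values):
products=("special fries" "juice - 300ml")
printf "%s\n" ${products[@]}     # unquoted: word splitting gives special, fries, juice, -, 300ml on separate lines
printf "%s\n" "${products[@]}"   # quoted: one element per line, spaces intact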
Tip: don't use ALL-CAPS variable names in your scripts; leave those for the shell itself. One day you'll write PATH=something and then wonder why your script is broken.

Related

Search and delete links in markdown files

From time to time I run a linkchecker over my site, and the external links that return 404 are saved to a logfile.
Now I'm trying to delete those links from the markdown files automatically. I use multilingual websites, so I start by reading the logfile into an array.
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[#]}"; do
grep -r $i content/* | sed -e 's/([^()]*)//g'
done
This command deletes the link and title inside the () but the [Example Text] remains. I'm searching for a way to remove the [] so that in the end I only get Example Text.
Now:
[Example Text](http://example.com "Example Title")
Desired result:
Example Text
Assumptions
The i in for i in "${link[@]}" will evaluate to a link like "http://example.com" on each loop iteration
Every section of your markdown files that we care about takes the form you described: [Example Text](http://example.com "Example Title")
The code
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[#]}"; do
grep -ro "\[.*\].*${i}" content/* | grep -o '\[.*\]' | tr -d '[]'
done
Explanation
grep -ro "\[.*\].*${i}" content/*:
Recursive search to run on all files in a dir: grep -r ... content/*
Print only the text that applies to our regex: grep -o
Print anything that starts with [ followed by anything .* then a ] followed by the value of our loop variable ${i} (The current link): "\[.*\].*${i}"
From that output all we want is "Example Text" which lives between the brackets, so anything not between brackets needs to go grep -o '\[.*\]'
Finally, we want to remove those pesky brackets: tr -d '[]'
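A quick end-to-end check of that pipeline (the file name and markdown line are hypothetical):
# content/en/post.md contains: See [Example Text](http://example.com "Example Title") for details.
i="http://example.com"
grep -ro "\[.*\].*${i}" content/* | grep -o '\[.*\]' | tr -d '[]'
# prints: Example Text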
The immediate fix is to extend your sed regex.
sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
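For instance, applied to the sample line from the question:
echo '[Example Text](http://example.com "Example Title")' | sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
# prints: Example Text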
But probably a much better fix is to replace all the links from the Awk script across content in a single pass.
find content -type f -exec sed -i 's%\[\([^][]*\)](\('"$(
    awk 'NR>1 { printf "\|" }
         { printf "%s", $7 }' "$ext")"'\))%\1%g' {} +
The Awk script produces a long regex like
http://one.example.net/nosuchpage\|http://two.exampe.org/404\|https://three.example.com/broken-link
from all the links in the input, and the sed script then replaces any link which matches this regex in the parentheses after the square brackets. (Maybe you'll want to extend this to also permit a quoted string after the link before the closing round parenthesis, like in your example; I feel I am already guessing too many things about what you are actually hoping to accomplish.)
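As a rough illustration of what the generated script does to a single matching line (GNU sed, since the alternation uses \|; a link without a quoted title, as noted above):
echo 'See [Broken](http://two.exampe.org/404) for details.' |
  sed 's%\[\([^][]*\)](\(http://one.example.net/nosuchpage\|http://two.exampe.org/404\|https://three.example.com/broken-link\))%\1%g'
# prints: See Broken for details.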
If you are on a *BSD platform (including macOS) you'll need to add an empty string argument after the -i option, like sed -i '' 's%...

Trouble with AWK'd command output and bash array

I am attempting to get a list of running VirtualBox VMs (the UUIDs) and put them into an array. The command below produces the following output:
$ VBoxManage list runningvms | awk -F '[{}]' '{print $(NF-1)}'
f93c17ca-ab1b-4ba2-95e5-a1b0c8d70d2a
46b285c3-cabd-4fbb-92fe-c7940e0c6a3f
83f4789a-b55b-4a50-a52f-dbd929bdfe12
4d1589ba-9153-489a-947a-df3cf4f81c69
I would like to take those UUIDs and put them into an array (possibly even an associative array for later use, but a simple array for now is sufficient)
If I do the following:
array1="( $(VBoxManage list runningvms | awk -F '[{}]' '{print $(NF-1)}') )"
The commands
array1_len=${#array1[@]}
echo $array1_len
Outputs "1" as in there's only 1 element. If I print out the elements:
echo ${array1[*]}
I get a single line of all the UUIDs
( f93c17ca-ab1b-4ba2-95e5-a1b0c8d70d2a 46b285c3-cabd-4fbb-92fe-c7940e0c6a3f 83f4789a-b55b-4a50-a52f-dbd929bdfe12 4d1589ba-9153-489a-947a-df3cf4f81c69 )
I did some research (Bash Guide/Arrays) on how to tackle this and found the following, using command substitution and redirection, but it produces an empty array:
while read -r -d '\0'; do
array2+=("$REPLY")
done < <(VBoxManage list runningvms | awk -F '[{}]' '{print $(NF-1)}')
I'm obviously missing something. I've looked at several similar questions on this site, such as:
Reading output of command into array in Bash
AWK output to bash Array
Creating an Array in Bash with Quoted Entries from Command Output
Unfortunately, none have helped. I would appreciate any assistance in figuring out how to take the output and assign it to an array.
I am running this on macOS 10.11.6 (El Capitan) and Bash version 3.2.57.
Since you're on a Mac:
brew install bash
Then with this bash as your shell, pipe the output to:
readarray -t array1
Of the -t option, the man page says:
-t Remove a trailing delim (default newline) from each line read.
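A sketch of the whole assignment (using process substitution rather than a pipe, so the array is set in the current shell; readarray is not in the stock macOS bash 3.2):
readarray -t array1 < <(VBoxManage list runningvms | awk -F '[{}]' '{print $(NF-1)}')
echo "${#array1[@]}"          # number of UUIDs
printf "%s\n" "${array1[@]}"  # one UUID per line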
If the bash4 solution is admissible, then the advice given e.g. by gniourf_gniourf at reading-output-of-command-into-array-in-bash is still sound.
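If upgrading bash is not an option, a plain while read loop also works in the stock bash 3.2; a minimal sketch:
array2=()
while IFS= read -r uuid; do
    array2+=("$uuid")
done < <(VBoxManage list runningvms | awk -F '[{}]' '{print $(NF-1)}')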

Shell Script regex matches to array and process each array element

While I've handled this task easily in other languages, I'm at a loss as to which commands to use when shell scripting (CentOS/Bash).
I have some regex that provides many matches in a file I've read to a variable, and would like to take the regex matches to an array to loop over and process each entry.
Regex: I typically use https://regexr.com/ to form my capture groups, and throw that at JS/Python/Go to get an array to loop over, but in shell scripting I'm not sure what I can use.
So far I've played with sed to find and replace all matches, but I don't know whether it's capable of returning an array of matches to loop over.
Take a regex, run it on a file, get an array back. I would love some help with shell scripting for this task.
EDIT:
Based on the comments, I put this together (not working, according to shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a Mac:
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or in other lines with no assets at all, you can refine it like this:
#!/bin/sh
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do it with GNU grep's Perl-compatible regex (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[#]}"; do printf "%d\t%s\n" $i "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext') -- with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(
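For example (GNU sed shown; BSD sed also accepts -E):
sed -En 's/asset\((.*)\)/\1/p' <<<"$examplefile"
# prints the values without the surrounding parentheses:
# '1a/1b/1c.ext'
# '2a/2b/2c.ext'
# '3a/3b/3c.ext'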

How to remove numbers from extensions from files

I have many files in a directory with extensions like
.text(2) and .text(1).
I want to remove the numbers from the extension, so the output should be like
.text and .text.
Can anyone please help me with a shell script for that?
I am using CentOS.
A pretty portable way of doing it would be this:
for i in *.text*; do mv "$i" "$(echo "$i" | sed 's/([0-9]\{1,\})$//')"; done
Loop through all files which end in .text followed by anything. Use sed to remove any parentheses containing one or more digits from the end of each filename.
If all of the numbers within the parentheses are single digits and you're using bash, you could also use built-in parameter expansion:
for i in *.text*; do mv "$i" "${i%([0-9])}"; done
The expansion removes any parentheses containing a single digit from the end of each filename.
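A quick check of the expansion (hypothetical file name):
f='report.text(3)'
echo "${f%([0-9])}"   # prints report.text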
Another way without loops, but also with sed (with all the regexps inside), is piping to sh:
ls *text* | sed 's/\(.*\)\..*/mv \1* \1.text/' | sh
Example:
[...]$ ls
xxxx.text(1) yyyy.text(2)
[...]$ ls *text* | sed 's/\(.*\)\..*/mv \1* \1.text/' | sh
[...]$ ls
xxxx.text yyyy.text
Explanation:
Everything between \( and \) is stored and can be pasted again with \1 (or \2, \3, ...: a consecutive number for each pair of parentheses used). Therefore, the code above stores all the characters before the dot \. and then composes a sequence like this:
mv xxxx* xxxx.text
mv yyyy* yyyy.text
That is then piped to sh.
The simplest way, if the files are in the same folder:
rename 's/text\([0-9]+\)/text/' *.text*
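Note that this assumes the Perl flavour of rename (sometimes packaged as prename or file-rename); the util-linux rename that ships with CentOS uses a different from/to/file syntax. With the Perl version you can preview the changes first:
rename -n 's/text\([0-9]+\)/text/' *.text*   # -n only shows what would be renamed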

How do I let sed 'w' command know where the filename ends?

Every example I was able to find demonstrating the w command of sed has it at the end of the script. What if I can't do that?
An example will probably demonstrate the problem better:
$ echo '123' | sed 'w tempfile; s/[0-9]/\./g'
sed: couldn't open file tempfile; s/[0-9]/\./g: No such file or directory
(How) can I change the above so that sed knows where the filename ends?
P.S. I'm aware that I can do
$ echo '123' | sed 'w tempfile
> s/[0-9]/\./g'
...
Are there prettier options?
P.P.S. People tend to suggest splitting it into two scripts. The question then is: is it safe? What if I were going to branch somewhere after the w command, and so on? Can someone confirm that any script can be split in two after any command without affecting the results?
Final edit: I checked that multiple -e scripts work just like concatenated commands. I thought it was more complex (like the first one having to finish before the second one starts, etc.). I even tried splitting a {..} block of commands between two scripts and it still worked, so the w thing is really not a serious problem. Thanks to all.
You can give a two-line script to sed on one shell line:
echo '123' | sed -e 'w tempfile' -e 's/[0-9]/\./g'
This might work for you (if you're using BASH and probably GNU sed):
echo '123' | sed 'w tempfile'$'\n'';s/[0-9]/\./g'
Explanation:
The r, R and w commands need a newline to terminate the file name.
The answer to the question is "newline":
sed will treat a non-escaped literal newline as the end of the file name.
If your shell is bash, or supports the $'\n' syntax, you can solve the OP's original question this way:
echo '123' | sed 'w tempfile'$'\n''s/[0-9]/\./g'
In a more limited sh you can say
$ echo '123' | sed 'w tempfile'\
> 's/[0-9]/\./g'
What I did here was write \ as an escape, then hit enter and wrote the rest of the command there. Note that here I am escaping the newline from bash but it is being passed to sed.
Reverse the order of the two sed commands like this:
echo '123' | sed 's/[0-9]/\./g;w tempfile'
i.e. perform replacements first and then write pattern space into a file.
EDIT: There was some misunderstanding about whether the OP wants the replaced text in the final file or not. My command above puts the replaced text in tempfile. Since this is not what the OP wanted, here is one more version that avoids it:
echo '123' | sed -e 'h;s/[0-9]/\./g;g;w tempfile'
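With auto-printing on, the g also restores the original digits in what sed sends to stdout; if the goal is the untouched line in tempfile and the replaced text on stdout, a sketch with -n and an explicit p keeps the filename at the end of the script:
echo '123' | sed -n 'h;s/[0-9]/\./g;p;g;w tempfile'
# stdout: ...    tempfile: 123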
