BASH grep result as array name - arrays

Heyhey,
since this is my first post, please be patient :) I'll try my best.
I'm trying to "grep" the language out of my system (OS X) and use the result as the name of an array to set a language.
I've got some arrays called en=('a' 'b' 'c'), de=('d' 'e' 'f') and fr=('g' 'h' 'i') somewhere...
I use:
language=$(locale | grep LANG= | cut -d'"' -f2 | cut -d_ -f1)
which gives me an ISO code like en, fr, de, ...
Here comes my main problem: I can't just use ${language[*]}.
It feels like I've tried everything. I've already done trial and error with {}, (), '' and $.
The only thing I found out while debugging is that language results in
language=ISO (so this part works correctly)
but if I try to use this value to get my desired array,
echo ${language[*]} , ${language[0]} , ${language[1]}
results in
ISO , ISO ,
which is not correct. It seems like I'm creating a new array, but I want to use the existing ones.
Don't know any more keywords to google :(

These all use array syntax:
echo ${language[*]} , ${language[0]} , ${language[1]}
But the way you created the variable, it is not an array; it is a simple variable:
language=$(locale | grep LANG= | cut -d'"' -f2 | cut -d_ -f1)
To access its value, use simply $language, for example:
echo $language
Also, the pipeline with locale, grep and cut gets the first two characters of the LANG variable in a very inefficient way.
You can get it more efficiently using a substring:
language=${LANG:0:2}
If you want to use an array (though I don't see the point here), then you must put parentheses around the values to assign, for example:
language=(${LANG:0:2})
The array in this example has one element; you can access its value like this:
echo ${language[0]}
Note that Bash syntax is very strict about symbols and spaces; every little detail can make a big difference, so it's important to write accurately.
You can paste your scripts into shellcheck.net to check for trivial errors.
You can read more about arrays in man bash.
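The answer above fixes the array syntax, but the question's actual goal is to use the value of $language as the name of an existing array. A minimal sketch using bash indirect expansion (${!name}), which also works in the bash 3.2 that ships with OS X. The arrays en, de and fr reuse the placeholder contents from the question, and falling back to en for an unknown code (e.g. LANG=C) is an assumption of this sketch:

```shell
#!/usr/bin/env bash
# Placeholder arrays, one per language code (contents from the question).
en=('a' 'b' 'c')
de=('d' 'e' 'f')
fr=('g' 'h' 'i')

language=de   # in a real script: language=${LANG:0:2}

# Guard against codes that have no matching array (assumed fallback: en).
case $language in
  en|de|fr) ;;
  *) language=en ;;
esac

# Build the text "de[@]" and expand it indirectly: "${!ref}" -> "${de[@]}".
ref="${language}[@]"
words=( "${!ref}" )

# The same trick works for a single element.
first="${language}[0]"
echo "${!first}"              # d
printf '%s\n' "${words[@]}"   # d, e and f, one per line
```

On bash 4.3 or later, declare -n words="$language" (a nameref) is an alternative that avoids building the "[@]" string.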

Related

Bash: Store sed result into array?

How can I fix the following code so that it stores the result of sed, which replaces _ with -, in an array?
My code:
names=()
for entry_ in $foo
do
names+=($entry_ | sed -e "s/_/-/g")
done
echo names
You don't need sed for this; you can use bash's built-in parameter expansion with substitution to replace all _ characters with -: ${var//_/-}. You can even use it to do this for the entire list of elements in a single operation, but how you do it depends on what the source variable, foo, actually is.
If foo is an array (the much better way to do things), you can combine [@] ("get me all elements of the array") with the substitution:
names=( "${foo[@]//_/-}" )
If foo is a plain string, and you need to use word splitting to break it into elements for the array, you can do essentially the same thing without the [@] ('cause it's not an array) or the double quotes (which prevent word splitting):
names=( ${foo//_/-} )
Note: I recommend avoiding word splitting if possible; it often does something close to what you want, but almost never exactly what you want.
P.S. I third the recommendation of shellcheck. Among other things, it flags anything involving word splitting as a probable mistake.
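A minimal sketch of both approaches side by side, assuming foo is an array (its contents here are hypothetical): the built-in substitution, and a corrected version of the question's sed loop, where sed runs inside $(...) and the result is wrapped in ( ) so it is appended as a new element:

```shell
#!/usr/bin/env bash
foo=( "hello_world" "snake_case_name" )   # hypothetical input

# Built-in substitution applied to every element in one step.
names=( "${foo[@]//_/-}" )

# The question's loop, fixed: run sed inside $(...) and append with +=( ).
names_sed=()
for entry in "${foo[@]}"; do
  names_sed+=( "$(printf '%s\n' "$entry" | sed -e 's/_/-/g')" )
done

printf '%s\n' "${names[@]}"       # hello-world, snake-case-name
printf '%s\n' "${names_sed[@]}"   # same result, but one sed process per element
```

The built-in form avoids starting a sed process for every element, which matters once the list gets long.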
This should be enough to get you there.
names=()
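names+=( "$(echo "hello_world" | sed -e "s/_/-/g")" )
echo "${names[@]}"
Note that you need $ before echoing your variable, and the parentheses around the value so that it is appended as a new array element instead of being concatenated onto the first one.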
Also, look into installing shellcheck for your code editor; it will help you catch sneaky bugs and build better shell programming practices.

Parsing HTML to array only returns one word

I'm trying to parse some HTML subtitles into an array using Bash and html-xml-utils. I've also tried using a Lynx dump to pretty it up, but I had the same problem: I can't get my sed to put more than one word at a time into the array.
Code:
array=($(echo $PAGE |
hxselect -i ".sub_info_container .sub_title" |
sed -r 's/.*\">(.*)<\/a>.*/\1/' ))
echo $array
This gets piped into sed:
<div class="sub_title"><a class="sub_title" href="/link">Some Random Title.</a></div><div class="sub_title"><a class="sub_title" href="/link2">Another subtitle I want.</a>
Output of echo $array:
Some
What I'm trying to get:
Some Random Title
Stripping the punctuation would be nice, since the subtitles often end in ? or ! instead of a period, but including the punctuation would work too.
Things I've tried:
Using Lynx to pretty up the code, then using awk to grab the elements
A lot of different sed and awk methods of grabbing the text
I'm not sure why, but my code ended up splitting on spaces, putting each word into a separate item. The solution was the following code:
array=($(echo $PAGE |
hxselect -i ".sub_info_container .sub_title" |
lynx -stdin -dump | tr " " - ))
I used tr to turn the spaces into dashes, so that each title could be passed into the array as a single item. Removing the extra parentheses, as everybody suggested, actually stopped the values from being assigned into an array, which I had stated was my intention. After the code completed, I simply converted all the dashes back to spaces. It's not pretty, but it works!
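The words were split apart because the unquoted $(...) inside array=( ) undergoes word splitting on spaces. A sketch that avoids the tr round trip by reading one whole line per array element; list_titles is a hypothetical stand-in for the hxselect/lynx pipeline and is assumed to print one title per line (real lynx -dump output may also need its leading indentation trimmed):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: echo "$PAGE" | hxselect ... | lynx -stdin -dump
list_titles() {
  printf '%s\n' 'Some Random Title.' 'Another subtitle I want.'
}

array=()
# IFS= keeps the spaces inside each line; -r keeps backslashes literal.
while IFS= read -r line; do
  [ -n "$line" ] || continue      # skip blank lines
  array+=( "${line%[.?!]}" )      # drop one trailing . ? or !
done < <(list_titles)

printf '%s\n' "${array[@]}"   # Some Random Title / Another subtitle I want
```

In bash 4 or later, mapfile -t array < <(list_titles) reads the lines in one step, though without the per-line punctuation trimming.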
Try this:
s='<div class="sub_title"><a class="sub_title" href="/link">Some Random Title.</a></div><div class="sub_title"><a class="sub_title" href="/link2">Another subtitle I want.</a>'
array=$(echo "$s" | sed 's/<\/div><div /\n/g' | sed -r 's/.*\">(.*)<\/a>.*/\1/g')
echo "$array"
I had to add a newline between the divs to match both. I'm not that good with sed and couldn't figure out how to do it without that.
Your main problem was the extra parentheses:
array=($(echo .....))

Parameter Substitution on Left Side of Variable Assignment - BASH and Arrays

I am processing some folders that each represent a page of a book. E.g. "Iliad-001" would be Book=Iliad, Page=001.
I want to iterate through all of the folders, create an array for each book and add an entry to that array for each page that is found, so that I can echo ${Iliad[@]} at the end of my script and get a nice list of all the pages it found.
The problem I'm having is adding values to an array with a dynamic name. Here's the code that I think is intuitive (but clearly not right):
for j in */; do
vol_name=$(basename "$j" | sed 's/\(.*\)-[0-9]*/\1/')
page_name=$(basename "$j" | sed 's/.*-\([0-9]*\)/\1/')
$vol_name+=( "$page_name" )
done
This returns:
syntax error near unexpected token `"$page_name"'
If I change the assignment to $vol_name+="( "$page_name" )", I get a little closer:
Iliad+=( 001 ): command not found
I was able to make it work using eval.
BTW, you do not need to run sed.
#! /bin/bash
for j in */; do
j=$(basename "$j")
vol_name=${j%-*}
page_name=${j#*-}
eval "$vol_name+=('$page_name')"
done
echo "${Iliad[@]}"
try this
declare $vol_name+=( "$page_name" )
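On bash 4.3 or later, a nameref (declare -n) can stand in for eval: it makes one variable an alias for another whose name is only known at run time. A sketch under that version assumption; the folder names in the loop are hypothetical sample data replacing the question's */ glob, and each book name must be a valid variable name:

```shell
#!/usr/bin/env bash
# Requires bash 4.3+ (declare -n). The folder names below are hypothetical;
# in the real script the loop would be: for dir in */; do
for dir in Iliad-001/ Iliad-002/ Odyssey-001/; do
  dir=${dir%/}              # drop the trailing slash left by */
  vol_name=${dir%-*}        # "Iliad" (must be a valid variable name)
  page_name=${dir##*-}      # "001"
  declare -n pages_ref="$vol_name"   # pages_ref now aliases e.g. Iliad
  pages_ref+=( "$page_name" )        # appends to the aliased array
  unset -n pages_ref                 # remove the alias, not the array
done

echo "${Iliad[@]}"     # 001 002
echo "${Odyssey[@]}"   # 001
```

Unlike eval, the page name is never parsed as shell code, so quotes or other special characters in folder names cannot break the assignment.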

How to 'cut' on null?

The Unix 'file' command has a -0 option that outputs a null character after the filename. This is supposedly useful with 'cut'.
From man file:
-0, --print0
Output a null character ‘\0’ after the end of the filename. Nice
to cut(1) the output. This does not affect the separator which is
still printed.
(Note, on my Linux, the '-F' separator is NOT printed - which makes more sense to me.)
How can you use 'cut' to extract a filename from output of 'file'?
This is what I want to do:
find . "*" -type f | file -n0iNf - | cut -d<null> -f1
where <null> is the NUL character.
Well, that is what I am trying to do; what I actually want is to get all file names from a directory tree that have a particular MIME type. I use grep for that (not shown).
I want to handle all legal file names and not get stuck on names that contain colons, for example. Hence, NUL would be excellent.
I guess non-cut solutions are fine too, but I hate to give up on a simple idea.
Just specify an empty delimiter:
cut -d '' -f1
(N.B.: The space between the -d and the '' is important, so that the -d and the empty string get passed as separate arguments; if you write -d'', then that will get passed as just -d, and then cut will think you're trying to use -f1 as the delimiter, which it will complain about, with an error message that "the delimiter must be a single character".)
This works with GNU awk:
awk 'BEGIN{FS="\x00"}{print $1}'
ruakh's helpful answer works well on Linux.
On macOS, the cut utility doesn't accept '' as a delimiter argument (it fails with a "bad delimiter" error).
Here is a portable workaround that works on both platforms, via the tr utility; it only makes one assumption:
The input mustn't contain \1 control characters (START OF HEADING, U+0001) - which is unlikely in text.
You can substitute any character known not to occur in the input for \1; if it's a character that can be represented verbatim in a string, that simplifies the solution, because you won't need the auxiliary command substitution ($(...)) with a printf call for the -d argument.
If your shell supports so-called ANSI C-quoted strings (true of bash, zsh and ksh), you can replace "$(printf '\1')" with $'\1'.
(The following uses a simpler input command to demonstrate the technique).
# In zsh, bash, ksh you can simplify "$(printf '\1')" to $'\1'
$ printf '[first field 1]\0[rest 1]\n[first field 2]\0[rest 2]' |
tr '\0' '\1' | cut -d "$(printf '\1')" -f 1
[first field 1]
[first field 2]
Alternatives to using cut:
C. Paul Bond's helpful answer shows a portable awk solution.
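For the underlying goal (file names with a given MIME type, colons and all), bash can also split on the NUL itself with read -d '', so no cut is needed. A sketch; names_with_mime is a hypothetical helper, and matching the wanted type as a substring of the rest of the line is an assumption that sidesteps whether your file prints the ': ' separator after the NUL:

```shell
#!/usr/bin/env bash
# Read "name\0<separator and MIME info>\n" records, as produced by file -0,
# and print each name whose MIME info contains the wanted type.
names_with_mime() {
  local want=$1 name info
  # read -d '' stops at the NUL after the name; the second read takes
  # the rest of that line.
  while IFS= read -r -d '' name && IFS= read -r info; do
    if [[ $info == *"$want"* ]]; then
      printf '%s\n' "$name"
    fi
  done
}

# Demo with simulated file -0 output; the first name contains a colon.
printf '%s\0%s\n' './a:b.txt' ': text/plain' './photo 1.png' ': image/png' |
  names_with_mime text/plain    # prints ./a:b.txt
```

Assuming your file supports -0 and --mime-type, a real run might look like find . -type f -exec file -0 --mime-type {} + | names_with_mime text/plain. Passing the names as arguments with -exec ... {} + also avoids reading them from a newline-separated list.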

KSH scripting: how to split on ',' when values have escaped commas?

I'm trying to write a ksh script to process a file consisting of name-value pairs, several of them on each line.
Format is:
NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc
Suppose I write:
read l
IFS=","
set -A nvls $l
echo "$nvls[2]"
This will give me the second name-value pair, nice and easy. Now, suppose that the task is extended so that values could include commas. They should be escaped, like this:
NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc
Obviously, my code no longer works, since "read" strips all quoting and second element of array will be just "NAME2 VALUE2_1".
I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ....", to no avail. I can't use "read nvl1 nvl2 nvl3" to do the unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are on each line.
Does anyone have a useful trick up their sleeve for me?
PS
I know that I could do this in no time in Perl, Python, even in awk. However, I have to do it in ksh (... or die trying ;)
As it often happens, I devised an answer minutes after asking the question in a public forum :(
I worked around the quoting/unquoting issue by piping the input file through the following sed script:
sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'
It converted the input into:
NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>
Now, I can parse this input like this:
while read name value ; do
echo "$name => $value"
done
Value will have its commas unquoted by "read", and I can stuff "name" and "value" in some associative array, if I like.
PS
Since I can't accept my own answer, should I delete the question, or ...?
You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. You can use the ksh built-in pattern-substitution syntax (ksh93) to do this; you don't need sed, awk or anything else.
read -r l   # -r keeps the \ in \, so the substitution below can find it
l=${l//\\,/!!}
IFS=","
set -A nvls $l
unset IFS
echo "${nvls[2]//!!/,}"
