Using two files to search/replace a third file - arrays

I have two files:
correct.txt
the sky is blue
I like eat apple
.
.
and wrong.txt
the sky are blue
I like eat apple
.
.
.
There are a lot of lines in both files.
Now I want to correct a third file: each line found in "wrong.txt" should be searched for and replaced with the corresponding line from "correct.txt".
I have created two arrays:
readarray -t correct_array < correct.txt
readarray -t wrong_array < wrong.txt
The file to be corrected is to_be_corrected.txt
This works:
for c in "${correct_array[@]}"
do
echo "$c"
done
I tried this
for e in "${correct_array[@]}"
do
sed -i.bak 's/$wrong_array[@]/$correct_array[@]/' to_be_corrected.txt
done
But this did not work.
How can I use sed with arrays?

You are using single quotes (') for your sed command, so the shell does not expand the variables $wrong_array[@] and $correct_array[@]. Use double quotes, and put braces around the variables. Also, ${correct_array[@]} refers to the entire array; you need to pair the elements together, perhaps with an index:
for ((e=0; e<${#correct_array[@]}; ++e)); do
sed -i.bak "s/${wrong_array[$e]}/${correct_array[$e]}/" to_be_corrected.txt
done
This iterates e over the indices of the array (${#correct_array[@]} gives the size of the array); e is then used to index the corresponding elements of wrong_array and correct_array. Hopefully your text files don't contain characters that are special to sed, such as /, &, \, . or *, because each line is used as a regex pattern or as a replacement.
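If the lines can contain such characters, they can be escaped before building the s command. The sketch below is a minimal illustration, assuming bash 4+ (for readarray) and that both files have the same number of lines; escape_regex and escape_replacement are hypothetical helper names, and the sample files are created in a temporary directory only for the demonstration.

```bash
#!/usr/bin/env bash
# Demo setup with hypothetical sample data in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'the sky are blue' 'path a.b/c' > wrong.txt
printf '%s\n' 'the sky is blue' 'path a/b & c' > correct.txt
printf '%s\n' 'the sky are blue today' 'path a.b/c here' 'path axb/c stays' > to_be_corrected.txt

# Hypothetical helpers: backslash-escape characters special in a basic regex
# ( ] [ \ / . ^ $ * ) and in a replacement ( \ / & ).
# '|' is the delimiter here, so '/' needs no escaping inside the brackets.
escape_regex()       { printf '%s\n' "$1" | sed 's|[][\\/.^$*]|\\&|g'; }
escape_replacement() { printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'; }

readarray -t correct_array < correct.txt
readarray -t wrong_array < wrong.txt

# Pair the arrays by index and apply one escaped substitution per pair.
for ((e = 0; e < ${#correct_array[@]}; ++e)); do
  pattern=$(escape_regex "${wrong_array[$e]}")
  replacement=$(escape_replacement "${correct_array[$e]}")
  sed -i.bak "s/${pattern}/${replacement}/" to_be_corrected.txt
done

cat to_be_corrected.txt
```

With the escaping, "path a.b/c" only matches a literal dot, so the third sample line is left untouched.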

You should always use {} with arrays. This doesn't work:
$array[1]
But this will:
${array[1]}
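The reason is that in $array[1] the shell expands $array on its own (which means element 0) and then appends the literal text [1]. A quick illustration with a hypothetical three-element array:

```bash
array=(zero one two)

without_braces="$array[1]"   # $array is element 0, followed by the literal "[1]"
with_braces="${array[1]}"    # the element at index 1

echo "$without_braces"   # prints: zero[1]
echo "$with_braces"      # prints: one
```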
As pointed out by e0k, you should also use double quotes; otherwise the variable won't be expanded to its actual value.
I don't know exactly what your array contains, but I think you want to iterate over it instead of using the whole thing. Try this approach:
for i in $(seq 0 $((${#correct_array[@]} - 1))); do
sed -i.bak "s/${wrong_array[$i]}/${correct_array[$i]}/" to_be_corrected.txt
done

Related

Bash: Store sed result into array?

How can I fix the following code so that it stores the result of sed (which replaces _ with -) in an array?
My code:
names=()
for entry_ in $foo
do
names+=($entry_ | sed -e "s/_/-/g")
done
echo names
You don't need sed for this; you can use bash's built-in parameter expansion with substitution to replace all _ characters with -: ${var//_/-}. You can even apply it to the entire list of elements in a single operation, but how you do it depends on what the source variable, foo, actually is.
If foo is an array (the much better way to do things), you can combine [@] ("get me all elements of the array") with the substitution:
names=( "${foo[@]//_/-}" )
If foo is a plain string, and you need word splitting to break it into elements for the array, you can do essentially the same thing without the [@] (because it's not an array) and without the double quotes (which prevent word splitting):
names=( ${foo//_/-} )
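Both forms side by side, as a sketch with hypothetical sample values (the string version uses a separate variable, foo_str, so the two don't clash). The array version keeps each element intact; the string version relies on word splitting with the default IFS:

```bash
# foo as an array: the substitution is applied to every element
foo=(hello_world foo_bar_baz)
names=( "${foo[@]//_/-}" )
printf '%s\n' "${names[@]}"            # hello-world, then foo-bar-baz

# foo_str as a plain string: the unquoted expansion is word-split into elements
foo_str="hello_world foo_bar_baz"
names_from_str=( ${foo_str//_/-} )
printf '%s\n' "${names_from_str[@]}"   # same two elements
```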
Note: I recommend avoiding word splitting if possible -- it often does something close to what you want, but almost never exactly what you want.
P.S. I third the recommendation of shellcheck. Among other things, it'll flag anything involving word splitting as a probable mistake.
This should be enough to get you there.
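names=()
names+=( "$(echo "hello_world" | sed -e "s/_/-/g")" )
echo "${names[@]}"
Note that you need $ before echoing your variable, and the ( ) around the value so that += appends a new array element instead of concatenating onto the first one.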
Also, look into installing shellcheck for your code editor; it will help you catch sneaky bugs and build better shell programming practices.
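Applied to the loop from the question while keeping sed, a sketch might look like this. It assumes $foo is a whitespace-separated string, as the question's unquoted for entry_ in $foo implies; the sample value is hypothetical. The parentheses around the command substitution are what make += append a new element:

```bash
foo="hello_world foo_bar_baz"   # hypothetical input

names=()
for entry_ in $foo; do
  # ( ... ) appends a new element; without them += would concatenate onto names[0]
  names+=( "$(printf '%s\n' "$entry_" | sed -e 's/_/-/g')" )
done

printf '%s\n' "${names[@]}"
```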

Shell Script regex matches to array and process each array element

While I've handled this task easily in other languages, I'm at a loss for which commands to use when shell scripting (CentOS/bash).
I have a regex that produces many matches in a file I've read into a variable, and I would like to put the regex matches into an array so I can loop over it and process each entry.
For regexes, I typically use https://regexr.com/ to build my capture groups, then hand that to JS/Python/Go to get an array and loop over it; in shell scripting, I'm not sure what I can use.
So far I've played with "sed" to find all matches and replace them, but I don't know whether it can return the matches as an array to loop over.
Take regex, run on file, get array back. I would love some help with Shell Scripting for this task.
EDIT:
Based on the comments, I put this together (not working, according to shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a Mac:
#!/bin/bash
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or in other lines with no assets at all, you can refine it like this:
#!/bin/bash
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do it with GNU grep and its Perl-compatible regex option, -P (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[@]}"; do printf "%d\t%s\n" "$i" "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext') -- with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(.
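With -E (accepted by both GNU and BSD sed), ( ) forms the capture group and \( \) matches literal parentheses, so only the quoted path is kept. A sketch combining it with mapfile (bash 4+) to get the array the question asked for:

```bash
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"

# -E: extended regex; -n plus the p flag prints only lines where s/// matched.
mapfile -t examplearr < <(sed -En 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile")

for i in "${!examplearr[@]}"; do
  printf '%d\t%s\n' "$i" "${examplearr[i]}"
done
```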

bash: sed search and replace the path of array elements

In bash I have an array with path names, and I would like to replace each of them with different ones using sed, like so:
sed 's#^(.*?)master_repo(.*?)#\1"${SOME_REPO_NAME}"\2#g' <<< ${FULL_TGT_DIRS[${i}]}
A sample path name which is an element of the array would be:
/Volumes/munki/master_repo/pkgs/apps
I would like to replace the path name "master_repo" with for example "somedir", which is stored in $SOME_REPO_NAME, so I get:
/Volumes/munki/somedir/pkgs/apps
Or with built-in string substitution:
for i in ${FULL_TGT_DIRS[@]}
do
FULL_TGT_DIRS[$i]=${FULL_TGT_DIRS[$i]/master_repo/$SOME_REPO_NAME}
#sed 's#^(.*?)master_repo(.*?)#\1"${SOME_REPO_NAME}"\2#g' <<< ${FULL_TGT_DIRS[${i}]}
done
I always get the following error when running my script:
> /usr/local/bin/repomgr: line 135:
> /Volumes/munki/master_repo/pkgs/apps: syntax error: operand expected
> (error token is "/Volumes/munki/master_repo/pkgs/apps")
I've tried using different separators and sed options, as well as shuffling through different quoting combinations. I don't write bash scripts on a daily basis, so perhaps I'm missing something?
BTW, I run this on a Mac and therefore only have bash 3.2 at my disposal.
There's no need to use sed for this; bash has built-in string replacement in its parameter expansion.
var=/Volumes/munki/master_repo/pkgs/apps
SOME_REPO_NAME=somedir
newvar=${var/master_repo/$SOME_REPO_NAME}
In a for-in loop, the variable gets set to the array elements, not the array indexes, so you shouldn't be using FULL_TGT_DIRS[$i]: $i contains the pathname, and bash evaluates an array subscript as an arithmetic expression, hence the "operand expected" error. So the loop should be:
for file in "${FULL_TGT_DIRS[@]}"
do
file=${file/master_repo/$SOME_REPO_NAME}
# Do something with $file here
done
If you need to modify the array in place, you need a different loop for the indexes:
for ((i = 0; i < ${#FULL_TGT_DIRS[@]}; i++))
do
FULL_TGT_DIRS[$i]=${FULL_TGT_DIRS[$i]/master_repo/"$SOME_REPO_NAME"}
done
You can even go a step further by applying bash's own replacement to the whole array:
for file in "${FULL_TGT_DIRS[@]/master_repo/somedir}"
do
printf '%s\n' "$file"   # ...work on $file here...
done
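Since the question mentions bash 3.2: the ${array[@]/pattern/replacement} form should already work there, so the whole array can also be rewritten in a single assignment without any loop. A sketch with hypothetical sample paths (note that this re-creates the array, so any gaps in a sparse array's indexes are closed up):

```bash
SOME_REPO_NAME=somedir
FULL_TGT_DIRS=(/Volumes/munki/master_repo/pkgs/apps /Volumes/munki/master_repo/manifests)

# Replace the first "master_repo" in every element and store the result back.
FULL_TGT_DIRS=( "${FULL_TGT_DIRS[@]/master_repo/$SOME_REPO_NAME}" )

printf '%s\n' "${FULL_TGT_DIRS[@]}"
```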

Sed: Match, remove and replace in one sed call

Let's say I have a string like:
Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720
I would like to use sed to remove the first part (Image.Resolution=) and then split the rest by comma so I can put all the resolutions in a bash array.
I know how to do it in two steps (two sed calls) like:
sed 's/Image.Resolution=//g' | sed 's/,/ /g'.
But as an exercise, I'd like to know if there's a way of doing it in one shot.
Thank you in advance.
Just put ; between the commands:
sed 's/Image.Resolution=//g; s/,/ /g'
From info sed:
3 `sed' Programs
****************
A `sed' program consists of one or more `sed' commands, passed in by
one or more of the `-e', `-f', `--expression', and `--file' options, or
the first non-option argument if zero of these options are used. This
document will refer to "the" `sed' script; this is understood to mean
the in-order catenation of all of the SCRIPTs and SCRIPT-FILEs passed
in.
Commands within a SCRIPT or SCRIPT-FILE can be separated by
semicolons (`;') or newlines (ASCII 10). Some commands, due to their
syntax, cannot be followed by semicolons working as command separators
and thus should be terminated with newlines or be placed at the end of
a SCRIPT or SCRIPT-FILE. Commands can also be preceded with optional
non-significant whitespace characters.
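To finish what the question set out to do, putting the resolutions into a bash array, the output of that single sed call can be read with read -a, which splits on the default IFS. A sketch:

```bash
s='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'

# One sed call with two commands separated by ';', then split into an array.
read -r -a resolutions <<< "$(sed 's/Image.Resolution=//g; s/,/ /g' <<< "$s")"

echo "${#resolutions[@]}"   # 8
echo "${resolutions[0]}"    # 1024x768
echo "${resolutions[7]}"    # 1280x720
```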
This awk can also work:
s='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'
awk -F '[=,]' '{$1=""; sub(/^ */, "")} 1' <<< "$s"
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720
For this concrete example, you can do it in a shorter way:
sed 's/[^x0-9]/ /g'
and
x='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'
y=(${x//[^x0-9]/ })
both replace everything except x and the digits 0-9 with a space, so the output (or array y) is
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720
x="Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720"
x=${x#*=} # remove left part including =
array=(${x//,/ }) # replace all `,` with whitespace and create array
echo "${array[@]}" # print array $array
Output:
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720

KSH scripting: how to split on ',' when values have escaped commas?

I'm trying to write a ksh script to process a file consisting of name-value pairs, several of them on each line.
Format is:
NAME1 VALUE1,NAME2 VALUE2,NAME3 VALUE3, etc
Suppose I write:
read l
IFS=","
set -A nvls $l
echo "${nvls[1]}"
This will give me the second name-value pair, nice and easy. Now, suppose that the task is extended so that values could include commas. They should be escaped, like this:
NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3, etc
Obviously, my code no longer works, since "read" strips all quoting and the second element of the array will be just "NAME2 VALUE2_1".
I'm stuck with an older ksh that does not have "read -A array". I tried various tricks with "read -r" and "eval set -A ....", to no avail. I can't use "read nvl1 nvl2 nvl3" to do unescaping and splitting inside read, since I don't know beforehand how many name-value pairs are on each line.
Does anyone have a useful trick up their sleeve for me?
PS
I know that I could do this in no time in Perl, Python, even in awk. However, I have to do it in ksh (... or die trying ;)
As often happens, I devised an answer minutes after asking the question in a public forum :(
I worked around the quoting/unquoting issue by piping the input file through the following sed script:
sed -e 's/\([^\]\),/\1\
/g;s/$/\
/'
It converted the input into:
NAME1.1 VALUE1.1
NAME1.2 VALUE1.2_1\,VALUE1.2_2
NAME1.3 VALUE1.3
<empty line>
NAME2.1 VALUE2.1
<second record continues>
Now, I can parse this input like this:
while read name value ; do
echo "$name => $value"
done
Value will have its commas unquoted by "read", and I can stuff "name" and "value" in some associative array, if I like.
PS
Since I can't accept my own answer, should I delete the question, or ...?
You can also change the \, pattern to something else that is known not to appear in any of your strings, and then change it back after you've split the input into an array. You can use the ksh builtin pattern-substitution syntax to do this; you don't need sed or awk or anything. Use read -r so that the backslash in \, survives the read.
read -r l
l=${l//\\,/!!}
IFS=","
set -A nvls $l
unset IFS
echo "${nvls[1]//!!/,}"
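The same placeholder trick can be tried out in bash, which has ${var//pattern/replacement} but no set -A, so read -r -a with IFS=, does the splitting instead. This is a bash adaptation, not ksh code; it uses @@ as the placeholder (assumed never to occur in the data) because ! triggers history expansion when pasted into an interactive bash:

```bash
line='NAME1 VALUE1,NAME2 VALUE2_1\,VALUE2_2,NAME3 VALUE3'

# Hide escaped commas behind a placeholder assumed not to appear in the data.
line=${line//\\,/@@}

# Split on the remaining (unescaped) commas only.
IFS=, read -r -a nvls <<< "$line"

# Restore the commas in every element.
nvls=( "${nvls[@]//@@/,}" )

printf '%s\n' "${nvls[@]}"
```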
