Sed: Match, remove and replace in one sed call - arrays

Let's say I have a string like:
Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720
I would like to use sed to remove the first part (Image.Resolution=) and then split the rest by comma so I can put all the resolutions in a bash array.
I know how to do it in two steps (two sed calls) like:
sed 's/Image.Resolution=//g' | sed 's/,/ /g'
But as an exercise, I'd like to know if there's a way of doing it in one shot.
Thank you in advance.

Just put ; between the commands:
sed 's/Image.Resolution=//g; s/,/ /g'
From info sed:
3 `sed' Programs
****************
A `sed' program consists of one or more `sed' commands, passed in by
one or more of the `-e', `-f', `--expression', and `--file' options, or
the first non-option argument if zero of these options are used. This
document will refer to "the" `sed' script; this is understood to mean
the in-order catenation of all of the SCRIPTs and SCRIPT-FILEs passed
in.
Commands within a SCRIPT or SCRIPT-FILE can be separated by
semicolons (`;') or newlines (ASCII 10). Some commands, due to their
syntax, cannot be followed by semicolons working as command separators
and thus should be terminated with newlines or be placed at the end of
a SCRIPT or SCRIPT-FILE. Commands can also be preceded with optional
non-significant whitespace characters.
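To get the result straight into a bash array, the single sed call can feed `read -ra`. This is a minimal sketch, assuming bash and the sample string from the question; the array name `resolutions` is only illustrative, and the pattern is anchored with the dot escaped so it matches the literal prefix only.

```shell
#!/usr/bin/env bash
s='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'

# One sed call, two commands separated by ';':
# strip the prefix, then turn the commas into spaces.
# read -ra splits the resulting line on whitespace into an array.
read -ra resolutions < <(sed 's/^Image\.Resolution=//; s/,/ /g' <<< "$s")

echo "count: ${#resolutions[@]}"
printf '%s\n' "${resolutions[@]}"
```

Using `read -ra` instead of `array=( $(...) )` avoids accidental filename expansion of the words.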

This awk can also work:
s='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'
awk -F '[=,]' '{$1=""; sub(/^ */, "")} 1' <<< "$s"
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720

For this concrete example you can do it in a shorter way:
sed 's/[^x0-9]/ /g'
and
x='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'
y=(${x//[^x0-9]/ })
Both replace everything except x and the digits 0-9 with a space, so the output (or the array y) is
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720

x="Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720"
x=${x#*=} # remove left part including =
array=(${x//,/ }) # replace all `,` with whitespace and create array
echo "${array[@]}" # print all elements of the array
Output:
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720
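A variant of the same pure-bash idea lets `read` do the splitting, which sidesteps the unquoted expansion. This is a sketch, assuming bash; it reuses the variable names from the answer above.

```shell
#!/usr/bin/env bash
x='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'

# ${x#*=} drops everything up to and including the first '='.
# IFS=',' applies only to this read, so the global IFS is untouched.
IFS=',' read -r -a array <<< "${x#*=}"

printf '%s\n' "${array[@]}"
```

Because nothing is expanded unquoted, elements containing spaces or glob characters would survive intact.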

Related

Shell Script regex matches to array and process each array element

While I've handled this task in other languages easily, I'm at a loss for which commands to use when Shell Scripting (CentOS/BASH)
I have some regex that provides many matches in a file I've read to a variable, and would like to take the regex matches to an array to loop over and process each entry.
I typically use https://regexr.com/ to form my capture groups, then pass the regex to JS/Python/Go to get an array and loop over it, but in shell scripting I'm not sure what I can use.
So far I've played with "sed" to find all matches and replace, but don't know if it's capable of returning an array to loop from matches.
Take regex, run on file, get array back. I would love some help with Shell Scripting for this task.
EDIT:
Based on comments, put this together (not working via shellcheck.net):
#!/bin/sh
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=($(sed 'asset\((.*)\)' $examplefile))
for el in ${!examplearr[*]}
do
echo "${examplearr[$el]}"
done
This works in bash on a mac:
#!/bin/bash
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"
examplearr=(`echo "$examplefile" | sed -e '/.*/s/asset(\(.*\))/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
output:
'1a/1b/1c.ext'
'2a/2b/2c.ext'
'3a/3b/3c.ext'
Note the wrapping of $examplefile in quotes, and the use of sed to replace the entire line with the match. If there will be other content in the file, either on the same lines as the "asset" string or in other lines with no assets at all you can refine it like this:
#!/bin/bash
examplefile="
fooasset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')bar
foobar
fooasset('3a/3b/3c.ext')bar
"
examplearr=(`echo "$examplefile" | grep asset | sed -e '/.*/s/^.*asset(\(.*\)).*$/\1/'`)
for el in ${examplearr[*]}; do
echo "$el"
done
and achieve the same result.
There are several ways to do this. I'd do it with GNU grep and a perl-compatible regex (ah, delightful line noise):
mapfile -t examplearr < <(grep -oP '(?<=[(]).*?(?=[)])' <<<"$examplefile")
for i in "${!examplearr[@]}"; do printf "%d\t%s\n" "$i" "${examplearr[i]}"; done
0 '1a/1b/1c.ext'
1 '2a/2b/2c.ext'
2 '3a/3b/3c.ext'
This uses the bash mapfile command to read lines from stdin and assign them to an array.
The bits you're missing from the sed command:
$examplefile is text, not a filename, so you have to send it to sed's stdin
sed's a funny little language with 1-character commands: you've given it the "a" command, which is inappropriate in this case.
you only want to output the captured parts of the matches, not every line, so you need the -n option, and you need to print somewhere: the p flag in s///p means "print the [line] if a substitution was made".
sed -n 's/asset\(([^)]*)\)/\1/p' <<<"$examplefile"
# or
echo "$examplefile" | sed -n 's/asset\(([^)]*)\)/\1/p'
Note that this returns values like ('1a/1b/1c.ext') -- with the parentheses. If you don't want them, add the -r or -E option to sed: among other things, that flips the meaning of ( and \(
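For illustration, the `-E` form could look like the sketch below, assuming bash 4+ (for `mapfile`) and a sed that supports `-E`. With `-E`, `\(` is a literal parenthesis and a bare `(` opens the capture group, so the parentheses are dropped from the result while the single quotes stay.

```shell
#!/usr/bin/env bash
examplefile="
asset('1a/1b/1c.ext')
asset('2a/2b/2c.ext')
asset('3a/3b/3c.ext')
"

# -n plus the p flag prints only lines where the substitution matched,
# so the blank lines in the input are skipped.
mapfile -t examplearr < <(sed -nE 's/.*asset\(([^)]*)\).*/\1/p' <<< "$examplefile")

# Loop over the matches and process each one.
for el in "${examplearr[@]}"; do
  printf 'match: %s\n' "$el"
done
```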

Using two files to search/replace a third file

I have two files:
correct.txt
the sky is blue
I like eat apple
.
.
and wrong.txt
the sky are blue
I like eat apple
.
.
.
There are a lot of lines in both files.
Now, I want to correct a third file using my search in the "wrong.txt"
to correct it using the "correct.txt".
I have created two arrays:
readarray -t correct_array < correct.txt
readarray -t wrong_array < wrong.txt
The file to be corrected is to_be_corrected.txt
This works:
for c in "${correct_array[@]}"
do
echo "$c"
done
I tried this
for e in "${correct_array[@]}"
do
sed -i.bak 's/$wrong_array[@]/$correct_array[@]/' to_be_corrected.txt
done
But this did not work.
How can I use sed with arrays?
You are using single quotes (') for your sed command, so the shell is not expanding the variables $wrong_array[@] and $correct_array[@]. Use double quotes, and put braces around the array references. Also, ${correct_array[@]} expands to the entire array. You need to pair the elements together, for example with an index:
for ((e=0; e<${#correct_array[@]}; ++e)); do
sed -i.bak "s/${wrong_array[$e]}/${correct_array[$e]}/" to_be_corrected.txt
done
This iterates e over the indexes of the array (${#correct_array[@]} gives the size of the array); e is then used to index the corresponding elements of wrong_array and correct_array. Note that this breaks if a line in either text file contains a slash or a character that is special to sed, such as &, . or *.
You should always use {} with arrays. This doesn't work:
$array[1]
But this will:
${array[1]}
As pointed out by e0k, you should also use double quotes, otherwise the variable won't be expanded to its actual value.
I don't know exactly what your array contains, but I think you want to iterate over it instead of using the whole thing. Try this approach:
for i in `seq 0 $((${#correct_array[@]}-1))`; do
sed -i.bak "s/${wrong_array[$i]}/${correct_array[$i]}/" to_be_corrected.txt
done
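The loops above run sed once per pair and treat each wrong line as a regular expression. A possible refinement, sketched here under assumptions, escapes each line and writes all substitutions into one sed script that is applied in a single pass. The helper names `escape_pattern` and `escape_replacement`, the temporary directory, and the sample lines are hypothetical and not part of the question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: make a string safe as a literal BRE pattern
# or as the replacement text of an s/// command.
escape_pattern()     { printf '%s' "$1" | sed 's/[][\.*^$/]/\\&/g'; }
escape_replacement() { printf '%s' "$1" | sed 's/[\&/]/\\&/g'; }

# Made-up sample data standing in for the three files.
workdir=$(mktemp -d)
printf '%s\n' 'the sky is blue'  'I like to eat apples' > "$workdir/correct.txt"
printf '%s\n' 'the sky are blue' 'I like eat apple'     > "$workdir/wrong.txt"
printf '%s\n' 'Today the sky are blue.' 'I like eat apple.' > "$workdir/to_be_corrected.txt"

readarray -t correct_array < "$workdir/correct.txt"
readarray -t wrong_array   < "$workdir/wrong.txt"

# Build one s/wrong/correct/g command per pair.
script="$workdir/fix.sed"
: > "$script"
for i in "${!wrong_array[@]}"; do
  # Skip empty lines and pairs that are already identical.
  if [[ -z ${wrong_array[i]} || ${wrong_array[i]} == "${correct_array[i]}" ]]; then
    continue
  fi
  printf 's/%s/%s/g\n' \
    "$(escape_pattern "${wrong_array[i]}")" \
    "$(escape_replacement "${correct_array[i]}")" >> "$script"
done

# Apply every substitution in a single pass.
sed -f "$script" "$workdir/to_be_corrected.txt" > "$workdir/corrected.txt"
cat "$workdir/corrected.txt"
```

Writing to a separate output file keeps the original untouched; `sed -i.bak -f "$script"` would edit in place as in the question.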

Parsing HTML to array only returns one word

I'm trying to parse some HTML subtitles into an array using Bash and html-xml-utils, and I've tried using a Lynx dump to pretty it up, but I had the same problem, because I can't get my sed to put more than one word at a time into the array.
Code:
array=($(echo $PAGE |
hxselect -i ".sub_info_container .sub_title" |
sed -r 's/.*\">(.*)<\/a>.*/\1/' ))
echo $array
This gets piped into sed:
<div class="sub_title"><a class="sub_title" href="/link">Some Random Title.</a></div><div class="sub_title"><a class="sub_title" href="/link2">Another subtitle I want.</a>
Output of echo $array:
Some
What I'm trying to get:
Some Random Title
Without the punctuation would be nice, and the subtitles often have ? or ! instead of period, but it could work including punctuation too.
Things I've tried:
Using Lynx to pretty up the code, then using awk to grab the elements
A lot of different sed and awk methods of grabbing the text
I'm not sure why, but my code ended up separating spaces into separate items. The solution was the following code:
array=($(echo $PAGE |
hxselect -i ".sub_info_container .sub_title" |
lynx -stdin -dump | tr " " - ))
I used tr to turn the spaces into dashes, so that each title is passed into the array as a single element. Taking off the extra parentheses, as everybody suggested, actually removed the assignment of the values into an array, which I stated was my intention. After the code completed I simply converted all the dashes back to spaces. It's not pretty, but it works!
Try this:
s='<div class="sub_title"><a class="sub_title" href="/link">Some Random Title.</a></div><div class="sub_title"><a class="sub_title" href="/link2">Another subtitle I want.</a>'
array=$(echo "$s" | sed 's/<\/div><div /\n/' | sed -r 's/.*\">(.*)<\/a>.*/\1/g')
echo "$array"
I had to add a newline between the divs to match both. I'm not that good with sed and couldn't figure out how to do it without that.
Your main problem was with the extra parentheses:
array=($(echo .....))
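Two things explain the single word in the question: `array=( $(...) )` splits the command output on every whitespace character, so each word becomes its own element, and `echo $array` prints only element 0. A sketch that keeps each title as one element, assuming bash 4+ and a grep with `-o`, is shown below. It works on the sample HTML directly rather than through hxselect, the array name `titles` is illustrative, and matching HTML with regular expressions like this is fragile beyond simple input.

```shell
#!/usr/bin/env bash
s='<div class="sub_title"><a class="sub_title" href="/link">Some Random Title.</a></div><div class="sub_title"><a class="sub_title" href="/link2">Another subtitle I want.</a>'

# grep -o prints each <a>...</a> element on its own line;
# the first sed expression strips the tags, the second removes
# trailing '.', '?' or '!'.
# mapfile -t stores one line per array element, spaces included.
mapfile -t titles < <(
  grep -o '<a [^>]*>[^<]*</a>' <<< "$s" |
    sed -e 's/<[^>]*>//g' -e 's/[.?!]*$//'
)

printf '%s\n' "${titles[@]}"
```

Printing with `"${titles[@]}"` shows every element; `echo "$titles"` would again show only the first.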

How to remove numbers from extensions from files

I have many files in a directory having extension like
.text(2) and .text(1).
I want to remove the numbers from extension and output should be like
.text and .text .
Can anyone please help me with a shell script for that?
I am using CentOS.
A pretty portable way of doing it would be this:
for i in *.text*; do mv "$i" "$(echo "$i" | sed 's/([0-9]\{1,\})$//')"; done
Loop through all files which end in .text followed by anything. Use sed to remove any parentheses containing one or more digits from the end of each filename.
If all of the numbers within the parentheses are single digits and you're using bash, you could also use built-in parameter expansion:
for i in *.text*; do mv "$i" "${i%([0-9])}"; done
The expansion removes any parentheses containing a single digit from the end of each filename.
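For multi-digit suffixes without calling sed, a bash regex match is one option. The sketch below assumes bash, runs in a temporary directory on two made-up filenames, and uses `mv -n` so that an existing target is never overwritten (for example when both `a.text(1)` and `a.text(2)` exist).

```shell
#!/usr/bin/env bash
shopt -s nullglob   # the loop body is skipped if nothing matches

# Made-up sample files in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/xxxx.text(1)" "$workdir/yyyy.text(12)"

# Capture everything up to and including ".text";
# the "(digits)" suffix must follow it at the end of the name.
re='^(.*\.text)\([0-9]+\)$'

for f in "$workdir"/*.text\(*\); do
  if [[ $f =~ $re ]]; then
    # -n: do not overwrite an existing file; --: end of options
    mv -n -- "$f" "${BASH_REMATCH[1]}"
  fi
done

ls "$workdir"
```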
Another way without loops, but also with sed (and all the regexp's inside) is piping to sh:
ls *text* | sed 's/\(.*\)\..*/mv \1* \1.text/' | sh
Example:
[...]$ ls
xxxx.text(1) yyyy.text(2)
[...]$ ls *text* | sed 's/\(.*\)\..*/mv \1* \1.text/' | sh
[...]$ ls
xxxx.text yyyy.text
Explanation:
Everything between \( and \) is stored and can be pasted again by \1 (or \2, \3, ... a consecutive number for each pair of parentheses used). Therefore, the code above stores all the characters before the first dot \. and after that, compounds a sequence like this:
mv xxxx* xxxx.text
mv yyyy* yyyy.text
That is piped to sh
Most simple way if files are in same folder
rename 's/text\([0-9]+\)/text/' *.text*

How can I make 'grep' show a single line five lines above the grepped line?

I've seen some examples of grepping lines before and after, but I'd like to ignore the middle lines.
So, I'd like the line five lines before, but nothing else.
Can this be done?
OK, I think this will do what you're looking for. It will look for a pattern, and extract the 5th line before each match.
grep -B5 "pattern" filename | awk -F '\n' 'ln ~ /^$/ { ln = "matched"; print $1 } $1 ~ /^--$/ { ln = "" }'
Basically, it takes the first line of each group, prints it, then waits until it sees ^--$ (the group separator used by grep) and starts again.
If you only want to have the 5th line before the match you can do this:
grep -B 5 pattern file | head -1
Edit:
If you can have more than one match, you could try this (exchange pattern with your actual pattern):
sed -n '/pattern/!{H;x;s/^.*\n\(.*\n.*\n.*\n.*\n.*\)$/\1/;x};/pattern/{x;s/^\([^\n]*\).*$/\1/;p}' file
I took this from a Sed tutorial, section: Keeping more than one line in the hold buffer, example 2 and adapted it a bit.
This is option -B
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing -- between contiguous groups of
matches.
This way is easier for me:
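grep --no-group-separator -B5 "pattern" file | sed -n 1~6p
This greps the 5 lines before each match plus the matching line itself (6 lines per group), turns off the -- group separator, then prints the first line of every group of 6. It assumes GNU grep and GNU sed, that every match has 5 full lines before it, and that the contexts of neighbouring matches do not overlap.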
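A single-pass alternative that does not rely on grep's context grouping is to keep the last five lines in a small ring buffer in awk. This is a sketch, assuming any POSIX awk; the sample rows and the pattern are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Twelve sample lines: "row 1" ... "row 12".
input=$(printf 'row %d\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# buf[NR % n] is overwritten every n lines, so just before line NR is
# stored, that slot still holds line NR - n: the 5th line before it.
result=$(awk -v n=5 -v pat='^row (7|12)$' '
  $0 ~ pat && NR > n { print buf[NR % n] }
  { buf[NR % n] = $0 }
' <<< "$input")

echo "$result"
```

Here the matches are rows 7 and 12, so the lines printed are rows 2 and 7. Matches closer together than five lines are handled too, since each match is looked up independently.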
