Using sed in batchfile to get string between two other strings

I have a file with this HTML code inside:
<p class="center-block"><img alt="ourpicture" class="picture" src="http://mypage.com/ourpicture123" /></p>
Now I would like to extract just the source, like http://mypage.com/ourpicture123.
How can I handle this with sed? It would be great if I could look for 'src="' before and '"' after.

Through sed,
$ sed -n 's/.*\bsrc="\([^"]*\)".*/\1/p' file
http://mypage.com/ourpicture123
Through grep,
grep -oP '\bsrc="\K[^"]*(?=")' file
The above sed command won't work if more than one src attribute is present on a line. The \K in the grep command above discards the previously matched src=" characters from the final output.
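For example, on a made-up line carrying two src attributes, the greedy .* in the sed expression keeps only the last value, while grep prints every match on its own line:
$ echo '<img src="http://mypage.com/pic1" /><img src="http://mypage.com/pic2" />' | sed -n 's/.*\bsrc="\([^"]*\)".*/\1/p'
http://mypage.com/pic2
$ echo '<img src="http://mypage.com/pic1" /><img src="http://mypage.com/pic2" />' | grep -oP '\bsrc="\K[^"]*(?=")'
http://mypage.com/pic1
http://mypage.com/pic2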

Here is an awk version:
awk -F'src="' '{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123
Or like this:
awk -F'src="' '{sub(/".*$/,"",$2);print $2}' file
http://mypage.com/ourpicture123
If you have several lines and only need the lines containing src=, do:
awk -F'src="' 'NF>1{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123
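For example, on made-up input that also has a line without any src attribute, the NF>1 guard skips that line instead of printing an empty one:
$ printf '<p>no image here</p>\n<p><img src="http://mypage.com/ourpicture123" /></p>\n' | awk -F'src="' 'NF>1{split($2,a,"\"");print a[1]}'
http://mypage.com/ourpicture123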

Related

Search and delete links in markdown files

From time to time I run a link checker over my site, and the external links that return 404 are saved to a logfile.
Now I am trying to delete those links automatically from the markdown files. I use a multilingual website, so I start by reading the logfile into an array.
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[#]}"; do
grep -r $i content/* | sed -e 's/([^()]*)//g'
done
This command deletes the link and the title inside (), but the [Example Text] remains. I am looking for a way to also remove the [], so that at the end I only get Example Text.
Now:
[Example Text](http://example.com "Example Title")
Desired result:
Example Text
Assumptions
The i in for i in "${link[@]}" will evaluate to a link like "http://example.com" on each loop iteration
Every section of your markdown files that we care about will take the form you described: [Example Text](http://example.com "Example Title")
The code
IFS=$'\n'
link=( $(awk '{print $7}' $ext) )
for i in "${link[#]}"; do
grep -ro "\[.*\].*${i}" content/* | grep -o '\[.*\]' | tr -d '[]'
done
Explanation
grep -ro "\[.*\].*${i}" content/*:
Recursive search to run on all files in a dir: grep -r ... content/*
Print only the text that matches our regex: grep -o
Match anything that starts with [, followed by anything (.*), then a ], followed by the value of our loop variable ${i} (the current link): "\[.*\].*${i}"
From that output all we want is "Example Text", which lives between the brackets, so anything not between brackets needs to go: grep -o '\[.*\]'
Finally, we want to remove those pesky brackets: tr -d '[]'
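As a rough trace (dropping -r because we feed a single line on stdin, and with ${i} already expanded to http://example.com), the pipeline narrows the sample line down step by step:
$ echo '[Example Text](http://example.com "Example Title")' | grep -o '\[.*\].*http://example.com'
[Example Text](http://example.com
$ echo '[Example Text](http://example.com "Example Title")' | grep -o '\[.*\].*http://example.com' | grep -o '\[.*\]' | tr -d '[]'
Example Text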
The immediate fix is to extend your sed regex.
sed 's/\[\([^][]*\)\]([^()]*)/\1/g'
But probably a much better fix is to use the Awk script's output to replace all the links in content in a single pass.
find content -type f -exec \
sed -i 's%\[\([^][]*\)\](\('"$(
awk 'NR>1 { printf "\|" }
{ printf "%s", $7 }' "$ext")"'\))%\1%g' {} +
The Awk script produces a long regex like
http://one.example.net/nosuchpage\|http://two.exampe.org/404\|https://three.example.com/broken-link
from all the links in the input, and the sed script then replaces any links which match this regex in the parentheses after the square brackets. (Maybe you'll want to extend this to also permit a quoted string after the link before the closing round parenthesis, like in your example; I feel I am already guessing too many things about what you are actually hoping to accomplish.)
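With that example regex, the generated command (which find then applies to each file) would expand to something like:
sed -i 's%\[\([^][]*\)\](\(http://one.example.net/nosuchpage\|http://two.exampe.org/404\|https://three.example.com/broken-link\))%\1%g'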
If you are on a *BSD platform (including MacOS) you'll need to add an empty string argument after the -i argument, like sed -i '' 's%...

I want to edit a specific lines (multiple) with sed command

I have a test file with around 20K lines. In that file I want to change some specific strings on specific lines; I am getting the line numbers and the strings to change. Here I have a scenario where I want to change one string to another on multiple lines. Earlier I used something like
sed -i '12s/stringone/stringtwo/g' filename
but in this case I have to run multiple commands for the same task, like
sed -i '15s/stringone/stringtwo/g' filename
sed -i '102s/stringone/stringtwo/g' filename
sed -i '11232s/stringone/stringtwo/g' filename
Then I tried the following
sed -i '12,15,102,11232/stringone/stringtwo/g' filename
but I am getting the error
sed: -e expression #1, char 5: unknown command: `,'
Can someone please help me achieve this?
The functionality you're trying to get with GNU sed would look like this in GNU awk:
awk -i inplace '
BEGIN {
split("12 15 102 11232",tmp)
for (i in tmp) lines[tmp[i]]
}
NR in lines { gsub(/stringone/,"stringtwo") }
{ print }   # print every line so inplace editing writes the file back
' filename
Just like with a sed script, the above will fail when the strings contain regexp or backreference metacharacters. If that's an issue then with awk you can replace gsub() with index() and substr() for string literal operations (which are not supported by sed).
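Here is a rough sketch of that literal-string variant, using the same GNU awk -i inplace approach; the old/new strings are placeholders and the loop is just one way to emulate gsub() with index() and substr():
awk -i inplace '
BEGIN {
    split("12 15 102 11232",tmp)
    for (i in tmp) lines[tmp[i]]
    old = "stringone"; new = "stringtwo"
}
NR in lines {
    # replace every literal occurrence of old with new, no regexp involved
    out = ""
    while ( (pos = index($0, old)) > 0 ) {
        out = out substr($0, 1, pos-1) new
        $0 = substr($0, pos + length(old))
    }
    $0 = out $0
}
{ print }
' filename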
You get the error because N,M in sed is a range (from N to M) and does not accept a list of individual line numbers.
An alternative is to use printf and sed:
sed -i "$(printf '%ds/stringone/stringtwo/g;' 12 15 102 11232)" filename
The printf statement repeats the pattern Ns/stringone/stringtwo/g; for each number N passed as an argument.
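Running the printf on its own shows the sed expression it builds (the format string is reused for each argument):
$ printf '%ds/stringone/stringtwo/g;' 12 15 102 11232
12s/stringone/stringtwo/g;15s/stringone/stringtwo/g;102s/stringone/stringtwo/g;11232s/stringone/stringtwo/g;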
This might work for you (GNU sed):
sed '12ba;15ba;102ba;11232ba;b;:a;s/pattern/replacement/' file
For each address, branch to a common placeholder (in this case :a) and do the substitution; otherwise break out of the sed cycle.
If the addresses were in a file:
sed 's/.*/&ba/' fileOfAddresses | sed -f - -e 'b;:a;s/pattern/replacement/' file
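For instance, assuming fileOfAddresses holds one line number per line, the first sed builds the branch commands that the second sed then reads via -f -:
$ printf '12\n15\n102\n11232\n' | sed 's/.*/&ba/'
12ba
15ba
102ba
11232ba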

How to split lines in a file, and have the output names be based on those lines

I am using CentOS. I have a file that contains information like:
100000,UniqueName1
100000,UniqueName2
100000,UniqueName4
100000,SoloName9
I want to split this out into files, one for each line, each named:
[secondvalue]_file.txt
For example:
SoloName9_file.txt
Is it possible to split the file in this fashion using a command, or will I need to write a shell script? If the former, what's the command?
Thank you!
Here's one approach. Use the sed command to turn this file into a valid shell script that you can then execute.
sed -e 's/^/echo /g' -e 's/,/ >/g' -e 's/$/_file.txt/g' <your.textfile >your.sh
chmod +x your.sh
./your.sh
Note that trailing whitespace in the file would take some additional work.
Writing it into a shell script file gives you a chance to review it, but you can also execute it as a single line.
sed -e 's/^/echo /g' -e 's/,/ >/g' -e 's/$/_file.txt/g' <your.textfile | sh
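Either way, for the sample input above the generated commands would look like this, each echo writing the first value into a file named after the second:
$ sed -e 's/^/echo /g' -e 's/,/ >/g' -e 's/$/_file.txt/g' <your.textfile
echo 100000 >UniqueName1_file.txt
echo 100000 >UniqueName2_file.txt
echo 100000 >UniqueName4_file.txt
echo 100000 >SoloName9_file.txt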

How to edit a file with shell scripting

I have a file containing thousands of lines like this:
0x7f29139ec6b3: W 0x7fff06bbf0a8
0x7f29139f0010: W 0x7fff06bbf0a0
0x7f29139f0014: W 0x7fff06bbf098
0x7f29139f0016: W 0x7fff06bbf090
0x7f29139f0036: R 0x7f2913c0db80
I want to make a new file which contains only the second hex number on each line (the right-hand value shown on each line above).
I have to put all these hex numbers in an array in a C program. So I am trying to make a file with only the hex numbers on the right hand side, so that my C program can use the fscanf function to read these numbers from the modified file.
I guess we can use some shell script to make a file containing those hex numbers? grep or something?
You can use sed to edit the file in place. The two dots after the colon match the space and the "W" or "R" (or any other) marker:
sed -i "s/.*:..//g" file
cat file
0x7fff06bbf0a8
0x7fff06bbf0a0
0x7fff06bbf098
0x7fff06bbf090
You can use the grep -oP command:
grep -oP ' \K0x[a-fA-F0-9]*' file
0x7fff06bbf0a8
0x7fff06bbf0a0
0x7fff06bbf098
0x7fff06bbf090
0x7f2913c0db80
You can run a command on the file that will create a new file in the format you want: somecommand <oldfile >newfile. That will leave the original file intact and create a new one for you to feed to your C program.
As to what somecommand should be, you have multiple options. The easiest is probably awk:
awk '{print $NF}'
But you can also do it with sed or grep or perl or cut ... see other answers for an embarrassment of choices.
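Combined with the redirection pattern above, that would be, for example:
awk '{print $NF}' <oldfile >newfile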
Since it seems that you always want to select the third field, the simplest approach is to use cut:
cut -d ' ' -f 3 file
or awk:
awk '{print $3}' file

sed counting lines incorrect

Running sed to count the lines in my file returns 1, but Sublime and TextEdit count more than 88000 lines. Why does sed do that? How can I fix it?
$ sed -n '$=' out_data1.txt
1
I use sed to count the lines of a very large file (~10GB of MongoDB query results) so I can split it later for multithreaded processing.
Your command should work, but try:
wc -l out_data1.txt
or just as a test
awk 'END {print NR}' out_data1.txt
sed has some buffer limits, but you can try the following (I don't recommend sed on a huge file, especially just for counting lines):
sed -u -n "$="
maybe a "s/.*//;$=" if there is also a buffer problem on line size itself
