sed counts lines in a file incorrectly

Running sed to count the lines in my file returns 1, but Sublime Text and TextEdit count more than 88,000 lines. Why does sed do that, and how can I fix it?
$ sed -n '$=' out_data1.txt
1
I use sed to count the lines of a very large (~10 GB) file of MongoDB query results, so that I can later split it for multithreaded processing.

Your command should work, but try:
wc -l out_data1.txt
or, just as a test:
awk 'END {print NR}' out_data1.txt

Some sed implementations may hit buffer limits. You can try the unbuffered option (-u, a GNU sed extension), although I don't recommend sed on huge files, especially just for counting lines:
sed -u -n '$=' out_data1.txt
If the length of individual lines is also causing a buffer problem, maybe try:
sed -u -n 's/.*//;$=' out_data1.txt
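A minimal sketch (POSIX sh, assuming standard sed, wc, awk and tr) that runs the counting commands above on a small test file. The helper names are hypothetical. The test file deliberately separates its lines with carriage returns (\r) instead of newlines; that is only an assumed cause of the reported 1, since many editors display a lone CR as a line break while sed, wc and awk only split on \n.

```shell
#!/bin/sh
# Sketch: compare three ways of counting lines, plus a check that treats
# carriage returns (\r) as line breaks.
# ASSUMPTION: CR-only line endings are just one possible cause of "1".

count_sed() { sed -n '$=' "$1"; }           # number of the last input line
count_wc()  { wc -l < "$1" | tr -d ' '; }   # number of \n characters
count_awk() { awk 'END { print NR }' "$1"; }

# Hypothetical helper: count lines after converting every CR to LF.
count_cr_as_lf() { tr '\r' '\n' < "$1" | wc -l | tr -d ' '; }

demo=$(mktemp)
printf 'a\rb\rc\r' > "$demo"    # three "lines" separated by CR only

echo "sed:    $(count_sed "$demo")"        # 1
echo "wc:     $(count_wc "$demo")"         # 0
echo "awk:    $(count_awk "$demo")"        # 1
echo "CR->LF: $(count_cr_as_lf "$demo")"   # 3
rm -f "$demo"
```

If sed, wc and awk all report a tiny count while the CR-to-LF count matches what the editors show, the file's line terminators are the thing to fix, not sed.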

Related

I want to edit specific (multiple) lines with a sed command

I have a test file with around 20K lines. I want to change a specific string on specific lines, and I already have the line numbers and the strings to change. In this scenario I want to change one string to another on multiple lines. Earlier I used:
sed -i '12s/stringone/stringtwo/g' filename
but in this case I have to run multiple commands for the same test, like:
sed -i '15s/stringone/stringtwo/g' filename
sed -i '102s/stringone/stringtwo/g' filename
sed -i '11232s/stringone/stringtwo/g' filename
Then I tried:
sed -i '12,15,102,11232/stringone/stringtwo/g' filename
but I am getting the error
sed: -e expression #1, char 5: unknown command: `,'
Can someone please help me achieve this?
The functionality you're trying to get with GNU sed would look like this in GNU awk (awk -i inplace requires GNU awk 4.1 or later):
awk -i inplace '
BEGIN {
split("12 15 102 11232",tmp)
for (i in tmp) lines[tmp[i]]
}
NR in lines { gsub(/stringone/,"stringtwo") }
' filename
Just like with a sed script, the above will fail when the strings contain regexp or backreference metacharacters. If that's an issue, then in awk you can replace gsub() with index() and substr() to operate on literal strings (which sed does not support).
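A sketch of the index()/substr() idea, in portable awk that writes to stdout (use gawk -i inplace if you want in-place editing). The line numbers, strings and the function name replace_literal_on_lines are hypothetical. The strings are passed through ENVIRON rather than -v, because -v interprets backslash escapes in its value.

```shell
#!/bin/sh
# Sketch: replace a literal string on selected lines with index()/substr(),
# so regex metacharacters (. * [) and "&" in the replacement are not special.
# replace_literal_on_lines is a hypothetical name; output goes to stdout.

replace_literal_on_lines() {
    # $1 = space-separated line numbers, $2 = old text, $3 = new text, $4 = file
    OLD=$2 NEW=$3 awk -v nums="$1" '
    BEGIN {
        old_s = ENVIRON["OLD"]; new_s = ENVIRON["NEW"]
        n = split(nums, tmp, " ")
        for (i = 1; i <= n; i++) lines[tmp[i]]   # set of target line numbers
    }
    (NR in lines) && old_s != "" {
        out = ""; rest = $0
        # index() searches for a plain substring, never a regex
        while ((pos = index(rest, old_s)) > 0) {
            out = out substr(rest, 1, pos - 1) new_s
            rest = substr(rest, pos + length(old_s))
        }
        $0 = out rest
    }
    { print }
    ' "$4"
}

demo=$(mktemp)
printf 'a.b a.b\nkeep a.b\naxb a.b\n' > "$demo"
replace_literal_on_lines "1 3" "a.b" "X&Y" "$demo"
# X&Y X&Y
# keep a.b
# axb X&Y
rm -f "$demo"
```

Note that "axb" on line 3 is left alone: with gsub(/a.b/, ...) the dot would have matched any character there, and the & would have been replaced by the matched text.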
You get the error because N,M in sed is a range (from line N to line M); sed has no syntax for a list of individual line numbers.
An alternative is to use printf and sed:
sed -i "$(printf '%ds/stringone/stringtwo/g;' 12 15 102 11232)" filename
The printf statement repeats the pattern Ns/stringone/stringtwo/g; for every number N among its arguments, so sed receives a single script: 12s/stringone/stringtwo/g;15s/stringone/stringtwo/g;...
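A short sketch of the printf trick on a hypothetical five-line file, targeting lines 2 and 4. It writes to stdout instead of using -i, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/sh
# Sketch: generate "Ns/stringone/stringtwo/g;" for each line number N,
# then apply the whole script in a single sed pass (output to stdout).

script=$(printf '%ds/stringone/stringtwo/g;' 2 4)
echo "$script"    # 2s/stringone/stringtwo/g;4s/stringone/stringtwo/g;

demo=$(mktemp)
printf 'stringone %d\n' 1 2 3 4 5 > "$demo"   # hypothetical sample data
sed "$script" "$demo"
# stringone 1
# stringtwo 2
# stringone 3
# stringtwo 4
# stringone 5
rm -f "$demo"
```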
This might work for you (GNU sed):
sed '12ba;15ba;102ba;11232ba;b;:a;s/pattern/replacement/' file
For each address, branch to a common label (in this case :a) and do the substitution; any other line reaches the plain b, which branches to the end of the script, so it is printed unchanged.
If the addresses were in a file:
sed 's/.*/&ba/' fileOfAddresses | sed -f - -e 'b;:a;s/pattern/replacement/' file
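A sketch of the same branch idea written so that it should also run on non-GNU sed: POSIX sed ends a label name only at a newline, and every -e option is joined to the next with a newline. pattern and replacement are placeholders as in the answer; the sample file and line numbers are hypothetical, and the address-file variant builds the script in a temporary file instead of reading it with -f -.

```shell
#!/bin/sh
# Sketch: branch-to-label substitution on selected lines, using separate
# -e options (implicit newlines) instead of semicolons after labels.

demo=$(mktemp)
printf 'pattern %d\n' 1 2 3 > "$demo"

# Lines 1 and 3 jump to :a and get the substitution; every other line
# reaches the bare "b", which branches to the end of the script.
sed -e '1ba' -e '3ba' -e 'b' -e ':a' -e 's/pattern/replacement/' "$demo"
# replacement 1
# pattern 2
# replacement 3

# Same idea with the addresses read from a file: build the script first.
addrs=$(mktemp); script=$(mktemp)
printf '1\n3\n' > "$addrs"
{ sed 's/.*/&ba/' "$addrs"; printf 'b\n:a\ns/pattern/replacement/\n'; } > "$script"
sed -f "$script" "$demo"    # same output as above
rm -f "$demo" "$addrs" "$script"
```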

sed addressing for each of multiple input files

I would like to print from line 10 until the end of the file for each of several files in a folder. For a single file, I would do this with sed -n '10,$p'; however, when I provide multiple input files to sed, the addresses refer to the concatenation of all the files. How can I print with sed using each file's own line numbers? This website says that the $ address refers to the end of each file if the -s option is used, but this does not work for me on my MacBook Pro.
Ideally I would like the whole procedure to be done with a single tool, without writing a loop. I'm OK with the output being concatenated, and I'm open to tools other than sed. tail might work for this, like so: tail -n +10 filenames, but this is very slow, so I imagine sed is the better choice.
awk 'FNR>9{print $0}' file1 file2
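This will do it: FNR is the line number within the current file (NR would count across all files), so FNR>9 prints from line 10 onward in each file.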
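A small sketch with two hypothetical files that compares FNR and NR side by side. It uses line 3 instead of line 10 to keep the sample short; awk 'FNR > 2' is equivalent to 'FNR>2{print $0}', because a pattern without an action prints the line.

```shell
#!/bin/sh
# Sketch: per-file line addressing with FNR versus global NR.

dir=$(mktemp -d)
printf 'a%d\n' 1 2 3 4 > "$dir/f1"
printf 'b%d\n' 1 2 3 > "$dir/f2"

awk 'FNR > 2' "$dir/f1" "$dir/f2"   # a3 a4 b3: counter restarts for each file
awk 'NR > 2'  "$dir/f1" "$dir/f2"   # a3 a4 b1 b2 b3: one counter for all input

rm -r "$dir"
```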

Using sed in batchfile to get string between two other strings

I have a file with this HTML code inside:
<p class="center-block"><img alt="ourpicture" class="picture" src="http://mypage.com/ourpicture123" /></p>
Now I would like to get just the source, i.e. http://mypage.com/ourpicture123.
How can I do this with sed? Ideally I would match 'src="' before the value and '"' after it.
With sed:
$ sed -n 's/.*\bsrc="\([^"]*\)".*/\1/p' file
http://mypage.com/ourpicture123
With grep (-P requires GNU grep built with PCRE support):
grep -oP '\bsrc="\K[^"]*(?=")' file
The sed command above won't work if a line contains more than one src attribute: the greedy leading .* means only the last one is printed. In the grep command, \K discards the previously matched src=" characters from the final output.
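A sketch of a variant of the sed command that avoids \b, which is a GNU extension that other seds (e.g. on macOS) may not support. It assumes, as in the sample HTML, that src is preceded by whitespace; extract_src is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: portable sed extraction of the src value.
# ASSUMPTION: src=" is preceded by a space or tab, replacing GNU's \b.

extract_src() { sed -n 's/.*[[:space:]]src="\([^"]*\)".*/\1/p' "$1"; }

demo=$(mktemp)
printf '%s\n' '<p class="center-block"><img alt="ourpicture" class="picture" src="http://mypage.com/ourpicture123" /></p>' > "$demo"
extract_src "$demo"    # http://mypage.com/ourpicture123
rm -f "$demo"
```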
Here is an awk version:
awk -F'src="' '{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123
Or like this:
awk -F'src="' '{sub(/".*$/,"",$2);print $2}' file
http://mypage.com/ourpicture123
If you have several lines and only need the lines that contain src=, do:
awk -F'src="' 'NF>1{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123
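The commands above print at most one value per line. A sketch that prints every src value, including several on the same line, by looping with POSIX awk's match(), RSTART and RLENGTH. all_src is a hypothetical name, and the regex would also match attributes such as data-src.

```shell
#!/bin/sh
# Sketch: print every src="..." value, one per output line.

all_src() {
    awk '{
        line = $0
        while (match(line, /src="[^"]*"/)) {
            # skip the 5 chars of src=" and drop the closing quote
            print substr(line, RSTART + 5, RLENGTH - 6)
            line = substr(line, RSTART + RLENGTH)   # continue after this match
        }
    }' "$1"
}

demo=$(mktemp)
printf '%s\n' '<img src="a.png"><img src="b.png">' '<p>none</p>' > "$demo"
all_src "$demo"
# a.png
# b.png
rm -f "$demo"
```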

How to split lines in a file, and have the output names be based on those lines

I am using CentOS. I have a file that contains information like:
100000,UniqueName1
100000,UniqueName2
100000,UniqueName4
100000,SoloName9
I want to split this out into files, one for each line, each named:
[secondvalue]_file.txt
For example:
SoloName9_file.txt
Is it possible to split the file in this fashion using a command, or will I need to write a shell script? If the former, what's the command?
Thank you!
Here's one approach. Use the sed command to turn this file into a valid shell script that you can then execute.
sed -e 's/^/echo /g' -e 's/,/ >/g' -e 's/$/_file.txt/g' <your.textfile >your.sh
chmod +x your.sh
./your.sh
Note that trailing whitespace in the file would take some additional work.
Writing it to a script file first gives you a chance to review it, but you can also execute it in a single line:
sed -e 's/^/echo /g' -e 's/,/ >/g' -e 's/$/_file.txt/g' <your.textfile | sh
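A sketch of an alternative that skips the intermediate script: awk can redirect print to a file name computed from the second field. Like the sed version, it writes the first field into <second field>_file.txt. split_by_name is a hypothetical name, and it assumes names contain no / and lines have no trailing whitespace or \r.

```shell
#!/bin/sh
# Sketch: write field 1 of each "value,name" line into "<name>_file.txt",
# the same result the generated echo script produces.

split_by_name() {
    # $1 = input file; output files land in the current directory
    awk -F, '{
        out = $2 "_file.txt"
        print $1 > out
        close(out)    # keep the number of open files small on large inputs
    }' "$1"
}

dir=$(mktemp -d)
printf '100000,UniqueName1\n100000,SoloName9\n' > "$dir/input.txt"
(cd "$dir" && split_by_name input.txt && cat SoloName9_file.txt)   # 100000
rm -r "$dir"
```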

How do I let sed 'w' command know where the filename ends?

Every example I was able to find that demonstrates sed's w command has it at the end of the script. What if I can't do that?
An example will probably demonstrate the problem better:
$ echo '123' | sed 'w tempfile; s/[0-9]/\./g'
sed: couldn't open file tempfile; s/[0-9]/\./g: No such file or directory
(How) can I change the above so that sed knows where the filename ends?
P.S. I'm aware that I can do
$ echo '123' | sed 'w tempfile
> s/[0-9]/\./g'
...
Are there prettier options?
P.P.S. People tend to suggest splitting it into two scripts. The question then is: is that safe? What if I wanted to branch somewhere after the w command, and so on? Can someone confirm that any script can be split in two after any command without affecting the results?
Final edit: I checked that multiple -e options work just like concatenated commands. I thought it was more complex (for example, that the first script would always finish before the second one starts). However, I tried splitting a {...} block of commands between two scripts and it still worked, so the w issue is really not a serious problem. Thanks to all.
You can give sed a two-line script in a single shell line; the scripts from separate -e options are joined with newlines:
echo '123' | sed -e 'w tempfile' -e 's/[0-9]/\./g'
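A quick sketch to check the -e form alongside the literal-newline form from the question's P.S.; both should work with POSIX sed. It writes into a temporary directory instead of ./tempfile, and uses a plain . in the replacement, where it is already literal.

```shell
#!/bin/sh
# Sketch: the w file name must end at a newline. Both forms below supply one.

dir=$(mktemp -d)

# 1) Separate -e options: sed joins them with newlines.
out1=$(echo '123' | sed -e "w $dir/t1" -e 's/[0-9]/./g')

# 2) A literal newline inside the quoted script.
out2=$(echo '123' | sed "w $dir/t2
s/[0-9]/./g")

file1=$(cat "$dir/t1")
file2=$(cat "$dir/t2")
echo "stdout: $out1 / $out2"     # stdout: ... / ...
echo "files:  $file1 / $file2"   # files:  123 / 123
rm -r "$dir"
```

The file receives the line as it was when w ran (before the substitution), and stdout shows the substituted line.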
This might work for you (if you're using BASH and probably GNU sed):
echo '123' | sed 'w tempfile'$'\n'';s/[0-9]/\./g'
Explanation:
The r, R and w commands need a newline to terminate the file name.
The answer to the question is "newline":
sed will treat a non-escaped literal newline as the end of the file name.
If your shell is bash, or supports the $'\n' syntax, you can solve the OP's original question this way:
echo '123' | sed 'w tempfile'$'\n''s/[0-9]/\./g'
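In a more limited sh that lacks $'\n', the newline has to stay inside the quotes so that the shell passes it through to sed:
$ echo '123' | sed 'w tempfile
> s/[0-9]/\./g'
Escaping the newline outside the quotes (typing sed 'w tempfile'\ and pressing Enter) does not work: the shell treats an unquoted backslash-newline as a line continuation and removes both characters, so sed would try to write to a file named tempfiles/[0-9]/\./g.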
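Reverse the order of the two sed commands:
echo '123' | sed 's/[0-9]/\./g;w tempfile'
That is, perform the replacement first and then write the pattern space to the file.
EDIT: There was some misunderstanding about whether the OP wants the replaced text in the file. The command above writes the replaced text to tempfile. Since that is not what the OP wanted, here is a version that writes the original text to tempfile and prints the replaced text:
echo '123' | sed -e 'h;s/[0-9]/\./g;x;w tempfile' -e 'x'
h saves the original line in the hold space, s replaces the digits, x swaps the original back in so that w writes it, and the second x (in its own -e, because the w file name must end with a newline) restores the replaced text for output.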

Resources