Removing double new lines using awk/sed - file

Hey everyone.
I have a file full of data; each row looks something like "755545;45634;1244". Sometimes an unknown number of additional newlines occurs somewhere in the file, which I don't want.
Example:
256163;16816;1651
16156;165165;1165
15153;135135;15351
15153;1351;8
165;15313;153513
254;45;45
Desired output:
256163;16816;1651
16156;165165;1165
15153;135135;15351
15153;1351;8
165;15313;153513
254;45;45
Can this be done easily with the awk/sed utilities in Unix?

The answer from @Luixv is correct if there is no whitespace on the "empty" lines.
If whitespace is present, use this instead:
sed '/^[ \t]*$/d'
That's a space before the \t within the brackets, i.e. [space\t].
If this doesn't work, you might have a problem with newlines. Do a:
$ file test_file
test_file: ISO-8859 text, with CRLF, LF line terminators
If you get the output above, convert the file to unix using:
$ dos2unix test_file
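As a quick check of the whitespace case, the sketch below builds a small sample containing an empty line, a line holding only a space and a tab, and a line holding only a carriage return, then deletes all three. It uses the POSIX class [[:space:]] instead of [ \t]; that is a substitution made here so the example does not depend on \t support in the local sed. The sample data and the variable names are made up for illustration.

```shell
# Hypothetical sample: data rows separated by an empty line, a line holding
# only space+tab, and a line holding only a carriage return.
sample=$(printf '256163;16816;1651\n\n16156;165165;1165\n \t\n15153;1351;8\n\r\n254;45;45\n')

# Delete every line that is empty or consists solely of whitespace.
cleaned=$(printf '%s\n' "$sample" | sed '/^[[:space:]]*$/d')

printf '%s\n' "$cleaned"
```

A data row that itself ends in a carriage return is kept with the CR still attached, so a file with CRLF terminators still needs dos2unix (or similar) first.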

sed '/^$/d'

sed -nre "s/([^$])/\1/p" input

sed -n '/^[ \t]*$/!p' filename

Solved with ssed (super sed); if you do not have it installed, install it.
ssed -R -e '/(^$|\s)/ d' yourFile
or
cat yourFile| ssed -R -e '/(^$|\s)/ d'
Happy sedding
PS: will work even if you have tabs or \r \t \n, hence the \s in the RegExp.
\r = Carriage Return
\n = New Line
\t = Tab

Related

Sed: Match, remove and replace in one sed call

Let's say I have a string like:
Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720
I would like to use sed to remove the first part (Image.Resolution=) and then split the rest by comma so I can put all the resolutions in a bash array.
I know how to do it in two steps (two sed calls) like:
sed 's/Image.Resolution=//g' | sed 's/,/ /g'.
But as an exercise, I'd like to know if there's a way of doing it in one shot.
Thank you in advance.
Just put ; between the commands:
sed 's/Image.Resolution=//g; s/,/ /g'
From info sed:
3 `sed' Programs
****************
A `sed' program consists of one or more `sed' commands, passed in by
one or more of the `-e', `-f', `--expression', and `--file' options, or
the first non-option argument if zero of these options are used. This
document will refer to "the" `sed' script; this is understood to mean
the in-order catenation of all of the SCRIPTs and SCRIPT-FILEs passed
in.
Commands within a SCRIPT or SCRIPT-FILE can be separated by
semicolons (`;') or newlines (ASCII 10). Some commands, due to their
syntax, cannot be followed by semicolons working as command separators
and thus should be terminated with newlines or be placed at the end of
a SCRIPT or SCRIPT-FILE. Commands can also be preceded with optional
non-significant whitespace characters.
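A small sketch of what the quoted passage means in practice: the same two commands given with a semicolon, with two -e options, and with a literal newline. The input is a shortened version of the string from the question, and the variable names a, b and c are only for illustration.

```shell
s='Image.Resolution=1024x768,800x600,640x480'

# 1. Commands separated by a semicolon.
a=$(printf '%s\n' "$s" | sed 's/Image\.Resolution=//; s/,/ /g')

# 2. One command per -e option.
b=$(printf '%s\n' "$s" | sed -e 's/Image\.Resolution=//' -e 's/,/ /g')

# 3. Commands separated by a literal newline inside the script.
c=$(printf '%s\n' "$s" | sed 's/Image\.Resolution=//
s/,/ /g')

printf '%s\n' "$a" "$b" "$c"
```

All three forms should print the same line, 1024x768 800x600 640x480.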
This awk can also work:
s='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'
awk -F '[=,]' '{$1=""; sub(/^ */, "")} 1' <<< "$s"
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720
For this concrete example you can do it in a shorter way:
sed 's/[^x0-9]/ /g'
and
x='Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720'
y=(${x//[^x0-9]/ })
will remove everything except x and the digits 0-9, so the output (or array y) is
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720
x="Image.Resolution=1024x768,800x600,640x480,480x360,320x240,240x180,160x120,1280x720"
x=${x#*=} # remove left part including =
array=(${x//,/ }) # replace all `,` with whitespace and create array
echo "${array[@]}" # print all elements of array
Output:
1024x768 800x600 640x480 480x360 320x240 240x180 160x120 1280x720

sed command works fine under shell terminal, but fails in 'system()' call under C code

I'm trying to delete some special lines in a log file, so I use the BusyBox sed on an embedded Linux system.
# sed
BusyBox v1.18.4 (2013-01-16 16:00:18 CST) multi-call binary.
Usage: sed [-efinr] SED_CMD [FILE]...
Options:
-e CMD Add CMD to sed commands to be executed
-f FILE Add FILE contents to sed commands to be executed
-i Edit files in-place (else sends result to stdout)
-n Suppress automatic printing of pattern space
-r Use extended regex syntax
If no -e or -f, the first non-option argument is the sed command string.
Remaining arguments are input files (stdin if none).
Executing the following commands in the shell works fine:
export MODULE=sshd
sed "/$MODULE\[/d" logfile
But if I try to use the following C code to accomplish this:
char logfile[] = "logfile";
char module_str[] = "sshd";
char env_str[64] = {0};
int offset = 0;
strcpy(env_str, "MODULE=");
offset += strlen("MODULE=");
strcpy(env_str + offset, module_str);
putenv(env_str);
system("sed \"/$MODULE\[/d\" logfile");
When executing a.out, I get the error message:
sed: unmatched '/'
What's wrong with my system() call? I'm a complete newbie at text processing, so can anybody give me a clue? Thanks.
Best regards,
dejunl
Straight off, I can see that the \ before the [ is going to be swallowed by C, so you'll need to double it:
system("sed \"/$MODULE\\[/d\" logfile");
But the shell might want to swallow the one that's left, so double it again:
system("sed \"/$MODULE\\\\[/d\" logfile");
Of course, I can't be sure I'm reading the question you posed correctly. Try it with echo instead of sed and adjust it until the string comes out as you want sed to see it.
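The "try it with echo" advice can be sketched directly in the shell. The three strings below stand for what the shell would be handed after the C compiler has processed the literal, on the assumption (typical, but compiler-dependent) that an unknown escape such as \[ is reduced to a bare [. The log lines at the end are made up for illustration.

```shell
MODULE=sshd

# The original C literal "\[" most likely reaches the shell as a bare "[",
# so sed is handed a script with an unterminated bracket expression:
bare=$(printf '%s' "/${MODULE}[/d")

# With "\\[" in C the shell sees \[ ; inside double quotes the backslash
# is kept, because [ is not special there.
one=$(printf '%s' "/${MODULE}\[/d")

# With "\\\\[" in C the shell sees \\[ and collapses it to \[ .
two=$(printf '%s' "/${MODULE}\\[/d")

printf '%s\n' "$bare" "$one" "$two"

# Hypothetical log lines to confirm the escaped script deletes the sshd entry.
kept=$(printf 'sshd[123]: accepted\ncron[9]: job\n' | sed "/${MODULE}\[/d")
printf '%s\n' "$kept"
```

If both escaped variants print the same script, either amount of doubling should give sed a literal \[ to work with.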

How do I let sed 'w' command know where the filename ends?

Every example I was able to find demonstrating the w command of sed has it in the end of the script. What if I can't do that?
An example will probably demonstrate the problem better:
$ echo '123' | sed 'w tempfile; s/[0-9]/\./g'
sed: couldn't open file tempfile; s/[0-9]/\./g: No such file or directory
(How) can I change the above so that sed knows where the filename ends?
P.S. I'm aware that I can do
$ echo '123' | sed 'w tempfile
> s/[0-9]/\./g'
...
Are there prettier options?
P.P.S. People tend to suggest to split it in two scripts. The question is then: is it safe? What if I was going to branch somewhere after the w command, and so on. Can someone confirm that any script can be split in two after any command and that will not affect the results?
Final edit: I checked that multiple -e work just as concatenated commands. I thought it was more complex (like the first one should always exit before the second one starts, etc.). However, I tried splitting a {..} block of commands between two scripts and it still worked, so the w thing is really not a serious problem. Thanks to all.
You can give a two line script to sed in one shell line:
echo '123' | sed -e 'w tempfile' -e 's/[0-9]/\./g'
This might work for you (if you're using BASH and probably GNU sed):
echo '123' | sed 'w tempfile'$'\n'';s/[0-9]/\./g'
Explanation:
The r, R and w commands need a newline to terminate the file name.
The answer to the question is "newline":
sed will treat a non-escaped literal newline as the end of the file name.
If your shell is bash, or supports the $'\n' syntax, you can solve the OP's original question this way:
echo '123' | sed 'w tempfile'$'\n''s/[0-9]/\./g'
In a more limited sh you can say
$ echo '123' | sed 'w tempfile'\
> 's/[0-9]/\./g'
What I did here was write \ as an escape, then hit enter and wrote the rest of the command there. Note that here I am escaping the newline from bash but it is being passed to sed.
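The newline rule can be checked with a throwaway file. This sketch uses the multiple -e form, which sed joins with newlines, and a mktemp path in place of the literal name tempfile; the variable names are only for illustration.

```shell
tmp=$(mktemp)   # temporary file standing in for "tempfile"

# Each -e chunk is joined to the next with a newline, which ends the w filename.
out=$(echo '123' | sed -e "w $tmp" -e 's/[0-9]/./g')
saved=$(cat "$tmp")
rm -f "$tmp"

printf 'stdout: %s\nfile:   %s\n' "$out" "$saved"
```

The file should hold the untouched input (123) while standard output carries the substituted text (...).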
Reverse the 2 sed command sequences like this:
echo '123' | sed 's/[0-9]/\./g;w tempfile'
i.e. perform replacements first and then write pattern space into a file.
EDIT: There was some misunderstanding whether OP wants replaced text in final file or not. My above command puts replaced text in tempfile. Since this is not what OP wanted here is one more version that avoids it:
echo '123' | sed -e 'h;s/[0-9]/\./g;g;w tempfile'

How can I make 'grep' show a single line five lines above the grepped line?

I've seen some examples of grepping lines before and after, but I'd like to ignore the middle lines.
So, I'd like the line five lines before, but nothing else.
Can this be done?
OK, I think this will do what you're looking for. It will look for a pattern, and extract the 5th line before each match.
grep -B5 "pattern" filename | awk -F '\n' 'ln ~ /^$/ { ln = "matched"; print $1 } $1 ~ /^--$/ { ln = "" }'
Basically, this takes the first line of each group, prints it, then waits until it sees ^--$ (the group separator used by grep) and starts again.
If you only want to have the 5th line before the match you can do this:
grep -B 5 pattern file | head -1
Edit:
If you can have more than one match, you could try this (exchange pattern with your actual pattern):
sed -n '/pattern/!{H;x;s/^.*\n\(.*\n.*\n.*\n.*\n.*\)$/\1/;x};/pattern/{x;s/^\([^\n]*\).*$/\1/;p}' file
I took this from a Sed tutorial, section: Keeping more than one line in the hold buffer, example 2 and adapted it a bit.
This is option -B
-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines.
Places a line containing -- between contiguous groups of
matches.
This way is easier for me:
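grep --no-group-separator -B5 "pattern" file | sed -n 1~6p
This greps the 5 lines before each match plus the matching line itself (6 lines per match), turns off the -- group separator, then prints the first line of every group of 6. It assumes the matches are far enough apart that their context does not overlap.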
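An alternative that does not depend on grep's context output is to remember the last few lines in awk. This is a sketch of a different technique from the answers above, not a rewrite of them: the pattern /^(8|15)$/ and the seq input are placeholders, and n is the distance above the match.

```shell
# Print the line that sits exactly n lines above each match, and nothing else.
# buf is a ring buffer of the last n lines; before the current line is stored,
# slot NR % n still holds line NR - n.
result=$(seq 1 20 | awk -v n=5 '
    /^(8|15)$/ && NR > n { print buf[NR % n] }   # placeholder pattern: lines "8" and "15"
    { buf[NR % n] = $0 }
')
printf '%s\n' "$result"
```

With the numbers 1 to 20 as input this should print 3 and 10, the lines five above 8 and 15. Matches in the first n lines are skipped because no line exists that far above them.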

Convert DOS/Windows line endings to Linux line endings in Vim

If I open files I created in Windows, the lines all end with ^M.
How do I delete these characters all at once?
dos2unix is a commandline utility that will do this, or :%s/^M//g will if you use Ctrl-v Ctrl-m to input the ^M, or you can :set ff=unix and Vim will do it for you.
There is documentation on the fileformat setting, and the Vim wiki has a comprehensive page on line ending conversions.
Alternately, if you move files back and forth a lot, you might not want to convert them, but rather to do :set ff=dos, so Vim will know it's a DOS file and use DOS conventions for line endings.
Change the line endings in the view:
:e ++ff=dos
:e ++ff=mac
:e ++ff=unix
This can also be used as saving operation (:w alone will not save using the line endings you see on screen):
:w ++ff=dos
:w ++ff=mac
:w ++ff=unix
And you can use it from the command-line:
for file in *.cpp
do
vi +':w ++ff=unix' +':q' "$file"
done
I typically use
:%s/\r/\r/g
which seems a little odd, but works because of the way that Vim matches linefeeds. I also find it easier to remember :)
I prefer to use the following command:
:set fileformat=unix
You can also use mac or dos to respectively convert your file to Mac or MS-DOS/Windows file convention. And it does nothing if the file is already in the correct format.
For more information, see the Vim help:
:help fileformat
:set fileformat=unix to convert from DOS to Unix.
:%s/\r\+//g
In Vim, that strips all carriage returns, and leaves only newlines.
In VIM:
:e ++ff=dos | set ff=unix | w!
In shell with VIM:
vim some_file.txt +'e ++ff=dos | set ff=unix | wq!'
e ++ff=dos - force open file in dos format.
set ff=unix - convert file to unix format.
From: File format
[Esc] :%s/\r$//
dos2unix can directly modify the file contents.
You can directly use it on the file, without any need for temporary file redirection.
dos2unix input.txt input.txt
The above uses the assumed US keyboard. Use the -437 option to use the UK keyboard.
dos2unix -437 input.txt input.txt
Convert directory of files from DOS to Unix
Using command line and sed, find all files in current directory with the extension ".ext" and remove all "^M"
# https://gist.github.com/sparkida/7773170
find $(pwd) -type f -name "*.ext" | while read file; do sed -e 's/^M//g' -i "$file"; done;
Also, as mentioned in a previous answer, ^M = Ctrl+V + Ctrl+M (don't just type the caret "^" symbol and M).
tr -d '\15\32' < winfile.txt > unixfile.txt
(See: Convert between Unix and Windows text files)
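To confirm that a tr-based conversion really removed the carriage returns, something like the following can be used. It is a minimal sketch on a made-up three-line file and deletes only \r (not the Ctrl-Z that '\15\32' also strips).

```shell
tmp=$(mktemp)
printf 'one\r\ntwo\r\nthree\r\n' > "$tmp"   # hypothetical DOS-style file

converted=$(tr -d '\r' < "$tmp")

# Count any carriage returns that survived the conversion.
remaining=$(printf '%s' "$converted" | tr -cd '\r' | wc -c | tr -d ' ')
rm -f "$tmp"

printf '%s\n' "$converted"
echo "carriage returns left: $remaining"
```

Note that tr -d '\r' removes every carriage return, including any in the middle of a line, which is usually fine for text files but is not the same as stripping only line-ending CRs.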
To run directly in a Linux console:
vim file.txt +"set ff=unix" +wq
The following steps can convert the file format for DOS to Unix:
:e ++ff=dos Edit file again, using dos file format ('fileformats' is ignored).
:setlocal ff=unix This buffer will use LF-only line endings when written.
:w Write buffer using Unix (LF-only) line endings.
Reference: File format
I found a very easy way: Open the file with nano: nano file.txt
Press Ctrl + O to save, but before pressing Enter, press: Alt+D to toggle between DOS and Unix/Linux line-endings, or: Alt+M to toggle between Mac and Unix/Linux line-endings, and then press Enter to save and Ctrl+X to quit.
The comment about getting the ^M to appear is what worked for me. Merely typing "^M" in my vi got nothing (not found). The CTRL+V CTRL+M sequence did it perfectly though.
My working substitution command was
:%s/Ctrl-V Ctrl-M/\r/g
and it looked like this on my screen:
:%s/^M/\r/g
With the following command:
:%s/^M$//g
To get the ^M to appear, type Ctrl-V and then Ctrl-M. Ctrl-V tells Vim to take the next character entered literally.
:g/Ctrl-v Ctrl-m/s///
Ctrl-M is the character \r, or carriage return, which DOS line endings add. Ctrl-V tells Vim to insert a literal Ctrl-M character at the command line.
Taken as a whole, this command replaces all \r with nothing, removing them from the ends of lines.
You can use:
vim somefile.txt +"%s/\r/\r/g" +wq
Or the dos2unix utility.
You can use the following command:
:%s/^V^M//g
where the '^' means use CTRL key.
The command below reformats all .sh files in the current directory. I tested it on my Fedora OS.
for file in *.sh; do awk '{ sub(/\r$/, ""); print }' "$file" > luxubutmp; cp -f luxubutmp "$file"; rm -f luxubutmp; done
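A slightly more defensive variant of the same loop is sketched below. It is an assumption-laden illustration rather than the tested command above: it works in a scratch directory created with mktemp -d, uses a unique temporary file per input, and only overwrites the original if awk succeeded.

```shell
dir=$(mktemp -d)                       # scratch directory for the demonstration
printf 'echo hi\r\nexit 0\r\n' > "$dir/a.sh"

for file in "$dir"/*.sh; do
    tmp=$(mktemp) || break
    # Overwrite the original only if awk succeeded; cat keeps the file mode.
    awk '{ sub(/\r$/, ""); print }' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
done

result=$(cat "$dir/a.sh")
rm -rf "$dir"
printf '%s\n' "$result"
```

Writing back with cat rather than cp or mv leaves the original file's permissions in place, which matters for executable scripts.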
In Vim, type:
:w !dos2unix %
This will pipe the contents of your current buffer to the dos2unix command and write the results over the current contents. Vim will ask to reload the file after.
I wanted newlines in place of the ^M's. Perl to the rescue:
perl -pi.bak -e 's/\x0d/\n/g' excel_created.txt
Or to write to stdout:
perl -p -e 's/\x0d/\n/g' < excel_created.txt
Usually there is a dos2unix command you can use for this. Just make sure you read the manual as the GNU and BSD versions differ on how they deal with the arguments.
BSD version:
dos2unix $FILENAME $FILENAME_OUT
mv $FILENAME_OUT $FILENAME
GNU version:
dos2unix $FILENAME
Alternatively, you can create your own dos2unix with any of the proposed answers here, for example:
function dos2unix(){
[ "${1}" ] && [ -f "${1}" ] || return 1;
{ echo ':set ff=unix';
echo ':wq';
} | vim "${1}";
}
From Wikia:
%s/\r\+$//g
That will find all carriage return signs (one and more reps) up to the end of line and delete, so just \n will stay at EOL.
This is my way. I opened a file in DOS EOL and when I save the file, that will automatically convert to Unix EOL:
autocmd BufWrite * :set ff=unix
If you create a file in Notepad or Notepad++ in Windows, bring it to Linux, and open it by Vim, you will see ^M at the end of each line. To remove this,
At your Linux terminal, type
dos2unix filename.ext
This will do the required magic.
I knew I'd seen this somewhere. Here is the FreeBSD login tip:
Do you need to remove all those ^M characters from a DOS file? Try
tr -d \\r < dosfile > newfile
-- Originally by Dru <genesis@istar.ca>
This is a little more than you asked for but:
nmap <C-d> :call range(line('w0'),line('w$'))->map({_,v-> getline(v)})->map({_,v->trim(v,join(map(range(1,0x1F)+[0xa0],{n->n->nr2char()}),''),2)})->map({k,v->setline(k+1,v)})<CR>
Run this and :set ff=unix|dos and no more need for unix2dos.
the single arg form of trim() has the same default mask above, plus 0X20 (an actual space) instead of 0x1F
that default mask clears out all non-printing chars including non-breaking spaces [0xa0] that are hard to find
create a list of lines from the range of lines
map that list to the trim function with using the same mask code as the source, less spaces
map that again to setline to replace the lines.
all :set fileformat= does at this point is choose which eol to save it with, dos or unix
it should be pretty easy to change the range of characters above if you want to eliminate or add some

Resources