Convert DOS/Windows line endings to Linux line endings in Vim

If I open files I created in Windows, the lines all end with ^M.
How do I delete these characters all at once?

dos2unix is a command-line utility that will do this. Inside Vim, :%s/^M//g works if you type Ctrl-V Ctrl-M to enter the ^M, or you can :set ff=unix and Vim will do it for you.
There is documentation on the fileformat setting, and the Vim wiki has a comprehensive page on line ending conversions.
Alternately, if you move files back and forth a lot, you might not want to convert them, but rather to do :set ff=dos, so Vim will know it's a DOS file and use DOS conventions for line endings.
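Before converting anything, it can help to check whether a file actually has DOS line endings. This is only a minimal sketch in plain shell; the function name has_crlf is made up for illustration and is not part of dos2unix or Vim.

```shell
# has_crlf FILE: exit status 0 if at least one line of FILE ends in CR
# (that is, the file has DOS line endings). Hypothetical helper name.
has_crlf() {
    cr=$(printf '\r')          # a literal carriage-return byte
    grep -q "${cr}\$" "$1"     # match a CR immediately before end of line
}
```

You could then run something like has_crlf notes.txt && dos2unix notes.txt, so files that are already in Unix format are left alone.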

Change the line endings in the view:
:e ++ff=dos
:e ++ff=mac
:e ++ff=unix
This can also be used as a save operation (:w alone will not save using the line endings you see on screen):
:w ++ff=dos
:w ++ff=mac
:w ++ff=unix
And you can use it from the command-line:
for file in *.cpp
do
vi +':w ++ff=unix' +':q' "$file"
done

I typically use
:%s/\r/\r/g
which seems a little odd, but works because of the way that Vim matches linefeeds. I also find it easier to remember :)

I prefer to use the following command:
:set fileformat=unix
You can also use mac or dos to respectively convert your file to Mac or MS-DOS/Windows file convention. And it does nothing if the file is already in the correct format.
For more information, see the Vim help:
:help fileformat

:set fileformat=unix to convert from DOS to Unix.

:%s/\r\+//g
In Vim, that strips all carriage returns, and leaves only newlines.

In VIM:
:e ++ff=dos | set ff=unix | w!
In shell with VIM:
vim some_file.txt +'e ++ff=dos | set ff=unix | wq!'
e ++ff=dos - force open file in dos format.
set ff=unix - convert file to unix format.

From: File format
[Esc] :%s/\r$//

dos2unix can directly modify the file contents.
You can directly use it on the file, without any need for temporary file redirection.
dos2unix input.txt input.txt
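The above uses the default settings. The -437 option selects DOS code page 437 (US) for character-set conversion; it is a code page, not a keyboard layout, and it does not change how line endings are handled.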
dos2unix -437 input.txt input.txt

Convert directory of files from DOS to Unix
Using command line and sed, find all files in current directory with the extension ".ext" and remove all "^M"
# https://gist.github.com/sparkida/7773170
find $(pwd) -type f -name "*.ext" | while read file; do sed -e 's/^M//g' -i "$file"; done;
Also, as mentioned in a previous answer, ^M = Ctrl+V + Ctrl+M (don't just type the caret "^" symbol and M).
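If typing a literal ^M is awkward, GNU sed also understands the \r escape. The following is a sketch of the same directory-wide cleanup under that assumption (GNU sed, for both -i without a suffix and \r); the function name strip_cr_in_tree is hypothetical.

```shell
# strip_cr_in_tree DIR PATTERN: remove a trailing CR from every line of each
# regular file under DIR whose name matches PATTERN (for example "*.ext").
# Assumes GNU sed: -i without a suffix and the \r escape are GNU extensions.
strip_cr_in_tree() {
    # -exec ... {} + passes file names straight to sed, so names with
    # spaces are handled without a read loop.
    find "$1" -type f -name "$2" -exec sed -i 's/\r$//' {} +
}
```

For example, strip_cr_in_tree . "*.ext" would cover the case described above. Files that do not match the pattern are not touched.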

tr -d '\15\32' < winfile.txt > unixfile.txt
(See: Convert between Unix and Windows text files)
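Since tr only filters stdin to stdout, converting a file "in place" needs a temporary file. Here is a rough sketch of that; crlf_to_lf is a made-up name, and unlike the command above it strips only carriage returns, not the Ctrl-Z (\32) byte.

```shell
# crlf_to_lf FILE: delete every carriage return in FILE, rewriting it in place.
# Hypothetical helper; strips CR only (not the DOS Ctrl-Z end-of-file byte).
crlf_to_lf() {
    tmp=$(mktemp) || return 1
    # Stage the filtered output in a temp file, then copy it back.
    if tr -d '\r' < "$1" > "$tmp"; then
        cat "$tmp" > "$1"      # overwrite contents, keeping the file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be crlf_to_lf winfile.txt. Copying back with cat rather than mv keeps the original file's mode and owner.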

To run directly in a Linux console:
vim file.txt +"set ff=unix" +wq

The following steps can convert the file format for DOS to Unix:
:e ++ff=dos Edit file again, using dos file format ('fileformats' is ignored).
:setlocal ff=unix This buffer will use LF-only line endings when written.
:w Write buffer using Unix (LF-only) line endings.
Reference: File format

I found a very easy way: Open the file with nano: nano file.txt
Press Ctrl + O to save, but before pressing Enter, press: Alt+D to toggle between DOS and Unix/Linux line-endings, or: Alt+M to toggle between Mac and Unix/Linux line-endings, and then press Enter to save and Ctrl+X to quit.

The comment about getting the ^M to appear is what worked for me. Merely typing "^M" in my vi got nothing (not found). The CTRL+V CTRL+M sequence did it perfectly though.
My working substitution command was
:%s/Ctrl-V Ctrl-M/\r/g
and it looked like this on my screen:
:%s/^M/\r/g

With the following command:
:%s/^M$//g
To get the ^M to appear, type Ctrl-V and then Ctrl-M. Ctrl-V tells Vim to take the next character entered literally.

:g/Ctrl-v Ctrl-m/s///
Ctrl-M is the character \r, or carriage return, which DOS line endings add. Ctrl-V tells Vim to insert a literal Ctrl-M character at the command line.
Taken as a whole, this command replaces all \r with nothing, removing them from the ends of lines.

You can use:
vim somefile.txt +"%s/\r/\r/g" +wq
Or the dos2unix utility.

You can use the following command:
:%s/^V^M//g
where the '^' means use CTRL key.

The command below reformats all .sh files in the current directory. I tested it on my Fedora OS.
for file in *.sh; do awk '{ sub("\r$", ""); print }' "$file" > luxubutmp; cp -f luxubutmp "$file"; rm -f luxubutmp; done

In Vim, type:
:w !dos2unix %
This will pipe the contents of your current buffer to the dos2unix command and write the results over the current contents. Vim will ask to reload the file after.

I wanted newlines in place of the ^M's. Perl to the rescue:
perl -pi.bak -e 's/\x0d/\n/g' excel_created.txt
Or to write to stdout:
perl -p -e 's/\x0d/\n/g' < excel_created.txt

Usually there is a dos2unix command you can use for this. Just make sure you read the manual as the GNU and BSD versions differ on how they deal with the arguments.
BSD version:
dos2unix $FILENAME $FILENAME_OUT
mv $FILENAME_OUT $FILENAME
GNU version:
dos2unix $FILENAME
Alternatively, you can create your own dos2unix with any of the proposed answers here, for example:
function dos2unix(){
[ "${1}" ] && [ -f "${1}" ] || return 1;
{ echo ':set ff=unix';
echo ':wq';
} | vim "${1}";
}

From Wikia:
%s/\r\+$//g
That will find all carriage returns (one or more repetitions) at the end of a line and delete them, so only \n remains at EOL.

This is my way. I opened a file in DOS EOL and when I save the file, that will automatically convert to Unix EOL:
autocmd BufWrite * :set ff=unix

If you create a file in Notepad or Notepad++ on Windows, bring it to Linux, and open it in Vim, you will see ^M at the end of each line. To remove them, type this at your Linux terminal:
dos2unix filename.ext
This will do the required magic.

I knew I'd seen this somewhere. Here is the FreeBSD login tip:
Do you need to remove all those ^M characters from a DOS file? Try
tr -d \\r < dosfile > newfile
-- Originally by Dru <genesis@istar.ca>

This is a little more than you asked for but:
nmap <C-d> :call range(line('w0'),line('w$'))->map({_,v-> getline(v)})->map({_,v->trim(v,join(map(range(1,0x1F)+[0xa0],{n->n->nr2char()}),''),2)})->map({k,v->setline(k+1,v)})<CR>
Run this and :set ff=unix|dos and no more need for unix2dos.
the single arg form of trim() has the same default mask above, plus 0X20 (an actual space) instead of 0x1F
that default mask clears out all non-printing chars including non-breaking spaces [0xa0] that are hard to find
create a list of lines from the range of lines
map that list to the trim function with using the same mask code as the source, less spaces
map that again to setline to replace the lines.
all :set fileformat= does at this point is choose which eol to save it with, dos or unix
it should be pretty easy to change the range of characters above if you want to eliminate or add some

Related

sed: how to replace sth. by a backslash followed by reference

Despite all the sed-backslash discussions on Stackoverflow I cannot find a working solution for my specific problem. I want to precede a certain string in a file by a backslash: something -> \something.
sed -i -- 's/\(something\)/\\\1/g' file
This always returns the string \1 instead of \something, because for some reason sed thinks it should escape the third backslash. The (from my point of view more logical) behaviour can be achieved by inserting a space between \\ and \1 in the sed command, but then the result is \ something (i.e. with an inserted space in the result) which is not what I want.
I am running this command in a batch file on Windows, using sed from cygwin (I hope this does not matter as I am aiming for a cross-platform solution).
EDIT: /usr/bin/sed version 4.2.2.
In Windows cmd with Cygwin, use this sed command:
sed -e 's/\(something\)/\\\\\1/g' file
You can start your script from a batch file
myBatch.bat
@echo off
c:\cygwin64\bin\bash ./mySed
mySed
#!/bin/bash
echo asdfsomethingasdf | sed 's/\(something\)/\\\1/g'
It can be necessary to use /usr/bin/sed when your path isn't completely set
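For what it's worth, inside a real shell script (such as the mySed file above) single quotes hand the backslashes to sed unchanged, so three of them are enough. A small sketch to try it; prefix_backslash is just an illustrative name, and the word "something" is hard-coded as in the question.

```shell
# prefix_backslash: read stdin and put a backslash in front of every
# occurrence of the word "something". Hypothetical helper for illustration.
prefix_backslash() {
    # Inside single quotes sed receives \\\1 literally:
    #   \\  -> one literal backslash in the replacement
    #   \1  -> the text captured by \(something\)
    sed 's/\(something\)/\\\1/g'
}
```

Running echo asdfsomethingasdf | prefix_backslash should give asdf\somethingasdf. If a different number of backslashes seems to be needed, the extra layer is most likely the quoting of whatever launches sed (for example cmd), not sed itself.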

Replace a slash with the word 'slash' in Korn shell scripting

I am using a while loop to read a file line by line and, for example, if the file contains a line like:
ABC.///.AB_SWift_ABC
I need to replace it with
ABC.slashslashslash.AB_SWift_ABC
How can I do that with a Korn shell?
If you are using bash, you can use the variable substring replacement functionality.
>> a=ABC.///.AB_SWift_ABC
>> echo $a
ABC.///.AB_SWift_ABC
>> echo ${a//\//slash}
ABC.slashslashslash.AB_SWift_ABC
Why are you reading line by line? It would be quicker to simply run sed on the file:
sed 's%/%slash%g'
If you must do it in Bash, then use the substitute shell parameter expansion:
${line//\//slash}
You mention backslashes but use forward slashes, which is confusing. If, perchance, you mean backslashes, then:
sed 's/\\/slash/g'
${line//\\/slash}
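If you do want to keep the read loop, the parameter expansion can be dropped straight into it. This is a sketch for bash (the same expansion syntax exists in ksh93, though I have not tested it there); replace_slashes is a made-up function name.

```shell
#!/usr/bin/env bash
# replace_slashes: read stdin line by line and print each line with every
# "/" replaced by the word "slash". Hypothetical helper name.
replace_slashes() {
    local line
    # IFS= and -r keep leading blanks and backslashes intact;
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line//\//slash}"
    done
}
```

Call it as replace_slashes < inputfile. For large files the sed one-liner is still likely to be faster.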
In C, you can replace / with the word slash by running sed through system():
char cmd[]="sed -i 's|/|slash|g' file";
system(cmd);
That way there is no need to read the file line by line.

<0xEF,0xBB,0xBF> character showing up in files. How to remove them?

I am compressing JavaScript files, and the compressor complains that my files contain the <0xEF,0xBB,0xBF> character.
How can I search for these characters and remove them?
You can easily remove them using vim, here are the steps:
1) In your terminal, open the file using vim:
vim file_name
2) Remove all BOM characters:
:set nobomb
3) Save the file:
:wq
Another method to remove those characters - using Vim:
vim -b fileName
Now those "hidden" characters are visible (<feff>) and can be removed.
Thanks for the previous answers, here's a sed(1) variant just in case:
sed '1s/^\xEF\xBB\xBF//'
On Unix/Linux:
sed 's/\xEF\xBB\xBF//' < inputfile > outputfile
On MacOSX
sed $'s/\xEF\xBB\xBF//' < inputfile > outputfile
Notice the $ after sed for mac.
On Windows
There is Super Sed, an enhanced version of sed. For Windows this is a standalone .exe, intended for running from the command line.
perl -pi~ -CSD -e 's/^\x{feff}//' file1.js path/to/file2.js
I would assume the tool will break if you have other utf-8 in your files, but if not, perhaps this workaround can help you. (Untested ...)
Edit: added the -CSD option, as per tchrist's comment.
Using tail might be easier:
tail --bytes=+4 filename > new_filename
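One caveat with tail is that it drops the first three bytes whether or not they are a BOM. A slightly safer sketch checks first; strip_bom is a hypothetical name, and it writes to stdout just like the tail command above.

```shell
# strip_bom FILE: print FILE to stdout, skipping a leading UTF-8 BOM
# (bytes EF BB BF) only if one is actually present. Hypothetical helper.
strip_bom() {
    # Dump the first three bytes as hex and compare with the BOM.
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1"    # start output at byte 4, i.e. after the BOM
    else
        cat "$1"           # no BOM: pass the file through unchanged
    fi
}
```

Usage would be strip_bom filename > new_filename.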
@tripleee's solution didn't work for me. But changing the file encoding to ASCII and again to UTF-8 did the trick :-)
I've used vimgrep for this
:vim "[\uFEFF]" *
also normal vim search command
/[\uFEFF]
The 'file' command shows if the BOM is present:
For example: 'file myfile.xml' displays: "XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines, with CRLF line terminators"
dos2unix will remove the BOM.
I suggest using the dos2unix tool; try running dos2unix ./thefile.js.
If necessary, use something like this for multiple files:
find . -type f -exec dos2unix {} \;
My Regards.
In windows you could use backported recode utility from UnxUtils.
In Sublime Text you can install the Highlighter package and then customize the regular expression in your user settings.
Here I added \uFEFF to the end of the highlighter_regex property.
{
"highlighter_enabled": true,
"highlighter_regex": "(\t+ +)|( +\t+)|[\u2026\u2018\u2019\u201c\u201d\u2013\u2014\uFEFF]|[\t ]+$",
"highlighter_scope_name": "invalid",
"highlighter_max_file_size": 1048576,
"highlighter_delay": 3000
}
To overwrite the default package settings place the file here:
~/.config/sublime-text-3/Packages/User/highlighter.sublime-settings
Save the file without code signature.

Removing double new lines using awk/sed

Hey everyone.
I have a file full of data; each row consists of something similar to "755545;45634;1244". Sometimes an unknown number of additional newlines occur, which I don't want.
Example:
256163;16816;1651
16156;165165;1165
15153;135135;15351
15153;1351;8
165;15313;153513
254;45;45
Desired output:
256163;16816;1651
16156;165165;1165
15153;135135;15351
15153;1351;8
165;15313;153513
254;45;45
Can this be done easily with awk/sed utility in unix?
The answer from @Luixv is correct if there is no whitespace on the "empty" lines.
If whitespace is present use this instead:
sed '/^[ \t]*$/d'
That's a space before the \t within the brackets, i.e. [space\t]
If this doesn't work, you might have a problem with newlines. Do a:
$ file test_file
test_file: ISO-8859 text, with CRLF, LF line terminators
If you get the output above, convert the file to unix using:
$ dos2unix test_file
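If you would rather handle both problems in one pass (whitespace-only lines and stray carriage returns), awk can do it without dos2unix. This is a sketch only; drop_blank_lines is a made-up name, and note that it also strips the trailing CR from the lines it keeps.

```shell
# drop_blank_lines [FILE...]: print the input without empty or
# whitespace-only lines, treating a trailing CR as part of the line ending.
# Hypothetical helper; reads stdin when no file is given.
drop_blank_lines() {
    # sub() removes a trailing CR; modifying $0 makes awk re-split the fields,
    # so NF is 0 (false) for lines that are now empty or only blanks/tabs.
    awk '{ sub(/\r$/, "") } NF' "$@"
}
```

Usage would be drop_blank_lines test_file > cleaned_file.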
sed '/^$/d'
sed -nre "s/([^$])/\1/p" input
sed -n '/^[ ,\t]*$/!p' filename
Solved with ssed (super sed). If you do not have it installed, install it.
ssed -R -e '/(^$|\s)/ d' yourFile
or
cat yourFile| ssed -R -e '/(^$|\s)/ d'
Happy sedding
PS: this will work even if you have tabs or \r \t \n, hence the \s in the RegExp.
\r = Carriage Return
\n = New Line
\t = Tab

Change File Encoding to utf-8 via vim in a script

I just got knocked down after our server has been updated from Debian 4 to 5.
We switched to UTF-8 environment and now we have problems getting the text printed correctly on the browser, because all files are in non-utf8 encodings like iso-8859-1, ascii, etc.
I tried many different scripts.
The first one I tried is "iconv". That one doesn't work, it changes the content, but the file's encoding is still non-utf8.
Same problem with enca, encamv, convmv and some other tools I installed via apt-get.
Then I found a python code, which uses chardet Universal Detector module, to detect encoding of a file (which works fine), but using the unicode class or the codec class to save it as utf-8 doesn't work, without any errors.
The only way I found to get the file and its content converted to UTF-8, is vi.
These are the steps I do for one file:
vi filename.php
:set bomb
:set fileencoding=utf-8
:wq
That's it. That one works perfect. But how can I get this running via a script?
I would like to write a script (Linux shell) which traverses a directory taking all php files, then converting them using vi with the commands above.
As I need to start the vi app, I do not know how to do something like this:
"vi --run-command=':set bomb, :set fileencoding=utf-8' filename.php"
Hope someone can help me.
This is the simplest way I know of to do this easily from the command line:
vim +"argdo se bomb | se fileencoding=utf-8 | w" $(find . -type f -name "*.php")
Or better yet if the number of files is expected to be pretty large:
find . -type f -name "*.php" | xargs vim +"argdo se bomb | se fileencoding=utf-8 | w"
You could put your commands in a file, let's call it script.vim:
set bomb
set fileencoding=utf-8
wq
Then you invoke Vim with the -S (source) option to execute the script on the file you wish to fix. To do this on a bunch of files you could do
find . -type f -name "*.php" -exec vim -S script.vim {} \;
You could also put the Vim commands on the command line using the + option, but I think it may be more readable like this.
Note: I have not tested this.
You may actually want set nobomb (BOM = byte order mark), especially in the [not windows] world.
E.g., I had a script that didn't work because there was a byte order mark at the start. It isn't usually displayed in editors (even with set list in vi) or on the console, so it's difficult to spot.
The file looked like this
#!/usr/bin/perl
...
But trying to run it, I get
./filename
./filename: line 1: #!/usr/bin/perl: No such file or directory
Not displayed, but at the start of the file, is the 3 byte BOM. So, as far as linux is concerned, the file doesn't start with #!
The solution is
vi filename
:set nobomb
:set fileencoding=utf-8
:wq
This removes the BOM at the start of the file, making it correct utf8.
NB Windows uses the BOM to identify a text file as being utf8, rather than ANSI. Linux (and the official spec) doesn't.
The accepted answer will keep the last file open in Vim. This problem can be easily resolved using the -c option of Vim,
vim +"argdo set bomb | set fileencoding=utf-8 | w" -c ":q" file1.txt file2.txt
If you need only process one file, the following will also work,
vim -c ':set bomb' -c ':set fileencoding=utf-8' -c ':wq' file1.txt

Resources