Difference between linefeed and newline [duplicate] - c

What is the meaning of the following control characters:
Carriage return
Line feed
Form feed

Carriage return means to return to the beginning of the current line without advancing downward. The name comes from a printer's carriage, as monitors were rare when the name was coined. This is commonly escaped as "\r", abbreviated CR, and has ASCII value 13 or 0xD.
Linefeed means to advance downward to the next line; however, it has been repurposed and renamed. Used as "newline", it terminates lines (commonly confused with separating lines). This is commonly escaped as "\n", abbreviated LF or NL, and has ASCII value 10 or 0xA. CRLF (but not CRNL) is used for the pair "\r\n".
Form feed means advance downward to the next "page". It was commonly used as page separators, but now is also used as section separators. Text editors can use this character when you "insert a page break". This is commonly escaped as "\f", abbreviated FF, and has ASCII value 12 or 0xC.
As control characters, they may be interpreted in various ways.
The most important interpretation is how these characters delimit lines. Lines end with NL on Unix (including OS X), CRLF on Windows, and CR on older Macs. Note that this shift in meaning from LF to NL, for the exact same character, is what gives rise to the difference between Windows and Unix; it is also why many Windows programs use CRLF to separate lines rather than terminate them. Many text editors can read files in any of these three formats and convert between them, but not all utilities can.
Form feed is much less commonly used. As page separator, it can only come between lines or at the start or end of the file.
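To make those numeric values concrete, here is a minimal C snippet (my own illustration, not part of the original answer) that prints each character's code:

#include <stdio.h>

int main(void)
{
    /* Character constants have type int in C, so they print directly */
    printf("CR '\\r' = %d (0x%X)\n", '\r', '\r');  /* 13, 0xD */
    printf("LF '\\n' = %d (0x%X)\n", '\n', '\n');  /* 10, 0xA */
    printf("FF '\\f' = %d (0x%X)\n", '\f', '\f');  /* 12, 0xC */
    return 0;
}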

\r is carriage return and moves the cursor back to the beginning of the line. For example, if I write:
printf("stackoverflow\rnine");
the output is
ninekoverflow
because the cursor returned to the beginning of "stackoverflow" and the first four characters were overwritten, "nine" being four characters long.
\n is the newline character, which ends the current line and moves the cursor to the beginning of a new line:
printf("stackoverflow\nnine");
stackoverflow
nine
\f is form feed. Its use in terminals is mostly obsolete, but where it is honoured it advances to the next line while keeping the current column, producing a staggered indentation:
printf("stackoverflow\fnine");
stackoverflow
             nine
and if I write:
printf("stackoverflow\fnine\fgreat");
stackoverflow
             nine
                 great

In short:
Carriage return (\r or 0xD): moves the cursor to the beginning of the current line.
Line feed (\n or 0xA): moves the cursor to the beginning of the next line.
Form feed (\f or 0xC): moves the cursor to the beginning of the next page.
More details and more control characters can be found on the following page: Control character

Have a look at Wikipedia:
Systems based on ASCII or a compatible character set use either LF (Line feed, '\n', 0x0A, 10 in decimal) or CR (Carriage return, '\r', 0x0D, 13 in decimal) individually, or CR followed by LF (CR+LF, 0x0D 0x0A). These characters are based on printer commands: The line feed indicated that one line of paper should feed out of the printer, and a carriage return indicated that the printer carriage should return to the beginning of the current line.

\f is used for page break.
You cannot see any effect in the console, but when you write this character constant into a file you can see the difference. Another example: if you redirect your output to a file, you don't have to write to the file or use file-handling code yourself.
For ex:
Write this code in C++:
#include <iostream>

int main()
{
    std::cout << "helloooooo";
    std::cout << "\f";   // form feed
    std::cout << "hiiiii";
    return 0;
}
When you compile this it generates an executable (for example, abc.exe); you can then redirect its output to a file like this:
abc > xyz.doc
Open the file xyz.doc and you can see the actual page break between "helloooooo" and "hiiiii".

On old paper-printer terminals, advancing to the next line involved two actions: moving the print head back to the beginning of the horizontal scan range (carriage return) and advancing the roll of paper being printed on (line feed).
Since we no longer use paper-printer terminals, those actions aren't really relevant anymore, but the characters used to signal them have stuck around in various incarnations.

Apart from the information above, there is an interesting bit of history behind LF (\n) and CR (\r).
[Original author: 阮一峰. Source: http://www.ruanyifeng.com/blog/2006/04/post_213.html]
Before computers existed, there was a type of teleprinter called the Teletype Model 33. It could print 10 characters per second, but with one problem: after finishing a line, it took 0.2 seconds to move to the next line, the time needed to print 2 characters. If a new character arrived during those 0.2 seconds, it was lost.
So engineers found a way around this: they added two control characters after each line. One is 'carriage return', which tells the printer to bring the print head back to the left; the other is 'line feed', which tells the printer to move the paper up one line.
Later, when computers became popular, these two concepts were carried over. Storage was very expensive at the time, and some argued that adding two characters at the end of every line was wasteful when one would do, so there were arguments about which one to use.
In Unix/Mac and Linux, '\n' is put at the end of each line; in Windows, '\r\n' is put at the end of each line. The consequence is that a Unix/Mac file opened in Windows may be displayed as one long line, while a Windows file opened in Unix or Mac shows a ^M at the end of each line.

Consider an IBM 1403 impact printer. CR moved the print head to the start of the line, but did NOT advance the paper. This allowed for "overprinting", placing multiple lines of output on one line. Things like underlining were achieved this way, as was BOLD print. LF advanced the paper one line. If there was no CR, the next line would print as a staggered-step because LF didn't move the print head. FF advanced the paper to the next page. It typically also moved the print head to the start of the first line on the new page, but you might need CR for that. To be sure, most programmers coded CRFF instead of CRLF at the end of the last line on a page because an extra CR created by FF wouldn't matter.

As a supplement:
1. Carriage return: a printer term meaning "move the print position to the beginning of the current line". In the computer world it means "return to the beginning of the current line" in most cases, though it occasionally stands for a newline on its own.
2. Line feed: a printer term meaning "advance the paper one line". Carriage return and line feed were therefore used together to start printing at the beginning of a new line. In the computer world it generally has the same meaning as newline.
3. Form feed: a printer term; I like the explanation from this thread:
If you were programming for a 1980s-style printer, it would eject the paper and start a new page. You are virtually certain to never need it.
http://en.wikipedia.org/wiki/Form_feed
It's almost obsolete; see "Escape sequence \f - form feed - what exactly is it?" for a detailed explanation.
Note that some platforms use CR or LF or CRLF to stand for newline, while others use different representations entirely. Refer to the Newline article on Wikipedia for details:
LF: Multics, Unix and Unix-like systems (Linux, OS X, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS, and others
CR: Commodore 8-bit machines, Acorn BBC, ZX Spectrum, TRS-80, Apple II family, Oberon, the classic Mac OS up to version 9, MIT Lisp Machine and OS-9
RS: QNX pre-POSIX implementation
0x9B: Atari 8-bit machines using the ATASCII variant of ASCII (155 in decimal)
CR+LF: Microsoft Windows, DOS (MS-DOS, PC DOS, etc.), DEC TOPS-10, RT-11, CP/M, MP/M, Atari TOS, OS/2, Symbian OS, Palm OS, Amstrad CPC, and most other early non-Unix and non-IBM OSes
LF+CR: Acorn BBC and RISC OS spooled text output
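As a rough sketch of how one might detect which of these conventions a given file uses (my own illustration: it counts only CRLF, bare LF, and bare CR, ignoring the exotic terminators above), the file can be scanned in binary mode:

#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");   /* binary mode: no translation */
    if (!fp) {
        perror("fopen");
        return 1;
    }

    long crlf = 0, lf = 0, cr = 0;
    int c, prev = 0;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            if (prev == '\r') crlf++; else lf++;
        } else if (prev == '\r') {
            cr++;                      /* a CR not followed by LF */
        }
        prev = c;
    }
    if (prev == '\r')
        cr++;                          /* file ends with a bare CR */
    fclose(fp);

    printf("CRLF: %ld  LF: %ld  CR: %ld\n", crlf, lf, cr);
    return 0;
}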

"\n" is the linefeed character. It means end the present line and go to a new line for anyone who is reading it.

Carriage return and line feed are also references to typewriters: with a small push on the lever on the left side of the carriage (the place where the paper goes), the paper would rotate a small amount around the cylinder, advancing the document one line. If you had finished typing one line and wanted to continue on the next, you pushed harder, both advancing a line and sliding the carriage all the way to the right, then resuming typing left to right as the carriage travelled with each keystroke. Needless to say, word wrap was the default setting for all word processing of the era. :D

Those are non-printing characters, relating to the concept of "new line". \n is line feed. \r is carriage return. On different platforms they have different meanings relative to a valid new line. In Windows, a new line is \r\n. In Linux, \n. On classic Mac OS, \r.
In practice, you put them in any string, and it will have effect on the print-out of the string.

When I was an apprentice in the Royal Signals many (50) years ago, teletypes and typewriters had a "carriage" with the printing head on it. When you pressed RETURN the carriage would fly to the left, hence Carriage Return (CR). You could just return the carriage, but on mechanical typewriters you'd use the lever (much like a tremolo lever on an electric guitar), which would also do the line feed. Your next question is: why would you not want the line feed? Heh heh, well in those days, to delete characters we'd do a CR, then put Tipp-Ex-like paper between the hammerheads and the paper and type the same keys to overwrite with white ink. Some fancy typewriters had a key you could press for this. So there you go.

Related

How is erasing output in terminal implemented in C?

Some applications running in terminal can erase their outputs. e.g.
when it tells you to wait, it will show a sequence of dots alternating between different lengths.
How is erasing output in terminal implemented in C? Is it done by reverse line feed?
Can a program only erase the previous characters in the current line, not the characters in the previous line in stdout?
Thanks.
It depends on the terminal.
The COMSPEC shell on Windows (often called the DOS prompt or command.com) exposes an API in C to control the cursor. I haven't done any Windows programming so I can't tell you much about it.
Most other terminals (especially on unixen) emulate protocols that resemble the VT100 serial terminal (the VT100 terminal was a physical device, a monitor and keyboard, that you attached to a modem or serial port to communicate with a server).
On VT100 terminals, carriage return and line feed are separate commands, each one byte. The carriage return command sets the cursor to the beginning of the line. The line feed command moves the cursor down a line (but doesn't bring the cursor to the beginning of the line by itself). Most shells on unixen automatically insert a carriage return after a line feed, but almost none insert a line feed after a carriage return.
With this knowledge, the simplest implementation is to simply output a carriage return and reprint the entire line:
printf("\rprogress: %d percent ", x);
Note the extra spaces at the end of the line. Printing "\r" doesn't erase anything, so reprinting over the old line may leave some of the old string on screen. The extra spaces are there to erase the remainder of the old line.
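Putting that together, a minimal self-contained sketch of such a progress indicator might look like this (the loop values and the 0.2-second delay are just for demonstration; usleep is POSIX):

#include <stdio.h>
#include <unistd.h>   /* usleep(), POSIX */

int main(void)
{
    for (int x = 0; x <= 100; x += 10) {
        /* \r returns to column 0; trailing spaces blank out leftovers */
        printf("\rprogress: %d percent   ", x);
        fflush(stdout);    /* the line has no \n, so force it out of the buffer */
        usleep(200000);    /* 0.2 s pause, only to make the update visible */
    }
    printf("\ndone\n");
    return 0;
}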
If you google "VT100 escape sequence", you'll find commands that allow you to do things like erase the current line, change the colour of text, go to a specific row/column on screen, etc. The most popular use of VT100 sequences is to output coloured text, but you can do other things with them too.
The next simplest implementation is to cleanly delete the line and reprint it:
printf("\033[2K\rprogress: %d percent", x);
The \033[2K is the escape sequence to delete the current line (ESC[2K).
From here you can get more creative if you want. You can use the cursor save/restore command with the erase until end of line command to only erase the part you want to update (instead of the entire line). You can use the goto commands to put the cursor in a specific location on screen to update text there etc.
It should be noted that the more advanced stuff, such as VT102 sequences or some of the full ANSI escape sequences, is generally not portable across terminals (by terminals I don't mean the shell; I mean the terminals: rxvt, xterm, the Linux console, HyperTerminal on Windows, etc.).
If you want portability (and/or sane API) you should use the curses or ncurses libraries.
If you wanted to know how it's done, then that's how it's done. It's just printing specially formatted strings to screen (except for the COMSPEC shell). Kind of like HTML but old-school.
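For example, the colour trick mentioned above is just another such string (a sketch; works on VT100/ANSI-compatible terminals):

#include <stdio.h>

int main(void)
{
    /* ESC [ 31 m switches the foreground to red; ESC [ 0 m resets attributes */
    printf("\033[31mthis is red\033[0m and this is normal again\n");
    return 0;
}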

How to check, which (ASCII) code was used for Line Feed in a simple text file?

I'd like to parse a simple text file in a C program, where I want to react to all the line feeds in it. Unfortunately, checking with "is character == \n" does not always work.
I know there are different methods to code a line feed (e.g. 0x0A in ASCII code), so my question is: is there a safe way to check whether a character is LF or not?
OK, here is a list of newline conventions per operating system type:
Linux systems: LF (line feed, '\n', 0x0A, 10 in decimal)
Unix systems: LF (line feed, '\n', 0x0A, 10 in decimal)
Windows systems: CR followed by LF (CR+LF, '\r\n', 0x0D 0x0A)
Mac OS systems: LF (line feed, U+000A)
Android systems: LF (line feed, '\n', 0x0A, 10 in decimal)
Unicode: the Unicode standard defines a number of characters that conforming applications should recognize as line terminators:
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
Based on:
http://en.wikipedia.org/wiki/Newline
The end-of-line marker is operating-system specific. On some OSes it is just \n, on others it may be \r or a combination like \r\n. Arguably, the form feed \f might sometimes be considered an end-of-line as well.
On some systems, not passing the b mode flag to fopen(3) alters the way the file is read by the OS. On these systems, the file is opened in binary mode with b and in text mode without it (and text mode may interpret end-of-line differently). You could also use getline(3) and handle the terminating characters as spaces (e.g. test them with isspace(3)).
BTW, on Linux the dos2unix(1) command might be useful.
Also, your app might receive a textual file produced on some other OS (without conversion). I would use getline (or the old fgets(3) if you don't care about very long lines) and handle all the space characters (tab, newline, form feed, return, etc.) the same way (like fscanf(3) or sscanf does).
I can't understand why the real end-of-line marker matters to you; why can't you use getline (or perhaps fgets) and handle every "end-of-line" character (be it \n, \r, \f or some mix of them) equally (in other words, as a space tested with isspace)? This also handles the case of a text file edited on Windows or Mac OS X and passed to Linux, or vice versa.
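In that spirit, here is a sketch of a line reader that treats \n, \r\n, and a lone \r all as end of line. read_any_line is a hypothetical helper of mine, and the stream should be opened in binary mode ("rb") so the C library does not translate line endings first:

#include <stdio.h>

/* Hypothetical helper: read one line from fp into buf, accepting \n,
   \r\n, or a lone \r as the line terminator. Returns 0 at EOF when
   nothing was read, 1 otherwise. */
static int read_any_line(FILE *fp, char *buf, size_t size)
{
    int c = fgetc(fp);
    if (c == EOF)
        return 0;

    size_t i = 0;
    while (c != EOF && c != '\n' && c != '\r') {
        if (i + 1 < size)
            buf[i++] = (char)c;
        c = fgetc(fp);
    }
    if (c == '\r') {                  /* could be \r\n: swallow the \n too */
        int next = fgetc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp);
    }
    buf[i] = '\0';
    return 1;
}

int main(void)
{
    char line[256];
    FILE *fp = fopen("input.txt", "rb");   /* hypothetical input file */
    if (!fp) { perror("fopen"); return 1; }
    while (read_any_line(fp, line, sizeof line))
        printf("[%s]\n", line);
    fclose(fp);
    return 0;
}

With a helper like this, the check in the question becomes "did read_any_line return a line", regardless of which OS produced the file.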
Try checking for \r\n rather than just \n.
The ASCII code of \n is 10 and the ASCII code of \r is 13; in a Windows text file, each line ends with the combination \r\n (carriage return plus line feed).
I would recommend just opening it as a text file and relying on the standard library's built-in conversions to handle this. Read lines using fgets() and you should be fine.

What exactly is \r in C language?

#include <stdio.h>
#include <conio.h>   /* non-standard header: getche(), getch() */

int main()
{
    int countch = 0;
    int countwd = 1;
    printf("Enter your sentence in lowercase: ");
    char ch = 'a';
    while (ch != '\r')
    {
        ch = getche();
        if (ch == ' ')
            countwd++;
        else
            countch++;
    }
    printf("\n Words = %d ", countwd);
    printf("Characters = %d", countch - 1);
    getch();
    return 0;
}
This is the program where I came across \r. What exactly is its role here? I am beginner in C and I appreciate a clear explanation on this.
'\r' is the carriage return character. The main times it would be useful are:
When reading text in binary mode, or which may come from a foreign OS, you'll find (and probably want to discard) it due to CR/LF line-endings from Windows-format text files.
When writing to an interactive terminal on stdout or stderr, '\r' can be used to move the cursor back to the beginning of the line, to overwrite it with new contents. This makes a nice primitive progress indicator.
The example code in your post is definitely a wrong way to use '\r'. It assumes a carriage return will precede the newline character at the end of a line entered, which is non-portable and only true on Windows. Instead the code should look for '\n' (newline), and discard any carriage return it finds before the newline. Or, it could use text mode and have the C library handle the translation (but text mode is ugly and probably should not be used).
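For illustration, a portable rewrite of the question's loop along those lines might look like this (a sketch of mine using standard getchar instead of the non-standard getche, so input is no longer echoed character by character):

#include <stdio.h>

/* Look for '\n' as the end of input, and silently discard any '\r'
   that precedes it, instead of relying on getche() returning '\r'. */
int main(void)
{
    int countch = 0;
    int countwd = 1;
    int ch;

    printf("Enter your sentence in lowercase: ");
    while ((ch = getchar()) != EOF && ch != '\n') {
        if (ch == '\r')
            continue;            /* discard CR from CRLF-style input */
        if (ch == ' ')
            countwd++;
        else
            countch++;
    }
    printf("\nWords = %d ", countwd);
    printf("Characters = %d\n", countch);
    return 0;
}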
It's Carriage Return. Source: http://msdn.microsoft.com/en-us/library/6aw8xdf2(v=vs.80).aspx
The following repeats the loop until the user has pressed the Return key.
while (ch != '\r')
{
    ch = getche();
}
Once upon a time, people had terminals like typewriters (with only upper-case letters, but that's another story). Search for 'Teletype', and how do you think tty got used for 'terminal device'?
Those devices had two separate motions. The carriage return moved the print head back to the start of the line without scrolling the paper; the line feed character moved the paper up a line without moving the print head back to the beginning of the line. So, on those devices, you needed two control characters to get the print head back to the start of the next line: a carriage return and a line feed. Because this was mechanical, it took time, so you had to pause for long enough before sending more characters to the terminal after sending the CR and LF characters. One use for CR without LF was to do 'bold' by overstriking the characters on the line. You'd write the line out once, then use CR to start over and print twice over the characters that needed to be bold. You could also, of course, type X's over stuff that you wanted partially hidden, or create very dense ASCII art pictures with judicious overstriking.
On Unix, all the logic for this stuff was hidden in a terminal driver. You could use the stty command and the underlying functions (in those days, ioctl() calls; they were sanitized into the termios interface by POSIX.1 in 1988) to tweak all sorts of ways that the terminal behaved.
Eventually, you got 'glass terminals' where the speeds were greater and there were new idiosyncrasies to deal with - Hazeltine glitches and so on and so forth. These got enshrined in the termcap and later terminfo libraries, and then further encapsulated behind the curses library.
However, some other (non-Unix) systems did not hide things as well, and you had to deal with CRLF in your text files - and no, this is not just Windows and DOS that were in the 'CRLF' camp.
Anyway, on some systems, the C library has to deal with text files that contain CRLF line endings and presents those to you as if there were only a newline at the end of the line. However, if you choose to treat the text file as a binary file, you will see the CR characters as well as the LF.
Systems like the old Mac OS (version 9 or earlier) used just CR (aka \r) for the line ending. Systems like DOS and Windows (and, I believe, many of the DEC systems such as VMS and RSTS) used CRLF for the line ending. Many of the Internet standards (such as mail) mandate CRLF line endings. And Unix has always used just LF (aka NL or newline, hence \n) for its line endings. And most people, most of the time, manage to ignore CR.
Your code is rather funky in looking for \r. On a system compliant with the C standard, you won't see the CR unless the file is opened in binary mode; the CRLF or CR will be mapped to NL by the C runtime library.
There are a few characters which can indicate a new line. The usual ones are these two:
'\n' or '0x0A' (10 in decimal) -> This character is called "Line Feed" (LF).
'\r' or '0x0D' (13 in decimal) -> This one is called "Carriage return" (CR).
Different Operating Systems handle newlines in a different way. Here is a short list of the most common ones:
DOS and Windows
They expect a newline to be the combination of two characters, namely '\r\n' (or 13 followed by 10).
Unix (and hence Linux as well)
Unix uses a single '\n' to indicate a new line.
Mac
Macs use a single '\r'.
That is not always true; it only works in Windows.
When interacting with a terminal (PuTTY, a Linux shell, ...), '\r' is used to return the cursor to the beginning of the line. For example:
Without '\r': if the data arrives at the terminal with just '\n' and no '\r', the output continues on the next line, but the cursor stays at its current column.
With '\r': if the data ends with '\r\n', the cursor not only moves to the next line but also to the beginning of that line.
It depends upon which platform you're on as to how it will be translated and whether it will be there at all: Wikipedia entry on newline
\r is the carriage-return escape sequence. It is used to bring the cursor to the beginning of the line (the same line, not a new one) so that what follows the \r (as in "\rhello") overwrites the existing content:
#include <stdio.h>

int main()
{
    printf("Hello \rworld");
    return 0;
}
The output of the program is world, not Hello world,
because \r put the cursor at the beginning of the line and "Hello" was overwritten by "world".

What do \t and \b do?

I expect this simple line of code
printf("foo\b\tbar\n");
to replace "o" with "\t" and to produce the following output
fo bar
(assuming that tab stop occurs every 8 characters).
On the contrary I get
foo bar
It seems that my shell interprets \b as "move the cursors one position back" and \t as "move cursor to the next tab stop". Is this behaviour specific to the shell in which I'm running the code? Should I expect different behaviour on different systems?
No, that's more or less what they're meant to do.
In C (and many other languages), you can insert hard-to-see/type characters using \ notation:
\a is alert/bell
\b is backspace/rubout
\n is newline
\r is carriage return (return to left margin)
\t is tab
You can also specify the octal value of any character using \nnn (up to three octal digits), or the hexadecimal value of any character with \xnn.
E.g.: the ASCII value of _ is octal 137, hex 5F, so it can also be typed \137 or \x5f, if your keyboard didn't have a _ key or something. This is more useful for control characters like NUL (\0) and ESC (\033).
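For instance, this tiny sketch (mine, not the answerer's) shows the three spellings producing the same character:

#include <stdio.h>

int main(void)
{
    /* '_' is octal 137 and hex 5F; all three spellings print the same glyph */
    printf("_ \137 \x5f\n");
    return 0;
}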
As someone posted (then deleted their answer before I could +1 it), there are also some less-frequently-used ones:
\f is a form feed/new page (eject page from printer)
\v is a vertical tab (move down one line, on the same column)
On screens, \f usually works the same as \v, but on some printers/teletypes it will go all the way to the next form/sheet of paper.
Backspace and tab both move the cursor position. Neither is truly a 'printable' character.
Your code says:
print "foo"
move the cursor back one space
move the cursor forward to the next tabstop
output "bar".
To get the output you expect, you need printf("foo\b \tbar"). Note the extra 'space'. That says:
output "foo"
move the cursor back one space
output a ' ' (this replaces the second 'o').
move the cursor forward to the next tabstop
output "bar".
Most of the time it is inappropriate to use tabs and backspace for formatting your program output. Learn to use printf() formatting specifiers. Rendering of tabs can vary drastically depending on how the output is viewed.
This little script shows one way to alter your terminal's tab rendering. Tested on Ubuntu + gnome-terminal:
#!/bin/bash
tabs -8
echo -e "\tnormal tabstop"
for x in `seq 2 10`; do
    tabs $x
    echo -e "\ttabstop=$x"
done
tabs -8
echo -e "\tnormal tabstop"
Also see man setterm and regtabs.
And if you redirect your output or just write to a file, tabs will quite commonly be displayed as fewer than the standard 8 chars, especially in "programming" editors and IDEs.
In other words:
printf("%-8s%s", "foo", "bar"); /* this will ALWAYS output "foo bar" */
printf("foo\tbar"); /* who knows how this will be rendered */
IMHO, tabs in general are rarely appropriate for anything. An exception might be generating output for a program that requires tab-separated-value input files (similar to comma separated value).
Backspace '\b' is a different story... it should never be used to create a text file since it will just make a text editor spit out garbage. But it does have many applications in writing interactive command line programs that cannot be accomplished with format strings alone. If you find yourself needing it a lot, check out "ncurses", which gives you much better control over where your output goes on the terminal screen. And typically, since it's 2011 and not 1995, a GUI is usually easier to deal with for highly interactive programs. But again, there are exceptions. Like writing a telnet server or console for a new scripting language.
The C standard (actually C99, I'm not up to date) says:
Alphabetic escape sequences representing nongraphic characters in the execution character set are intended to produce actions on display devices as follows:
\b (backspace) Moves the active position to the previous position on the current line. [...]
\t (horizontal tab) Moves the active position to the next horizontal tabulation position on the current line. [...]
Both just move the active position; neither is supposed to write any character on or over another character. To overwrite with a space you could try puts("foo\b \tbar"); but note that on some display devices - say a daisy-wheel printer - the o will still show through, because printing a space does not erase what is already on the paper.
This behaviour is terminal-specific and specified by the terminal emulator you use (e.g. xterm) and the semantics of terminal that it provides. The terminal behaviour has been very stable for the last 20 years, and you can reasonably rely on the semantics of \b.
\t is the tab character, and it is doing exactly what you'd anticipate given the action of \b: the cursor position is decremented by the \b, and then the \t moves it to the next tab stop (which in this case is the same tab stop it would have reached without the \b).

Why should text files end with a newline?

I assume everyone here is familiar with the adage that all text files should end with a newline. I've known of this "rule" for years but I've always wondered — why?
Because that’s how the POSIX standard defines a line:
3.206 Line
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
Therefore, lines not ending in a newline character aren't considered actual lines. That's why some programs have problems processing the last line of a file if it isn't newline terminated.
There's at least one hard advantage to this guideline when working on a terminal emulator: All Unix tools expect this convention and work with it. For instance, when concatenating files with cat, a file terminated by newline will have a different effect than one without:
$ more a.txt
foo
$ more b.txt
bar$ more c.txt
baz
$ cat {a,b,c}.txt
foo
barbaz
And, as the previous example also demonstrates, when displaying the file on the command line (e.g. via more), a newline-terminated file results in a correct display. An improperly terminated file might be garbled (second line).
For consistency, it’s very helpful to follow this rule – doing otherwise will incur extra work when dealing with the default Unix tools.
Think about it differently: If lines aren’t terminated by newline, making commands such as cat useful is much harder: how do you make a command to concatenate files such that
it puts each file’s start on a new line, which is what you want 95% of the time; but
it allows merging the last and first line of two files, as in the example above between b.txt and c.txt?
Of course this is solvable but you need to make the usage of cat more complex (by adding positional command line arguments, e.g. cat a.txt --no-newline b.txt c.txt), and now the command rather than each individual file controls how it is pasted together with other files. This is almost certainly not convenient.
… Or you need to introduce a special sentinel character to mark a line that is supposed to be continued rather than terminated. Well, now you’re stuck with the same situation as on POSIX, except inverted (line continuation rather than line termination character).
Now, on non POSIX compliant systems (nowadays that’s mostly Windows), the point is moot: files don’t generally end with a newline, and the (informal) definition of a line might for instance be “text that is separated by newlines” (note the emphasis). This is entirely valid. However, for structured data (e.g. programming code) it makes parsing minimally more complicated: it generally means that parsers have to be rewritten. If a parser was originally written with the POSIX definition in mind, then it might be easier to modify the token stream rather than the parser — in other words, add an “artificial newline” token to the end of the input.
Each line should be terminated in a newline character, including the last one. Some programs have problems processing the last line of a file if it isn't newline terminated.
GCC warns about it not because it can't process the file, but because it has to as part of the standard.
The C language standard says
A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character.
Since this is a "shall" clause, we must emit a diagnostic message for a violation of this rule.
This is in section 2.1.1.2 of the ANSI C 1989 standard. Section 5.1.1.2 of the ISO C 1999 standard (and probably also the ISO C 1990 standard).
Reference: The GCC/GNU mail archive.
This answer is an attempt at a technical answer rather than opinion.
If we want to be POSIX purists, we define a line as:
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
An incomplete line as:
A sequence of one or more non- <newline> characters at the end of the file.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195
A text file as:
A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_397
A string as:
A contiguous sequence of bytes terminated by and including the first null byte.
Source: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_396
From this then, we can derive that the only time we will potentially encounter any type of issues are if we deal with the concept of a line of a file or a file as a text file (being that a text file is an organization of zero or more lines, and a line we know must terminate with a <newline>).
Case in point: wc -l filename.
From the wc's manual we read:
A line is defined as a string of characters delimited by a <newline> character.
What are the implications for JavaScript, HTML, and CSS files, then, given that they are text files?
In browsers, modern IDEs, and other front-end applications there are no issues with skipping EOL at EOF. The applications parse the files properly. They have to, since not all operating systems conform to the POSIX standard; it would be impractical for non-OS tools (e.g. browsers) to handle files according to the POSIX standard (or any OS-level standard).
As a result, we can be relatively confident that EOL at EOF will have virtually no negative impact at the application level - regardless of whether it is running on a UNIX OS.
At this point we can confidently say that skipping EOL at EOF is safe when dealing with JS, HTML, and CSS on the client side. In fact, we can state that minifying any one of these files down to a form containing no <newline> at all is safe.
We can take this one step further and say that as far as NodeJS is concerned, it too cannot adhere to the POSIX standard, being that it can run in non-POSIX-compliant environments.
What are we left with then? System level tooling.
This means the only issues that may arise are with tools that make an effort to adhere their functionality to the semantics of POSIX (e.g. definition of a line as shown in wc).
Even so, not all shells will automatically adhere to POSIX. Bash for example does not default to POSIX behavior. There is a switch to enable it: POSIXLY_CORRECT.
Food for thought on the value of EOL being <newline>: https://www.rfc-editor.org/old/EOLstory.txt
Staying on the tooling track, for all practical intents and purposes, let's consider this:
Let's work with a file that has no EOL. As of this writing the file in this example is a minified JavaScript with no EOL.
curl http://cdnjs.cloudflare.com/ajax/libs/AniJS/0.5.0/anijs-min.js -o x.js
curl http://cdnjs.cloudflare.com/ajax/libs/AniJS/0.5.0/anijs-min.js -o y.js
$ cat x.js y.js > z.js
-rw-r--r-- 1 milanadamovsky 7905 Aug 14 23:17 x.js
-rw-r--r-- 1 milanadamovsky 7905 Aug 14 23:17 y.js
-rw-r--r-- 1 milanadamovsky 15810 Aug 14 23:18 z.js
Notice that the concatenated file's size is exactly the sum of its individual parts. If the concatenation of JavaScript files is a concern, the more appropriate concern would be to start each JavaScript file with a semicolon.
As someone else mentioned in this thread: what if you want to cat two files whose output becomes just one line instead of two? In other words, cat does what it's supposed to do.
The man page for cat only mentions reading input up to EOF, not <newline>. Note that the -n switch of cat will also print out a non-<newline>-terminated line (or incomplete line) as a line, since the count starts at 1 (according to the man page):
-n Number the output lines, starting at 1.
Now that we understand how POSIX defines a line, this behavior becomes ambiguous, or really, non-compliant.
Understanding a given tool's purpose and compliance will help in determining how critical it is to end files with an EOL. In C, C++, Java (JARs), etc... some standards will dictate a newline for validity - no such standard exists for JS, HTML, CSS.
For example, instead of using wc -l filename one could use awk '{x++} END {print x}' filename, and rest assured that the task's success is not jeopardized by a file we may want to process that we did not write (e.g. a third-party library such as the minified JS we curl'd) - unless our intent was truly to count lines in the POSIX-compliant sense.
Conclusion
There will be very few real life use cases where skipping EOL at EOF for certain text files such as JS, HTML, and CSS will have a negative impact - if at all. If we rely on <newline> being present, we are restricting the reliability of our tooling only to the files that we author and open ourselves up to potential errors introduced by third party files.
Moral of the story: Engineer tooling that does not have the weakness of relying on EOL at EOF.
Feel free to post use cases as they apply to JS, HTML and CSS where we can examine how skipping EOL has an adverse effect.
It may be related to the difference between:
text file (each line is supposed to end in an end-of-line)
binary file (there are no true "lines" to speak of, and the length of the file must be preserved)
If each line does end in an end-of-line, this avoids, for instance, that concatenating two text files would make the last line of the first run into the first line of the second.
Plus, an editor can check at load whether the file ends in an end-of-line, saves it in its local option 'eol', and uses that when writing the file.
A few years back (2005), many editors (ZDE, Eclipse, Scite, ...) did "forget" that final EOL, which was not very appreciated.
Not only that, but they interpreted that final EOL incorrectly, as 'start a new line', and actually started to display another line as if it already existed.
This was very visible with a 'proper' text file with a well-behaved text editor like vim, compared to opening it in one of the above editors. It displayed an extra line below the real last line of the file. You see something like this:
1 first line
2 middle line
3 last line
4
Some tools expect this. For example, wc expects this:
$ echo -n "Line not ending in a new line" | wc -l
0
$ echo "Line ending with a new line" | wc -l
1
A separate use case: commit hygiene, when your text file is version controlled.
If content is added to the end of the file, then the line that was previously the last will have been edited to include a newline character. This means that blaming the file to find out when that line was last edited will show the newline addition, not the commit before it that you actually wanted to see.
(The example is specific to git, but the same approach applies to other version control systems too.)
Basically, there are many programs which will not process files correctly if they don't get a final EOL before EOF.
GCC warns you about this because it's required as part of the C standard (section 5.1.1.2, apparently).
"No newline at end of file" compiler warning
I've wondered this myself for years. But I came across a good reason today.
Imagine a file with a record on every line (e.g. a CSV file), with the computer writing records at the end of the file. What if it suddenly crashed? Was the last line complete? (Not a nice situation.)
But if we always terminate the last line, then we would know (simply check if the last line is terminated). Otherwise we would probably have to discard the last line every time, just to be safe.
This originates from the very early days when simple terminals were used. The newline char was used to trigger a 'flush' of the transferred data.
Today, the newline char isn't required anymore. Sure, many apps still have problems if the newline isn't there, but I'd consider that a bug in those apps.
If however you have a text file format where you require the newline, you get simple data verification very cheap: if the file ends with a line that has no newline at the end, you know the file is broken. With only one extra byte for each line, you can detect broken files with high accuracy and almost no CPU time.
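As a sketch of that cheap verification (my own illustration; it assumes a regular, seekable file), one can simply look at the file's last byte:

#include <stdio.h>

/* Returns 1 if the last byte of the file is '\n', 0 if it is not,
   and -1 on error or if the file is empty. */
static int ends_with_newline(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    if (fseek(fp, -1L, SEEK_END) != 0) {  /* fails on an empty file */
        fclose(fp);
        return -1;
    }
    int last = fgetc(fp);
    fclose(fp);
    return last == '\n';
}

int main(int argc, char **argv)
{
    if (argc > 1)
        printf("%s: %d\n", argv[1], ends_with_newline(argv[1]));
    return 0;
}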
In addition to the above practical reasons, it wouldn't surprise me if the originators of Unix (Thompson, Ritchie, et al.) or their Multics predecessors realized that there is a theoretical reason to use line terminators rather than line separators: With line terminators, you can encode all possible files of lines. With line separators, there's no difference between a file of zero lines and a file containing a single empty line; both of them are encoded as a file containing zero characters.
So, the reasons are:
Because that's the way POSIX defines it.
Because some tools expect it or "misbehave" without it. For example, wc -l will not count a final "line" if it doesn't end with a newline.
Because it's simple and convenient. On Unix, cat just works and it works without complication. It just copies the bytes of each file, without any need for interpretation. I don't think there's a DOS equivalent to cat. Using copy a+b c will end up merging the last line of file a with the first line of file b.
Because a file (or stream) of zero lines can be distinguished from a file of one empty line.
Why should text files end with a newline?
Because that's the sanest choice to make.
Take a file with the following content,
one\n
two\n
three
where \n means a newline character, which on Windows is \r\n, a return character followed by line feed, because it's so cool, right?
How many lines does this file have? Windows says 3, we say 3, POSIX (Linux) says that the file is crippled because there should be a \n at the end of it.
Regardless, what would you say its last line is? I guess anybody agrees that three is the last line of the file, but POSIX says that's a crippled line.
And what is its second line? Oh, here we have the first strong separation:
Windows says two because a file is "lines separated by newlines" (wth?);
POSIX says two\n, adding that that's a true, honest line.
What's the consequence of Windows choice, then? Simple:
You cannot say that a file is made up of lines
Why? Try to take the last line from the previous file and replicate it a few times... What do you get? This:
one\n
two\n
threethreethreethree
Try, instead, to swap the second and third lines... and you get this:
one\n
threetwo\n
Therefore
You must say that a text file is an alternation of lines and \ns, which starts with a line and ends with a line
which is quite a mouthful, right?
And you want another strange consequence?
You must accept that an empty file (0 bytes, really 0 bits) is a one-line file, magically, always because they are cool at Microsoft
Which is quite crazy, don't you think?
What is the consequence of POSIX choice?
That the file on the top is just a bit crippled, and we need some hack to deal with it.
Being serious
I'm being provocative, in the preceding text, for the reason that dealing with text files lacking the \n at the end forces you to treat them with ad-hoc tricks/hacks. You always need an if/else somewhere to make things work, where the branch dealing with the crippled line only deals with the crippled line, all the other lines taking the other branch. It's a bit discriminatory, no?
My conclusion
I'm in favour of POSIX definition of a line for the following reasons:
A file is naturally conceived as a sequence of lines
A line shouldn't be one thing or another depending on where it is in the file
An empty file is not a one-line file, come on!
You should not be forced to make hacks in your code
And yes, Windows does encourage you to omit the trailing \r\n: if you want a two-line file, you have to omit the trailing \r\n, otherwise text editors will show it as a three-line file.
Presumably simply that some parsing code expected it to be there.
I'm not sure I would consider it a "rule", and it certainly isn't something I adhere to religiously. Most sensible code will know how to parse text (including encodings) line-by-line (any choice of line endings), with-or-without a newline on the last line.
Indeed - if you end with a new line: is there (in theory) an empty final line between the EOL and the EOF? One to ponder...
There's also a practical programming issue with files lacking newlines at the end: The read Bash built-in (I don't know about other read implementations) doesn't work as expected:
printf $'foo\nbar' | while read line
do
    echo $line
done
This prints only foo! The reason is that when read encounters the last line, it writes the contents to $line but returns exit code 1 because it reached EOF. This breaks the while loop, so we never reach the echo $line part. If you want to handle this situation, you have to do the following:
while read line || [ -n "${line-}" ]
do
    echo $line
done < <(printf $'foo\nbar')
That is, do the echo if the read failed because of a non-empty line at end of file. Naturally, in this case there will be one extra newline in the output which was not in the input.
Why should (text) files end with a newline?
As well expressed by many, because:
Many programs do not behave well, or fail without it.
Even for programs that do handle a file lacking an ending '\n', the tool's behavior may not meet the user's expectations - which can be unclear in this corner case.
Programs rarely disallow a final '\n' (I do not know of any that do).
Yet this begs the next question:
What should code do about text files without a newline?
Most important - Do not write code that assumes a text file ends with a newline. Assuming a file conforms to a format leads to data corruption, hacker attacks and crashes. Example:
// Bad code
while (fgets(buf, sizeof buf, instream)) {
    // If there is no '\n', buf[] holds a truncated line, and the next
    // write chops off a real character, not a newline
    buf[strlen(buf) - 1] = '\0'; // attempt to strip the trailing \n (a safer idiom follows below)
    ...
}
If the final trailing '\n' is needed, alert the user to its absence and the action taken. In other words, validate the file's format. Note: this may include a limit on the maximum line length, the character encoding, etc.
Clearly define, and document, the code's handling of a missing final '\n'.
As far as possible, do not generate a file that lacks the ending '\n'.
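The safer idiom mentioned above, as a sketch: strcspn finds the first '\n' or, when there is none, the index of the terminating NUL, so a final unterminated line loses nothing.

#include <stdio.h>
#include <string.h>

static void process(FILE *instream)
{
    char buf[1024];
    while (fgets(buf, sizeof buf, instream)) {
        buf[strcspn(buf, "\n")] = '\0';  /* strip '\n' only if present */
        printf("[%s]\n", buf);           /* placeholder for real work */
    }
}

int main(void)
{
    process(stdin);
    return 0;
}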
It's very late here, but I just faced a bug in file processing, and it arose because the files did not end with a newline. We were processing text files with sed, and sed was omitting the last line from the output, which produced an invalid JSON structure and sent the rest of the pipeline into a failed state.
All we were doing was:
Take a sample file, say foo.txt, with some JSON content inside it:
[{
someProp: value
},
{
someProp: value
}] <-- No newline here
The file was created on a Windows machine, and Windows scripts were processing that file using PowerShell commands. All good.
When we processed the same file using the sed command sed 's|value|newValue|g' foo.txt > foo.txt.tmp
The newly generated file was
[{
someProp: value
},
{
someProp: value
and boom, it failed the rest of the processing because of the invalid JSON.
So it's always good practice to end your file with a newline.
I was always under the impression the rule came from the days when parsing a file without an ending newline was difficult. That is, you would end up writing code where an end of line was defined by the EOL character or EOF. It was just simpler to assume a line ended with EOL.
However I believe the rule is derived from C compilers requiring the newline. And as pointed out on “No newline at end of file” compiler warning, #include will not add a newline.
Imagine that the file is being processed while it is still being generated by another process.
It might have to do with that: a final newline can act as a flag indicating that the file is complete and ready to be processed.
I personally like new lines at the end of source code files.
It may have its origin with Linux, or all UNIX systems for that matter. I remember there were compilation errors (from gcc, if I'm not mistaken) because source code files did not end with a newline. Why it was made this way, one is left to wonder.
IMHO, it's a matter of personal style and opinion.
In olden days, I didn't put that newline. A character saved means more speed through that 14.4K modem.
Later, I put that newline so that it's easier to select the final line using shift+downarrow.
