Is \n multi-character in C? - c

I read that \n consists of CR & LF. Each has their own ASCII codes.
So is the \n in C represented by a single character or is it multi-character?
Edit: Kindly specify your answer, rather than simply saying "yes, it is" or "no, it isn't"

In a C program, it's a single character, '\n'representing end of line. However, some operating systems (most notably Microsoft Windows) use two characters to represent end of line in text files, and this is likely where the confusion comes from.
It's the responsibility of the C I/O functions to do the conversions between the C representation of '\n' and whatever the OS uses.
In C programs, simply use '\n'. It is guaranteed to be correct. When looking at text files with some sort of editor, you might see two characters. When a text file is transferred from Windows to some Unix-based system, you might get "^M" showing up at the end of each line, which is annoying, but has nothing to do with C.

Generally: '\n' is a single character, which represents a newline. '\r' is a single character, which represents a carriage-return. They are their own independent ASCII characters.
Issues arise because in the actual file representation, UNIX-based systems tend to use '\n' alone to represent what you think of when you hit "enter" or "return" on the keyboard, whereas Windows uses a '\r' followed directly by a '\n'.
In a file:
"This is my UNIX file\nwhich spans two lines"
"This is my Windows file\r\nwhich spans two lines"
Of course, like all binary data, these characters are all about interpretation, and that interpretation depends on the application using the data. Stick to '\n' when you are making C-strings, unless you want a literal carriage-return, because as people have pointed out in the comments, the OS representation doesn't concern you. IO libraries, including C's, are supposed to handle this themselves and abstract it away from you.
For your curiosity, in decimal, '\n' in ASCII is 10, '\r' is 13, but note that this is the ASCII standard, not a C standard.

It depends:
'\n' is a single character (ASCII LF)
"\n" is a '\n' character followed by a 0 terminator
some I/O operations transform a '\n' into '\r\n' on some systems (CR-LF).

When you print the \n to a file, using the windows C stdio libraries, the library interprets that as a logical new-line, not the literal character 0x0A. The output to the file will be the windows version of a new-line: 0x0D0A (\r\n).
Writing
Sample code:
#include <stdio.h>
int main() {
FILE *f = fopen("foo.txt","w");
fprintf(f,"foo\nbar");
return 0;
}
A quick cl /EHsc foo.c later and you get
0x666F6F 0x0D0A 0x626172 (separated for convenience)
in foo.txt under a hex editor.
It's important to note that this translation DOES NOT occur if you are writing to a file in 'binary mode'.
Reading
If you are reading the file back in using the same tools, also on windows, the "windows EOL" will be interpreted properly if you try to match up against \n.
When reading it back
#include <stdio.h>
int main() {
FILE *f = fopen("foo.txt", "r");
char c;
while (EOF != fscanf(f, "%c", &c))
printf("%x-", c);
}
You get
66-6f-6f-a-62-61-72-
Therefore, the only time this should be relevant to you is if you are
Moving files back and forth between mac/unix and windows. Unix needs no real explanation here, since \n directly translates to 0x0A on those platforms. (pre-OSX \n was 0x0D on mac iirc)
Putting text in binary files, only do this carefully please
Trying to figure out why your binary data is being messed up when you opened the file "w", instead of "wb"
Estimating something important based on the size of the file, on windows you'll have an extra byte per newline.

\n is a new-line -- it's a logical representation of whatever separates one line from another in a text file.
A given platform will have some physical representation of that logical separation between lines. On Unix and most similar systems, the new-line is represented by a line-feed (LF) character (and since Unix was/is so closely associated with C, on Unix the LF is often just called a new-line). On MacOS, it's typically represented by a carriage-return (CR). On a fair number of other systems, most prominently Windows, it's represented by a carriage return/line feed pair -- normally in that order, though once in a while you see something use LF followed by CR (as I recall, Clarion used to do that).
In theory, a new-line doesn't need to correspond to any characters in the stream at all though. For example, a system could have text files that were stored as a length followed by the appropriate number of characters. In such a case, the run-time library would need to carry out a slightly more extensive translation between internal and external representations of text files than is now common, but such is life.

According to the C99 Standard (section 5.2.2),
\n "moves the active position [where the next character from fputc would appear] to the initial position on the next line".
Also
[\n] shall produce a unique implementation-defined value
which can be stored in a single char object. The external representations in a text file
need not be identical to the internal representations and are outside the scope of [the C99 Standard]
Most C implementations choose to define \n as ASCII line feed (0x0A) for historical reasons. However, on many computer operating systems, the sequence for moving the active position to the beginning of the next line requires two characters usually 0x0D, 0x0A. So, when writing to a text file, the C implementation must convert the internal sequence of 0x0A to the external one of 0x0D, 0x0A. How this is done is outside of the scope of the C standard, but usually, the file IO library will perform the conversion on any file opened in text mode.

Your question is about text files.
A text file is a sequence of lines.
A line is a sequence of characters ending in (and including) a line break.
A line breaks is represented differently by different Operating Systems.
On Unix/Linux/Mac they are usually represented by a single LINEFEED
On Windows they are usually represented by the pair CARRIAGE RETURN + LINEFEED
On old Macs they were usually represented by a single CARRIAGE RETURN
On other systems (AS/400 ??) there may even not be a specific character that represents a line break ...
Anyway, the library code in C is responsible to translating the system's line break to '\n' when reading text files and do the reverse operation when writing text files.
So, no matter what the representation is on any given system, when you read a text file in C, lines will be ended by a '\n'.
Note: The '\n' is not necessarily 0x0a in all systems.

Yes it is.
\n is a newline. Hex code is 0x0A.
\r is a carriage return. Hex code is 0x0D

It is a single character. It represents Newline (but is not the only representation - Wikipedia).
EDIT: The question was changed while I was typing the answer.

Related

why when we write \n in the file it converts into \r\n combination?

I read this concept from book that when we attemp to write \n to the file using fputs(), fputs() converts the \n to \r\n combination and then if we read the same line back using fgets () the reverse conversion happens means \r\n back convert to \n.
I don't get that what is the purpose behind this?
It is because Windows (and MS-DOS) text files are supposed to have lines ending in \r\n, and portable C programs are supposed to simply use \n because C was originally defined on Unix.
And it's not just fputs and fgets that do it - any I/O function on a text file, even getc and fread, will do the same conversion.
Succinctly, DOS is the reason for this.
Different systems have different conventions for line endings. Unix reckons one character, '\n', is sufficient to mark the end of a line. DOS decided that it needed two characters, '\r' and '\n', though other systems also used that convention. The versions of Mac OS 1-9 (prior to Mac OS X) used just '\r' instead. Other systems could use a count and the line data instead of a line ending, or could simulate punched cards with blanks up to a fixed length (72 or 80). Unix also doesn't distinguish between binary and text files; DOS does. (DOS also uses Control-Z to mark EOF in a text file. Unix doesn't have an EOF marker; it knows exactly how big the file is and uses that length to determine when it has reached EOF.)
C originate on Unix, but to make it easier to migrate code between the systems, the standard I/O package defined that when it was working on text files, the input side would convert a native line ending to the single '\n' character for uniform input, and the output side would convert a '\n' to the native line ending.
However, the mention of text files also meant that there needed to be binary files, where these mappings do not occur.
You might note that most of the internet protocols (HTTP, for example) mandate CRLF (carriage return, line feed, or '\r', '\n') for the end of line markers.
(Actually, blaming DOS, as in MS-DOS or PC-DOS, is a little unfair. There were other systems that used the CRLF line end convention before DOS existed, and they may have been more influential on the Internet. However, almost all those ancestral systems are substantially defunct, and Windows is the environment that you'll run into these days where the distinction between binary and text files matters, and where you'll encounter CRLF line endings.)
Note that the C standard has this to say about text files:
ISO/IEC 9899:2011 §7.21.2 Files
¶2 A text stream is an ordered sequence of characters composed into lines, each line
consisting of zero or more characters plus a terminating new-line character. Whether the
last line requires a terminating new-line character is implementation-defined. Characters
may have to be added, altered, or deleted on input and output to conform to differing
conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external
representation. Data read in from a text stream will necessarily compare equal to the data
that were earlier written out to that stream only if: the data consist only of printing
characters and the control characters horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last character is a new-line character.
Whether space characters that are written out immediately before a new-line character
appear when read in is implementation-defined.
That's a lot of things that might or might not happen. Note, in particular, that trailing blanks written to a file might, or might not, appear in the input — according to the standard. That allows the systems that support punched card images or fixed length records to comply with the standard.
Note, too (as pointed out by Giacomo Degli Eposti), that this all means that if you open a file in binary mode that was originally written as a text file, you may very well get a significantly different list of bytes back from the I/O system. You'll see two characters per newline; you might see a Control-Z followed by other characters (possibly null bytes) up to a 'block' boundary that might be a multiple of 256 bytes, etc.

Newline character in C other than \n? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
Improve this question
Is there a replacement of \n in C? Can I jump to next line without using \n? I came across this question and I cannot seem to figure out a way..
#include < stdio.h>
void main()
{
printf("A");
printf("B");
}
I want it to print
A
B
And not
AB
But I cannot use \n.
You can use puts("A"); puts("B");.
To output \n without writing \n you could simply printf("%c",10);, or char c=10;write(1,&c,1);, or 100 other ways to output ascii 10.
But if you would like to avoid the newline character completely, the answer varies. It's not C or your code that actually decides to go to a new line, but whatever the device is that displays your output.
E.g. a terminal, or a line printer, or a web browser showing html.
Suppose you are outputting to a browser, the answer would be <br>
If you are outputting to an xterm terminal, \033[1B moves the cursor one line down. The output a carriage return \r to move to the beginning of the line.
So for example:
printf("test\033[1B\r123");
will output (in an xterm)
test
123
As an alternative to puts(), you can use the octal equivalent of \n: \012.
E.g.
#include <stdio.h>
int main()
{
printf("A\012");
printf("B\012");
return 0;
}
\n means "new line". \rmeans "carriage return". These characters have their origin from old form-feed printers, which needed both, one to advance the paper to a new line, one to return the carriage with the print head to the start of the line.
Different OS-es have uses different combinations of the two to indicate a new line in text:
Unix (and Linux, Mac OSX, etc) uses only \n
Windows uses \r\n
Mac OS up to version 9 uses only \r
There is no canonical platform-independent way in the C standard libraries to represent the "System"'s newline/carriage return character. However, many other languages and/or libraries have implemented this, e.g:
Java: System.getProperty("line.separator")
.NET: System.Environment.NewLine
However, if you have an output stream open on a text file, the standard C libraries should translate \n to whatever is necessary to indicate a new line on the current system.
I'd simply use:
printf("A%cB",10);
In C the escape sequence \n represents a character that belongs to the basic execution character set.
In particular, one can be sure that a new-line character exists when our C programs are running.
More important: \n is mapped to 1 and only 1 character, whose code is a positive integer number in the range of the type char.
However, the behaviour of the display devices can produce other results.
For example, when we send a character \n to a text file under Windows, this is replaced by the sequence of two characters \x0D\x0A (LF+CR, that is: Line Feed + Carriage Return).
The standard C99 or C11 says:
(5.2.2) \n (new line) Moves the active position to the initial position of the next line.
The meaning of that (in every system that prints several lines in its standard display device), the "effect" of sending a character \n is "advance to the next line" and "go to the beginning of that line". It is the sum of "line feed" and "carriage return" operations.
In DOS/Windows this is equivalent to send the sequence of these two characters: '\xD', '\xA'.
In Linux/Unix this is equivalent to send the only character '\xD'.
More details here: http://en.wikipedia.org/wiki/Newline
Summarizing, we have to distinguish betwenn the character \n by itself, and the semantic related to \n when is considered as a control character in C.
You can "produce" a new line just by sending the character '\n' with putc() or printf().
By using \n is the better way to do that.
Another very good alternative is that pointed out by #1'': to use the function puts(str).
This function appends a new-line at the end of the string str.
I think is not at all a good idea to try this other alternatives:
printf("%c", 10); // 10 == 0x0A
printf("%c", 13); // 10 == 0x0D
printf("\x0A"); // CR
printf("\x0D"); // LF
You are not solving the problem properly, because your program becomes system dependent.
Valid for Windows, Linux or what?
Worst: in theory, the standard C does not ensures that you even have correspondence between the ASCII/Unicode code numbers for the characters and the characters used in the system your program will run. This issue involves to the control characters, too.
So, you cannot be sure that 0x10 means "carriage return" and 0x13 means "line feed".
To be sure of that, it is necessary to check the existence of the following macro:
__STDC_ISO_10646__
(That macro, if exists, is a long int number containing information about the version of Unicode supported by your compiler.)
The important thing is: if the macro __STDC_ISO_10646__ is not defined, you cannot have certainty about the codes assigned to the characters in your system.
Thus, it cannot be used the "magic numbers" 10 and 13.

How can tell the end of a line with c

I don't know whether the line is ended by '\n' or '\r' or '\r\n'
and don't what the text is encoded by , besides if the encode is utf-8, it can be no bom.
Is there a function or a lib can do this ,or just tell me the termination of a line.
Are you by chance using fgets, fread, fputs, fwrite, etc, on a file that is open for reading text? If so, the implementation will automatically transform OS-specific line terminators (eg. "\r\n") into '\n' when reading, and transform '\n' into OS-specific line terminators when writing.
There are two other scenarios, one of which it turns out was OP:
OP was struggling with "\r\n" being carried over from other OS software, and so opening files for reading in his (presumably Unix-like) OS would no longer convert that. My suggestion is to use dos2unix for these one-off conversions, rather than bloating your code with something which will likely never run again.
You're not using one of those functions. This could be because you're using a stream such as a socket, and perhaps the protocol requires "\r\n". In this case, you should use strstr to find the exact sequence "\r\n".
UTF-8 was designed with a degree of compatibility to ASCII in mind, hence you can assume that any system that uses UTF-8 will also use ASCII or some similar character set. Any characters that use sequences larger than one byte will only use values 0x80 or greater to represent. Since '\n' lies within the 0x00-0x7F range, you're guaranteed that it'll be a single byte and it won't exist as part of a multi-byte character.
Use wcslen to get the size in byte of an utf8 string.
http://linux.die.net/man/3/wcslen

carriage return by fgets

I am running the following code:
#include<stdio.h>
#include<string.h>
#include<io.h>
int main(){
FILE *fp;
if((fp=fopen("test.txt","r"))==NULL){
printf("File can't be read\n");
exit(1);
}
char str[50];
fgets(str,50,fp);
printf("%s",str);
return 0;
}
text.txt contains: I am a boy\r\n
Since I am on Windows, it takes \r\n as a new line character and so if I read this from a file it should store "I am a boy\n\0" in str, but I get "I am a boy\r\n". I am using mingw compiler.
The behavior depends on the c library implementation and which mode you pass to fopen. See this quote from the MSDN documentation on fopen (fopen on MSDN):
b - Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.
Means, if you use the Microsoft c library, and open your file omitting the 'b', the carriage return characters will be removed from the stream.
Since you're using mingw, your compiler probably links against the GNU c library which follows the POSIX standard. This is what the GNU documentation says about fopen (fopen on gnu.org):
The character ‘b’ in opentype has a standard meaning; it requests a binary stream rather than a text stream. But this makes no difference in POSIX systems (including GNU systems).
Concluding: you're omitting the 'b' mode char, which opens your stream in text mode. You're on Windows but use a GNU c library which makes no difference between text and binary mode. This is why fgets reads both carriage return and new line.
Since I am on Windows, it takes \r\n as a new line character...
This assumption is wrong. The C standard treats carriage return and new line as two different things, as evidenced in C99 §5.2.1/3 (Character sets):
[...] In the basic execution character set, there shall be
control characters representing alert, backspace, carriage return, and new-line. [...]
The fgets function description is as follows, in C99 §7.19.7.2/2:
The fgets function reads at most one less than the number of characters specified by n
from the stream pointed to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is retained) or after end-of-file. A
null character is written immediately after the last character read into the array.
Therefore, when encountering the string I am a boy\r\n, a conforming implementation should read up to the \n character. There is no possibly sane reason why the implementation should discard \r based on the platform.
The c standard says this about text streams in (among other things):
Characters may have to be added, altered, or deleted on input and output to
conform to differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one correspondence between the
characters in a stream and those in the external representation. Data read in
from a text stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist only of printing
characters and the control characters horizontal tab and new-line; no new-line
character is immediately preceded by space characters; and the last character
is a new-line character.
In other words, if a file is opened in text mode, an implementation is free to add, remove and modify control characters if it wants/needs to when going to and from disk. Which is apparently what the microsoft implementation does with the carriage return, but the gnu implementation doesn't.

In C how to write whichever end of line character is appropriate to the OS?

Unix has \n, Mac was \r but is now \n and DOS/Win32 is \r\n. When creating a text file with C, how to ensure whichever end of line character(s) is appropriate to the OS gets used?
fprintf(your_file, "\n");
This will be converted to an appropriate EOL by the stdio library on your operating system provided that you opened the file in text mode. In binary mode no conversion takes place.
From Wikipedia:
When writing a file in text mode, '\n'
is transparently translated to the
native newline sequence used by the
system, which may be longer than one
character. (Note that a C
implementation is allowed not to store
newline characters in files. For
example, the lines of a text file
could be stored as rows of a SQL table
or as fixed-length records.) When
reading in text mode, the native
newline sequence is translated back to
'\n'. In binary mode, the second mode
of I/O supported by the C library, no
translation is performed, and the
internal representation of any escape
sequence is output directly.
When you open a file in text mode (pass "w" to fopen instead of "wb") any newline characters written to the file will automatically be converted to the appropriate newline sequence for the system. Newline sequences will be translated back to newline characters when you read the file.
This is why it's important to distinguish between text and binary mode; if you're writing in binary mode, C will not tamper with the bytes you write to a file.

Resources