Why exactly does printf not compile? - c

I am reading the Kernighan and Ritchie manual on C.
The following example arises:
printf("hello, world
");
The book states that the C compiler will produce an error message. How exactly does the compiler detect this, and why is this an issue? Shouldn't it just read the newline (presumably at the end of world) as no space?

You are not allowed to have a newline in a string literal. We can see this by looking at the grammar from the C99 draft standard, section 6.4.5 String literals:
string-literal:
    " s-char-sequence(opt) "
    L" s-char-sequence(opt) "
s-char-sequence:
    s-char
    s-char-sequence s-char
s-char:
    any member of the source character set except
    the double-quote ", backslash \, or new-line character
We can see that s-char allows any character except ", \ and new-line.

The compiler detects the error because it finds a newline character (i.e., the one right after the d in world) before it has found a closing " (i.e., one to match the opening ").
It does this because, as other commenters have said, that is what the specification requires that it do.
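As a rough sketch of what the grammar permits (the function names here are made up for illustration), the newline has to be spelled as the two-character escape \n, and the literal must open and close on the same source line:

```c
#include <string.h>

/* Hypothetical helper: the newline is written as the escape \n,
   so the literal opens and closes on one source line. */
const char *greeting(void)
{
    return "hello, world\n";
}

/* 12 visible characters plus the single newline character. */
size_t greeting_length(void)
{
    return strlen(greeting());
}
```

The escape takes two characters in the source but is stored as one character in the string.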

There's a literal newline inside the string argument. Never hit Enter in the middle of a string literal passed to printf() :)

Related

Why does C not recognize strings across multiple lines?

(I am very new to C.)
Visual newlines seem to be unimportant in C.
For instance:
int i; int j;
is same as
int i;
int j;
and
int k = 0 ;
is same as
int
k
=
0
;
so why is
"hello
hello"
not the same as
"hello hello"
It is because a line that contains a starting quote character and not an ending quote character was more likely a typing mistake or other error than an attempt to write a string across multiple lines, so the decision was made that string literals would not span source lines, unless deliberately indicated with \ at the end of a line.
Further, when such an error occurs, the compiler would be faced with reading possibly thousands of lines of code before determining there was no closing quote character (end of file was reached) or finding what was intended as an opening quote character for some other string literal and then attempting to parse the contents of that string literal as C code. In addition to burdening early compilers with limited compute resources, this could result in confusing error messages in a part of the source code far removed from the missing quote character.
This choice is effected in C 2018 6.4.5 1, which says that a string literal is " s-char-sequence(opt) ", where an s-char-sequence is a sequence of s-chars, each of which is any member of the source character set except the quote character, a backslash, or a new-line character (and a string literal may also have an encoding prefix, a u8, u, U, or L before the first ").
Strings can be continued over newlines by putting a backslash immediately before the newline:
"hello \
hello"
Or (better), using string concatenation:
"hello "
"hello"
Note that the space has been carefully preserved so that these are equivalent to "hello hello" except for the line numbering in the file after the appearance.
The backslash-newline elimination is done very early in the translation process, in phase 2 of the conceptual translation phases.
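A small sketch to check that both spellings really give the same string as "hello hello" (the variable and function names are only for illustration):

```c
#include <string.h>

/* Backslash-newline is removed in translation phase 2, before tokens are formed. */
static const char spliced[] = "hello \
hello";

/* Adjacent string literals are concatenated later, in translation phase 6. */
static const char concatenated[] = "hello "
                                   "hello";

/* Returns 1 when both spellings produce the same string as "hello hello". */
int same_as_single_line(void)
{
    return strcmp(spliced, "hello hello") == 0
        && strcmp(concatenated, "hello hello") == 0;
}
```

Note that the continuation line in the spliced version starts in column 1; anything before the second hello would become part of the string.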
Note that there is no stripping of leading blanks or anything else. If you write:
printf("Some long string with maybe an integer %d in it\
        and some more data on the next line\n", i);
then the string has a sequence of (at least) 8 blanks in it between "in it" and "and some". The count of 8 assumes that the printf() statement is aligned at the left margin; if it is indented, you'd need to add the extra white space corresponding to the indentation.
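To make the point about leading blanks concrete, here is a minimal sketch comparing the two styles with the same indentation on the second source line (names are hypothetical):

```c
#include <string.h>

/* Continuation: the blanks that indent the second line end up inside the string. */
static const char continued[] = "in it\
        and some";

/* Concatenation: the indentation sits between two literals, so it is not part of the string. */
static const char concatenated[] = "in it"
        " and some";

size_t continued_length(void)    { return strlen(continued); }
size_t concatenated_length(void) { return strlen(concatenated); }
```

The continued version comes out longer, because the indentation of its second line is stored in the string.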
1- Using double quotes for each string:
char *str = "hello "
"hello" ;
** One problem with this approach is that we need to escape special characters such as the quotation mark " itself.
2- Using a backslash \ at the end of the line:
char *str = "hello \
hello" ;
** This form is a lot easier to write, since we don't need to write quotation marks for each line.
We can think of a C program as a series of tokens: groups of characters that can't be split up without changing their meaning. Identifiers and keywords are tokens. So are operators like + and -, punctuation marks such as the comma and semicolon, and string literals.
For example, the line
int i; int j;
consists of 6 tokens: int, i, ;, int, j and ;. Most of the time, and particularly in this case, the amount of white space (space, tab and newline characters) is not critical. That's why the compiler will treat
int i
;int
j;
the same.
Writing
"Hello
Hello"
is like writing
un signed
and hoping that the compiler treats it as
unsigned
Just as white space is not allowed inside a keyword, a newline character is not allowed inside a string literal token. But a newline can be included using the escape sequence \n when needed.
To write strings across lines use string concatenation method
"Hello"
"Hello"
Although the above method is recommended, you can also use a backslash
"Hello \
Hello"
With the backslash method, beware of leading white space on the continuation line. The string will include everything on that line up to the closing quote or the next backslash-newline.

\\\ prints just one backslash?

In C, writing three backslashes, like so:
#include <stdio.h>
int main() {
printf(" \\\ ");
}
prints out just one backslash in the output. Why and how does this work?
That sequence is:
A space
A double backslash, which encodes a single backslash in the runtime string
A backslash followed by a space, which is not a standard escape sequence and should give you a diagnostic
The C11 draft says (in note 77):
The semantics of these characters were discussed in 5.2.2. If any other
character follows a backslash, the result is not a token and a diagnostic is
required.
On godbolt.org I got:
<source>:8:14: warning: unknown escape sequence '\ ' [-Wunknown-escape-sequence]
So you seem to be using a non-conforming compiler, which chooses to implement undefined backslash sequences by just letting the character through.
That is printing:
space
backslash
escaped space
The 3rd backslash is being interpreted together with the following space as "backslash space".
C11; 6.4.4.4 Character constants:
The double-quote " and question-mark ? are representable either by themselves or by the
escape sequences \" and \?, respectively, but the single-quote ' and the backslash \
shall be represented, respectively, by the escape sequences \' and \\.
So, to represent a single backslash, it's necessary to place a double backslash \\ in the source code. To print two backslashes you need four: \\\\. In your code the extra \ is followed by a space character, which isn't a valid escape sequence.
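A quick way to convince yourself is to count the characters the compiler actually stores. This is only a sketch with invented function names:

```c
#include <string.h>

/* " \\ " in source is three characters at run time: space, backslash, space. */
size_t one_backslash_length(void)
{
    return strlen(" \\ ");
}

/* " \\\\ " in source is four characters at run time: space, two backslashes, space. */
size_t two_backslashes_length(void)
{
    return strlen(" \\\\ ");
}
```

Every pair of backslashes in the source collapses to one backslash in the stored string.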
It is indeed a very simple operation in C. A \ just introduces an escape sequence. Hence the statement below will print two backslashes.
printf(" \\\\ ");
For example, some characters in C are represented with a backslash, like the end-of-line character \n or the end-of-string character \0. But if you want to print such a sequence as it is written, what will you do? You need to add an escape character in front of it:
printf("\\n"); // will print \n
But
printf("\n"); // will print a newline character, so you don't see anything visible in the output
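The difference between the two literals can also be checked without looking at the terminal. A small sketch (function names are made up):

```c
#include <string.h>

/* "\\n" holds two characters: a backslash and the letter n. */
size_t backslash_n_length(void)
{
    return strlen("\\n");
}

/* "\n" holds one character: the newline. */
size_t newline_length(void)
{
    return strlen("\n");
}

/* First character of "\\n", to show that it is the backslash itself. */
char backslash_n_first(void)
{
    return "\\n"[0];
}
```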

Multi character constant warning for escaped \t

If I write putchar('\\t'); while trying to print "\t" instead of an actual tab, I get the multi character constant warning. On the other hand, if I write putchar('\\'); I get no warning. Upon looking in the ASCII table, there is no character '\\', only '\'. So why is there no warning? Why is '\\' one character but '\\t' is more than one? Can a backslash only be used to escape one following character?
You cannot print \ and t with one putchar invocation, since putchar writes exactly one character to standard output. Use two calls:
putchar('\\');
putchar('t');
Another option would be to use the string "\\t" with fputs:
fputs("\\t", stdout);
There is no warning for '\\' because that is how you write the character constant for the character \. In ASCII this is synonymous with '\134' and '\x5c'.
From C11 6.4.4.4 paragraphs 2 and 4:
2
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'. [...] With a few exceptions detailed later, the elements of the sequence are any members of the source character set; they are mapped in an implementation-defined manner to members of the execution character set.
[...]
4
The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
The reason why you get a warning for '\\t' is that its behaviour is wholly implementation-defined. In C11 J.3.4 the following is listed as implementation-defined behaviour:
The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character (6.4.4.4).
Since '\\' contains an escape sequence that maps to a single-byte execution character \, there are no implementation-defined pitfalls and nothing to warn about; but '\\t' contains 2 characters, \ and t, and it wouldn't do what you want portably.
\\ is one character, t is one character, so that is clearly two characters.
\\ is an escape sequence, just like \t; it means \.
If you want to print the two characters \ and t, you clearly need either two calls to putchar() or a function that takes a string argument, such as "\\t".
https://en.wikipedia.org/wiki/Escape_sequences_in_C#Table_of_escape_sequences
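Both options from the answers above could be wrapped up roughly like this. The helper names are hypothetical, and fputc is used instead of putchar only so that the output stream can be chosen by the caller:

```c
#include <stdio.h>

/* Writes a backslash and the letter t as two separate characters.
   Returns 0 on success, EOF if either write fails. */
int put_backslash_t(FILE *out)
{
    if (fputc('\\', out) == EOF)   /* '\\' is a single character: the backslash */
        return EOF;
    if (fputc('t', out) == EOF)
        return EOF;
    return 0;
}

/* The same two characters as a string literal, for fputs() or printf("%s"). */
const char *backslash_t_text(void)
{
    return "\\t";
}
```

Calling put_backslash_t(stdout) should write the two characters \ and t, not a tab.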

Is '\n' necessary to be given to printf() in C?

I am reading the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie (2nd edition, published by PHI). In the first article 1.1 Getting started of the first chapter A Tutorial Introduction, page number 7, they say that one must use \n in the printf() argument, otherwise the C compiler will produce an error message. But when I compiled the program without \n in printf(), it went fine. I did not see any error message. I am using Dev-C++ portable with "MinGW GCC 4.6.2 32-bit" compiler.
Why do I not get the error message?
Here is the passage in question, from page 7 of the second edition of K&R:
You must use \n to include a newline character in the printf argument; if you try something like
printf("hello, world
");
the C compiler will produce an error message.
This means that you can't embed a literal newline in a quoted string.
Either one of the lines below, however, is fine:
printf("hello, world"); /* does not print a newline */
printf("hello, world\n"); /* prints a newline */
All the text above is saying is that you can't have a quoted string that spans multiple lines in the source code.
You can also escape a newline with a backslash. The C preprocessor will remove the backslash and newline, so the following two statements are equivalent:
printf("hello, world\
");
printf("hello, world");
And if you have a lot of text, you can put multiple quoted strings next to each other, or separated by whitespace, and the compiler will join them for you:
printf("hello, world\n"
"this is a second line of text\n"
"but you still need to include backslash-n to break each line\n");
You don't get a compile-time error message because there is no error.
In the first article they say that one must use \n in the printf() argument, otherwise the C compiler will produce an error message.
Can you cite (by section and/or page number) where that statement appears? I seriously do not believe that K&R (you're using the second edition, right?) says that. If it did say that, it would be an error in the book.
Update: What the book says, quite correctly, is that a newline in a string literal is represented by the two-character sequence \n, not by an actual newline character. A string literal must be on a single logical source line; something like
printf("hello
world");
is a syntax error. This applies to all string literals, whether they're printf format strings or not.
An actual newline in a string literal is an error. A \n sequence that represents a newline is optional; its lack is not an error, but a printf format string should usually end with a \n.
There is no requirement for a printf call to include the \n character, and I've never seen a compiler complain about a printf that lacks a \n.
There is an issue here, but it's not a compile-time error.
Some examples:
printf("No newline");
This is a perfectly legal call. It prints the specified string on standard output without a newline character.
printf("hello%c", '\n');
There's no \n in the format string, but it prints hello followed by a newline. Again, this is perfectly legal.
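To see what that second call produces without relying on the terminal, the same format can be sent to a buffer with snprintf. This is just a sketch and the function name is made up:

```c
#include <stdio.h>

/* Formats "hello" plus a newline supplied through %c, the way printf would.
   Returns the number of characters produced (excluding the terminating null). */
int format_hello(char *buffer, size_t size)
{
    return snprintf(buffer, size, "hello%c", '\n');
}
```

With a buffer of 16 bytes this stores the five letters of hello followed by one newline character, even though the format string itself contains no \n.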
The actual issue is that you should (almost) always print a newline at the very end of your output. This complete program:
#include <stdio.h>
int main(void) {
printf("hello");
return 0;
}
is legal, but its behavior may be undefined in some implementations. The relevant rule is in the standard, section 7.21.2 paragraph 2 (the quote is from the N1570 draft):
A text stream is an ordered sequence of characters composed into
lines, each line consisting of zero or more characters plus a
terminating new-line character. Whether the last line requires a
terminating new-line character is implementation-defined.
Whether that terminating newline character is required or not, it's (almost always) a very good idea to end your output with a newline. If I run it on my system, I get the string hello immediately followed by my shell prompt on the same line. It's not illegal, but it's inconvenient and ugly.
But that applies only at the very end of the program's output. This program is perfectly valid and has well defined behavior:
#include <stdio.h>
int main(void) {
printf("hello");
putchar('\n');
return 0;
}
Still, the easiest and most reliable way to produce clean output is for each printf call to print exactly one line, which ends with exactly one '\n' character. This isn't a universal rule; sometimes it's convenient to print a line a piece at a time, or to print two or more lines in a single printf.
Very often, if you don't end your printf format string with a \n, some of the output stays in the stdout buffer, and you need to call fflush to get all the output shown.
This means that if you don't get all the expected output you should add fflush at appropriate places (e.g. before calls to fork).
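A typical case is a prompt, which deliberately has no trailing newline. A minimal sketch, assuming a helper called show_prompt (not a standard function):

```c
#include <stdio.h>

/* Prints a prompt that deliberately has no trailing newline, then flushes
   stdout so the text appears even if the stream is buffered.
   Returns 0 on success, EOF on failure. */
int show_prompt(const char *text)
{
    if (fputs(text, stdout) == EOF)
        return EOF;
    return fflush(stdout);
}
```

Without the fflush call, whether the prompt shows up before the program waits for input depends on how stdout is buffered.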
But you won't get a compiler message in such a case, because it is not an error (though it may be a mistake many beginners make). If you really wanted, you could customize your compiler (e.g. with MELT if using a recent GCC compiler) to get the warning. I believe it is not worth the effort (because there are legitimate calls to printf without any \n).
An example of legitimate printf calls without newlines would be if you coded a (recursive) function to output an expression from its AST; you certainly should not emit a newline after each token.
See documentation of printf(3), fflush(3), stdio(3), setvbuf(3) etc...

Writing printf(#"") with multiple line string

Is it possible to write something like this:
printf(#"
-
-
-
-
");
I can do it in C#, but can't in C. It gives me an error in CodeBlocks. Am I allowed to do this?
Error message: error: stray '#' in program.
No. That syntax doesn't exist in C.
If you want a multiple-line string, write it as multiple double-quoted strings with no other tokens in between them. They will be combined.
printf(
"some string"
"more of the string"
"even more of the string"
);
(You will, of course, need to add a \n at the end of each line if that's what you want.)
No, that's not a syntax that C understands. C doesn't have raw literals.
You can use \ as the last character to continue on the next line:
const char *str = "hello\n\
world";
Also, consecutive string literals will be concatenated. So you can do e.g.
const char *str = "Hello\n"
"world\n";
C#'s verbatim strings are not available in C. If you have some characters to escape, like " or \, escape them with \; there is no other option in this language.
If you want to embed multiple lines in a string literal, you can either insert \n at the appropriate locations in your string, or escape the line break with a backslash (which only continues the literal on the next source line and does not insert a newline into the string):
printf("Here's\
a multiline\
string literal");
Line continuation with \ at the end of the line.
printf("\
\
-\
-\
-\
-\
");
String literals in C may not contain newlines. You have two workarounds:
Use implicit string concatenation (done by the compiler).
printf("The quick brown"
" fox jumps over"
" the sleazy dog.");
Escape the newline by placing a backslash in front of it.
printf("The quick brown\
fox jumps over\
the sleazy dog.");
Personally, I prefer the first form since the second looks ugly (my opinion) and forces you to ruin your code indentation.
In either case, the string will simply not contain the newlines. So if you really meant for them to be there, you'll have to add them via \n.
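For the four dashes in the question, a version with real newlines in the output might look like the sketch below. The names dashes, dash_lines and count_newlines are made up for illustration:

```c
#include <stddef.h>

/* Four lines, each a dash followed by an explicit \n escape.
   The source line breaks between the literals add nothing to the string. */
static const char dashes[] =
    "-\n"
    "-\n"
    "-\n"
    "-\n";

/* Counts the newline characters stored in the string. */
size_t count_newlines(const char *text)
{
    size_t count = 0;
    for (; *text != '\0'; ++text)
        if (*text == '\n')
            ++count;
    return count;
}

const char *dash_lines(void) { return dashes; }
```

Passing dash_lines() to printf("%s", ...) or fputs should then print four lines, one dash on each.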

Resources