How does a backslash \ join printf strings written on separate lines in C?

Using Dev C++, I was experimenting with C and got this:
#include<stdio.h>
main()
{
printf("Hello
world" );
}
Here I thought the output would be something like "Hello (with spaces) World", but I got these errors:
C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c In function 'main':
5 10 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Warning] missing terminating " character
5 3 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Error] missing terminating " character
6 8 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Warning] missing terminating " character
6 1 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Error] missing terminating " character
6 1 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Error] 'world' undeclared (first use in this function)
6 1 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Note] each undeclared identifier is reported only once for each function it appears in
7 1 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Error] expected ')' before '}' token
7 1 C:\Users\ASUS\Documents\Dev C++ Programs\helloWorldDk.c [Error] expected ';' before '}' token
But when I added a \ it worked:
#include<stdio.h>
main()
{
printf("Hello \
World" );
}
Without any warnings or errors.
What magic of '\' is this?
And if any other sorcery like this exists, please let me know.

The backslash has many special meanings, e.g. escape sequences to represent special characters.
But the special meaning you found is that of a \ immediately followed by a newline, which means "ignore me and the newline". The two physical lines are spliced into one logical line before the compiler tokenizes the string, which solves the problem of encountering a newline in the middle of a string literal.

Adjacent string literals are also concatenated by the compiler (a separate feature from backslash line splicing), so one could have written,
#include <stdio.h>
int main(void) {
printf("Hello\n"
"World\n");
return 0;
}
Arguably nicer syntax for long strings. Note that the implementation's limit on string literal length still applies to the concatenated result. From a theoretical point of view, the C preprocessor is a language unto itself; see a discussion on Turing-completeness. For a practical example, x-macros are very useful in some cases.

Related

C lexer: token concatenation of unterminated string literals

Consider the following c code:
#include <stdio.h>
#define PRE(a) " ## a
int main() {
printf("%s\n", PRE("));
return 0;
}
If we adhere strictly to the tokenization rules of c99, I would expect it to break up as:
...
[#] [define] [PRE] [(] [a] [)] ["]* [##] [a]
...
[printf] [(] ["%s\n"] [,] [PRE] [(] ["]* [)] [)] [;]
...
* A single non-whitespace character that does not match any preprocessing-token pattern
And thus, after running preprocessing directives, the printf line should become:
printf("%s\n", "");
And parse normally. But instead, it throws an error when compiled with gcc, even when using the flags -std=c99 -pedantic. What am I missing?

From C11 6.4. Lexical elements:
preprocessing-token:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the above
3 [...] The categories of preprocessing tokens are: header names,
identifiers, preprocessing numbers, character constants, string
literals, punctuators, and single non-white-space characters that do
not lexically match the other preprocessing token categories. If a
' or a " character matches the last category, the behavior is
undefined. [...]
So if " is not part of a string-literal, but is a non-white-space character, the behavior is undefined. I do not know why it's undefined and not a hard error - I think it's to allow compilers to parse multiline string literals.
You say "it throws an error", but on godbolt:
<source>:3:16: warning: missing terminating " character
3 | #define PRE(a) " ## a
| ^
<source>:6:24: warning: missing terminating " character
6 | printf("%s\n", PRE("));
| ^
<source>:8: error: unterminated argument list invoking macro "PRE"
8 | }
|
<source>: In function 'int main()':
<source>:6:20: error: 'PRE' was not declared in this scope
6 | printf("%s\n", PRE("));
| ^~~
<source>:6:20: error: expected '}' at end of input
<source>:5:12: note: to match this '{'
5 | int main() {
| ^
The error is reported not on the #define PRE line (it could be), but on the PRE(") line. Tokens are recognized before macro substitution (translation phase 3 vs. phase 4), so whatever you do, you cannot "create" new string literals as a result of macro substitution, for example by gluing two macros together or the way you are trying here. Note that -pedantic will not change the warning into an error: -pedantic issues the diagnostics the standard requires, but here the standard says the behavior is undefined, so no diagnostic is required.

Difference between \% and %% [duplicate]

This question already has answers here:
Why is percentage character not escaped with backslash in C?
(4 answers)
How to escape the % (percent) sign in C's printf
(13 answers)
Closed 9 years ago.
After reading over some K&R C, I saw that printf can "recognize %% for itself". I tested this and it printed "%". I then tried "\%", which also printed "%".
So, is there any difference?
Edit for code request:
#include <stdio.h>
int main()
{
printf("%%\n");
printf("\%\n");
return 0;
}
Output:
%
%
Compiled with GCC using -o
GCC version: gcc (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]

%% is not a C escape sequence, but a printf formatter acting like an escape for its own special character.
\% is illegal because it has the syntax of a C escape sequence, but no defined meaning. Escape sequences besides the few listed as standard are compiler-specific. In all likelihood the compiler ignored the backslash, and printf did not see any backslash at runtime. If it had, it would have printed the backslash in the output, because backslash is not special to printf.

The two are not the same. %% will print %, but with \% you will get a compiler warning:
[Warning] unknown escape sequence: '%' [enabled by default]
The warning is self-explanatory: there is no escape sequence \% in C.
6.4.4.4 Character constants;
says
The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
It is clear that % can't be represented as \%. There isn't any \% in C.

When "%%" is passed to printf it will print % to standard output, but "\%" is not a valid escape sequence in C. The program will still compile, but if the compiler drops the backslash, printf is left with a lone "%" as its format, so it may print nothing, and the compiler generates a warning:
warning: spurious trailing ‘%’ in format [-Wformat=] printf("%");
The list of escape sequences in C can be found in Escape sequences in C.
This won't print % for the second printf.
int main()
{
printf("%%\n");
printf("\%");
printf("\n");
return 0;
}
Output:
%

Compilation Error in C-File handling

#include<stdio.h>
#include<conio.h>
FILE *fp;
int main()
{
int val;
char line[80];
fp=fopen("\Users\P\Desktop\Java\a.txt","rt");
while( fgets(line,80,fp)!=NULL )
{
sscanf(line,"%d",&val);
printf("val is:: %d",val);
}
fclose(fp);
return 0;
}
Why is there a compile error in the line fp=fopen("\Users\P\Desktop\Java\a.txt","rt")?

Escape your backslashes.
fp=fopen("\\Users\\P\\Desktop\\Java\\a.txt","rt");

xx.c:8:12: error: \u used with no following hex digits
fp=fopen("\Users\P\Desktop\Java\a.txt","rt");
^
xx.c:8:12: warning: unknown escape sequence '\P'
xx.c:8:12: warning: unknown escape sequence '\D'
xx.c:8:12: warning: unknown escape sequence '\J'
The issue is with the backslash. Backslash is the escape character in a C string literal.
Try this
fp=fopen("\\Users\\P\\Desktop\\Java\\a.txt","rt");
or this depending on your OS:
fp=fopen("/Users/P/Desktop/Java/a.txt","rt");

You may be familiar with how "\n" (newline) and "\t" (tab) are used in C-strings.
The compiler will look at any \<Character> and try to interpret it as an Escape-Sequence.
So, where you wrote "\Users\P\Desktop\Java\a.txt", the compiler is trying to treat
\U, \P, \D, \J and \a as special escape-sequences.
(The only one that seems to be valid is \a, which is the Bell/Beep sequence. The others should all generate diagnostics.)
As others have said, use \\ to insert a literal Backslash character, and not start an escape sequence.
P.S. Shame on you for not including the compiler message in your question.
The worst questions all say, "I got an error", without ever describing what the error was.

Why can't we use the preprocessor to create custom-delimited strings?

I was playing around a bit with the C preprocessor, when something which seemed so simple failed:
#define STR_START "
#define STR_END "
int puts(const char *);
int main() {
puts(STR_START hello world STR_END);
}
When I compile it with gcc (note: similar errors with clang), it fails, with these errors:
$ gcc test.c
test.c:1:19: warning: missing terminating " character
test.c:2:17: warning: missing terminating " character
test.c: In function ‘main’:
test.c:7: error: missing terminating " character
test.c:7: error: ‘hello’ undeclared (first use in this function)
test.c:7: error: (Each undeclared identifier is reported only once
test.c:7: error: for each function it appears in.)
test.c:7: error: expected ‘)’ before ‘world’
test.c:7: error: missing terminating " character
Which sort of confused me, so I ran it through the pre-processor:
$ gcc -E test.c
# 1 "test.c"
# 1 ""
# 1 ""
# 1 "test.c"
test.c:1:19: warning: missing terminating " character
test.c:2:17: warning: missing terminating " character
int puts(const char *);
int main() {
puts(" hello world ");
}
Which, despite the warnings, produces completely valid code (the puts line)!
If macros in C are simply textual replacement, why is it that my initial example would fail? Is this a compiler bug? If not, where in the standards does it have information pertaining to this scenario?
Note: I am not looking for how to make my initial snippet compile. I am simply looking for info on why this scenario fails.

The problem is that even though the code expands to " hello world ", it's not being recognized as a single string literal token by the preprocessor; instead, it's being recognized as the (invalid) sequence of tokens ", hello, world, ".
N1570:
6.4 Lexical elements
...
3 A token is the minimal lexical element of the language in translation phases 7 and 8. The
categories of tokens are: keywords, identifiers, constants, string literals, and punctuators.
A preprocessing token is the minimal lexical element of the language in translation
phases 3 through 6. The categories of preprocessing tokens are: header names,
identifiers, preprocessing numbers, character constants, string literals, punctuators, and
single non-white-space characters that do not lexically match the other preprocessing
token categories.69) If a ' or a " character matches the last category, the behavior is
undefined. Preprocessing tokens can be separated by white space; this consists of
comments (described later), or white-space characters (space, horizontal tab, new-line,
vertical tab, and form-feed), or both. As described in 6.10, in certain circumstances
during translation phase 4, white space (or the absence thereof) serves as more than
preprocessing token separation. White space may appear within a preprocessing token
only as part of a header name or between the quotation characters in a character constant
or string literal.
69) An additional category, placemarkers, is used internally in translation phase 4 (see 6.10.3.3); it cannot
occur in source files.
Note that neither ' nor " are punctuators under this definition.

Translation runs in multiple phases. Phase 3, tokenization, occurs before macro expansion (phase 4), so preprocessor macros must consist of complete tokens. In your case, the lone " in STR_START and STR_END is tokenized first and only then substituted, so the expansion is a sequence of separate (invalid) tokens rather than one string literal.

Here
#define STR_START "
the compiler expects a string literal. A string literal must end with a closing quote. That's why the compiler complains about a missing terminating " character.
After macro expansion the compiler complains again, because the tokens are invalid.
The MSVC compiler, for example, words the complaint differently:
error C2001: newline in constant
and after expansion it complains about missing quotes.

What can create a lexical error in C?

Besides not closing a comment /*..., what constitutes a lexical error in C?

Here are some:
"abc<EOF>
where EOF is the end of the file. In fact, EOF in the middle of many lexemes should produce errors:
0x<EOF>
I assume that using bad escapes in strings is illegal:
"ab\qcd"
Probably trouble with floating point exponents
1e+%
Arguably, you shouldn't have stuff at the end of a preprocessor directive:
#if x %

Basically anything that is not conforming to ISO C 9899/1999, Annex A.1 "Lexical Grammar" is a lexical fault if the compiler does its lexical analysis according to this grammar. Here are some examples:
"abc<EOF> // invalid string literal (from Ira Baxter's answer) (ISO C 9899/1999 6.4.4.5)
'a<EOF> // invalid char literal (6.4.4.4)
where EOF is the end of the file.
double a = 1e*3; // misguided floating point literal (6.4.4.2)
int a = 0x0g; // invalid integer hex literal (6.4.4.1)
int a = 09; // invalid octal literal (6.4.4.1)
char a = 'aa'; // too long char literal (from Joel's answer, 6.4.4.4)
double a = 0x1p1q; // invalid hexadecimal floating point constant (6.4.4.2)
// instead of q, only a float suffix, that is 'f', 'l', 'F' or 'L' is allowed.
// invalid header name (6.4.7)
#include <<a.h>
#include ""a.h"

Aren't [#$`] and other symbols like that (maybe from unicode) lexical errors in C if put anywhere outside of string or comment? They are not constituting any valid lexical sequence of that language. They cannot pass the lexer because the lexer cannot recognize them as any kind of valid token. Usually lexers are FSMs or regex based so these symbols are just unrecognized input.
For example in the following code there are several lexical errors:
int main(void){
` int a = 3;
# —
return 0;
}
We can support it by feeding this to gcc, which gives
../a.c: In function ‘main’:
../a.c:2: error: stray ‘`’ in program
../a.c:3: error: stray ‘#’ in program
../a.c:3: error: stray ‘\342’ in program
../a.c:3: error: stray ‘\200’ in program
../a.c:3: error: stray ‘\224’ in program
GCC is smart and does error-recovery so it parsed a function definition (it knows we are in 'main') but these errors definitely look like lexical errors, they are not syntax errors and rightly so. GCC's lexer doesn't have any types of tokens that can be built from these symbols. Note that it even treats a three-byte UTF-8 symbol as three unrecognized symbols.

Illegal id
int 3d = 1;
Illegal preprocessor directive
#defin x 1
Unexpected token
if [0] {}
Unresolvable id
whil (0) {}

Lexical errors:
An unterminated comment
Any sequence of non-comment and non-whitespace characters that is not a valid preprocessing token
Any preprocessing token that is not a valid C token; an example is 0xe-2, which looks like an expression but is in fact a syntax error according to the standard -- an odd corner case resulting from the rules for pp-tokens.

Badly formed float constant (e.g. 123.34e, or 123.45.33).
