printf("string1""string2") is this valid C? - c

I was trying to figure out something when I wrote this by mistake:
printf("string1""string2");
To my surprise, it compiled and produced the concatenated string as output, i.e.:
string1string2
Is this valid C?
I am using gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)

Yes it is. Consecutive string literals are concatenated early in the parsing of C.
C99 6.4.5/4:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and wide string literal tokens are concatenated into a single multibyte character sequence. If any of the tokens are wide string literal tokens, the resulting multibyte character sequence is treated as a wide string literal; otherwise, it is treated as a character string literal.

Yes, and it can be very useful to concatenate string constants at compile-time.
#define VERSION "1.0"
#define COMPANY "Trivial Software"
printf("hello world: v. " VERSION " copyright (c) " COMPANY);
or
puts(
"blah blah blah\n"
"blah blah blah\n"
"blah blah blah\n"
"blah blah blah\n"
);

Yes, it is valid; adjacent string literal concatenation has been part of standard C since ANSI C89. The concatenation is done at compile time.

As others said, yes, it is valid. I only wanted to add that it is really useful for long strings that span several lines. You don't have to mess with \ to indicate that the string continues, and you don't add a newline to the string either, so you just write:
"very long string "
"that continues over here"
(Watch out for the spaces at the end of each fragment; forgetting them is a common mistake. Without the trailing space in the first fragment, "string" and "that" would be joined.)

Related

String splicing in C, a strange phenomenon

Recently, I saw the following C code:
printf("%s\n", "1234" "qwer");
// output: 1234qwer
snprintf(buffer, sizeof(buffer), "bvcx" "mju");
// buffer data: bvcxmju
To be honest, this amazed me. Before that, I didn't know that strings could be pasted together in the "1234" "qwer" format. Why does this work?
Then I tried char a[] = "1234" "qwer"; and gcc returned an error!
So, can someone explain this phenomenon and the theory behind it?
What you saw has been part of C syntax for a long time. A string literal can be split into multiple parts separated only by white space (after preprocessing and comment removal). This enables, for example:
writing a long string literal on multiple lines:
char message[] = "This is a long message that can be split on "
"multiple lines for readability";
combining string fragments defined as macros:
printf("The value of i32 is %" PRId32 "\n", i32);
separating string contents that have a different meaning if juxtaposed:
char s1[] = "This is ESC 4: \x1B" "4";
char s2[] = "so is this: \0334 and this: \33""4";
char s3[] = "but not this: \334";
char s4[] = "nor this: \x1B4";
combining stringified macro arguments
Adjacent string literals are always concatenated into a single one as part of the translation phases. See C17 6.4.5/5:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence.
Formally, translation phase 6 happens after macro expansion but before preprocessing tokens are converted to tokens. This means, for example, that
sizeof "hello " "world" yields 12 (11 characters plus the terminating null), equivalent to:
sizeof "hello world"
In practice, this is convenient when writing various "stringification" macros, for example:
#include <stdio.h>
#define STRINGIFY(x) #x
#define STRINGIFY_CONCAT(a,b) STRINGIFY(a) " " STRINGIFY(b)
int main (void)
{
puts(STRINGIFY_CONCAT(hello,world));
}
It's also a useful feature whenever you have to use hex escape sequences and need to terminate them, since C allows them to be of variable length: puts("\xABBA") vs puts("\xAB" "BA") will give different outputs.

Why might a string literal not be a string?

I'm struggling with this part in the C standard about string literals, especially the second part of it:
"In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. 80)"
"80) A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."
Source: ISO/IEC 9899:2018 (C18), §6.4.5/6, Page 51
I don't understand the explanation - "because a null character can be embedded in it by a \0 escape sequence.".
To look at the referenced section §7.1.1., regarding the definition of a "string", it is stated:
"A string is a contiguous sequence of characters terminated by and including the first null character."
Source: ISO/IEC 9899:2018 (C18), §7.1.1/1, Page 132
I thought that maybe the focus lies on the "can", in the sense that a string literal does not have to include/embed the null character, while a string needs to.
But then again I'm asking myself: how can one use a string literal as a string if it does not contain a string-terminating null character to mark the end of the string (as string-handling functions require)?
I'm totally drawing a blank at the moment.
Note: I'm aware that a string literal may be stored in read-only memory and must not be modified, and that a string is a generic term for a sequence of characters terminated by NUL, which may or may not be mutable.
Thus, my question is not: "What is the difference between a string and a string literal?"
My Question is:
Why/How can a string-literal not be a string?
and, according to my concerns, so far:
Is it true that a string literal can have the NUL byte omitted?
I wanted to ask this question myself, but shortly before posting it I figured it out. My confusion was caused by the slightly misleading wording of the quote.
But I decided not to delete the question's draft, as it could be useful for future readers, and to provide a Q&A instead.
Feel free to comment and hint.
Related stuff:
What is the difference between char s[] and char *s?
What is the type of string literals in C and C++?
Are string literals const?
"Life-time" of a string literal in C
You're overthinking it.
"A string is a contiguous sequence of characters terminated by and including the first null character."
Source: ISO/IEC 9899:2018 (C18), §7.1.1/1, Page 132
says that a "string" only extends up to the first null character. Characters that may exist after the null are not part of the string. However
"80) A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."
makes it clear that a string literal may contain an embedded null. If it does, the string literal AS A WHOLE is not a string; the string is just the prefix of the string literal up to the first null.
Let's take a look at the definition of the term "string literal" in the same section of C18, §6.4.5/3:
"A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in "xyz"."
According to that, a string literal consists only of the characters enclosed in quotation marks, the bare string content. It does not have an appended \0. The NUL byte is appended later during translation, as stated in §6.4.5/6:
"In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. 80)"
Let's take an example:
"foo" is a string literal, but not a string because "foo" does not contain an embedded null character.
"foo\0" is a string literal and a string because the literal itself contains a null character at the end of the character sequence.
Note that you don't need to explicitly insert the null character at the end of a string literal to turn it into a string. As already said, it is implicitly appended during program translation.
That means
const char *s = "foo";
is equal to
const char *s = "foo\0";
I admit that the sentence:
"A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."
is a little confusing and illogical in the context. It would be better phrased:
"A string literal might not be a string (see 7.1.1), because a null character might not (OR is not required to) be embedded in it by a \0 escape sequence."
or alternatively:
"A string literal might not be a string (see 7.1.1), because a null character can be embedded in it by a \0 escape sequence."
As @EricPostpischil pointed out in his comment, the meaning of the footnote is probably quite different.
It means that if the string literal contains a null character somewhere inside it, rather than only at the end as a string requires, the string literal is not equivalent to a string.
For example:
The string literal
"foo\0bar"
is not a string, because its first null character is embedded in the middle of the literal rather than at its end.

Separating hexadecimal escape sequences in strings

Can a string constant like "foo" "\x01" "bar" be written as a single string literal (while keeping the hexadecimal notation)? With "foo\x01bar", the escape sequence seems to be interpreted as \x01ba, since I get the warning "hex escape sequence out of range".
"foo" "\x01" "bar" is a string literal.
The C standard states that a hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence. Without the explicit concatenation (which is the common workaround to this problem), the compiler parses \x01ba which is obviously out of range.
How about "foo\x01\142ar"? Is that cheating?
Another solution is to simply write the escaped character in octal instead of hexadecimal:
"foo\1bar"
and there is no more ambiguity, since b is not an octal digit.

Writing printf(@"") with a multi-line string

Is it possible to write something like this:
printf(@"
-
-
-
-
");
I can do it in C#, but can't in C. It gives me an error in CodeBlocks. Am I allowed to do this?
Error message: error: stray '@' in program.
No. That syntax doesn't exist in C.
If you want a multiple-line string, write it as multiple double-quoted strings with no other tokens in between them. They will be combined.
printf(
"some string"
"more of the string"
"even more of the string"
);
(You will, of course, need to add a \n at the end of each line if that's what you want.)
No, that's not syntax C understands; C doesn't have raw string literals.
You can use \ as the last character to continue on the next line:
const char *str = "hello\n\
world";
Also, consecutive string literals will be concatenated, so you can do, e.g.:
const char *str = "Hello\n"
"world\n";
C#'s verbatim strings are not available in C. If you have characters that need escaping, like " or \, escape them with \; there is no other option in this language.
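If you want to embed multiple lines in a string literal, you can either insert \n at the appropriate locations in your string, or continue the literal on the next line by escaping the line break with \. Note that a backslash-newline is removed entirely, so it does not put a newline into the string; combine it with \n if you want one:
printf("Here's\n\
a multiline\n\
string literal");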
Line continuation with \ at the end of the line (the result is "----" on one line, since the spliced newlines are not part of the string):
printf("\
\
-\
-\
-\
-\
");
String literals in C may not contain newlines. You have two workarounds:
Use implicit string concatenation (done by the compiler).
printf("The quick brown"
" fox jumps over"
" the sleazy dog.");
Escape the newline by placing a backslash in front of it.
printf("The quick brown\
fox jumps over\
the sleazy dog.");
Personally, I prefer the first form since the second looks ugly (my opinion) and forces you to ruin your code indentation.
In either case, the string will simply not contain the newlines. So if you really meant for them to be there, you'll have to add them via \n.

Using printf without having to escape double quotes?

In C, it is not possible to use ' to delimit a string for printf. However, I have text that is full of double quotes ("), and I need to escape all of them, as in
printf("This is \"test\" for another \"text\"");
Is it possible to use printf without escaping "? I mean, using another character to delimit the string.
Not recommended, but you can use a macro:
#include <stdio.h>
#define S(x) #x
int main() {
printf(S(This "is" a string (with nesting)!\n));
}
This prints
This "is" a string (with nesting)!
Now the delimiters are balanced () characters. However, to escape single ), ", or ' characters, you have to write something like S(right paren: ) ")" S(!\n), which is quite ugly. This technique is not recommended for writing maintainable code.
No, that is not possible in the C language. There is only one syntax for string literals, and that is that they are delimited by double quotes.
The only way to write unescaped quotation marks is as character constants inside character arrays, which is uglier and more difficult to write, so there's very little reason to do so in a case like this (note the terminating '\0', which %s requires):
char array[] = {'T', 'h', 'i', 's', ' ', 'i', 's', ' ', '"', /* etc. */ '\0'};
printf("%s", array);
No, there is no other way. Section 6.4.5 String literals of the draft C99 standard has the following grammar:
string-literal:
" s-char-sequence_opt "
L" s-char-sequence_opt "
No, it's not possible in standard C.
C11 6.4.5 String literals
The same considerations apply to each element of the sequence in a string literal as if it were in an integer character constant (for a character or UTF-8 string literal) or a wide character constant (for a wide string literal), except that the single-quote ' is representable either by itself or by the escape sequence \', but the double-quote " shall be represented by the escape sequence \".
First of all, separate a program's requirements from the solutions that meet those requirements. Given the minimal information in this question, the requirement is to print, using C, a string that contains double quotes. There are several ways to do this in C.
For example, the following code fragment:
char string[] = "This string \" has one double quote.";
printf("This string %cprints%c with %cdouble%c quotes.\n", '"', '"', '"', '"');
printf("%s\n", string);
produces:
This string "prints" with "double" quotes.
This string " has one double quote.
Your application might have more requirements that you have not mentioned, but it should be possible to achieve what you want, just NOT the way you initially believe it should be done (welcome to the world of "Needs Analysis").
//R "delimiter( raw_characters )delimiter"
printf(R"(SOME/\STRING)");
A raw string terminates at the first )" it sees.
Therefore, if )" appears in the string, you have to add a delimiter ("a" is used below):
/* Print dog without any escape. */
printf(R"a(|\_/|
|q p| /}
( 0 )"""\
|"^"` |
||_/=\\__|)a");
It's a C++11 feature, not part of standard C; you can find more information in the document.
A similar question has been answered here: escape R"()" in a raw string in C++
