String splicing in C language, a strange phenomenon - c

Recently, I saw the following C code:
printf("%s\n", "1234" "qwer");
// output: 1234qwer
snprintf(buffer, sizeof(buffer), "bvcx" "mju");
// buffer data: bvcxmju
To be honest, this amazed me. Before that, I didn't know that strings could be pasted together in the "1234" "qwer" format. Why does it work?
Then I tried 'char a[] = "1234" "qwer"', and gcc returned an error!
So, can someone explain this phenomenon and the theory behind it?

What you saw has been part of C syntax for a long time. A string literal can be split into multiple parts separated only by white space (after preprocessing and comment removal). This syntax enables, for example:
writing a long string literal on multiple lines:
char message[] = "This is a long message that can be split on "
"multiple lines for readability";
combining string fragments defined as macros:
printf("The value of i32 is %" PRId32 "\n", i32);
separating string contents that have a different meaning if juxtaposed:
char s1[] = "This is ESC 4: \x1B" "4";
char s2[] = "so is this: \0334 and this: \33""4";
char s3[] = "but not this: \334";
char s4[] = "nor this: \x1B4";
combining stringified macro arguments

Adjacent string literals are always concatenated into a single one as part of the translation phases. See C17 6.4.5/5:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence.
Formally, translation phase 6 happens after macro expansion but before preprocessing tokens are converted into tokens. This means, for example, that
sizeof "hello " "world" yields the result 12, equivalent to:
sizeof "hello world"
Practically, this is convenient when writing various "stringification" macros, example:
#include <stdio.h>
#define STRINGIFY(x) #x
#define STRINGIFY_CONCAT(a,b) STRINGIFY(a) " " STRINGIFY(b)
int main (void)
{
puts(STRINGIFY_CONCAT(hello,world));
}
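It's also useful whenever you have to use hex escape sequences and need to terminate them, since C allows hex escapes to be of any length: puts("\xABBA") is a single escape with the value 0xABBA (out of range for a char, so the compiler must issue a diagnostic), whereas puts("\xAB" "BA") prints the byte 0xAB followed by "BA".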

Related

Why does C not recognize strings across multiple lines?

(I am very new to C.)
Visual newlines seem to be unimportant in C.
For instance:
int i; int j;
is the same as
int i;
int j;
and
int k = 0 ;
is the same as
int
k
=
0
;
so why is
"hello
hello"
not the same as
"hello hello"
It is because a line that contains a starting quote character and not an ending quote character was more likely a typing mistake or other error than an attempt to write a string across multiple lines, so the decision was made that string literals would not span source lines, unless deliberately indicated with \ at the end of a line.
Further, when such an error occurs, the compiler would be faced with reading possibly thousands of lines of code before determining that there was no closing quote character (end of file was reached), or finding what was intended as the opening quote character of some other string literal and then attempting to parse the contents of that string literal as C code. In addition to burdening early compilers with limited compute resources, this could result in confusing error messages in a part of the source code far removed from the missing quote character.
This choice is effected in C 2018 6.4.5 1, which says that a string literal is " s-char-sequence_opt ", where s-char-sequence is a sequence of s-chars, each of which is either an escape sequence or any member of the source character set except the double-quote ", the backslash \, or the new-line character (and a string literal may also have an encoding prefix, u8, u, U, or L, before the first ").
Strings can be continued over newlines by putting a backslash immediately before the newline:
"hello \
hello"
Or (better), using string concatenation:
"hello "
"hello"
Note that the space has been carefully preserved so that these are equivalent to "hello hello" except for the line numbering in the file after the appearance.
The backslash-newline line elimination is done very early in the translation process — in phase 2 of the conceptual translation phases.
Note that leading blanks on the continuation line are not stripped. If you write:
printf("Some long string with maybe an integer %d in it\
        and some more data on the next line\n", i);
Then the string contains a run of (at least) 8 blanks between "in it" and "and some". The count of 8 assumes that the printf() statement starts at the left margin; if it is indented, the extra indentation of the continuation line ends up in the string too.
1- Using a separate double-quoted piece for each line:
char *str = "hello "
"hello" ;
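** One drawback of this approach is that every line needs its own pair of quotation marks. (Special characters such as the quotation mark " itself must be escaped as \" in either form.)
2- Using a backslash (\) at the end of each line: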
char *str = "hello \
hello" ;
** This form is easier to write: we don't need quotation marks on each line. However, any indentation of the continuation line becomes part of the string.
We can think of a C program as a series of tokens: groups of characters that can't be split up without changing their meaning. Identifiers and keywords are tokens. So are operators like + and -, punctuation marks such as the comma and semicolon, and string literals.
For example, the line
int i; int j;
consists of 6 tokens: int, i, ;, int, j and ;. Most of the time, and particularly in this case, the amount of space (space, tab and newline characters) is not critical. That's why the compiler will treat
int i
;int
j;
the same way.
Writing
"Hello
Hello"
is like writing
un signed
and hoping that the compiler treats it as
unsigned
Just as a space is not allowed inside a keyword, a newline character is not allowed inside a string literal token. A newline can still be included in the string's value with the escape sequence \n when needed.
To write strings across lines, use string concatenation:
"Hello "
"Hello"
Although the above method is recommended, you can also use a backslash:
"Hello \
Hello"
With the backslash method, beware of leading spaces on the continuation line: the string includes everything on that line up to the closing quote (or up to another backslash-newline).

partial C String initialisation [duplicate]

In some code I read today, there was a type of C string initialisation that is new to me.
It chains multiple string literals, like "A""B""C"...
It also allows splitting the string initialisation across multiple lines.
I set up a small Hello World demo, so you can see what I am talking about:
#include <stdio.h>
#define SPACE " "
#define EXCLAMATION_MARK "!"
#define HW "Hello"SPACE"World"EXCLAMATION_MARK
int main()
{
char hw_str[] =
"Hello"
SPACE
"World"
"!";
printf("%s\n",hw_str);
printf("%s\n",HW);
return 0;
}
So here are some questions:
Is this valid according to the standard?
Why does this work? "abc" is like an array {'a','b','c'}, right? So why does an initialisation spread over multiple pairs of "" work?
Does this feature have an official name, something I could search for to find documentation describing it?
Is this portable?
From the C Standard (5.1.1.2 Translation phases):
1 The precedence among the syntax rules of translation is specified by the following phases.
...
6. Adjacent string literal tokens are concatenated.
So for example this part of the program
char hw_str[] =
"Hello"
SPACE
"World"
"!";
that after macro substitutions looks like
char hw_str[] =
"Hello"
" "
"World"
"!";
is processed in the sixth translation phase by concatenating adjacent string literals, and you have
char hw_str[] =
"Hello World!";

Error in macro expansion

I have been trying to understand macro expansion and found that the second printf gives an error. I expected the second print statement to produce the same output as the first one. I know there are functions to do string concatenation. I am finding it difficult to understand why the first print statement works and the second doesn't.
#define CAT(str1, str2) str1 str2
void main()
{
char *string_1 = "s1", *string_2 = "s2";
printf(CAT("s1", "s2"));
printf(CAT(string_1, string_2));
}
Concatenating string literals, like "s1" "s2", is part of the language specification. Just placing two variables next to each other, like string_1 string_2 is not part of the language.
If you want to concatenate two string variables, consider using strcat instead, but remember to allocate enough space for the destination string.
Try to do the preprocessing "by hand":
CAT takes two arguments and expands to them one after the other, separated by a space. So if we preprocess your code, it becomes:
void main()
{
char *string_1 = "s1", *string_2 = "s2";
printf("s1" "s2");
printf(string_1 string_2);
}
While "s1" "s2" is automatically concatenated to "s1s2" by the compiler, string_1 string_2 is invalid syntax.

How to store a paragraph in C source code?

I'm a Perl programmer and was surprised to find that C has no convenient way to store a paragraph, like:
my $a = <<'dd';
hello world..
1
2
3
dd
So how do I do the equivalent in C?
It's done like this:
char a[] =
"hello world\n"
" 1\n"
" 2\n"
" 3\n";
You have two different options, which provide much of the same functionality.
Both
char *str="Line 1\nLine 2\nLine 3";
and
char str[]="Line 1\nLine 2\nLine 3";
allow you to print a paragraph like this:
printf("%s",str);
However, the first declaration (char *str) makes str point to a string literal, which typically lives in read-only memory (modifying it is undefined behavior), whereas the second makes str an array initialised with a copy of the string, which can be edited at run time. This distinction is important, but not always obvious. See this question for a few more details.
The character \n is the line feed character, and you should check that it behaves the way you expect on your target platform. For instance, DOS and Windows text files use "\r\n" (carriage return + line feed), although streams opened in text mode translate '\n' to that sequence automatically. Wikipedia has an article about this.
Another difference between these forms, as one commenter pointed out, is that char *str declares a pointer variable (which can be reassigned to point elsewhere), whereas char str[] declares an array (which cannot be assigned to). They often, but not always, behave the same; this question has more information regarding this.
As some commenters have pointed out, there is a limit on the length of string literals in some compilers. MSVC has a limit of 2048 characters (see here) whereas GCC has no limit, by some accounts. A length of at least 509 one-byte characters is guaranteed by C90; this was increased to 4095 in C99.
Regardless, if you want to avoid this length limit or organize the text in a prettier way, you can use this format (note that each line needs its own quotes and an explicit \n; the compiler concatenates the adjacent string literals):
char *str =
"Line 1\n"
"Line 2\n"
"Line 3\n";
or this (the backslash at the end of each line removes the newline you inserted for formatting, so the result contains no line breaks; if you indent the continuation lines, that indentation becomes part of the string):
char *str =
"Line 1 \
Line 2 \
Line 3";
You may also try:
char *str = "hello world\n 1\n 2\n 3\n";

printf("string1""string2") is this valid C?

I was trying to figure something out when I wrote this by mistake:
printf("string1""string2");
To my surprise, it compiled and produced a concatenated string as output, i.e.
string1string2
Is this valid C?
I am using gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)
Yes it is. Consecutive string literals are concatenated early in the parsing of C.
6.4.5 / 4:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and wide string literal tokens are concatenated into a single multibyte character sequence. If any of the tokens are wide string literal tokens, the resulting multibyte character sequence is treated as a wide string literal; otherwise, it is treated as a character string literal.
Yes, and it can be very useful to concatenate string constants at compile-time.
#define VERSION "1.0"
#define COMPANY "Trivial Software"
printf("hello world: v. " VERSION " copyright (c) " COMPANY);
or
puts(
"blah blah blah\n"
"blah blah blah\n"
"blah blah blah\n"
"blah blah blah\n"
);
Yes, it is valid and has been part of the C language for a very long time (it was standardised in ANSI C89). The concatenation is done at compile time.
As others said, yes, it is valid. I only wanted to add that it is really useful for entering long strings that span several lines. You don't have to mess with \ to indicate that the string continues, and no unwanted newline ends up in the string, so you just write:
"very long string "
"that continues over here"
(Watch out for the space at the end of each piece; forgetting it is a common mistake. Without it, "string" and "that" would be joined.)
