I was working with macros and wrote one like this:
#define STR(name) #name
I meant STR() to stringise whatever was given to it as an argument, and it seemed to be working.
printf( STR(Hello) );
gave the output as expected:
Hello
So did
printf( STR(Hello world) );
printf( STR(String) STR(ise) );
which gave
Hello world
Stringise
But when I tried to use STR() to print only a space, it just didn’t work.
printf( STR(Hello) STR( ) STR(World) ); //There's a space between the parentheses of the second STR
Gave the output:
HelloWorld
Here the STR( ) is ignored.
Why is this? Is there a way around it while still sticking to macros, with only a space as the argument?
I was just wondering if this was possible.
It is not possible for stringification to result in a single space. The semantics of the # operator are detailed in C11 6.10.3.2p2:
If, in the replacement list, a parameter is immediately preceded by a # preprocessing token, both are replaced by a single character string literal preprocessing token that contains the spelling of the preprocessing token sequence for the corresponding argument. Each occurrence of white space between the argument's preprocessing tokens becomes a single space character in the character string literal. White space before the first preprocessing token and after the last preprocessing token composing the argument is deleted. [...] The character string literal corresponding to an empty argument is "". [...].
Thus, as the space is not a preprocessing token, and leading and trailing space is deleted, it is impossible for the stringification operator to create a resulting string literal that just contains a single space. As you've noticed, STR( ) would pass an empty argument to the macro, and this would be stringified into ""; likewise
STR( Hello
World
)
would be expanded into "Hello World"; i.e. each occurrence of white space would become a single space character, and the preceding and trailing whitespace would be deleted.
However, while it is not possible to stringify a single space, it is possible to achieve the required output. Adjacent string literal tokens are concatenated into one after preprocessing (translation phase 6), so "Hello" " " "World" would be converted to "Hello World"; therefore
printf(STR(Hello) " " STR(World));
would, after macro expansion, become
printf("Hello" " " "World");
and thereafter to
printf("Hello World");
(I am very new to C.)
Visual newlines seem to be unimportant in C.
For instance:
int i; int j;
is the same as
int i;
int j;
and
int k = 0 ;
is the same as
int
k
=
0
;
so why is
"hello
hello"
not the same as
"hello hello"
It is because a line that contains a starting quote character and not an ending quote character was more likely a typing mistake or other error than an attempt to write a string across multiple lines, so the decision was made that string literals would not span source lines, unless deliberately indicated with \ at the end of a line.
Further, when such an error occurs, the compiler would be faced with reading possibly thousands of lines of code before determining there was no closing quote character (end of file was reached) or finding what was intended as an opening quote character for some other string literal and then attempting to parse the contents of that string literal as C code. In addition to burdening early compilers with limited compute resources, this could result in confusing error messages in a part of the source code far removed from the missing quote character.
This choice is effected in C 2018 6.4.5 1, which says that a string literal is " s-char-sequence(opt) ", where an s-char-sequence is a sequence of s-chars and each s-char is either an escape sequence or any member of the source character set except the double-quote character, a backslash, or a new-line character (and a string literal may also have an encoding prefix, a u8, u, U, or L before the first ").
Strings can be continued over newlines by putting a backslash immediately before the newline:
"hello \
hello"
Or (better), using string concatenation:
"hello "
"hello"
Note that the space has been carefully preserved so that these are equivalent to "hello hello", except for the line numbering of whatever follows them in the file.
The backslash-newline line elimination is done very early in the translation process — in phase 2 of the conceptual translation phases.
Note that there is no stripping of leading blanks or anything similar. If you write:
printf("Some long string with maybe an integer %d in it\
        and some more data on the next line\n", i);
then the string has a sequence of (at least) 8 blanks in it between "in it" and "and some". The count of 8 assumes that the printf() statement is aligned at the left margin; if it is indented, you'd need to add the extra white space corresponding to the indentation.
1- Using double quotes for each string:
char *str = "hello "
"hello" ;
** One drawback of this approach is that every line needs its own pair of quotation marks. (A quotation mark " inside the string has to be escaped as \" with either approach.)
2- Using a backslash (\):
char *str = "hello \
hello" ;
** This form is a lot easier to write, since we don't need quotation marks on each line.
We can think of a C program as a series of tokens: groups of characters that can't be split up without changing their meaning. Identifiers and keywords are tokens. So are operators like + and -, punctuation marks such as the comma and semicolon, and string literals.
For example, the line
int i; int j;
consists of 6 tokens: int, i, ;, int, j and ;. Most of the time, and particularly in this case, the amount of white space (space, tab and newline characters) is not critical. That's why the compiler will treat
int i
;int
j;
the same.
Writing
"Hello
Hello"
is like writing
un signed
and hoping that the compiler treats it as
unsigned
Just as white space is not allowed inside a keyword, a newline character is not allowed inside a string literal token. A newline can still be included in the string by using the escape sequence \n when needed.
To write a string across lines, use string literal concatenation:
"Hello "
"Hello"
Although the above method is recommended, you can also use a backslash:
"Hello \
Hello"
With the backslash method, beware of leading white space on the continuation line. The string will include everything on that line up to the closing quote or the next backslash-newline.
As far as I know, the C preprocessor substitutes the replacement text of a #define literally, exactly as written. But now I am seeing that it appears to add spaces before and after the replacement.
Is my explanation correct, or am I doing something that has undefined behavior?
Consider the following C code:
#include <stdio.h>
#define k +-6+-
#define kk xx+k-x
int main()
{
int x = 1029, xx = 4,t;
printf("x=%d,xx=%d\n",x,xx);
t=(35*kk*2)*4;
printf("t=%d,x=%d,xx=%d\n",t,x,xx);
return 0;
}
The initial values are: x = 1029, xx = 4. Let's calculate the value of t now.
t = (35*kk*2)*4;
t = (35*xx+k-x*2)*4; // replacing the literal kk
t = (35*xx++-6+--x*2)*4; // replacing the literal k
Now xx is 4 (the post-increment would only take effect after this statement), and x is decremented by one to become 1028. So the current statement would evaluate as:
t = (35*4-6+1028*2)*4;
t = (140-6+2056)*4;
t = 2190*4;
t = 8760;
But the output of the above code is:
x=1029,xx=4
t=8768,x=1029,xx=4
From the second line of the output, it is clear that the increment and decrement have not taken place.
That means after replacing k and kk, it is becoming:
t = (35*xx+ +-6+- -x*2)*4;
(If it is, then the calculation is clear.)
My question: is this the behavior the C standard requires, or is it just undefined behavior? Or am I doing something wrong?
The C standard specifies that the source file is analyzed and parsed into preprocessing tokens. When macro replacement occurs, a macro name is replaced with the tokens of its replacement list. The replacement is not literal text replacement.
C 2018 5.1.1.2 specifies translation phases (rephrasing and summarizing, not exact quotes):
1. Physical source file multibyte characters are mapped to the source character set. Trigraph sequences are replaced by single-character representations.
2. Lines continued with backslashes are merged.
3. The source file is converted from characters into preprocessing tokens and white-space characters: each sequence of characters that can be a preprocessing token is converted to a preprocessing token, and each comment becomes one space.
4. Preprocessing is performed (directives are executed and macros are expanded).
5. Source characters in character constants and string literals are converted to members of the execution character set.
6. Adjacent string literals are concatenated.
7. White-space characters are discarded. “Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.” (That quoted text is the main part of C compilation as we think of it!)
8. The program is linked to become an executable file.
So, in phase 3, the compiler recognizes that #define kk xx+k-x consists of the tokens #, define, kk, xx, +, k, -, and x. The compiler also knows there is white space between define and kk and between kk and xx, but this white space is not itself a preprocessor token.
In phase 4, when the compiler replaces kk in the source, it is doing so with these tokens. kk gets replaced by the tokens xx, +, k, -, and x, and k is replaced by the tokens +, -, 6, +, and -. Combined, those form xx, +, +, -, 6, +, -, -, and x.
The tokens remain that way. They are not reanalyzed to put + and + together to form ++.
As @EricPostpischil says in a comprehensive answer, the C pre-processor works on tokens, not character strings, and once the input is tokenised, whitespace is no longer needed to separate adjacent tokens.
If you ask a C preprocessor to print out the processed program text, it will probably add whitespace characters where needed to separate the tokens. But that's just for your convenience; the whitespace might or might not be present, and it makes almost no difference because it has no semantic value and will be discarded before the token sequence is handed over to the compiler.
But there is a brief moment during preprocessing when you can see some whitespace, or at least an indication as to whether there was whitespace inside a token sequence, if you can pass the token sequence as an argument to a function-like macro.
Most of the time, the preprocessor does not modify tokens. The tokens it receives are what it outputs, although not necessarily in the same order and not necessarily all of them. But there are two exceptions, involving the two preprocessor operators # (stringify) and ## (token concatenation). The first of these transforms a macro argument -- a possibly empty sequence of tokens -- into a string literal, and when it does so it needs to consider the presence or absence of whitespace in the token sequence.
(The token concatenation operator combines two tokens into a single token if possible; when it does so, intervening whitespace is ignored. That operator is not relevant here.)
The C standard actually specifies precisely how whitespace in a macro argument is handled if the argument is stringified, in paragraph 2 of §6.10.3.2:
Each occurrence of white space between the argument’s preprocessing tokens becomes a single space character in the character string literal. White space before the first preprocessing token and after the last preprocessing token composing the argument is deleted.
We can see this effect in action:
#include <stdio.h>

/* I is just used to eliminate whitespace between two macro invocations.
 * The indirection of `STRING/STRING_` is explained in many SO answers;
 * it's necessary in order that the stringify operator apply to the expanded
 * macro argument, rather than the literal argument.
 */
#define I(x) x
#define STRING_(x) #x
#define STRING(x) STRING_(x)
#define PLUS +
int main(void) {
    printf("%s\n", STRING(I(PLUS)I(PLUS)));
    printf("%s\n", STRING(I(PLUS) I(PLUS)));
}
The output of this program is:
++
+ +
showing that the whitespace in the second invocation was preserved.
Contrast the above with gcc's -E output for ordinary use of the macro:
int main(void) {
(void) I(PLUS)I(PLUS)3;
(void) I(PLUS) I(PLUS)3;
}
The macro expansion is
int main(void) {
(void) + +3;
(void) + +3;
}
showing that the preprocessor was forced to insert a cosmetic space into the first expansion, in order to preserve the semantics of the macro expansion. (Again, I emphasize that the -E output is not what the preprocessor module passes to the compiler, in normal GCC operation. Internally, it passes a token sequence. All of the whitespace in the -E output above is a courtesy which makes the generated file more useful.)
Given the following code that prints a string which is a stringification of two words:
#define PORT_INFO_MAC_2(portNum) port: portNum
#define PORT_INFO_MAC(portNum) PORT_INFO_MAC_2(portNum)
/* Stringify macro expansion instead of the macro itself.
 * The extra level (INVOKE_MACRO_2, an assumed helper name) is what lets the
 * argument be expanded before # is applied to it. */
#define INVOKE_MACRO_2(...) #__VA_ARGS__
#define INVOKE_MACRO(...) INVOKE_MACRO_2(__VA_ARGS__)
printf(" %s " , INVOKE_MACRO(PORT_INFO_MAC(1)) ); /* In a more general way, I'll be using it as follows: INVOKE_MACRO(PORT_INFO_MAC(2), PORT_INFO_MAC(1), ...) */
The output is always " port: 1 " with a single space between the "port" and the "1". Why is there always a single space there and is there a way to control the amount of spaces?
Changing the number of spaces between port: and portNum in the PORT_INFO_MAC_2 macro doesn't change the number of spaces in the output.
EDIT
It seems that there are two cases. In the first case, port: and portNum are directly adjacent (PORT_INFO_MAC_2(portNum) port:portNum), and then no space exists between them in the output. In the second case, where any number of spaces separates them in the macro, the number of spaces in the output is always 1.
Is there any formal explanation for that? Is there any control over that?
Why is there always a single space there and is there a way to control the amount of spaces?
Because that's what the stringification operator is specified to do:
If, in the replacement list, a parameter is immediately preceded by a # preprocessing token, both are replaced by a single character string literal preprocessing token that contains the spelling of the preprocessing token sequence for the corresponding argument. Each occurrence of white space between the argument’s preprocessing tokens becomes a single space character in the character string literal.
(C2011 6.10.3.2/2; emphasis added)
Of course, if there is no whitespace at all between the preprocessing tokens, then none appears in the stringification.
To recap, phases 5-7 are described in the standard:
5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
Now I agree that white-space characters are no longer significant at phase 7, but couldn't one get rid of them already after phase 4? Is there an example where this would make a difference?
Of course it should be realized that removing white-space characters separating tokens doesn't work at this stage as the data after phase 4 consists of preprocessing tokens. The idea is to get rid of spaces separating preprocessing tokens at an earlier stage.
Consider this source code
char* str = "some text" " with spaces";
By phase 5 this consists of these preprocessing tokens (one token per line):
char
*
str
=
"some text"
" with spaces"
Here the spaces inside "some text" and " with spaces" matter.
Afterwards, all spaces between tokens (see above) are ignored.
If you removed whitespace before phase 5, you would get different string literals, such as "sometext".
I can't figure this out: I want my C program to store the entire plaintext of a batch program, write it to a file, and then run it.
I finished my program, but holding the contents is my problem. How do I put the code in a string and make it ignore ALL special characters like %s, \, etc.?
You have to escape special characters with a \, you can escape backslash itself with another backslash (i.e. \\).
As Ian previously mentioned, you can escape characters that aren't allowed in normal C strings with \; for instance, newline becomes \n, double-quote becomes \", and backslash becomes \\.
If you're unable or unwilling to do this for whatever reason, then you may be out of luck if your solution must be in C. However, if you're willing to switch to C++, then you can use raw strings:
const char* s1 = R"foo(
Hello
World
)foo";
This is equivalent to
const char* s2 = "\nHello\nWorld\n";
A raw string must begin with R" followed by an arbitrary delimiter (made of any source character but parentheses, backslash and spaces; can be empty; and at most 16 characters long), then (, and must end with ) followed by the delimiter and ". The delimiter must be chosen such that the termination substring (), delimiter, ") does not appear within the string.