Why can a comma be omitted in a printf() call? - c

I am used to C# or Java.
How could the following statement be correct in C?
printf("aaa" "bbb");
On my Xubuntu 15.04 with GCC 4.9, it outputs:
aaabbb
And when I tried it, the following works too!
char *p = "aaa""bbb""ccc";
printf(p);
It outputs:
aaabbbccc
I thought there should be a comma, but then the first string would be treated as the format string. So, is this syntax legal?

Yes, it is legal syntax because of translation phase 6 in ISO C99, §5.1.1.2 Translation phases:
Adjacent string literal tokens are concatenated.

As mentioned, adjacent string literals are concatenated by the compiler. If you want to see a difference, you can add a \0 null terminator inside your strings.
With "aaa\0" as the first literal, the output will be just aaa, because printf prints only until it finds the first \0 null terminator.
#include<stdio.h>
int main()
{
printf("aaa\0" "bbb");
}
Output
aaa

The two strings are just concatenated by the compiler.

When the compiler sees two consecutive string literals, it concatenates them (at parsing time in the compiler), as you observed. This won't work (it is a syntax error) for non-literals.
The comma operator is unrelated to concatenation. It evaluates first the left operand, then the right one, and discards the result of the left, giving the right result. It is useful for side effects (like progn in Lisp, ; in Ocaml, begin in Scheme). Of course, the comma is also used to separate arguments in calls.

As @Jens said, adjacent string literals are concatenated by the compiler.
One reason for this is so that you can do preprocessor things like this:
#include <stdio.h>
#if defined(__linux__)
#define MY_OS "linux"
#elif defined(_WIN32)
#define MY_OS "windows"
#else
#define MY_OS "probably something BSD-derived"
#endif
int main(void){
printf("my os is " MY_OS "\n");
}
Which saves everybody a lot of time.

Related

Macro expands correctly, but gives me "expected expression" error

I've made a trivial reduction of my issue:
#define STR_BEG "
#define STR_END "
int main()
{
char * s = STR_BEG abc STR_END;
printf("%s\n", s);
}
When compiling this, I get the following error:
static2.c:12:16: error: expected expression
char * s = STR_BEG abc STR_END;
^
static2.c:7:17: note: expanded from macro 'STR_BEG'
#define STR_BEG "
Now, if I just run the preprocessor, gcc -E myfile.c, I get:
int main()
{
char * s = " abc ";
printf("%s\n", s);
}
Which is exactly what I wanted, and perfectly legal resultant code. So what's the deal?
The macro isn't really expanding "correctly", because this isn't a valid C Preprocessor program. As Kerrek says, the preprocessor doesn't quite work on arbitrary character sequences - it works on whole tokens. Tokens are punctuation characters, identifiers, numbers, strings, etc. of the same form (more or less) as the ones that form valid C code. Those defines do not describe valid strings - they open them, and fail to close them before the end of the line. So an invalid token sequence is being passed to the preprocessor. The fact it manages to produce output from an invalid program is arguably handy, but it doesn't make it correct and it almost certainly guarantees garbage output from the preprocessor at best. You need to terminate your strings for them to form whole tokens - right now they form garbage input.
To actually wrap a token, or token sequence, in quotes, use the stringification operator #:
#define STRFY(A) #A
STRFY(abc) // -> "abc"
GCC and similar compilers will warn you about mistakes like this if you compile or preprocess with the -Wall flag enabled.
(I assume you only get errors when you try to compile as C, but not when you do it in two passes, because internally to the compiler, it retains the information that these are "broken" tokens, which is lost if you write out an intermediate file and then compile the preprocessed source in a second pass... if so, this is an implementation detail, don't rely on it.)
One possible solution to your actual problem might look like this:
#define LPR (
#define start STRFY LPR
#define end )
#define STRFY(A) #A
#define ID(...) __VA_ARGS__
ID(
char * s = start()()()end; // -> char * s = "()()()";
)
The ID wrapper is necessary, though. There's no way to do it without that (it can go around any number of lines, or even your whole program, but it must exist for reasons that are well-covered in other questions).

Different behavior in visual c++ versus gcc/clang while stringifying parameter which contains comma

I'm using the stringizing operator to convert a macro parameter that may contain a comma into a string. As far as I know, some characters cannot be stringified directly, notably the comma (,), because it delimits parameters, and the right parenthesis ()), because it marks the end of the parameter list. So I use a variadic macro to pass commas to the stringizing operator like this:
#include <stdio.h>
#define TEST 10, 20
#define MAKE_STRING(...) #__VA_ARGS__
#define STRING(x) MAKE_STRING(x)
int main()
{
printf("%s\n", STRING(TEST) );
return 0;
}
It works fine. But I wondered what would happen without the variadic macro, so I changed the macro to #define MAKE_STRING(x) #x. Unexpectedly, it compiles fine in Visual C++ 2008/2010 and outputs 10, 20, while gcc/clang give a compilation error, as expected:
macro "MAKE_STRING" passed 2 arguments, but takes just 1
So my question: is Visual C++ doing additional work, or is the behavior undefined?
VS in general allows extra parameters in macros and then just drops them silently:
STRING(10, 20, 30) - still works and prints 10. This is not the case here, but it pretty much means VS doesn't even have the error gcc threw at you.
It's not any additional work but "merely" a difference in substitution order.
I am not sure if this will answer your question, but I hope it helps you solve your problem. When defining a string constant in C, you should enclose it in double quotes (so that spaces are kept). Also, the # operator wraps its argument in double quotes, so, for example, #a becomes "a".
#include <stdio.h>
#define TEST "hello, world"
#define MAKE_STRING(x) #x
int main()
{
int a;
printf("%s\n", TEST);
printf("%s\n", MAKE_STRING(a));
return 0;
}
I compiled this code using gcc 4.7.1 and the output is:
hello, world
a
I dunno why this has upvotes, or an answer got downvoted (so the poster deleted it) but I don't know what you expect!
#__VA_ARGS__ makes no sense, suppose I have MACRO(a,b,c) do you want "a,b,c" as the string?
http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html#Variadic-Macros
Read, that became standard behaviour, variable length arguments in macros allow what they do in variable length arguments to functions. The pre-processor operates on text!
The only related special case is the GNU extension where ## is placed between a comma and __VA_ARGS__: it deletes the comma before the ## if there are no extra arguments (thus preventing a syntax error).
NOTE:
It is really important you read the MACRO(a,b,c) part and what do you expect, a string "a,b,c"? or "a, b, c" if you want the string "a, b, c" WRITE THE STRING "a, b, c"
Using the # operator is great for stuff like
#define REGISTER_THING(THING) core_of_program.register_thing(THING); printf("%s registered\n",#THING);

Isn't there a syntax error? Should printf("one" ", two and " "%s.\n", "three" ); be valid code?

Take a look at this code:
#include <stdio.h>
#define _ONE "one"
#define _TWO_AND ", two and "
int main()
{
const char THREE[6] = "three" ;
printf(_ONE _TWO_AND "%s.\n", THREE );
return 0;
}
The printf is effectively:
printf("one" ", two and " "%s.\n", "three" );
and the output is:
one, two and three.
gcc gives neither error nor warning messages after compiling this code.
Is the gcc compiler supposed to work that way, or is it a bug?
This is standard behavior: adjacent string literals are concatenated. The C99 draft standard, section 5.1.1.2 Translation phases, phase 6, says:
Adjacent string literal tokens are concatenated
gcc does have many non-standard extensions, but if you build with -pedantic, gcc should warn you when it is doing something non-standard. You can read more in the documentation section Extensions to the C Language Family.
The rationale is covered in the Rationale for International Standard—Programming Languages—C and it says in section 6.4.5 String literals:
A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.
You did not get any error because there is nothing wrong.
Two adjacent string literals such as "A""B" are concatenated. This is part of the C language.
Try gcc -E to display the preprocessed source code. You will have something like this:
int main()
{
const char THREE[6] = "three";
printf("one" ", two and " "%s.\n", THREE );
return 0;
}
Then, follow the correct answer from @shafik-yaghmour

Code blocks between #if 0 and #endif must have paired double quotes?

int main(void)
{
#if 0
something"
#endif
return 0;
}
The simple program above generates a warning in gcc: missing terminating " character. This seems odd, because it means the compiler allows the code block between #if 0 and #endif to contain an invalid statement like something, but not unpaired double quotes. The same happens with #ifdef and #ifndef.
Real comments are fine here:
int main(void)
{
/*
something"
*/
return 0;
}
Why? The single quote ' behaves similarly. Are there any other tokens that are treated specially?
See the comp.lang.c FAQ, 11.19:
Under ANSI C, the text inside a "turned off" #if, #ifdef, or #ifndef must still consist of "valid preprocessing tokens." This means that the characters " and ' must each be paired just as in real C code, and the pairs mustn't cross line boundaries.
Compilation goes through several passes before generating an executable binary.
You are not in the compiler yet. Your preprocessor is flagging this error. It does not check C language syntax, but missing quotes and things like that are preprocessor errors.
After this preprocessor pass, your code goes to the C compiler, which will detect the error you are expecting...
The preprocessor works at the token level, and a string literal is considered a single token. The preprocessor is warning you that you have an invalid token.
According to the C99 standard, a preprocessing token is one of these things:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the above
The standard also says:
If a ' or a " character matches the last category, the behavior is undefined.
Things like "statement" above are invalid to the C compiler, but it is a valid token, and the preprocessor eliminates this token before it gets to the compiler.
Besides Kevin's answer, the GCC documentation on Incompatibilities of GCC says:
GCC complains about unterminated character constants inside of preprocessing conditionals that fail. Some programs have English comments enclosed in conditionals that are guaranteed to fail; if these comments contain apostrophes, GCC will probably report an error. For example, this code would produce an error:
#if 0
You can't expect this to work.
#endif
The best solution to such a problem is to put the text into an actual C comment delimited by /*...*/.

how to define macro to convert concatenated char string to wchar_t string in C

Like _T() macro in Visual Studio, I defined and used my own _L macro as:
#define _L(x) __L(x)
#define __L(x) L ## x
It works for:
wprintf(_L("abc\n"));
But gets a compilation error for:
wprintf(_L("abc\n""def\n"));
It reports “string literals with different character kinds cannot be concatenated”.
I also did:
#define _L2(x) __L2(x)
#define __L2(x) L ## #x
wprintf(_L2("abc\n""def\n"));
The code gets compiled, but the \n doesn't work as an escape sequence of a new line. The \n becomes two characters, and the output of wprintf is:
"abc\n""def\n"
How can I write a macro that converts two concatenated char strings to wchar_t? The problem is that I have some existing macros:
#define BAR "The fist line\n" \
"The second line\n"
I want to convert them to a wchar_t string during compilation. The compiler is ICC on Windows.
Edit:
The _L macro works in gcc; MSVC and ICC do not accept it, which is a compiler bug.
Thanks to @R.. for the comment.
I don't think you can do this.
Macros just do simple text processing. They can add L in the beginning, but can't tell that "abc\n" "def\n" are two strings, which need adding L twice.
You can make progress if you pass the strings to the macro separated by commas, rather than concatenated:
L2("abc\n", "def\n")
It's trivial to define an L2 that accepts exactly two strings:
#define L2(a,b) _L(a) _L(b)
Generalizing it to have a single macro that gets any number of parameters and adds Ls in the correct place is possible, but more complicated. You can find clever macros doing such things in Jens Gustedt's P99 macros.
