Error in macro expansion - c

I have been trying to understand macro expansion and found that the second printf produces a compile error. I expected the second print statement to generate the same output as the first one. I know there are functions for string concatenation; what I find difficult to understand is why the first print statement works and the second doesn't.
#include <stdio.h>
#define CAT(str1, str2) str1 str2
int main(void)
{
char *string_1 = "s1", *string_2 = "s2";
printf(CAT("s1", "s2"));
printf(CAT(string_1, string_2)); // this line fails to compile
}

Concatenating adjacent string literals, like "s1" "s2", is part of the language specification. Just placing two variables next to each other, like string_1 string_2, is not valid C.
If you want to concatenate two string variables, consider using strcat instead, but remember to allocate enough space for the destination string.

Try to do the preprocessing "by hand":
CAT takes two arguments and expands to them placed one after the other, separated by a space. So if we preprocess your code, it becomes:
int main(void)
{
char *string_1 = "s1", *string_2 = "s2";
printf("s1" "s2");
printf(string_1 string_2);
}
While "s1" "s2" is automatically concatenated to "s1s2" by the compiler, string_1 string_2 is invalid syntax.

Related

string splicing in c language, a strange phenomenon

Recently, I saw the following C code:
printf("%s\n", "1234" "qwer");
// output: 1234qwer
snprintf(buffer, sizeof(buffer), "bvcx" "mju");
// buffer data: bvcxmju
To be honest, this amazed me. Until then, I didn't know that strings could be pasted together in the "1234" "qwer" format. Why does this work?
Then I tried 'char a[] = "1234" "qwer"', and gcc returned an error!
So can someone explain this phenomenon and the theory behind it?
What you saw has been part of the C language syntax for a long time. A string literal can be split into multiple parts separated only by white space (after preprocessing and comment removal). This syntax enables, for example:
writing a long string literal on multiple lines:
char message[] = "This is a long message that can be split on "
"multiple lines for readability";
combining string fragments defined as macros:
printf("The value of i32 is %" PRId32 "\n", i32); // PRId32 comes from <inttypes.h>
separating string contents that have a different meaning if juxtaposed:
char s1[] = "This is ESC 4: \x1B" "4";
char s2[] = "so is this: \0334 and this: \33""4";
char s3[] = "but not this: \334";
char s4[] = "nor this: \x1B4";
combining stringified macro arguments
Adjacent string literals are always concatenated into a single one as part of the translation phases. See C17 6.4.5/5:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence.
Formally, translation phase 6 happens after macro expansion (phase 4) but before preprocessing tokens are converted into tokens (phase 7). This means, for example, that
sizeof "hello " "world" yields 12, equivalent to:
sizeof "hello world"
In practice, this is convenient when writing "stringification" macros, for example:
#include <stdio.h>
#define STRINGIFY(x) #x
#define STRINGIFY_CONCAT(a,b) STRINGIFY(a) " " STRINGIFY(b)
int main (void)
{
puts(STRINGIFY_CONCAT(hello,world)); // expands to "hello" " " "world" and prints hello world
}
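It's also useful whenever you use hex escape sequences and need to terminate them, since a hex escape in C has no length limit and consumes every hex digit that follows it: the compiler reads puts("\xABBA") as a single escape with the out-of-range value 0xABBA, whereas puts("\xAB" "BA") prints the byte 0xAB followed by "BA".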

strcat() add extra characters at the beginning

I would like to concatenate 2 strings in C using strcat(), but it always adds some random characters to the beginning of the new concatenated string.
Could someone please tell me why this is happening and how to solve it?
This is my code:
#include <stdio.h>
#include <string.h>
int main(void)
{
char text[100];
strcat(text, "Line 1");
strcat(text, "Line 2");
printf("%s", text);
return 0;
}
When I execute this I get the following output:
??? Line1Line2
I would appreciate any help.
Thank you.
The line
strcat(text, "Line 1");
has undefined behavior, because both arguments of the strcat function call must point to strings, i.e. to null-terminated sequences of characters. However, the first argument is not guaranteed to point to a string: text is not initialized, so its contents are indeterminate.
You have two options to fix this bug. You can either
before the first call to strcat, set the first character of text to a terminating null character, i.e. text[0] = '\0';, so that text contains a valid (empty) string, or
replace the first function call to strcat with strcpy, because strcpy does not require that the first function argument points to a valid string.

Beginning C: Behavior Printing Strings

I'm learning C and am currently experimenting with storing strings in variables. I put together the following to try different stuff.
#include <stdio.h>
int main() {
char *name = "Tristan";
char today[] = "January 1st, 2016";
char newyear[] = {'H','a','p','p','y',' ','N','e','w',' ','Y','e','a','r','!','\n'};
printf("Hello world!\n");
printf("My name is %s.\n", name);
printf("Today is: %s.\n", today);
printf(newyear);
return 0;
}
After compiling this code and running it, I get the following results:
Hello world!
My name is Tristan.
Today is: January 1st, 2016.
Happy New Year!
January 1st, 2016
Now this is pretty much what I would expect, but why would "January 1st, 2016" get printed out again at the end of the program's output?
If I take the "\n" out of the "newyear" array, it will not do this.
Would someone please explain why this is?
newyear lacks a trailing null byte, so passing it to printf is undefined behavior.
Only string literals get an implicit terminating null byte. Because you explicitly initialize every character in a brace-enclosed list, no null byte is appended.
Undefined behavior means the standard places no requirements on what happens here. That includes nothing happening, you bursting into tears, or, yes, printing some string twice.
Just add an additional character, i.e., a null byte to the array to resolve the problem:
char newyear[] = {'H','a','p','p','y',' ','N','e','w',' ','Y','e','a','r','!','\n', '\0'};
Note that no sane person initializes an automatic char array with a string like that. Just stick to string literals! (I think you did it just for learning purposes, though.)
Remember that strings in C are terminated by the special '\0' character.
Not having this terminator at the end of data that is treated as a string will lead to undefined behavior as the string functions pass the end of the data searching for the terminator.
This is because you are defining newyear directly as a char array rather than with the string literal "" syntax, so the compiler does not add the trailing \0 character that marks the end of a string.
Since both newyear and today reside on the stack, in this case they happen to occupy contiguous storage, so printf keeps going past the \n of newyear and prints the contents of memory until it finds a \0.
newyear should finish with a '\0' instead of the newline, to be a C string. You can then put the newline in the printf statement, like the others:
char newyear[] = {'H','a','p','p','y',' ','N','e','w',' ','Y','e','a','r','!','\0'};
//...
printf("%s\n", newyear);
Or, you can add the string terminator to the array, and use the printf as you did:
char newyear[] = {'H','a','p','p','y',' ','N','e','w',' ','Y','e','a','r','!','\n','\0'};
//...
printf(newyear);
In your first two examples, a string defined as "my string" automatically has the '\0' appended by the compiler.

How does this quine work?

I just came across this quine question, but no one really went into how it works: C/C++ program that prints its own source code as its output
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";main(){printf(s,34,s,34);}
What I especially don't understand is the following has the same output even though I changed the ints:
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";main(){printf(s,5,s,11);}
It still prints the 34s! Can someone walk me through this step by step?
Let's start by formatting the code to span multiple lines. This breaks the fact that it's a quine, but makes it easier to see what's happening:
char* s = "char*s=%c%s%c;main(){printf(s,34,s,34);}";
main() {
printf(s, 34, s, 34);
}
Essentially, this is a declaration of a string s that's a printf format string, followed by a definition of a function main that calls printf with four arguments. (This definition of main relies on the old "implicit int" rule, under which a function declared without a return type was assumed to return int. That rule was removed in C99, so modern compilers warn about it or reject it, and it is not legal C++ either. The program also calls printf without including <stdio.h>, relying on implicit function declarations, which C99 removed as well.)
So what exactly is this printf call doing? Well, it might help to note that 34 is the ASCII code for a double-quote, so the line
printf(s, 34, s, 34);
is essentially
printf(s, '"', s, '"');
This means "print the string s with arguments ", s, and "." So what's s? It's shown here:
char* s = "char*s=%c%s%c;main(){printf(s,34,s,34);}";
This follows a common self-reference trick. Ignoring the %c%s%c part, this is basically a string representation of the rest of the program. The %c%s%c part occurs at the point where it becomes self-referential.
So what happens if you call printf(s, '"', s, '"')? This fills in the placeholder %c%s%c with "char*s=%c%s%c;main(){printf(s,34,s,34);}", that is, the contents of s wrapped in quotes. Combined with the rest of the string s, this therefore prints
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";main(){printf(s,34,s,34);}
which is the source code of the program. I think this is kinda neat - the closest English translation of a general Quine program I know is "print this string, the second time in quotes" (try it - see what happens!), and this basically does exactly that.
You were asking why changing the numbers to 5 and 11 didn't stop 34 from being printed. That's expected! The string literal s has 34 hard-coded into it, so changing the 5 and 11 in the call to printf won't change that. What it will do is print the non-printing control characters with codes 5 and 11 instead of the quotation marks around the inner copy of s.

concatenating/printing a string literal

I'm running into an odd problem with concatenating or printing strings. I have a char * that can be set to one of several string literals.
char *myStrLiteral = NULL;
...
if(blah)
myStrLiteral = "foo";
else if(blahblah)
myStrLiteral = "bar";
And I have some other strings that I get from library functions or that are concatenations of input; they're either malloc'ed or stack variables. When I print them (or concatenate them using strcpy() and strcat(); the result is the same), the string literal overwrites the initial characters of the entire string I'm constructing or printing, even though it comes last.
/* otherString1 contains "hello", otherString2 contains "world" */
printf("%s %s %s\n", otherString1, otherString2, myStrLiteral);
/* prints "barlo world" */
Am I misunderstanding something about string literals in C?
Check that the strings you're receiving contain only the bytes you expect:
void PrintBytes(const char *s)
{
while (*s) {printf("%d ", (unsigned char)*s); s++;}
putchar('\n'); // end each dump on its own line
}
PrintBytes(otherString1);
PrintBytes(otherString2);
PrintBytes(myStrLiteral);
My suspicion is that one of them contains an embedded control character.
If you don't care about finding out which control character is involved, you could simply print the length of each string. If it's longer than it ought to be, there's a control character in there somewhere:
printf("%zu\n%s\n", strlen(otherString1), otherString1);
The only thing I can think of is that otherString2 contains a carriage return, but not a line feed.
To find out:
You can call strlen on otherString2 and check whether the length matches what you see.
You can look at otherString2 with a debugger and see if 0x0D is before the 0x00 terminating the string.
