Why does C not recognize strings across multiple lines? - c

(I am very new to C.)
Visual newlines seem to be unimportant in C.
For instance:
int i; int j;
is the same as
int i;
int j;
and
int k = 0 ;
is the same as
int
k
=
0
;
so why is
"hello
hello"
not the same as
"hello hello"

It is because a line that contains a starting quote character and not an ending quote character was more likely a typing mistake or other error than an attempt to write a string across multiple lines, so the decision was made that string literals would not span source lines, unless deliberately indicated with \ at the end of a line.
Further, when such an error occurs, the compiler would be faced with reading possibly thousands of lines of code before determining there was no closing quote character (end of file was reached) or finding what was intended as an opening quote character for some other string literal and then attempting to parse the contents of that string literal as C code. In addition to burdening early compilers with limited compute resources, this could result in confusing error messages in a part of the source code far removed from the missing quote character.
This choice is effected in C 2018 6.4.5 1, which says that a string literal is " s-char-sequence(opt) ", where an s-char-sequence is a sequence of s-chars and an s-char is either an escape sequence or any member of the source character set except the double-quote character, a backslash, or a new-line character (a string literal may also have an encoding prefix, one of u8, u, U, or L, before the first ").

Strings can be continued over newlines by putting a backslash immediately before the newline:
"hello \
hello"
Or (better), using string concatenation:
"hello "
"hello"
Note that the space has been carefully preserved so that both forms are equivalent to "hello hello", except for the line numbering in the rest of the file after the literal.
The backslash-newline line elimination is done very early in the translation process — in phase 2 of the conceptual translation phases.
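As a minimal sketch of the two forms above (the names joined, spliced and check_equivalence are made up for this illustration), the following compares both spellings against the single-line literal:

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are concatenated by the compiler
   (translation phase 6). */
static const char joined[] = "hello "
                             "hello";

/* Backslash-newline is deleted in translation phase 2, so the literal
   continues with the first character of the next source line. */
static const char spliced[] = "hello \
hello";

/* Hypothetical helper: aborts via assert if either form differs
   from the single-line literal "hello hello". */
void check_equivalence(void)
{
    assert(strcmp(joined, "hello hello") == 0);
    assert(strcmp(spliced, "hello hello") == 0);
}
```

Calling check_equivalence() from main() should return silently. Note that the continuation line of spliced has to start in column 1, otherwise its indentation ends up in the string.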
Note that there is no stripping leading blanks or anything. If you write:
printf("Some long string with maybe an integer %d in it\
        and some more data on the next line\n", i);
Then the string has a sequence of (at least) 8 blanks in it between "in it" and "and some". The count of 8 assumes that the printf() statement is aligned in the left margin; if it is indented, you'd need to add the extra white space corresponding to the indentation.
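To make the effect of the indentation visible, here is a small sketch with a shorter literal whose continuation line is indented by exactly four spaces. The names indented and longest_space_run are hypothetical and only serve this example:

```c
#include <assert.h>
#include <string.h>

/* The continuation line below is indented by exactly four spaces;
   those four spaces become part of the string. */
static const char indented[] = "in it\
    and some";

/* Hypothetical helper: length of the longest run of ' ' characters in s. */
size_t longest_space_run(const char *s)
{
    size_t best = 0;
    size_t run = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        run = (*s == ' ') ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}
```

Here longest_space_run(indented) should report 4, one blank per column of indentation on the continuation line.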

1- Using double quotes for each string:
char *str = "hello "
"hello" ;
** Note that special characters such as the quotation mark " itself still have to be escaped (as \") inside each literal; this is true of any string literal.
2- Using a backslash (\) at the end of the line:
char *str = "hello \
hello" ;
** This form is a lot easier to write; we don't need to write quotation marks for each line.

We can think of a C program as a series of tokens: groups of characters that can't be split up without changing their meaning. Identifiers and keywords are tokens. So are operators like + and -, punctuation marks such as the comma
and semicolon, and string literals.
For example, the line
int i; int j;
consists of 6 tokens: int, i, ;, int, j and ;. Most of the time, and particularly in this case, the amount of white space (space, tab and newline characters) between tokens is not critical. That's why the compiler will treat
int i
;int
j;
the same.
Writing
"Hello
Hello"
is like writing
un signed
and hoping that the compiler treats it as
unsigned
Just as white space is not allowed inside a keyword, a newline character is not allowed inside a string literal token. A newline can still be included in the string with the escape sequence \n when needed.
To write strings across lines, use the string concatenation method:
"Hello"
"Hello"
Although the above method is recommended, you can also use a backslash
"Hello \
Hello"
With the backslash method, beware of the beginning space in a new line. The string will include everything in that line until it finds a closing quote or another backslash.

Related

Different behavior for \" in C

I have a strange problem when using string functions in C.
Currently I have a function that sends a string to a UART port.
When I give it a string like
char buf[32];
strcpy(buf, "AT+CPMS=\"SM");
strcat(buf, "\"");
uart0_putstr(buf);
//or
uart0_putstr("AT+CPMS=SM"); //not a valid AT command, but without quotes just for test
it works well and sends string to UART. But when I use such call:
char buf[32];
strcpy(buf, "AT+CPMS=\"SM\"");
uart0_putstr(buf);
//or
uart0_putstr("AT+CPMS=\"SM\"");
it doesn't print to UART anything.
Could you explain the difference between the strings in the first and the second/third cases?
First the C language part:
String literals: All C string literals include an implicit null byte at the end; the C string literal "123" defines a 4 byte array with the values 49,50,51,0. The null byte is always there even if it is never mentioned and enables strlen, strcat etc. to find the end of the string. The suggestion strcpy(buf, "AT+CPMS=\"SM\"\0"); is nonsensical: The character array produced by "AT+CPMS=\"SM\"\0" now ends in two consecutive zero bytes; strcpy will stop at the first one already. "" is a 1 byte array whose single element has the value 0. There is no need to append another 0 byte.
strcat, strcpy: Both functions always add a null byte at the end of the string. There is no need to add a second one.
Escaping: As you know, a C string literal consists of characters book-ended by double quotes: "abc". This makes it impossible to have simple double quotes as part of the string because that would end the string. We have to "escape" them. The C language uses the backslash to give certain characters a special meaning, or, in this case, suppress the special meaning. The entire combination of backslash and subsequent source code character is transformed into a single character, or byte, in the compiled program. In ASCII, the combination \n is transformed into a single byte with the value 10 (line feed, usually interpreted as a newline by output devices), \r is 13 (carriage return), and \" is transformed into the byte 34, usually printed as the " glyph. The string Between the arrows is a double quote: ->"<- must be coded as "Between the arrows is a double quote: ->\"<-" in C. The middle double quote doesn't end the string literal because it is "escaped".
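A small sketch of these points, assuming an ASCII-based execution character set (where '\n' is 10, '\r' is 13 and '\"' is 34). The function names are made up for this example:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the escapes have their ASCII values on this
   implementation, 0 otherwise. */
int escapes_have_ascii_values(void)
{
    return '\n' == 10 && '\r' == 13 && '\"' == 34;
}

/* sizeof counts the implicit terminating null byte; strlen does not. */
void check_literal_sizes(void)
{
    assert(sizeof "123" == 4);
    assert(strlen("123") == 3);
    assert(sizeof "" == 1);
    /* ->"<- is five characters: the escaped quote is a single byte. */
    assert(strlen("->\"<-") == 5);
}
```

On a non-ASCII platform such as EBCDIC, escapes_have_ascii_values() would be expected to return 0, while the size checks hold everywhere.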
Then the UART part: The internet makes me believe that the command you want to send over the UART looks like AT+CPMS="SM", followed by a carriage return. The corresponding C string literal would be "AT+CPMS=\"SM\"\r".
The page I linked also inserts a delay between sending commands. Sending too quickly may cause errors that appear only sometimes.
The things to note are :
The AT command syntax probably demands that SM be surrounded by quotes on both sides.
Additionally, the protocol probably demands that a command end in a carriage return.
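If the command really needs both the quotes and the trailing carriage return, as assumed above, it could be built along these lines. This is only a sketch: format_cpms is a hypothetical helper, not part of any UART or modem library, and the result would still have to be passed to uart0_putstr():

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes AT+CPMS="<storage>" followed by a carriage
   return into out. Returns the snprintf result, i.e. the length of the
   full command (even if out was too small to hold all of it). */
int format_cpms(char *out, size_t size, const char *storage)
{
    assert(out != NULL && storage != NULL);
    return snprintf(out, size, "AT+CPMS=\"%s\"\r", storage);
}
```

For storage "SM" and a 32-byte buffer, the buffer should then hold the 13 characters AT+CPMS="SM" plus carriage return, followed by the terminating null byte.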
This ...
char buf[32];
strcpy(buf, "AT+CPMS=\"SM");
strcat(buf, "\"");
... produces the same contents in buf as this ...
char buf[32];
strcpy(buf, "AT+CPMS=\"SM\"");
... does, up to and including the string terminator at index 12. I fully expect an immediately following call to ...
uart0_putstr(buf);
... to have the same effect in each case unless uart0_putstr() looks at bytes past the terminator or its behavior is sensitive to factors other than its argument.
If it does look past the terminator, however, then that might explain not only a difference between those two, but also a difference with ...
uart0_putstr("AT+CPMS=\"SM\"");
... because in this last case, looking past the string terminator would overrun the bounds of the array, producing undefined behavior.
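The claim that both ways of filling buf give the same bytes can be checked with a short sketch (two_step_equals_one_step is a name invented for this example):

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if building the command in two steps and in one step leaves
   the same bytes in the buffer, up to and including the terminator. */
int two_step_equals_one_step(void)
{
    char a[32];
    char b[32];

    strcpy(a, "AT+CPMS=\"SM");
    strcat(a, "\"");
    strcpy(b, "AT+CPMS=\"SM\"");

    assert(strlen(a) == 12);            /* terminator sits at index 12 */
    return memcmp(a, b, strlen(a) + 1) == 0;
}
```

The bytes after index 12 are not compared, because they are uninitialized in both arrays and may well differ.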
Thanks all. It was finally resolved by adding a null character to the end of the string.

Escape sequence in C in string

In this code:
#include <stdio.h>
int main()
{
char str[]= "geeks\nforgeeks";
char *ptr1, *ptr2;
ptr1 = &str[3];
ptr2 = str + 5;
printf("%c", ++*str - --*ptr1 + *ptr2 + 2);
printf("%s", str);
getchar();
return 0;
}
Why does the compiler interpret \n as an escape sequence and not as two characters viz, \ and n?
On the other hand, this program does not comment out hello.
#include <stdio.h>
int main()
{
char str[]= "geeks/*hello*/geeks";
printf ("%s",str);
return 0;
}
Why does the compiler interpret \n as an escape sequence and not as
two characters viz, \ and n?
By the definition of the escape sequence.
The C Standard (5.2.2 Character display semantics)
2 Alphabetic escape sequences representing nongraphic characters in
the execution character set are intended to produce actions on display
devices as follows:
//...
\n (new line) Moves the active position to the initial position
of the next line.
If you want to have two separate characters \ and n then you should write for example
char str[]= "geeks\\nforgeeks";
Now there are two separate characters one of which is represented by the escape sequence '\\' and other by the symbol 'n'.
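The difference shows up in the string lengths. A minimal sketch (count_byte and the two array names are hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: number of bytes in s equal to c. */
size_t count_byte(const char *s, char c)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

static const char with_newline[]   = "geeks\nforgeeks";   /* \n is one byte */
static const char with_backslash[] = "geeks\\nforgeeks";  /* \ and n: two bytes */
```

strlen(with_newline) is 14 and strlen(with_backslash) is 15; only the first array contains a new-line character, and only the second contains a backslash.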
As for your second question
On the other hand, this program does not comment out hello.
char str[]= "geeks/*hello*/geeks";
Within a string literal, the symbols /* and */ do not form a comment. They are elements of the string literal.
The C Standard (6.4.9 Comments)
1 Except within a character constant, a string literal, or a
comment, the characters /* introduce a comment. The contents of such a
comment are examined only to identify multibyte characters and to find
the characters */ that terminate it.
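A short sketch of that rule, with one comment inside the literal and one real comment after it (text and comment_markers_kept are names made up for this example):

```c
#include <assert.h>
#include <string.h>

/* The comment after the literal is removed by the compiler;
   the comment-like text inside the literal is not. */
static const char text[] = "geeks/*hello*/geeks" /* a real comment */;

/* Returns 1 if the comment-like text survived inside the string. */
int comment_markers_kept(void)
{
    assert(strlen(text) == 19);
    return strstr(text, "/*hello*/") != NULL;
}
```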
But that is inside a string, so how does the compiler know?
The quotes and links below are sufficient to understand what an escape sequence is.
From Wikipedia
In C, all escape sequences consist of two or more characters, the first of which is the backslash, \ (called the "Escape character");
Also from C11 Standard
In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters. A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.

Reason behind the following output

#include <stdio.h>
char *prg = "char *prg = %c%s%c;main(){printf(prg,34,prg,34);} " ;
void main (){
printf(prg,34,prg,34);
}
Reason behind the following output
char *prg = "char *prg = %c%s%c;main(){printf(prg,34,prg,34);} ";main(){printf(prg,34,prg,34);}
The content of *prg is printed, but in place of "%c%s%c" other content appears. Why is it embedded there?
That's the way printf() works - in the format (the first prg argument), each conversion specification (%c, %s) converts a subsequent argument (34, prg) according to the conversion specifier (c, s) and writes the result to the standard output.
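The same "%c%s%c" pattern can be tried in isolation. The sketch below uses snprintf instead of printf so that the result can be inspected; quote_into is a hypothetical helper, and 34 is the code of " only on ASCII-based systems:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes text surrounded by double quotes into out.
   The first %c consumes 34 (the ASCII code for "), %s consumes text,
   and the second %c consumes the final 34. */
int quote_into(char *out, size_t size, const char *text)
{
    assert(out != NULL && text != NULL);
    return snprintf(out, size, "%c%s%c", 34, text, 34);
}
```

Called with the text abc, this should produce the five characters "abc" including both quotes.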
That program is a classic example of a program that prints its own source code to stdout (a quine). ASCII code 34 is the double-quote character ", which is needed to print the delimiters of a C string without running into an infinite regress. Some characters cannot be used literally, because the compiler transforms them: one of these is ", which disappears as a delimiter when the string literal is compiled; others are escape sequences such as \n and \t, which are converted into different characters. This is the reason the source code must be all on one line (control characters are transformed by the compiler), no #include ... directives are allowed (because each one ends in a newline), and so on.
Compile it and rearrange it (you have modified it, for clarity, on posting) so that the output is exactly the same source code as you give the compiler.
Note
The program code, to mimic exactly its source form in the output, must be written as:
char*p="char*p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
without a trailing newline at the end of the line.
If you substitute 34 by '\"', and p by its value, you'll get:
...
printf(
"char*p=%c%s%c;main(){printf(p,34,p,34);}",
'\"',
"char*p=%c%s%c;main(){printf(p,34,p,34);}",
'\"');
that will form the original string, putting the right delimiters at the proper positions. (note: the %c and %s in the third parameter are not further expanded)
Note2
This program depends on the ASCII encoding of ". It shouldn't work on EBCDIC systems (there you must use that encoding's code for " instead of the number 34).
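Whether the expansion really reproduces the one-line program can be checked without printing anything. This is a sketch under the ASCII assumption from the note; source_line and output_matches_source are names introduced only for this check:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The one-line program, held as a string for comparison. */
static const char source_line[] =
    "char*p=\"char*p=%c%s%c;main(){printf(p,34,p,34);}\";"
    "main(){printf(p,34,p,34);}";

/* Same format string as in that program. */
static const char *p = "char*p=%c%s%c;main(){printf(p,34,p,34);}";

/* Returns 1 if expanding p with (34, p, 34) reproduces source_line. */
int output_matches_source(void)
{
    char out[128];
    int n = snprintf(out, sizeof out, p, 34, p, 34);

    assert(n > 0 && (size_t)n < sizeof out);
    return strcmp(out, source_line) == 0;
}
```

The %c and %s that arrive through the %s argument are copied as plain text, which is why the expansion stops after one level.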

C program to store special characters in strings

Basically, I can't figure this out, I want my C program to store the entire plaintext of a batch program then insert in a file and then run.
I finished my program, but holding the contents is my problem. How do I insert the code in a string and make it ignore ALL special characters like %s \ etc?
You have to escape special characters with a \, you can escape backslash itself with another backslash (i.e. \\).
As Ian previously mentioned, you can escape characters that aren't allowed in normal C strings with \; for instance, newline becomes \n, double-quote becomes \", and backslash becomes \\.
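As a rough sketch of what that looks like for a batch file, here is a made-up two-line script stored in a string literal. The script contents, the path and the helper write_script are all invented for this example:

```c
#include <assert.h>
#include <stdio.h>

/* Two lines of a made-up batch script. Inside the literal, " is written
   as \" and \ as \\; % needs no escaping because it is only special in
   printf-style format strings, not in string literals. */
static const char script[] =
    "@echo off\n"
    "echo \"%USERNAME%\" > C:\\temp\\out.txt\n";

/* Hypothetical helper: writes the script to an already opened stream.
   fputs copies the bytes as they are and does not interpret %. */
int write_script(FILE *stream)
{
    assert(stream != NULL);
    return fputs(script, stream);
}
```

Writing the text with fputs (or fwrite) rather than passing it to printf as the format string avoids any trouble with % characters in the script.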
If you're unable or unwilling to do this for whatever reason, then you may be out of luck if your solution must be in C. However, if you're willing to switch to C++, then you can use raw strings:
const char* s1 = R"foo(
Hello
World
)foo";
This is equivalent to
const char* s2 = "\nHello\nWorld\n";
A raw string must begin with R" followed by an arbitrary delimiter (made of any source character but parentheses, backslash and spaces; can be empty; and at most 16 characters long), then (, and must end with ) followed by the delimiter and ". The delimiter must be chosen such that the termination substring (), delimiter, ") does not appear within the string.

Writing printf(#"") with multiple line string

Is it possible to write something like this:
printf(#"
-
-
-
-
");
I can do it in C#, but can't in C. It gives me an error in CodeBlocks. Is this allowed?
Error message: error: stray '#' in program.
No. That syntax doesn't exist in C.
If you want a multiple-line string, write it as multiple double-quoted strings with no other tokens in between them. They will be combined.
printf(
"some string"
"more of the string"
"even more of the string"
);
(You will, of course, need to add a \n at the end of each line if that's what you want.)
No, that's not syntax that C understands; C doesn't have raw string literals.
You can use \ as the last character to continue on the next line:
const char *str = "hello\n\
world";
Also, consecutive string literals will be concatenated. So you can do e.g.
const char *str = "Hello\n"
"world\n";
C#'s verbatim strings are not available in C. If you have some characters to escape, like " or \, escape them with '\'; there is no other option in this language.
If you want to embed multiple lines in a string literal, insert \n at the appropriate locations in your string. To spread the literal over several source lines, you can also escape the line break itself with a backslash (this does not add a newline to the string):
printf("Here's\
a multiline\
string litteral");
Line continuation with \ at the end of the line.
printf("\
\
-\
-\
-\
-\
");
String literals in C may not contain newlines. You have two workarounds:
Use implicit string concatenation (done by the compiler).
printf("The quick brown"
" fox jumps over"
" the sleazy dog.");
Escape the newline by placing a backslash in front of it.
printf("The quick brown\
fox jumps over\
the sleazy dog.");
Personally, I prefer the first form since the second looks ugly (my opinion) and forces you to ruin your code indentation.
In either case, the string will simply not contain the newlines. So if you really meant for them to be there, you'll have to add them via \n.
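A brief sketch of that last point, using string concatenation with and without explicit \n (the array names and count_newlines are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Split over three source lines, but the string itself is one line. */
static const char no_newlines[] = "The quick brown"
                                  " fox jumps over"
                                  " the sleazy dog.";

/* Here each piece ends in an explicit \n escape. */
static const char three_lines[] = "The quick brown\n"
                                  "fox jumps over\n"
                                  "the sleazy dog.\n";

/* Hypothetical helper: number of '\n' characters in s. */
size_t count_newlines(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (*s == '\n')
            ++n;
    return n;
}
```

count_newlines(no_newlines) gives 0 and count_newlines(three_lines) gives 3, regardless of how the literals are laid out in the source file.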

Resources