Concatenating two characters to create escape sequence - c

Idea is to try to create escape sequences 'human' way. For example, I use two characters to create '\n', the '\' and 'n'.
What I'm thinking about is char array[3]={'\\','n','\0'};
so I can change 'n' character and still use it as an escape sequence.
When I printf(array) it now prints:
\n
and I'd like it to go to next line.
For example:
what if I wanted to check manually what every letter in alphabet does when used as escape sequence with a loop?
for(char='a';char<='z';char++)
{
/* create escape sequence with that letter */
/* print that escape sequence and see what it does */
}
It's not an assignment,has no practical use (at least not yet), but just a theoretical question that I couldn't find answer anywhere, nor figure it out myself.

The escape sequence represents a single character and is evaluated at compile time. You cannot have a literal string interpreted as an escape sequence at run time.
For example '\n' is a newline (or line-feed character - 0x0A in ASCII)
Note that:
char array[3]={'\\','n','\0'};
is equivalent to:
char array[3] = "\\n" ;
so perhaps unsurprisingly when you printf(array) it prints \n - that is what you have asked it to do.
Undefined escape sequences simply won't compile, so you might simply:
char = '\a' ;
char = '\b' ;
... // etc.
and see which lines the compiler baulks at. However that is not the complete story because some escape sequences require operands, for example \x on its own has no meaning, whereas \xab is the character represented by 0xab (171 decimal). Others are not even letters. Most are related to white-space, and their effect may be dependent on the terminal or console capabilities of the execution platform. So a naive investigation may not generate the results you seek, because it does not account for the language semantics or platform capabilities.
All supported escape sequences are in fact well defined - you'll find few surprises except perhaps those related to platform capabilities (for example if your target has no means to generate a beep, \a will have no useful effect):
\a Beep
\b Backspace
\f Form-feed
\n Newline
\r Carriage return
\t Horizontal tab
\v Vertical tab
\\ Backslash
\' Single quotation mark
\" Double quotation mark
\0 ASCII 0x00 (null terminator)
\ooo Octal representation
\xdd Hexadecimal representation

What about writing your own printf()?
Where you can check for a '\' followed by a 'n' and than only print from char[0] to '\''n'. Finally add "printf("\n");
mfg

Related

\\\ prints just one backslash?

In C, on mentioning three backslashes, like so:
#include <stdio.h>
int main() {
printf(" \\\ ");
}
prints out just one backslash in the output. Why and how does this work?
That sequence is:
A space
A double backslash, which encodes a single backslash in the runtime string
A backslash followed by a space, which is not a standard escape sequence and should give you a diagnostic
The C11 draft says (in note 77):
The semantics of these characters were discussed in 5.2.2. If any other
character follows a backslash, the result is not a token and a diagnostic is
required.
On godbolt.org I got:
<source>:8:14: warning: unknown escape sequence '\ ' [-Wunknown-escape-sequence]
So you seem to be using a non-conforming compiler, which chooses to implement undefined backslash sequences by just letting the character through.
That is printing:
space
slash
escaped space
The 3rd slash is being interpreted as "slash space"
C11; 6.4.4.4 Character constants:
The double-quote " and question-mark ? are representable either by themselves or by the
escape sequences \" and \?, respectively, but the single-quote ' and the backslash \
shall be represented, respectively, by the escape sequences \' and \\.
So, To represent a single backslash, it’s necessary to place double backslashes \\ in the source code. To print two \\ you need four backslash \\\\. In your code extra \ is a space character, which isn't valid.
It is indeed a very simple operation in c. A \ is just a escape sequences. Hence below statement will print two slash.
printf(" \\\\ ");
For example some characters in c are represented with a slash like end of a line character \n or end of a string character \0 etc. But if you want to print such a character as it is what will you do? Hence you need to add a escape sequence character in front of it:
printf("\\n"); // will print \n
But
printf("\n"); // will print end of character hence you don't see anything in output

Multi character constant warning for escaped \t

If I write putchar('\\t'); while trying to print "\t" instead of an actual tab, I get the multi character constant warning. On the other hand, if I write putchar('\\'); I get no warning. Upon looking in the ASCII table, there is no character '\\', only '\'. So why is there no warning? Why is '\\' one character but '\\t' is more than one? Can a backslash only be used to escape one following character?
You cannot print \ and t with one putchar invocation, since putchar puts one and exactly only one character into the standard output. Use 2:
putchar('\\');
putchar('t');
Another option would be to use the string "\\t" with fputs:
fputs("\\t", stdout);
There is no warning for '\\' because that is one way how you enter the character literal for the character \. On ASCII this is synonymous with '\134' and '\x5c'.
From C11 6.4.4.4 paragraphs 2 and 4:
2
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'. [...] With a few exceptions detailed later, the elements of the sequence are any members of the source character set; they are mapped in an implementation-defined manner to members of the execution character set.
[...]
4
The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
The reason why you get a warning for this is that the behaviour is wholly implementation-defined. In C11 J.3.4 the following is listed as implementation-defined behaviour:
The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character (6.4.4.4).
Since '\\' contains an escape sequence that maps to a single-byte execution character \, there is no implementation-defined pitfalls, and nothing to warn about; but \\t contains 2 characters: \ and t, and it wouldn't do what you want portably.
\\ is one character, t is one character, so that is clearly two characters.
\\ is an escape sequence, just like \t; it means \.
If you want to print the two characters \ and t, you clearly need either two calls to putch() or a function that takes a string argument "\\t".
https://en.wikipedia.org/wiki/Escape_sequences_in_C#Table_of_escape_sequences

sizeof(string) not including a "\" sign

I've been going through strlen and sizeof for strings (char arrays) and I don't quite get one thing.
I have the following code:
int main() {
char str[]="gdb\0eahr";
printf("sizeof=%u\n",sizeof(str));
printf("strlen=%u\n",strlen(str));
return 0;
}
the output of the code is:
sizeof=9
strlen=3
At first I was pretty sure that 2 separate characters \ followed by 0 wouldn't actually act as a NUL (\0) but I managed to figure that it does.
The thing is that I have no idea why sizeof shows 9 and not 10.
Since sizeof counts the amount of used bytes by the data type why doesn't it count the byte for the \?
In a following example:
char str[]="abc";
printf("sizeof=%u\n",sizeof(str));
that would print out "4" because of the NUL value terminating the array so why is \ being not counted?
In a character or string constant, the \ character marks the beginning of an escape sequence, used to represent character values for which there isn't a symbol in the source character set. For example, the escape sequence \n represents the newline character, \b represents the backspace character, \0 represents the zero-valued character (which is also the string terminator), etc.
In the string literal "gdb\0eahr", the escape sequence \0 maps to a single 0-valued character; the actual contents of str are {'g', 'd', 'b', 0, 'e', 'a', 'h', 'r', 0}.
It seems you already have the answer:
At first I was pretty sure that 2 separate characters "\" followed by "0" wouldn't actually act as a NULL "\0" but I managed to figure that it does.
The sequence \0 is an octal escape sequence for the byte 0. So while there are two characters in the code to denote this, it translated to a single byte in the string.
So you have 7 alphabetic characters, a null byte in the middle, and a null byte at the end. That's 9 bytes.
Why should char str[]="gdb\0eahr"; be 10 bytes with sizeof operator? It is 9 bytes because there are 8 string elements + trailing zero.
\0 is only 1 character, not 2. \'s purpose is to escape characters, therefore you might see some of these: \t, \n, \\ and others.
Strlen returns 3 because you have string termination at position str[3].
Single sequence of \ acts as escape character and is not part of string size. If you want to literally use \ in your string, you have to write it twice in sequence like \\, then this is single char of \ printable char.
The C compiler scans text strings as a part of compiling the source code and during the scan any special, escape sequences of characters are turned into a single character. The symbol backslash (\) is used to indicate the start of an escape sequence.
There are several formats for escape sequences. The most basic is a backslash followed by one of several special letters. These two characters are then translated into a single character. Some of these are:
'\n' is turned into a line feed character (0x0A or decimal 10)
'\t' is turned into a tab character (0x09 or decimal 9)
'\r' is turned into a carriage return character (0x0D or decimal 13)
'\\' is turned into a backslash character (0x5C)
This escape sequence idea was used back in the old days so that when a line of text was printed to a teletype machine or printer or a CRT terminal, the programmer could use these and other special command code characters to set where the next character would be printed or to cause the device to do some physical action like ring a bell or feed the paper to the next line.
The escape character also allowed you to embed a double quote (") or a single quote (') into a text string so that you could print text that contain quote marks.
In addition to the above special sequences of backslash followed by a letter there was also a way to specify any character by using a backslash followed by one up to three octal digits (0 through 7). So you could specify a line feed character by either using '\n' or you could use '\12' where 12 is the octal representation of the hexadecimal value A or the decimal value 10.
Then the ability to use a hexadecimal escape sequence was introduced with the backslash followed by the letter x followed by one or more hexadecimal digits. So you can write a line feed character with '\n' or '\12' or '\xa'.
See also Escape sequences in C in Wikipedia.

What is the char in C for the int value 10? Where I can look up this?

I have a character in a char-Array which I get with fputs(). But it contains a char which is getting count by the function strlen(). I decide to give me out the int value of this char to see where the problem is.
As char I can see nothing. Thought its a Whitespace but not sure. Would like if someone could tell me what it is and explain why it is there.
printf("%d",(int) input[6]); //--> give me the value of 10 out.
The value 10 is the ASCII value for the newline character (LF, or linefeed). Closely related is character 13, which is CR, or carriage return, which, on Windows systems, often precedes the LF character. I would suggest getting a copy of the ASCII table (they're all over the web) and referencing it from time to time.
Character 10 can be represented by '\n' in C code, as well as '\012', '\x0a', and '\u000a'
Character 13 (carriage return) can be represented by '\r', '\015', '\x0d', and '\u000d'.
It is the newline (LF (NL line feed, new line)) in ASCII. See all of the values here.
As already pointed out by the others, the character 10 in ASCII is LF (line feed).
If you wanted printf to output the character (not see its ordinal value), you could use the %c format specifier to pass a single character.
Example:
printf("-%c-", input[6]);
should yield:
-
-
I.e. two dashes separated by a line feed. Please keep in mind that the outcome on Windows depends on how your C runtime handles a single LF without CR as on Windows a line break is customarily represented by CRLF instead of just LF which is the standard on unixoid systems. The only exception to that rule were old Mac systems which used to use only CR to encode a line break.

C program to store special characters in strings

Basically, I can't figure this out, I want my C program to store the entire plaintext of a batch program then insert in a file and then run.
I finished my program, but holding the contents is my problem. How do I insert the code in a string and make it ignore ALL special characters like %s \ etc?
You have to escape special characters with a \, you can escape backslash itself with another backslash (i.e. \\).
As Ian previously mentioned, you can escape characters that aren't allowed in normal C strings with \; for instance, newline becomes \n, double-quote becomes \", and backslash becomes \\.
If you're unable or unwilling to do this for whatever reason, then you may be out of luck if you're solution must be in C. However, if you're willing to switch to C++, then you can use raw strings:
const char* s1 = R"foo(
Hello
World
)foo";
This is equivalent to
const char* s2 = "\nHello\nWorld\n";
A raw string must begin with R" followed by an arbitrary delimiter (made of any source character but parentheses, backslash and spaces; can be empty; and at most 16 characters long), then (, and must end with ) followed by the delimiter and ". The delimiter must be chosen such that the termination substring (), delimiter, ") does not appear within the string.

Resources