Understanding output of printf containing backslash (\012) - c

Can you please help me to understand the output of this simple code:
const char str[10] = "55\01234";
printf("%s", str);
The output is:
55
34

The character sequence \012 inside the string is interpreted as an octal escape sequence. The value 012 in octal is 10 in decimal, which is the line feed character in ASCII (the character that \n produces on most systems).
From the Wikipedia page:
An octal escape sequence consists of \ followed by one, two, or three octal digits. The octal escape sequence ends when it either contains three octal digits already, or the next character is not an octal digit.
Since your sequence contains three valid octal digits, that's how it's going to be parsed. It doesn't continue with the 3 from 34, since that would be a fourth digit and only three digits are supported.
So you could write your string as "55\n34", which is more clearly what you're seeing and which would be more portable since it's no longer hard-coding the newline but instead letting the compiler generate something suitable.

\012 is an escape sequence that gives the octal code of a character:
012 (octal) = 10 (decimal) = 0xa = LINE FEED (in ASCII)
So your string is 55[LINE FEED]34.
The LINE FEED character is treated as a newline on many platforms. That is why you see two lines on the terminal.

\012 is a newline escape sequence, as others have already stated.
(As chux correctly commented, the character it denotes may differ if the character set in use isn't ASCII. Either way, in this notation it is an octal escape.)
This is what the standard specifies. For C99, ISO/IEC 9899 says in:
6.4.4.4 Character constants
[...]
3 The single-quote ', the double-quote ", the question-mark ?, the backslash \, and
arbitrary integer values are representable according to the following table of escape
sequences:
single quote '            \'
double quote "            \"
question mark ?           \?
backslash \               \\
octal character           \octal digits
hexadecimal character     \x hexadecimal digits
And the range it gets bound to:
Constraints
9 The value of an octal or hexadecimal escape sequence shall be in the range of
representable values for the type unsigned char for an integer character constant, or
the unsigned type corresponding to wchar_t for a wide character constant.

Related

char array initialization with octal constant

I saw a comment that said initialization of a char array with "\001" would put a nul as the first character. I have seen where \0 does set a nul.
The unedited comment:
char input[SIZE] = ""; is sufficient initialization. while ( '\001' == input[0]) doesn't do what you think it is doing if you have initialized input[SIZE] = "\001"; (which creates an empty-string with the nul-character as the 1st character.)
This program
#include <stdio.h>
#define SIZE 8
int main ( void) {
    char input[SIZE] = "\001";
    if ( '\001' == input[0]) {//also tried 1 == input[0]
        printf ( "octal 1\n\n");
    }
    else {
        printf ( "empty string\n");
    }
    return 0;
}
running on Linux, compiled with gcc, outputs:
octal 1
so the first character is 1 rather than '\0'.
Is this the standard behavior or just something with Linux and gcc? Why does it not set a nul?
The behavior of the code you present is as required by the standard. In both string literals and integer character constants, octal escapes may contain one, two, or three digits, and the C standard specifies that
Each octal [...] escape sequence is the longest sequence of
characters that can constitute the escape sequence.
(C2011, 6.4.4.4/7)
In this context it is additionally relevant that \0 is an octal escape sequence, not a special, independent code for the null character. The wider context of the above quotation will make that clear.
In the string literal "\001", the backslash is followed by three octal digits, and an octal escape can have three digits, therefore the escape sequence consists of the backslash and all three digits. The first character of the resulting string is the one with integer value 1.
If for some reason you wanted a string literal consisting of a null character followed by the decimal digits 0 and 1, then you could either express the null with a full three-digit escape,
"\00001"
or split it up like so:
"\0" "01"
C will join adjacent string literals to produce the wanted result.
I saw a comment that said initialization of a char array with "\001" would put a nul as the first character.
That comment was in error.
From 6.4.4.1 Integer constants, paragraph 3, emphasis mine:
An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 through 7 only.
But what we are looking at here is not an integer constant at all. What we have here is, actually, an octal escape sequence. And that is defined as follows (in 6.4.4.4 Character constants):
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
The definition -- both for integer constants as well as character constants -- is "greedy", as elaborated by paragraph 7:
Each octal or hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence.
That means, if the first octal digit is followed by something that could be an octal digit, that next character is considered an octal digit belonging to that constant (to a maximum of three in the case of character constants -- not so for integer constants!).
Hence, your "\001" is, indeed, a character with the value 1.
Note that, while octal character constants run up to three characters maximum (making such a constant quite safe to use if padded with leading zeroes as necessary to get a length of three digits), hexadecimal character constants run as long as there are hexadecimal digits (potentially overflowing the char type they are meant to initialize).
See http://c0x.coding-guidelines.com/6.4.4.4.html
Octal sequence is defined as:
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
and item 873:
The octal digits that follow the backslash in an octal escape sequence
are taken to be part of the construction of a single character for an
integer character constant or of a single wide character for a wide
character constant.
also item 877:
Each octal or hexadecimal escape sequence is the longest sequence of
characters that can constitute the escape sequence.
Therefore the behaviour is correct. "\001" should not have null byte at position 0.

What do the characters starting with '\' and followed by a number e.g '\234' mean?

I have been looking at the source of an app when I came across these characters e.g '\233', '\234', '\235' and when I print them, I get garbage characters.
\233 is the character with the octal code 233.
In decimal this is 2×8² + 3×8 + 3 = 155
The meaning depends on the character set being used. Codes beyond 127 are not defined in 7-bit ASCII.
As advertised by DevSolar:
http://rootdirectory.de/chrome/site/encoding.html might be helpful
They are octal-escape-sequences, which are used to represent specific byte values in a character constant or string literal.
C11, 6.4.4.4 Character constants:
character-constant:
' c-char-sequence '
L' c-char-sequence '
u' c-char-sequence '
U' c-char-sequence '
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except the single-quote ', backslash \, or new-line character
escape-sequence
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
universal-character-name
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
An octal escape sequence is defined as a backslash followed by one to three octal digits (0-7).
To avoid getting a following decimal digit interpreted as part of the octal sequence, it is common practice to pad an octal escape sequence with leading zeroes. As opposed to octal integer constants, though, a leading zero is not required.
Note that the semantic meaning of such an escape sequence depends on the context. I could write "Fu\303\237", and it could mean "Fuß" (in UTF-8) or "FuÃŸ" (in CP-1252), depending on what encoding I am assuming the string to be in. What I cannot do, portably, is write either of those strings in the source directly, because the interpretation of any character not in the source character set (i.e., ASCII-7 without dollar, at-sign, and backtick) is implementation-defined. While most compilers today can be made to interpret string literals as UTF-8, octal escape sequences are the portable way.
FWIW, there are also hexadecimal escape sequences; however they are not as well-defined: They greedily gobble as many "hex digits" as they can get, even beyond what a char can hold; so if the next character in the string literal is one of [0-9a-fA-F], you have no way of "terminating" the hex escape before that (1); this is why octal sequences are preferred by some.
(1): As M.M pointed out, you could split your string literal in two ("\xAB" "CD").
As for what the various character values could stand for, in which encoding, I recommend a good code table. This one I whipped up myself, as I could not find any existing one listing all the information I needed in one page.
It's an escape sequence, for octal values. The syntax is \nnn.
You can read more about escape sequences in c here.
Garbage is printed because 233 in octal is 155 in decimal, 234 is 156 and 235 is 157. None of these is an ASCII character.
That notation is an octal escape sequence, which gives the value of a character constant (char literal) as an octal number.
Quoting C11, chapter §6.4.4.4, Character constants
The single-quote ', the double-quote ", the question-mark ?, the backslash \, and
arbitrary integer values are representable according to the following table of escape
sequences:
...
octal character \octal digits
and, regarding the values,
The octal digits that follow the backslash in an octal escape sequence are taken to be part
of the construction of a single character for an integer character constant or of a single
wide character for a wide character constant. The numerical value of the octal integer so
formed specifies the value of the desired character or wide character.

How to escape from hex to decimal

I apologise if this is an obvious question. I've been searching online for an answer to this and cannot find one. This isn't relevant to my code per se, it's a curiosity on my part.
I am looking at testing my function to read start and end bytes of a buffer.
If I declare a char array as:
char *buffer;
buffer = "\x0212\x03";
meaning STX12ETX - switching between hex and decimal.
I get the expected error:
warning: hex escape sequence out of range [enabled by default]
I can test the code using all hex values:
"\x02\x31\x32\x03"
I am wanting to know, is there a way to escape the hex value to indicate that the following is a decimal value?
Will something like this work for you?
char *buffer;
buffer = "\x02" "12" "\x03";
According to the standard:
§ 5.1.1.2 6. Adjacent string literal tokens are concatenated.
§ 6.4.4.4 3. and 7. Each octal or hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence.
The escape sequences:
\' - single quote '
\" - double quote "
\? - question mark ?
\\ - backslash \
\octal digits
\xhexadecimal digits
So the only way to do it is to let the compiler concatenate adjacent string literals (by listing them one after another).
If you want to know more about how the compiler constructs literals, look at §6.4.4.4 and §6.4.5; they describe character constants and string literals respectively.
You can write
"\b12"
but note that \b is the backspace escape, not a marker for decimal values. A hex escape ends at the first character that is not a hex digit, so a space after the hex value does terminate it:
buffer = "\x02 \b12\x03";
Or just 12
buffer = "\x02 12\x03";
The catch is that the space (and the backspace, if you use it) becomes part of the string, so the buffer no longer holds exactly STX12ETX.
No, there's no way to end a hexadecimal escape except by having an invalid (for the hex value) character, but then that character is of course interpreted in its own right.
The C11 draft says (in 6.4.4.4 14):
[...] a hexadecimal escape sequence is terminated only by a non-hexadecimal character.
Octal escapes don't have this problem, they are limited to three octal digits.
You can always use the octal format. An octal escape takes at most three digits.
So to get the character '<-' you simply type \215

Octal representation inside a string in C

In the given program:
int main() {
    char *p = "\0777";
    printf("%d %d %d\n",p[0],p[1],p[2]);
    printf("--%c-- --%c-- --%c--\n",p[0],p[1],p[2]);
    return 0;
}
It is showing the output as:
63 55 0
--?-- --7-- ----
I can understand that it is converting the first two characters after \0 (\077) from octal to decimal, but can anyone explain why two characters, and not one or three or some other number?
Please explain the logic behind this.
char *p = "\07777";
Here a string literal is assigned to a pointer to char.
"\07777"
This string literal uses an octal escape sequence, so the first three digits represent an octal number. The rules for octal escape sequences are:
You can use only the digits 0 through 7 in an octal escape sequence. Octal escape sequences can never be longer than three digits and are terminated by the first character that is not an octal digit. Although you do not need to use all three digits, you must use at least one. For example, the octal representation is \10 for the ASCII backspace character and \101 for the letter A, as given in an ASCII chart.
So your string literal is stored in memory like this:
1st byte: the octal number 077, which is 63 in decimal and '?' as a character
2nd and 3rd bytes: the characters '7' and '7'
and a terminating '\0' at the end.
So your output is as expected for the 1st, 2nd, and 3rd bytes of the string literal.
For more explanation, see
http://msdn.microsoft.com/en-us/library/edsza5ck.aspx
It's just the way the language defines octal escape sequences.
An octal escape sequence, which can be part of a character constant or string literal, consists of a \ followed by exactly 1, 2, or 3 octal digits ('0' .. '7').
In "\07777", the backslash is followed by 3 octal digits (0, 7, 7), which represents a character with the value 077 in octal, or 63 in decimal. In ASCII or an ASCII-derived encoding, that happens to be a question mark '?'.
So the literal represents a string with a length of 3, consisting of '?', '7', '7'.
But there must be a typo in your question. When I run your program, the output I get is:
63 55 55
--?-- --7-- --7--
If I change the declaration of p to
char *p = "\0777";
I get the output you describe. Note that the final ---- is really two hyphens, followed by a null character, followed by two hyphens. If you're on a Unix-like system, try piping the program's output through cat -v or cat -A.
When you post code, it's very important to copy-and-paste it, not retype it.
(And you're missing the #include <stdio.h> at the top.)

Size of escaped characters in C

Why does the following program output 5?
#include <stdio.h>
int main(void)
{
    char str[]="S\065AB";
    printf("\n%zu", sizeof(str)); /* sizeof yields size_t, so use %zu */
    return 0;
}
Short answer: See David Heffernan's answer.
Long answer:
§ 6.4.4.4 of the C(99) standard specifies "character constants", which (among others) include simple escape sequences (e.g. '\n', '\\'), octal escape sequences (e.g. '\0'), hexadecimal escape sequences (e.g. '\x0f'), and universal character names (e.g. '\u0112').
The backslash in your example introduces such an escape sequence. The octal digit ([0-7]) that follows makes it an octal escape (hex would be '\x', universal would be '\u', and a simple escape sequence would be a backslash followed by one of ' " ? \ a b f n r t v).
That octal escape is terminated once three octal digits are consumed, or a non-octal-digit is encountered.
I.e., '\065' is equivalent to '\x35' or (decimal) 53, which is (coincidentally) '5' on the ASCII table - a single character, anyway.
It's the size of the array which has five elements: S, \065, A, B, \0

Resources