Can someone explain this question to me? I don't understand how the book arrived at its values or how one would arrive at the answer.
Here is the question:
Suppose that ch is a type char variable. Show how to assign the carriage-return character to ch by using an escape sequence, a decimal value, an octal character constant, and a hex character constant. (Assume ASCII code values.)
Here is the answer:
Assigning the carriage-return character to ch by using:
a) escape sequence: ch='\r';
b) decimal value: ch=13;
c) an octal character constant: ch='\015';
d) a hex character constant: ch='\xd';
I understand the answer to part a, but am completely lost for parts b, c, and d. Can you explain?
Computers represent characters using character encodings, such as ASCII, UTF-8, UTF-16, ISO-8859 (http://en.wikipedia.org/wiki/ISO/IEC_8859-1), and others. The carriage-return character was used by early computers as a printer instruction to return the print head to the leftmost position, and the line-feed character was used to advance the paper to a new line (which is why DOS uses CRLF line endings; it worked better with dot-matrix printers). In any case, the CR character is stored internally as a numeric value, in either a single 8-bit byte/octet or a 16-bit pair of bytes/octets, depending on your language.
The common ASCII character set can be found here: http://www.asciitable.com/ and there you can see that CR, '\r', 13, 0xD, and so on are different representations of the same value.
Strings are just sequences of characters stored either as an array of characters with a marker at the end (terminator), or stored with a count of the current string length.
From wiki:
Computers and communication equipment represent characters using a
character encoding that assigns each character to something — an
integer quantity represented by a sequence of bits, typically — that
can be stored or transmitted through a network. Two examples of usual
encodings are ASCII and the UTF-8 encoding for Unicode.
For your questions b, c, and d - all the values are 13 (in decimal). Run this code to see what's happening:
char ch1 = '\r';
printf("ASCII value of carriage return is %d\n", ch1);
There are two parts to explaining answers b-d.
You need to know that the ASCII code point for 'carriage return' or CR (also known as Control-M) is 13. You can find that out from various sources. It might not be obvious that the Unicode standard is one of those places (but it is) and U+000D is CARRIAGE RETURN (CR). Unicode code points U+0000..U+007F are identical to ASCII; Unicode code points U+0000..U+00FF are identical to ISO 8859-1 (Latin 1).
You need to know that C can use decimal numbers, or octal or hexadecimal escapes, when assigning to characters. Notations such as '\15' or '\015' are octal character constants, and octal 15 is decimal 13. Notations such as '\xD' or '\x0D' (or, indeed, '\x0000000000000D' and all stops en route) are hexadecimal constants, and hex D is also decimal 13. (Note that octal escapes are limited to 1-3 digits; hex escapes are not so limited, but values larger than '\xFF' typically have implementation-defined representations.)
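As a quick sanity check (again assuming an ASCII execution character set), all of those notations compare equal:

#include <stdio.h>

int main(void)
{
    /* '\15' and '\015' are octal escapes; '\xD' and '\x0D' are hex escapes */
    if ('\r' == 13 && '\15' == 13 && '\015' == 13 && '\xD' == 13 && '\x0D' == 13)
        printf("All of these notations denote the value %d\n", '\r');
    return 0;
}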
Related
Does the computer convert every single ASCII digit (in binary) to its numerical equivalent (in binary)?
Let's say 9 is given as input; its ASCII value will be 00111001, and we know that the binary of 9 is 1001, so how will the computer convert the ASCII value of 9 to the binary of 9?
It is only when doing arithmetic that a bit pattern represents a numeric value to a digital computer. (It would be possible to create a digital computer that doesn't even do arithmetic.)
It is a human convenience to describe bit patterns as numbers. Hexadecimal is the most common form because it is compact, represents each bit in an easily discernible way, and aligns well with storage widths (such as multiples of 8 bits).
How a bit pattern is interpreted depends on the context. That context is driven by programs following conventions and standards, the vast majority of which are beyond the scope of the computer hardware itself.
Some bit patterns are programs. Certain bits may identify an operation, some a register, some an instruction location, some a data location and only some a numeric value.
If you have a bit pattern that you intend represents the character '9' then it does that as long as it flows through a program where that interpretation is built-in or carried along. For convenience, we call the bit pattern for a character, a "character code".
You could write a program that converts the bit pattern for the character '9' to the bit pattern for a particular representation of the numeric value 9. What follows is one way of doing that.
C requires that certain characters are representable, including digits '0' to '9', and that the character codes for those characters, when interpreted as numbers, are consecutive and increasing.
Subtraction of two numbers on a number line measures the distance between them. So, in C, subtracting the character code for '0' from the character code for any decimal digit character gives the distance between that digit and '0', which is the numeric value of the digit.
'9' - '0'
equals 9 because of C's requirements for the bit patterns of character codes and the bit patterns of integers.
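A minimal sketch of that idea, relying only on what C guarantees about the digit characters:

#include <stdio.h>

int main(void)
{
    char digit = '9';
    int value = digit - '0';   /* distance from '0' is the numeric value */
    printf("character '%c' has numeric value %d\n", digit, value);
    return 0;
}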
Note: A binary representation is not very human-friendly in general. It is used when hexadecimal would obscure the details of the discussion.
Note: C does not require ASCII. ASCII is simply one character set and character encoding that satisfies C's requirements. There are many character sets that are supersets of and compatible with ASCII. You are probably using one of them.
Try this sample program; it shows how ASCII input is converted to a binary integer and back again.
#include <stdio.h>

int main(void)
{
    int myInteger;

    printf("Enter an integer: ");
    scanf("%d", &myInteger);            /* ASCII digits converted to a binary int */
    printf("Number = %d\n", myInteger); /* binary int converted back to ASCII digits */
    return 0;
}
It's a bit crude and doesn't handle invalid input in any way.
Can you please help me to understand the output of this simple code:
const char str[10] = "55\01234";
printf("%s", str);
The output is:
55
34
The character sequence \012 inside the string is interpreted as an octal escape sequence. The value 012 interpreted as octal is 10 in decimal, which is the line feed (\n) character on most terminals.
From the Wikipedia page:
An octal escape sequence consists of \ followed by one, two, or three octal digits. The octal escape sequence ends when it either contains three octal digits already, or the next character is not an octal digit.
Since your sequence contains three valid octal digits, that's how it's going to be parsed. It doesn't continue with the 3 from 34, since that would be a fourth digit and only three digits are supported.
So you could write your string as "55\n34", which is more clearly what you're seeing and which would be more portable since it's no longer hard-coding the newline but instead letting the compiler generate something suitable.
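If it helps, a small sketch (the numbers assume ASCII) that dumps the character codes of the literal, showing that \012 was consumed as a single character:

#include <stdio.h>

int main(void)
{
    const char str[10] = "55\01234";
    /* prints: 53 53 10 51 52  ('5' '5' LF '3' '4') */
    for (int i = 0; str[i] != '\0'; i++)
        printf("%d ", str[i]);
    printf("\n");
    return 0;
}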
\012 is an escape sequence that represents the octal code of a character:
012 = 10 = 0xa = LINE FEED (in ASCII)
So your string looks like 55[LINE FEED]34.
The LINE FEED character is interpreted as a newline on many platforms. That is why you see two lines on the terminal.
\012 is a newline escape sequence, as others have already stated.
(As chux quite correctly commented, this might differ if ASCII isn't the character set in use. Either way, in this notation it is an octal escape.)
This is what the C99 standard (ISO/IEC 9899) specifies, in:
6.4.4.4 Character constants
[...]
3 The single-quote ', the double-quote ", the question-mark ?, the backslash \, and
arbitrary integer values are representable according to the following table of escape
sequences:
single quote '          \'
double quote "          \"
question mark ?         \?
backslash \             \\
octal character         \octal digits
hexadecimal character   \x hexadecimal digits
And the range it is bound to:
Constraints
9 The value of an octal or hexadecimal escape sequence shall be in the range of
representable values for the type unsigned char for an integer character constant, or
the unsigned type corresponding to wchar_t for a wide character constant.
I apologise if this is an obvious question. I've been searching online for an answer to this and cannot find one. This isn't relevant to my code per se, it's a curiosity on my part.
I am looking at testing my function to read start and end bytes of a buffer.
If I declare a char array as:
char *buffer;
buffer = "\x0212\x03";
meaning STX12ETX - switching between hex and decimal.
I get the expected error:
warning: hex escape sequence out of range [enabled by default]
I can test the code using all hex values:
"\x02\x31\x32\x03"
I want to know: is there a way to escape the hex value to indicate that the following is a decimal value?
Will something like this work for you?
char *buffer;
buffer = "\x02" "12" "\x03";
According to the standard:
§ 5.1.1.2 6. Adjacent string literal tokens are concatenated.
§ 6.4.4.4 3. and 7. Each octal or hexadecimal escape sequence is the longest sequence of characters that can constitute the escape sequence.
The escape characters:
\' - single quote '
\" - double quote "
\? - question mark ?
\\ - backslash \
\octal digits
\xhexadecimal digits
So the only way to do it is to concatenate adjacent string literals (listing them one after another).
If you want to know more about how the literals are constructed by the compiler, look at §6.4.4.4 and §6.4.5; they describe how character constants and string literals are constructed, respectively.
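To see that the concatenation produces the intended bytes, here is a short sketch (the variable name is just for illustration) that dumps the buffer in hex:

#include <stdio.h>

int main(void)
{
    const char *buffer = "\x02" "12" "\x03";   /* STX '1' '2' ETX */
    /* prints: 02 31 32 03  (assuming ASCII) */
    for (const char *p = buffer; *p != '\0'; p++)
        printf("%02x ", (unsigned char)*p);
    printf("\n");
    return 0;
}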
You can write
"\b12"
to represent a decimal value. Although you need to use a space after the hex value for it to work.
buffer = "\x02 \b12\x03";
Or just 12
buffer = "\x02 12\x03";
Basically you need to add a blank character after your hex values to indicate that it's a new value and not the same one
No, there's no way to end a hexadecimal escape except by having an invalid (for the hex value) character, but then that character is of course interpreted in its own right.
The C11 draft says (in 6.4.4.4 14):
[...] a hexadecimal escape sequence is terminated only by a non-hexadecimal character.
Octal escapes don't have this problem, they are limited to three octal digits.
You can always use the octal format. An octal escape never has more than 3 digits, so writing exactly three digits avoids any ambiguity.
So to get the character '<-' you simply type \215
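Applied to the STX/ETX buffer from the question (STX is 002 and ETX is 003 in octal, assuming ASCII), three-digit escapes stop before the literal digits, so nothing gets absorbed:

const char *buffer = "\00212\003";   /* STX '1' '2' ETX - \002 ends after its three octal digits */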
From Chapter 2(Sub section 2.3 named Constants) of K&R book on C programming language:
Certain characters can be represented in character and string
constants by escape sequences like \n (newline); these sequences look
like two characters, but represent only one. In addition, an arbitrary
byte-sized bit pattern can be specified by
'\ooo'
where ooo is one to three octal digits (0...7) or by
'\xhh'
where hh is one or more hexadecimal digits (0...9, a...f, A...F). So
we might write
#define VTAB '\013' /* ASCII vertical tab */
#define BELL '\007' /* ASCII bell character */
or, in hexadecimal,
#define VTAB '\xb' /* ASCII vertical tab */
#define BELL '\x7' /* ASCII bell character */
The part that confuses me is the following wording (emphasis mine): where ooo is one to three octal digits (0...7). If there are three octal digits, the number of bits required will be 9 (3 for each digit), which exceeds the byte length required for characters. Surely I am missing something here. What is it that I am missing?
\ooo (3 octal digits) does indeed allow a specification of 9-bit values from 0 to 111111111 (binary), or 511. Whether this is allowed depends on the char size.
Assignments such as the ones below generate a warning in many environments because a char is 8 bits in those environments. Typically the highest octal sequence allowed is \377. But a char need not be 8 bits, so OP's "9 ... exceeds the byte length required for characters" is incorrect.
char *s = "\777"; //warning "Octal sequence out of range"
char c = '\777'; //warning
int i = '\777'; //warning
The 3-octal-digit constant '\141' is the same as 'a' in a typical environment where ASCII is used. But in an alternate character set, 'a' could be different. Thus, if one wanted a portable bit-pattern assignment of 01100001, one could use '\141' instead of 'a'. One could accomplish the same by assigning '\x61'. In some contexts, an octal pattern may be preferred.
C11 6.4.4.4 §9: If no prefix is used, "The value of an octal or hexadecimal escape sequence shall be in the range of representable values for the corresponding type: unsigned char"
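A tiny check of the '\141' claim (it only holds on an ASCII-compatible system):

#include <stdio.h>

int main(void)
{
    /* on an ASCII system 'a' is 0x61, i.e. octal 141 */
    printf("'\\141' == 'a' ? %d\n", '\141' == 'a');
    printf("'\\x61' == 'a' ? %d\n", '\x61' == 'a');
    return 0;
}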
The range of code numbers of characters is not defined in K&R, as far as I can remember. In the early days, it was usually the ASCII range 0...127. Nowadays it is often an 8-bit range, 0...255, but it could be wider, too. In any case, the implementation-defined limits on the char data type imply restrictions on the escape notations, too.
For example, if the range is 0...127, then \177 is the largest allowed octal escape.
The first octal digit is only allowed to go to 3 (two bits), not 7 (three bits), if we're talking about eight-bit bytes. If we're talking about ASCII (7-bit values), the first digit can only be zero or one.
If K&R says otherwise, their description is either incomplete or incorrect.
Example:
char arr[] = "\xeb\x2a";
BTW, are the following the same?
"\xeb\x2a" vs. '\xeb\x2a'
\x indicates a hexadecimal character escape. It's used to specify characters that aren't typeable (like a null '\x00').
And "\xeb\x2a" is a literal string (type is char *, 3 bytes, null-terminated), and '\xeb\x2a' is a character constant (type is int, 2 bytes, not null-terminated, and is just another way to write 0xEB2A or 60202 or 0165452). Not the same :)
As others have said, \x is an escape sequence that starts a "hexadecimal-escape-sequence".
Some further details from the C99 standard:
When used inside a set of single-quotes (') the characters are part of an "integer character constant" which is (6.4.4.4/2 "Character constants"):
a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.
and
An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.
So the sequence in your example of '\xeb\x2a' is an implementation defined value. It's likely to be the int value 0xeb2a or 0x2aeb depending on whether the target platform is big-endian or little-endian, but you'd have to look at your compiler's documentation to know for certain.
When used inside a set of double-quotes (") the characters specified by the hex-escape-sequence are part of a null-terminated string literal.
From the C99 standard 6.4.5/3 "String literals":
The same considerations apply to each element of the sequence in a character string literal or a wide string literal as if it were in an integer character constant or a wide character constant, except that the single-quote ' is representable either by itself or by the escape sequence \', but the double-quote " shall be represented by the escape sequence \".
Additional info:
In my opinion, you should avoid using 'multi-character' constants. There are only a few situations where they provide any value over using a regular, old int constant. For example, '\xeb\x2a' could more portably be specified as 0xeb2a or 0x2aeb, depending on which value you really wanted.
One area that I've found multi-character constants to be of some use is to come up with clever enum values that can be recognized in a debugger or memory dump:
enum CommandId {
CMD_ID_READ = 'read',
CMD_ID_WRITE = 'writ',
CMD_ID_DEL = 'del ',
CMD_ID_FOO = 'foo '
};
There are few portability problems with the above (other than platforms that have small ints, or warnings that might be spewed). Whether the characters end up in the enum values in little- or big-endian form, the code will still work (unless you're doing something else unholy with the enum values). If the characters end up in the value using an endianness that wasn't what you expected, it might make the values less easy to read in a debugger, but the 'correctness' isn't affected.
When you say:
BTW, are these the same:
"\xeb\x2a" vs '\xeb\x2a'
They are in fact not. The first creates a character string literal, terminated with a zero byte, containing the two characters whose hex representations you provide. The second creates an integer constant.
It's a special escape that indicates the characters that follow are the hexadecimal code of a character.
http://www.austincc.edu/rickster/COSC1320/handouts/escchar.htm
The \x means it's a hex character escape. So \xeb would mean character eb in hex, or 235 in decimal. See http://msdn.microsoft.com/en-us/library/6aw8xdf2.aspx for more information.
As for the second, no, they are not the same. The double-quotes, ", means it's a string of characters, a null-terminated character array, whereas a single quote, ', means it's a single character, the byte that character represents.
\x allows you to specify the character by its hexadecimal code.
This allows you to specify characters that are normally not printable (some of which have special escape sequences predefined, such as '\n' = newline, '\t' = tab, and '\a' = bell).
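For example (a small sketch; the code values assume ASCII):

char nul = '\x00';   /* the null character                       */
char esc = '\x1b';   /* ESC has no named escape, so hex is handy */
char del = '\x7f';   /* DEL, another non-printable character     */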
A useful website is here.
And I quote:
x Unsigned hexadecimal integer
That way, your \xeb is 235 in decimal.