Purpose of using octal for ASCII - c

Why would a C programmer use escape sequences (oct/hex) for ASCII values rather than decimal?
Follow up: does this have to do with either performance or portability?
Example:
char c = '\075';

You use octal or hexadecimal because there isn't a way to specify decimal codes inside a character literal or string literal. Octal was prevalent in PDP-11 code. These days, it probably makes more sense to use hexadecimal, though '\0' is more compact than '\x0' (so use '\0' when you null terminate a string, etc.).
Also, beware that "\x0ABad choice" doesn't have the meaning you might expect, whereas "\012007 wins" probably does. (The difference is that a hex escape runs on until it comes across a non-hex digit, whereas octal escapes stop after 3 digits at most. To get the expected result, you'd need "\x0A" "Bad choice" using 'adjacent string literal concatenation'.)
And this has nothing to do with performance and very little to do with portability. Writing '\x41' or '\101' instead of 'A' is a way of decreasing the portability and readability of your code. You should only consider using escape sequences when there isn't a better way to represent the character.
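To make the escape-sequence pitfall above concrete, here is a small sketch (assuming an ASCII execution character set; the string contents are just the examples from above):
#include <stdio.h>

int main(void)
{
    /* "\x0ABad choice" would not work: the hex escape also consumes 'B', 'a' and 'd'
       (all valid hex digits), producing a value too large for a char, so most
       compilers reject it or warn. */
    const char *fixed = "\x0A" "Bad choice";   /* adjacent string literals concatenate  */
    const char *octal = "\012007 wins";        /* octal escape stops after three digits */

    printf("%s\n", fixed);
    printf("%s\n", octal);
    return 0;
}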

No, it has nothing to do with performance or portability. It is simply a convenient way to write character literals, and to embed characters in string literals, especially non-printable ones.

It has nothing to do with performance or portability. In fact, you don't need any codes at all; instead of this:
char c = 65;
You can simply write:
char c = 'A';
But some characters are not so easy to type, e.g. ASCII SOH, so you might write:
char c = 1; // SOH
Or in any other form, hexadecimal or octal, depending on your preference.
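For instance, SOH has the ASCII code 1, so any of these forms (a sketch, assuming ASCII) give the same value:
char soh_dec = 1;        /* decimal value        */
char soh_hex = '\x01';   /* hexadecimal escape   */
char soh_oct = '\001';   /* octal escape         */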

It has nothing to do with performance or portability. It is simply that the ASCII character set (like its derivatives, up to and including Unicode) is organized around bytes and bits. For example, the first 32 characters are the control characters, and 32 = 040 = 0x20; the ASCII code of 'A' is 65 = 0101 = 0x41 and that of 'a' is 97 = 0141 = 0x61; the ASCII code of '0' is 48 = 060 = 0x30.
I do not know about you, but for me 0x30 and 0x41 are easier to remember and use in manual operations than 48 and 65.
By the way, a byte represents exactly the values between 0 and 255, that is, between 0x00 and 0xFF...
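A small sketch (assuming ASCII) of why the hex layout makes manual operations easy, using the values listed above:
#include <stdio.h>

int main(void)
{
    char lower = 'a';                 /* 0x61 */
    char upper = lower & ~0x20;       /* 'a' (0x61) and 'A' (0x41) differ only in bit 5 (0x20) */
    int  digit = '7' - 0x30;          /* digits start at 0x30, so this is 7 */

    printf("%c %d\n", upper, digit);  /* prints: A 7 */
    return 0;
}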

I didn't know this worked.
But I immediately got a pretty useful idea for it.
Imagine you have a low-memory environment and have to use a permission system like the Unix folder permissions.
Let's say there are 3 groups, and for each group 2 different options which can be allowed or denied.
0 means neither option allowed,
1 means the first option allowed,
2 means the second option allowed and
3 means both allowed.
To store the permissions you could do it like:
char* bar = "213"; // first group has the second option allowed, the second group has the first option allowed and the third has full access.
But that takes four bytes of storage for the information (three digits plus the terminating null).
Of course you could just convert that to a plain decimal number, but that's less readable.
But now that I know this...
doing:
char bar = '\213'; // one octal digit per group; an octal escape takes at most three digits
is pretty readable and also saves memory!
I love it :D
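Here is a minimal sketch of that packing idea, assuming unsigned bytes and the corrected three-digit literal '\213' (each octal digit carries one group's permissions):
#include <stdio.h>

int main(void)
{
    unsigned char perms = '\213';      /* octal 213: one octal digit per group */

    unsigned g1 = (perms >> 6) & 07;   /* 2: second option allowed */
    unsigned g2 = (perms >> 3) & 07;   /* 1: first option allowed  */
    unsigned g3 =  perms       & 07;   /* 3: both options allowed  */

    printf("%u %u %u\n", g1, g2, g3);  /* prints: 2 1 3 */
    return 0;
}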

Related

How does atof.c work? Subtracting an ASCII zero from an ASCII digit makes it an int? Am I missing something?

So as part of my C classes, for our first homework we are supposed to implement our own atof.c function and then use it for some tasks. So, being the smart stay-at-home student I am, I decided to look at the atof.c source code and adapt it to my needs. I think I'm on board with most of the operations this function does, like counting the digits before and after the decimal point; however, there is one line of code that I do not understand. I'm assuming this is the line that actually converts the ASCII digit into a digit of type int. Posting it here:
frac1 = 10*frac1 + (c - '0');
In the source code, c is the digit being processed and frac1 is an int that stores some of the digits from the incoming ASCII string. But why does c - '0' work? And as a follow-up, is there another way of achieving the same result?
There is no such thing as "text" in C. Just APIs that happen to treat integer values as text information. char is an integer type, and you can do math with it. Character literals are actually ints in C (in C++ they're char, but they're still usable as numeric values even there).
'0' is a nice way for humans to write "the ordinal value of the character for zero"; in ASCII, that's the number 48. Since the digits appear in order from 0 to 9 in all encodings I'm aware of, you can convert from the ordinal value in the encoding (e.g. ASCII) to actual numeric values by subtracting away '0' to get actual int values from 0 to 9.
You could just as easily subtract 48 directly (when compiled, it would be impossible to tell which option you used; 48 and ASCII '0' are indistinguishable), it would just be less obvious what you were doing to other people reading your source code.
The character '0' is the 48th character in ASCII (and in code page 437, the IBM PC's default character set); similarly, '1' is the 49th, and so on. Subtracting '0' instead of a magic number such as 48 is much clearer as far as self-documentation goes.
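As a follow-up to the question, here is a minimal sketch of the same c - '0' trick applied to a whole string (a hypothetical atoi-style helper, ignoring signs and overflow):
#include <stdio.h>

/* hypothetical helper: converts a string of decimal digits to an int */
static int my_atoi(const char *s)
{
    int value = 0;
    while (*s >= '0' && *s <= '9') {
        value = 10 * value + (*s - '0');   /* same trick as in atof.c */
        s++;
    }
    return value;
}

int main(void)
{
    printf("%d\n", my_atoi("1234"));   /* prints: 1234 */
    return 0;
}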

Interpreting a hex number as decimal

I have an embedded system running C-code which works pretty straightforwardly; it reads characters via a serial connection, interprets the received chars as hexadecimal numbers and depending on what was received, proceeds to do something else. However, there is one very special case where the chars received are decimal instead of hex. Since this case is very rare (kind of an error case of an error case), I don't wish to modify the actual character reception to decide whether to interpret the received value as dec or hex, but rather to add a quick algorithm into the case handling where I change the number into decimal.
What would you say is the fastest (as in most processor-efficient) way of doing this? Since the software is running on a small MCU, C library functions are not an option, as I don't wish to add any more unnecessary #includes; a purely mathematical algorithm is what I'm searching for.
Just to be clear, I'm not asking for the quickest way to do a basic hex-to-dec conversion, as in 0x45 -> dec 69; what I want to do is to transform e.g. 0x120 into decimal 120.
Thanks in advance!
EDIT: Sorry, I'll try to explain in more detail. The actual code is way too long, and I think pasting it here is unnecessary. So here's what happens:
First I read a received number from the serial line, let's say "25". Then I turn it into a hex number, so I have a variable with the read value, let's say X = 0x25. This already works fine, and I don't want to modify it. What I would like to do now, in this very special case, is just change the interpretation of the variable so that instead of X == 0x25, X == 25. Hexadecimal 0x25 turns into decimal 25. There has to be some kind of mathematical formula for such a change, without the need for any processor-specific instructions or library functions?
If I'm understanding correctly, you've already converted a stream of ASCII characters into a char/int variable, assuming them to be a stream of hex digits. In some cases they were actually a stream of decimal digits (e.g. you received 45 and, treating it as hex, got a variable with value 69, when in this one special case you actually want its value to be 45).
Assuming two characters (00-ff in general, but for "was meant to be decimal" we're talking 00-99), then:
int hexVal = GetHexStringFromSerialPort() ;
int decVal = 10*(hexVal >> 4) + (hexVal & 0x0f) ;
should do the trick. If you've got longer strings, you'll need to extend the concept further.
Just do a simple while loop like this, supposing onum, dnum, digit and place are unsigned integers:
dnum  = 0;
place = 1;
while (onum) {
    digit  = onum & 0xF;      /* pull out the least significant hex digit */
    dnum  += digit * place;   /* give it its decimal place value          */
    place *= 10;
    onum >>= 4;
}
This supposes that onum is really of the form that you describe (no hex digits greater than 9). It just extracts the least significant hex digit from your number each time round and adds it to your decimal result at the right place value.
Checking if your string starts with 0x characters and removing them should do the trick.
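Putting the loop answer above into a self-contained form (the function name is just for illustration):
#include <stdio.h>

/* reinterpret the hex digits of onum as decimal digits: 0x120 -> 120 */
static unsigned hex_digits_as_decimal(unsigned onum)
{
    unsigned dnum = 0, place = 1;
    while (onum) {
        dnum += (onum & 0xF) * place;   /* least significant hex digit, at its decimal place */
        place *= 10;
        onum >>= 4;
    }
    return dnum;
}

int main(void)
{
    printf("%u\n", hex_digits_as_decimal(0x120));   /* prints: 120 */
    printf("%u\n", hex_digits_as_decimal(0x25));    /* prints: 25  */
    return 0;
}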

Special char Literals

I want to assign a char with a char literal, but it's a special character, say 255 or 13. I know that I can assign my char a literal int that will be converted to a char:
char a = 13;
I also know that Microsoft will let me use the hex code as a char literal:
char a = '\xd';
I want to know if there's a way to do this that gcc supports also.
Writing something like
char ch = 13;
is mostly portable to platforms on which the value 13 means the same thing as on your platform (which is all systems that use the ASCII character set, and that is most systems today).
There may be platforms on which 13 means something else. However, using '\r' instead should always be portable, no matter the character encoding.
Using other values which do not have character-literal equivalents is not portable. And using values above 127 is even less portable, since then you're outside the ASCII table and into the extended range, where the characters can depend on the locale settings of the system. For example, western European and eastern European language settings will most likely have different characters in the 128 to 255 range.
If you want a byte that holds just some binary data rather than a letter, then instead of char you might want to use e.g. uint8_t, to tell other readers of your code that you're not using the variable for letters but for binary data.
The hexadecimal escape sequence is not specific to Microsoft. It's part of C and C++: http://en.cppreference.com/w/cpp/language/escape
Meaning that to assign a hexadecimal number to a char, this is cross-platform code:
char a = '\xD';
The question already demonstrates assigning a decimal number to a char:
char a = 13;
And octal numbers can be assigned as well, using just the backslash escape:
char a = '\023';
Incidentally, '\0' is common in C and C++ to represent the null character (independent of platform). '\0' is not a special escaped character in its own right; it actually invokes the octal escape sequence with the value zero.
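A quick sketch (assuming an ASCII execution character set) showing that the decimal, hexadecimal, octal and named-escape forms all store the same value:
#include <stdio.h>

int main(void)
{
    char a = 13;       /* decimal            */
    char b = '\xD';    /* hexadecimal escape */
    char c = '\015';   /* octal escape       */
    char d = '\r';     /* carriage return    */

    printf("%d %d %d %d\n", a, b, c, d);   /* prints: 13 13 13 13 */
    return 0;
}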

How long can a char be?

Why does int a = 'adf'; compile and run in C?
The literal 'adf' is a multi-byte character constant. Its value is platform dependent. Don't use it.
For example, on some platform a 32-bit unsigned integer could take the value 0x00616466, on another it could be 0x66646100, and on yet another it could be 0x84860081...
This, as Kerrek said, is a multi-byte character constant. It works because each character takes up 8 bits. 'adf' is 3 characters, which is 24 bits. An int is usually large enough to contain this.
But all of the above is platform dependent, and could be different from architecture to architecture. This kind of thing is still used in ancient Apple code, can't quite remember where, although file creator codes ring a bell.
Note the difference in syntax between " and '.
char *x = "this is a string. The value assigned to x is a pointer to the string in memory"
char y = '!' // the value assigned to y is the numerical character value of the character '!'
char z = 'asd' // the value of z is the numerical value of the 'string' data, which can in theory be expressed as an int if it's short enough
It works just because 'adf' is 3 ASCII characters and thus 3 bytes long, and your platform's int is 24 bits or larger. It would fail on a 16-bit system, for instance.
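A small sketch of what actually ends up in the int; the value is implementation-defined, and the one in the comment is just a common packing (e.g. gcc on x86):
#include <stdio.h>

int main(void)
{
    int a = 'adf';                         /* multi-character constant, implementation-defined value */
    printf("0x%08X\n", (unsigned)a);       /* e.g. 0x00616466 with gcc on x86 */
    return 0;
}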
It's also worth remembering that although sizeof(char) will always return 1, depending on the platform and compiler more than 1 byte of memory can end up being used for a char because of struct padding; hence for
struct st
{
    int a;
    char c;
};
sizeof(struct st) will return 8 on a number of 32-bit systems. This is because the compiler pads the single byte for char c out to 4 bytes to keep the struct aligned.
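A sketch you can run to check this on your own system (repeating the struct so the snippet is self-contained); the value 8 is typical where int is 4 bytes with 4-byte alignment, but the padding is implementation-defined:
#include <stdio.h>

struct st
{
    int  a;
    char c;
};

int main(void)
{
    /* typically prints "1 8": the char itself is 1 byte,
       but the struct is padded to a multiple of int's alignment */
    printf("%zu %zu\n", sizeof(char), sizeof(struct st));
    return 0;
}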
ASCII: every character has a numerical value. Halfway through this tutorial there is a description, if you need more information: http://en.wikibooks.org/wiki/C_Programming/Variables
Edit:
char letter1 = 'a'; /* clearer */
char letter2 = 97;  /* in ASCII, 97 = 'a' */
The second form is considered by some to be extremely bad practice if it is used to store a character rather than a small number, because anyone reading the code is forced to look up which character corresponds to the number 97 in the encoding scheme. In the end, letter1 and letter2 both store the same thing, the letter 'a', but the first form is clearer, easier to debug, and much more straightforward.
One important thing to mention is that characters for numerals are represented differently from their corresponding number, i.e. '1' is not equal to 1.
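For example, a minimal sketch of that distinction (assuming ASCII):
#include <stdio.h>

int main(void)
{
    printf("%d\n", '1');        /* prints 49, the character code, not 1 */
    printf("%d\n", '1' - '0');  /* prints 1, the corresponding number   */
    return 0;
}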

Endianness -- why do chars put in an Int16 print backwards?

The following C code, compiled and run in XCode:
UInt16 chars = 'ab';
printf("\nchars: %2.2s", (char*)&chars);
prints 'ba', rather than 'ab'.
Why?
That particular implementation seems to store multi-character constants in little-endian format. In the constant 'ab' the character 'b' is the least significant byte (the little end) and the character 'a' is the most significant byte. If you viewed chars as an array, it'd be chars[0] = 'b' and chars[1] = 'a', and thus would be treated by printf as "ba".
Also, I'm not sure how accurate you consider Wikipedia, but regarding C syntax it has this section:
Multi-character constants (e.g. 'xy') are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified, portable use of multi-character constants is difficult.
So it appears the 'ab' multi-character constant format should be avoided in general.
It depends on the system you're compiling/running your program on.
Obviously, on your system the 16-bit value 0x6162 ('ab') is stored in memory as the byte sequence 0x62, 0x61 ('b', 'a'): the little-endian way.
When you ask it to print a string, printf reads the value you have stored in memory byte by byte, which is actually 'b' then 'a'. Thus your result.
Multicharacter character literals are implementation-defined:
C99 6.4.4.4p10: "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."
gcc and icl print ba on Windows 7. tcc prints a and drops the second letter altogether...
The answer to your question can be found in your tags: Endianness. On a little endian machine the least significant byte is stored first. This is a convention and does not affect efficiency at all.
Of course, this means that you cannot simply cast it to a character string, since the order of the characters comes out wrong: a character string has no notion of significant bytes, just a sequence.
If you want to view the bytes within your variable, I suggest using a debugger that can read the actual bytes.
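Alternatively, a small sketch of inspecting the bytes directly; the printed order depends both on how the implementation packs the multi-character constant and on the machine's endianness:
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned short chars = 'ab';              /* implementation-defined value, e.g. 0x6162 */
    unsigned char  bytes[sizeof chars];

    memcpy(bytes, &chars, sizeof chars);      /* copy out the raw bytes */
    printf("%c %c\n", bytes[0], bytes[1]);    /* e.g. "b a" on a little-endian machine */
    return 0;
}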
