bitwise-and with HEX and CHAR in C - c

I'm really getting frustrated here. Trying to implement the CRC-CCITT algorithm and I found a very nice example on an Internet site.
There is one line whose output I completely don't understand:
unsigned short update_crc_ccitt( unsigned short crc, char c ){
    [...]
    short_c = 0x00ff & (unsigned short) c;
    [...]
}
I want to calculate the CRC of the "test" string "123456789". So in the first run the char 'c' is 1. From my understanding short_c from the first run should be equal to 1 as well, but when I print it to the console, I get short_c = 49 for c = 1. How?
0x00ff in binary is: 1 1 1 1 1 1 1 1
char 1 in binary is: 0 0 0 0 0 0 0 1
bitand should be : 0 0 0 0 0 0 0 1
Where is my mistake?

The character 1 has ASCII code 0x31 = 49. This is different from the character with ASCII code 1 (which is ^A).

You are confusing characters and numbers, basically. The first letter in the string "123456789" is the character '1', whose decimal value on most typical computers is 49.
This value is decided by the encoding of the characters, which describes how each character is assigned a numerical value which is what your computer stores.
C guarantees that the encoding for the 10 decimal digits will be in a compact sequence with no gaps, starting with '0'. So, you can always convert a character to the corresponding number by doing:
const int digitValue = digit - '0';
This will convert the digit '0' to the integer 0, and so on for all the digits up to (and including) '9'.

Related

Converting char to int using typecasting makes 1s into 49 and 0s into 48 in C?

I have a char array holding the binary representation of one byte, and I am trying to split it into a two-dimensional int array (low nybble and high nybble). This is my code:
int nybbles[2][4]; // [0][] is low nybble, [1][] is high nybble.
for (int i = 0; i < 4; i++) {
    nybbles[0][i] = (int)binarr[i];
    nybbles[1][i] = (int)binarr[4 + i];
    printf("%c%d ", binarr[i], nybbles[0][i]);
    printf("%c%d\n", binarr[4 + i], nybbles[1][i]);
}
The output of this is:
048
149
048
048
149
048
048
048
I can easily fix this by adding a "- 48" to the end of both lines of code, as such:
nybbles[0][i] = (int)binarr[i] - 48;
nybbles[1][i] = (int)binarr[4 + i] - 48;
However, I see this as a very brute force solution. Why does this problem exist anyway? Are there better fixes than mine?
The values 48 and 49 are the ASCII codes for the characters '0' and '1'.
Rather than subtracting 48, subtract '0'. This makes it more clear what you're doing.
nybbles[0][i] = binarr[i] - '0';
nybbles[1][i] = binarr[4 + i] - '0';
Characters are encoded using numeric values. In ASCII, the code for '0' is 48, '1' is 49, and so on; what you are doing is using this to deduce the integer values of these characters.
Example:
'0' - '0' is equal to 0. Why? Because 48 - 48 is equal to 0, just as '1' - '0' is equal to 1, meaning 49 - 48 is equal to 1; you see the pattern.
There is no brute force here, just character arithmetic; it's quite common. I would just use:
nybbles[0][i] = binarr[i] - '0';
Besides being clearer, it's more portable, given that ASCII is not the only encoding in existence; but all encodings C supports have contiguous digit codes.
The number zero is not the same as the digit "0". On your platform, the code for the digit "0" happens to be 48.
Fundamentally, values and representations are different. Ten is the number of fingers I have. It can be represented as "X", "10", "0x0A", "ten", or "..........".
The digit "0" can be used to represent the value zero. Or it can be used for other purposes. Regardless of what you use it for, on your platform that character is represented by the character code whose value is forty-eight.
It is absolutely vital for programmers to understand that values and representations are different things. Use '0' when you need the value your platform uses to represent the character "0".
This is because of the conversion between ASCII and byte values. The computer just sees bytes (numbers), and those numbers mean different things depending on the context. When we're talking about characters and strings, we're normally talking about the ASCII text encoding, which has its own system. For instance, the value 0 is the null character in ASCII, the value 48 is '0', 57 is '9', and 65 is 'A'. Interesting, right? So, to get the numeric value of an ASCII digit character, you must subtract the ASCII offset (the digit characters start at code 48, therefore subtract 48). The capital letters, on the other hand, have an offset of 65.
Doing a search with terms such as "ascii byte binary table values comparison" will bring you to pages like this IBM-provided table.
The best solution in code is to use a to_string function, which is available in most languages. In C, you might have to look for a good external library, or just live with the arithmetic.
You experience this behavior because the digits you get from the original string are characters encoded in ASCII format.
So, 48 = 0x30 and 49 = 0x31 are the ASCII values to represent '0' and '1' characters respectively.
For the record, this "brute force" subtraction is how digit characters are usually converted to the corresponding integer values. The following expression is just to show how it works:
char charDigit = '4';
int digit = charDigit - 48; // digit is equal to the integer 4, because '4' is encoded as 52 (0x34 in hex)
Even better (it is the expression that is actually commonly used):
char charDigit = '4';
int digit = charDigit - '0'; // digit is equal to integer 4
It is better because it works not only for ASCII encoding.

Confusion regarding getchar() in c [duplicate]

Can someone explain why this works?
char c = '9';
int x = (int)(c - '0');
Why does subtracting '0' from the ASCII code of a char result in the number that that char represents?
Because the characters are all represented by numbers, and '0' is the first of the digits.
On the table below you see that:
'0' => 48
'1' => 49
'9' => 57.
As a result: ('9' - '0') = (57 − 48) = 9
Source: http://www.asciitable.com
char is an integer type, just like int and family. An object of type char has some numerical value. The mapping between characters that you type in a character literal (like '0') and the value that the char object has is determined by the encoding of that character in the execution character set:
C++11 §2.14.3:
An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
C99 §6.4.4.4:
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.
[...]
An integer character constant has type int.
Note that the int can be converted to a char.
The choice of execution character set is up to the implementation. More often than not, the choice is ASCII compatible, so the tables posted in other answers have the appropriate values. However, the character set does not need to be ASCII compatible. There are some restrictions, though. One of them is as follows (C++11 §2.3, C99 §5.2.1):
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
[...]
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
This means that whatever value the character '0' has, the character '1' has value one more than '0', and character '2' has value one more than that, and so on. The numeric characters have consecutive values. You can summarise the mapping like so:
Character:            0    1    2    3    4    5    6    7    8    9
Corresponding value:  X  X+1  X+2  X+3  X+4  X+5  X+6  X+7  X+8  X+9
All of the digit characters have values offset from the value of '0'.
That means, if you have a character, let's say '9' and subtract '0' from it, you get the "distance" between the value of '9' and the value of '0' in the execution character set. Since they are consecutive, the distance will be 9.
Because the C standard guarantees that the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 always appear in this order with consecutive character codes. So, if you subtract the character code of '0' from that of another digit, you get its position relative to '0', which is its value...
From the C standard, Section 5.2.1 Character sets:
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous
Because the literals are arranged in sequence.
So if '0' is 48 in ASCII, '1' will be 49, '2' will be 50, and so on. Then x would contain the ASCII value of '9' minus the ASCII value of '0': since the ASCII value of '9' is 57, x would contain 57 - 48 = 9.
Also, char is an integral type.
The ASCII codes of the numeric characters are ordered '0' '1' '2' '3' '4' '5' '6' '7' '8' '9', as shown in the ASCII table.
So if we take the difference between the ASCII code of '9' and the ASCII code of '0', we get 9.
In the ASCII table the digits are arranged sequentially, starting with the lowest code for '0'. If you subtract the code of '0' from the code of a higher digit, you get the difference between the two ASCII values.
So '9' has the value 57 and '0' has 48; if you subtract 48 from 57 you get 9.
Just have a look at the ASCII-table.
Look at the ASCII TABLE:
'9' in ASCII = 57 //in Decimal
'0' in ASCII = 48 //in Decimal
57 - 48 = 9
First, try:
cout << (int)'0' << endl;
now try:
cout << (int)'9' << endl;
The characters represent numbers in text form, but have different values when taken as numbers.
Windows uses a number to decide which character to print. So the number 0x30 represents the character 0 in the Windows OS, and the number 0x39 represents the character 9. After all, all a computer can recognize is numbers; it doesn't know what a "char" is.
Unfortunately, (int)('f' - '0') does not equal 15, though.
This gives you the various characters and the numbers Windows uses to represent them:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd375731(v=vs.85).aspx
If you need to find that for another OS, you can search for "virtual key codes <OSname>" on Google to see what other OSs use as their codes.

Why it takes only the last 4 bits in my following example?

#include <stdio.h>

int main()
{
    char c = 48;
    int i, mask = 01;
    for (i = 1; i <= 5; i++)
    {
        printf("%c", c | mask);
        mask = mask << 1;
    }
    return 0;
}
I am learning for an exam, and this was a solved exercise with the answer: 12480, because %c takes only the last 4 bits. I don't understand why, as I know that sizeof(char) is 1 byte.
Let's look at the binary representation of 48:
2^ 7 6 5 4 3 2 1 0
---------------
0 0 1 1 0 0 0 0
The last 4 bits are not set. On the first 4 iterations of the loop, the mask sets one of those bits in the result, and the result reflects the change. On the last iteration, the mask is setting a bit which is already set, so there is no change for the last character printed.
As for what is being printed, in ASCII 48 is the character code for 0. The following digits are in order after that one. When you mask in the values for the first 4 iterations, because none of the bits in the value 48 are set in the mask, it is effectively the same as adding. So the first 4 characters printed have ASCII values 49 (1), 50 (2), 52 (4), and 56 (8).
Character code 48 is usually 0. When it's or'd with one, it prints digit 1. Then it prints the digits 2, 4, 8. Finally, 48|16 = 48, so it prints 0.
It takes all of them; however, on the last iteration you OR that 1 with a bit that is already 1 (48 is 110000 in binary).

BCD to Ascii and Ascii to BCD

I'm trying to convert a BCD to ascii and vice versa and saw a solution similar to this while browsing, but don't fully understand it. Could someone explain it?
void BCD_Ascii(unsigned char src, char *dest) {
    const char *outputs = "0123456789";
    *dest++ = outputs[src >> 4];
    *dest++ = outputs[src & 0xf];
    *dest = '\0';
}
Regarding your first question, explaining the method:
You are right about src>>4: this shifts the value 4 bits to the right, which means it returns the value of the higher BCD digit.
e.g. if src is 0x30 then src>>4 will evaluate to 3.
src&0xf gets the lower BCD digit by ANDing the src value with 0xF, which is the binary value 1111 (not 11111111). e.g. if src is 0x46 then src&0xf will evaluate to 6.
There are two important notes here while trying to understand the method:
First: the method cannot handle input where either of src's two digits is above 9. If src were equal to 0x3F, for instance, the method would index past the end of outputs.
Second: beware that this method adds the two digit characters at a certain location in a string and then terminates the string. The caller logic should be responsible for where the location is, incrementing the pointer, and making sure the output buffer allows three characters (at least) after the input pointer location.
Regarding your second question:
A reverse method could be as following:
unsigned char Ascii_BCD(const char* src) {
return (unsigned char)((src[0] - 0x30) * 0x10 + src[1] - 0x30);
}
[Edit: adding explanation to the reverse method]
The two ASCII digits at locations 0 and 1 have 0x30 (i.e. '0') subtracted from them to convert from ASCII to binary. E.g. the digit '4' is represented by the ASCII code 0x34, so subtracting 0x30 yields 4.
Then the first digit which is the higher is multiplied by 0x10 to shift the value by 4 bits to the left.
The two values are added to compose the BCD value.
The opposite function can be:
BYTE ASC_BCD( char *asc ) /* BYTE is assumed to be a typedef for unsigned char */
{
    return (BYTE)( ( ( asc[0] & 15 ) << 4 ) | ( asc[1] & 15 ) );
}
Character codes '0'..'9' can be converted to their numeric values with & 15 or & 0x0F. Then shift and | to combine.
The function converts a character in binary-coded decimal into a string.
First the upper 4 bits of src are obtained:
src>>4
The function then assumes the values those bits represent are in the range 0-9. Then that value is used to get an index in the string literal outputs:
outputs[src>>4];
The value is written into address which is pointed to by dest. This pointer is then incremented.
*dest++ = outputs[src>>4];
Then the lower 4 bits of src are used:
src&0xf
Again assuming the values of those bits, are representing a value in range 0-9. And the rest is the same as before:
*dest++ = outputs[src&0xf];
Finally a '\0' is written into dest, to terminate the string.

How is this bitwise AND operator masking the lower seven order bits of the number?

I am reading The C Programming Language by Brian Kernigan and Dennis Ritchie. Here is what it says about the bitwise AND operator:
The bitwise AND operator & is often used to mask off some set of bits, for example,
n = n & 0177
sets to zero all but the low order 7 bits of n.
I don't quite see how it is masking the lower seven order bits of n. Please can somebody clarify?
The number 0177 is an octal number representing the binary pattern below:
0000000001111111
When you AND it using the bitwise operation &, the result keeps the bits of the original only in the bits that are set to 1 in the "mask"; all other bits become zero. This is because "AND" follows this rule:
X & 0 -> 0 for any value of X
X & 1 -> X for any value of X
For example, if you AND 0177 and 0545454, you get
0000000001111111 -- 0000177
0101010101010101 -- 0545454
---------------- -------
0000000001010101 -- 0000154
In C an integer literal prefixed with 0 is an octal number so 0177 is an octal number.
Each octal digit (of value 0 to 7) is represented with 3 bits and 7 is the greatest value for each digit. So a value of 7 in octal means 3 bits set.
Since 0177 is an octal literal and each octal digit is three bits, you have the following binary equivalents:
7 = 111
1 = 001
Which means 0177 is 001111111 in binary.
It has already been explained that a leading '0' marks an octal number in C. The number 0177 (octal) is the same as 127 (decimal), which is 128-1 and can also be represented as 2^7-1; in binary, 2^n-1 means n 1-bits in the low-order positions.
0177 = 127 = 128 - 1
which is a bitmask (shown here as a 32-bit value):
00000000000000000000000001111111
You can check the code down below;
Demo
#include <stdio.h>

int main()
{
    int n = 0177; // octal representation of 127
    printf("Decimal:[%d] : Octal:[%o]\n", n, n);

    n = 127; // decimal representation of 127
    printf("Decimal:[%d] : Octal:[%o]\n", n, n);
    return 0;
}
Output
Decimal:[127] : Octal:[177]
Decimal:[127] : Octal:[177]
0177 is an octal value; each digit is represented by 3 bits, with values from 000 to 111, so 0177 translates to 001111111 (i.e. 001|111|111). Considered as a 32-bit binary value (it could be 64-bit too, except that the remaining bits are populated according to the MSB, i.e. the sign bit, which is 0 in this case), that is 00000000000000000000000001111111. Performing a bitwise AND with it on a given number outputs the lower 7 bits of the number, turning the rest of the bits in the n-bit number to 0
(since x&0 = 0 and x&1 = x, e.g. 0&0=0, 1&0=0, 1&1=1, 0&1=0).
