I am reading a book, "Network Programming with C" by Lewis Van Winkle, and I came across a line of code in Chapter 5, page 153, whose purpose I do not understand. The context is DNS resolution, parsing a DNS message header. What is the value or practical application of the following code:
/* msg[] is unsigned char*/
const int qdcount = (msg[4] << 8) + msg[5];
I understand that int is a wider type than char, but why is the extra width useful here? Is it for hardware interfacing at the firmware level if you choose to go deeper, or what? To me it appears that shifting by 8 bits, which is one byte, effectively just multiplies the value by two, so why even use this method instead of just using:
2 * msg[5]
I'm not sure where to start looking for an answer because the problem isn't clear to me.
msg[4] contains the most significant eight bits out of a 16-bit number. msg[5] contains the least significant eight bits.
msg[4] << 8 takes the byte containing the most significant eight bits of the 16-bit number and shifts it left eight positions, so those bits land in the positions of the most significant eight bits of the result. Then + msg[5] puts the byte containing the least significant bits into the low eight bits. This reconstructs the 16-bit number that msg[4] and msg[5] came from.
msg[4] and msg[5] generally cannot be loaded with a single load of a 16-bit integer, both because C code may run on implementations where the more significant byte sits at the higher address rather than the lower one (little-endian machines), and because of the type and aliasing rules in the C standard.
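For a concrete feel, here is a minimal sketch with made-up header bytes (only the QDCOUNT field matters here) showing why the shift is not just a multiply-by-two:

#include <stdio.h>

int main(void)
{
    /* Hypothetical start of a DNS message; bytes 4 and 5 hold QDCOUNT
       in network (big-endian) byte order. */
    unsigned char msg[] = {0xAB, 0xCD, 0x01, 0x00, 0x01, 0x02};

    const int qdcount = (msg[4] << 8) + msg[5];   /* 0x0102 = 258 */

    printf("qdcount = %d\n", qdcount);        /* prints 258 */
    printf("2 * msg[5] = %d\n", 2 * msg[5]);  /* prints 4, not the same thing */
    return 0;
}

Shifting left by eight multiplies by 256, not by two, so the high byte ends up in the top half of the 16-bit count.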
There is a 2 byte packet. The first 5 bits represent the version, the next 5 bits represent the type, and the last 6 bits represent the value. I am passing the packet as a char* to a function (print_pak). In that function, I want to print the version, type, and packet value. How can I do that?
void print_pak(char* pak)
{
}
You need to use bit masking and shifting for this. Also, when doing bit manipulation, it's generally preferable to work with unsigned integer types, since shifting negative values is implementation-defined or undefined behavior in the C standard.
#include <stdio.h>

void
print_pak(const unsigned char *pak)
{
    unsigned char version, type, value;

    version = (pak[0] >> 3);                          /* top 5 bits of byte 0 */
    type = ((pak[0] & 0x07) << 2) + (pak[1] >> 6);    /* low 3 bits of byte 0, top 2 bits of byte 1 */
    value = pak[1] & 0x3f;                            /* low 6 bits of byte 1 */

    printf("Version = %u, Type = %u, Value = %u\n", version, type, value);
}
Here's how this works. The << and >> operators are for bit shifting. Suppose you have an unsigned char, x, which holds the bits 10111010. Shifting it right by 3 bits is done by (x >> 3) (operator precedence always trips me up when doing bit manipulation so I throw in parentheses to be safe). The right three bits can't go anywhere and so they fall off. The result is 00010111. Shifting to the left works the same (sort of).
Bit masking, &, implements binary "and". That is, if x is 10101011 and y is 00011111, then x&y only has 1's where x and y share them. So, it would be 00001011.
Let's take all of this and tackle your problem. The version is the first five bits of the first byte. Therefore, we want to right shift the low three bits off.
The type is the last three bits of the first byte followed by the first two bits of the second byte. That first group can be acquired by masking off the high five bits of the first byte (0x07 is 00000111 in binary). The second group can be acquired by right shifting the second byte by six bits. Then, to put them together, you need to left shift the first group by two bits to make room for the second group.
Finally, the value is the low six bits of the second byte which can be acquired by a simple masking of those bits (0x3f is 00111111 in binary).
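To see it in action, here is a self-contained sketch that packs a hypothetical packet by hand (version 5, type 9 and value 33 are made-up numbers, and print_pak is repeated so the example compiles on its own):

#include <stdio.h>

static void
print_pak(const unsigned char *pak)
{
    unsigned char version = pak[0] >> 3;
    unsigned char type = ((pak[0] & 0x07) << 2) + (pak[1] >> 6);
    unsigned char value = pak[1] & 0x3f;
    printf("Version = %u, Type = %u, Value = %u\n", version, type, value);
}

int main(void)
{
    /* Hypothetical packet: version 5 (00101), type 9 (01001), value 33 (100001).
       Concatenated: 00101010 01100001, i.e. bytes 0x2A 0x61. */
    unsigned char pak[2] = {0x2A, 0x61};
    print_pak(pak);   /* prints: Version = 5, Type = 9, Value = 33 */
    return 0;
}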
The ISO C standard states that a ''plain'' int object has the natural size suggested by the architecture of the execution environment.
However, it is also guaranteed that int is at least as large as short, which is at least 16 bits in size.
The natural size suggested by an 8-bit processor, such as a 6502 or 8080, would seem to be an 8-bit int; however, that would make int shorter than 16 bits.
So, how large would int be on one of these 8 bit processors?
The 6502 had only the program counter as a 16-bit register; 16-bit integers were handled eight bits at a time, using multiple instructions. For example, a 16-bit c = a + b looks like this:
clc        ; clear the carry flag
lda A_lo   ; load the low byte of A into the accumulator
adc B_lo   ; add the low byte of B; a carry out sets the carry flag
sta C_lo   ; store the result to the low byte of C
lda A_hi   ; load the high byte of A into the accumulator
adc B_hi   ; add the high byte of B plus the carry flag
sta C_hi   ; store the result to the high byte of C
The 8080 and Z80 CPUs of that era had 16-bit registers as well.
The Z80 was still an 8-bit architecture. Its 16-bit registers were actually pairs of two 8-bit registers, such as BC and DE. Operations on them were much slower than on the 8-bit registers because the underlying architecture was 8-bit, but this way 16-bit registers and 16-bit operations were available.
The 8088 architecture was mixed: it had an 8-bit data bus but 16-bit registers (AX, BX, etc.), whose low and high bytes were also separately usable as 8-bit registers (AL, AH, etc.).
So there were different ways to work with 16-bit integers, but an 8-bit int is simply not a very useful integer type. That is why C and C++ used at least 16 bits for int even on these processors.
From Section 6.2.5 Types, p5
5 An object declared as type signed char occupies the same amount of storage as a ''plain'' char object. A ''plain'' int object has the natural size suggested by the architecture of the execution environment (large enough to contain any value in the range INT_MIN to INT_MAX as defined in the header <limits.h>).
And 5.2.4.2.1 Sizes of integer types <limits.h> p1
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
...
minimum value for an object of type int
    INT_MIN    -32767    // -(2^15 - 1)
maximum value for an object of type int
    INT_MAX    +32767    // 2^15 - 1
So even on those platforms, int must be at least 16 bits.
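If you want to see what your own implementation chose, here is a small sketch using <limits.h> (the output varies by platform, but INT_MAX is guaranteed to be at least 32767):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The standard guarantees INT_MIN <= -32767 and INT_MAX >= +32767,
       so int must be at least 16 bits on any conforming implementation. */
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    printf("INT_MIN = %d, INT_MAX = %d\n", INT_MIN, INT_MAX);
    return 0;
}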
Assume I have this generic function that swaps two variables:
void swap(void *v1, void *v2, int size){
char buffer[size];
memcpy(buffer, v1, size);
memcpy(v1, v2, size);
memcpy(v2, buffer, size);
}
It works fine, but I was wondering in what cases this might break. One case that comes to mind is when we have two different data types and the size specified is not enough to capture the bigger data. for example:
int x = 4444;
short y = 5;
swap(&x, &y, sizeof(short));
I'd expect that when I run this it would give an incorrect result, because memcpy would work with only 2 bytes (rather than 4) and part of the data would be lost or changed when dealing with x.
Surprisingly though, when I run it, it gives the correct answer on both my Windows 7 and Ubuntu operating systems. I know that Ubuntu and Windows differ in endianness but apparently that doesn't affect any of the two systems.
I want to know why the generic function works fine in this case.
To understand this fully you have to understand the C standard and the specifics of your machine and compiler. Starting with the C standard, here are some relevant snippets [the standard I'm using is WG14/N1256], summarized a little:
The object representation for a signed integer consists of value bits,
padding bits, and a sign bit. [section 6.2.6.2.2].
These bits are stored in a contiguous sequence of bytes. [section
6.2.6.1].
If there are N value bits, they represent powers of two from 2^0 to 2^{N-1}. [section 6.2.6.2].
The sign bit can have one of three meanings, one of which is that it has value -2^N (two's complement) [section 6.2.6.2.2].
When you copy bytes from a short to an int, you're copying the value bits, padding bits and the sign bit of the short to bits of the int, but not necessarily preserving the meaning of the bits. Somewhat surprisingly, the standard allows this except it doesn't guarantee that the int you get will be valid if your target implementation has so-called "trap representations" and you're unlucky enough to generate one.
In practice, you've found on your machine and your compiler:
a short is represented by 2 bytes of 8 bits each.
The sign bit is bit 7 of the second byte
The value bits in ascending order of value are bits 0-7 of byte 0, and bits 0-6 of byte 1.
There are no padding bits
an int is represented by 4 bytes of 8 bits each.
The sign bit is bit 7 of the fourth byte
The value bits in ascending order of value are bits 0-7 of byte 0, 0-7 of byte 1, 0-7 of byte 2, and 0-6 of byte 3.
There are no padding bits
You would also find out that both representations use two's complement.
In pictures (where SS is the sign bit, and the numbers N correspond to a bit that has value 2^N):
short:
07-06-05-04-03-02-01-00 | SS-14-13-12-11-10-09-08
int:
07-06-05-04-03-02-01-00 | 15-14-13-12-11-10-09-08 | 23-22-21-20-19-18-17-16 | SS-30-29-28-27-26-25-24
You can see from this that if you copy the bytes of a short to the first two bytes of a zero int, you'll get the same value if the sign bit is zero (that is, the number is positive) because the value bits correspond exactly. As a corollary, you can also predict you'll get a different value if you start with a negative-valued short since the sign bit of the short has value -2^15 but the corresponding bit in the int has value 2^15.
The representation you've found on your machine is often summarized as "two's complement, little-endian", but the C standard provides a lot more flexibility in representations than that description suggests (even allowing a byte to have more than 8 bits), which is why portable code usually avoids relying on bit/byte representations of integral types.
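As a concrete check of that corollary, here is a minimal sketch (reusing the swap from the question, and assuming the little-endian, two's complement layout described above, with a 2-byte short and a 4-byte int) contrasting a positive and a negative short:

#include <stdio.h>
#include <string.h>

static void swap(void *v1, void *v2, int size)
{
    char buffer[size];
    memcpy(buffer, v1, size);
    memcpy(v1, v2, size);
    memcpy(v2, buffer, size);
}

int main(void)
{
    int x = 4444;            /* fits in 16 bits, high bytes are zero */
    short y = 5;
    swap(&x, &y, sizeof(short));
    printf("x = %d, y = %hd\n", x, y);   /* typically: x = 5, y = 4444 */

    int a = 4444;
    short b = -5;            /* sign bit of the short lands on a value bit of the int */
    swap(&a, &b, sizeof(short));
    printf("a = %d, b = %hd\n", a, b);   /* typically: a = 65531, not -5 */
    return 0;
}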
As has already been pointed out in the comments, the systems you are using are typically little-endian (least significant byte at the lowest address). Given that, the memcpy copies the short into the lowest-order bytes of the int.
You might enjoy looking at Bit Twiddling Hacks for 'generic' ways to do swap operations.
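One well-known trick from that page is the XOR swap, which avoids the temporary buffer entirely; a minimal sketch (it only applies to integer types and must not be used when both pointers refer to the same object):

#include <stdio.h>

/* XOR swap of two unsigned ints; swapping an object with itself
   would zero it out, hence the a != b guard. */
static void xor_swap(unsigned int *a, unsigned int *b)
{
    if (a != b) {
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
}

int main(void)
{
    unsigned int x = 4444, y = 5;
    xor_swap(&x, &y);
    printf("x = %u, y = %u\n", x, y);   /* x = 5, y = 4444 */
    return 0;
}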
I have a card game where I need to display the value of each card after shuffling. I display the value of a card using values[(x >> 1) & 0xf], where x iterates through the list of 13 cards; bits 1-4 hold the value of the card.
The above describes the card type.
But when it comes to finding the highest pair in the hand, it only works when I use values[(afterfindingpairs[a]&0xf0)>>4].
This is because bits 0-3 are the number of pairs, whereas bits 4-7 are the value of the pair in the pair-type byte.
It just displays the highest pair as Ace when I use
values[(afterfindingpairs[a]&0xf)>>4].
I'm confused: wouldn't the hexadecimal 0xf0 deal with all 8 bits rather than just the 4 bits between 4 and 7 of the pair type, which I would have expected to get with values[(afterfindingpairs[a]&0xf)>>4]? Yet that expression turns out to be incorrect.
Explanation as to why this happens would be much appreciated.
You appear to want to manipulate 8-bit values, extracting various ranges of bits. However in some cases you're doing so in such a way as to discard all the bits.
The 8 bits are arranged from the least significant (bit 0, with value 1 in decimal) to the most significant (bit 7, with value 128 in decimal).
So if we had the binary number 10010110, this would represent the number (128 + 16 + 4 + 2), or 150, or 0x96 in hex.
If you apply a right-shift to such a number, the bits will be moved to the right by the appropriate number of places. So if we did >>4 to the number above, the result will be 00001001 - or 9. I have assumed we are dealing with unsigned values here, so the upper bits will be filled in with '0'. Note that the result is that the original bits 4-7 are now bits 0-3, and the original bits 0-3 have been discarded.
If you AND two numbers (the & operator), only the bits which are set in both will be set in the result. So effectively this is masking bits. If you mask with 0xf0, which is 11110000 in binary, only the upper bits 4-7 will remain in the result, and the lower bits 0-3 will be set to zero.
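Putting those two operations into code, using the 10010110 (0x96) example from above:

#include <stdio.h>

int main(void)
{
    unsigned char x = 0x96;              /* 10010110 binary, 150 decimal */

    printf("%02x\n", x >> 4);            /* 09: bits 4-7 shifted down, bits 0-3 discarded */
    printf("%02x\n", x & 0xf0);          /* 90: bits 4-7 kept in place, bits 0-3 cleared */
    printf("%02x\n", (x & 0xf0) >> 4);   /* 09: mask, then shift */
    return 0;
}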
Take your statement:
values[(afterfindingpairs[a]&0xf0)>>4]
The expression afterfindingpairs[a]&0xf0, as per my explanation above, will simply set bits 0-3 to zero, retaining bits 4-7.
The next part of the expression, >>4 will shift those remaining bits down so they become bits 0-3 of the result. Note that this also discards the original bits 0-3, making the previous mask operation redundant (unless we are not dealing with 8-bit values...)
Your other statement:
values[(afterfindingpairs[a]&0xf)>>4]
Is more problematic. You first apply a mask (0xf) that retains only bits 0-3, setting all others to zero. Then you apply a shift that throws away bits 0-3 by shifting bits 4-7 (which are already zero) down into their place.
In other words, this latter expression is always zero.
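To make the difference concrete, here is a small sketch with a made-up pair byte (0xB2: pair value 0xB in bits 4-7, pair count 2 in bits 0-3):

#include <stdio.h>

int main(void)
{
    unsigned char pair = 0xB2;                  /* 1011 0010 */

    unsigned int correct = (pair & 0xf0) >> 4;  /* keep bits 4-7, shift them down: 0xB = 11 */
    unsigned int broken  = (pair & 0x0f) >> 4;  /* keep bits 0-3, then shift them away: always 0 */

    printf("correct = %u, broken = %u\n", correct, broken);
    return 0;
}

The second form always indexes values[0], which is presumably why you always see an Ace.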
I'm trying to read a binary file into a C# struct. The file was created from C, and the following code produces 2 of the bytes in each 50+ byte row.
unsigned short nDayTimeBitStuffed = atoi( LPCTSTR( strInput) );
unsigned short nDayOfYear = (0x01FF & nDayTimeBitStuffed);
unsigned short nTimeOfDay = (0x01F & (nDayTimeBitStuffed >> 9) );
The binary values in the file are 00000001 and 00000100.
The expected values are 1 and 2, so I think some bit ordering/swapping is going on, but I'm not sure.
Any help would be greatly appreciated.
Thanks!
The answer is 'it depends' - most notably on the machine, and also on how the data is written to the file. Consider:
unsigned short x = 0x0102;
write(fd, &x, sizeof(x));
On some machines (Intel), the low-order byte (0x02) will be written before the high-order byte (0x01); on others (PPC, SPARC), the high-order byte will be written before the low-order one.
So, from a little-endian (Intel) machine, you'd see the bytes:
0x02 0x01
But from a big-endian (PPC) machine, you'd see the bytes:
0x01 0x02
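A quick way to check which order your machine uses is to dump the bytes of that value directly:

#include <stdio.h>

int main(void)
{
    unsigned short x = 0x0102;
    unsigned char *p = (unsigned char *)&x;

    /* Little-endian machines print "02 01"; big-endian machines print "01 02". */
    printf("%02x %02x\n", p[0], p[1]);
    return 0;
}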
Your bytes appear to be 0x01 and 0x04. Your calculation for 0x02 appears flawed.
The C code you show doesn't write anything. The value in nDayOfYear is the bottom 9 bits of the input value; the nTimeOfDay appears to be the next 5 bits (so 14 of the 16 bits are used).
For example, if the value in strInput is 12141 decimal, 0x2F6D, then the value in nDayOfYear would be 365 (0x16D) and the value in nTimeOfDay would be 23 (0x17).
It is a funny storage order: you can't simply compare two packed values, whereas if you packed the day of year into the more significant portion of the value and the time into the less significant portion, you could compare packed values as simple integers and get the correct comparison.
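Here is that worked example as a compilable sketch, using the same masks as the code in the question:

#include <stdio.h>

int main(void)
{
    unsigned short nDayTimeBitStuffed = 0x2F6D;  /* 12141 decimal, as in the example above */

    unsigned short nDayOfYear = nDayTimeBitStuffed & 0x01FF;        /* low 9 bits  -> 365 */
    unsigned short nTimeOfDay = (nDayTimeBitStuffed >> 9) & 0x1F;   /* next 5 bits -> 23  */

    printf("day of year = %u, time of day = %u\n", nDayOfYear, nTimeOfDay);
    return 0;
}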
The expected file contents are very much related to the processor and compiler used to create the file, if it's binary.
I'm assuming a Windows machine here, which uses 2 bytes for a short and puts them in little endian order.
Your comments don't make much sense either. If it's two bytes then it should be using two chars, not shorts. The range of the first is going to be 1-365, so it definitely needs more than a single byte to represent. I'm going to assume you want the first 4 bytes, not the first 2.
This means that the first byte will be bits 0-7 of the DayOfYear, the second byte will be bits 8-15 of the DayOfYear, the third byte will be bits 0-7 of the TimeOfDay, and the fourth byte will be bits 8-15 of the TimeOfDay.