I am looking at the CoAP implementation in ContikiOS, particularly at the header parsing, and I am having some trouble understanding two operations. The code is below.
coap_pkt->version = (COAP_HEADER_VERSION_MASK & coap_pkt->buffer[0]) >> COAP_HEADER_VERSION_POSITION;
coap_pkt->type = (COAP_HEADER_TYPE_MASK & coap_pkt->buffer[0]) >> COAP_HEADER_TYPE_POSITION;
coap_pkt is the structure that houses the packet and the parsed values.
version is a uint8_t (unsigned char).
buffer houses the raw packet.
COAP_HEADER_VERSION_MASK is 0xC0.
COAP_HEADER_VERSION_POSITION is 6.
type is an enum with 4 values.
COAP_HEADER_TYPE_MASK is 0x30.
COAP_HEADER_TYPE_POSITION is 4.
Now, according to CoAP RFC 7252, both the version and the type occupy two bits each, so the masks and their values make sense to me.
My question is: why are the shift amounts different in the two operations? Does it have something to do with one field being an unsigned char and the other an enum?
Basically, both shifts move the masked value down to the least significant bits.
The version occupies the most significant bits, bits 7 and 6. So, the four possible values with the mask applied are 0xC0, 0x80, 0x40 and 0x00. For later use, e.g. for version comparisons, the range 3, 2, 1 and 0 is more convenient, so shifting by six bits moves the value down to bits 1 and 0. (In fact the mask is unnecessary in this case, because the shift discards all but the most significant two bits anyway.)
It's the same story with the type bits, but those are bits 5 and 4, giving you 0x30, 0x20, 0x10 and 0x00 after applying the mask. Shifting by four bits moves those bits down to bits 1 and 0.
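As a minimal, self-contained sketch of both extractions (the mask and position values are the ones quoted above; the sample byte and the standalone main are mine, not Contiki's):
#include <stdint.h>
#include <stdio.h>

#define COAP_HEADER_VERSION_MASK     0xC0  /* bits 7-6 */
#define COAP_HEADER_VERSION_POSITION 6
#define COAP_HEADER_TYPE_MASK        0x30  /* bits 5-4 */
#define COAP_HEADER_TYPE_POSITION    4

int main(void) {
  uint8_t first_byte = 0x54; /* 0b01010100: version bits = 01, type bits = 01 (NON) */

  uint8_t version = (COAP_HEADER_VERSION_MASK & first_byte) >> COAP_HEADER_VERSION_POSITION;
  uint8_t type    = (COAP_HEADER_TYPE_MASK & first_byte) >> COAP_HEADER_TYPE_POSITION;

  printf("version = %u, type = %u\n", version, type); /* version = 1, type = 1 */
  return 0;
}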
I am reading a book, "Network Programming with C" by Lewis Van Winkle, and I came across a line of code in Chapter 5, page 153, whose value I do not understand. The context is DNS resolution, parsing a DNS message header. What is the value or practical application of the following code:
/* msg[] is unsigned char */
const int qdcount = (msg[4] << 8) + msg[5];
I understand that int is a wider type than char, but why is the extra width of any value here? Is it for hardware interfacing at the firmware level if you choose to go deeper, or what? To me, it appears that the grouped expression would just resolve to the value of the original object, since shifting by 8 bits is equal to one byte, thus making it effectively just a product of itself with two. So why even use this method instead of just using:
2 * msg[5]
I'm not sure where to start looking for an answer because the problem is not clear to me.
msg[4] contains the most significant eight bits out of a 16-bit number. msg[5] contains the least significant eight bits.
msg[4] << 8 takes the byte containing the most significant eight bits of the 16-bit value and shifts it left eight positions, so those bits end up in the positions of the most significant eight bits of the result. Then + msg[5] puts the byte containing the least significant bits in the low eight bits. This reconstructs the 16-bit number that msg[4] and msg[5] came from.
msg[4] and msg[5] generally could not be loaded with a single 16-bit load, because C code may run on implementations where the more significant byte sits at the higher address rather than the lower one, and also because of the type and aliasing rules in the C standard.
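A small sketch of the idea (the sample bytes and the surrounding main are made up for illustration, not taken from the book):
#include <stdio.h>

int main(void) {
  /* Pretend these are bytes 4 and 5 of a DNS header, in network (big-endian) order. */
  unsigned char msg[6] = {0, 0, 0, 0, 0x01, 0x02};

  /* 0x01 becomes the high byte, 0x02 the low byte: 0x0102 = 258. */
  const int qdcount = (msg[4] << 8) + msg[5];

  printf("qdcount = %d\n", qdcount); /* prints 258, not 2 * msg[5] = 4 */
  return 0;
}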
Assume I have this generic function that swaps two variables:
void swap(void *v1, void *v2, int size){
char buffer[size];
memcpy(buffer, v1, size);
memcpy(v1, v2, size);
memcpy(v2, buffer, size);
}
It works fine, but I was wondering in what cases this might break. One case that comes to mind is when we have two different data types and the size specified is not enough to capture the bigger one. For example:
int x = 4444;
short y = 5;
swap(&x, &y, sizeof(short));
I'd expect that when I run this it would give an incorrect result, because memcpy would work with only 2 bytes (rather than 4) and part of the data would be lost or changed when dealing with x.
Surprisingly though, when I run it, it gives the correct answer on both my Windows 7 and Ubuntu operating systems. I know that Ubuntu and Windows differ in endianness, but apparently that doesn't affect either system.
I want to know why the generic function works fine in this case.
To understand this fully you have to understand the C standard and the specifics of your machine and compiler. Starting with the C standard, here are some relevant snippets [the standard I'm using is WG14/N1256], summarized a little:
The object representation for a signed integer consists of value bits, padding bits, and a sign bit. [section 6.2.6.2.2]
These bits are stored in a contiguous sequence of bytes. [section 6.2.6.1]
If there are N value bits, they represent powers of two from 2^0 to 2^{N-1}. [section 6.2.6.2]
The sign bit can have one of three meanings, one of which is that it has value -2^N (two's complement). [section 6.2.6.2.2]
When you copy bytes from a short to an int, you're copying the value bits, padding bits and the sign bit of the short to bits of the int, but not necessarily preserving the meaning of the bits. Somewhat surprisingly, the standard allows this except it doesn't guarantee that the int you get will be valid if your target implementation has so-called "trap representations" and you're unlucky enough to generate one.
In practice, you've found that on your machine and compiler:
a short is represented by 2 bytes of 8 bits each.
The sign bit is bit 7 of the second byte.
The value bits, in ascending order of value, are bits 0-7 of byte 0 and bits 0-6 of byte 1.
There are no padding bits.
an int is represented by 4 bytes of 8 bits each.
The sign bit is bit 7 of the fourth byte.
The value bits, in ascending order of value, are bits 0-7 of byte 0, 0-7 of byte 1, 0-7 of byte 2, and 0-6 of byte 3.
There are no padding bits.
You would also find out that both representations use two's complement.
In pictures (where SS is the sign bit, and the numbers N correspond to a bit that has value 2^N):
short:
07-06-05-04-03-02-01-00 | SS-14-13-12-11-10-09-08
int:
07-06-05-04-03-02-01-00 | 15-14-13-12-11-10-09-08 | 23-22-21-20-19-18-17-16 | SS-30-29-28-27-26-25-24
You can see from this that if you copy the bytes of a short to the first two bytes of a zero int, you'll get the same value if the sign bit is zero (that is, the number is positive) because the value bits correspond exactly. As a corollary, you can also predict you'll get a different value if you start with a negative-valued short since the sign bit of the short has value -2^15 but the corresponding bit in the int has value 2^15.
The representation you've found on your machine is often summarized as "two's complement, little-endian", but the C standard provides a lot more flexibility in representations than that description suggests (even allowing a byte to have more than 8 bits), which is why portable code usually avoids relying on bit/byte representations of integral types.
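A minimal sketch to check this yourself (it assumes the layout just described: little-endian, two's complement, 16-bit short, 32-bit int; the negative-value case is my addition, and its output may differ on other representations):
#include <stdio.h>
#include <string.h>

void swap(void *v1, void *v2, int size) {
  char buffer[size];
  memcpy(buffer, v1, size);
  memcpy(v1, v2, size);
  memcpy(v2, buffer, size);
}

int main(void) {
  int   x = 4444;
  short y = 5;
  swap(&x, &y, sizeof(short));  /* only the low two bytes of x take part */
  printf("%d %hd\n", x, y);     /* 5 4444 on the layout above: looks "correct" */

  int   a = 4444;
  short b = -5;                 /* the short's sign bit is set */
  swap(&a, &b, sizeof(short));
  printf("%d %hd\n", a, b);     /* 65531 4444: the sign bit (-2^15) lands on value bit 15 (+2^15) of the int */
  return 0;
}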
As has already been pointed out in the comments, the systems you are using are typically little-endian (least significant byte at the lowest address). Given that, the memcpy copies the short into the low-order part of the int.
You might enjoy looking at Bit Twiddling Hacks for 'generic' ways to do swap operations.
Can we say that our 'traditional' way of writing in binary is big-endian?
e.g., number 1 in binary:
0b00000001 // Let's assume it's possible to write numbers like that in code and b means binary
Also, when I write a constant 0b00000001 in my code, this will always refer to the integer 1 regardless of whether the machine is big-endian or little-endian, right?
In this notation the LSB is always written as the rightmost element, and the MSB is always written as the leftmost element, right?
Yes, humans generally write numerals in big-endian order (meaning that the digits written first have the most significant value), and common programming languages that accept numerals interpret them in the same way.
Thus, the numeral “00000001” means one; it never means one hundred million (in decimal) or 128 (in binary) or the corresponding values in other bases.
Much of C semantics is written in terms of the value of a number. Once a numeral is converted to a value, the C standard describes how that value is added, multiplied, and even represented as bits (with some latitude regarding signed values). Generally, the standard does not specify how those bits are stored in memory, which is where endianness in machine representations comes into play. When the bits representing a value are grouped into bytes and those bytes are stored in memory, we may see those bytes written in different orders on different machines.
However, the C standard specifies a common way of interpreting numerals in source code, and that interpretation is always big-endian in the sense that the most significant digits appear first.
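A tiny sketch of that distinction, printing the value of the constant and then the bytes of its object representation (the output in the comments assumes a 4-byte int; the byte order depends on the machine):
#include <stdio.h>

int main(void) {
  int one = 1;                     /* the numeral always means the value one */
  unsigned char *p = (unsigned char *)&one;

  printf("value: %d\n", one);      /* value: 1 on any machine */
  printf("bytes: %02x %02x %02x %02x\n",
         p[0], p[1], p[2], p[3]);  /* 01 00 00 00 on little-endian, 00 00 00 01 on big-endian */
  return 0;
}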
If you want to put it that way, then yes, we humans write numerals in Big-Endian order.
But I think you have a misunderstanding about your target running with big or little endian.
In your actual C code, it does not matter which endianness your target machine uses. For example, these lines will always display the same, no matter the endianness of your system:
uint32_t x = 0x0102;
printf("Output: %x\n", x); // Output: 102
or to take your example:
uint32_t y = 0b0001;
printf("Output: %u\n", y); // Output: 1
However the storage of the data in your memory differs between Little and Big Endian.
Big Endian:
Actual Value: 0x01020304
Memory Address: 0x00 0x01 0x02 0x03
Value: 0x01 0x02 0x03 0x04
Little Endian:
Actual Value: 0x01020304
Memory Address: 0x00 0x01 0x02 0x03
Value: 0x04 0x03 0x02 0x01
Both times the actual value is 0x01020304 (and this is what you assign in your C code).
You only have to worry about it if you do raw memory operations. If you have a 4-byte (uint8_t) array that represents a 32-bit integer and you want to copy it into a uint32_t variable, you need to care:
uint8_t arr[4] = {0x01, 0x02, 0x03, 0x04};
uint32_t var;
memcpy(&var, arr, 4);
printf("Output: %x\n", var);
// Big Endian:    Output: 1020304
// Little Endian: Output: 4030201
I have a card game where I need to display the value of each card after I shuffle. I display the value of a card using values[(x >> 1) & 0xf], where x iterates through the list of 13 cards; this works because bits 1-4 hold the value of the card.
The above is the card type.
But when it comes to finding the highest pair, it only works when I use values[(afterfindingpairs[a]&0xf0)>>4].
This is laid out so that, in the pair-type byte, bits 0-3 are the number of pairs whereas bits 4-7 are the value of the pair.
It just displays the highest pair as Ace when I use
values[(afterfindingpairs[a]&0xf)>>4].
I'm confused: wouldn't the hexadecimal 0xf0 deal with 8 bits rather than the 4 bits between 4 and 7 of the pair type, which I thought would be selected by values[(afterfindingpairs[a]&0xf)>>4]? But that turns out to be incorrect.
Explanation as to why this happens would be much appreciated.
You appear to want to manipulate 8-bit values, extracting various ranges of bits. However in some cases you're doing so in such a way as to discard all the bits.
The 8 bits are arranged from the least significant (bit 0, which has value 1 in decimal) to the most significant (bit 7, which has value 128 in decimal).
So if we had the binary number 10010110, this would represent the number (128 + 16 + 4 + 2), or 150, or 0x96 in hex.
If you apply a right-shift to such a number, the bits will be moved to the right by the appropriate number of places. So if we did >>4 to the number above, the result will be 00001001 - or 9. I have assumed we are dealing with unsigned values here, so the upper bits will be filled in with '0'. Note that the result is that the original bits 4-7 are now bits 0-3, and the original bits 0-3 have been discarded.
If you AND two numbers, only the bits which are set in both will be set in the result. So effectively this is masking bits. If you mask with 0xf0, which is 11110000 in binary, only the upper bits 4-7 will remain in the result, and the lower bits 0-3 will be set to zero.
Take your statement:
values[(afterfindingpairs[a]&0xf0)>>4]
The expression afterfindingpairs[a]&0xf0, as per my explanation above, will simply set bits 0-3 to zero, retaining bits 4-7.
The next part of the expression, >>4 will shift those remaining bits down so they become bits 0-3 of the result. Note that this also discards the original bits 0-3, making the previous mask operation redundant (unless we are not dealing with 8-bit values...)
Your other statement:
values[(afterfindingpairs[a]&0xf)>>4]
Is more problematic. You first apply a mask (0xf) that retains only bits 0-3, setting all others to zero. Then you apply a shift that throws away bits 0-3, by shifting bits 4-7 (which are already zero) down into their place.
In other words, this latter expression is always zero.
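A minimal sketch of the difference (the packed byte and the variable names are invented for illustration; only the masks and shifts mirror your expressions):
#include <stdio.h>

int main(void) {
  /* Hypothetical pair-type byte: high nibble = value of the pair (5), low nibble = number of pairs (2). */
  unsigned char pair_byte = 0x52;

  unsigned value_of_pair   = (pair_byte & 0xf0) >> 4; /* keep bits 4-7, move them down to bits 0-3 -> 5 */
  unsigned number_of_pairs =  pair_byte & 0x0f;       /* keep bits 0-3 -> 2 */
  unsigned always_zero     = (pair_byte & 0x0f) >> 4; /* keep bits 0-3, then shift them away -> 0 */

  printf("%u %u %u\n", value_of_pair, number_of_pairs, always_zero); /* 5 2 0 */
  return 0;
}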
I'm trying to read a binary file into a C# struct. The file was created from C, and the following code produces 2 of the bytes in each 50+ byte row.
unsigned short nDayTimeBitStuffed = atoi( LPCTSTR( strInput) );
unsigned short nDayOfYear = (0x01FF & nDayTimeBitStuffed);
unsigned short nTimeOfDay = (0x01F & (nDayTimeBitStuffed >> 9) );
Binary values on the file are 00000001 and 00000100.
The expected values are 1 and 2, so I think some bit ordering/swapping is going on but not sure.
Any help would be greatly appreciated.
Thanks!
The answer is 'it depends' - most notably on the machine, and also on how the data is written to the file. Consider:
unsigned short x = 0x0102;
write(fd, &x, sizeof(x));
On some machines (Intel), the low-order byte (0x02) will be written before the high-order byte (0x01); on others (PPC, SPARC), the high-order byte will be written before the low-order one.
So, from a little-endian (Intel) machine, you'd see the bytes:
0x02 0x01
But from a big-endian (PPC) machine, you'd see the bytes:
0x01 0x02
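A sketch of what actually lands in memory, and hence in the file (this uses memcpy into a byte buffer instead of a real file descriptor so it stays self-contained; write(fd, &x, sizeof(x)) would emit the same bytes):
#include <stdio.h>
#include <string.h>

int main(void) {
  unsigned short x = 0x0102;
  unsigned char bytes[sizeof x];

  memcpy(bytes, &x, sizeof x); /* copy the object representation of the short */

  printf("%02x %02x\n", bytes[0], bytes[1]); /* "02 01" on little-endian, "01 02" on big-endian */
  return 0;
}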
Your bytes appear to be 0x01 and 0x04; your expectation of 0x02 appears to be flawed.
The C code you show doesn't write anything. The value in nDayOfYear is the bottom 9 bits of the input value; the nTimeOfDay appears to be the next 5 bits (so 14 of the 16 bits are used).
For example, if the value in strInput is 12141 decimal, 0x2F6D, then the value in nDayOfYear would be 365 (0x16D) and the value in nTimeOfDay would be 23 (0x17).
It is a funny storage order; you can't simply compare two packed values, whereas if you packed the day of year into the more significant portion of the value and the time into the less significant portion, then you could compare the packed values as simple integers and get the correct comparison.
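A small sketch of that unpacking and the reverse packing (the variable names follow the question; the sample input is the 12141 example above):
#include <stdio.h>

int main(void) {
  unsigned short nDayTimeBitStuffed = 12141; /* 0x2F6D */

  /* Unpack: day of year in the low 9 bits, time of day in the next 5 bits. */
  unsigned short nDayOfYear = (0x01FF & nDayTimeBitStuffed);
  unsigned short nTimeOfDay = (0x001F & (nDayTimeBitStuffed >> 9));

  printf("day = %hu, time = %hu\n", nDayOfYear, nTimeOfDay); /* day = 365, time = 23 */

  /* Packing goes the other way: the time lands above the 9 day-of-year bits. */
  unsigned short repacked = (unsigned short)((nTimeOfDay << 9) | nDayOfYear);
  printf("repacked = 0x%hX\n", repacked); /* 0x2F6D */
  return 0;
}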
The expected file contents are very much related to the processor and compiler used to create the file, if it's binary.
I'm assuming a Windows machine here, which uses 2 bytes for a short and puts them in little endian order.
Your comments don't make much sense either. If it's two bytes then it should be using two chars, not shorts. The range of the first is going to be 1-365, so it definitely needs more than a single byte to represent. I'm going to assume you want the first 4 bytes, not the first 2.
This means that the first byte will be bits 0-7 of the DayOfYear, the second byte will be bits 8-15 of the DayOfYear, the third byte will be bits 0-7 of the TimeOfDay, and the fourth byte will be bits 8-15 of the TimeOfDay.
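A sketch of the layout this answer assumes, reconstructing the two values from the first four bytes in C (the byte values are invented for illustration):
#include <stdio.h>

int main(void) {
  /* First four bytes of a row, assumed to hold two little-endian 16-bit values. */
  unsigned char row[4] = {0x6D, 0x01, 0x17, 0x00}; /* day = 0x016D = 365, time = 0x0017 = 23 */

  unsigned dayOfYear = row[0] | (row[1] << 8); /* bits 0-7, then bits 8-15 */
  unsigned timeOfDay = row[2] | (row[3] << 8);

  printf("day = %u, time = %u\n", dayOfYear, timeOfDay); /* day = 365, time = 23 */
  return 0;
}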