I'm trying to read a binary file into a C# struct. The file was created from C, and the following C code produces 2 bytes out of each 50+ byte row.
unsigned short nDayTimeBitStuffed = atoi( LPCTSTR( strInput) );
unsigned short nDayOfYear = (0x01FF & nDayTimeBitStuffed);
unsigned short nTimeOfDay = (0x01F & (nDayTimeBitStuffed >> 9) );
Binary values on the file are 00000001 and 00000100.
The expected values are 1 and 2, so I think some bit ordering/swapping is going on, but I'm not sure.
Any help would be greatly appreciated.
Thanks!
The answer is 'it depends' - most notably on the machine, and also on how the data is written to the file. Consider:
unsigned short x = 0x0102;
write(fd, &x, sizeof(x));
On some machines (Intel), the low-order byte (0x02) will be written before the high-order byte (0x01); on others (PPC, SPARC), the high-order byte will be written before the low-order one.
So, from a little-endian (Intel) machine, you'd see the bytes:
0x02 0x01
But from a big-endian (PPC) machine, you'd see the bytes:
0x01 0x02
Your bytes appear to be 0x01 and 0x04. Read as a little-endian 16-bit value, that is 0x0401, and the masks above then give 1 for the day and 2 for the time, which matches your expected values.
The C code you show doesn't write anything. The value in nDayOfYear is the bottom 9 bits of the input value; the nTimeOfDay appears to be the next 5 bits (so 14 of the 16 bits are used).
For example, if the value in strInput is 12141 decimal, 0x2F6D, then the value in nDayOfYear would be 365 (0x16D) and the value in nTimeOfDay would be 23 (0x17).
It is a funny storage order: because the day of year sits in the low bits and the time of day in the high bits, you can't compare two packed values directly. If the day of year were packed into the more significant portion and the time into the less significant portion, you could compare packed values as plain integers and get a correct chronological ordering.
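As a small illustration of that bit layout, here is a hedged C sketch (the function and variable names are mine, not from the original code) that packs and unpacks the two fields, using the example value 0x2F6D mentioned above:

#include <stdio.h>

/* Pack: day of year in bits 0-8, time of day in bits 9-13. */
static unsigned short pack_day_time(unsigned short day, unsigned short time)
{
    return (unsigned short)((day & 0x01FF) | ((time & 0x001F) << 9));
}

int main(void)
{
    unsigned short packed = pack_day_time(365, 23);   /* 0x2F6D = 12141 */
    unsigned short day  = packed & 0x01FF;            /* 365 */
    unsigned short time = (packed >> 9) & 0x001F;     /* 23  */
    printf("packed=0x%04X day=%u time=%u\n",
           (unsigned)packed, (unsigned)day, (unsigned)time);
    return 0;
}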
The expected file contents are very much related to the processor and compiler used to create the file, if it's binary.
I'm assuming a Windows machine here, which uses 2 bytes for a short and puts them in little endian order.
Your description doesn't quite add up, either: if the fields really occupied only two bytes in total, the C code would be using two chars, not shorts. The day of year ranges from 1 to 365, so it definitely needs more than a single byte to represent. I'm going to assume you want the first 4 bytes, not the first 2.
This means that the first byte will be bits 0-7 of the DayOfYear, the second byte will be bits 8-15 of the DayOfYear, the third byte will be bits 0-7 of the TimeOfDay, and the fourth byte will be bits 8-15 of the TimeOfDay.
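A minimal C sketch of the layout this answer assumes (two consecutive 16-bit values stored little-endian); the byte values here are invented purely for illustration:

#include <stdio.h>

int main(void)
{
    /* First four bytes of a row, laid out as this answer assumes. */
    unsigned char row[4] = { 0x6D, 0x01, 0x17, 0x00 };

    unsigned int dayOfYear = row[0] | (row[1] << 8);   /* bits 0-7, then 8-15 */
    unsigned int timeOfDay = row[2] | (row[3] << 8);

    printf("dayOfYear=%u timeOfDay=%u\n", dayOfYear, timeOfDay);  /* 365 and 23 */
    return 0;
}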
I am reading a book, "Network Programming with C" by Lewis Van Winkle, and I came across a line of code in Chapter 5, page 153 whose value I do not understand. The context is DNS resolution, parsing fields out of the message header. What is the value or practical application of the following code:
/* msg[] is unsigned char*/
const int qdcount = (msg[4] << 8) + msg[5];
I understand that int is a wider type than char, but why is the extra width of value here; is it for hardware interfacing at the firmware level if you choose to go deeper, or something else? To me, it appears that the expression would just resolve to the value of the original object, since shifting by 8 bits is equal to one byte, effectively making it just the product of itself with two, so why even use this method instead of just using:
2 * msg[5]
I'm not sure where to start looking for an answer because the purpose of the expression is not clear to me.
msg[4] contains the most significant eight bits out of a 16-bit number. msg[5] contains the least significant eight bits.
msg[4] << 8 takes the byte containing the most significant eight bits of the 16-bit number and shifts it left eight positions, so those bits land in the positions of the most significant eight bits of a 16-bit value. Then + msg[5] puts the byte containing the least significant bits into the low eight bits. This reconstructs the 16-bit number that msg[4] and msg[5] came from.
msg[4] and msg[5] generally could not be loaded as a single 16-bit integer, because the C code may run on implementations where the more significant byte sits at the higher address rather than the lower one, and also because of the type and aliasing rules in the C standard.
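To see why the shift matters, here is a small self-contained sketch (the six header bytes are invented for illustration) that reconstructs the 16-bit count the way the book's line does; note that + msg[5] adds in the low byte, which 2 * msg[5] would not do:

#include <stdio.h>

int main(void)
{
    /* Pretend DNS header bytes; bytes 4 and 5 hold QDCOUNT in network
       (big-endian) order. 0x0102 means a count of 258, not 2 * msg[5]. */
    unsigned char msg[] = { 0xAB, 0xCD, 0x01, 0x00, 0x01, 0x02 };

    const int qdcount = (msg[4] << 8) + msg[5];
    printf("qdcount = %d\n", qdcount);   /* prints 258 */
    return 0;
}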
I'm trying to get a random number Long Int (4 bytes) into the last 4 bytes of a byte array (Dim MybyteArray (1 to 16) As Byte). Can't see a simple way to do this. Any ideas?
Windows uses little-endian data. Within each byte the bits are in the usual order, most significant on the left and least significant on the right, but the bytes themselves are stored in reverse order, least significant byte first:
00000000, 11111111, 22222222, 33333333
LSByte                          MSByte
Or, in hex:
&h01, &h23, &h45, &h67 = &h67452301
Intel CPUs use little endian and Motorola CPUs use big endian.
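The question itself is VBA, but the idea is easy to show in C, matching the rest of this page. A hedged sketch that copies a 32-bit value into the last 4 bytes of a 16-byte array and then dumps those bytes, so you can see the little-endian order on an x86 machine (the value and array name are made up for illustration):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint8_t  myByteArray[16] = { 0 };
    uint32_t randomValue = 0x67452301u;   /* stand-in for the random Long */

    /* Copy the 4 bytes of the value into positions 12..15. */
    memcpy(&myByteArray[12], &randomValue, sizeof randomValue);

    for (int i = 12; i < 16; i++)
        printf("%02X ", (unsigned)myByteArray[i]);  /* 01 23 45 67 on little-endian */
    printf("\n");
    return 0;
}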
I found this program in an online test on C programming. I tried to work it out at my level, but I cannot figure out why the output of this program comes out to be 64.
Can anyone explain the concept behind this?
#include <iostream>
#include <stdio.h>
using namespace std;
int main()
{
int a = 320;
char *ptr;
ptr = (char *)&a;
printf("%d",*ptr);
return 0;
}
output:
64
Thank you.
A char * points to one byte only. Assuming a byte on your system is 8 bits, the number 320 occupies 2 bytes. The lower byte of those is 64, the upper byte is 1, because 320 = 256 * 1 + 64. That is why you get 64 on your computer (a little-endian computer).
But note that on other platforms, so called big-endian platforms, the result could just as well be 1 (the most significant byte of a 16 bit/2 byte value) or 0 (the most significant byte of a value larger than 16 bit/2 bytes).
Note that all this assumes that the platform has 8-bit bytes. If it had, say 10-bit bytes, you would get a different result again. Fortunately, most computers have 8-bit bytes nowadays.
You won't be able to understand this unless you know about:
hex/binary representation, and
CPU endianness.
Write out the decimal number 320 in hex. Split it up into bytes. Assuming int is 4 bytes, you should be able to tell which parts of the number go into which bytes.
After that, consider the endianness of the given CPU and sort the bytes in that order (MS byte first or LS byte first).
The code accesses the byte allocated at the lowest address of the integer. What it contains depends on the CPU endianess. You'll either get hex 0x40 or hex 0x00.
Note: You shouldn't use char for this kind of thing, because it has implementation-defined signedness. In case the data bytes contain values larger than 0x7F, you might get some very weird bugs that inconsistently appear/disappear across compilers. Always use uint8_t* when doing any form of bit/byte manipulation.
You can expose this bug by replacing 320 with 384. Your little-endian system may then print either -128 or 128; you'll get different results on different compilers.
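A small sketch of the points above, using unsigned bytes as suggested; on a little-endian machine with a 4-byte int it prints 64 and then 128, while a big-endian machine would see the other bytes first:

#include <stdio.h>
#include <stdint.h>

static void show_first_byte(int value)
{
    uint8_t *bytes = (uint8_t *)&value;   /* byte at the lowest address */
    printf("%d -> first byte = %u (0x%02X)\n",
           value, (unsigned)bytes[0], (unsigned)bytes[0]);
}

int main(void)
{
    show_first_byte(320);   /* 320 = 0x00000140: first byte 0x40 = 64 on little-endian */
    show_first_byte(384);   /* 384 = 0x00000180: first byte 0x80 = 128, no sign surprise */
    return 0;
}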
What @Lundin said is enough.
BTW, maybe some basic background is helpful: 320 = 0x0140, and an int here is 4 chars wide. So when you print the first byte, it outputs 0x40 = 64 because of the CPU's endianness.
ptr is a char pointer to a, so *ptr reads only the first byte of a. On a little-endian machine that first byte is the least significant one, and 320 & 0xFF = 64, which is why 320 comes out as 64.
int is a four-byte type while char is a one-byte type, and a char pointer addresses one byte at a time. The binary value of 320 is 00000000 00000000 00000001 01000000, and on a little-endian machine the char pointer ptr points only at the least significant byte.
*ptr, i.e. the content of that byte, is 01000000, and its decimal value is 64.
Assume I have this generic function that swaps two variables:
void swap(void *v1, void *v2, int size){
char buffer[size];
memcpy(buffer, v1, size);
memcpy(v1, v2, size);
memcpy(v2, buffer, size);
}
It works fine, but I was wondering in what cases it might break. One case that comes to mind is when we have two different data types and the size specified is not enough to cover the larger one. For example:
int x = 4444;
short y = 5;
swap(&x, &y, sizeof(short));
I'd expect that when I run this it would give an incorrect result, because memcpy would work with only 2 bytes (rather than 4) and part of the data would be lost or changed when dealing with x.
Surprisingly though, when I run it, it gives the correct answer on both my Windows 7 and Ubuntu operating systems. I thought Ubuntu and Windows might differ in endianness, but apparently that doesn't affect either system.
I want to know why the generic function works fine in this case.
To understand this fully you have to understand the C standard and the specifics of your machine and compiler. Starting with the C standard, here are some relevant snippets [the standard I'm using is WG14/N1256], summarized a little:
The object representation for a signed integer consists of value bits, padding bits, and a sign bit. [section 6.2.6.2.2].
These bits are stored in a contiguous sequence of bytes. [section 6.2.6.1].
If there are N value bits, they represent powers of two from 2^0 to 2^{N-1}. [section 6.2.6.2].
The sign bit can have one of three meanings, one of which is that it has value -2^N (two's complement) [section 6.2.6.2.2].
When you copy bytes from a short to an int, you're copying the value bits, padding bits and the sign bit of the short to bits of the int, but not necessarily preserving the meaning of the bits. Somewhat surprisingly, the standard allows this except it doesn't guarantee that the int you get will be valid if your target implementation has so-called "trap representations" and you're unlucky enough to generate one.
In practice, you've found on your machine and your compiler:
a short is represented by 2 bytes of 8 bits each.
The sign bit is bit 7 of the second byte
The value bits in ascending order of value are bits 0-7 of byte 0, and bits 0-6 of byte 1.
There are no padding bits
an int is represented by 4 bytes of 8 bits each.
The sign bit is bit 7 of the fourth byte
The value bits in ascending order of value are bits 0-7 of byte 0, 0-7 of byte 1, 0-7 of byte 2, and 0-6 of byte 3.
There are no padding bits
You would also find out that both representations use two's complement.
In pictures (where SS is the sign bit, and the numbers N correspond to a bit that has value 2^N):
short:
07-06-05-04-03-02-01-00 | SS-14-13-12-11-10-09-08
int:
07-06-05-04-03-02-01-00 | 15-14-13-12-11-10-09-08 | 23-22-21-20-19-18-17-16 | SS-30-29-28-27-26-25-24
You can see from this that if you copy the bytes of a short to the first two bytes of a zero int, you'll get the same value if the sign bit is zero (that is, the number is positive) because the value bits correspond exactly. As a corollary, you can also predict you'll get a different value if you start with a negative-valued short since the sign bit of the short has value -2^15 but the corresponding bit in the int has value 2^15.
The representation you've found on your machine is often summarized as "two's complement, little-endian", but the C standard provides a lot more flexibility in representations than that description suggests (even allowing a byte to have more than 8 bits), which is why portable code usually avoids relying on bit/byte representations of integral types.
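A hedged sketch of the observation above: copy a short's bytes into a zeroed int on a little-endian, two's-complement machine and compare. A positive value survives, a negative one does not, because the short's sign bit lands on an ordinary 2^15 value bit of the int:

#include <stdio.h>
#include <string.h>

static void copy_short_into_int(short s)
{
    int i = 0;                    /* start from all-zero value bits */
    memcpy(&i, &s, sizeof s);     /* copies only the short's two bytes */
    printf("short %6d -> int %6d\n", s, i);
}

int main(void)
{
    copy_short_into_int(5);       /* value bits line up: prints 5 */
    copy_short_into_int(-5);      /* 0xFFFB lands in the low bytes: prints 65531 */
    return 0;
}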
As has already been pointed out in the comments, the systems you are using are typically little-endian (least significant byte at the lowest address). Given that, the memcpy sets the short into the lowest part of the int.
You might enjoy looking at Bit Twiddling Hacks for 'generic' ways to do swap operations.
#include <stdio.h>
union bits_32{
unsigned int x;
struct {char b4,b3,b2,b1;} byte;
} ;
int main(int argc, char **argv){
union bits_32 foo;
foo.x=0x100000FA;
printf("%x",foo.byte.b4 & 0xFF);
}
This will output FA. Why doesn't it output 10 since b4 occupies the first space?
It depends on the endianness of your machine. If your machine is little endian, it prints FA (yours is little endian, right?). If your machine is big endian, it prints 10.
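If you want the 0x10 byte regardless of the machine's byte order, a shift on the integer value avoids the union trick entirely; a minimal sketch:

#include <stdio.h>

int main(void)
{
    unsigned int x = 0x100000FA;
    unsigned int top = (x >> 24) & 0xFF;   /* most significant byte by value, not by address */
    unsigned int low = x & 0xFF;           /* least significant byte by value */
    printf("%02X %02X\n", top, low);       /* prints 10 FA on any endianness */
    return 0;
}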
Storing Words in Memory
We've defined a word to mean 32 bits. This is the same as 4 bytes. Integers, single-precision floating point numbers, and MIPS instructions are all 32 bits long. How can we store these values into memory? After all, each memory address can store a single byte, not 4 bytes.
The answer is simple. We split the 32 bit quantity into 4 bytes. For example, suppose we have a 32 bit quantity, written as 0x90AB12CD, which is hexadecimal. Since each hex digit is 4 bits, we need 8 hex digits to represent the 32 bit value.
So, the 4 bytes are: 90, AB, 12, CD where each byte requires 2 hex digits.
It turns out there are two ways to store this in memory.
Big Endian
In big endian, you store the most significant byte in the smallest address. Here's how it would look:
Address Value
1000 90
1001 AB
1002 12
1003 CD
Little Endian
In little endian, you store the least significant byte in the smallest address. Here's how it would look:
Address Value
1000 CD
1001 12
1002 AB
1003 90
Notice that this is in the reverse order compared to big endian. To remember which is which, recall whether the least significant byte is stored first (thus, little endian) or the most significant byte is stored first (thus, big endian).
Notice I used "byte" instead of "bit" in "least significant byte". I sometimes abbreviate these as LSB and MSB, with the 'B' capitalized to refer to a byte, and use a lowercase 'b' to refer to a bit. I only refer to the most and least significant byte when it comes to endianness.
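A short sketch of the two tables above: store the 32-bit word and print each byte at increasing addresses. A little-endian machine prints CD 12 AB 90, a big-endian one prints 90 AB 12 CD:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t word = 0x90AB12CD;
    uint8_t *p = (uint8_t *)&word;   /* view the word as raw bytes */

    for (size_t i = 0; i < sizeof word; i++)
        printf("address +%zu : %02X\n", i, (unsigned)p[i]);
    return 0;
}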