What does it mean "bytes numbered from 0 (LSB) to 3 (MSB)"? - c

I should extract byte n from word x.
Example: getByte(0x12345678,1) = 0x56.
And there is written, that bytes numbered from 0(LSB) to 3(MSB), the meaning of which I can't understand.
Thank you.

Consider your 32 bit word (0x12345678) as 4 bytes:
Word : 12 34 56 78 (hex)
Byte #: 3 2 1 0
MSB<-----LSB
MSB = Most Significant Byte
LSB = Least Signficant Byte

It means that you are supposed to consider x composed of bytes as x = &Sum;n&in;[0,4) bn × 256n, and given x you are supposed to compute bn. That is, b0 is the least-significant byte and b3 is the most-significant byte.

MSB and LSB mean Most Significative Byte and Least Significative Byte, respectively. A byte being a 8-bit number that can be directly represented by 2 hexadecimal positions. So, the number 0x12345678 is a word containing 4 bytes, 12 34 56 78. The rightmost is the LSB, and the leftmost is the MSB. So you are taking the byte 1 that is the SECOND byte from right to left.

Related

Bits of the primitive type in C

Well, I'm starting my C studies and I was left with the following question, how are the bits of the primitive types filled in, for example, the int type, for example, has 4 bytes, that is 32 bits, which fits up to 4294967296. But if for example , I use a value that takes only 1 byte, how do the other bits stay?
#include <stdio.h>
int main(void) {
int x = 5; // 101 how the rest of the bits are filled
// which was not used?
return 0;
}
All leading bits will be set to 0, otherwise the value wouldn't be 5. A bit, in today computers, has only two states so if it's not 0 then it's 1 which would cause the value stored to be different. So assuming 32 bits you have that
5 == 0b00000000 00000000 00000000 00000101
5 == 0x00000005
The remaining bits are stored with 0.
int a = 356;
Now let us convert it to binary.
1 0 1 1 0 0 1 0 0
Now you get 9 bit number. Since int allocates 32 bits, fill the remaining 23 bits with 0.
So the value stored in memory is
00000000 00000000 00000001 01100100
The type you have picked determines how large the integer is, not the value you store inside the variable.
If we assume that int is 32 bits on your system, then the value 5 will be expressed as a 32 bit number. Which is 0000000000000000000000000101 binary or 0x00000005 hex. If the other bits had any other values, it would no longer be the number 5, 32 bits large.

Compress a struct into a binary file? [C]

This is part of my homework that I'm having difficults to solve.
I have a simple structure:
typedef struct Client {
char* lname;
unsigned int id;
unsigned int car_id;
} Client;
And the exercise is:
Create a text file named as the company name and then branch number with txt extention.
the file contain all clients' details.
The file you created in exercise 1 will be compressed. as a result, a binary file be created with .cmpr extention.
I don't really have an idea how to implement 2.
I remember at the lectures that the professor said we have to use "all" the variable, with binary operators (<< , >> , | , &, ~), but I don't know how to used it.
I'm using Ubuntu, under GCC and Eclipse. I'm using C.
I'd be glad to get helped. thanks!
Let's say the file from step 1 looks like:
user1798362
2324
462345
where the three fields were simply printed on three lines. Note that the above is the text/readable (i.e. ASCII) representation of that file.
Looking at the contents of this file in hex(adecimal) representation we get (with the ASCII character printed below each byte value):
75 73 65 72 31 37 39 38 33 36 32 0a 32 33 32 34 0a 34 36 32 33 34 35 0a
u s e r 1 7 9 8 3 6 2 nl 2 3 2 4 nl 4 6 2 3 4 5 nl
here nl is of course the newline character. You can count that there are 24 bytes.
In step 2 you have to invent another format that saves as many bits as possible. The simplest way to do this is to compress each of the three fields individually.
Similar to where the text format uses a nl to mark the end of a field, you also need a way to define where a binary field begins and ends. A common way is to put a length in front of the binary field data. As a first step we could replace the nl's with a length and get:
58 75 73 65 72 31 37 39 38 33 36 32 20 32 33 32 34 30 34 36 32 33 34 35
-- u s e r 1 7 9 8 3 6 2 -- 2 3 2 4 -- 4 6 2 3 4 5
For now we simply take a whole byte for the length in bits. Note that 58 is the hex representation of 77 (i.e. 11 characters * 8 bits), the bit length of lname',20hex equals 4 * 8 = 32, and30is 6 * 8 = 48. This does not compress anything, as it's still 24 bytes in total. But we already got a binary format because58,20and30` got a special meaning.
The next step would be to compress each field. This is where it gets tricky. The lname field consists of ASCII character. In ASCII only 7 of the 8 bits are needed/used; here's a nice table For example the letter u in binary is 01110101. We can safely chop off the leftmost bit, which is always 0. This yields 1110101. The same can be done for all the characters. So you'll end up with 11 7-bit values -> 77 bits.
These 77 bits now must be fit in 8-bit bytes. Here are the first 4 bytes user in binary representation, before chopping the leftmost bit off:
01110101 01110011 01100101 01110010
Chopping off a bit in C is done by shifting the byte (i.e. unsigned char) to the left with:
unsigned char byte = lname[0];
byte = byte << 1;
When you do this for all characters you get:
1110101- 1110011- 1100101- 1110010-
Here I use - to indicate the bits in these bytes that are now available to be filled; they became available by shifting all bits one place to the left. You now use one or more bit from the right side of the next byte to fill up these - gaps. When doing this for these four bytes you'll get:
11101011 11001111 00101111 0010----
So now there's a gap of 4 bits that should be filled with the bit from the character 1, etc.
Filling up these gaps is done by using the binary operators in C which you mention. We already use the shift left <<. To combine 1110101- and 1110011- for example we do:
unsigned char* name; // name MUST be unsigned to avoid problems with binary operators.
<allocated memory for name and read it from text file>
unsigned char bytes[10]; // 10 is just a random size that gives us enough space.
name[0] = name[0] << 1; // We shift to the left in-place here, so `name` is overwritten.
name[1] = name[1] << 1; // idem.
bytes[0] = name[0] | (name[1] >> 7);
bytes[1] = name[1] << 1;
With name[1] >> 7 we have 1110011- >> 7 which gives: 00000001; the right most bit. With the bitwise OR operator | we then 'add' this bit to 1110101-, resulting in 111010111.
You have to do things like this in a loop to get all the bits in the correct bytes.
The new length of this name field is 11 * 7 = 77, so we've lost a massive 11 bits :-) Note that with a byte length, we assume that the lname field will never be more than 255 / 7 = 36 characters long.
As with the bytes above, you can then coalesce the second length against the final bits of the lname field.
To compress the numbers you first read 'em in with (fscanf(file, %d, ...)) in an unsigned int. There will be many 0s at the left side in this 4-byte unsigned int. The first field for example is (shown in chunks of 4 bit only for readability):
0000 0000 0000 0000 0000 1001 0001 0100
which has 20 unused bits at the left.
You need to get rid of these. Do 32 minus the number of zero's at the left, and you get the bit-length of this number. Add this length to the bytes array by coalescing its bits against those of previous field. Then only add the significant bits of the number to the bytes. This would be:
1001 0001 0100
In C, when working with the bits of an 'int' (but also 'short', 'long', ... any variable/number larger than 1 byte), you must take byte-order or endianness into account.
When you do the above step twice for both numbers, you're done. You then have a bytes array you can write to a file. Of course you must have kept where you were writing in bytes in the steps above; so you know the number of bytes. Note that in most cases there will be a few bits in the last byte that are not filled with data. But that doesn't hurt and it simply unavoidable waste of the fact that files are stored in chunks of 8 bits = 1 byte minimally.
When reading the binary file, you'll get a reverse process. You'll read in a unsigned char bytes array. You then know that the first byte (i.e. bytes[0]) contains the bit-length of the name field. You then fill in the bytes of the 'lname' byte-by-byte by shifting and masking. etc....
Good luck!

converting little endian hex to big endian decimal in C

I am trying to understand and implement a simple file system based on FAT12. I am currently looking at the following snippet of code and its driving me crazy:
int getTotalSize(char * mmap)
{
int *tmp1 = malloc(sizeof(int));
int *tmp2 = malloc(sizeof(int));
int retVal;
* tmp1 = mmap[19];
* tmp2 = mmap[20];
printf("%d and %d read\n",*tmp1,*tmp2);
retVal = *tmp1+((*tmp2)<<8);
free(tmp1);
free(tmp2);
return retVal;
};
From what I've read so far, the FAT12 format stores the integers in little endian format.
and the code above is getting the size of the file system which is stored in the 19th and 20th byte of boot sector.
however I don't understand why retVal = *tmp1+((*tmp2)<<8); works. is the bitwise <<8 converting the second byte to decimal? or to big endian format?
why is it only doing it to the second byte and not the first one?
the bytes in question are [in little endian format] :
40 0B
and i tried converting them manually by switching the order first to
0B 40
and then converting from hex to decimal, and I get the right output, I just don't understand how adding the first byte to the bitwise shift of second byte does the same thing?
Thanks
The use of malloc() here is seriously facepalm-inducing. Utterly unnecessary, and a serious "code smell" (makes me doubt the overall quality of the code). Also, mmap clearly should be unsigned char (or, even better, uint8_t).
That said, the code you're asking about is pretty straight-forward.
Given two byte-sized values a and b, there are two ways of combining them into a 16-bit value (which is what the code is doing): you can either consider a to be the least-significant byte, or b.
Using boxes, the 16-bit value can look either like this:
+---+---+
| a | b |
+---+---+
or like this, if you instead consider b to be the most significant byte:
+---+---+
| b | a |
+---+---+
The way to combine the lsb and the msb into 16-bit value is simply:
result = (msb * 256) + lsb;
UPDATE: The 256 comes from the fact that that's the "worth" of each successively more significant byte in a multibyte number. Compare it to the role of 10 in a decimal number (to combine two single-digit decimal numbers c and d you would use result = 10 * c + d).
Consider msb = 0x01 and lsb = 0x00, then the above would be:
result = 0x1 * 256 + 0 = 256 = 0x0100
You can see that the msb byte ended up in the upper part of the 16-bit value, just as expected.
Your code is using << 8 to do bitwise shifting to the left, which is the same as multiplying by 28, i.e. 256.
Note that result above is a value, i.e. not a byte buffer in memory, so its endianness doesn't matter.
I see no problem combining individual digits or bytes into larger integers.
Let's do decimal with 2 digits: 1 (least significant) and 2 (most significant):
1 + 2 * 10 = 21 (10 is the system base)
Let's now do base-256 with 2 digits: 0x40 (least significant) and 0x0B (most significant):
0x40 + 0x0B * 0x100 = 0x0B40 (0x100=256 is the system base)
The problem, however, is likely lying somewhere else, in how 12-bit integers are stored in FAT12.
A 12-bit integer occupies 1.5 8-bit bytes. And in 3 bytes you have 2 12-bit integers.
Suppose, you have 0x12, 0x34, 0x56 as those 3 bytes.
In order to extract the first integer you only need take the first byte (0x12) and the 4 least significant bits of the second (0x04) and combine them like this:
0x12 + ((0x34 & 0x0F) << 8) == 0x412
In order to extract the second integer you need to take the 4 most significant bits of the second byte (0x03) and the third byte (0x56) and combine them like this:
(0x56 << 4) + (0x34 >> 4) == 0x563
If you read the official Microsoft's document on FAT (look up fatgen103 online), you'll find all the FAT relevant formulas/pseudo code.
The << operator is the left shift operator. It takes the value to the left of the operator, and shift it by the number used on the right side of the operator.
So in your case, it shifts the value of *tmp2 eight bits to the left, and combines it with the value of *tmp1 to generate a 16 bit value from two eight bit values.
For example, lets say you have the integer 1. This is, in 16-bit binary, 0000000000000001. If you shift it left by eight bits, you end up with the binary value 0000000100000000, i.e. 256 in decimal.
The presentation (i.e. binary, decimal or hexadecimal) has nothing to do with it. All integers are stored the same way on the computer.

c language bitwise trick

Here is code i saw in a C program , i knew this piece of code is to set a bit in the bit ASCII bit map corresponding to the character c.
field[ (c & 0x7f) >> 3 ] |= 1 << (c & 0x07);
field is an array of 16 characters, each character is 8 bits.
for example '97' is lower case 'a', if we set c to 97, then bit position 97 will be set to 1.
any one know why above code will set bit map corresponding to the character c?
and what are those magic number 0x7f, 0x07, 3 and 1 for?
If your array is 16 bytes long, it has 128 bits (16 x 8). So the first mask (0x7f) guarantees that you are only interested in the first 128 bits. Once you shift it 3 bits right, you have 4 bits left that are used to address your bitfield (the number ((c & 0x7F) >> 3 is a number between 0 and 15). So this part uses the upper 4 bits to address the byte.
Now, you need to address the bit in the byte, so you use the mask 0x07 to limit the value to the range 0 - 7 (corresponding to the bits 0 to 7). You use this number to shift the 1 so many positions.
At the end, you have a bit set in a position 0 to 127 (16 bytes of 8 bits). I hope this helps!
First, to clear up the magic numbers
0x7f is 0111 1111 in binary. This means the lower 7 bits of c are significant. This is then shifted by 3 so that only the original 0xxx x000 (4) bits are significant. But since these bits are shifted by 3 they count 0 to 15.
0x07 is 0000 0111 in binary. This means only the lower 3 bits are significant. The number 1 is shifted left by the value in these 3 bits, resulting a bit set in bit positions 0 to 7 within the byte.
In the end, the function only uses the lower 7 bits in the byte, which are the only significant bits in an ascii character. It uses the upper 4 for addressing the byte in the array and the bottom 3 to address the bit in the addressed byte.

Decomposition of an IP header

I have to do a sniffer as an assignment for the security course. I am using C and the pcap library. I got everything working well (since I got a code from the internet and changed it). But I have some questions about the code.
u_int ip_len = (ih->ver_ihl & 0xf) * 4;
ih is of type ip_header, and its currently pointing the to IP header in the packet.
ver_ihl gives the version of the IP.
I can't figure out what is: & 0xf) * 4;
& is the bitwise and operator, in this case you're anding ver_ihl with 0xf which has the effect of clearing all the bits other than the least signifcant 4
0xff & 0x0f = 0x0f
ver_ihl is defined as first 4 bits = version + second 4 = Internet header length. The and operation removes the version data leaving the length data by itself. The length is recorded as count of 32 bit words so the *4 turns ip_len into the count of bytes in the header
In response to your comment:
bitwise and ands the corresponding bits in the operands. When you and anything with 0 it becomes 0 and anything with 1 stays the same.
0xf = 0x0f = binary 0000 1111
So when you and 0x0f with anything the first 4 bits are set to 0 (as you are anding them against 0) and the last 4 bits remain as in the other operand (as you are anding them against 1). This is a common technique called bit masking.
http://en.wikipedia.org/wiki/Bitwise_operation#AND
Reading from RFC 791 that defines IPv4:
A summary of the contents of the internet header follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The first 8 bits of the IP header are a combination of the version, and the IHL field.
IHL: 4 bits
Internet Header Length is the length of the internet header in 32
bit words, and thus points to the beginning of the data. Note that
the minimum value for a correct header is 5.
What the code you have is doing, is taking the first 8 bits there, and chopping out the IHL portion, then converting it to the number of bytes. The bitwise AND by 0xF will isolate the IHL field, and the multiply by 4 is there because there are 4 bytes in a 32-bit word.
The ver_ihl field contains two 4-bit integers, packed as the low and high nybble. The length is specified as a number of 32-bit words. So, if you have a Version 4 IP frame, with 20 bytes of header, you'd have a ver_ihl field value of 69. Then you'd have this:
01000101
& 00001111
--------
00000101
So, yes, the "&0xf" masks out the low 4 bits.

Resources