Why are decimal numbers used in bitmasks? - c

This is a pretty basic question, and I'm sure that there's an easy answer to it, but I don't know the search term I should be using to look for an answer. Here it goes:
I'm trying to understand how bitmasks work. On Linux systems there's:
struct stat
that has a st_mode member that's used to determine whether the file being inspected is a regular file, a directory, a symbolic link, and others. So, it's possible to write a simple function that you can pass a name to and get whether or not the name represents a directory:
int isadir( char *name )
/*
 * calls stat, then masks the st_mode word to obtain the
 * filetype portion and sees if that bit pattern is the
 * pattern for a directory
 */
{
    struct stat info;

    return ( stat(name,&info)!=-1 && (info.st_mode & S_IFMT) == S_IFDIR );
}
When I look at the bitmask, I see it's represented as follows:
/* Encoding of the file mode. */
#define __S_IFMT 0170000 /* These bits determine file type. */
I thought bitmasks could only have 0s and 1s. Why is there a 7 in the mask?

Numbers starting with a leading 0 are octal numbers — this is standard C syntax.
And these can be useful for bitmasks, especially to represent Unix permissions.

A byte is 8 bits, and can be expressed in decimal (0 to 255), octal (000 to 377), hexadecimal (00 to FF) or binary (00000000 to 11111111). Let's number the bits, from bit 0 to bit 7:
76543210
Actually a number may be expressed in any base, but octal and hexadecimal are the convenient ones when one wants to break the number down into bits. Expressing a byte in octal is easy, as
z y x
76543210
x is bits 0 to 2, y is bits 3 to 5 and z is bits 6 and 7.
Thus in your example, the octal number 017 is
0 1 7
00 001 111
Numbers expressed in octal (base 8) are easy to convert to binary, because each octal digit maps to exactly three bits. (In hexadecimal that would be 0F.)
In C, octal literals start with a leading zero (0...), and hexadecimal literals start with a leading 0x (0x...). As it is easier to visualize the bits of numbers expressed in octal,
022 & 017
gives in binary
"00 010 010" &
"00 001 111"
and the result can be read off easily:
"00 000 010"
In decimal, that would be 18 & 15.

Related

explanation on bitwise operators

I found this piece of code online and it works as part of my project, but I'm not sure why. I don't want to just use it without understanding what it does.
type = (packet_data[12] << 8) | packet_data[13];
if I use it I get the proper type (0x0800 for IPv4) and can use it for comparison on printing out whether it's IPv4 or IPv6. If I don't use it and try something like:
if(packet_data[12] == 08 && packet_data[13] == 00)
print out IPv4
it doesn't work (compiling errors).
Also if I just print out the values like
printf("%02X", packet_data[12]);
printf("%02X", packet_data[13]);
it prints out the proper value in the form 0800, but I need to print out that it's an IPv4 type, which is why I need the comparison in the first place. Any advice or explanation of what this does would be much appreciated. Thanks
if(packet_data[12] == 08 && packet_data[13] == 00)
the right-hand operands are parsed by the compiler as octal literals, because they start with 0.
Fortunately for you, 8 is not a valid octal digit, so you get a compilation error instead of a silently wrong comparison.
You mean hexadecimal literals:
if (packet_data[12] == 0x8 && packet_data[13] == 0x0)
this line:
(packet_data[12] << 8) | packet_data[13]
recreates the big endian value (network convention) of the data located at offsets 12 & 13. Both are equivalent in your case, although the latter is more convenient to compare values as a whole.
packet_data[12] << 8 takes the first Ethertype octet and shifts it 8 bits to the left to the upper 8 bits of a 16-bit word.
| packet_data[13] takes the second Ethertype octet and bitwise-ORs it to the previous 16-bit word.
You can then compare it to 0x0800 for IPv4 or 0x86DD for IPv6; see a more complete list on https://en.wikipedia.org/wiki/EtherType#Examples
As has already been pointed out 08 doesn't work since numerals starting with 0 represent octal numbers, and 8 doesn't exist in octal.
type = (packet_data[12] << 8) | packet_data[13];
The << is bitwise shift left. It takes the binary representation of the variable and shifts its 1's to the left, 8 bits in this case.
'0x0800' looks like 100000000000 in binary. So in order for 0x0800 to be the type, it has to end up looking like that after | packet_data[13]. This last part is bitwise OR. It will write a 1 if either the left side or the right side have a 1 in that place and a 0 otherwise.
So after shifting the value in packet_data[12], the only way for the result to be type 0x0800 (100000000000) is if packet_data[12] is 0x08, which becomes 0x0800 after the shift, and packet_data[13] is 0x00:
type = (0x0800) <==> ( 100000000000 | 000000000000 )
Also, to get printf() to output the 0x prefix you need to add the # flag to the format specifier. To get 0x0800 you also need to specify a precision of .04, which means at least 4 digits including leading zeros. However, this won't output the 0x if the value is 0. For that you'd need to hardcode the literal 0x into the format string.
printf("%#02x\n", data);
printf("%#.04x\n", data);
printf("0x%.04x\n", data=0);
Output
0x800
0x0800
0x0000

How is this bitwise AND operator masking the lower seven order bits of the number?

I am reading The C Programming Language by Brian Kernighan and Dennis Ritchie. Here is what it says about the bitwise AND operator:
The bitwise AND operator & is often used to mask off some set of bits, for example,
n = n & 0177
sets to zero all but the low order 7 bits of n.
I don't quite see how it is masking the lower seven order bits of n. Please can somebody clarify?
The number 0177 is an octal number representing the binary pattern below:
0000000001111111
When you AND it using the bitwise operation &, the result keeps the bits of the original only in the bits that are set to 1 in the "mask"; all other bits become zero. This is because "AND" follows this rule:
X & 0 -> 0 for any value of X
X & 1 -> X for any value of X
For example, if you AND 0177 and 052525, you get
0000000001111111 -- 0000177
0101010101010101 -- 0052525
---------------- -------
0000000001010101 -- 0000125
In C an integer literal prefixed with 0 is an octal number so 0177 is an octal number.
Each octal digit (of value 0 to 7) is represented with 3 bits and 7 is the greatest value for each digit. So a value of 7 in octal means 3 bits set.
Since 0177 is an octal literal and each octal digit is three bits, you have the following binary equivalents:
7 = 111
1 = 001
Which means 0177 is 001111111 in binary.
As already explained, a leading '0' marks the octal representation of a number in ANSI C. The number 0177 (octal) is the same as 127 (decimal), which is 128-1 and can also be written as 2^7-1. In binary, 2^n-1 is a run of n 1's at the right-hand end.
0177 = 127 = 128-1
which is a bitmask:
00000000000000000000000001111111
You can check this with the code below:
Demo
#include <stdio.h>
int main(void)
{
    int n = 0177; // octal representation of 127
    printf("Decimal:[%d] : Octal:[%o]\n", n, (unsigned)n);
    n = 127; // decimal representation of 127
    printf("Decimal:[%d] : Octal:[%o]\n", n, (unsigned)n);
    return 0;
}
Output
Decimal:[127] : Octal:[177]
Decimal:[127] : Octal:[177]
0177 is an octal value in which each digit is represented by 3 bits, from 000 to 111, so 0177 translates to 001111111 (i.e. 001|111|111). As a 32-bit binary value that is 00000000000000000000000001111111 (it can be 64-bit too; the extra high-order bits are simply 0). Performing a bitwise AND with it on a given number yields the lower 7 bits of the number and turns the rest of the bits in the n-bit number to 0
(since x&0=0 and x&1=x, e.g. 0&0=0, 1&0=0, 0&1=0, 1&1=1).

Compress a struct into a binary file? [C]

This is part of my homework that I'm having difficulty solving.
I have a simple structure:
typedef struct Client {
char* lname;
unsigned int id;
unsigned int car_id;
} Client;
And the exercise is:
Create a text file named after the company name followed by the branch number, with a .txt extension.
The file contains all clients' details.
The file you created in exercise 1 will be compressed. As a result, a binary file will be created with a .cmpr extension.
I don't really have an idea how to implement 2.
I remember from the lectures that the professor said we have to use "all" the variable, with binary operators (<< , >> , | , &, ~), but I don't know how to use them.
I'm using Ubuntu, under GCC and Eclipse. I'm using C.
I'd be glad to get some help. Thanks!
Let's say the file from step 1 looks like:
user1798362
2324
462345
where the three fields were simply printed on three lines. Note that the above is the text/readable (i.e. ASCII) representation of that file.
Looking at the contents of this file in hex(adecimal) representation we get (with the ASCII character printed below each byte value):
75 73 65 72 31 37 39 38 33 36 32 0a 32 33 32 34 0a 34 36 32 33 34 35 0a
u s e r 1 7 9 8 3 6 2 nl 2 3 2 4 nl 4 6 2 3 4 5 nl
here nl is of course the newline character. You can count that there are 24 bytes.
In step 2 you have to invent another format that saves as many bits as possible. The simplest way to do this is to compress each of the three fields individually.
Similar to where the text format uses a nl to mark the end of a field, you also need a way to define where a binary field begins and ends. A common way is to put a length in front of the binary field data. As a first step we could replace the nl's with a length and get:
58 75 73 65 72 31 37 39 38 33 36 32 20 32 33 32 34 30 34 36 32 33 34 35
-- u s e r 1 7 9 8 3 6 2 -- 2 3 2 4 -- 4 6 2 3 4 5
For now we simply take a whole byte for the length in bits. Note that 58 is the hex representation of 88 (i.e. 11 characters * 8 bits), the bit length of lname; 20 hex equals 4 * 8 = 32, and 30 hex is 6 * 8 = 48. This does not compress anything, as it's still 24 bytes in total. But we already have a binary format, because 58, 20 and 30 have a special meaning.
The next step would be to compress each field. This is where it gets tricky. The lname field consists of ASCII characters. In ASCII only 7 of the 8 bits are needed/used. For example, the letter u in binary is 01110101. We can safely chop off the leftmost bit, which is always 0. This yields 1110101. The same can be done for all the characters. So you'll end up with 11 7-bit values -> 77 bits.
These 77 bits now must be fit into 8-bit bytes. Here are the first 4 bytes (user) in binary representation, before chopping the leftmost bit off:
01110101 01110011 01100101 01110010
Chopping off a bit in C is done by shifting the byte (i.e. unsigned char) to the left with:
unsigned char byte = lname[0];
byte = byte << 1;
When you do this for all characters you get:
1110101- 1110011- 1100101- 1110010-
Here I use - to indicate the bits in these bytes that are now available to be filled; they became available by shifting all bits one place to the left. You now use one or more bits from the left side of the next byte to fill up these - gaps. When doing this for these four bytes you'll get:
11101011 11001111 00101111 0010----
So now there's a gap of 4 bits that should be filled with bits from the next character, 1, and so on.
Filling up these gaps is done by using the binary operators in C which you mention. We already use the shift left <<. To combine 1110101- and 1110011- for example we do:
unsigned char* name; // name MUST be unsigned to avoid problems with binary operators.
/* ... allocate memory for name and read it from the text file ... */
unsigned char bytes[10]; // 10 is just a size that gives us enough space (77 bits fit in 80).
name[0] = name[0] << 1; // We shift to the left in-place here, so `name` is overwritten.
name[1] = name[1] << 1; // idem.
bytes[0] = name[0] | (name[1] >> 7);
bytes[1] = name[1] << 1;
With name[1] >> 7 we have 1110011- >> 7 which gives 00000001; only the rightmost bit is left. With the bitwise OR operator | we then 'add' this bit to 1110101-, resulting in 11101011.
You have to do things like this in a loop to get all the bits in the correct bytes.
The new length of this name field is 11 * 7 = 77, so we've lost a massive 11 bits :-) Note that with a byte length, we assume that the lname field will never be more than 255 / 7 = 36 characters long.
As with the bytes above, you can then coalesce the second length against the final bits of the lname field.
To compress the numbers you first read them in with fscanf(file, "%u", ...) into an unsigned int. There will be many 0s at the left side of this 4-byte unsigned int. The first number for example is (shown in chunks of 4 bits only for readability):
0000 0000 0000 0000 0000 1001 0001 0100
which has 20 unused bits at the left.
You need to get rid of these. Take 32 minus the number of zeros at the left, and you get the bit-length of this number. Add this length to the bytes array by coalescing its bits against those of the previous field. Then add only the significant bits of the number to the bytes. This would be:
1001 0001 0100
In C, when working with the bits of an 'int' (but also 'short', 'long', ... any variable/number larger than 1 byte), you must take byte-order or endianness into account.
When you do the above step for both numbers, you're done. You then have a bytes array you can write to a file. Of course you must have kept track of where you were writing in bytes in the steps above, so you know the number of bytes. Note that in most cases there will be a few bits in the last byte that are not filled with data. That doesn't hurt; it is simply an unavoidable waste, because files are stored in chunks of at least 8 bits = 1 byte.
When reading the binary file, you go through the reverse process. You read in an unsigned char bytes array. You then know that the first byte (i.e. bytes[0]) contains the bit-length of the name field. You then fill in the bytes of lname one by one by shifting and masking, and so on.
Good luck!

What does 0xFE mean in a C program?

In the given program below,
void main()
{
int x=0xFE;
int y=0xF3;
.....
}
What values are assigned by x=0xFE and y=0xF3? What does 0x represent?
Writing a value with 0x in front means it is written in hexadecimal notation, where the digits 0-9 and the additional "digits" A-F are used to get a number system with base 16. The big advantage of this is that each hexadecimal digit represents exactly 4 bits.
0xFE = 254
0xF3 = 243
So x = 254 and y = 243.
Numbers can be represented in different ways. In C, you can use decimal (e.g. 243), octal (e.g. 0363) or hexadecimal (0xF3).
If you write 243, you mean
243 = 2*10^2 + 4*10^1 + 3*10^0 = 2*100 + 4*10 + 3 = 243
where ^ means "to the power of". That means our normal numbers are base 10, or decimal. Hexadecimal uses base 16, and the digits are 0123456789ABCDEF, where 0=0, ... 9=9, A=10, B=11, ... F=15. So 243 can also be written as
0xF3 = 15*16^1 + 3*16^0 = 15*16 + 3 = 243
That is what you see. In other words, 0xF3 is just another way to write 243, and 0xFE is another way to write 254 (15*16 + 14 = 254).
The advantage is that each hexadecimal digit represents 4 bits, so 2 hexadecimal digits can be used to display a byte. If you know that 0x0 is 0000 in binary, and so on, up to 0xF, which is 1111 in binary, an experienced user can easily "see" the bits in a byte.
FWIW, octal is in base 8, so the only digits are 01234567. I have always found it a bit awkward to use.
The values are in hexadecimal since they are preceded by the 0x prefix.
To convert the values, you can use your computer's calculator in programmer mode, or Google.
http://screensnapr.com/v/aQEPYk.png

Converting Decimal to Hexadecimal and Octal

Show how to write a constant in C, whose decimal value is 65 as
a. a hexadecimal constant
65/16 = 1 r1
1/16 = 0 r1
Hexadecimal constant = 11
b. an octal constant (in C)
65/8 = 8 r1
8/8 = 1 r0
1/8 = 0 r1
Octal constant = 101
Is this the right way to convert constants in C?
You just need a while loop and a string. As this is homework, I do not think I should say more than that.
The method is to divide by the base until the quotient is less than the base.
So 65/8 gives 8 r1, but you don't stop there because the quotient is 8, which is not less than 8.
You divide by 8 again and get 1 r0.
Reading the last quotient and then the remainders from last to first gives
65 = 1x64 + 0x8 + 1 = octal 101, where 64 = 8x8 = octal 100
I don't think I've said too much
Maybe I am misunderstanding the questions, but it seems like you are being asked how hex and oct constants are represented in C, not how to implement an algorithm to convert dec to hex and oct.
If that is the case:
hex numbers are represented by a preceding 0x or 0X
oct numbers are represented by a preceding 0
int hex = 0x41;
int oct = 0101;
Of course, you can verify this by printing out the values in decimal:
printf("%d\n", hex);
printf("%d\n", oct);

Resources