Confused with network byte order and host byte order - c

I'm getting terribly confused with host byte order and network byte order. I know network byte order is big endian. And I know host byte order in my case is little endian.
So, if I'm printing data, I would need to convert it to host byte order to get the correct value, right?
My problem is I am trying to print the value of data returned by htonl. Here is my example:
#include <stdio.h>
#include <stdint.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* inet_aton, inet_ntoa */

int main(int argc, char *argv[])
{
    int bits = 12;
    char *ip = "132.89.39.0";
    struct in_addr addr;
    uint32_t network, netmask, last_addr;
    uint32_t total_hosts;

    inet_aton(ip, &addr);
    printf("Starting IP:\t%s\n", inet_ntoa(addr));

    netmask = (0xFFFFFFFFUL << (32 - bits)) & 0xFFFFFFFFUL;
    netmask = htonl(netmask);
    printf("Netmask:\t%s\n", inet_ntoa((struct in_addr){ .s_addr = netmask }));

    network = addr.s_addr & netmask;
    printf("Network:\t%s\n", inet_ntoa((struct in_addr){ .s_addr = network }));

    printf("Total Hosts:\t%d\n", ntohl(netmask));
    return 0;
}
printf("Total Hosts:\t%d\n", ntohl(netmask)); prints the correct value but it prints with a minus sign.If I use %uI get the wrong value.
Where am I going wrong?
With %d output is:
Starting IP: 132.89.39.0
Netmask: 255.240.0.0
Network: 132.80.0.0
Total Hosts: -1048576
With %u output is:
Starting IP: 132.89.39.0
Netmask: 255.240.0.0
Network: 132.80.0.0
Total Hosts: 4293918720
I've been stuck on this for 2 days. Something seemingly so simple has thrown me off completely. I don't want anyone to solve the problem, but a push in the right direction would be very helpful.

If you look at the prototype of htonl(), it is
uint32_t htonl(uint32_t hostlong);
so it returns a uint32_t, which is an unsigned type. Printing that value using %d (which expects an argument of type signed int) is improper.
At the very least, you need to use %u to get the unsigned value. Better, where possible, use the PRIu32 macro from <inttypes.h> for printing fixed-width 32-bit unsigned integers.
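For instance, here is a small self-contained illustration (the /12 mask value is hard-coded purely for the demo, and the signed result assumes a two's-complement machine):
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint32_t mask = 0xFFF00000;                    /* the /12 netmask, host byte order */
    printf("as %%d     : %d\n", (int)mask);        /* -1048576: same bit pattern read as signed */
    printf("as %%u     : %u\n", (unsigned)mask);   /* 4293918720 */
    printf("as PRIu32 : %" PRIu32 "\n", mask);     /* 4293918720, fixed-width and portable */
    return 0;
}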

There are currently a variety of systems that can change between little-endian and big-endian byte ordering, sometimes at system reset, sometimes at run-time.
We must deal with these byte ordering differences as network programmers because
networking protocols must specify a network byte order. For example, in a TCP segment, there
is a 16-bit port number and a 32-bit IPv4 address. The sending protocol stack and the
receiving protocol stack must agree on the order in which the bytes of these multibyte fields
will be transmitted. The Internet protocols use big-endian byte ordering for these multibyte
integers.
In theory, an implementation could store the fields in a socket address structure in host byte
order and then convert to and from the network byte order when moving the fields to and from
the protocol headers, saving us from having to worry about this detail. But, both history and
the POSIX specification say that certain fields in the socket address structures must be
maintained in network byte order. Our concern is therefore converting between host byte order
and network byte order. We use the following four functions to convert between these two byte
orders.
#include <netinet/in.h>
uint16_t htons(uint16_t host16bitvalue);
uint32_t htonl(uint32_t host32bitvalue);
Both return: value in network byte order
uint16_t ntohs(uint16_t net16bitvalue);
uint32_t ntohl(uint32_t net32bitvalue);
Both return: value in host byte order
In the names of these functions, h stands for host, n stands for network, s stands for short,
and l stands for long. The terms "short" and "long" are historical artifacts from the Digital VAX
implementation of 4.2BSD. We should instead think of s as a 16-bit value (such as a TCP or
UDP port number) and l as a 32-bit value (such as an IPv4 address). Indeed, on the 64-bit
Digital Alpha, a long integer occupies 64 bits, yet the htonl and ntohl functions operate on
32-bit values.
When using these functions, we do not care about the actual values (big-endian or little-endian)
for the host byte order and the network byte order. What we must do is call the
appropriate function to convert a given value between the host and network byte order. On
those systems that have the same byte ordering as the Internet protocols (big-endian), these
four functions are usually defined as null macros.
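For instance, the port and IPv4 address members of a sockaddr_in must be stored in network byte order, so a typical setup looks like this (a sketch; the fill_addr helper is illustrative, not part of any API):
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

static void fill_addr(struct sockaddr_in *sa, uint16_t port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family      = AF_INET;
    sa->sin_port        = htons(port);          /* 16-bit port: host to network order */
    sa->sin_addr.s_addr = htonl(INADDR_ANY);    /* 32-bit address: host to network order */
}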
We will say more later about the byte ordering problem with respect to the data contained in a network packet, as opposed to the fields in the protocol headers.
We have not yet defined the term "byte." We use the term to mean an 8-bit quantity since
almost all current computer systems use 8-bit bytes. Most Internet standards use the term
octet instead of byte to mean an 8-bit quantity. This started in the early days of TCP/IP
because much of the early work was done on systems such as the DEC-10, which did not use
8-bit bytes.
Another important convention in Internet standards is bit ordering. In many Internet
standards, you will see "pictures" of packets that look similar to the following (this is the first
32 bits of the IPv4 header from RFC 791):
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
This represents four bytes in the order in which they appear on the wire; the leftmost bit is the
most significant. However, the numbering starts with zero assigned to the most significant bit.
This is a notation that you should become familiar with to make it easier to read protocol
definitions in RFCs.
A common network programming error in the 1980s was to develop code on Sun
workstations (big-endian Motorola 68000s) and forget to call any of these four functions.
The code worked fine on these workstations, but would not work when ported to little-endian machines (such as VAXes).

The problem here is not your conversion between network and host order. That part of your code works perfectly.
The problem is your belief that the netmask, interpreted as an integer, is the number of hosts which match that mask. That is exactly the inverse of the truth.
Consider your 12-bit netmask, 255.240.0.0. Or, in binary:
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
As your code indicates, for a host address to match a network address with this netmask, the two addresses need to be identical where the netmask has a 1 bit. Bit positions corresponding to a 0 in the netmask can be freely chosen. The number of such addresses can be determined by considering only the 0 bits. But of course we can't leave those bits as 0s; to count the number of qualifying addresses, we need to prepend a 1. So the count, in this case, is (exactly as you suspect) 1,048,576:
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
One way to compute this value would be to invert the netmask and add 1:
1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 (bitwise invert)
                                                              1   (+ 1)
---------------------------------------------------------------
                      1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
In 2's-complement arithmetic, this is precisely the same as arithmetic negation. So it is not surprising that when you print the netmask out as a signed integer, you'll see the negative of the expected count. (Printing a uint32_t with a signed int format specifier is technically undefined behaviour but will probably work as expected on 2's-complement machines with 32-bit ints.)
In short, what you should do to compute the number of qualifying addresses from the netmask is:
uint32_t address_count = ~ntohl(netmask) + 1;
(which most compilers will optimize to a unary negation opcode, if available.)
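Putting it together, here is a standalone sketch of the counting that stays in host byte order throughout, so no ntohl is needed:
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    int bits = 12;
    uint32_t netmask = (0xFFFFFFFFUL << (32 - bits)) & 0xFFFFFFFFUL;   /* host order */
    uint32_t address_count = ~netmask + 1;          /* same as 1UL << (32 - bits) */
    printf("Total Hosts:\t%" PRIu32 "\n", address_count);   /* 1048576 for a /12 */
    return 0;
}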

Related

Where is header and payload information about a calloc stored in heap?

I want to know how this structure, allocated with calloc, is laid out in the heap. I mean, how does glibc keep information in memory, such as "this is a structure, this is its data type", and so on?
#include <stdio.h>
#include <stdlib.h>

struct student {
    int rollno;
    char level;
};

int main(void) {
    struct student *p = calloc(1, sizeof(struct student));
    p->rollno = 767;
    p->level = 'A';
    printf("addr of p is =%p\n", (void *)p);
    printf("student_struct_size=%zu\n", sizeof(struct student));
    return 0;
}
In my system
addr of p is =0x555555756260
(gdb) x/16b 0x555555756260
0x555555756260: -1 2 0 0 65 0 0 0
0x555555756268: 0 0 0 0 0 0 0 0
I can understand why 65 is coming, but where is 767? Also, where is the header information for calloc (what is the boundary of the calloc'd block)?
If I do x/16b 0x555555756260-8 I get 33. Is 33 the size of all the payload plus the header information? Can you justify why 33 is coming?
(gdb) x/16b 0x555555756260-8
0x555555756258: 33 0 0 0 0 0 0 0
0x555555756260: -1 2 0 0 65 0 0 0
"Where is 767?"
Since you are printing the memory in (signed) bytes, and your int apparently has 4 bytes in little endian and two's complement, you can calculate this by:
least significant byte: (signed byte) -1 = (hex) 0xFF
next significant byte: (signed byte) 2 = (hex) 0x02
next bytes are both zero
4-byte value is 0x000002FF = 767
"Where is header information?"
One possible implementation of memory management stores its management data before the memory block returned to you.
You might want to look in even lower memory bytes. Expect some pointers there.
To understand the entries, you will need to obtain the sources of your library and to study them. Then you might know what the value 33 represents.
where is 767,
-1 and 2 are 767, little endian: the byte -1, read as an unsigned byte, is 0xff = 255, and 255 + 2 * 256 = 767.
where is header information about calloc
Those 8 bytes holding the 33 are all that *alloc needs; there's really nothing more.
can u justify why 33 is coming
From glibc/malloc.c:
| Size of chunk, in bytes |A|M|P|
33 is 0b100001:
100001
   AMP
^^^    <- SIZE
Bit P is set, which means Previous chunk is used.
Bit M is not set, which means the region is not Mmap()-ed.
Bit A is not set, which means the chunk is in the main Area.
The rest is the size: 0b100000 is 32, so the allocated chunk has 32 bytes. 8 of them are the size_t field that stores the 33, and malloc_usable_size will return 32 - 8 = 24.
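If you want to see the same layout without gdb, here is a small sketch that dumps the allocation byte by byte (glibc-specific because of malloc_usable_size; the exact numbers depend on your allocator):
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* malloc_usable_size (glibc) */

struct student { int rollno; char level; };

int main(void) {
    struct student *p = calloc(1, sizeof *p);
    p->rollno = 767;   /* 0x2FF: stored little endian as ff 02 00 00 */
    p->level  = 'A';   /* 0x41 = 65 */

    unsigned char *b = (unsigned char *)p;
    for (size_t i = 0; i < sizeof *p; i++)
        printf("%02x ", b[i]);                               /* expect: ff 02 00 00 41 ... */
    printf("\nusable size: %zu\n", malloc_usable_size(p));   /* 24 here */

    free(p);
    return 0;
}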

Bits of the primitive type in C

Well, I'm starting my C studies and I was left with the following question: how are the bits of the primitive types filled in? For example, the int type has 4 bytes, that is, 32 bits, which can hold 4294967296 distinct values. But if, for example, I store a value that needs only 1 byte, what happens to the other bits?
#include <stdio.h>

int main(void) {
    int x = 5; // 101: how are the remaining bits,
               // the ones that were not used, filled?
    return 0;
}
All leading bits will be set to 0; otherwise the value wouldn't be 5. A bit, in today's computers, has only two states, so if it's not 0 then it's 1, which would cause the stored value to be different. So, assuming 32 bits, you have that
5 == 0b00000000 00000000 00000000 00000101
5 == 0x00000005
The remaining bits are stored with 0.
int a = 356;
Now let us convert it to binary.
1 0 1 1 0 0 1 0 0
Now you have a 9-bit number. Since an int occupies 32 bits, fill the remaining 23 bits with 0.
So the value stored in memory is
00000000 00000000 00000001 01100100
The type you have picked determines how large the integer is, not the value you store inside the variable.
If we assume that int is 32 bits on your system, then the value 5 will be expressed as a 32-bit number, which is 00000000000000000000000000000101 in binary or 0x00000005 in hex. If the other bits had any other values, it would no longer be the number 5 stored in 32 bits.
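A quick way to see this for yourself is a sketch like the following (it assumes a 32-bit unsigned int):
#include <stdio.h>

static void print_bits(unsigned int v) {        /* most significant bit first */
    for (int i = 31; i >= 0; i--)
        putchar(((v >> i) & 1u) ? '1' : '0');
    putchar('\n');
}

int main(void) {
    print_bits(5);    /* 00000000000000000000000000000101 */
    print_bits(356);  /* 00000000000000000000000101100100 */
    return 0;
}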

bitwise-and with HEX and CHAR in C

I'm really getting frustrated here. Trying to implement the CRC-CCITT algorithm and I found a very nice example on an Internet site.
There is one line whose output I completely don't understand:
unsigned short update_crc_ccitt( unsigned short crc, char c){
[...]
short_c = 0x00ff & (unsigned short) c;
[...]
}
I want to calculate the CRC of the "test" string "123456789". So in the first run the char 'c' is 1. From my understanding short_c from the first run should be equal to 1 as well, but when I print it to the console, I get short_c = 49 for c = 1. How?
0x00ff in binary is: 1 1 1 1 1 1 1 1
char 1 in binary is: 0 0 0 0 0 0 0 1
bitand should be : 0 0 0 0 0 0 0 1
Where is my mistake?
The character 1 has ASCII code 0x31 = 49. This is different from the character with ASCII code 1 (which is ^A).
You are confusing characters and numbers, basically. The first character in the string "123456789" is the character '1', whose decimal value on most typical computers is 49.
This value is decided by the encoding of the characters, which describes how each character is assigned a numerical value which is what your computer stores.
C guarantees that the encoding for the 10 decimal digits will be in a compact sequence with no gaps, starting with '0'. So, you can always convert a character to the corresponding number by doing:
const int digitValue = digit - '0';
This will convert the digit '0' to the integer 0, and so on for all the digits up to (and including) '9'.
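Here is a tiny sketch of the difference, assuming ASCII:
#include <stdio.h>

int main(void) {
    char c = '1';                                                   /* the character '1', not the number 1 */
    printf("character code : %d\n", c);                             /* 49 in ASCII */
    printf("0x00ff & c     : %d\n", 0x00ff & (unsigned short)c);    /* still 49 */
    printf("numeric value  : %d\n", c - '0');                       /* 1 */
    return 0;
}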

A "dynamic bitfield" in C

In this question, assume all integers are unsigned for simplicity.
Suppose I would like to write 2 functions, pack and unpack, which let you pack integers of smaller width into, say, a 64-bit integer. However, the location and width of the integers is given at runtime, so I can't use C bitfields.
Quickest is to explain with an example. For simplicity, I'll illustrate with 8-bit integers:
            * *
bit #   8 7 6 5 4 3 2 1
myint   0 1 1 0 0 0 1 1
Suppose I want to "unpack" at location 5, an integer of width 2. These are the two bits marked with an asterisk. The result of that operation should be 0b01. Similarly, If I unpack at location 2, of width 6, I would get 0b100011.
I can write the unpack function easily with a bitshift-left followed by a bitshift right.
But I can't think of a clear way to write an equivalent "pack" function, which will do the opposite.
Say given an integer 0b11, packing it into myint (from above) at location 5 and width 2 would yield
            * *
bit #   8 7 6 5 4 3 2 1
myint   0 1 1 1 0 0 1 1
The best I came up with involves a lot of concatenating bit-strings with OR, << and >>. Before I implement and test it, maybe somebody sees a clever, quick solution?
Off the top of my head, untested.
int pack(int oldPackedInteger, int bitOffset, int bitCount, int value) {
    int mask = (1 << bitCount) - 1;
    mask <<= bitOffset;
    oldPackedInteger &= ~mask;
    oldPackedInteger |= value << bitOffset;
    return oldPackedInteger;
}
In your example:
int value = 0x63;
value = pack(value, 4, 2, 0x3);
To write the value "3" at an offset of 4 (with two bits available) when 0x63 is the current value.
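For completeness, a matching unpack can use the same mask idea. This is a sketch using the same zero-based bit offset convention as pack() above; the name unpack is the one from the question:
unsigned unpack(unsigned packedInteger, int bitOffset, int bitCount) {
    unsigned mask = (1u << bitCount) - 1u;          /* bitCount low bits set */
    return (packedInteger >> bitOffset) & mask;     /* shift the field down, mask the rest off */
}
With the example above, unpack(0x63, 0, 6) yields 0b100011, and unpack(pack(0x63, 4, 2, 0x3), 4, 2) gives back 0x3.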

Decomposition of an IP header

I have to write a sniffer as an assignment for the security course. I am using C and the pcap library. I got everything working well (since I got code from the internet and changed it). But I have some questions about the code.
u_int ip_len = (ih->ver_ihl & 0xf) * 4;
ih is of type ip_header, and its currently pointing the to IP header in the packet.
ver_ihl gives the version of the IP.
I can't figure out what this part is doing: & 0xf) * 4;
& is the bitwise AND operator; in this case you're ANDing ver_ihl with 0xf, which has the effect of clearing all the bits other than the least significant 4:
0xff & 0x0f = 0x0f
ver_ihl is defined as: first 4 bits = version, second 4 bits = Internet header length. The AND operation removes the version data, leaving the length data by itself. The length is recorded as a count of 32-bit words, so the * 4 turns ip_len into the count of bytes in the header.
In response to your comment:
Bitwise AND ANDs the corresponding bits in the operands. When you AND anything with 0 it becomes 0, and anything ANDed with 1 stays the same.
0xf = 0x0f = binary 0000 1111
So when you AND 0x0f with anything, the first 4 bits are set to 0 (as you are ANDing them against 0) and the last 4 bits remain as in the other operand (as you are ANDing them against 1). This is a common technique called bit masking.
http://en.wikipedia.org/wiki/Bitwise_operation#AND
Reading from RFC 791 that defines IPv4:
A summary of the contents of the internet header follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The first 8 bits of the IP header are a combination of the version, and the IHL field.
IHL: 4 bits
Internet Header Length is the length of the internet header in 32
bit words, and thus points to the beginning of the data. Note that
the minimum value for a correct header is 5.
What the code you have is doing, is taking the first 8 bits there, and chopping out the IHL portion, then converting it to the number of bytes. The bitwise AND by 0xF will isolate the IHL field, and the multiply by 4 is there because there are 4 bytes in a 32-bit word.
The ver_ihl field contains two 4-bit integers, packed as the low and high nybble. The length is specified as a number of 32-bit words. So, if you have a Version 4 IP frame, with 20 bytes of header, you'd have a ver_ihl field value of 69. Then you'd have this:
  01000101
& 00001111
  --------
  00000101
So, yes, the "&0xf" masks out the low 4 bits.
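Here is a self-contained sketch of that decomposition; 0x45 is hard-coded as the usual first byte of a minimal 20-byte IPv4 header:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t ver_ihl = 0x45;                        /* version 4, IHL 5 */
    unsigned version   = (ver_ihl >> 4) & 0xf;     /* high nybble */
    unsigned ihl_words = ver_ihl & 0xf;            /* low nybble, in 32-bit words */
    printf("version = %u, header = %u bytes\n", version, ihl_words * 4);   /* 4, 20 */
    return 0;
}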
