See the definition of the TCP header in netinet/tcp.h:
struct tcphdr
{
u_int16_t th_sport; /* source port */
u_int16_t th_dport; /* destination port */
tcp_seq th_seq; /* sequence number */
tcp_seq th_ack; /* acknowledgement number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
u_int8_t th_x2:4; /* (unused) */
u_int8_t th_off:4; /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
u_int8_t th_off:4; /* data offset */
u_int8_t th_x2:4; /* (unused) */
# endif
u_int8_t th_flags;
# define TH_FIN 0x01
# define TH_SYN 0x02
# define TH_RST 0x04
# define TH_PUSH 0x08
# define TH_ACK 0x10
# define TH_URG 0x20
u_int16_t th_win; /* window */
u_int16_t th_sum; /* checksum */
u_int16_t th_urp; /* urgent pointer */
};
Why does the 8-bit field have a different order depending on endianness? I thought only 16-bit and 32-bit fields mattered for byte order, and that you could convert between endiannesses with ntohs and ntohl, respectively. What would the function be for handling 8-bit things? If there is none, it seems that a TCP stack using this header on a little-endian machine would not work with a TCP stack on a big-endian machine.
There are two kinds of order here: byte order and bitfield order.
The C language does not standardize bitfield order; it depends on the compiler. Typically, the order of bitfields is reversed between big-endian and little-endian targets.
This is compiler-dependent and non-portable. How bit fields are ordered is implementation-defined; it would be far better here to use a single 8-bit field and shift/mask to obtain the subfields, as sketched below.
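For example, a minimal sketch (my own, not the glibc header) that reads the two nibbles from a single raw byte; th_offx2 is a hypothetical name for the combined offset/reserved byte as it appears on the wire:
#include <stdint.h>

/* Data offset occupies the high nibble of the byte on the wire,
   the reserved bits the low nibble. */
static inline uint8_t tcp_data_offset(uint8_t th_offx2)
{
    return th_offx2 >> 4;      /* header length in 32-bit words */
}

static inline uint8_t tcp_reserved(uint8_t th_offx2)
{
    return th_offx2 & 0x0F;    /* unused/reserved bits */
}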
It's possible that on this machine the endianness also refers to the bit order as well as the byte order. The Wikipedia article on endianness mentions that this is sometimes the case.
My understanding is that bit ordering and endianness are generally two different things. Structures with bit-fields are generally not portable across compilers/architectures. Sometimes ifdefs can be used to support different bit orderings. In this case the endianness is really irrelevant, and it should be an ifdef on the bit ordering. The assumption that a given endianness implies a certain bit order may be true in some cases.
My reading of the comment is that the two four-bit fields together are construed as a single byte (one of the nibbles is unused anyway). Rather than declare one full byte, they declare two four-bit bitfields but reverse the order of declaration depending on endianness.
It may help to know that this code is only used if __FAVOR_BSD is defined (# ifdef __FAVOR_BSD). It is from /usr/include/netinet/tcp.h:
# ifdef __FAVOR_BSD
typedef u_int32_t tcp_seq;
/*
* TCP header.
* Per RFC 793, September, 1981.
*/
struct tcphdr
From stm32f446xx.h we have a definition of GPIO_TypeDef
typedef struct
{
__IO uint32_t MODER; /*!< GPIO port mode register, Address offset: 0x00 */
__IO uint32_t OTYPER; /*!< GPIO port output type register, Address offset: 0x04 */
__IO uint32_t OSPEEDR; /*!< GPIO port output speed register, Address offset: 0x08 */
__IO uint32_t PUPDR; /*!< GPIO port pull-up/pull-down register, Address offset: 0x0C */
__IO uint32_t IDR; /*!< GPIO port input data register, Address offset: 0x10 */
__IO uint32_t ODR; /*!< GPIO port output data register, Address offset: 0x14 */
__IO uint32_t BSRR; /*!< GPIO port bit set/reset register, Address offset: 0x18 */
__IO uint32_t LCKR; /*!< GPIO port configuration lock register, Address offset: 0x1C */
__IO uint32_t AFR[2]; /*!< GPIO alternate function registers, Address offset: 0x20-0x24 */
} GPIO_TypeDef;
This is used to initialize some GPIO peripherals. Focusing on GPIOA:
#define PERIPH_BASE ((uint32_t)0x40000000)
#define AHB1PERIPH_BASE (PERIPH_BASE + 0x00020000)
#define GPIOA_BASE (AHB1PERIPH_BASE + 0x0000)
#define GPIOA ((GPIO_TypeDef *) GPIOA_BASE)
So my questions are:
Is the #define directive declaring and initializing the struct and putting it at the address GPIOA_BASE? Or is the #define just declaring the struct but not initializing its members?
How do the members get their address placement correct without it being explicitly defined? How do we know that GPIOA->MODER leads to the same address as GPIOA_BASE? Likewise, how do we know GPIOA->ODR leads to the address GPIOA_BASE + offset 0x14? Is it simply that when we declare the struct and tell the compiler it is located at GPIOA_BASE with #define GPIOA ((GPIO_TypeDef *) GPIOA_BASE), all its members are laid out in memory in the order they appear in GPIO_TypeDef? That would make sense, since each takes 4 bytes, but I am unsure whether this is the case.
If the member variables are not declared in the order they are listed in the struct then how do they get their memory addresses assigned?
The name "C" is used to describe two things:
A family of languages which map source code constructs onto target-machine constructs in a rather consistent fashion. There's often no "standard" documenting how such mapping should be performed, and implementations for unusual target platforms will behave differently from those for more common ones, but compilers for similar platforms will generally behave similarly.
A language which contains only the features mandated by the C11 Standard, and generally omits features which aren't implemented consistently on all platforms, even though C compilers for similar platforms would have processed them similarly.
The code you cite above is designed for compilers that process a language in the first family, one adapted to the needs of ARM processors. In that language, each item of a structure will be placed at the lowest offset which satisfies its alignment requirement. Neither the C11 Standard, nor any of its predecessors, requires that compilers guarantee to lay out structures in such a fashion, nor does any of them provide even a convenient way for those that sometimes do otherwise to indicate that fact. From the point of view of the C11 Standard, structure elements may be placed arbitrarily, subject to relatively few constraints. I think the authors of C89 probably figured that compiler writers would lay things out sequentially, with or without a mandate, except on rare platforms where there might be a compelling reason to do otherwise, and thus saw no reason to demand that they behave as they were going to anyway. Some people, however, seem to believe the lack of a mandate was meant to be an open invitation to arrange things arbitrarily, and that any code relying upon a particular layout should be viewed as "defective".
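To make that concrete, here is a quick sketch of my own, assuming the GPIO_TypeDef and GPIOA definitions quoted above are in scope and that the compiler lays the members out sequentially with no padding, as ARM compilers do:
#include <stddef.h>
#include "stm32f446xx.h"   /* GPIO_TypeDef, GPIOA, as quoted above */

/* With sequential layout and no padding, the offsets match the data sheet. */
_Static_assert(offsetof(GPIO_TypeDef, MODER) == 0x00, "MODER offset");
_Static_assert(offsetof(GPIO_TypeDef, ODR)   == 0x14, "ODR offset");

void led_on(void)
{
    /* The #define creates no object; the cast only tells the compiler to
       treat GPIOA_BASE as the address of a GPIO_TypeDef, so GPIOA->ODR
       is a volatile access to GPIOA_BASE + 0x14. */
    GPIOA->ODR |= (1u << 5);   /* set pin PA5, as an example */
}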
I'm working on a development board that has a 32-bit ARM-based microcontroller on it (namely, the board is the Atmel SAM D21J18A). I'm still in the learning phase and have a long way to go, but I'm really into embedded systems.
I have some background in C. However, it's obviously not enough. I was looking at the code of an example project by Atmel, and I didn't really get some parts of it. Here is one of them:
#define PORT ((Port *)0x41004400UL) /**< \brief (PORT) APB Base Address */
Port is defined as:
typedef struct {
PortGroup Group[2]; /**< \brief Offset: 0x00 PortGroup groups [GROUPS] */
} Port;
and PortGroup is defined as:
typedef struct {
__IO PORT_DIR_Type DIR; /**< \brief Offset: 0x00 (R/W 32) Data Direction */
__IO PORT_DIRCLR_Type DIRCLR; /**< \brief Offset: 0x04 (R/W 32) Data Direction Clear */
__IO PORT_DIRSET_Type DIRSET; /**< \brief Offset: 0x08 (R/W 32) Data Direction Set */
__IO PORT_DIRTGL_Type DIRTGL; /**< \brief Offset: 0x0C (R/W 32) Data Direction Toggle */
__IO PORT_OUT_Type OUT; /**< \brief Offset: 0x10 (R/W 32) Data Output Value */
__IO PORT_OUTCLR_Type OUTCLR; /**< \brief Offset: 0x14 (R/W 32) Data Output Value Clear */
__IO PORT_OUTSET_Type OUTSET; /**< \brief Offset: 0x18 (R/W 32) Data Output Value Set */
__IO PORT_OUTTGL_Type OUTTGL; /**< \brief Offset: 0x1C (R/W 32) Data Output Value Toggle */
__I PORT_IN_Type IN; /**< \brief Offset: 0x20 (R/ 32) Data Input Value */
__IO PORT_CTRL_Type CTRL; /**< \brief Offset: 0x24 (R/W 32) Control */
__O PORT_WRCONFIG_Type WRCONFIG; /**< \brief Offset: 0x28 ( /W 32) Write Configuration */
RoReg8 Reserved1[0x4];
__IO PORT_PMUX_Type PMUX[16]; /**< \brief Offset: 0x30 (R/W 8) Peripheral Multiplexing n */
__IO PORT_PINCFG_Type PINCFG[32]; /**< \brief Offset: 0x40 (R/W 8) Pin Configuration n */
RoReg8 Reserved2[0x20];
} PortGroup;
So here, are we looking at the address 0x41004400UL and getting the data there? And then what happens?
I looked this up but couldn't find anything useful. If you have any suggestions (tutorials, books, etc.), please let me know.
Nothing happens, because you only present some declarations. I'm not entirely sure what the question actually is, but to briefly explain that code:
0x41004400UL is obviously an address in I/O space (not regular memory) where a port starts (a set of I/O registers)
This Port consists of two groups with a similar arrangement of single registers
struct PortGroup models these registers exactly in the layout present on the hardware
To know the meaning of the Registers, look up the hardware documentation.
Generally you can access a hardware register in C in this manner:
#define PORT (*(volatile uint8_t*)0x1234)
0x1234 is the register address
uint8_t is the type of the register, in this case 1 byte large.
volatile is required so that the compiler knows it must not optimize away accesses to such a variable: each read or write stated in the code must actually be performed.
(volatile uint8_t*) casts the integer literal to an address of the desired type.
The left-most * then takes the contents of that address, so that the macro can be used just as if PORT were a regular variable.
Note that this does not allocate anything! It just assumes that there is a hardware register present at the given address, which can be accessed by the type specified (uint8_t).
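As a usage sketch (the address 0x1234 and the bit values are made up, not a real register map):
#include <stdint.h>

#define PORT (*(volatile uint8_t*)0x1234)

void toggle_example(void)
{
    PORT |= 0x01;             /* read-modify-write: set bit 0 of the register */
    uint8_t status = PORT;    /* every read actually touches the hardware */
    (void)status;
}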
Using the same method you can also have other C data types correspond directly to hardware registers. For example, by using a handy struct, you can map the whole register area of a particular hardware peripheral. Such code is, however, a bit dangerous and questionable, since it must take things like alignment/struct padding and aliasing into account.
As for the specific code in your example, it is a typical, awful register map for a particular hardware peripheral (it looks like a plain general-purpose I/O port) on a certain microcontroller. One such beast is typically provided with each compiler supporting the MCU.
Such register maps are sadly always written in awful, completely non-portable ways. For example, identifiers beginning with two underscores (__) are reserved in C; ordinary code is not allowed to declare such identifiers (7.1.3).
What's really strange is that they have omitted the volatile keyword. This means that you have one of these scenarios here:
The volatile keyword is hidden beneath the Port definition. Most likely this is the case, or
The register map is full of fatal bugs, or
The compiler is such an awful piece of crap that it doesn't optimize variables at all. Which would make the issues with volatile go away.
I would investigate this further.
As for struct padding and aliasing, the compiler vendor has likely implicitly assumed that only their compiler is to be used. They have no interest in providing you with a portable register map, so that you could switch to the competitor's compiler for the same MCU.
I need to build a VLAN header.
I have code that builds the Ethernet header (struct ether_header) and it works OK.
/* Ethernet header */
memcpy(eh->ether_shost,src_mac_.data(), 6);
memcpy(eh->ether_dhost,socketAddress.sll_addr , 6);
/* Ethertype field */
eh->ether_type = htons(ETH_P_IP);
I didn't find a struct for a VLAN Ethernet header, so I created my own and populate it like this:
struct vlan_ethhdr {
u_int8_t ether_dhost[ETH_ALEN]; /* destination eth addr */
u_int8_t ether_shost[ETH_ALEN]; /* source ether addr */
u_int16_t h_vlan_proto;
u_int16_t h_vlan_TCI;
u_int16_t ether_type;
};
/* Ethernet header */
memcpy(eh->ether_shost,src_mac_.data(), 6);
memcpy(eh->ether_dhost,socketAddress.sll_addr , 6);
eh->h_vlan_proto = htons(0x8100);
eh->h_vlan_TCI = htons(VLAN_ID);
/* Ethertype field */
eh->ether_type = htons(ETH_P_IP);
It seems that I did it wrong.
Wireshark doesn't even recognize the packet (the old code sent TCP packets and sent them correctly).
Any advice?
http://wiki.wireshark.org/VLAN
DstMAC [6 bytes]
SrcMAC [6 bytes]
Type [2 bytes] = 0x8100
VLANTag [4 bytes]:
    Priority [bits 0..2] = 0
    CFI [bit 3] = 0
    ID [bits 4..15]
    Ethernet Type [bits 16..31] = 0x0800 (IP)
My guess is you're not setting the VLAN_ID correctly.
First, you should avoid any structure padding issues by just creating some test packets in a byte buffer; that way you can be sure you have everything correct. Then you can debug your structure and htons byte ordering with confidence, because you know what the correct values should be.
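For example, a hand-built VLAN-tagged Ethernet header as raw bytes, just for comparison against the struct-based code (the MAC addresses and VID 100 are made-up values):
unsigned char test_frame[] = {
    0x11, 0x22, 0x33, 0x44, 0x55, 0x66,   /* destination MAC */
    0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff,   /* source MAC */
    0x81, 0x00,                           /* TPID 0x8100 (802.1Q) */
    0x00, 0x64,                           /* TCI: PCP=0, CFI=0, VID=100 */
    0x08, 0x00                            /* EtherType 0x0800 (IPv4) */
};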
My code to construct the VLAN header is correct. It didn't work for me at first because I forgot to increase the size of the packet, which is bigger now.
Note that the TCI is not only the VID; it also includes the priority and CFI. In my case they are both zero, so I don't need to use masking and shifting for the TCI (see the sketch below).
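For reference, a small sketch of how the TCI could be built when priority and CFI are not zero (my own helper, not from the question; the field positions follow 802.1Q):
#include <stdint.h>
#include <arpa/inet.h>

uint16_t make_tci(uint8_t pcp, uint8_t dei, uint16_t vid)
{
    /* PCP: bits 15..13, DEI/CFI: bit 12, VID: bits 11..0 */
    uint16_t tci = (uint16_t)((pcp & 0x7u) << 13)
                 | (uint16_t)((dei & 0x1u) << 12)
                 | (vid & 0x0FFFu);
    return htons(tci);        /* network byte order for the wire */
}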
Are you sure your structure is properly packed? The compiler may add some padding by default; it can easily be disabled.
For example, with GCC:
#pragma pack(push)
#pragma pack(1)
struct vlan_ethhdr {
u_int8_t ether_dhost[ETH_ALEN]; /* destination eth addr */
u_int8_t ether_shost[ETH_ALEN]; /* source ether addr */
u_int16_t h_vlan_proto;
u_int16_t h_vlan_TCI;
u_int16_t ether_type;
};
#pragma pack(pop)
The 4-byte 802.1Q VLAN tag is generally considered to be of the form 0x8100 + pri + cfi + vid.
The Wireshark way of depicting the VLAN tag as pri + cfi + vid + etype (excluding the 0x8100 TPID, but including the EtherType of the data payload) is somewhat unusual.
However, the overall packet format and length is correct either way.
I'm trying to parse a packet. Up to the IP header everything is fine (I'm able to retrieve all the values correctly), but for the UDP header (I checked that the protocol is 17) the values are coming out wrong (all four fields).
I'm trying to do this:
struct udp_header{
uint16_t sport;
uint16_t dport;
uint16_t len;
uint16_t chksum;
};
struct udp_header* udp= (struct udp_header*)(packet + 14 + ip_hdr->ip_hl*4);
packet is the pointer to the beginning of the packet; 14 is for the Ethernet header. The IP header length, when checked, gives the correct value. But after performing this operation I'm getting all the fields wrong. When I tried uint8_t as the data type (I know it's wrong!), the destination port somehow comes out correct.
You have run into endianness. IP packets have all fields in network byte order (aka "big-endian"), and your host system probably runs little-endian. Look into ntohs() and friends for one approach.
The proper approach is to not copy the structure as-is from the network data, but instead extract each field manually and byte-swap it if necessary. This also works around any issues with padding and alignment, there's no guarantee that your struct is mapped into your computer's memory in exactly the same way as the packet is serialized.
So you would do e.g.:
udp_header.sport = ntohs(*(unsigned short*) (packet + 14 + 4 * ip_hdr->ip_hl));
This is also a bit iffy, since it assumes the resulting address can validly be cast into a pointer to unsigned short. On x86 that will work, but it's not guaranteed to be portable.
Even better, in my opinion, is to drop the use of pointers and instead write a function called e.g. unsigned short read_u16(void *packet, size_t offset) that extracts the value byte-by-byte and returns it. Then you'd just do:
udp_header.sport = read_u16(packet, 14 + 4 * ip_hdr->ip_hl);
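A minimal sketch of such a helper (the name and signature follow the suggestion above; it assumes the usual big-endian network byte order):
#include <stddef.h>

unsigned short read_u16(void *packet, size_t offset)
{
    unsigned char *p = (unsigned char *)packet + offset;
    return (unsigned short)((p[0] << 8) | p[1]);   /* big-endian on the wire */
}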
I always use this struct for IP header:
struct sniff_ip {
u_char ip_vhl; /* version << 4 | header length >> 2 */
u_char ip_tos; /* type of service */
u_short ip_len; /* total length */
u_short ip_id; /* identification */
u_short ip_off; /* fragment offset field */
#define IP_RF 0x8000 /* reserved fragment flag */
#define IP_DF 0x4000 /* dont fragment flag */
#define IP_MF 0x2000 /* more fragments flag */
#define IP_OFFMASK 0x1fff /* mask for fragmenting bits */
u_char ip_ttl; /* time to live */
u_char ip_p; /* protocol */
u_short ip_sum; /* checksum */
struct in_addr ip_src,ip_dst; /* source and dest address */
};
#define IP_HL(ip) (((ip)->ip_vhl) & 0x0f)
#define IP_V(ip) (((ip)->ip_vhl) >> 4)
And to get the UDP struct pointer:
udp = (struct sniff_udp*)(packet + SIZE_ETHERNET + (IP_HL(ip)*4));
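For completeness, a sketch of the matching UDP struct, which the answer does not show (field names are illustrative; SIZE_ETHERNET is assumed to be 14, and u_short comes from <sys/types.h>):
struct sniff_udp {
    u_short uh_sport;   /* source port */
    u_short uh_dport;   /* destination port */
    u_short uh_ulen;    /* datagram length */
    u_short uh_sum;     /* checksum */
};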
As another answer remarked, you have to deal with the endianness of your data.
The other thing you need to deal with is byte alignment. For speed, when you define a structure in C like this:
struct udp_header{
uint16_t sport;
uint16_t dport;
uint16_t len;
uint16_t chksum;
};
The C compiler may leave padding bytes between these fields so that member accesses can be done with faster single-instruction memory accesses. You can check whether your C compiler is doing this with printf("struct size is: %zu\n", sizeof(struct udp_header));
Assuming you are using GCC, you can disable padding by adding #pragma pack(1) before the structure definition, as sketched below. To re-enable the default padding for speed, you should then use #pragma pack().
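A quick sketch of that check with packing applied, as a standalone test (note that for this particular struct, which contains only uint16_t members, most ABIs add no padding even without the pragma):
#include <stdio.h>
#include <stdint.h>

#pragma pack(1)
struct udp_header {      /* packed version of the struct above */
    uint16_t sport;
    uint16_t dport;
    uint16_t len;
    uint16_t chksum;
};
#pragma pack()

int main(void)
{
    printf("struct size is: %zu\n", sizeof(struct udp_header));   /* expect 8 */
    return 0;
}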
I coded a small program to show you the casting problem:
#include <stdlib.h>
#include <sys/types.h>   /* u_char, u_short */
struct flags {
u_char flag1;
u_char flag2;
u_short flag3;
u_char flag4;
u_short flag5;
u_char flag7[5];
};
int main(){
char buffer[] = "\x01\x02\x04\x03\x05\x07\x06\xff\xff\xff\xff\xff";
struct flags *flag;
flag = (struct flags *) buffer;
return 0;
}
My problem is that when I cast, flag5 wrongly takes the "\x06\xff" bytes, ignoring the "\x07", and flag7 wrongly takes the next four "\xff" bytes plus a NUL, which is the next byte. I also ran gdb:
(gdb) p/x flag->flag5
$1 = 0xff06
(gdb) p/x flag->flag7
$2 = {0xff, 0xff, 0xff, 0xff, 0x0}
(gdb) x/15xb flag
0xbffff53f: 0x01 0x02 0x04 0x03 0x05 0x07 0x06 0xff
0xbffff547: 0xff 0xff 0xff 0xff 0x00 0x00 0x8a
Why is this happening, and how can I handle it correctly?
Thanks.
This looks like a structure member alignment issue. Unless you know how your compiler packs structure members, you should not make assumptions about the positions of those members in memory.
The reason the 0x07 is apparently lost is that the compiler is probably aligning the flag5 member on a 16-bit boundary, skipping the odd memory location that holds the 0x07 value; that value is lost in the padding. Also, what you are doing is overflowing the buffer, a big no-no. In other words:
struct flags {
u_char flag1; // 0x01
u_char flag2; // 0x02
u_short flag3; // 0x04 0x03
u_char flag4; // 0x05
// 0x07 is in the padding
u_short flag5; // 0x06 0xff
u_char flag7[5]; // 0xff 0xff 0xff 0xff ... oops, buffer overrun, because your
// buffer was less than the sizeof(flags)
};
You can often control the packing of structure members with most compilers, but the mechanism is compiler specific.
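To see exactly where your compiler puts the padding, you can print the member offsets (a quick sketch, assuming the struct flags definition from the question is in scope):
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    printf("flag4 at offset %zu, flag5 at offset %zu, sizeof = %zu\n",
           offsetof(struct flags, flag4),
           offsetof(struct flags, flag5),
           sizeof(struct flags));
    return 0;
}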
The compiler is free to put some unused padding between members of the structure to (for instance) arrange the alignment to its convenience. Your compiler may provide a #pragma pack or a command-line option to ensure tight structure packing.
How structures are stored is implementation defined, and thus, you can't rely on a specific memory layout for serialization like that.
To serialize your structure to a byte array, write a function which serializes each field in a set order.
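For example, a sketch of such a serializer (my own; it assumes the struct flags definition from the question is in scope, and I arbitrarily write the u_short fields big-endian):
#include <stddef.h>
#include <string.h>

size_t serialize_flags(const struct flags *f, unsigned char *out)
{
    size_t i = 0;
    out[i++] = f->flag1;
    out[i++] = f->flag2;
    out[i++] = (unsigned char)(f->flag3 >> 8);   /* u_short, big-endian */
    out[i++] = (unsigned char)(f->flag3 & 0xFF);
    out[i++] = f->flag4;
    out[i++] = (unsigned char)(f->flag5 >> 8);
    out[i++] = (unsigned char)(f->flag5 & 0xFF);
    memcpy(out + i, f->flag7, sizeof f->flag7);  /* 5 raw bytes */
    return i + sizeof f->flag7;                  /* 12 bytes, no padding */
}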
You might need to pack the struct:
struct __attribute__ ((__packed__)) flags {
u_char flag1;
u_char flag2;
u_short flag3;
u_char flag4;
u_short flag5;
u_char flag7[5];
};
Note: This is GCC -- I don't know how portable it is.
This has to do with padding. The compiler is adding padding bytes to your struct in order to align its members in memory for efficiency.
See the following examples:
http://msdn.microsoft.com/en-us/library/71kf49f1(v=vs.80).aspx
http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86