I'm currently working on a packet sniffer/analyzer for a school project, and I'm having trouble extracting the DNS flags from the DNS header.
The DNS header struct looks like this:
struct dnshdr {
    uint16_t xid;     /* transaction ID */
    uint16_t flags;   /* QR, Opcode, AA, TC, RD, RA, Z, RCODE */
    uint16_t qdcount; /* question count */
    uint16_t ancount; /* answer count */
    uint16_t nscount; /* authority record count */
    uint16_t arcount; /* additional record count */
};
How can I extract the individual flags from the uint16_t?
You can either define a structure with bit-fields, which always sounds like the cleanest way on paper but turns out to be a nightmare of implementation-defined behaviour and is almost completely unportable, or you do it the simple way with masks and shifts. Macros are the common implementation:
#define QR(f)     (((f) & 0x8000) >> 15) /* QR is the top bit of the host-order flags word */
#define OPCODE(f) (((f) & 0x7800) >> 11)
#define AA(f)     (((f) & 0x0400) >> 10)
...etc
This of course assumes that any necessary endianness correction has already been done (e.g. with ntohs()), so that the two bytes of the uint16_t are in the right order to be interpreted this way.
Afterthought: the one-bit flags don't really need to be shifted either - once they're masked they're going to be zero or non-zero, which is enough for testing them in C.
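For example, a minimal sketch of pulling the flags word out of a raw packet and testing it with the macros above (assuming pkt points at the first byte of the DNS header; memcpy avoids misaligned reads):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* ntohs */

void print_dns_flags(const uint8_t *pkt)
{
    uint16_t flags;
    memcpy(&flags, pkt + 2, sizeof flags); /* flags follow the 2-byte xid */
    flags = ntohs(flags);                  /* network -> host byte order */
    printf("QR=%d OPCODE=%d AA=%d\n", QR(flags), OPCODE(flags), AA(flags));
}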
Related
I have two pieces of code which work exactly the same:
struct sniff_ip {
    u_char ip_vhl; /* version << 4 | header length >> 2 */
    ...
};
#define IP_HL(ip) (((ip)->ip_vhl) & 0x0f)
#define IP_V(ip) (((ip)->ip_vhl) >> 4)
and
struct sniff_ip {
    uint8_t ip_hl:4;
    uint8_t ip_ver:4;
    ...
};
The former is code from http://www.tcpdump.org/pcap.html; the latter is mine.
The IP version and IP header length swap positions between these two pieces of code, yet the output is the same. Why?
What I mean is that #define IP_HL(ip) (((ip)->ip_vhl) & 0x0f) looks at the second (low) four bits, while uint8_t ip_hl:4 is declared to capture the first four bits...
Do not use bitfields for implementing protocols! The exact bit positions depend on the ABI and are platform/compiler dependent.
Your assumption
when uint8_t ip_hl:4 is declared to capture first four bits
is wrong, or rather: it happens to hold for your compiler, but it cannot be generalized. You have to read your compiler/ABI documentation very carefully to find out where the bits are really placed. In your case the compiler evidently allocates bit-fields starting at the least significant bit, so ip_hl:4, being declared first, lands in the low nibble - exactly the bits that & 0x0f extracts, which is why both versions agree.
An example of how bit-fields are laid out can be found in the ARM EABI specification http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf, section 7.1.7 "Bit-fields". But this may be completely different for x86 or MIPS ABIs.
EDIT:
Bitfields can be useful to save space (e.g. unsigned int flag:1 vs. bool flag) [though this might not hold in practice, because the bit tests need more (and slower) machine code] and to make code easier to read (e.g. if (a->flags & (1 << 0)) vs. if (a->some_flag)). But you can never rely on their exact positions.
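If you want to see what your own compiler/ABI actually does, a quick experiment is to write known values through the bit-fields and dump the raw byte; a minimal sketch using the struct from the question (the output is implementation-defined, which is the whole point):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

struct vhl_bits {
    uint8_t ip_hl:4;
    uint8_t ip_ver:4;
};

int main(void)
{
    struct vhl_bits b = { .ip_hl = 0x5, .ip_ver = 0x4 };
    uint8_t raw;
    memcpy(&raw, &b, 1);
    /* 0x45 means ip_hl landed in the low nibble (the common little-endian
       ABI behaviour); 0x54 would mean the opposite allocation order. */
    printf("raw byte: 0x%02x\n", (unsigned)raw);
    return 0;
}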
I have a structure in C that looks like this:
typedef u_int8_t NN;
typedef u_int8_t X;
typedef int16_t S;
typedef u_int16_t U;
typedef char C;
typedef struct {
    X  test;
    NN test2[2];
    C  test3[4];
    U  test4;
} Test;
I have declared the structure and written values to the fields as follows:
Test t;
int t_buflen = sizeof(t);
memset( &t, 0, t_buflen);
t.test = 0xde;
t.test2[0]=0xad; t.test2[1]=0x00;
t.test3[0]=0xbe; t.test3[1]=0xef; t.test3[2]=0x00; t.test3[3]=0xde;
t.test4=0xdeca;
I am sending this structure via UDP to a server. At present this works fine when I test locally, however I now need to send this structure from my little-endian machine to a big-endian machine. I'm not really sure how to do this.
I've looked into using htons, but I'm not sure if it's applicable in this situation, as it seems to only be defined for unsigned integers of 16 or 32 bits, if I understood correctly.
I think there may be two issues here, depending on how you're sending this data over the socket.
Issue 1: Endianness
As you've said, endianness is an issue. You're right to mention htons and ntohs for shorts; you may also find htonl and its inverse (ntohl) useful.
Endianness has to do with the byte ordering of multi-byte data types in memory. For single-byte data types you therefore don't have to worry; in your case it is the 2-byte field (test4) that I guess you're asking about.
To use these functions you will need to do something like the following...
Sender:
-------
t.test = 0xde; // Does not need to be swapped
t.test2[0] = 0xad; ... // Does not need to be swapped
t.test3[0] = 0xbe; ... // Does not need to be swapped
t.test4 = htons(0xdeca); // Needs to be swapped
...
sendto(..., &t, ...);
Receiver:
---------
recvfrom(..., &t, ...);
t.test4 = ntohs(t.test4); // Needs to be swapped back to host order
htons() and ntohs() convert to and from network byte ordering, which is big-endian. Your little-endian machine therefore byte-swaps t.test4 before sending, and on receipt the big-endian machine just uses the value as read (there ntohs() is effectively a no-op).
If you did not want to use the htons() function and its variants, you could instead define the buffer format at the byte level.
In this case your code might look something like this:
Sender:
-------
uint8_t buffer[SOME SIZE];
t.test = 0xde;
t.test2[0] = 0xad; ...
t.test3[0] = 0xbe; ...
t.test4 = 0xdeca;
buffer[0] = t.test;
buffer[1] = t.test2[0];
/// and so on, until...
buffer[7] = t.test4 & 0xff;
buffer[8] = (t.test4 >> 8) & 0xff;
...
sendto(..., buffer, ...);
Receiver:
---------
uint8_t buffer[SOME SIZE];
recvfrom(..., buffer, ...);
t.test = buffer[0];
t.test2[0] = buffer[1];
// and so on, until...
t.test4 = buffer[7] | (buffer[8] << 8);
The send and receive code will work regardless of the respective endianness of the sender and receiver because the byte-layout of the buffer is defined and known by the program running on both machines.
However, if you're sending your structure through the socket in this way you should also note the caveat below...
Issue 2: Data alignment
The article "Data alignment: Straighten up and fly right" is a great read for this one...
The other problem you might have is data alignment. This is a separate issue from endianness, but it is nevertheless something to watch out for...
struct
{
    uint8_t  v1;
    uint16_t v2;
};
In the above bit of code the offset of v2 from the start of the structure could be 1 byte, 2 bytes, 4 bytes (or just about anything). The compiler cannot re-order members in your structure, but it can pad the distance between variables.
Let's say machine 1 has a 16-bit wide data bus. Without padding, the machine would have to do two fetches to get v2. Why? Because memory is accessed two bytes at a time at the hardware level. Therefore the compiler may pad the structure out like so:
struct
{
    uint8_t  v1;
    uint8_t  invisible_padding_created_by_compiler;
    uint16_t v2;
};
If the sender and receiver disagree on how data is packed into the structure, just sending the structure as a binary blob will cause you problems. In that case you may have to pack the variables into a byte stream/buffer manually before sending. This is often the safest way.
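If you want to see what your compiler actually did, offsetof and sizeof will tell you; a minimal sketch:

#include <stdio.h>
#include <stddef.h> /* offsetof */
#include <stdint.h>

struct sample {
    uint8_t  v1;
    uint16_t v2;
};

int main(void)
{
    /* On most ABIs this prints offset 2 and size 4: one invisible padding
       byte was inserted after v1 so that v2 is 2-byte aligned. */
    printf("offsetof v2 = %zu, sizeof = %zu\n",
           offsetof(struct sample, v2), sizeof(struct sample));
    return 0;
}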
There's no endianness of the structure as such; it's the individual fields that need converting to big-endian where needed. You can make a copy of the structure, rewrite each multi-byte field using htons/htonl, and send the result. 8-bit fields don't need any modification, of course.
In the case of TCP you could also just send each part separately and count on the Nagle algorithm to merge the parts into a single packet, but with UDP you need to prepare everything up front.
The data you send over the network should be the same regardless of the endianness of the machines involved. The keyword you need to research is serialization: converting a data structure into a series of bits/bytes, to be sent over a network or saved to disk, that is identical regardless of architecture or compiler.
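For the Test struct from the question, a minimal serializer sketch might look like the following (the wire format chosen here sends test4 low byte first; the choice is arbitrary as long as both ends agree):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumes the Test typedef from the question above. */
size_t serialize_test(const Test *t, uint8_t *buf)
{
    size_t i = 0;
    buf[i++] = t->test;                         /* 1-byte fields copy as-is */
    memcpy(&buf[i], t->test2, sizeof t->test2); i += sizeof t->test2;
    memcpy(&buf[i], t->test3, sizeof t->test3); i += sizeof t->test3;
    buf[i++] = (uint8_t)(t->test4 & 0xff);      /* defined order: low byte, */
    buf[i++] = (uint8_t)(t->test4 >> 8);        /* then high byte           */
    return i; /* always 9 bytes, independent of padding and endianness */
}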
How do I write to a single bit? I have a variable that is either 1 or 0, and I want to write its value to a single bit in an 8-bit reg variable.
I know this will set a bit:
reg |= mask; // mask is (1 << pin)
And this will clear a bit:
reg &= ~mask; // mask is (1 << pin)
Is there a way for me to do this in one line of code, without having to branch on whether the input value is high or low?
Assuming value is 0 or 1:
REG = (REG & ~(1 << pin)) | (value << pin);
I use REG instead of register because, as @KerrekSB pointed out in the comments on the question, register is a C keyword.
The idea here is that we compute the value of REG with the specified bit cleared, and then, depending on value, set the bit.
Because you tagged this with embedded I think the best answer is:
if (set)
    reg |= mask;  // mask is (1 << pin)
else
    reg &= ~mask; // mask is (1 << pin)
(which you can wrap in a macro or inline function). The reason is that embedded architectures like AVR have dedicated bit-set and bit-clear instructions, and the cost of a branch is low compared to other instructions (unlike on a modern CPU with speculative execution). GCC recognises the idioms in that if statement and emits the right instructions. A more complex branchless version (even one that tests well on modern x86) might not assemble to the best instructions on an embedded target.
The best way to know for sure is to disassemble the results. You don't have to be an expert (especially in embedded environments) to evaluate the results.
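Wrapped as an inline function, that might look like this (a sketch; the name is mine):

#include <stdint.h>

static inline void write_bit(volatile uint8_t *reg, uint8_t pin, uint8_t value)
{
    if (value)
        *reg |= (uint8_t)(1u << pin);  /* the idiom GCC can turn into a bit-set */
    else
        *reg &= (uint8_t)~(1u << pin); /* and this one into a bit-clear */
}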
One overlooked feature of C is bit packing, which is great for embedded work. You can define a struct to access each bit individually.
typedef struct
{
    unsigned char bit0 : 1;
    unsigned char bit1 : 1;
    unsigned char bit2 : 1;
    unsigned char bit3 : 1;
    unsigned char bit4 : 1;
    unsigned char bit5 : 1;
    unsigned char bit6 : 1;
    unsigned char bit7 : 1;
} T_BitArray;
The : 1 tells the compiler that you want each member to be just 1 bit long. You then take the address your variable reg sits on, cast it to your bit array type, and access the bits individually.
((T_BitArray *)&reg)->bit1 = value;
&reg is the address of your variable. ((T_BitArray *)&reg) is the same address, but the compiler now treats it as a T_BitArray address, and ((T_BitArray *)&reg)->bit1 provides access to the second bit. Of course, it's best to use more descriptive names than bit1.
// Through macros we can set and reset a bit
#include <stdio.h>

#define set(a,n)    ((a) |= (1 << (n)))
#define reset(a,n)  ((a) &= ~(1 << (n))) // note ~(1 << n); (0 << n) would clear every bit
// toggle bit value given by the user
#define toggle(a,n) ((a) ^= (1 << (n)))

int main(void)
{
    int a, n;
    printf("Set/reset a particular bit given by the user: ");
    scanf("%d %d", &a, &n);
    int b = set(a, n); // the other macros are called the same way
    printf("%d\n", b);
    return 0;
}
I think what you're asking is whether you can execute a write instruction on a single bit without first reading the byte it's in. If so, then no, you can't do that. It has nothing to do with the C language; microprocessors generally don't have instructions that address single bits in ordinary memory. Even in raw machine code, if you want to set a bit you have to read the byte it's in, change the bit, then write it back. There's just no other way to do it.
This is a duplicate of how do you set, clear, and toggle a single bit, and I'll repost my answer here too, as no-one has mentioned SET and CLEAR registers yet:
As this is tagged "embedded" I'll assume you're using a microcontroller. All of the above suggestions are valid & work (read-modify-write, unions, structs, etc.).
However, during a bout of oscilloscope-based debugging I was amazed to find that these methods have considerable overhead in CPU cycles compared with writing a value directly to the micro's PORTnSET / PORTnCLEAR registers, which makes a real difference where there are tight loops or high-frequency ISRs toggling pins.
For those unfamiliar: in my example, the micro has a general pin-state register PORTn which reflects the output pins, so doing PORTn |= BIT_TO_SET results in a read-modify-write on that register.
The PORTnSET / PORTnCLEAR registers, by contrast, take a '1' to mean "please make this bit 1" (SET) or "please make this bit zero" (CLEAR), and a '0' to mean "leave the pin alone". So you end up with two port addresses depending on whether you're setting or clearing the bit (not always convenient), but a much faster reaction and smaller assembled code.
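In code, the difference looks roughly like this (register names and addresses are hypothetical; check your part's datasheet):

#include <stdint.h>

/* Hypothetical memory-mapped registers for port A */
#define PORTA       (*(volatile uint8_t *)0x0040u) /* pin-state register */
#define PORTA_SET   (*(volatile uint8_t *)0x0041u) /* write 1s to set    */
#define PORTA_CLEAR (*(volatile uint8_t *)0x0042u) /* write 1s to clear  */

void example_writes(uint8_t pin)
{
    PORTA |= (uint8_t)(1u << pin);      /* read-modify-write: slower     */
    PORTA_SET = (uint8_t)(1u << pin);   /* single write: pin driven high */
    PORTA_CLEAR = (uint8_t)(1u << pin); /* single write: pin driven low  */
}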
Let's say I have an enum with bitflag options larger than the number of bits in a standard data type:
enum flag_t {
    FLAG_1 = 0x1,
    FLAG_2 = 0x2,
    ...
    FLAG_130 = 0x400000000000000000000000000000000,
};
This is impossible for several reasons: enums have a maximum size of 128 bits (in C with gcc on my system, from experimentation), single variables are also at most 128 bits, etc.
In C you can't perform bitwise operations on arrays, though in C++ I suppose you could overload bitwise operators to do the job with a loop.
Is there any way in C, other than manually remembering which flags go where, to make this work for large numbers?
This is exactly what bit-fields are for.
In C, it's possible to define the following data layout:
struct flag_t
{
    unsigned int flag1 : 1;
    unsigned int flag2 : 1;
    unsigned int flag3 : 1;
    (...)
    unsigned int flag130 : 1;
    (...)
    unsigned int flag1204 : 1; // for fun
};
In this example, each flag occupies just one bit. An obvious advantage is the unlimited number of flags. Another great advantage is that you are not limited to single-bit flags: you could merge some multi-value fields in among them.
But most importantly, testing and assignment become a bit different, and probably simpler, as far as single-flag operations are concerned: you no longer need to do any masking, just access the flag directly by naming it. And by the way, use the opportunity to give these flags more meaningful names :)
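Usage then reads naturally; a sketch (assuming the elided members above are filled in):

struct flag_t flags = { 0 }; /* all flags cleared */

flags.flag2 = 1;   /* set   */
flags.flag130 = 0; /* clear */

if (flags.flag2 && !flags.flag130) {
    /* test without any masking */
}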
Instead of trying to assign absurdly large numbers to an enum so you can have a hundreds-of-bits-wide bitfield, let the compiler assign a normal zero-based sequence of numbers to your flag names, and simulate a wide bitfield with an array of unsigned char. You can get a 1024-bit bitfield from unsigned char bits[128], and write get_flag() and set_flag() accessor functions to hide the minor amount of extra work involved.
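A minimal sketch of those accessors (the enum names are hypothetical):

#include <stdint.h>

enum { FLAG_1, FLAG_2, /* ... */ FLAG_130 = 129, NUM_FLAGS = 1024 };

static unsigned char bits[NUM_FLAGS / 8]; /* 1024-bit wide bitfield */

static int get_flag(unsigned flag)
{
    return (bits[flag / 8] >> (flag % 8)) & 1;
}

static void set_flag(unsigned flag, int value)
{
    if (value)
        bits[flag / 8] |= (unsigned char)(1u << (flag % 8));
    else
        bits[flag / 8] &= (unsigned char)~(1u << (flag % 8));
}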
However, a far better piece of advice would be to look at your design again, and ask yourself "Why do I need over a hundred different flags?". It seems to me that what you really need is a redesign.
In this answer to a related bitflags question, Bit Manipulation and Flags, I provided an example of using an unsigned char array, an approach suited to very large sets of bitflags, which I am moving to this post.
This source example provides the following:
a set of Preprocessor defines for the bitflag values
a set of Preprocessor macros to manipulate bits
a couple of functions to implement bitwise operations on the arrays
The general approach for this is as follows:
create a set of defines for the flags which specify an array offset and a bit pattern
create a typedef for an unsigned char array of the proper size
create a set of functions that implement the bitwise logical operations
The Specifics from the Answer with a Few Improvements and More Exposition
Use a set of C preprocessor defines to create the bitflags to be used with the array. Each bitflag define specifies an offset into the unsigned char array along with the bit(s) to manipulate in that byte.
The defines in this example are 16-bit values in which the upper byte contains the array offset and the lower byte contains the bit flag(s) for the byte of the unsigned char array whose offset is in the upper byte. Using this technique you can have arrays of up to 256 elements, i.e. 256 * 8 = 2,048 bitflags, or by going from a 16-bit define to a 32-bit long you could have far more. (In the comments below, bit 0 means the least significant bit of a byte and bit 7 the most significant bit.)
#define ITEM_FLG_01 0x0001 // array offset 0, bit 0
#define ITEM_FLG_02 0x0002 // array offset 0, bit 1
#define ITEM_FLG_03 0x0101 // array offset 1, bit 0
#define ITEM_FLG_04 0x0102 // array offset 1, bit 1
#define ITEM_FLG_05 0x0201 // array offset 2, bit 0
#define ITEM_FLG_06 0x0202 // array offset 2, bit 1
#define ITEM_FLG_07 0x0301 // array offset 3, bit 0
#define ITEM_FLG_08 0x0302 // array offset 3, bit 1
#define ITEM_FLG_10 0x0980 // array offset 9, bit 7
Next you have a set of macros to set and unset the bits, along with a typedef to make the type a bit easier to use. Unfortunately a typedef in C does not give you better type checking from the compiler, but it does make the code easier to read. These macros do no checking of their arguments, so you might feel safer using regular functions instead.
#define SET_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) |= (b) & 0xff)
#define TOG_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) ^= (b) & 0xff)
#define CLR_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) &= ~((b) & 0xff))
#define TST_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) & ((b) & 0xff))
typedef unsigned char BitSet[10];
An example of using this basic framework is as follows.
BitSet uchR = { 0 };
int bValue;
SET_BIT(uchR, ITEM_FLG_01);
bValue = TST_BIT(uchR, ITEM_FLG_01);
SET_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_04);
CLR_BIT(uchR, ITEM_FLG_05);
CLR_BIT(uchR, ITEM_FLG_01);
Next you can introduce a set of utility functions for the bitwise operations we want to support. These operations are analogous to the built-in C operators such as bitwise OR (|) and bitwise AND (&); the functions apply the corresponding operator to every element of the array.
These particular utility functions modify the first of the two sets of bitflags provided. If that is a problem, you can modify them to accept three arguments: one for the result of the operation and two for the two sets of bitflags to use in the operation.
void AndBits(BitSet s1, const BitSet s2)
{
size_t nLen = sizeof(BitSet);
for (; nLen > 0; nLen--) {
*s1++ &= *s2++;
}
}
void OrBits(BitSet s1, const BitSet s2)
{
size_t nLen = sizeof(BitSet);
for (; nLen > 0; nLen--) {
*s1++ |= *s2++;
}
}
void XorBits(BitSet s1, const BitSet s2)
{
size_t nLen = sizeof(BitSet);
for (; nLen > 0; nLen--) {
*s1++ ^= *s2++;
}
}
If you need more than one size of bitflag set with this approach, the most flexible option is to eliminate the typedef and just use plain unsigned char arrays of various sizes. This means changing the interface of the utility functions to take unsigned char pointers plus an explicit length, and defining bitflag variables as unsigned char arrays.
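For example, the And operation from above reworked that way (a sketch):

#include <stddef.h>

void AndBitsN(unsigned char *s1, const unsigned char *s2, size_t nLen)
{
    for (; nLen > 0; nLen--) {
        *s1++ &= *s2++; /* same loop; the length now comes from the caller */
    }
}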
You may also consider an approach similar to what is being done for text strings in Is concatenating arbitrary number of strings with nested function calls in C undefined behavior?.
I'm looking for input on the most elegant interface to put around a memory-mapped register interface where the target object is split in the register:
union __attribute__ ((__packed__)) epsr_t {
    uint32_t storage;
    struct {
        unsigned reserved0   : 10;
        unsigned ICI_IT_2to7 : 6; // TOP HALF
        unsigned reserved1   : 8;
        unsigned T           : 1;
        unsigned ICI_IT_0to1 : 2; // BOTTOM HALF
        unsigned reserved2   : 5;
    } bits;
};
In this case, accessing the single bit T or any of the reserved fields work fine, but to read or write the ICI_IT requires code more like:
union epsr_t epsr;
// Reading:
uint8_t ici_it = (epsr.bits.ICI_IT_2to7 << 2) | epsr.bits.ICI_IT_0to1;
// Writing:
epsr.bits.ICI_IT_2to7 = ici_it >> 2;
epsr.bits.ICI_IT_0to1 = ici_it & 0x3;
At this point I've lost a chunk of the simplicity / convenience that the bitfield abstraction is trying to provide. I considered the macro solution:
#define GET_ICI_IT(_e) (((_e).bits.ICI_IT_2to7 << 2) | (_e).bits.ICI_IT_0to1)
#define SET_ICI_IT(_e, _i) do {\
    (_e).bits.ICI_IT_2to7 = (_i) >> 2;\
    (_e).bits.ICI_IT_0to1 = (_i) & 0x3;\
} while (0)
But I'm not a huge fan of macros like this as a general rule; I hate chasing them down when reading someone else's code, and far be it from me to inflict such misery on others. I was hoping there was a creative trick involving structs/unions/what-have-you that hides the split nature of this object more elegantly (ideally as a simple member of an object).
I don't think there's ever a 'nice' way, and actually I wouldn't rely on bitfields... Sometimes it's better to just have a bunch of exhaustive macros that do everything you'd want, document them well, and then rely on them having encapsulated your problem:
// ICI_IT bits 7:2 live at EPSR bits 15:10; bits 1:0 live at EPSR bits 26:25
#define ICI_IT_HI_SHIFT 8
#define ICI_IT_HI_MASK 0xfc
#define ICI_IT_LO_SHIFT 25
#define ICI_IT_LO_MASK 0x03
// Bits containing the ICI_IT value split in the 32-bit EPSR
#define ICI_IT_PACKED_MASK ((ICI_IT_HI_MASK << ICI_IT_HI_SHIFT) | \
                            (ICI_IT_LO_MASK << ICI_IT_LO_SHIFT))
// Packs a single 8-bit ICI_IT value x into a 32-bit EPSR e
#define PACK_ICI_IT(e,x) (((e) & ~ICI_IT_PACKED_MASK) | \
                          (((x) & ICI_IT_HI_MASK) << ICI_IT_HI_SHIFT) | \
                          (((x) & ICI_IT_LO_MASK) << ICI_IT_LO_SHIFT))
// Unpacks a split 8-bit ICI_IT value from a 32-bit EPSR e
#define UNPACK_ICI_IT(e) ((((e) >> ICI_IT_HI_SHIFT) & ICI_IT_HI_MASK) | \
                          (((e) >> ICI_IT_LO_SHIFT) & ICI_IT_LO_MASK))
Note that I haven't put in the type casts and the usual macro hygiene, for the sake of readability. Yes, I get the irony in mentioning readability...
If you dislike macros that much just use an inline function, but the macro solution you have is fine.
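For example, an inline-function version of the same accessors (a sketch using the union from the question):

#include <stdint.h>

static inline uint8_t get_ici_it(const union epsr_t *e)
{
    return (uint8_t)((e->bits.ICI_IT_2to7 << 2) | e->bits.ICI_IT_0to1);
}

static inline void set_ici_it(union epsr_t *e, uint8_t v)
{
    e->bits.ICI_IT_2to7 = v >> 2;
    e->bits.ICI_IT_0to1 = v & 0x3;
}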
Does your compiler support anonymous unions?
I find it an elegant solution that gets rid of your .bits part. It is not C99-compliant, but most compilers do support it, and it became standard in C11.
See also this question: Anonymous union within struct not in c99?.
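For reference, a sketch of the question's union with the inner struct left anonymous (C11, and a common extension before that), so the .bits step disappears:

#include <stdint.h>

union epsr_t {
    uint32_t storage;
    struct { /* anonymous: its members join the union's scope */
        unsigned reserved0   : 10;
        unsigned ICI_IT_2to7 : 6;
        unsigned reserved1   : 8;
        unsigned T           : 1;
        unsigned ICI_IT_0to1 : 2;
        unsigned reserved2   : 5;
    };
};

/* union epsr_t e;  e.T = 1;  -- direct access, no .bits in the path */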