Displaying an integer in a special hexadecimal format - C

I am again facing a formatting problem. I want to pass a port number (as an integer) as a parameter to a function (argv[]) and display it in a special format. In my actual case I want to display the port number 1234 in hexadecimal. I tried it this way:
int port = 1234;
char* _port = (char*)&port;
for (int i = 0; i < sizeof(port); i++) {
    printf("\\x%02x", _port[i]);
}
but it shows
\xffffffd2\x04\x00\x00
But I want it in a format with leading zeros and two digits per byte, like
\x04\xd2
Can you help me, please?
EDIT: I changed the loop bound to
sizeof(port) - 2
and now it shows only two bytes, but in the wrong endianness :S

On most systems the size of int is four bytes (32 bits). The hexadecimal representation of 1234 is 0x000004d2. On a little-endian system (like x86 and x86-64) it's stored in memory as the four bytes 0xd2, 0x04, 0x00 and 0x00, in that order.
If we look at it as an array of bytes, it looks like
+------+------+------+------+
| 0xd2 | 0x04 | 0x00 | 0x00 |
+------+------+------+------+
There are three problems you have:
You loop over all four bytes of the int, while you only want the two significant bytes
You don't consider the endianness
char on your system is signed, so when it is promoted to int it is sign-extended per the two's complement rules; that's where the \xffffffd2 comes from
To solve the first point you need to discard the "leading" zero bytes.
To solve the second point you need to loop from the end (but only on little-endian systems).
To solve the third point use a type which won't be sign-extended (e.g. uint8_t).
Put together you could do something like this:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    // The number we want to print
    int port = 1234;

    // Copy the raw binary data to a buffer of unsigned bytes
    // (going through the buffer also avoids breaking strict aliasing)
    uint8_t _port[sizeof port];
    memcpy(_port, &port, sizeof port);

    // Skip leading zeroes by walking from the end of the buffer towards
    // the beginning while the current byte is zero
    uint8_t *current;
    for (current = _port + sizeof _port - 1; current > _port && *current == 0; --current)
    {
        // Empty
    }

    // Print the remaining bytes, most significant first
    for (; current >= _port; --current)
    {
        printf("\\x%02x", *current); // Zero-padded, so e.g. \x4 becomes \x04
    }

    return 0;
}
Proof of concept

Get rid of the signedness and adjust the formats, printing the two significant bytes most-significant-first for either byte order:
void foo(int port, int little_endian)
{
    unsigned char *_port = (unsigned char *)&port;

    if (little_endian)
    {
        /* the significant bytes sit at the start; print them in reverse */
        for (size_t i = 2; i-- > 0; )
        {
            printf("\\x%02hhx", _port[i]);
        }
    }
    else
    {
        /* big endian: the significant bytes sit at the end, already MSB-first */
        for (size_t i = sizeof(port) - 2; i < sizeof(port); i++)
        {
            printf("\\x%02hhx", _port[i]);
        }
    }
}
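For what it's worth, a hypothetical driver for foo() above: probe the host's byte order at runtime and pass the result in (assumes foo() is in scope).
#include <stdio.h>

int main(void)
{
    int probe = 1;
    int little_endian = (*(unsigned char *)&probe == 1);

    foo(1234, little_endian); /* prints \x04\xd2 on either byte order */
    printf("\n");
    return 0;
}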

Related

sprintf - producing char array from an int in C

I'm doing an assignment for school to swap the bytes in an unsigned long, and return the swapped unsigned long. ex. 0x12345678 -> 0x34127856.
I figured I'll make a char array, use sprintf to insert the long into the char array, and then do the swapping, stepping through the array. I'm pretty familiar with C++, but C seems a little more low-level. I researched a few topics on sprintf and tried to make an array, but I'm not sure why it's not working.
unsigned long swap_bytes(unsigned long n) {
    char new[64];
    sprintf(new, "%l", n);
    printf("Char array is now: %s\n", new);
}
TLDR; The correct approach is at the bottom
Preamble
Issues with what you're doing
First off, using sprintf for byte swapping is the wrong approach, because:
it is a MUCH, MUCH slower process than using the mathematical properties of bit operations to perform the byte swapping;
a byte is not a decimal digit of a number (a wrong assumption that you've made in your approach);
it's even more painful when you don't know the size of your integer (is it 32 bits, 64 bits, or what?).
The correct approach
Use bit manipulation to swap the bytes (see way way below)
The absolutely incorrect implementation with wrong output (because we're ignoring issue #2 above)
There are many technical reasons why sprintf is much slower, but suffice it to say that moving the contents of memory around is a slow operation, and the more data you move around, the slower it gets:
In your case, by changing a number (which sits in one manipulatable 'word'; think of it as a cell) into its human-readable string equivalent, you are doing three things:
You are converting (let's assume a 64-bit CPU) a single number represented by 8 bytes in a single CPU cell (officially a register) into its human-readable string equivalent and putting it in RAM (memory). Each character in the string takes up at least a byte, so a 16-digit number takes up 16 bytes (rather than 8).
You are then moving these characters around using memory operations (which are slow compared to doing something directly on the CPU, by a factor of about 1000).
Then you're converting the characters back to an integer, which is a long and tedious operation.
However, since that's the solution that you came up with let's first look at it.
The really wrong code with a really wrong answer
Starting (somewhat) with your code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned long swap_bytes(unsigned long n) {
    int i, l;
    char new[64]; /* the fact that you used 64 here told me that you made assumption 2 */

    sprintf(new, "%lu", n); /* you forgot the `u` here */
    printf("The number is: %s\n", new); /* well it shows up :) */
    l = strlen(new);
    for (i = 0; i < l; i += 4) {
        char tmp[2];
        tmp[0] = new[i+2]; /* get the next two characters */
        tmp[1] = new[i+3];
        new[i+2] = new[i];
        new[i+3] = new[i+1];
        new[i] = tmp[0];
        new[i+1] = tmp[1];
    }
    return strtoul(new, NULL, 10); /* convert new back */
}

/* testing swap byte */
int main() {
    /* seems to work: */
    printf("Swapping 12345678: %lu\n", swap_bytes(12345678));
    /* how about 432? (err not) */
    printf("Swapping 432: %lu\n", swap_bytes(432));
}
As you can see, the above is not really byte swapping but character swapping, and any attempt to "fix" that code is nonsensical. For example, how do we deal with an odd number of digits?
Well, I suppose we can pad odd digit counts with a zero:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned long swap_bytes(unsigned long n) {
    int i, l;
    char new[64]; /* the fact that you used 64 here told me that you made assumption 2 */

    sprintf(new, "%lu", n); /* you forgot the `u` here */
    printf("The number is: %s\n", new); /* well it shows up :) */
    l = strlen(new);
    if (l % 2 == 1) { /* check if l is odd */
        printf("adding a pad to make the digit count even\n");
        sprintf(new, "0%lu", n);
        l++; /* length has increased */
    }
    for (i = 0; i < l; i += 4) {
        char tmp[2];
        tmp[0] = new[i+2]; /* get the next two characters */
        tmp[1] = new[i+3];
        new[i+2] = new[i];
        new[i+3] = new[i+1];
        new[i] = tmp[0];
        new[i+1] = tmp[1];
    }
    return strtoul(new, NULL, 10); /* convert new back */
}

/* testing swap byte */
int main() {
    /* seems to work: */
    printf("Swapping 12345678: %lu\n", swap_bytes(12345678));
    printf("Swapping 432: %lu\n", swap_bytes(432));
    /* how about 432516? (err not) */
    printf("Swapping 432516: %lu\n", swap_bytes(432516));
}
Now we run into an issue with numbers whose digit counts are not divisible by 4... Do we pad them with zeros on the right, the left, or the middle? Err, NOT REALLY.
In any event this entire approach is wrong because we're not swapping bytes anyhow, we're swapping characters.
Now what?
So you may be asking
what the heck is my assignment talking about?
Well numbers are represented as bytes in memory, and what the assignment is asking for is for you to get that representation and swap it.
So for example, if we took a number like 12345678, it's actually stored as some sequence of bytes (1 byte == 8 bits). So let's look at the usual mathematical way of representing 12345678 (base 10) in bits (base 2) and hexadecimal (base 16):
(12345678)₁₀ = (101111000110000101001110)₂
Splitting the binary bits into groups of 4 for visual ease gives:
(12345678)₁₀ = (1011 1100 0110 0001 0100 1110)₂
But 4 bits equal one hex digit (0, 1, 2, 3... 9, A, B...F), so we can convert the bits into nibbles (4-bit hex digits) easily:
(12345678)₁₀ = 1011 | 1100 | 0110 | 0001 | 0100 | 1110
(12345678)₁₀ = B | C | 6 | 1 | 4 | E
But each byte (8 bits) is two nibbles (4 bits each), so if we squish this a bit:
(12345678)₁₀ = (BC 61 4E)₁₆
So 12345678 is actually representable in 3 bytes.
However, CPUs have specific sizes for integers, usually powers of two: 16-bit, 32-bit, 64-bit, 128-bit and so on, for a variety of reasons that are beyond the scope of this discussion. Most often the CPU of a particular bit-size (say a 64-bit CPU) can manipulate unsigned integers representable in that bit-size directly, without having to store parts of the number in RAM.
Slight Digression
So let's say we have a 32-bit CPU, and the number sits at byte address α in RAM. The CPU could store the number 12345678 as:
>  00    BC    61    4E
>  ↑α    ↑α+1  ↑α+2  ↑α+3
(Figure 1)
Here the most significant part of the number is sitting at the lowest memory address, index α.
Or the CPU could store it the other way around, with the least significant part of the number at the lowest memory address:
>  4E    61    BC    00
>  ↑α    ↑α+1  ↑α+2  ↑α+3
(Figure 2)
The way a CPU stores a number is called the endianness of the CPU: if the most significant part comes first it's called a big-endian CPU (Figure 1), or little-endian if it stores it as in Figure 2.
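A small runtime probe of this idea: inspect the first byte of a known multi-byte value to see which layout the host uses (just a sketch; real code might use htonl() or compiler macros instead).
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x00BC614E; /* 12345678 */
    unsigned char *p = (unsigned char *)&x;

    if (p[0] == 0x4E)
        printf("little-endian (Figure 2 layout)\n");
    else
        printf("big-endian or other (Figure 1 layout)\n");
    return 0;
}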
Getting the correct answer (the wrong way)
Now that we have an idea of how things may be stored, let's try and pull this out still using sprintf.
We're going to use a couple of tricks here:
we'll convert the number to hexadecimal and zero-pad it
we'll use printf's (and therefore sprintf's) format-string capability: if we want a variable to specify the width of an argument, we can put a * after the % sign, like so:
    printf("%*d", width, num);
If we set our format string to %0*x we get a hex number that's automatically zero-padded in the output. So (note the casts: the width argument must be an int, and %llx expects an unsigned long long):
    sprintf(new, "%0*llx", (int)sizeof(n), (unsigned long long)n);
Our program then becomes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned long swap_bytes(unsigned long n) {
    int i, l;
    char new[64] = "";

    sprintf(new, "%0*llx", (int)sizeof(n), (unsigned long long)n);
    printf("The number is: %s\n", new);
    l = strlen(new);
    for (i = 0; i < l; i += 4) {
        char tmp[2];
        tmp[0] = new[i+2]; /* get the next two characters */
        tmp[1] = new[i+3];
        new[i+2] = new[i];
        new[i+3] = new[i+1];
        new[i] = tmp[0];
        new[i+1] = tmp[1];
    }
    return strtoul(new, NULL, 16); /* convert new back */
}

/* testing swap byte */
int main() {
    printf("size of unsigned long is %zu\n", sizeof(unsigned long));
    printf("Swapping 12345678: %lx\n", swap_bytes(12345678));
    /* how about 123456? */
    printf("Swapping 123456: %lx\n", swap_bytes(123456));
    printf("Swapping 98899: %lx\n", swap_bytes(98899));
}
The output would look something like:
size of unsigned long is 8
The number is: 00bc614e
Swapping 12345678: bc004e61
The number is: 0001e240
Swapping 123456: 10040e2
The number is: 00018253
Swapping 98899: 1005382
Obviously we could change the outputs by using %lu and printing the base-10 versions of the numbers, rather than base 16 as above. I'll leave that to you.
Now let's do it the right way
This is however rather terrible, since byte swapping can be done much faster without ever doing the integer to string and string to integer conversion.
Let's see how that's done:
The rather explicit way
Before we go on, just a bit on bit shifting in C:
If I have a number, say 6 (110₂), and I shift all the bits to the left by 1, I get 12 (1100₂): we simply shifted everything to the left, adding zeros on the right as needed.
This is written in C as 6 << 1.
A right shift is similar and is expressed in C with >>, so if I have a number, say 240 (11110000₂), and I right-shift it 4 times, I get 15 (1111₂); this is expressed as 240 >> 4.
Now we have unsigned long integers which are (in my case at least) 64 bits long, or 8 bytes long.
Let's say my number is 12345678, which is (00 00 00 00 00 bc 61 4e)₁₆ in hex at 8 bytes long. If I want to get the value of byte number 2 (counting from 0 at the least significant end), I can extract it by taking the number 0xFF (1111 1111, all bits of a byte set to 1), left shifting it until I reach byte 2 (so left shift 2 * 8 = 16 bits), performing a bitwise AND with the number, and then right shifting the result to get rid of the zeros. This is what it looks like:
0xFF << (2 * 8) = 0x0000000000ff0000
0x0000000000ff0000 & 0x0000000000bc614e = 0x0000000000bc0000
Now right shift:
0x0000000000bc0000 >> (2 * 8) = 0xbc
Another (better) way to do it is to right shift first and then perform a bitwise AND with 0xFF to drop all higher bits:
0x0000000000bc614e >> 16 = 0x00000000000000bc, and 0xbc & 0xFF = 0xbc
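Both extraction orders, as a quick self-contained check (the value and byte index match the walkthrough above):
#include <stdio.h>

int main(void)
{
    unsigned long n = 0xBC614EUL; /* 12345678 */

    /* mask first, then shift */
    printf("%lx\n", (n & (0xFFUL << 16)) >> 16); /* prints bc */

    /* shift first, then mask (the way used below) */
    printf("%lx\n", (n >> 16) & 0xFFUL);         /* prints bc */
    return 0;
}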
We will use the second way and make a macro using #define. Now we can put the bytes back in swapped positions by left shifting the kth byte to the position of byte k+1, and the (k+1)st byte to the position of byte k.
Here is a sample implementation of this:
#define GET_BYTE(N, B) (((N) >> (8 * (B))) & 0xFFUL)

unsigned long swap_bytes(unsigned long n)
{
    unsigned long rv = 0UL;
    size_t k;

    printf("number is %016lx\n", n);
    for (k = 0; k < sizeof(n); k += 2) {
        printf("swapping bytes %zu[%016lx] and %zu[%016lx]\n", k, GET_BYTE(n, k),
               k + 1, GET_BYTE(n, k + 1));
        rv += GET_BYTE(n, k) << (8 * (k + 1));
        rv += GET_BYTE(n, k + 1) << (8 * k);
    }
    return rv;
}

/* testing swap byte */
int main() {
    printf("size of unsigned long is: %zu\n", sizeof(unsigned long));
    printf("Swapping 12345678: %lx\n", swap_bytes(12345678));
    /* how about 123456? */
    printf("Swapping 123456: %lx\n", swap_bytes(123456));
    printf("Swapping 98899: %lx\n", swap_bytes(98899));
}
But this can be done so much more efficiently. I leave it here for now. We'll come back to using bit blitting and xor swapping later.
Update with GET_BYTE as a function instead of a macro:
#define GET_BYTE(N, B) ((N >> (8 * (B))) & 0xFFUL)
Just for fun we also use a shift operator for multiplying by 8. Note that left shifting a number by 1 is like multiplying it by 2 (this makes sense: in binary, 2 is 10, and multiplying by 10 appends a zero, which is the same as shifting left by one place). So multiplying by 8 = (1000)₂ is like shifting left three places, i.e. tacking on three zero bits (overflow notwithstanding):
unsigned long __inline__ get_byte(const unsigned long n, const unsigned char idx) {
    return (n >> (idx << 3)) & 0xFFUL;
}
Now the really really fun and correct way to do this
Okay so a fast way to swap integers around is to realize that if we have two integers x, and y we can use properties of xor function to swap their values. The basic algorithm is this:
X := X XOR Y
Y := Y XOR X
X := X XOR Y
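A quick numeric trace of those three steps, with two arbitrary byte values:
#include <stdio.h>

int main(void)
{
    unsigned char x = 0xA5, y = 0x3C;

    x ^= y; /* x = 0x99 */
    y ^= x; /* y = 0xA5, the old x */
    x ^= y; /* x = 0x3C, the old y */

    printf("x=%02x y=%02x\n", x, y); /* x=3c y=a5 */
    return 0;
}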
Now we know that a char is one byte in C. So we can force the compiler to treat the 8 byte integer as a sequence of 1-byte chars (hehe it's a bit of a mind bender considering everything I said about not doing it in sprintf) but this is different. You have to just think about it a bit.
We'll take the memory address of our integer, cast it to a char pointer (char *) and treat the result as an array of chars. Then we'll use the xor function property above to swap the two consecutive array values.
To do this I am going to use a macro (although we could use a function) but using a function will make the code uglier.
One thing you'll note is the use of ?: in XORSWAP below. That's like an if-then-else in C, but with expressions rather than statements: (conditional_expression) ? (value_if_true) : (value_if_false) means that if conditional_expression is non-zero the result is value_if_true, otherwise it is value_if_false. AND it's important not to xor a value with itself, because you will always get 0 as a result and clobber the content. So we use the conditional to check whether the addresses of the values we are changing are DIFFERENT from each other; if the addresses are the same, we simply return the value at the address: (&a == &b) ? a : (otherwise_do_xor)
So let's do it:
#include <stdio.h>

/* this macro swaps any two non-floating C values that are at
 * DIFFERENT memory addresses. That's the entire &a == &b ? a : ... business
 */
#define XORSWAP(a, b) ((&(a) == &(b)) ? (a) : ((a) ^= (b), (b) ^= (a), (a) ^= (b)))

unsigned long swap_bytes(const unsigned long n) {
    unsigned long rv = n; /* we are not messing with the original value */
    size_t k;

    for (k = 0; k < sizeof(rv); k += 2) {
        /* swap the kth byte with the (k+1)st byte */
        XORSWAP(((unsigned char *)&rv)[k], ((unsigned char *)&rv)[k + 1]);
    }
    return rv;
}

int main()
{
    printf("swapped: %lx", swap_bytes(12345678));
    return 0;
}
Here endeth the lesson. I hope that you will go through all the examples. If you have any more questions just ask in comments and I'll try to elaborate.
unsigned long swap_bytes(unsigned long n) {
    char new[64];
    sprintf(new, "%lu", n);
    printf("Char array is now: %s\n", new);
}
You need to use %lu (long unsigned) as the format in sprintf(); the compiler should also have given you a warning about the mismatched conversion.
To get it to print, you need %lu (for unsigned long).
It doesn't seem like you attempted the swap, could I see your try?

Increment a byte by one as if it were a Base10 number

I'm reading data from a serial port from a hardware device, which I need to increment by 1 and then send back out to the device. However, I need to increment it as if it were a base-10 number.
For example, if I read 0x09, I need to send back 0x10 rather than 0x0a. Or, if I receive 0x89, I should send back 0x90. If I receive 0x99, I send back 0x00 and carry the 1 up to the previous byte. It's actually a total of 5 bytes I have to run through.
I have the increment working in the following way. I'd like to know if there's a better way, through some clever shifting and/or AND/OR'ing of bits.
Thank you for any pointers you can provide!
Stateful
#include <stdio.h>
#include <stdlib.h>

int main()
{
    //start with 0x09 as the byte
    char input = 0x09;
    printf("input is: 0x%02x\n", input);

    //increment it by one
    input++;

    //turn it into a two-char string as a base-10 value, ignore overflow for now
    char asString[3];
    sprintf(asString, "%d", input);

    //convert back to a byte
    unsigned char newI = ((asString[0] - 0x30) * 16) + (asString[1] - 0x30);
    printf("newI is 0x%02x\n", newI);
    return 0;
}
Compute the received number modulo 16.
If that low digit is 9, add 7 (so the carry ripples into the high nibble), else add 1.
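A minimal sketch of that trick (assuming an unsigned byte; note it only handles the carry inside one byte, so 0x99 would become 0xA0 and the cross-byte carry still falls to the caller):
#include <stdio.h>

unsigned char bcd_inc(unsigned char b)
{
    /* if the low BCD digit is 9, adding 7 carries into the high nibble */
    return (b % 16 == 9) ? b + 7 : b + 1;
}

int main(void)
{
    printf("0x%02x\n", bcd_inc(0x09)); /* 0x10 */
    printf("0x%02x\n", bcd_inc(0x89)); /* 0x90 */
    return 0;
}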
Convert the whole byte sequence from BCD to an integer type, add, then convert back. Something like these functions should work for the conversion, but be aware that you may need a longer type than unsigned if you need to support completely arbitrary 5-byte BCD sequences (but 5 bytes of BCD suspiciously coincides with the range of a 32-bit integer).
/*
 * Decodes a BCD byte sequence to an unsigned integer. The bytes are assumed to be
 * in order from most- to least-significant.
 */
unsigned bcd_to_int(unsigned char bytes[], int byte_count) {
    unsigned result = 0;
    int counter;

    for (counter = 0; counter < byte_count; counter += 1) {
        /* the high nibble must be shifted down before it is scaled by 10 */
        result = result * 100 + ((bytes[counter] >> 4) & 0x0f) * 10 + (bytes[counter] & 0x0f);
    }
    return result;
}
/*
 * Encodes an unsigned integer into a BCD byte sequence. The bytes will be ordered
 * from most- to least-significant.
 */
void int_to_bcd(unsigned char bytes[], int byte_count, unsigned value) {
    int counter;

    for (counter = byte_count; counter-- > 0; ) {
        unsigned chunk = value % 100;
        bytes[counter] = (chunk / 10) * 0x10 + (chunk % 10);
        value /= 100;
    }
}
You could also implement long-form addition directly on your byte sequence; that might perform as well or better, but if you want to perform more or different operations than a single add / increment then it will be to your advantage to use native arithmetic.
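For instance, a round trip through the two functions above (assuming they are in scope): 0x00 0x09 0x99 is BCD for 999; adding one gives 1000, which re-encodes as 0x00 0x10 0x00.
#include <stdio.h>

int main(void)
{
    unsigned char seq[3] = {0x00, 0x09, 0x99};
    unsigned v = bcd_to_int(seq, 3) + 1; /* 999 + 1 */

    int_to_bcd(seq, 3, v);
    printf("%02x %02x %02x\n", seq[0], seq[1], seq[2]); /* 00 10 00 */
    return 0;
}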

Memory layout of struct having bitfields

I have this C struct: (representing an IP datagram)
struct ip_dgram
{
    unsigned int ver   : 4;
    unsigned int hlen  : 4;
    unsigned int stype : 8;
    unsigned int tlen  : 16;
    unsigned int fid   : 16;
    unsigned int flags : 3;
    unsigned int foff  : 13;
    unsigned int ttl   : 8;
    unsigned int pcol  : 8;
    unsigned int chksm : 16;
    unsigned int src   : 32;
    unsigned int des   : 32;
    unsigned char opt[40];
};
I'm assigning values to it, and then printing its memory layout in 16-bit words like this:
//prints 16 bits at a time
void print_dgram(struct ip_dgram dgram)
{
unsigned short int* ptr = (unsigned short int*)&dgram;
int i,j;
//print only 10 words
for(i=0 ; i<10 ; i++)
{
for(j=15 ; j>=0 ; j--)
{
if( (*ptr) & (1<<j) ) printf("1");
else printf("0");
if(j%8==0)printf(" ");
}
ptr++;
printf("\n");
}
}
int main()
{
struct ip_dgram dgram;
dgram.ver = 4;
dgram.hlen = 5;
dgram.stype = 0;
dgram.tlen = 28;
dgram.fid = 1;
dgram.flags = 0;
dgram.foff = 0;
dgram.ttl = 4;
dgram.pcol = 17;
dgram.chksm = 0;
dgram.src = (unsigned int)htonl(inet_addr("10.12.14.5"));
dgram.des = (unsigned int)htonl(inet_addr("12.6.7.9"));
print_dgram(dgram);
return 0;
}
I get this output:
00000000 01010100
00000000 00011100
00000000 00000001
00000000 00000000
00010001 00000100
00000000 00000000
00001110 00000101
00001010 00001100
00000111 00001001
00001100 00000110
But I expect this:
The output is partially correct; somewhere, the bytes and nibbles seem to be interchanged. Is there some endianness issue here? Are bit-fields not good for this purpose? I really don't know. Any help? Thanks in advance!
No, bit-fields are not good for this purpose. The layout is compiler-dependent.
It's generally not a good idea to use bitfields for data where you want to control the resulting layout, unless you have (compiler-specific) means, such as #pragmas, to do so.
The best way is probably to implement this without bitfields, i.e. by doing the needed bitwise operations yourself. This is annoying, but way easier than somehow digging up a way to fix this. Also, it's platform-independent.
Define the header as just an array of 16-bit words, and then you can compute the checksum easily enough.
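For illustration, a minimal sketch of the manual-packing idea (the field names and values mirror the question's main(); a real implementation would cover every field):
#include <stdint.h>

/* Pack version, header length and service type into the first two
 * bytes, in network byte order, without any bitfields. */
static void pack_first_word(uint8_t *buf)
{
    unsigned ver = 4, hlen = 5, stype = 0;

    buf[0] = (uint8_t)((ver << 4) | (hlen & 0x0F)); /* high nibble holds ver */
    buf[1] = (uint8_t)stype;
}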
The C11 standard says:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
I'm pretty sure this is undesirable, as it means there might be padding between your fields, and that you can't control the order of your fields. Not just that, but you're at the whim of the implementation in terms of network byte order. Additionally, imagine if an unsigned int is only 16 bits, and you're asking to fit a 32-bit bitfield into it:
The expression that specifies the width of a bit-field shall be an
integer constant expression with a nonnegative value that does not
exceed the width of an object of the type that would be specified were
the colon and expression omitted.
I suggest using an array of unsigned chars instead of a struct. This way you're guaranteed control over padding and network byte order. Start off with the size in bits that you want your structure to be in total. I'll assume you're declaring this in a constant such as IP_PACKET_BITCOUNT:
    typedef unsigned char ip_packet[(IP_PACKET_BITCOUNT / CHAR_BIT) + (IP_PACKET_BITCOUNT % CHAR_BIT > 0)];
Write a function, void set_bits(ip_packet p, size_t bitfield_offset, size_t bitfield_width, unsigned char *value) { ... }, which allows you to set the bits starting at p[bitfield_offset / CHAR_BIT], bit bitfield_offset % CHAR_BIT, to the bits found in value, up to bitfield_width bits in length. This will be the most complicated part of your task; a sketch follows below.
Then you can define identifiers such as VER_OFFSET 0 and VER_WIDTH 4, HLEN_OFFSET 4 and HLEN_WIDTH 4, etc., to make modification of the array less painful.
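A rough sketch of such a set_bits() under a couple of simplifying assumptions (the value arrives as an unsigned long instead of a byte pointer, the packet is a fixed 20 bytes, and bits are written MSB-first, i.e. network bit order):
#include <limits.h>
#include <stddef.h>

typedef unsigned char ip_packet[20]; /* assumed: 160-bit header, no options */

void set_bits(ip_packet p, size_t bitfield_offset, size_t bitfield_width,
              unsigned long value)
{
    for (size_t i = 0; i < bitfield_width; i++) {
        size_t bit = bitfield_offset + i;                 /* absolute bit index */
        size_t byte = bit / CHAR_BIT;                     /* which byte it lands in */
        unsigned shift = CHAR_BIT - 1 - (bit % CHAR_BIT); /* MSB-first within the byte */
        unsigned long bitval = (value >> (bitfield_width - 1 - i)) & 1UL;

        p[byte] = (unsigned char)((p[byte] & ~(1UL << shift)) | (bitval << shift));
    }
}

/* e.g. set_bits(pkt, VER_OFFSET, VER_WIDTH, 4) would write version 4 into
 * the high nibble of pkt[0] */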
Although this question was asked a long time back, there's no answer explaining your result. I'll answer it; hopefully it'll be useful to someone.
I'll illustrate the bug using the first 16 bits of your data structure.
Please note: this explanation is guaranteed to hold only for your particular processor and compiler. If either changes, the behaviour may change.
Fields:
unsigned int ver : 4;
unsigned int hlen : 4;
unsigned int stype : 8;
Assigned to:
dgram.ver = 4;
dgram.hlen = 5;
dgram.stype = 0;
The compiler starts assigning bit fields at offset 0. This means the first byte of your data structure is stored in memory as:
Bit offset: 7 4 0
-------------
| 5 | 4 |
-------------
First 16 bits after assignment look like this:
Bit offset: 15 12 8 4 0
-------------------------
| 5 | 4 | 0 | 0 |
-------------------------
Memory Address: 100 101
You are using an unsigned 16-bit pointer to dereference memory address 100. As a result, address 100 is treated as the LSB of a 16-bit number, and 101 as the MSB.
If you print *ptr in hex you'll see this:
*ptr = 0x0054
Your loop is running on this 16 bit value and hence you get:
00000000 0101 0100
-------- ---- ----
0 5 4
Solution:
Change order of elements to
unsigned int hlen : 4;
unsigned int ver : 4;
unsigned int stype : 8;
And use an unsigned char * pointer to traverse and print the values.
It should work.
Please note, as others have said, this behavior is platform- and compiler-specific. If either changes, you need to verify that the memory layout of your data structure is correct.
For Chinese readers, you can refer to the blog for more details; it's really good.
In summary, because of endianness there is a byte order as well as a bit order. Bit order is the order in which the individual bits of a byte are stored in memory, and it follows the same rule as byte order with respect to endianness.
Your picture is designed in network order, which is big-endian, so your struct definition is effectively for a big-endian machine. Per your output, your PC is little-endian, so you need to change the struct field order to use it.
The way the bits are shown is also misleading: when you fetch a byte and print it bit by bit, the bits appear in the conventional human order (most significant first), not in the machine's storage order. You can change it as follows, per the referenced blog.
void
dump_native_bits_storage_layout(unsigned char *p, int bytes_num)
{
    union flag_t {
        unsigned char c;
        struct base_flag_t {
            unsigned int p7:1,
                         p6:1,
                         p5:1,
                         p4:1,
                         p3:1,
                         p2:1,
                         p1:1,
                         p0:1;
        } base;
    } f;

    for (int i = 0; i < bytes_num; i++) {
        f.c = *(p + i);
        printf("%d%d%d%d %d%d%d%d ",
               f.base.p7, f.base.p6, f.base.p5, f.base.p4,
               f.base.p3, f.base.p2, f.base.p1, f.base.p0);
    }
    printf("\n");
}
//prints one byte at a time, in storage order
void print_dgram(struct ip_dgram dgram)
{
    unsigned char *ptr = (unsigned char *)&dgram;
    int i;

    //print only the first 10 bytes
    for (i = 0; i < 10; i++)
    {
        dump_native_bits_storage_layout(ptr, 1);
        ptr++;
    }
}
@unwind
A typical use case of bit fields is interpreting/emulating byte code or CPU instructions with a given layout. "Don't use them, because you cannot control the layout" is an answer for children.
@Bruce
For Intel/GCC I see a packed LITTLE ENDIAN bit layout, i.e. in struct ip_dgram field ver is represented by bits 0..3, field hlen is represented by bits 4..7 ...
For correctness of operation it is required to verify the memory layout against your design at runtime.
#include <iostream>

struct ModelIndicator
{
    int a : 4;
    int b : 4;
    int c : 4;
};

union UModelIndicator
{
    ModelIndicator i;
    int v;
};

// test packed little endian
static bool verifyLayoutModel()
{
    UModelIndicator um;
    um.v = 0;
    um.i.a = 2; // 0..3
    um.i.b = 3; // 4..7
    um.i.c = 9; // 8..11
    return um.v == (9 << 8) + (3 << 4) + 2;
}

int main()
{
    if (!verifyLayoutModel())
    {
        std::cerr << "Invalid memory layout" << std::endl;
        return -1;
    }
    // ...
}
When the above test fails, you need to consider compiler pragmas, or adjust your structures (and verifyLayoutModel()) accordingly.
I agree with what unwind said. Bit fields are compiler-dependent.
If you need the bits to be in a specific order, pack the data into a character array by hand. Advance the buffer by the size of each element as it's packed, then pack the next element.
/* Conceptual sketch: C has no 4-bit addressable type, so the two 4-bit
 * fields must be packed into one byte with shifts and masks. */
void pack(unsigned char **buffer)
{
    if (buffer && *buffer)
    {
        //pack ver into the high nibble (first 4 bits = 4)
        //and hlen into the low nibble (next 4 bits = 5)
        **buffer = (unsigned char)((4 << 4) | 5);
        *buffer += 1;
        //... continue packing
    }
}
Compiler-dependent or not, it depends on whether you want to write a very fast program or one that works with different compilers. To write a fast, compact application in C, use a struct with bit fields. If you want a slower, general-purpose program, write the bit manipulation out by hand.

Converting a byte array to an int array in C

I have some code below that is supposed to be converting a C (Arduino) 8-bit byte array to a 16-bit int array, but it only seems to partially work. I'm not sure what I'm doing wrong.
The byte array is in little-endian byte order. How do I convert it to an int (two bytes per entry) array?
In layman's terms, I want to merge every two bytes.
Currently, for an input byte array of {0x10, 0x00, 0x00, 0x00, 0x30, 0x00}, the output int array is {1, 0, 0}, but it should be {1, 0, 3}.
The code below is what I currently have:
I wrote this function based on a solution in Stack Overflow question Convert bytes in a C array as longs.
I also have this solution based off the same code which works fine for byte array to long (32-bits) array http://pastebin.com/TQzyTU2j.
/**
 * Convert the retrieved bytes into a set of 16 bit ints
 **/
int *byteA2IntA(byte *byte_slice, int sizeOfB, int *ret_array){
    //Variable that stores the addressed int to be stored in SRAM
    int currentInt;
    int sizeOfI = sizeOfB / 2;

    if (sizeOfB % 2 != 0) ++sizeOfI;

    for (int i = 0; i < sizeOfB; i += 2){
        currentInt = 0;
        if (byte_slice[i] == '\0') {
            break;
        }
        if (i + 1 < sizeOfB)
            currentInt = (currentInt << 8) + byte_slice[i+1];
        currentInt = (currentInt << 8) + byte_slice[i+0];
        *ret_array = currentInt;
        ret_array++;
    }
    //Pointer to the return array in the parent scope.
    return ret_array;
}
What is the meaning of this line of code?
if(i + 1 < sizeOfB) currentInt = (currentInt << 8) + byte_slice[i+1];
Here currentInt is always 0 and 0 << 8 = 0.
Also what you do is, for each couple of bytes (let me call them uint8_t from now on), you pack an int (let me call it uint16_t from now on) by doing the following:
You take the rightmost uint8_t
You shift it 8 positions to the left
You add the leftmost uint8_t
Is this really what you want?
Supposing you have byte_slice[] = {1, 2}, you pack a 16 bit integer with the value 513 (2<<8 + 1)!
Also, you don't need to return the pointer to the array of uint16_t as the caller has already provided it to the function.
If you use the return value of your function, as Joachim said, you get a pointer to a position in the uint16_t array which is not position [0].
Vincenzo has a point (or two); you need to be clear about what you're trying to do.
Combine two bytes into one 16-bit int, one byte being the MSB and one the LSB (int16_t comes from <stdint.h>):
    int16_t result = (byteMSB << 8) | byteLSB;
Convert an array of bytes into 16-bit values:
for (i = 0; i < num_of_bytes; i++)
{
    myint16array[i] = mybytearray[i];
}
Copy an array of data into another one
memcpy(dest, src, num_bytes);
That will (probably; it's platform/compiler dependent) have the same effect as my first example.
Also, beware of plain ints, which are signed; use unsigned types, which are safer and probably faster.
The problem is most likely that you increase ret_array and then return it. When you return it, it will point to one place beyond the destination array.
Save the pointer at the start of the function, and use that pointer instead.
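A minimal sketch of that fix applied to the question's function (the typedef stands in for Arduino's byte type; the early break on '\0' is dropped, since a zero byte is valid data):
typedef unsigned char byte; /* stand-in for Arduino's byte type */

int *byteA2IntA(byte *byte_slice, int sizeOfB, int *ret_array)
{
    int *start = ret_array; /* remember the start before advancing */

    for (int i = 0; i + 1 < sizeOfB; i += 2)
        *ret_array++ = (byte_slice[i + 1] << 8) | byte_slice[i]; /* little endian */
    return start;           /* points at element [0], as the caller expects */
}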
Consider using a struct. This is kind of a hack, though.
Off the top of my head it would look like this.
struct customINT16 {
    byte ByteHigh;
    byte ByteLow;
};
So in your case you would write:
struct customINT16 myINT16;
myINT16.ByteHigh = BYTEARRAY[0];
myINT16.ByteLow  = BYTEARRAY[1];
You'll have to go through a pointer to cast it, though:
intpointer = (int*)(&myINT16);
INTARRAY[0] = *intpointer;

Decoding Binary via fget / buffer string (Trying to get mp3 header)

I'm writing some quick code to try and extract data from an mp3 file header.
The objective is to extract information from the header such as the bitrate and other vital information so that I can appropriately stream the file to a mp3decoder with the necessary arguments.
Here is a wikipedia image showing the mp3header information:
http://upload.wikimedia.org/wikipedia/commons/0/01/Mp3filestructure.svg
My question is, am I attacking this correctly? Printing the data received is worthless -- I just get a bunch of random characters. I need to get to the binary so that I can decode it and determine vital information.
Here is my baseline code:
// mp3 Header File IO.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include "stdio.h"
#include "string.h"
#include "stdlib.h"

// Main function
int main (void)
{
    // Declare variables
    FILE *mp3file;
    char *mp3syncword; // we will need to allocate memory to this!!
    char requestedFile[255] = "";
    unsigned long fileLength;

    // Counters
    int i;

    // Memory allocation with malloc
    mp3syncword = (char *)malloc(2000);

    // Let's get the name of the requested file (hard-coded for now)
    strcpy(requestedFile, "testmp3.mp3");

    // Open the file with mode read, binary
    mp3file = fopen(requestedFile, "rb");
    if (!mp3file){
        // If we can't find the file, notify the user of the problem
        printf("Not found!");
    }

    // Let's get some header data from the file
    fseek(mp3file, 1, SEEK_SET);
    fread(mp3syncword, 32, 1, mp3file);

    // For debug purposes, let's print the received data
    for (i = 0; i < 32; ++i)
        printf("%c", ((char *)mp3syncword)[i]);

    return 0;
}
Help appreciated.
You are printing the bytes out using %c as the format specifier. You need to use an unsigned numeric format specifier (e.g. %u for a decimal number or %x or %X for hexadecimal) to print the byte values.
You should also declare your byte arrays as unsigned char as they are signed by default on Windows.
You might also want to print out a space (or other separator) after each byte value to make the output clearer.
The standard printf does not provide a binary representation type specifier. Some implementations do have this but the version supplied with Visual Studio does not. In order to output this you will need to perform bit operations on the number to extract the individual bits and print each of them in turn for each byte. For example:
unsigned char byte = 0; /* read this from the file */
unsigned char mask = 1; /* bit mask */
unsigned char bits[8];

/* Extract the bits */
for (int i = 0; i < 8; i++) {
    /* Mask each bit in the byte and store it */
    bits[i] = (byte & (mask << i)) >> i;
}
/* The bits array now contains eight 1 or 0 values:
 * bits[0] contains the least significant bit,
 * bits[7] contains the most significant bit */
C does not have a printf() specifier to print in binary. Most people print in hex instead, which will give you (typically) eight bits at a time:
printf("the first eight bits are %02x\n", (unsigned char) mp3syncword[0]);
You will need to interpret this manually to figure out the values of individual bits. The cast to unsigned char on the argument is to avoid surprises if it's negative.
To test bits, you can use use the & operator together with the bitwise left shift operator, <<:
if (mp3syncword[2] & (1 << 2))
{
    /* The third bit from the right of the third byte was set. */
}
If you want to be able to use "big" (larger than 7) indexes for bits, i.e. treat the data as a 32-bit word, it might be good to read it into e.g. an unsigned int, and then inspect that. Be careful with endian-ness when you do this reading, however.
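One endian-safe sketch of that reading step: assemble the 32-bit header word byte by byte instead of fread-ing it straight into an unsigned int, so the host's byte order never matters (read_be32 is a made-up helper name):
#include <stdint.h>
#include <stdio.h>

static uint32_t read_be32(FILE *f)
{
    unsigned char b[4];

    if (fread(b, 1, 4, f) != 4)
        return 0; /* real code should handle short reads properly */
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}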
Warning: there are probably errors with memory layout and/or endianness in this approach. It is not guaranteed that the struct members map to the same bits from computer to computer.
In short: don't rely on this (I'll leave the answer, it might be useful for something else)
You can define a struct with bit fields:
struct MP3Header {
    unsigned SyncWord        : 12;
    unsigned Version         : 1;
    unsigned Layer           : 2;
    unsigned ErrorProtection : 1;
    unsigned BitRate         : 4;
    unsigned Frequency       : 2;
    unsigned PadBit          : 1;
    unsigned PrivBit         : 1;
    unsigned Mode            : 2;
    unsigned ModeExtension   : 2;
    unsigned Copy            : 1;
    unsigned Original        : 1;
    unsigned Emphasis        : 2;
};
and then use each member as an isolated value:
struct MP3Header h;
/* ... */
fread(&h, sizeof h, 1, mp3file); /* error check!! */
printf("Frequency: %u\n", h.Frequency);
