I'm trying to convert a struct to a char array to send over the network. However, I get some weird output from the char array when I do.
#include <stdio.h>
struct x
{
int x;
} __attribute__((packed));
int main()
{
struct x a;
a.x=127;
char *b = (char *)&a;
int i;
for (i=0; i<4; i++)
printf("%02x ", b[i]);
printf("\n");
for (i=0; i<4; i++)
printf("%d ", b[i]);
printf("\n");
return 0;
}
Here is the output for various values of a.x (on an X86 using gcc):
127:
7f 00 00 00
127 0 0 0
128:
ffffff80 00 00 00
-128 0 0 0
255:
ffffffff 00 00 00
-1 0 0 0
256:
00 01 00 00
0 1 0 0
I understand the values for 127 and 256, but why do the numbers change when going to 128? Why wouldn't it just be:
80 00 00 00
128 0 0 0
Am I forgetting to do something in the conversion process or am I forgetting something about integer representation?
*Note: This is just a small test program. In a real program I have more in the struct, better variable names, and I convert to little-endian.
*Edit: formatting
What you see is the sign-preserving conversion from char to int. It happens because, on your system, char is signed (note: char is not signed on all systems). A bit pattern whose high bit is set therefore corresponds to a negative char value, and promoting such a char to an int preserves the sign, so the int is negative too. Note that even without an explicit (int) cast, the compiler automatically promotes the char to an int when passing it to printf. The solution is to convert your value to unsigned char first:
for (i=0; i<4; i++)
printf("%02x ", (unsigned char)b[i]);
Alternatively, you can use an unsigned char * from the start:
unsigned char *b = (unsigned char *)&a;
Then you don't need any cast when you print it with printf.
The x conversion specifier by itself says that the argument is an int, and since the number is negative, printf needs eight hex digits to show all four non-zero bytes of the int-sized value. The 0 flag tells printf to pad the output with zeros, and the 2 says that the minimum output should be two characters long. As far as I can tell, printf doesn't provide a way to specify a maximum width, except for strings.
You're only passing a char, though, so bare x makes the function read the full int that actually got passed (due to default argument promotion for "..." arguments). Use the hh length modifier to tell the function to treat the argument as just a char instead:
printf("%02hhx", b[i]);
char is a signed type on your platform; with two's complement, 0x80 is -128 for an 8-bit integer (i.e. a byte).
Sending your struct over the network as a raw char array is not portable: member layout, padding, and byte order all differ between implementations. Use proper serialization instead. It's a pain in C++ and even more so in C, but it's the only way your app will work independently of the machines reading and writing.
http://en.wikipedia.org/wiki/Serialization#C
Converting your structure to characters or bytes the way you're doing it is going to lead to issues when you try to make it network neutral. Why not address that problem now? There are a variety of techniques you can use, all of which are likely to be more "portable" than what you're trying to do. For instance:
Sending numeric data across the network in a machine-neutral fashion has long been dealt with, in the POSIX/Unix world, via the functions htonl, htons, ntohl and ntohs. See, for example, the byteorder(3) manual page on a FreeBSD or Linux system.
Converting data to and from a completely neutral representation like JSON is also perfectly acceptable. The amount of time your programs spend converting the data between JSON and native forms is likely to pale in comparison to the network transmission latencies.
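For example, a rough sketch of the htonl/ntohl route mentioned above (the value and the commented-out send() call are only illustrative):
#include <stdio.h>
#include <inttypes.h>
#include <arpa/inet.h>   /* htonl(), ntohl() on POSIX systems */

int main(void)
{
    uint32_t host_value = 3000;
    uint32_t wire_value = htonl(host_value);   /* convert to network (big-endian) byte order */
    /* send(sock, &wire_value, sizeof wire_value, 0);   on the sending side */
    uint32_t back = ntohl(wire_value);         /* convert back to host order on receipt */
    printf("%" PRIu32 "\n", back);             /* prints 3000 on any host */
    return 0;
}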
char is a signed type here, so what you are seeing is the two's-complement representation; casting to (unsigned char *) will fix that (Rowland just beat me).
On a side note, you may want to change
for (i=0; i<4; i++) {
//...
}
to
for (i=0; i<sizeof(a); i++) {
//...
}
The signedness of the char array is not the root of the problem! (It is -a- problem, but not the only problem.)
Alignment! That's the key word here. That's why you should NEVER try to treat structs like raw memory. Compilers (and various optimization flags), operating systems, and phases of the moon all do strange and exciting things to the actual location in memory of "adjacent" fields in a structure. For example, if you have a struct with a char followed by an int, the whole struct will typically be EIGHT bytes in memory -- the char, 3 blank, useless padding bytes, and then 4 bytes for the int. The machine likes to do things like this so structs can fit cleanly on pages of memory, and the like.
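A quick way to see this for yourself (the exact numbers are implementation-defined; these are merely the common x86-64/gcc results):
#include <stdio.h>
#include <stddef.h>   /* offsetof */

struct example {
    char c;   /* 1 byte */
    int  n;   /* typically preceded by 3 padding bytes so it starts on a 4-byte boundary */
};

int main(void)
{
    printf("sizeof(struct example)      = %zu\n", sizeof(struct example));      /* commonly 8 */
    printf("offsetof(struct example, n) = %zu\n", offsetof(struct example, n)); /* commonly 4 */
    return 0;
}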
Take an introductory course on machine architecture at your local college. Meanwhile, serialize properly. Never treat structs like char arrays.
When you go to send it, just use:
(char*)&CustomPacket
to convert. Works for me.
You may want to convert to an unsigned char array.
Unless you have very convincing measurements showing that every octet is precious, don't do this. Use a readable ASCII protocol like SMTP, NNTP, or one of the many other fine Internet protocols codified by the IETF.
If you really must have a binary format, it's still not safe just to shove out the bytes in a struct, because the byte order, basic sizes, or alignment constraints may differ from host to host. You must design your wire protocol to use well-defined sizes and a well-defined byte order. For your implementation, either use macros like ntohl(3) or use shifting and masking to put bytes into your stream. Whatever you do, make sure your code produces the same results on both big-endian and little-endian hosts.
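A sketch of the shifting-and-masking idea for one 32-bit field (the helper names here are just for illustration):
#include <stdio.h>
#include <stdint.h>

/* Store v into buf in big-endian ("network") byte order, regardless of host endianness. */
static void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value back out of the buffer. */
static uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)  | (uint32_t)buf[3];
}

int main(void)
{
    unsigned char wire[4];
    put_u32_be(wire, 3000);
    printf("%02x %02x %02x %02x -> %u\n",
           wire[0], wire[1], wire[2], wire[3], (unsigned)get_u32_be(wire));
    /* prints "00 00 0b b8 -> 3000" on both big- and little-endian hosts */
    return 0;
}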
Related
#include <stdio.h>
int main()
{
typedef union{
int a;
char c;
float f;
} myu;
myu sam;
sam.a = 10;
sam.f = (float)5.99;
sam.c = 'H';
printf("%d\n %c\n %f\n",sam.a,sam.c,sam.f);
return 0;
}
Output
1086303816
H
5.990025
How come the value of the integer has changed so drastically while the float is almost the same?
The fields of a union all share the same starting memory address. This means that writing to one member will overwrite the contents of another.
When you write one member and then read a different member, the representation of the written member (i.e. how it is laid out in memory) is reinterpreted as the representation of the read member. Integers and floating point types have very different representations, so it makes sense that reading a float as though it were an int can vary greatly.
Things become even more complicated if the two types are not the same size. If a smaller field is written and a larger field is read, the excess bytes might not even have been initialized.
In your example, you first write the value 10 to the int member. Then you write the value 5.99 to the float member. Assuming int and float are both 4 bytes in length, all of the bytes used by the int member are overwritten by the float member.
When you then change the char member, this only changes the first byte. Assuming a float is represented in little-endian IEEE754, this changes just the low-order byte of the mantissa, so only the digits furthest to the right are affected.
Try this: instead of using printf (which will mostly output nonsense), show the raw memory after each modification.
The code below assumes that int and float are 32 bit types and that your compiler does not add padding bytes in this union.
#include <string.h>
#include <stdio.h>
#include <assert.h>
void showmemory(void* myu)
{
unsigned char memory[4];
memcpy(memory, myu, 4);
for (int i = 0; i < 4; i++)
{
printf("%02x ", memory[i]);
}
printf("\n");
}
int main()
{
typedef union {
int a;
char c;
float f;
} myu;
assert(sizeof(myu) == 4); // assume size of the union is 4 bytes
myu sam;
sam.a = 10;
showmemory(&sam);
sam.f = (float)5.99;
showmemory(&sam);
sam.c = 'H';
showmemory(&sam);
}
Possible output on a little endian system:
0a 00 00 00 // 0a is 10 in hexadecimal
14 ae bf 40 // 5.99 in float
48 ae bf 40 // 48 is 'H'
How come the value of the integer has changed so drastically while the float is almost the same?
That is just a coincidence. Your union is stored in 4 bytes. When you assign the field a the value 10, the binary representation of the union is 0x0000000A. Then, when you assign the field f the value 5.99, it becomes 0x40bfae14. Finally, when you set c to 'H' (0x48 in hex), it overwrites the first byte, which corresponds to the low-order part of the float's mantissa. Thus, the float value changes only slightly. For more information about floating-point encoding, an online IEEE 754 converter is a handy way to explore it.
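If you want to verify that 0x40bfae14 value yourself, a small sketch is enough (assuming float and uint32_t are both 32 bits and the platform uses IEEE 754):
#include <stdio.h>
#include <string.h>
#include <inttypes.h>

int main(void)
{
    float f = 5.99f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* copy the float's object representation */
    printf("0x%08" PRIx32 "\n", bits);  /* prints 0x40bfae14 on IEEE 754 systems */
    return 0;
}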
In "traditional" C, any object that are not bitfields and do not have a register storage class will represent an association between a sequence of consecutive bytes somewhere in memory and a means of reading or writing values of the object's type. Storing a value into an object of type T will convert the value into a a pattern of sizeof (T) * CHAR_BIT bits, and store that pattern into the associated memory. Reading an object of type T will read the sizeof (T) * CHAR_BIT bits from the object's associated storage and convert that bit pattern into a value of type T, without regard for how the underlying storage came to hold that bit pattern.
A union object serves to reserve space for the largest member, and then creates an association between each member and a region of storage that begins at the start of the union object. Any write to a member of a union will affect the appropriate part of the underlying storage, and any read of a union member will interpret whatever happens to be in the underlying storage as a value of its type. This will be true whether the member is accessed directly, or via pointer or array syntax.
The "traditional C" model is really quite simple. The C Standard, however, is much more complicated because the authors wanted to allow implementations to deviate from that behavior when doing so wouldn't interfere with whatever their customers need to do. This in turn has been interpreted by some compiler writers as an invitation to deviate from the traditional behavior without any regard for whether the traditional behavior might be useful in more circumstances than the bare minimums mandated by the Standard.
I have a question here: I'm trying to use memcpy() to copy a string[9] into an unsigned long long int variable. Here's the code:
unsigned char string[9] = "message";
string[8] = '\0';
unsigned long long int aux;
memcpy(&aux, string, 8);
printf("%llx\n", aux); // prints inverted data
/*
* expected: 6d65737361676565
* printed: 656567617373656d
*/
How do I make this copy without inverting the data?
Your system is using little endian byte ordering for integers. That means that the least significant byte comes first. For example, a 32 bit integer would store 258 (0x00000102) as 0x02 0x01 0x00 0x00.
Rather than copying your string into an integer, just loop through the characters and print each one in hex:
int i;
int len = strlen((char *)string);  /* cast needed: strlen expects char * */
for (i=0; i<len; i++) {
printf("%02x ", string[i]);
}
printf("\n");
Since string is an array of unsigned char and you're doing bit manipulation for the purpose of implementing DES, you don't need to change it at all. Just use it as it is.
Looks like you've just discovered by accident how CPUs store integer values. There are two competing schools of thought, termed endianness, with little-endian and big-endian both found in the wild.
If you want them in byte-for-byte order, an integer type will be problematic and should be avoided. Just use a byte array.
There are conversion functions that can go from one endian form to another, though you need to know what sort your architecture uses before converting properly.
So if you're reading in a binary value you must know what endian form it's in in order to import it correctly into a native int type. It's generally a good practice to pick a consistent endian form when writing binary files to avoid guessing, where the "network byte order" scheme used in the vast majority of internet protocols is a good default. Then you can use functions like htonl and ntohl to convert back and forth as necessary.
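For example, a sketch of writing one value to a binary file in network byte order and reading it back (assuming a POSIX system for arpa/inet.h; the file name is arbitrary):
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

int main(void)
{
    uint32_t wire;

    FILE *f = fopen("value.bin", "wb");
    if (!f) return 1;
    wire = htonl(2);                        /* fix the byte order before writing */
    fwrite(&wire, sizeof wire, 1, f);
    fclose(f);

    f = fopen("value.bin", "rb");
    if (!f) return 1;
    if (fread(&wire, sizeof wire, 1, f) != 1) { fclose(f); return 1; }
    fclose(f);

    printf("%u\n", (unsigned)ntohl(wire));  /* 2, whatever the host's endianness */
    return 0;
}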
I wanted to print the actual bit representation of integers in C. These are the two approaches that I found.
First:
union int_char {
int val;
unsigned char c[sizeof(int)];
} data;
data.val = n1;
// printf("Integer: %p\nFirst char: %p\nLast char: %p\n", &data.f, &data.c[0], &data.c[sizeof(int)-1]);
for(int i = 0; i < sizeof(int); i++)
printf("%.2x", data.c[i]);
printf("\n");
Second:
for(int i = 0; i < 8*sizeof(int); i++) {
int j = 8 * sizeof(int) - 1 - i;
printf("%d", (val >> j) & 1);
}
printf("\n");
For the same number, the two approaches give 00000002 and 02000000. I also tried other numbers and it seems that the bytes are swapped between the two. Which one is correct?
Welcome to the exotic world of endian-ness.
Because we write numbers most significant digit first, you might imagine the most significant byte is stored at the lower address.
The electrical engineers who build computers are more imaginative.
Sometimes they store the most significant byte first, but on your platform it's the least significant.
There are even platforms where it's all a bit mixed up - but you'll rarely encounter those in practice.
So we talk about big-endian and little-endian for the most part. The terms are a joke referencing Gulliver's Travels, where there's a pointless war about which end of a boiled egg to start at, itself a satire of some disputes in the Christian Church. But I digress.
Because your first snippet looks at the value as a series of bytes, it encounters them in the machine's byte order.
But because >> is defined as operating on the bits of the value, it works 'logically', without regard to the underlying representation.
C is right not to define the byte order, because hardware that didn't match whichever model C chose would be burdened with the overhead of shuffling bytes around endlessly and pointlessly.
Sadly there isn't a built-in identifier telling you which model you're on, though code that detects it can be found.
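One widespread detection trick (just a sketch; the answer is determined at run time by looking at the first byte of a known integer):
#include <stdio.h>

int main(void)
{
    unsigned int n = 1;
    unsigned char *p = (unsigned char *)&n;   /* inspect the first byte in memory */

    if (*p == 1)
        printf("little-endian\n");   /* least significant byte stored first */
    else
        printf("big-endian\n");
    return 0;
}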
It will become relevant to you if (a), as above, you want to break integer types down into bytes and manipulate them, or (b) you receive files from other platforms containing multi-byte structures.
Unicode offers something called a BOM (Byte Order Marker) in UTF-16 and UTF-32.
In fact a good reason (among many) for using UTF-8 is the problem goes away. Because each component is a single byte.
Footnote:
It's been pointed out quite fairly in the comments that I haven't told the whole story.
The C language specification admits more than one representation of integers, and particularly of signed integers: specifically sign-magnitude, two's complement, and ones' complement.
It also permits 'padding bits' that don't represent part of the value.
So in principle along with tackling endian-ness we need to consider representation.
In principle. In practice, all modern computers use two's complement, and extant machines that use anything else are very rare; unless you have a genuine requirement to support such platforms, I recommend assuming you're on a two's-complement system.
The correct hex representation as a string is 00000002, as if you had declared the integer with a hex literal:
int n = 0x00000002; //n=2
or as you would get when printing the integer as hex, like in:
printf("%08x", n);
But when printing the integer's bytes one after the other, you must also consider the endianness, which is the byte order of multi-byte integers:
On a big-endian system (some UNIX systems use it), the 4 bytes will be ordered in memory as:
00 00 00 02
While on a little-endian system (most desktop and server hardware), the bytes will be ordered in memory as:
02 00 00 00
The first prints the bytes that represent the integer in the order they appear in memory. Platforms with different endian will print different results as they store integers in different ways.
The second prints the bits that make up the integer value, most significant bit first. This result is independent of endianness. The result is also independent of how the >> operator is implemented for signed ints, as it does not look at the bits that may be influenced by the implementation.
The second is a better match to the question "Printing actual bit representation of integers in C". Although there is a lot of ambiguity.
It depends on your definition of "correct".
The first one will print the data exactly as it's laid out in memory, so I bet that's the one you're getting the maybe unexpected 02000000 for. *) IMHO, that's the correct one. It could be done more simply by just aliasing with unsigned char * directly (char pointers are always allowed to alias any other pointers; in fact, accessing representations is a use case for char pointers mentioned in the standard):
int x = 2;
unsigned char *rep = (unsigned char *)&x;
for (int i = 0; i < sizeof x; ++i) printf("0x%hhx ", rep[i]);
The second one will print only the value bits **) and take them in the order from the most significant byte to the least significant one. I wouldn't call it correct because it also assumes that bytes have 8 bits, and because the shifting used is implementation-defined for negative numbers. ***) Furthermore, just ignoring padding bits doesn't seem correct either if you really want to see the representation.
edit: As commented by Gerhardh meanwhile, this second code doesn't print byte by byte but bit by bit. So, the output you claim to see isn't possible. Still, it's the same principle, it only prints value bits and starts at the most significant one.
*) You're on a "little endian" machine. On these machines, the least significant byte is stored first in memory. Read more about Endianness on wikipedia.
**) Representations of types in C may also have padding bits. Some types aren't allowed to include padding (like char), but int is allowed to have them. This second option doesn't alias to char, so the padding bits remain invisible.
***) A correct version of this code (for printing all the value bits) must a) correctly determine the number of value bits (8 * sizeof int is wrong because bytes (char) can have more than 8 bits, and even CHAR_BIT * sizeof int is wrong, because it would also count padding bits if present) and b) avoid the implementation-defined shifting behavior by first converting to unsigned. It could look, for example, like this:
#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))
int main(void)
{
int x = 2;
for (unsigned mask = 1U << (IMAX_BITS((unsigned)-1) - 1); mask; mask >>= 1)
{
putchar((unsigned) x & mask ? '1' : '0');
}
puts("");
}
See this answer for an explanation of this strange macro.
How can one use scanf to scan in an integer amount of characters and simply stuff them into an unsigned int without conversion?
Take an example, I have the following input characters (I have put them in hex for visibility):
5A 5F 03 00 FF FF 3D 2A
I want the first 4 (because 4 chars fit in an int). In base 10 (decimal) this is equal to 221018 (big-endian). Great! That's what I want in my int. This seems to work as expected:
scanf("%s", &my_integer);
Somehow it seems to get the endianness right, placing the first character in the LSB of the int (why?). As you would expect however this produces a compiler warning as the pointer must be to a character array (man 3 scanf).
An alternate approach without using scanf():
for (int i = 0; i < 4; i++)
{
my_integer |= (getchar() << i * 8);
}
Note that I don't intend to do any conversion here; I simply wish to use the pointer type to specify how many characters to read. The same would be true if &my_integer were a long: I would read and store eight characters.
Simple really.
It appears my idea behind the use of scanf isn't correct and there must be a better approach.
How would you do it?
N.B. I'm aware type sizes are architecture dependent.
So you want to read 4 bytes from stdin and store them, unchanged, as the object representation of your int:
int my_integer;
if (fread (&my_integer, sizeof my_integer, 1, stdin) != 1) {
/* Some problem... */
}
I have a simple code
char t = (char)(3000);
Then the value of t is -72. The hex value of 3000 is 0xBB8. I couldn't understand why the value of t is -72.
Thanks for your answers.
I don't know about the Mac, so my result is -72. As far as I know, the Mac uses big endian; does that affect the result? I don't have any Mac to test on, so I'd like to hear from Mac people.
The hex value of 3000 is 0xBB8.
And so the hex value of the char (which, by the way, appears to be signed on your compiler) is 0xB8.
If it were unsigned, 0xB8 would be 184. But since it's signed, its actual value is 256 less, i.e. -72.
If you want to know why this is, read about two's complement notation.
A char is 8 bits, which can hold only 256 distinct values. Trying to cast 3000 to a char is... impossible, at least for what you are intending.
This is happening because 3000 is too big a value and causes an overflow. Char is generally from -128 to 127 signed, or 0 to 255 unsigned, but it can change depending upon the implementation.
char is an integral type with certain range of representable values. int is also an integral type with certain range of representable values. Normally, range of int is [much] wider than that of char. When you try to squeeze into a char an int value that doesn't fit into the range of char, the value will not "fit", of course. The actual result is implementation-defined.
In your case 3000 is an int value that doesn't fit into the range of char on your implementation. So, you won't get 3000 as the result. If you really want to know why it specifically came out as -72, consult the documentation that came with your implementation.
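On typical implementations the conversion just keeps the low-order byte, which a short sketch (assuming an 8-bit signed char and two's complement) makes visible:
#include <stdio.h>

int main(void)
{
    int n = 3000;                        /* 0x0BB8 */
    char t = (char)n;                    /* typically keeps only the low byte, 0xB8 */
    printf("%d\n", t);                   /* -72 where plain char is signed */
    printf("%d\n", 3000 % 256 - 256);    /* the same arithmetic spelled out: 184 - 256 = -72 */
    return 0;
}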
As specified, the 16-bit hex value of 3000 is 0x0BB8. Although implementation-specific, from your posted results this is likely stored in memory as the byte pair B8 0B (some architectures would store it as 0B B8; this is known as endianness).
char, on the other hand, is probably not a 16-bit type. Again, this is implementation specific, but from your posted results it appears to be 8-bits, which is not uncommon.
So while your program has allocated only 8 bits of memory for your value, you're trying to squeeze twice as much information into it. The conversion keeps just one octet, in this case B8; the 0B simply doesn't fit and is discarded. Losing the high-order part of a value this way is known as truncation, and it's easy to miss.
Assuming two's complement (technically implementation-specific, but a reasonable assumption), the hex value B8 translates to either -72 or 184 in decimal, depending on whether you're dealing with a signed or unsigned type. Since you didn't specify either, your compiler goes with its default. Yet again, this is implementation-specific, and it appears your compiler goes with signed char.
Therefore, you get -72. But don't expect the same results on any other system.
A char is (typically) just 8 bits, so you can't store values as large as 3000 (which would require at least 12 bits). So if you try to store 3000 in a byte, it will just wrap.
Since 3000 is 0xBB8, it requires two bytes, one 0x0B and one 0xB8. If you try to store it in a single byte, you will just get one of them (0xB8). And since a char is (typically) signed here, that is -72.
char is used to hold a single character, and you're trying to store a 4-digit int in one. Perhaps you meant to use an array of chars, or a string (char t[5] in this case, to leave room for the terminating '\0').
To convert an int to a string, you can use snprintf (itoa is not standard C):
#include <stdio.h>
int main(void) {
    int num = 3000;
    char numString[12];   /* large enough for any 32-bit int plus '\0' */
    snprintf(numString, sizeof numString, "%d", num);
    printf("%s\n", numString);
    return 0;
}
Oh, I get it, it's overflow. It's like char only goes from -128 to 127 or something like that, I'm not sure. Like if you have a variable whose type's max limit is 127 and you add 1 to it, then it wraps around to -128, and so on.