uint32_t after = 0xe1ca95ee;
char new_buf[4];
memcpy(new_buf, &after, 4);
printf("%x\n", *new_buf); // I want to print the content of new_buf
I want to copy the contents of after into new_buf, but the result is confusing: printf gives me ffffffee, which looks like an address, even though I have already dereferenced new_buf.
According to the comments, I can't use memcpy or strncpy for this task. But why? Are memcpy and strncpy only designed to handle char *? The content of after is in memory, after all.
PS: I know I should use sprintf or snprintf. If you can explain why memcpy and strncpy are not suitable here, I'd appreciate it.
This is the problem right here:
printf("%x\n", *new_buf);
This gives you exactly what you asked for: it prints the char at location new_buf using the %x format. That location does contain 0xee (after your successful memcpy; least significant byte first, since you're most probably on an Intel machine, i.e. little endian), but it is printed as 0xffffffee (a negative number) because it is a char rather than an unsigned char, and 0xee has its highest bit set (> 0x7F), so it is sign-extended when promoted to int for printf.
You should use instead:
printf("%x\n", *((unsigned int*)new_buf));
Or rather:
printf("%x\n", *((uint32_t*)new_buf));
If you do this:
int i;
for (i = 0; i < 4; i++) printf("%x\n", new_buf[i]);
You can see it prints
ffffffee
ffffff95
ffffffca
ffffffe1
So your bytes are all there. As pointed out by George André, they are signed bytes, so the negative ones get fs padded onto the front: each value is promoted to 32 bits for printing, and 0xee represented in 32 bits is 0xffffffee. You are probably on a little-endian machine, so the least significant byte 0xee is stored at the lowest memory position, which is why you get the "last" byte of your number when dereferencing new_buf. The other part is answered already by others: you must declare new_buf as unsigned char, or cast during printing.
uint32_t after = 0xe1ca95ee;
char new_buf[4];
memcpy(new_buf, &after, 4);
printf("%x\n", *((unsigned char*)new_buf));
Or alternatively
uint32_t after = 0xe1ca95ee;
unsigned char new_buf[4];
memcpy(new_buf, &after, 4);
printf("%x\n", *new_buf);
If you're trying to copy an integer and print it to stdout, as an integer, in base 16:
char new_buf[4];
...
printf("%x\n", *new_buf);
Whatever you stored in new_buf, its type is still char[4]. So the type of *new_buf is char (it's identical to new_buf[0]).
So, you're getting the first char of your integer (which may be the high or low byte, depending on platform), having it automatically promoted to an integer, and then printing that as an unsigned int in base 16.
memcpy has indeed copied your value into the array, but if you want to print it, use
printf("%x\n", *(uint32_t *)new_buf);
or
printf("%02x%02x%02x%02x\n", new_buf[0], new_buf[1], new_buf[2], new_buf[3]);
(but note in the latter case your byte order may be reversed, depending on platform).
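If you want the hex digits in value order regardless of the platform's byte order, a small sketch that shifts the original integer instead of reading the copied bytes, so endianness never enters the picture:
uint32_t after = 0xe1ca95ee;
printf("%02x%02x%02x%02x\n",
       (unsigned)((after >> 24) & 0xff),
       (unsigned)((after >> 16) & 0xff),
       (unsigned)((after >> 8) & 0xff),
       (unsigned)(after & 0xff)); /* prints e1ca95ee on any endianness */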
If you're trying to create a char array containing a base-16 string representation of your number:
Don't use memcpy, that doesn't convert from the integer to its string representation.
Try
uint32_t after = 0xe1ca95ee;
char new_buf[1 + 2*sizeof(after)];
snprintf(new_buf, sizeof(new_buf), "%x", after);
printf("original %x formatted as '%s'\n", after, new_buf);
(the buffer is sized to give 2 chars per octet, plus one for the nul-terminator).
Related
I'm trying to convert 2 bytes array to an unsigned short.
this is the code for the conversion :
short bytesToShort(char* bytesArr)
{
short result =(short)((bytesArr[1] << 8)|bytesArr[0]);
return result;
}
I have an inputFile which stores bytes, and I read its bytes in a loop (2 bytes at a time) and store them in the char array N in this manner:
char N[3];
N[2]='\0';
while(fread(N,1,2,inputFile)==2)
When the (hex) value of N[0] is 0 the computation is correct, otherwise it's wrong. For example:
0x62 (N[0]=0x0, N[1]=0x62) will return 98 (as a short value), but 0x166 in hex (N[0]=0x6, N[1]=0x16) will return 5638 (as a short value).
In the first place, it's generally best to use type unsigned char for the bytes of raw binary data, because that correctly expresses the semantics of what you're working with. Type char, although it can be, and too frequently is, used as a synonym for "byte", is better reserved for data that are actually character in nature.
In the event that you are furthermore performing arithmetic on byte values, you almost surely want unsigned char instead of char, because the signedness of char is implementation-defined. It does vary among implementations, and on many common implementations char is signed.
With that said, your main problem appears simple. You said
166 in hex (N[0]=6,N[1]=16) will return 5638 (in short value).
but 0x166 packed into a two-byte little-endian array would be (N[0]=0x66,N[1]=0x1). What you wrote would correspond to 0x1606, which indeed is the same as decimal 5638.
The problem is sign extension due to using char. You should use unsigned char instead:
#include <stdio.h>
short bytesToShort(unsigned char* bytesArr)
{
short result = (short)((bytesArr[1] << 8) | bytesArr[0]);
return result;
}
int main(void)
{
    printf("%04hx\n", bytesToShort((unsigned char *)"\x00\x11")); // expect 0x1100
    printf("%04hx\n", bytesToShort((unsigned char *)"\x55\x11")); // expect 0x1155
    printf("%04hx\n", bytesToShort((unsigned char *)"\xcc\xdd")); // expect 0xddcc
    return 0;
}
Note: the problem in the original code is not quite the one presented by the OP. The problem is that it returns the wrong result for input like "\xcc\xdd": it produces 0xffcc where it should produce 0xddcc.
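To make that sign-extension failure concrete, here is a minimal sketch (the helper names are mine) that runs both a signed and an unsigned variant on the byte pair \xcc\xdd. On implementations where char is signed, the signed variant even left-shifts a negative value, which the standard does not define, so its output is merely what you typically observe on two's-complement machines:

#include <stdio.h>

/* Broken variant: 0xcc read through a plain (signed) char is negative, so it is
   sign-extended when promoted to int, and ORing it in clobbers the high byte. */
static short bytesToShortSigned(const char *b)
{
    return (short)((b[1] << 8) | b[0]);
}

/* Correct variant: each byte stays in the range 0..255. */
static short bytesToShortUnsigned(const unsigned char *b)
{
    return (short)((b[1] << 8) | b[0]);
}

int main(void)
{
    unsigned char bytes[2] = { 0xcc, 0xdd };
    printf("signed:   %04hx\n", bytesToShortSigned((const char *)bytes)); /* typically ffcc */
    printf("unsigned: %04hx\n", bytesToShortUnsigned(bytes));             /* ddcc */
    return 0;
}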
I have a doubt here: I'm trying to use memcpy() to copy a string[9] into an unsigned long long int variable. Here's the code:
unsigned char string[9] = "message";
string[8] = '\0';
unsigned long long int aux;
memcpy(&aux, string, 8);
printf("%llx\n", aux); // prints inverted data
/*
* expected: 6d65737361676565
* printed: 656567617373656d
*/
How do I make this copy without inverting the data?
Your system is using little endian byte ordering for integers. That means that the least significant byte comes first. For example, a 32 bit integer would store 258 (0x00000102) as 0x02 0x01 0x00 0x00.
Rather than copying your string into an integer, just loop through the characters and print each one in hex:
int i;
int len = strlen((char *)string); /* cast needed: string is an array of unsigned char */
for (i=0; i<len; i++) {
printf("%02x ", string[i]);
}
printf("\n");
Since string is an array of unsigned char and you're doing bit manipulation for the purpose of implementing DES, you don't need to change it at all. Just use it as it.
Looks like you've just discovered by accident how CPUs store integer values. There are two competing conventions, termed endianness, with little-endian and big-endian both found in the wild.
If you want them in byte-for-byte order, an integer type will be problematic and should be avoided. Just use a byte array.
There are conversion functions that can go from one endian form to another, though you need to know what sort your architecture uses before converting properly.
So if you're reading in a binary value, you must know what endian form it's in to import it correctly into a native int type. It's good practice to pick a consistent byte order when writing binary files so there's no guessing; the "network byte order" scheme used by the vast majority of internet protocols is a good default. You can then use functions like htonl and ntohl to convert back and forth as necessary.
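For instance, a minimal sketch (the helper names are mine) of putting a 32-bit value into a byte buffer in network byte order on one side and reading it back on the other:

#include <arpa/inet.h> /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Sender: convert to network (big-endian) order, then copy into the buffer. */
void put_u32(unsigned char *buf, uint32_t value)
{
    uint32_t net = htonl(value);
    memcpy(buf, &net, sizeof net);
}

/* Receiver: copy the bytes out, then convert back to the host's native order. */
uint32_t get_u32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}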
In my course for intro to operating systems, our task is to determine if a system is big or little endian. I've found plenty of results on how to do it, and I've done my best to reconstruct my own version of the code. I suspect it's not the best way of doing it, but it seems to work:
#include <stdio.h>
int main() {
int a = 0x1234;
unsigned char *start = (unsigned char*) &a;
int len = sizeof( int );
if( start[0] > start[ len - 1 ] ) {
//biggest in front (Little Endian)
printf("1");
} else if( start[0] < start[ len - 1 ] ) {
//smallest in front (Big Endian)
printf("0");
} else {
//unable to determine with set value
printf( "Please try a different integer (non-zero). " );
}
}
I've seen this line of code (or some version of) in almost all answers I've seen:
unsigned char *start = (unsigned char*) &a;
What is happening here? I understand casting in general, but what happens when you cast an int pointer to a char pointer? I know:
unsigned int *p = &a;
assigns the memory address of a to p, and that you can affect the value of a by dereferencing p. But I'm totally lost as to what's happening with the char, and more importantly, I'm not sure why my code works.
Thanks for helping me with my first SO post. :)
When you cast between pointers of different types, the result is generally implementation-defined (it depends on the system and the compiler). There are no guarantees that you can access the data through the pointer or that it is correctly aligned, etc.
But for the special case when you cast to a pointer to character, the standard actually guarantees that you get a pointer to the lowest addressed byte of the object (C11 6.3.2.3 §7).
So the compiler will implement the code you have posted in such a way that you get a pointer to the least significant byte of the int. As we can tell from your code, that byte may contain different values depending on endianess.
If you have a 16-bit CPU, the char pointer will point at memory containing 0x12 in case of big endian, or 0x34 in case of little endian.
For a 32-bit CPU, the int would contain 0x00001234, so you would get 0x00 in case of big endian and 0x34 in case of little endian.
If you dereference an integer pointer you get 4 bytes of data (the exact size depends on the compiler; 4 is typical for gcc). If you want only one byte, cast that pointer to a character pointer and dereference it: you will get one byte of data. The cast tells the compiler how many bytes to read, instead of the original data type's size.
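A minimal sketch of that idea, assuming a 4-byte int:

#include <stdio.h>

int main(void)
{
    int a = 0x1234;
    int *ip = &a;
    unsigned char *cp = (unsigned char *)&a;

    printf("%x\n", *ip); /* reads all 4 bytes: 1234 */
    printf("%x\n", *cp); /* reads 1 byte: 34 on little endian, 0 on big endian */
    return 0;
}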
Values stored in memory are a set of '1's and '0's which by themselves do not mean anything. Datatypes are used for recognizing and interpreting what the values mean. So lets say, at a particular memory location, the data stored is the following set of bits ad infinitum: 01001010 ..... By itself this data is meaningless.
A pointer (other than a void pointer) contains 2 pieces of information. It contains the starting position of a set of bytes, and the way in which the set of bits are to be interpreted. For details, you can see: http://en.wikipedia.org/wiki/C_data_types and references therein.
So if you have
a char *c,
a short int *i,
and a float *f
which all look at the bits mentioned above: c, i, and f hold the same address, but *c takes the first 8 bits and interprets them in a certain way, so you can do things like printf("The character is %c", *c). On the other hand, *i takes the first 16 bits and interprets them differently; in that case it is meaningful to say printf("The value is %d", *i). Again, for *f, printf("The value is %f", *f) is meaningful.
The real differences come when you do math with these. For example,
c++ advances the pointer by 1 byte,
i++ advances it by 2 bytes (the size of a short int on most platforms),
and f++ advances it by 4 bytes (the usual size of a float).
More importantly, for
(*c)++, (*i)++, and (*f)++ the algorithm used for doing the addition is totally different.
In your question, when you cast from one pointer type to another, you already know that the algorithm you are going to use for manipulating the bits at that location will be easier if you interpret those bits as an unsigned char rather than an unsigned int. The same operators +, -, etc. act differently depending upon what datatype they are looking at. If you have worked on physics problems where a coordinate transformation made the solution very simple, this is the closest analog to that operation: you are transforming one problem into another that is easier to solve.
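As a rough sketch of "the same bits, different interpretations" (a union is used so the reinterpretation is the ordinary, well-understood kind; the short and float values you see depend on your machine's byte order):

#include <stdio.h>

union view {
    unsigned char  c[4];
    unsigned short s; /* typically 2 bytes */
    float          f; /* typically 4 bytes */
};

int main(void)
{
    union view v = { .c = { 0x4a, 0x00, 0x00, 0x42 } };

    printf("as a char : %c\n", v.c[0]); /* 'J' (0x4a) */
    printf("as a short: %#x\n", v.s);   /* 0x4a or 0x4a00, depending on endianness */
    printf("as a float: %f\n", v.f);    /* some value built from all four bytes */
    return 0;
}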
Here is the program I used:
int hex = 0x23456789;
char * val = &hex;
printf("%p\n",hex);
printf("%p %p %p %p\n",*val,*(val+1),*(val+2),*(val+3));
Here is my output:
0x23456789
0xffffff89 0x67 0x45 0x23
I am working on a 64 bit CPU with a 64 bit OS. This shows my machine is little endian. Why is the first byte 0xffffff89? Why the ff's?
Firstly, you should be using %x since those aren't pointers.
The %x specifier expects an integer. Because you are passing in a value of type char, which is a signed type here, the value is converted to an integer and sign-extended.
http://en.wikipedia.org/wiki/Sign_extension
That essentially means it takes the most significant bit and uses it for all the higher bits. So 0x89 => 0b10001001, whose highest bit is '1', becomes 0xFFFFFF89.
The proper solution is to specify a length modifier. You can get more info here: Printf Placeholders. Essentially, between the '%' and the 'x' you can put extra modifiers; 'hh' means that you are passing a char value.
int hex = 0x23456789;
char *val = (char*)&hex;
printf("%x\n",hex);
printf("%hhx %hhx %hhx %hhx\n", val[0], val[1], val[2], val[3]);
char is a signed type here; it gets promoted to int when passed as an argument, and this promotion causes sign extension. 0x89 is a negative value for a char, so it gets sign-extended to 0xffffff89. This does not happen for the other values: they don't exceed CHAR_MAX, which is 127 or 0x7f on most machines. You are getting confused by this behavior because you use the wrong format specifier.
%p asks printf to format the argument as an address, but you are actually passing a value (*val).
On a 64-bit machine pointer addresses are 64 bits, so printf is adding the ff bytes to pad the field.
As Martin Beckett said, %p asks printf to print a pointer, which is equivalent to %#x or %#lx (the exact format depends on your OS).
This means printf expects an int or a long (again, depending on the OS), but you are only supplying a char, so the value is promoted to the appropriate type.
When you cast a smaller signed number to a bigger signed number you have to do something called sign extension in order to preserve the value. In the case of 0x89 this occurs because the sign bit is set, so the upper bytes are 0xff and get printed because they are significant.
In the case of 0x67, 0x45, 0x23 sign extension does not happen because the sign bit is not set, and so the upper bytes are 0s and thus not printed.
I test the endian-ness with the condition ((char)((int)511) == (char)255). True means little, false means big.
I have tested this on a few separate systems, both little and big, using gcc with optimizations off and to max. In every test I have done I have gotten correct results.
You could put that condition in an if of your application before it needs to do endian-critical operations. If you only want to guarantee you are using the right endianness for your entire application, you could instead use a static assertion method such as the following:
extern char ASSERTION__LITTLE_ENDIAN[((char)((int)511) == (char)255)?1:-1];
That line at global scope will cause a compile error if the system is not little endian, so the program will refuse to compile. If there is no error, it compiles perfectly, as if that line didn't exist. I find the error message pretty descriptive:
error: size of array 'ASSERTION__LITTLE_ENDIAN' is negative
Now if, like me, you're paranoid about your compiler optimizing the actual check away, you can do the following:
int endian;
{
int i = 255;
char * c = &i;
endian = (c[0] == (char)255);
}
if(endian) // if endian is little
Which compacts nicely into this macro:
#define isLittleEndian(e) int e; { int i = 255; char * c = (char *)&i; e = (c[0] == (char)255); }
isLittleEndian(endian);
if(endian) // if endian is little
Or if you use GCC, you can get away with:
#define isLittleEndian ({ int i = 255; char * c = (char *)&i; (c[0] == (char)255); })
if(isLittleEndian) // if endian is little
I am trying to write a server that will communicate with any standard client that can make socket connections (e.g. a telnet client).
It started out as an echo server, which of course did not need to worry about network byte ordering.
I am familiar with the ntohs, ntohl, htons, and htonl functions. These would be great by themselves if I were transferring 16- or 32-bit ints, or if the characters in the string being sent were multiples of 2 or 4 bytes.
I'd like create a function that operates on strings such as:
str_ntoh(char* net_str, char* host_str, int len)
{
uint32_t* netp, hostp;
netp = (uint32_t*)&net_str;
for(i=0; i < len/4; i++){
hostp[i] = ntoh(netp[i]);
}
}
Or something similar. The above assumes that the word size is 32 bits. We can't be sure that the word size on the sending machine isn't 16 bits or 64 bits, right?
For client programs, such as telnet, they must be using hton* before they send and ntoh* after they receive data, correct?
EDIT: For the people who think that because a char is one byte, endianness doesn't matter:
int main(void)
{
uint32_t a = 0x01020304;
char* c = (char*)&a;
printf("%x %x %x %x\n", c[0], c[1], c[2], c[3]);
}
Run this snippet of code. The output for me is as follows:
$ ./a.out
4 3 2 1
Those on PowerPC chips should get '1 2 3 4', but those of us on Intel chips should, for the most part, see what I got above.
Maybe I'm missing something here, but are you sending strings, that is, sequences of characters? Then you don't need to worry about byte order. That is only for the bit pattern in integers. The characters in a string are always in the "right" order.
EDIT:
Derrick, to address your code example, I've run the following (slightly expanded) version of your program on an Intel i7 (little-endian) and on an old Sun Sparc (big-endian)
#include <stdio.h>
#include <stdint.h>
int main(void)
{
uint32_t a = 0x01020304;
char* c = (char*)&a;
char d[] = { 1, 2, 3, 4 };
printf("The integer: %x %x %x %x\n", c[0], c[1], c[2], c[3]);
printf("The string: %x %x %x %x\n", d[0], d[1], d[2], d[3]);
return 0;
}
As you can see, I've added a real char array to your print-out of an integer.
The output from the little-endian Intel i7:
The integer: 4 3 2 1
The string: 1 2 3 4
And the output from the big-endian Sun:
The integer: 1 2 3 4
The string: 1 2 3 4
Your multi-byte integer is indeed stored in different byte order on the two machines, but the characters in the char array have the same order.
With your function signature as posted you don't have to worry about byte order. It accepts a char*, that can only handle 8-bit characters. With one byte per character, you cannot have a byte order problem.
You'd only run into a byte-order problem if you send Unicode in a UTF-16 or UTF-32 encoding and the endianness of the sending machine doesn't match that of the receiving machine. The simple solution for that is to use the UTF-8 encoding, which is what most text is sent as across networks; being byte oriented, it doesn't have a byte-order issue either. Or you could send a BOM.
If you'd like to send them as an 8-bit encoding (the fact that you're using char implies this is what you want), there's no need to byte swap. However, for the unrelated issue of non-ASCII characters, so that the same character > 127 appears the same on both ends of the connection, I would suggest that you send the data in something like UTF-8, which can represent all unicode characters and can be safely treated as ASCII strings. The way to get UTF-8 text based on the default encoding varies by the platform and set of libraries you're using.
If you're sending 16-bit or 32-bit encoding... You can include one character with the byte order mark which the other end can use to determine the endianness of the character. Or, you can assume network byte order and use htons() or htonl() as you suggest. But if you'd like to use char, please see the previous paragraph. :-)
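For instance, a rough sketch (the function name is mine) of framing one message as a 16-bit length in network byte order followed by the raw UTF-8 bytes; only the length needs byte-swapping, the text goes out unchanged:

#include <arpa/inet.h> /* htons (POSIX) */
#include <stdint.h>
#include <string.h>

/* Writes a 2-byte big-endian length followed by the text bytes into out,
   which is assumed to be large enough. Returns the number of bytes written. */
size_t frame_message(unsigned char *out, const char *utf8, uint16_t len)
{
    uint16_t net_len = htons(len);
    memcpy(out, &net_len, sizeof net_len);
    memcpy(out + sizeof net_len, utf8, len);
    return sizeof net_len + len;
}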
It seems to me that the function prototype doesn't match its behavior. You're passing in a char *, but you're then casting it to uint32_t *. And, looking more closely, you're casting the address of the pointer, rather than the contents, so I'm concerned that you'll get unexpected results. Perhaps the following would work better:
void arr_ntoh(uint32_t* netp, uint32_t* hostp, int len)
{
    for (int i = 0; i < len; i++)
        hostp[i] = ntohl(netp[i]);
}
I'm basing this on the assumption that what you've really got is an array of uint32_t and you want to run ntoh() on all of them.
I hope this is helpful.