Possible Duplicate:
C Macro definition to determine big endian or little endian machine?
#include <stdio.h>

int main(void)
{
    int x = 1;
    char *y = (char *)&x;
    printf("%c\n", *y + 48);
    return 0;
}
If it's little endian it will print 1. If it's big endian it will print 0. Is that correct? Or will casting the address of an int x to a char* always yield a pointer to the least significant byte, regardless of endianness?
In short, yes.
Suppose we are on a 32-bit machine.
If it is little endian, x will be laid out in memory like this:
        higher memory
            ----->
+----+----+----+----+
|0x01|0x00|0x00|0x00|
+----+----+----+----+
  ^
  |
  &x
so *(char *)(&x) == 1, and *y + 48 == '1'. (48 is the ASCII code of '0'.)
If it is big endian, it will be:
+----+----+----+----+
|0x00|0x00|0x00|0x01|
+----+----+----+----+
  ^
  |
  &x
so this one will be '0'.
The following will do.
unsigned int x = 1;
printf ("%d", (int) (((char *)&x)[0]));
Casting &x to char * lets you access the individual bytes of the integer, and the order of those bytes depends on the endianness of the system.
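For illustration, here is a minimal sketch (my own addition, not part of the answer above) that dumps every byte of an int so you can see the order on your own machine:

#include <stdio.h>

int main(void)
{
    unsigned int x = 0x01020304;
    unsigned char *p = (unsigned char *)&x;

    /* Print each byte from lowest to highest address;
       the order shown depends on the machine's endianness. */
    for (size_t i = 0; i < sizeof x; i++)
        printf("byte %zu: 0x%02x\n", i, p[i]);
    return 0;
}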
This is a big-endian test from a configure script:
#include <inttypes.h>

int main(int argc, char **argv)
{
    volatile uint32_t i = 0x01234567;
    // return 0 for big endian, 1 for little endian.
    return (*((uint8_t*)(&i))) == 0x67;
}
I thought I had read about this in the standard, but I can't find it; I'll keep looking. (This is an old answer, addressing the question's heading rather than the exact question text.)
The following program would determine that:
#include <stdio.h>
#include <stdint.h>

int is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } e = { 0x01000000 };

    return e.c[0];
}

int main(void)
{
    printf("System is %s-endian.\n",
           is_big_endian() ? "big" : "little");
    return 0;
}
You also have this approach, from Quake II:
byte swaptest[2] = {1,0};
if ( *(short *)swaptest == 1) {
bigendien = false;
Note that !is_big_endian() does not guarantee little endian, since the machine could also be mixed/middle endian.
I believe this can be checked using the same approach, only changing the value from 0x01000000 to e.g. 0x01020304, giving:
switch (e.c[0]) {
case 0x01: /* BIG */
case 0x02: /* MIX */
default:   /* LITTLE */
}
But not entirely sure about that one ...
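A minimal runnable sketch of that idea (my own expansion, assuming uint32_t is exactly 4 bytes):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    union {
        uint32_t i;
        unsigned char c[4];
    } e = { 0x01020304 };

    if (e.c[0] == 0x01)
        puts("big endian");
    else if (e.c[0] == 0x04)
        puts("little endian");
    else
        puts("mixed/middle endian");
    return 0;
}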
Related
I'm trying to convert a 2-byte array into a single 16-bit value. For some reason, when I cast the array as a 16-bit pointer and then dereference it, the byte ordering of the value gets swapped.
For example,
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;
    printf("%x\n", (unsigned int)b);
    return 0;
}
prints aa15 instead of 15aa (which is what I would expect).
What's the reason behind this, and is there an easy fix?
I'm aware that I can do something like uint16_t b = a[0] << 8 | a[1]; (which does work just fine), but I feel like this problem should be easily solvable with casting and I'm not sure what's causing the issue here.
As mentioned in the comments, this is due to endianness.
Your machine is little-endian, which (among other things) means that multi-byte integer values have the least significant byte first.
If you compiled and ran this code on a big-endian machine (ex. a Sun), you would get the result you expect.
Since your array is set up as big-endian, which also happens to be network byte order, you could get around this by using ntohs and htons. These functions convert a 16-bit value from network byte order (big endian) to the host's byte order and vice versa:
uint16_t b = ntohs(*(uint16_t*)a);
There are similar functions called ntohl and htonl that work on 32-bit values.
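For reference, a self-contained sketch of the ntohs approach (assuming a POSIX system where <arpa/inet.h> provides ntohs; memcpy is used instead of the pointer cast to avoid aliasing and alignment issues):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs */

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t raw;

    memcpy(&raw, a, sizeof raw);        /* copy the bytes as stored */
    uint16_t b = ntohs(raw);            /* network (big-endian) order -> host order */
    printf("%#x\n", (unsigned int)b);   /* prints 0x15aa on any host */
    return 0;
}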
This is because of the endianness of your machine.
To make your code independent of the machine, consider the following function:
#define LITTLE_ENDIAN 0
#define BIG_ENDIAN    1

int endian() {
    int i = 1;
    char *p = (char *)&i;

    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
So for each case you can choose which operation to apply.
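As a hedged usage sketch (my own illustration, assuming the endian() function and macros above are in scope), the original two-byte example could be handled like this:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumes endian(), LITTLE_ENDIAN and BIG_ENDIAN defined as above. */

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b;

    memcpy(&b, a, sizeof b);                  /* raw bytes in host order */
    if (endian() == LITTLE_ENDIAN)
        b = (uint16_t)((b << 8) | (b >> 8));  /* byte-swap to get 0x15aa */

    printf("%#x\n", (unsigned int)b);         /* 0x15aa either way */
    return 0;
}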
You cannot do anything like *(uint16_t*)a because of the strict aliasing rule. Even if code appears to work for now, it may break later in a different compiler version.
A correct version of the code could be:
b = ((uint16_t)a[0] << CHAR_BIT) + a[1];
The version suggested in your question involving a[0] << 8 is incorrect because on a system with 16-bit int, this may cause signed integer overflow: a[0] promotes to int, and << 8 means * 256.
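A self-contained sketch of that suggested line (my own wrapper around it; CHAR_BIT comes from <limits.h>):

#include <stdint.h>
#include <stdio.h>
#include <limits.h>   /* CHAR_BIT */

int main(void)
{
    uint8_t a[2] = {0x15, 0xaa};

    /* Assemble the big-endian byte pair without aliasing or overflow problems. */
    uint16_t b = ((uint16_t)a[0] << CHAR_BIT) + a[1];

    printf("%#x\n", (unsigned int)b);   /* prints 0x15aa */
    return 0;
}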
This might help to visualize things. When you create the array you have two bytes in order. When you print it you get the human readable hex value which is the opposite of the little endian way it was stored. The value 1 in little endian as a uint16_t type is stored as follows where a0 is a lower address than a1...
a0 a1
|10000000|00000000
Note, the least significant byte comes first, but when we print the value in hex, the least significant byte appears on the right, which is what we normally expect on any machine.
This program prints a little endian and a big endian 1 in binary, starting from the least significant bit...
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>

void print_bin(uint64_t num, size_t bytes) {
    int i = 0;
    for (i = bytes * 8; i > 0; i--) {
        (i % 8 == 0) ? printf("|") : 1;
        (num & 1) ? printf("1") : printf("0");
        num >>= 1;
    }
    printf("\n");
}

int main(void) {
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;
    uint16_t le = 1;
    uint16_t be = htons(le);

    printf("Little Endian 1\n");
    print_bin(le, 2);
    printf("Big Endian 1 on little endian machine\n");
    print_bin(be, 2);
    printf("0xaa15 as little endian\n");
    print_bin(b, 2);
    return 0;
}
This is the output (bits are printed starting from the least significant):
Little Endian 1
|10000000|00000000
Big Endian 1 on little endian machine
|00000000|10000000
0xaa15 as little endian
|10101000|01010101
I would like to know how a string is represented as an integer, so I wrote the following program.
#include <stdio.h>

int main(int argc, char *argv[])
{
    char name[4] = {"@"};
    printf("integer name %d\n", *(int *)name);
    return 0;
}
The output is:
integer name 64
This is understandable because '@' is 64 in decimal, i.e., 0x40 in hex.
Now I change the program into:
#include <stdio.h>

int main(int argc, char *argv[])
{
    char name[4] = {"@@"};
    printf("integer name %d\n", *(int *)name);
    return 0;
}
The output is:
integer name 16448
I don't understand this. Since @@ is 0x4040 in hex, it should be 2^12 + 2^6 = 4160.
If I count the '\0' at the end of the string, then it should be 2^16+2^10 = 66560
Could someone explain where 16448 comes from?
Your math is wrong: 0x4040 == 16448. The two fours set bit 14 and bit 6 respectively, so the value is 2^14 + 2^6 = 16384 + 64 = 16448.
Your code actually invokes undefined behavior because you must not alias a char * with an int *. This is known as the strict aliasing rule. To see just one reason why this should be disallowed, consider what would otherwise have to happen if the code is run on a little and a big endian machine.
If you want to see the hex pattern of the string, you should simply loop over its bytes and print out each byte.
void
print_string(const char * strp)
{
    printf("0x");
    do
        printf("%02X", (unsigned char) *strp);
    while (*strp++);
    printf("\n");
}
Of course, instead of printing the bytes, you can shift them into an integer (that will very soon overflow) and only finally output that integer. While doing this, you'll be forced to take a stand on “your” endianness.
/* Interpreting as big endian. */
unsigned long
int_string(const char * strp)
{
    unsigned long value = 0UL;
    do
        value = (value << 8) | (unsigned char) *strp;
    while (*strp++);
    return value;
}
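A hedged usage sketch (my own driver for the two functions above, assuming an ASCII execution character set):

#include <stdio.h>

/* print_string() and int_string() as defined above. */

int main(void)
{
    print_string("@@");                  /* prints 0x404000 (the terminating '\0' is included) */
    printf("%lu\n", int_string("@@"));   /* prints 4210688, i.e. 0x404000 */
    return 0;
}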
This is how 16448 comes about. 0x4040 looks like this in binary:
  4    0    4    0   -> hex
0100 0000 0100 0000  -> binary
Bits 14 and 6 are set, so the value is 2^14 + 2^6 = 16384 + 64 = 16448.
Hope you got it :)
I was asked an interview question: given a 6-byte input obtained from a big-endian machine, implement a function to convert/typecast it to 8 bytes, assuming we do not know the endianness of the machine running this function.
The point of the question seems to be to test my understanding of endianness, because I was asked whether I knew about endianness just before this one.
I do not know how to answer the question. For example, do I need to pad the 6 bytes to 8 bytes first, and how? Here is my code; is it correct?
bool isBigEndian(){
    int num = 1;
    char* b = (char*)(&num);
    return b ? false : true;
}

long long* convert(char* arr[]){ //size is 6
    long long* res = (long long*)malloc(long long); //...check res is NULL...
    if (isBigEnian()){
        for(int i = 0; i < 6; i++)
            memset(res, i+2, arr[i]);
    }
    else {
        for(int i = 0; i < 6; i++)
            memset(res, i+2, arr[6-1-i]);
    }
    return res; //assume caller will free res.
}
Update: to answer the comment that my question is not clear, I just found a link, Convert Bytes to Int / uint in C, with a similar question. Based on my understanding of that, the endianness of the host does matter. Suppose the input is char array[] = {01,02,03,04,05,06}; then if the host is little endian, the output is stored as 00,00,06,05,04,03,02,01, and if it is big endian, the output is stored as 00,00,01,02,03,04,05,06; in both cases the 00,00 is padded at the beginning.
I kind of understand now: on the other machine, suppose there is a number xyz = 010203040506; because it is big endian, 01 is the MSB, so it is stored as char array = {01,02,03,04,05,06} where 01 has the lowest address. Then on this machine, if the machine is also big endian, it should be stored as {00,00,01,02,03,04,05,06} where 01 is still the MSB, so that it is cast to the same number int64 xyz2 = 0000010203040506. But if the machine is little endian, it should be stored as {00,00,06,05,04,03,02,01} where 01, the MSB, has the highest address, in order for int64 xyz2 = 0000010203040506.
Please let me know if my understanding is incorrect. Also, can anybody tell me why 00,00 is always padded at the beginning no matter what the endianness is? Shouldn't it be padded at the end if this machine is little endian, since 00 is the most significant byte?
Before moving on, you should have asked for clarification.
What exactly does converting mean here? Padding each char with 0's? Prefixing each char with 0's?
I will assume that each char should be prefixed with 0's. This is a possible solution:
#include <stdint.h>
#include <limits.h>

#define DATA_WIDTH 6

uint64_t convert(unsigned char data[]) {
    uint64_t res;
    int i;

    res = 0;
    for (i = 0; i < DATA_WIDTH; i++) {
        res = (res << CHAR_BIT) | data[i];
    }
    return res;
}
To append 0's to each char, we could, instead, use this inside the for:
res = (res << CHAR_BIT) | (data[i] << 2);
In an interview, you should always note the limitations for your solution. This solution assumes that the implementation provides uint64_t type (it is not required by the C standard).
The fact that the input is big endian is important because it lets you know that data[0] corresponds to the most significant byte, and it must remain so in your result. This solution works no matter what the target machine's endianness is.
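A hedged usage sketch (my own driver) for the convert() above:

#include <stdio.h>
#include <inttypes.h>

/* convert() and DATA_WIDTH as defined above. */

int main(void)
{
    unsigned char data[DATA_WIDTH] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06};
    uint64_t value = convert(data);

    printf("0x%016" PRIx64 "\n", value);   /* 0x0000010203040506 on any host */
    return 0;
}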
I don't understand why you think malloc is necessary. Why not just something like this?
long long convert(unsigned char data[])
{
    long long res;

    res = 0;
    for (int i = 0; i < 6; ++i)
        res = (res << 8) + data[i];
    return res;
}
I've got 2 chars.
Char 128 and Char 2.
How do I turn these chars into the Short 640 in C?
I've tried
unsigned short getShort(unsigned char* array, int offset)
{
    short returnVal;
    char* a = slice(array, offset, offset + 2);
    memcpy(&returnVal, a, 2);
    free(a);
    return returnVal;
}
But that didn't work, it just displays it as 128. What's the preferred method?
Probably the easiest way to turn two chars, a and b, into a short c, is as follows:
short c = (((short)a) << 8) | b;
To fit this into what you have, the easiest way is probably something like this:
unsigned short getShort(unsigned char* array, int offset)
{
    return (short)(((short)array[offset]) << 8) | array[offset + 1];
}
I found that the accepted answer was nearly correct, except I'd run into a bug where sometimes the top byte of the result would be 0xff...
I realized this was because of C sign extension. If the second char is >= 0x80, then converting it to a short sign-extends it: 0x80 becomes 0xff80, and or-ing 0xff80 with anything leaves the top byte as 0xff.
The following solution avoids the issue by zeroing out the top byte of b during its implicit conversion to a short.
short c = (((short)a) << 8) | (0x00ff & b);
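A small sketch demonstrating the difference (my own example values, using signed char explicitly to reproduce what happens on platforms where plain char is signed):

#include <stdio.h>

int main(void)
{
    signed char a = 0x02;
    signed char b = (signed char)0x90;   /* high bit set, so negative when signed */

    short bad  = (short)((((short)a) << 8) | b);             /* sign extension sets the top byte */
    short good = (short)((((short)a) << 8) | (0x00ff & b));  /* mask keeps only the low 8 bits of b */

    printf("bad:  0x%04x\n", (unsigned short)bad);   /* 0xff90 */
    printf("good: 0x%04x\n", (unsigned short)good);  /* 0x0290 */
    return 0;
}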
I see that there is already an answer, but I'm a bit puzzled about what was going on with your original attempt. The following code shows your way and a technique using a union. Both seem to work just fine. I suppose you might have been running into an endianness problem. Anyway, perhaps this demonstration will be useful even if your problem is already solved.
#include <stdio.h>
#include <string.h>

int main()
{
    short returnVal;
    char a[2];
    union {
        char ch[2];
        short n;
    } char2short;

    a[0] = 128;
    a[1] = 2;
    memcpy(&returnVal, a, 2);
    printf("short = %d\n", returnVal);

    char2short.ch[0] = 128;
    char2short.ch[1] = 2;
    printf("short (union) = %d\n", char2short.n);

    return 0;
}
Outputs:
short = 640
short (union) = 640
I see that you are not actually trying to shift bits but to assemble the equivalent of hex values together, like you would with color values in CSS.
Give this code a shot:
char b1 = 128, b2 = 2;
char data[16];
sprintf((char *)data, "%x%x", (BYTE)b2, (BYTE)b1);
short result = strtol(data, (char **)NULL, 16);
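A self-contained variant of that idea (my own adaptation: unsigned char replaces the Windows BYTE typedef, and %02x keeps a leading zero so single-digit bytes still occupy two hex digits):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned char b1 = 128, b2 = 2;
    char data[16];

    sprintf(data, "%02x%02x", b2, b1);            /* "0280" */
    short result = (short)strtol(data, NULL, 16);
    printf("%d\n", result);                       /* 640 */
    return 0;
}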
Possible Duplicate:
Little vs Big Endianess: How to interpret the test
Is there an easy method to test code for big endian with gcc or any online compiler like ideone? I don't want to use QEMU or virtual machines.
EDIT
Can someone explain the behavior of this piece of code on a system using big endian?
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main (void)
{
    int32_t i;
    unsigned char u[4] = {'a', 'b', 'c', 'd'};

    memcpy(&i, u, sizeof(u));
    printf("%d\n", i);
    memcpy(u, &i, sizeof(i));
    for (i = 0; i < 4; i++) {
        printf("%c", u[i]);
    }
    printf("\n");
    return 0;
}
As a program?
#include <stdio.h>
#include <stdint.h>

int main(int argc, char** argv) {
    union {
        uint32_t word;
        uint8_t bytes[4];
    } test_struct;

    test_struct.word = 0x1;
    if (test_struct.bytes[0] != 0)
        printf("little-endian\n");
    else
        printf("big-endian\n");
    return 0;
}
On a little-endian architecture, the least significant byte is stored first. On a big-endian architecture, the most-significant byte is stored first. So by overlaying a uint32_t with a uint8_t[4], I can check to see which byte comes first. See: http://en.wikipedia.org/wiki/Big_endian
GCC in particular defines the __BYTE_ORDER__ macro as an extension. You can test against __ORDER_BIG_ENDIAN__, __ORDER_LITTLE_ENDIAN__, and __ORDER_PDP_ENDIAN__ (which I didn't know existed!) -- see http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
See also http://en.wikipedia.org/wiki/Big_endian
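A compile-time sketch using those macros (hedged: __BYTE_ORDER__ and friends are a GCC/Clang extension, not standard C):

#include <stdio.h>

int main(void)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    puts("big-endian (determined at compile time)");
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    puts("little-endian (determined at compile time)");
#else
    puts("PDP-endian or unknown");
#endif
    return 0;
}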
As for running code in an endianness that doesn't match your machine's native endianness, then you're going to have to compile and run it on an architecture that has that different endianness. So you are going to need to cross-compile, and run on an emulator or virtual machine.
edit: ah, I didn't see the first printf().
The first printf will print "1633837924", since a big-endian machine will interpret the 'a' character as the most significant byte in the int.
The second printf will just print "abcd", since the value of u has been copied byte-by-byte back and forth from i.
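To see where that number comes from, here is a small sketch (my own, assuming an ASCII character set) that assembles 'a','b','c','d' both ways, independently of the host's byte order:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t big    = ((uint32_t)'a' << 24) | ('b' << 16) | ('c' << 8) | 'd';
    uint32_t little = ((uint32_t)'d' << 24) | ('c' << 16) | ('b' << 8) | 'a';

    printf("big-endian reading:    %" PRIu32 "\n", big);      /* 1633837924 == 0x61626364 */
    printf("little-endian reading: %" PRIu32 "\n", little);   /* 1684234849 == 0x64636261 */
    return 0;
}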