Does my AMD-based machine use little endian or big endian?

I'm going through a computer systems course and I'm trying to establish, for sure, whether my AMD-based computer is a little-endian machine. I believe it is, because it would be Intel-compatible.
Specifically, my processor is an AMD Athlon 64 X2.
I understand that this can matter in C programming. I'm writing C programs, and a method I'm using would be affected by endianness. I'm trying to figure out whether I'd get the same results if I ran the program on an Intel-based machine (assuming that is also a little-endian machine).
Finally, let me ask this: would any and all machines capable of running Windows (XP, Vista, 2000, Server 2003, etc.) and, say, Ubuntu Linux desktop be little-endian?

All x86 and x86-64 machines (x86-64 being an extension of x86) are little-endian.
You can confirm it with something like this:
#include <stdio.h>

int main(void) {
    int a = 0x12345678;
    unsigned char *c = (unsigned char *)&a;
    if (*c == 0x78) {
        printf("little-endian\n");
    } else {
        printf("big-endian\n");
    }
    return 0;
}

An easy way to determine endianness is given in the article Writing endian-independent code in C:
const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )
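As a minimal sketch, the macro can be dropped into a complete program like this (the file-scope i is required, since the macro dereferences its address):
#include <stdio.h>

const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )

int main(void) {
    /* Prints "little-endian" on x86/x86-64. */
    printf("%s\n", is_bigendian() ? "big-endian" : "little-endian");
    return 0;
}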

Assuming you have Python installed, you can run this one-liner, which will print "little" on little-endian machines and "big" on big-endian ones (note it uses Python 2 print syntax; on Python 3, python3 -c "import sys; print(sys.byteorder)" does the same job):
python -c "import struct; print 'little' if ord(struct.pack('L', 1)[0]) else 'big'"

"Intel-compatible" isn't very precise.
Intel used to make big-endian processors, notably the StrongARM and XScale. These do not use the IA32 ISA, commonly known as x86.
Further back in history, Intel also made the little-endian i860 and i960, which are also not x86-compatible.
Further back in history, the prececessors of the x86 (8080, 8008, etc.) are not x86-compatible either. Being 8-bit processors, endianness doesn't really matter...
Nowadays, Intel still makes the Itanium (IA64), which is bi-endian: normal operation is big-endian, but the processor can also run in little-endian mode. It does happen to be able to run x86 code in little-endian mode, but the native ISA is not IA32.
To my knowledge, all of AMD's processors have been x86-compatible, with some extensions like x86_64, and thus are necessarily little-endian.
Ubuntu is available for x86 (little-endian) and x86_64 (little-endian), with less complete ports for ia64 (big-endian), ARM(el) (little-endian), PA-RISC (big-endian, though the processor supports both), PowerPC (big-endian), and SPARC (big-endian). I don't believe there is an ARM(eb) (big-endian) port.

In answer to your final question: no. Linux is capable of running on big-endian machines, e.g., older-generation PowerMacs.

The below snippet of code works:
#include <stdio.h>

int is_little_endian(void) {
    short x = 0x0100; /* 256 */
    char *p = (char *)&x;
    if (p[0] == 0) {
        return 1;
    }
    return 0;
}

int main(void) {
    if (is_little_endian()) {
        printf("Little endian machine\n");
    } else {
        printf("Big endian machine\n");
    }
    return 0;
}
The "short" integer in the code is 0x0100 (256 in decimal) and is 2 bytes long. The least significant byte is 00, and the most significant is 01. Little endian ordering puts the least significant byte in the address of the variable. So it just checks whether the value of the byte at the address pointed by the variable's pointer is 0 or not.
If it is 0, it is little endian byte ordering, otherwise it's big endian.

You would have to download a version of Ubuntu built for big-endian machines. I know only of the PowerPC versions. I'm sure you can find some place that has a more generic big-endian implementation.

/* by Linas Samusas */
#ifndef _bitorder
#define _bitorder 0x0008
#endif

#if (_bitorder > 8)
#define BE
#else
#define LE
#endif
and use this:
#ifdef LE
#define Function_Convert_to_be_16(value) real_function_to_be_16(value)
#define Function_Convert_to_be_32(value) real_function_to_be_32(value)
#define Function_Convert_to_be_64(value) real_function_to_be_64(value)
#else
#define Function_Convert_to_be_16
#define Function_Convert_to_be_32
#define Function_Convert_to_be_64
#endif
If LE:
unsigned long number1 = Function_Convert_to_be_16(number2);
*the macro will call the real function and convert the value to BE
If BE:
unsigned long number1 = Function_Convert_to_be_16(number2);
*the macro is defined as a bare word, not a function, so your number just ends up between brackets and no conversion happens
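For clarity, here is a self-contained sketch of the same pattern (names like to_be_16 and swap16 are illustrative, not from any real library): on a little-endian build the macro calls a real swap function, while on a big-endian build it expands to the value unchanged.
#include <stdio.h>
#include <stdint.h>

#define TARGET_IS_LITTLE_ENDIAN 1 /* assumed: set per platform */

static uint16_t swap16(uint16_t v) {
    return (uint16_t)((v << 8) | (v >> 8));
}

#if TARGET_IS_LITTLE_ENDIAN
#define to_be_16(value) swap16(value) /* conversion needed */
#else
#define to_be_16(value) (value)       /* already big-endian: no-op */
#endif

int main(void) {
    uint16_t n = 0x1234;
    printf("0x%04x -> 0x%04x\n", n, to_be_16(n)); /* 0x1234 -> 0x3412 */
    return 0;
}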

We now have std::endian (C++20)!
constexpr bool is_little = std::endian::native == std::endian::little;
https://en.cppreference.com/w/cpp/types/endian

Related

What's the "to little endian" equivalent of htonl? [duplicate]

I need to convert a short value from the host byte order to little endian. If the target were big-endian, I could use the htons() function, but alas, it's not.
I guess I could do:
swap(htons(val))
But this could potentially cause the bytes to be swapped twice, rendering the result correct but incurring a performance penalty, which is not acceptable in my case.
Here is an article from IBM about endianness and how to determine it:
Writing endian-independent code in C: Don't let endianness "byte" you
It includes an example of how to determine endianness at run time (which you would only need to do once):
#include <stdio.h>
#include <stdlib.h>

const int i = 1;
#define is_bigendian() ( (*(char*)&i) == 0 )

int main(void) {
    union { int val; unsigned char c[sizeof(int)]; } u;
    u.val = 0x12345678;
    if (is_bigendian()) {
        printf("%X.%X.%X.%X\n", u.c[0], u.c[1], u.c[2], u.c[3]);
    } else {
        printf("%X.%X.%X.%X\n", u.c[3], u.c[2], u.c[1], u.c[0]);
    }
    exit(0);
}
The page also has a section on methods for reversing byte order:
short reverseShort(short s) {
    unsigned char c1, c2;
    if (is_bigendian()) {
        return s;
    } else {
        c1 = s & 255;
        c2 = (s >> 8) & 255;
        return (c1 << 8) + c2;
    }
}
short reverseShort(char *c) {
    short s;
    char *p = (char *)&s;
    if (is_bigendian()) {
        p[0] = c[0];
        p[1] = c[1];
    } else {
        p[0] = c[1];
        p[1] = c[0];
    }
    return s;
}
Then you should know your endianness and call htons() conditionally. Actually, not even htons(): just swap the bytes conditionally. At compile time, of course.
Something like the following:
#include <arpa/inet.h> /* for htons() in the fallback case */

unsigned short swaps(unsigned short val)
{
    return ((val & 0xff) << 8) | ((val & 0xff00) >> 8);
}

/* host to little endian */
#define PLATFORM_IS_BIG_ENDIAN 1 /* configure per platform */

#if PLATFORM_IS_LITTLE_ENDIAN
unsigned short htoles(unsigned short val)
{
    /* no-op on a little endian platform */
    return val;
}
#elif PLATFORM_IS_BIG_ENDIAN
unsigned short htoles(unsigned short val)
{
    /* need to swap bytes on a big endian platform */
    return swaps(val);
}
#else
unsigned short htoles(unsigned short val)
{
    /* The platform hasn't been properly configured for the
     * preprocessor to know if it's little or big endian.
     * Use the potentially less performant option that always works. */
    return swaps(htons(val));
}
#endif
If you have a system that's properly configured (such that the preprocessor knows whether the target is little- or big-endian), you get an 'optimized' version of htoles(). Otherwise you get the potentially non-optimized version that depends on htons(). In any case, you get something that works.
Nothing too tricky and more or less portable.
Of course, you can further improve the optimization possibilities by implementing this with inline or as macros as you see fit.
You might want to look at something like the "Portable Open Source Harness (POSH)" for an actual implementation that defines the endianness for various compilers. Note, getting to the library requires going through a pseudo-authentication page (though you don't need to register or give any personal details): http://hookatooka.com/poshlib/
This trick should work: at startup, call ntohs with a dummy value and compare the result to the original value. If the two are the same, the machine uses big endian; otherwise it is little endian.
Then use a ToLittleEndian method that either does nothing or invokes ntohs, depending on the result of that initial test.
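A minimal sketch of that idea (the names host_is_big_endian and to_little_endian_16 are illustrative): run the test once, cache the result, and branch on it afterwards.
#include <stdint.h>
#include <arpa/inet.h>

static int host_is_big_endian; /* set once at startup */

void init_endianness(void) {
    /* On a big-endian host, ntohs() is the identity function. */
    host_is_big_endian = (ntohs(0x1234) == 0x1234);
}

uint16_t to_little_endian_16(uint16_t v) {
    /* Big-endian host: swap the bytes; little-endian host: no-op. */
    return host_is_big_endian ? (uint16_t)((v << 8) | (v >> 8)) : v;
}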
(Edited with the information provided in comments)
My rule-of-thumb performance guess is that it depends on whether you are little-endian-ising a big block of data in one go, or just one value:
If it's just one value, then the function-call overhead is probably going to swamp the overhead of unnecessary byte swaps, even if the compiler doesn't optimise away the unnecessary swaps. Then you're probably going to write the value as the port number of a socket connection, and opening or binding a socket takes an age compared with any sort of bit manipulation. So just don't worry about it.
If a large block, then you might worry the compiler won't handle it. So do something like this:
if (!is_little_endian()) {
    for (int i = 0; i < size; ++i) {
        vals[i] = swap_short(vals[i]);
    }
}
Or look into SIMD instructions on your architecture which can do it considerably faster.
Write is_little_endian() using whatever trick you like. I think the one Robert S. Barnes provides is sound, but since you usually know for a given target whether it's going to be big- or little-endian, maybe you should have a platform-specific header file that defines it as a macro evaluating to either 1 or 0, as in the sketch below.
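For example (a sketch assuming GCC or Clang, which predefine __BYTE_ORDER__; other compilers would need their own branch):
/* endian_config.h -- compile-time endianness for GCC/Clang */
#ifndef ENDIAN_CONFIG_H
#define ENDIAN_CONFIG_H

#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define IS_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define IS_LITTLE_ENDIAN 0
#else
#error "unknown byte order; define IS_LITTLE_ENDIAN manually"
#endif

#endif /* ENDIAN_CONFIG_H */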
As always, if you really care about performance, then look at the generated assembly to see whether pointless code has been removed or not, and time the various alternatives against each other to see what actually goes fastest.
Unfortunately, there's not really a cross-platform way to determine a system's byte order at compile-time with standard C. I suggest adding a #define to your config.h (or whatever else you or your build system uses for build configuration).
A unit test to check for the correct definition of LITTLE_ENDIAN or BIG_ENDIAN could look like this:
#include <assert.h>
#include <limits.h>
#include <stdint.h>
void check_bits_per_byte(void)
{ assert(CHAR_BIT == 8); }
void check_sizeof_uint32(void)
{ assert(sizeof (uint32_t) == 4); }
void check_byte_order(void)
{
    static const union { unsigned char bytes[4]; uint32_t value; } byte_order =
        { { 1, 2, 3, 4 } };
    static const uint32_t little_endian = 0x04030201ul;
    static const uint32_t big_endian = 0x01020304ul;
#ifdef LITTLE_ENDIAN
    assert(byte_order.value == little_endian);
#endif
#ifdef BIG_ENDIAN
    assert(byte_order.value == big_endian);
#endif
#if !defined LITTLE_ENDIAN && !defined BIG_ENDIAN
    assert(!"byte order unknown or unsupported");
#endif
}

int main(void)
{
    check_bits_per_byte();
    check_sizeof_uint32();
    check_byte_order();
}
On many Linux systems there is an <endian.h> (or <sys/endian.h> on the BSDs) with conversion functions; see the man page for ENDIAN(3).
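For example, on glibc the following sketch converts a value to little-endian and back (htole32() and le32toh() are the glibc spellings; the BSDs provide similar functions in <sys/endian.h>):
#define _DEFAULT_SOURCE /* expose htole32()/le32toh() on glibc */
#include <endian.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t host = 0x12345678;
    uint32_t le = htole32(host); /* host order -> little-endian */
    printf("host 0x%08x -> le 0x%08x -> back 0x%08x\n",
           host, le, le32toh(le)); /* round-trips to the original */
    return 0;
}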

CPU write value passed from application to QEMU is strange

I was trying to run an RTEMS (a real-time OS) application on a SPARC virtual machine using QEMU.
I'm almost there, and I had it working hours ago. But after removing some prints it stopped working, and I later found it's not because of the removed prints. The data is not being passed correctly between the RTEMS image and the QEMU emulation model. (I'm working with QEMU version 1.5.50 and the lan9118.c model borrowed from QEMU version 2.0.0; I modified lan9118 a little.)
In the QEMU model, the memory region ops are defined as
struct MemoryRegionOps {
/* Read from the memory region. #addr is relative to #mr; #size is
* in bytes. */
uint64_t (*read)(void *opaque,
hwaddr addr,
unsigned size);
/* Write to the memory region. #addr is relative to #mr; #size is
* in bytes. */
void (*write)(void *opaque,
hwaddr addr,
uint64_t data,
unsigned size);
...
}
and in the RTEMS application, I write to the device like
*TX_FIFO_PORT = cmdA;
*TX_FIFO_PORT = cmdB;
where TX_FIFO_PORT is defined as below.
#define TX_FIFO_PORT (volatile ulong *)(SMSC9118_BASE + 0x20)
But when I write, for example,
cmdA : 0x2a300200 and cmdB : 0x2a002a00,
The values I expected are
cmdA : 0x0002302a and cmdB : 0x002a002a. (Just endian converted values)
But the values I see at the write function (the entrance to QEMU) are
cmdA : 0x02000200 and cmdB : 0x2a002a00 respectively.
The observed values have not been endian-converted, and the first value is even different (its lower 16 bits are repeated).
What could be problem?
Any hint will be deeply appreciated.
Strangely, I fixed this by commenting out the endian conversion for cmdA and cmdB in RTEMS before writing to the device. (It was OK with the endian conversion... I don't know.) So it's 'almost' working.
Anyway, here is a tip about exchanging CPU write/read data between the QEMU processor and a device.
In QEMU, each device model provides write and read functions, and it also specifies how words should be transferred to/from the device with regard to endianness. It is specified like below.
static const MemoryRegionOps lan9118_mem_ops = {
.read = lan9118_readl,
.write = lan9118_writel,
.endianness = DEVICE_NATIVE_ENDIAN,
};
Here is a copy of the email I received from Peter Maydell on the qemu-discuss@nongnu.org mailing list.
------------------------
This depends on what the MemoryRegionOps struct for the memory region sets its .endianness field to.
DEVICE_NATIVE_ENDIAN means the device sees values the same way round as the guest CPU's native endianness[*], so if the guest does a 32 bit write of 0x12345678 then it appears in the write function's argument as 0x12345678. DEVICE_BIG_ENDIAN means that if the CPU is little endian then the word will be byteswapped.
DEVICE_LITTLE_ENDIAN means that if the CPU is big endian then the word will be byteswapped. The latter are useful for devices or buses which have a specific endianness which is not the same as that of the CPU (eg PCI is always little endian).

fread of a struct different under Solaris and Linux

I'm reading in the first bytes of a file with fread:
fread(&example_struct, sizeof(example_struct), 1, fp_input);
This ends up with different results under Linux and Solaris, even though the example_struct (Elf32_Ehdr) is part of the standard GNU C library, defined in elf.h. I would be happy to know why this happens.
In general, the struct looks like the following:
typedef struct
{
    unsigned char e_ident[LENGTH];
    TYPE_Half e_type;
} example_struct;
The debug code:
for (i = 0; i < sizeof(example_struct); i++) {
    printf("example_struct->e_ident[%i]: (%x)\n", i, example_struct.e_ident[i]);
}
printf("example_struct->e_type: (%x)\n", example_struct.e_type);
printf("example_struct->e_machine: (%x)\n", example_struct.e_machine);
Solaris output:
Elf32_Ehead->e_ident[0]: (7f)
Elf32_Ehead->e_ident[1]: (45)
...
Elf32_Ehead->e_ident[16]: (2)
Elf32_Ehead->e_ident[17]: (0)
...
Elf32_Ehead->e_type: (200)
Elf32_Ehead->e_machine: (6900)
Linux output:
Elf32_Ehead->e_ident[0]: (7f)
Elf32_Ehead->e_ident[1]: (45)
...
Elf32_Ehead->e_ident[16]: (2)
Elf32_Ehead->e_ident[17]: (0)
...
Elf32_Ehead->e_type: (2)
Elf32_Ehead->e_machine: (69)
Maybe similar to: http://forums.devarticles.com/c-c-help-52/file-io-linux-and-solaris-108308.html
You don't mention what CPUs the machines have (maybe Sparc64 in the Solaris machine and x86_64 in the Linux box), but I would guess that you're having an endianness issue. Intel, ARM and most other common architectures today are what is known as little-endian; the SPARC architecture is big-endian.
Let's assume we have the value 0x1234 in a CPU register and we want to store it in memory (or on hard drive, it doesn't matter where). Let N be the memory address we want to write to. We will need to store this 16 bit integer as two bytes in memory, here comes the confusing part:
Using a big-endian machine will store 0x12 at address N and 0x34 at address N+1.
A little-endian machine will store 0x34 at address N and 0x12 at address N+1.
If we store a value using a little-endian machine and read it back using a big-endian machine, the two bytes will be swapped around, and you'll get the issue that you are seeing.
Probably it's because of differences in the structure packing between the two platforms. It's a bad idea to read structures directly (as units) from external media, since issues like these tend to pop up; reading fields byte by byte avoids the problem, as sketched below.
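A minimal sketch of that approach for the 16-bit e_type field (read_u16_le is a hypothetical helper; note that a real ELF file records its own byte order in e_ident[EI_DATA]): read raw bytes and assemble the integer explicitly, so the file's byte order, not the host's, determines the result.
#include <stdio.h>
#include <stdint.h>

/* Read a 2-byte little-endian field regardless of host endianness. */
static int read_u16_le(FILE *fp, uint16_t *out) {
    unsigned char b[2];
    if (fread(b, 1, 2, fp) != 2)
        return -1;
    *out = (uint16_t)(b[0] | (b[1] << 8)); /* assemble bytes explicitly */
    return 0;
}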

htonl printing garbage value

The variable 'value' is uint32_t
value = htonl(value);
printf("after htonl is %ld\n\n",value);
This prints -201261056
value = htons(value);
printf("after htons is %ld\n\n",value);
This prints 62465
Please suggest what could be the reason?
I guess your input is 500, isn't it?
500 is 2**8 + 2**7 + 2**6 + 2**5 + 2**4 + 2**2, i.e. 0x000001F4, which a little-endian machine stores as the byte sequence 0xF4 0x01 0x00 0x00.
TCP/IP uses big endian, so after the htonl the bytes in memory are 0x00 0x00 0x01 0xF4, which your little-endian CPU reads back as the value 0xF4010000.
If you print it as a signed integer, the most significant bit is 1, so the number is negative. Negative numbers are represented in two's complement; the value is -(2**27 + 2**25 + 2**24 + 2**23 + 2**22 + 2**21 + 2**20 + 2**19 + 2**18 + 2**17 + 2**16) == -201261056.
Host order is the order in which your machine interprets data correctly (assuming your machine is little-endian). Network order is big endian, which your system does not interpret the same way. This is the reason for your so-called garbage values.
So, basically, there is nothing wrong with the code. : )
Google "endianness" to get all the details about big endian and little endian.
Providing some more info: in big endian, the first byte (the lowest address) holds the most significant byte, while in little endian the same place holds the least significant byte. So when you use htonl, your first byte will then contain the most significant byte, but your system will treat it as the least significant byte.
Considering the Wikipedia example, decimal 1000 (hex 3E8) in big endian is 03 E8, and in little endian it is E8 03. Now, if you hand 03 E8 to a little-endian machine, it will take it to be decimal 59395.
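To make that concrete, a little-endian machine handed the big-endian byte pair 03 E8 really does read 59395 (a minimal demo):
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    unsigned char bytes[2] = { 0x03, 0xE8 }; /* 1000 in big-endian order */
    uint16_t v;
    memcpy(&v, bytes, sizeof v); /* reinterpret using host byte order */
    printf("%u\n", v);           /* prints 59395 on a little-endian host */
    return 0;
}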
htonl() and htons() are functions used to convert data from the host's endianness to the network's endianness.
The network uses big-endian, so if your system is x86, it is little-endian.
Host to network byte order (long data) is htonl(), i.e. it converts a 32-bit value to network byte order.
Host to network byte order (short data) is htons(), i.e. it converts a 16-bit value to network byte order.
Here is a sample program showing how htonl() works, as well as the effect of passing a 32-bit value to htons():
#include <stdio.h>
#include <arpa/inet.h>

int main()
{
    long data = 0x12345678;
    printf("\n After htonl():0x%x , 0x%x\n", htonl(data), htons(data));
    return 0;
}
It will print After htonl():0x78563412 , 0x7856 on X86_64.
Reference:
http://en.wikipedia.org/wiki/Endianess
http://msdn.microsoft.com/en-us/library/windows/desktop/ms738557%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms738556%28v=vs.85%29.aspx
#halfelf> I just want to add my findings. I tried the program below with the same value, 500. I guess you have mistakenly labelled the LE output as BE and vice versa. The actual output I got is 0xf4 0x01 0x00 0x00 in little-endian format; my machine is LE.
#include <stdio.h>
#include <netinet/in.h>

/* function to show bytes in memory, from location start to start+n */
void show_mem_rep(char *start, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf(" %.2x-%p", start[i], start + i);
    printf("\n");
}

/* Main function to call the above function for 500 */
int main()
{
    int i = 500;
    int y = htonl(i); /* byte-reversed 500, so the second dump prints
                         as if we were on a BE machine */
    printf("i--%d , y---%d,ntohl(y):%d\n", i, y, ntohl(ntohl(y)));
    printf("_------LITTLE ENDIAN-------\n");
    show_mem_rep((char *)&i, sizeof(i));
    printf("-----BIG ENDIAN-----\n");
    show_mem_rep((char *)&y, sizeof(i));
    getchar();
    return 0;
}
output is
i--500 , y----201261056,ntohl(y):-201261056
_------LITTLE ENDIAN-------
f4-0xbfda8f9c 01-0xbfda8f9d 00-0xbfda8f9e 00-0xbfda8f9f
-----BIG ENDIAN-----
00-0xbfda8f98 00-0xbfda8f99 01-0xbfda8f9a f4-0xbfda8f9b

Calculating CPU frequency in C with RDTSC always returns 0

The following piece of code was given to us by our instructor so we could measure the performance of some algorithms:
#include <stdio.h>
#include <unistd.h>
static unsigned cyc_hi = 0, cyc_lo = 0;

static void access_counter(unsigned *hi, unsigned *lo) {
    asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
        : "=r" (*hi), "=r" (*lo)
        : /* No input */
        : "%edx", "%eax");
}

void start_counter() {
    access_counter(&cyc_hi, &cyc_lo);
}

double get_counter() {
    unsigned ncyc_hi, ncyc_lo, hi, lo, borrow;
    double result;
    access_counter(&ncyc_hi, &ncyc_lo);
    lo = ncyc_lo - cyc_lo;
    borrow = lo > ncyc_lo;
    hi = ncyc_hi - cyc_hi - borrow;
    result = (double) hi * (1 << 30) * 4 + lo;
    return result;
}
However, I need this code to be portable to machines with different CPU frequencies. For that, I'm trying to calculate the CPU frequency of the machine where the code is being run like this:
int main(void)
{
    double c1, c2;
    start_counter();
    c1 = get_counter();
    sleep(1);
    c2 = get_counter();
    printf("CPU Frequency: %.1f MHz\n", (c2 - c1) / 1E6);
    printf("CPU Frequency: %.1f GHz\n", (c2 - c1) / 1E9);
    return 0;
}
The problem is that the result is always 0 and I can't understand why. I'm running Linux (Arch) as guest on VMware.
On a friend's machine (a MacBook) it works to some extent; I mean, the result is bigger than 0, but it's variable because the CPU frequency is not fixed (we tried to fix it, but for some reason we are not able to). He has a different machine running Linux (Ubuntu) as host, and it also reports 0. This rules out the problem being the virtual machine, which is what I thought the issue was at first.
Any ideas why this is happening and how can I fix it?
Okay, since the other answer wasn't helpful, I'll try to explain in more detail. The problem is that a modern CPU can execute instructions out of order. Your code starts out as something like:
rdtsc
push 1
call sleep
rdtsc
Modern CPUs do not necessarily execute instructions in their original order though. Despite your original order, the CPU is (mostly) free to execute that just like:
rdtsc
rdtsc
push 1
call sleep
In this case, it's clear why the difference between the two rdtscs would be (at least very close to) 0. To prevent that, you need to execute an instruction that the CPU will never reorder around. The most common instruction to use for that is CPUID. The other answer I linked should (if memory serves) start roughly from there, covering the steps necessary to use CPUID correctly/effectively for this task; the sketch below shows the basic shape.
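For reference, a common way to apply that advice in GNU C is to issue cpuid immediately before rdtsc, so all earlier instructions must retire first (a sketch for x86/x86-64; note cpuid clobbers eax, ebx, ecx and edx):
#include <stdint.h>

/* Serialize with cpuid, then read the time-stamp counter. */
static uint64_t rdtsc_serialized(void) {
    uint32_t lo, hi;
    __asm__ __volatile__ (
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"              /* barrier: waits for earlier insns */
        "rdtsc"
        : "=a" (lo), "=d" (hi)
        : /* no inputs */
        : "%ebx", "%ecx");       /* cpuid also overwrites ebx and ecx */
    return ((uint64_t)hi << 32) | lo;
}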
Of course, it's possible that Tim Post was right, and you're also seeing problems because of a virtual machine. Nonetheless, as it stands right now, there's no guarantee that your code will work correctly even on real hardware.
Edit: as to why the code would work: well, first of all, the fact that instructions can be executed out of order doesn't guarantee that they will be. Second, it's possible that (at least some implementations of) sleep contain serializing instructions that prevent rdtsc from being rearranged around it, while others don't (or may contain them, but only execute them under specific (but unspecified) circumstances).
What you're left with is behavior that could change with almost any re-compilation, or even just between one run and the next. It could produce extremely accurate results dozens of times in a row, then fail for some (almost) completely unexplainable reason (e.g., something that happened in some other process entirely).
I can't say for certain what exactly is wrong with your code, but you're doing quite a bit of unnecessary work for such a simple instruction. I recommend you simplify your rdtsc code substantially. You don't need to do the 64-bit math carries yourself, and you don't need to store the result of that operation as a double. You don't need to use separate outputs in your inline asm; you can tell GCC to use eax and edx.
Here is a greatly simplified version of this code:
#include <stdint.h>

uint64_t rdtsc() {
    uint64_t ret;
#if __WORDSIZE == 64
    asm ("rdtsc; shl $32, %%rdx; or %%rdx, %%rax;"
         : "=A"(ret)
         : /* no input */
         : "%edx"
    );
#else
    asm ("rdtsc"
         : "=A"(ret)
    );
#endif
    return ret;
}
Also, you should consider printing out the values you're getting from this, so you can see whether you're getting 0s or something else.
As for VMWare, take a look at the time keeping spec (PDF Link), as well as this thread. TSC instructions are (depending on the guest OS):
Passed directly to the real hardware (PV guest)
Count cycles while the VM is executing on the host processor (Windows / etc)
Note the qualifier in #2: only while the VM is executing on the host processor. The same phenomenon would apply to Xen as well, if I recall correctly. In essence, you can expect the code to work as expected on a paravirtualized guest. If emulated, it's entirely unreasonable to expect hardware-like consistency.
You forgot to use volatile in your asm statement, so you're telling the compiler that the asm statement produces the same output every time, like a pure function. (volatile is only implicit for asm statements with no outputs.)
This explains why you're getting exactly zero: the compiler optimized end-start to 0 at compile time, through CSE (common-subexpression elimination).
See my answer on Get CPU cycle count? for the __rdtsc() intrinsic; @Mysticial's answer there has working GNU C inline asm, which I'll quote here:
// prefer using the __rdtsc() intrinsic instead of inline asm at all.
uint64_t rdtsc() {
    unsigned int lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}
This works correctly and efficiently for 32 and 64-bit code.
Hmmm, I'm not positive, but I suspect the problem may be inside this line:
result = (double) hi * (1 << 30) * 4 + lo;
I'm suspicious of whether you can safely carry out such huge multiplications in an "unsigned"; isn't that often a 32-bit number? The fact that you couldn't safely multiply by 2^32 and had to append it as an extra "* 4" after the 2^30 already hints at this possibility. You might need to convert each sub-component, hi and lo, to a double (instead of a single cast at the very end) and do the multiplication using the two doubles.
