Program is Big-Endian even though system is Little-Endian - c

Somehow, my program is treating variables as big endian, even though my system is little endian.
When I execute "lscpu | grep Endian", it returns
Byte Order: Little Endian
But when I run a debug gcc (x86_64 Linux) executable with the following code:
int x = 1235213421;
printf("%x", x);
It returns 0x499FDC6D, while for little endian it should return 0x6DDC9F49

You told printf to print an int, so it prints the value of that int; the value is the same regardless of how the bytes are laid out in memory, which is what makes the code portable. If you wish to view how the data is stored in memory, you have to look at it byte by byte, like this:
int x = 1235213421;
unsigned char* ptr = (unsigned char*)&x;
/* print each byte, from the lowest address to the highest */
for (size_t i = 0; i < sizeof(int); i++)
{
    printf("%.2x", ptr[i]);
}
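On a typical little-endian x86-64 machine this prints 6ddc9f49, i.e. the bytes of 0x499FDC6D from the lowest address to the highest.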

Related

Why does casting a hex int to a char* print it backwards?

I thought I understood how memory works until I ran this code. Is memory backwards, or am I missing something?
Code:
#include <stdio.h>

int main()
{
    int a = 0x12345678;
    char *c = (char *)&a;
    for (int i = 0; i < 4; i++)
    {
        printf("c[%d]=%x \n", i, *(c + i));
    }
    return 0;
}
Output:
c[0]=78
c[1]=56
c[2]=34
c[3]=12
What you have just done is demonstrate which "endian" your computer's architecture is using (i.e., your computer uses "little endian", not "big endian").
If your computer's architecture had instead been "big endian", then your output would instead have been this:
c[0] = 12
c[1] = 34
c[2] = 56
c[3] = 78
You may want to read this for more information: https://en.wikipedia.org/wiki/Endianness
It is because of the endianness of your machine. As the Wikipedia article linked above puts it:
A big-endian system stores the most significant byte of a word at the
smallest memory address and the least significant byte at the largest.
A little-endian system, in contrast, stores the least-significant byte
at the smallest address.
Your machine is little-endian.
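If you want to see the most significant byte first on a little-endian machine, one option is simply to walk the bytes from the highest address down. A minimal sketch, assuming a little-endian host like x86:
#include <stdio.h>

int main()
{
    int a = 0x12345678;
    unsigned char *c = (unsigned char *)&a;
    /* walk from the highest address down to the lowest, so the most
       significant byte is printed first on a little-endian machine */
    for (int i = (int)sizeof(a) - 1; i >= 0; i--)
    {
        printf("c[%d]=%02x \n", i, c[i]);
    }
    return 0;
}
On the machine above this prints c[3]=12, c[2]=34, c[1]=56, c[0]=78.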

How to convert a char[4] of "hexadecimal" numbers to an integer [C/Linux]

So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian, and look like this: "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into a vector of 4 chars, and then to build my integer from there, but I don't know how. Any ideas?
Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):
char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];
The variable i is 17480 instead of 68.
UPDATE: Never mind, I solved it. I mixed up the indexes when I shifted. It should've been value[1] << 8 ... | value[0]
General considerations
There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. If you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need anything more than that.
Determining native endianness
If you need to compute the host machine's native endianness, then this will do it:
static const uint32_t test = 1;
_Bool host_is_little_endian = *(char *)&test;
It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.
Reading the data
I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.
uint32_t data;
int num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
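Since the question uses the raw read() system call rather than stdio, the equivalent would look roughly like this (a sketch, assuming fd is an already-open file descriptor positioned with lseek()):
#include <unistd.h>   /* read() */
#include <stdint.h>

uint32_t data;
ssize_t n = read(fd, &data, sizeof data);
if (n != (ssize_t)sizeof data) { /* ... handle error or short read ... */ }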
Converting the data
It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). If you do need to swap, note that ntohl() and htonl() convert between network byte order (big-endian) and host order, so they are no-ops on a big-endian host and therefore cannot turn little-endian file data into the right host value. An explicit byte swap works on any host:
if (!host_is_little_endian) {
    data = ((data & 0x000000FFu) << 24)
         | ((data & 0x0000FF00u) <<  8)
         | ((data & 0x00FF0000u) >>  8)
         | ((data & 0xFF000000u) >> 24);
}
(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, but you are extremely unlikely ever to see one of them.)
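Since the question is tagged Linux: glibc (and the BSDs) also provide le32toh() in <endian.h>, which converts a little-endian 32-bit value to host order regardless of the host's own byte order, so the whole branch above collapses to one call. A sketch, assuming a libc that ships <endian.h>:
#include <endian.h>   /* le32toh(): glibc/BSD extension, not ISO C */

data = le32toh(data);   /* little-endian on-file bytes -> host-order value */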
Signed integers
If you need a signed instead of unsigned integer, then you can do the same, but use a union (note that unsigned and signed are keywords, so the members need other names):
union {
    uint32_t u;
    int32_t  s;
} data;
In all of the preceding, use data.u in place of plain data, and at the end, read out the signed result from data.s.
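An alternative sketch, if you would rather avoid the union: keep data as a plain uint32_t and memcpy its bytes into an int32_t at the end. This reinterprets the stored bit pattern instead of performing a value conversion that is implementation-defined for out-of-range values:
#include <string.h>   /* memcpy */

int32_t result;
memcpy(&result, &data, sizeof result);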
Suppose you point into your buffer:
unsigned char *p = &buf[20];
and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:
int i;
i = *(int *)p;
You have just told the compiler that p points to an int, dereferenced that pointer, and assigned the result to i.
However, this depends on the endianness of your platform (and, strictly speaking, the cast can run into alignment and strict-aliasing problems; see the sketch after this answer). If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:
unsigned char ibuf[4];
for (i = 3; i >= 0; i--) ibuf[i] = *p++;
i = *(int *)ibuf;
EDIT
The suggestions and comments of Andrew Henle and Bodo could give:
unsigned char *p = &buf[20];
int i, j;
unsigned char *pi = (unsigned char *)&i;
for (j = 3; j >= 0; j--) *pi++ = *p++;
// and the other endian:
int i, j;
unsigned char *pi = (unsigned char *)&i + 3;
for (j = 3; j >= 0; j--) *pi-- = *p++;
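A sketch of an alternative that sidesteps the alignment and strict-aliasing concerns of the *(int *) casts above: assemble the value with shifts from the individual bytes (this assumes the buffer holds a 32-bit little-endian value):
#include <stdint.h>

/* Build a host-order value from 4 little-endian bytes, independent of
   the host's own byte order. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
Called as load_le32(&buf[20]), it returns the same value on little- and big-endian hosts alike.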

memcpy long long int (casting to char*) into char array

I was trying to split a long long into 8 characters, where the first 8 bits represent the first character, the next 8 bits the second, etc.
I used two methods. First, I shifted and cast the type, and it went well.
But it failed when I used memcpy: the result was reversed (the first 8 bits became the last character). Shouldn't the memory be consecutive and in the same order? Or am I messing something up?
void num_to_str(){
    char str[100005] = {0};
    unsigned long long int ans = 0;
    scanf("%llu", &ans);
    for (int j = 0; j < 8; j++){
        /* take the bytes from the most significant down to the least */
        str[j] = (unsigned char)(ans >> (56 - 8 * j));
    }
    printf("%s\n", str);
    return;
}
This works great:
input : 8102661169684245760
output : program
However, the following doesn't act as I expected.
void num_to_str(){
    char str[100005] = {0};
    unsigned long long int ans = 0;
    scanf("%llu", &ans);
    memcpy(str, (char *)&ans, 8);
    for (int i = 0; i < 8; i++)
        printf("%c", str[i]);
    return;
}
This behaves unexpectedly:
input : 8102661169684245760
output : margorp
PS: I couldn't even use printf("%s", str) or puts(str).
I assume that the first character was stored as '\0'.
I am a beginner, so I'll be grateful if someone can help me out.
The order of bytes within a binary representation of a number within an encoding scheme is called endianness.
In a big-endian system bytes are ordered from the most significant byte to the least significant.
In a little-endian system bytes are ordered from the least significant byte to the most significant one.
There are other byte orders, but they are considered esoteric nowadays, so you won't find them in practice.
If you run your program on a little-endian system (e.g. x86), you get exactly the results you observed.
You can read more:
https://en.wikipedia.org/wiki/Endianness
You may wonder why anyone would design and use a little-endian system, where bytes are stored in the reverse of the order we humans are used to (we write the most significant digit first). But there are advantages. You can read about some of them here: The reason behind endianness?
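If you specifically want the memcpy approach to produce the most significant byte first on a little-endian machine, you can convert the value to big-endian before copying. A minimal sketch; htobe64() comes from <endian.h> and is a glibc/BSD extension, not ISO C:
#include <endian.h>   /* htobe64(): glibc/BSD extension, not ISO C */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint64_t ans = 8102661169684245760ULL;
    char str[9] = {0};
    uint64_t be = htobe64(ans);   /* force big-endian byte order */
    memcpy(str, &be, 8);
    printf("%s\n", str);          /* prints "program" for this input */
    return 0;
}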

Casting uint8_t array into uint16_t value in C

I'm trying to convert a 2-byte array into a single 16-bit value. For some reason, when I cast the array as a 16-bit pointer and then dereference it, the byte ordering of the value gets swapped.
For example,
#include <stdint.h>
#include <stdio.h>

int main()
{
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;
    printf("%x\n", (unsigned int)b);
    return 0;
}
prints aa15 instead of 15aa (which is what I would expect).
What's the reason behind this, and is there an easy fix?
I'm aware that I can do something like uint16_t b = a[0] << 8 | a[1]; (which does work just fine), but I feel like this problem should be easily solvable with casting and I'm not sure what's causing the issue here.
As mentioned in the comments, this is due to endianness.
Your machine is little-endian, which (among other things) means that multi-byte integer values have the least significant byte first.
If you compiled and ran this code on a big-endian machine (e.g. a Sun), you would get the result you expect.
Since your array is set up as big-endian, which also happens to be network byte order, you could get around this by using ntohs and htons. These functions convert a 16-bit value from network byte order (big endian) to the host's byte order and vice versa:
uint16_t b = ntohs(*(uint16_t*)a);
There are similar functions called ntohl and htonl that work on 32-bit values.
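(Note that on POSIX systems ntohs() and htons() are declared in <arpa/inet.h>, so that header is needed for the line above.)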
This is because of the endianness of your machine.
In order to make your code independent of the machine, consider the following function:
#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian() {
    int i = 1;
    char *p = (char *)&i;
    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
So for each case you can choose which operation to apply.
You cannot do anything like *(uint16_t*)a because of the strict aliasing rule. Even if code appears to work for now, it may break later in a different compiler version.
A correct version of the code could be:
b = ((uint16_t)a[0] << CHAR_BIT) + a[1];
(CHAR_BIT, the number of bits in a byte, comes from <limits.h>.)
The version suggested in your question involving a[0] << 8 is incorrect because on a system with 16-bit int, this may cause signed integer overflow: a[0] promotes to int, and << 8 means * 256.
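Another well-defined option is memcpy, which avoids the aliasing cast entirely; note that it gives you the value in the host's byte order (0xaa15 here on a little-endian machine), so you still need the shift-based assembly above if you want the big-endian interpretation. A minimal sketch:
#include <string.h>   /* memcpy */

uint16_t b;
memcpy(&b, a, sizeof b);   /* well-defined; yields the value in host byte order */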
This might help to visualize things. When you create the array you have two bytes in order. When you print it you get the human readable hex value which is the opposite of the little endian way it was stored. The value 1 in little endian as a uint16_t type is stored as follows where a0 is a lower address than a1...
a0 a1
|10000000|00000000
Note that the least significant byte is stored first, but when we print the value in hex, the least significant byte appears on the right, which is what we normally expect on any machine.
This program prints a little-endian and a big-endian 1 in binary, starting from the least significant bit...
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>
/* print the value bit by bit, least significant bit first;
   '|' marks byte boundaries */
void print_bin(uint64_t num, size_t bytes) {
    int i = 0;
    for (i = bytes * 8; i > 0; i--) {
        if (i % 8 == 0) printf("|");
        (num & 1) ? printf("1") : printf("0");
        num >>= 1;
    }
    printf("\n");
}

int main(void) {
    uint8_t a[2] = {0x15, 0xaa};
    uint16_t b = *(uint16_t*)a;
    uint16_t le = 1;
    uint16_t be = htons(le);
    printf("Little Endian 1\n");
    print_bin(le, 2);
    printf("Big Endian 1 on little endian machine\n");
    print_bin(be, 2);
    printf("0xaa15 as little endian\n");
    print_bin(b, 2);
    return 0;
}
This is the output (least significant bit first):
Little Endian 1
|10000000|00000000
Big Endian 1 on little endian machine
|00000000|10000000
0xaa15 as little endian
|10101000|01010101

Why does an integer type need to be little-endian?

I am curious about little-endian,
and I know that most computers use the little-endian method.
So I practiced with a program; the source is below.
int main(){
    int flag = 31337;
    char c[10] = "abcde";
    int flag2 = 31337;
    return 0;
}
When I looked at the stack via gdb,
I noticed that there were 0x00007a69 0x00007a69 .... ... ... .. .... ...
0x62610000 0x00656463 .. ...
So, I have two questions.
For one thing,
how can the value of char c[10] be below flag?
I expected the value of flag2 at the top of the stack, the value of char c[10] below flag2, and the value of flag below c[10],
like this:
7a69
"abcde"
7a69
Second,
I expected the values to be stored in little-endian order.
Indeed, the value of "abcde" was stored as '6564636261'.
However, the value of 31337 didn't appear to be stored in little-endian order.
It was just '7a69'.
I thought it should be '697a'.
Why doesn't the integer conform to little-endian?
There is some confusion in your understanding of endianness, stack and compilers.
First, the locations of variables in the stack may not have anything to do with the code as written. The compiler is free to move them around however it wants, unless they are part of a struct, for example. Compilers usually try to use memory as efficiently as possible, so reordering is needed. For example, having char, int, char, int would require 16 bytes (on a 32-bit machine), whereas int, int, char, char would require only 12 bytes.
Second, there is no "endianness" in char arrays. They are just that: arrays of values. If you put "abcde" there, the values have to be in that order. If you used, for example, UTF-16, then endianness would come into play, since each code unit (not necessarily one character) would require two bytes (on a normal 8-bit-byte machine), and those bytes would be stored in an order that depends on endianness.
Decimal value 31337 is 0x007a69 in 32bit hexadecimal. If you ask a debugger to show it, it will show it as such whatever the endianness. The only way to see how it is in memory is to dump it as bytes. Then it would be 0x69 0x7a 0x00 0x00 in little endian.
Also, even though little-endian is very popular, that is mainly because x86 hardware is popular. Many processors (SPARC, PowerPC, MIPS among others) have used big-endian order, and some (like older ARM processors) could run in either mode, depending on the requirements.
There is also a term "network byte order", which actually is big endian. This relates to times before little endian machines became most popular.
Integer byte order is an arbitrary processor design decision. Why for example do you appear to be uncomfortable with little-endian? What makes big-endian a better choice?
Well probably because you are a human used to reading numbers from left-to-right; but the machine hardly cares.
There is in fact a reasonable argument that it is intuitive for the least-significant-byte to be placed in the lowest order address; but again, only from a human intuition point-of-view.
GDB shows you 0x62610000 0x00656463 because it is interpreting data (...abcde...) as 32bit words on a little endian system.
It could be either way, but the reasonable default is to use native endianness.
Data in memory is just a sequence of bytes. If you tell it to show it as a sequence (array) of short ints, it changes what it displays. Many debuggers have advanced memory view features to show memory content in various interpretations, including string, int (hex), int (decimal), float, and many more.
You got a few excellent answers already.
Here is a little code to help you understand how variables are laid out in memory, either using little-endian or big-endian:
#include <stdio.h>
void show_var(char* varname, unsigned char *ptr, size_t size) {
    int i;
    printf("%s:\n", varname);
    for (i = 0; i < size; i++) {
        printf("pos %d = %2.2x\n", i, *ptr++);
    }
    printf("--------\n");
}

int main() {
    int flag = 31337;
    char c[10] = "abcde";
    show_var("flag", (unsigned char*)&flag, sizeof(flag));
    show_var("c", (unsigned char*)c, sizeof(c));
}
On my Intel i5 Linux machine it produces:
flag:
pos 0 = 69
pos 1 = 7a
pos 2 = 00
pos 3 = 00
--------
c:
pos 0 = 61
pos 1 = 62
pos 2 = 63
pos 3 = 64
pos 4 = 65
pos 5 = 00
pos 6 = 00
pos 7 = 00
pos 8 = 00
pos 9 = 00
--------
