The number 4 represented as a 32-bit unsigned integer would be
on a big endian machine:
00000000 00000000 00000000 00000100 (most significant byte first)
on a little endian machine:
00000100 00000000 00000000 00000000 (most significant byte last)
As an 8-bit unsigned integer it is represented as
00000100 on both machines.
Now, when casting an 8-bit uint to a 32-bit one, I always thought that on a big endian machine this means sticking 24 zeros in front of the existing byte, and appending 24 zeros after it if the machine is little endian. However, someone pointed out that in both cases zeros are prepended rather than appended. But wouldn't that mean that on a little endian machine 00000100 becomes the most significant byte, which would result in a very large number? Please explain where I am wrong.
Zeroes are prepended if you consider the mathematical value (which just happens to also be the big-endian representation).
Casts in C always strive to preserve the value, not the representation. That is how, for example, (int)1.25 results in 1 (see the note below), as opposed to something that makes much less sense.
As discussed in the comments, the same holds for bit-shifts (and other bitwise operations, for that matter). 50 >> 1 == 25, regardless of endianness.
(*Note: usually; it depends on the rounding mode for float-to-integer conversion.)
In short: operators in C operate on the mathematical value, regardless of representation. One exception is when you reinterpret the data through a pointer (as in (char*)&foo), since then you essentially get a different "view" of the same data.
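A hedged sketch of that distinction, assuming a 4-byte unsigned int and 8-bit bytes:
#include <stdio.h>

int main(void)
{
    unsigned char small = 4;
    unsigned int wide = (unsigned int)small;      /* value-preserving cast: wide == 4 on any machine */

    unsigned int foo = 4;
    unsigned char view = *(unsigned char *)&foo;  /* a "view" of the byte at the lowest address:     */
                                                  /* 4 on little endian, 0 on big endian             */
    printf("%u %u\n", wide, (unsigned)view);
    return 0;
}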
Not sure if it answers your question, but will give it a try:
If you take a char variable and cast it to an int variable, then you get the exact same result on both architectures:
char c = 0x12;
int i = (int)c; // i == 0x12 on both architectures
If you take an int variable and cast it to a char variable, then you get the exact same result (possibly truncated) on both architectures:
int i = 0x12345678;
char c = (char)i; // c == 0x78 on both architectures
But if you take an int variable and read it using a char* pointer, then you get a different result on each architecture:
int i = 0x12345678;
char c = *(char*)&i; // c == 0x12 on BE architecture and 0x78 on LE architecture
The example above assumes that sizeof(int) == 4 (may be different on some compilers).
Loosely speaking, "endianness" is the property of how a processor sees data stored in memory. All processors, once a particular piece of data has been brought into the CPU, see it the same way.
For example:
int a = 0x01020304;
Irrespective of whether it is a little or big endian machine, a would always have 04 as the least significant byte and 01 as the most significant byte when held in a register.
The problem arises when this variable/data has to be stored in memory, which is "byte addressable". Should 01 (the most significant byte) go into the lowest memory address (big endian) or into the highest memory address (little endian)?
In your particular example, what you have shown is the representation as the processor sees it, in terms of least/most significant bytes.
So technically speaking, both little and big endian machines would have:
00000000 00000000 00000000 00000100
in its 32-bit-wide register, assuming of course that what you have in memory is a 32-bit-wide integer representing 4. How this 4 is stored in/retrieved from memory is what endianness is all about.
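A small sketch of how you could observe that storage yourself, assuming 8-bit bytes and a 4-byte int:
#include <stdio.h>

int main(void)
{
    unsigned int a = 4;
    unsigned char *bytes = (unsigned char *)&a;
    for (unsigned i = 0; i < sizeof a; i++)
        printf("byte %u: %02x\n", i, bytes[i]);
    /* A little endian machine prints 04 00 00 00,
       a big endian machine prints 00 00 00 04. */
    return 0;
}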
Take a look at the following example:
int a = 130;
char *ptr;
ptr = (char *) &a;
printf("%d", *ptr);
I expected to get the value 0 printed on the screen, but to my surprise it is -126. I came to the conclusion that since char is 8 bits, the int might be getting rounded.
Until now I thought that memory is filled in such a way that the MSB is on the left, but now everything seems mixed up. How exactly is the value laid out in memory?
In your case a is (probably) a 4-byte little-endian value, and 130 is 10000010 in binary.
int a = 130; // 10000010 00000000 00000000 00000000 see little endianness here
and you're pointing to the first byte with a char*:
char* ptr = (char*)&a; // 10000010
and trying to print it with the %d format, which will print the signed integer value of 10000010, which is -126 (see: two's complement).
Your output is a hint that your system is little endian (Least Significant Byte has lowest memory address).
In hexadecimal (exactly 2 digits per byte), 130 is written 0x82. Assuming 4 bytes for an int, on a little-endian system the integer will be stored as the bytes 0x82, 0, 0, 0. So *ptr will be (char)0x82.
But you use printf to display its value. As all parameters past the first have no declared type, the char value is promoted to an int. Assuming a two's-complement representation (by far the most common one), you will get either 130 if char is unsigned, or -126 if it is signed.
TL;DR: the output is normal on a little-endian system with a two's-complement integer representation where the char type is signed.
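A short sketch of that reasoning, assuming a little-endian machine with a 4-byte int, 8-bit bytes and two's-complement chars:
#include <stdio.h>

int main(void)
{
    int a = 130;                           /* the lowest-addressed byte holds 0x82 on little endian */
    signed char s = *(signed char *)&a;
    unsigned char u = *(unsigned char *)&a;
    printf("%d %d\n", s, u);               /* -126 130 on such a system */
    return 0;
}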
I found this program during an online test on C programming. I tried to work it out on my own, but I cannot figure out why the output of this program comes out to be 64.
Can anyone explain the concept behind this?
#include <iostream>
#include <stdio.h>
using namespace std;
int main()
{
    int a = 320;
    char *ptr;
    ptr = (char *)&a;
    printf("%d", *ptr);
    return 0;
}
output:
64
Thank you.
A char * points to one byte only. Assuming a byte on your system is 8 bits, the number 320 occupies 2 bytes. The lower byte of those is 64, the upper byte is 1, because 320 = 256 * 1 + 64. That is why you get 64 on your computer (a little-endian computer).
But note that on other platforms, so-called big-endian platforms, the result could just as well be 1 (the most significant byte of a 16-bit/2-byte value) or 0 (the most significant byte of a value larger than 16 bits/2 bytes).
Note that all this assumes that the platform has 8-bit bytes. If it had, say 10-bit bytes, you would get a different result again. Fortunately, most computers have 8-bit bytes nowadays.
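A small sketch that prints the two lowest-addressed bytes, assuming 8-bit bytes and a 4-byte int:
#include <stdio.h>

int main(void)
{
    int a = 320;                          /* 256 * 1 + 64 */
    unsigned char *p = (unsigned char *)&a;
    printf("%d %d\n", p[0], p[1]);        /* 64 1 on a little-endian machine; on a big-endian
                                             machine with a 4-byte int both would be 0, since
                                             the 1 and the 64 end up in the last two bytes */
    return 0;
}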
You won't be able to understand this unless you know about:
hex/binary representation, and
CPU endianness.
Type out the decimal number 320 in hex. Split it up into bytes. Assuming int is 4 bytes, you should be able to tell which parts of the number go into which bytes.
After that, consider the endianness of the given CPU and sort the bytes in that order. (MS byte first or LS byte first.)
The code accesses the byte allocated at the lowest address of the integer. What it contains depends on the CPU endianness. You'll either get hex 0x40 or hex 0x00.
Note: You shouldn't use char for this kind of thing, because it has implementation-defined signedness. If the data bytes contain values larger than 0x7F, you might get some very weird bugs that inconsistently appear/disappear across multiple compilers. Always use uint8_t* when doing any form of bit/byte manipulation.
You can expose this bug by replacing 320 with 384. Your little-endian system may then print either -128 or 128; you will get different results on different compilers.
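A hedged sketch of that 384 experiment, assuming a little-endian system with 8-bit bytes:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int a = 384;                     /* 0x0180: the low byte is 0x80 */
    char    *cp = (char *)&a;
    uint8_t *up = (uint8_t *)&a;
    printf("%d\n", *cp);             /* -128 or 128, depending on whether char is signed */
    printf("%d\n", *up);             /* always 128 */
    return 0;
}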
What @Lundin said is enough.
BTW, maybe some basic knowledge is helpful: 320 = 0x0140, and an int here is 4 chars. So when the first byte is printed, it outputs 0x40 = 64 because of CPU endianness.
ptr is a char pointer to a, so *ptr gives the char value of a. A char occupies only 1 byte, so values wrap around after 255: 256 becomes 0, 257 becomes 1, and so on. Thus 320 becomes 64.
int is a four-byte data type while char is a one-byte data type, and a char pointer can address one byte at a time. The binary value of 320 is 00000000 00000000 00000001 01000000. So the char pointer ptr points to only the first byte.
*ptr, i.e. the content of that first byte (the least significant one on a little-endian machine), is 01000000, and its decimal value is 64.
Can we say that our 'traditional' way of writing in binary is Big Endian?
e.g., number 1 in binary:
0b00000001 // Let's assume it's possible to write numbers like that in code and b means binary
Also, when I write a constant 0b00000001 in my code, it will always refer to the integer 1 regardless of whether the machine is big endian or little endian, right?
In this notation the LSB is always written as the rightmost element, and the MSB is always written as the leftmost element, right?
Yes, humans generally write numerals in big-endian order (meaning that the digits written first have the most significant value), and common programming languages that accept numerals interpret them in the same way.
Thus, the numeral “00000001” means one; it never means one hundred million (in decimal) or 128 (in binary) or the corresponding values in other bases.
Much of C semantics is written in terms of the value of a number. Once a numeral is converted to a value, the C standard describes how that value is added, multiplied, and even represented as bits (with some latitude regarding signed values). Generally, the standard does not specify how those bits are stored in memory, which is where endianness in machine representations comes into play. When the bits representing a value are grouped into bytes and those bytes are stored in memory, we may see those bytes written in different orders on different machines.
However, the C standard specifies a common way of interpreting numerals in source code, and that interpretation is always big-endian in the sense that the most significant digits appear first.
If you want to put it that way, then yes, we humans write numerals in Big-Endian order.
But I think you have a misunderstanding about whether your target runs with big or little endian.
In your actual C code, it does not matter which endianness your target machine uses. For example, these lines will always display the same, no matter the endianness of your system:
uint32_t x = 0x0102;
printf("Output: %x\n",x); // Output: 102
or to take your example:
uint32_t y = 0b0001;
printf("Output: %d\n",y); // Output: 1
However the storage of the data in your memory differs between Little and Big Endian.
Big Endian:
Actual Value: 0x01020304
Memory Address: 0x00 0x01 0x02 0x03
Value: 0x01 0x02 0x03 0x04
Little Endian:
Actual Value: 0x01020304
Memory Address: 0x00 0x01 0x02 0x03
Value: 0x04 0x03 0x02 0x01
Both times the actual value is 0x01020304 (and this is what you assign in your C code).
You only have to worry about it if you do memory operations. If you have a 4-byte (uint8_t) array which represents a 32-bit integer and you want to copy it into a uint32_t variable, then you need to care:
uint8_t arr[4] = {0x01, 0x02, 0x03, 0x04};
uint32_t var;
memcpy(&var, arr, 4);
printf("Output: 0x%08x\n", var);
// Big Endian: Output: 0x01020304
// Little Endian: Output: 0x04030201
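If the array has a known byte order (say it came from a file or a network stream in big-endian order), one way to sidestep host endianness entirely is to assemble the value with shifts, since shifts work on values rather than on the stored representation. A minimal sketch:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t arr[4] = {0x01, 0x02, 0x03, 0x04};   /* bytes in a known (big-endian) order */
    uint32_t var = ((uint32_t)arr[0] << 24) |
                   ((uint32_t)arr[1] << 16) |
                   ((uint32_t)arr[2] << 8)  |
                    (uint32_t)arr[3];
    printf("Output: 0x%08x\n", var);             /* 0x01020304 on any host byte order */
    return 0;
}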
I have the following code that takes pixel values from a file. I am on an Intel MacBook running OS X, which I believe is little-endian. I am using this code to determine whether the least significant bit is set on the pixels. It compiles and runs, but I am not sure whether my operations are really giving me the correct data.
typedef struct {
    unsigned char blue;
    unsigned char green;
    unsigned char red;
} pixel_t;
pixel_t *pixels = malloc(((bmp->dib.bmp_bytesz/3)+1) * sizeof(*pixels));
printf("%u", (pixels[i].red & 0x01));
printf("%u", (pixels[i].green & 0x01));
printf("%u", (pixels[i].blue & 0x01));
Little-endian and big-endian refer to the order of bytes (not bits, per se) in larger units (like short or int).
The bitwise operations are the same; the operations are giving you the least significant bit of the numbers in pixels[i].blue etc. If they are stored in char (or unsigned char or signed char), then there is no issue. If they are stored in int or short or something, then the byte that is being addressed will be different depending on whether the machine is big-endian or little-endian, but it is still the least significant bit of the number on the platform.
Endianness is an internal detail affecting how values are stored. It has no effect on how values are computed.
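A tiny sketch of that point; the value and the 0x01 mask here are just examples:
#include <stdio.h>

int main(void)
{
    unsigned int pixel = 0x12345678;
    /* The & operates on the value, so the result is identical on
       little-endian and big-endian machines. */
    printf("%u\n", pixel & 0x01u);   /* prints 0, because 0x78 is even */
    return 0;
}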
Jonathan has the right answer already...just adding an example.
Endianness describes how multi-byte data is stored in computer memory. It describes where the most significant byte (MSB) and the least significant byte (LSB) of a value are placed in memory.
Big Endian: Stores MSB first i.e. left to right
Little Endian: Stores LSB first i.e. right to left.
Example: How is 0x04030201 stored in memory?
Address BE LE
00000000 04 01
00000001 03 02
00000002 02 03
00000003 01 04
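A hedged sketch that reproduces that table on whatever machine it runs on, assuming a 4-byte unsigned int:
#include <stdio.h>

int main(void)
{
    unsigned int value = 0x04030201;
    unsigned char *p = (unsigned char *)&value;
    for (unsigned i = 0; i < sizeof value; i++)
        printf("%p: %02x\n", (void *)(p + i), p[i]);
    /* From the lowest address upward: 01 02 03 04 on little endian,
       04 03 02 01 on big endian. */
    return 0;
}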
#include <stdio.h>
union Endian
{
    int i;
    char c[sizeof(int)];
};

int main(int argc, char *argv[])
{
    union Endian e;
    e.i = 1;
    printf("%d \n", &e.i);
    printf("%d,%d,\n", e.c[0], &(e.c[0]));
    printf("%d,%d", e.c[sizeof(int)-1], &(e.c[sizeof(int)-1]));
}
OUTPUT:
1567599464
1,1567599464,
0,1567599467
LSB is stored in the lower address and MSB is stored in the higher address. Isn't this supposed to be big endian? But my system config shows it as a little endian architecture.
Your system is definitely little-endian. Had it been big-endian, the following code:
printf("%d,%d,\n",e.c[0],&(e.c[0]));
would print 0 for the first %d instead of 1. In little-endian 1 is stored as
00000001 00000000 00000000 00000000
^ LSB
^ Lower address
but in big-endian it is stored as
00000000 00000000 00000000 00000001
                           ^ LSB
                           ^ Higher address
And don't use %d to print addresses of variables; use %p.
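For reference, a cleaned-up sketch of the same program that prints the addresses with %p:
#include <stdio.h>

union Endian
{
    int i;
    char c[sizeof(int)];
};

int main(void)
{
    union Endian e;
    e.i = 1;
    printf("%d at %p\n", e.i, (void *)&e.i);
    printf("%d at %p\n", e.c[0], (void *)&e.c[0]);                              /* 1 on little endian */
    printf("%d at %p\n", e.c[sizeof(int) - 1], (void *)&e.c[sizeof(int) - 1]);  /* 0 on little endian */
    return 0;
}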
For little endian, the least significant bits are stored in the first byte (with the lowest address).
That's what you're seeing, so it seems there is sanity ;)
00000001   (Hexadecimal: 32 bits)
^^    ^^
MS    LS
Byte  Byte
Least Significant Byte at lowest address => little-endian. The integer is placed into memory, starting from its little-end. Hence the name.
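A common run-time check built on that idea, as a hedged sketch (assumes 8-bit bytes):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 1;
    /* If the byte at the lowest address holds the 1, the least
       significant byte comes first: little endian. */
    puts(*(uint8_t *)&x == 1 ? "little endian" : "big endian");
    return 0;
}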
You have the byte containing "1" (least significant) as the first element (e.c[0]) and the byte containing "0" as the second one (e.c[1]). This is little endian, isn't it?
You are wrong about what big endian and little endian are. Read this.
Looks good to me. "little endian" (aka "the right way" :-) means "lower-order bytes stored first", and that's exactly what your code shows. (BTW, you should use "%p" to print the addresses).