I am a bit confused on how you would approach this problem:
Consider decimal number 1027. This value is stored as a 16-bit two's complement number into addresses 124 and 125 on a little endian machine which has an addressable cell size of one byte. What values (in hexadecimal) are in each of these addresses:
124:
125:
I know that a little endian machine stores the bytes from the least significant byte to the most significant byte as the address increases. But beyond that, I am unsure how to apply that concept and how to place the bytes into the addresses.
Here's some simple Python code to convert that integer to little-endian hexadecimal representation:
# convert the integer (1027) to hex using 2 bytes and little-endian byteorder
(1027).to_bytes(length=2, byteorder='little').hex()
This gives 0304 (1027 decimal is 0x0403, so the least significant byte is 03). So, the first byte (03) goes into address 124, and the second one (04) occupies the next address, 125.
"Little endian" and "big endian" relate to how the machine multiplexes bytes from memory into registers of the CPU.
With each byte it gets, it increments the address counter, but does it place these bytes from left-to-right or right-to-left into the register?
So the value that gets loaded into a machine register (or an integer) can be stored in reverse order in memory. Even with modern CPUs with wide data buses the concept remains, and in some CPUs the bytes get swapped inside the CPU.
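As a concrete sketch (assuming a little-endian host with 8-bit bytes, which matches the machine in the question), you can watch where the two bytes of the 16-bit value 1027 land in memory:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint16_t value = 1027;                 /* 0x0403 */
    unsigned char bytes[2];
    memcpy(bytes, &value, sizeof value);   /* take a byte-wise view of the object */

    /* On a little-endian host this prints 03 then 04:
       the least significant byte sits at the lower address (e.g. 124). */
    printf("lower address:  %02x\n", bytes[0]);
    printf("higher address: %02x\n", bytes[1]);
    return 0;
}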
I am new to the stack and trying to master it, and this statement about the Stack Pointer has been bugging me for hours:
"On the ARM Cortex-M processor, the stack always operates on 32-bit data. All stack accesses are word aligned, which means the least significant two bits of SP(Stack Pointer) must always be 0."
I know the stack is just a part of RAM that contains 32-bit data, so all operations such as PUSH/POP would need to be 32-bit operations. Two questions about that statement concern me:
All stack accesses are "word" aligned. Should it be DWORD instead of WORD to make up for the 32 bits? If not, why Word or 16 bits rather than 32 bits for all stack accesses?
Why does it mean the least significant two bits of SP must always be zero?
Any thoughts?
1) All stack accesses are "word" aligned. Should it be DWORD instead of WORD to make up for the 32 bits? If not, why Word or 16 bits rather than 32 bits for all stack accesses?
The size of a word depends on the CPU architecture. On a 32-bit Cortex-M, a word is 32 bits, or 4 bytes. (The 16-bit WORD / 32-bit DWORD terminology comes from x86; in ARM terminology, 16 bits is a halfword and 32 bits is a word.)
2) Why does it mean the least significant two bits of SP must always be zero?
This is a different way of saying that the stack should always be aligned on a 4-byte boundary, or that the stack pointer should always contain an address which is a multiple of four bytes.
For example, binary address 0000 is 0 decimal. The next three addresses, 0001 (1 decimal), 0010 (2 decimal), and 0011 (3 decimal), have their least significant bits set to 01, 10, and 11. 0100 (4 decimal) is the first address after 0000 with its two least significant bits set to 0: this is the same thing as saying it is a multiple of 4 bytes, or, if you prefer, a multiple of the CPU word size in bytes.
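A short sketch in plain C (nothing Cortex-M specific; the addresses are made up for illustration) showing that "the two least significant bits are zero" and "a multiple of 4" are the same test:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t candidates[] = { 0x20001000, 0x20001001, 0x20001002, 0x20001004 };

    for (int i = 0; i < 4; i++) {
        uint32_t sp = candidates[i];
        /* word aligned <=> the two least significant bits are zero
                        <=> the address is a multiple of 4 */
        int aligned = (sp & 0x3u) == 0;
        printf("0x%08x %s word aligned\n", (unsigned)sp, aligned ? "is" : "is NOT");
    }
    return 0;
}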
Assume I have this generic function that swaps two variables:
void swap(void *v1, void *v2, int size){
    char buffer[size];            /* VLA scratch space (C99) */
    memcpy(buffer, v1, size);     /* save the bytes of *v1 */
    memcpy(v1, v2, size);         /* copy *v2 over *v1 */
    memcpy(v2, buffer, size);     /* copy the saved bytes into *v2 */
}
It works fine, but I was wondering in what cases this might break. One case that comes to mind is when we have two different data types and the size specified is not enough to capture the bigger data. for example:
int x = 4444;
short y = 5;
swap(&x, &y, sizeof(short));
I'd expect that when I run this it would give an incorrect result, because memcpy would work with only 2 bytes (rather than 4) and part of the data would be lost or changed when dealing with x.
Surprisingly though, when I run it, it gives the correct answer on both my Windows 7 and Ubuntu systems. I thought that Ubuntu and Windows might differ in endianness, but apparently that doesn't affect either of the two systems.
I want to know why the generic function works fine in this case.
To understand this fully you have to understand the C standard and the specifics of your machine and compiler. Starting with the C standard, here are some relevant snippets [the standard I'm using is WG14/N1256], summarized a little:
The object representation for a signed integer consists of value bits, padding bits, and a sign bit. [section 6.2.6.2.2].
These bits are stored in a contiguous sequence of bytes. [section 6.2.6.1].
If there are N value bits, they represent powers of two from 2^0 to 2^{N-1}. [section 6.2.6.2].
The sign bit can have one of three meanings, one of which is that it has value -2^N (two's complement). [section 6.2.6.2.2].
When you copy bytes from a short to an int, you're copying the value bits, padding bits and the sign bit of the short to bits of the int, but not necessarily preserving the meaning of the bits. Somewhat surprisingly, the standard allows this except it doesn't guarantee that the int you get will be valid if your target implementation has so-called "trap representations" and you're unlucky enough to generate one.
In practice, you've found on your machine and your compiler:
A short is represented by 2 bytes of 8 bits each.
The sign bit is bit 7 of the second byte.
The value bits in ascending order of value are bits 0-7 of byte 0, and bits 0-6 of byte 1.
There are no padding bits.
An int is represented by 4 bytes of 8 bits each.
The sign bit is bit 7 of the fourth byte.
The value bits in ascending order of value are bits 0-7 of byte 0, 0-7 of byte 1, 0-7 of byte 2, and 0-6 of byte 3.
There are no padding bits.
You would also find out that both representations use two's complement.
In pictures (where SS is the sign bit, and the numbers N correspond to a bit that has value 2^N):
short:
07-06-05-04-03-02-01-00 | SS-14-13-12-11-10-09-08
int:
07-06-05-04-03-02-01-00 | 15-14-13-12-11-10-09-08 | 23-22-21-20-19-18-17-16 | SS-30-29-28-27-26-25-24
You can see from this that if you copy the bytes of a short to the first two bytes of a zero int, you'll get the same value if the sign bit is zero (that is, the number is positive) because the value bits correspond exactly. As a corollary, you can also predict you'll get a different value if you start with a negative-valued short since the sign bit of the short has value -2^15 but the corresponding bit in the int has value 2^15.
The representation you've found on your machine is often summarized as "two's complement, little-endian", but the C standard provides a lot more flexibility in representations than that description suggests (even allowing a byte to have more than 8 bits), which is why portable code usually avoids relying on bit/byte representations of integral types.
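A minimal test along these lines (assuming the little-endian, two's complement layout described above, with a 2-byte short and a 4-byte int) shows the positive case working and the negative case breaking:
#include <stdio.h>
#include <string.h>

int main(void) {
    short s_pos = 5, s_neg = -5;
    int i_pos = 0, i_neg = 0;

    /* Copy only sizeof(short) bytes into zero-initialized ints. */
    memcpy(&i_pos, &s_pos, sizeof(short));
    memcpy(&i_neg, &s_neg, sizeof(short));

    /* On a little-endian, two's complement machine:
       i_pos == 5, because the value bits line up exactly;
       i_neg == 65531 (0x0000FFFB), because the short's sign bit
       landed on an ordinary value bit of the int. */
    printf("positive: %d\n", i_pos);
    printf("negative: %d\n", i_neg);
    return 0;
}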
As has already been pointed out in the comments, the systems you are using are typically little-endian (least significant byte at the lowest address). Given that, the memcpy copies the short into the lowest, least significant part of the int.
You might enjoy looking at Bit Twiddling Hacks for 'generic' ways to do swap operations.
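For instance, one of the better-known tricks from that page is the XOR swap; here is a sketch for ints (note this is not the generic swap above, and the two operands must be distinct objects):
#include <stdio.h>

/* XOR swap: no temporary buffer, but only for integer types,
   and a and b must not refer to the same object. */
static void xor_swap(int *a, int *b) {
    if (a != b) {
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
    }
}

int main(void) {
    int x = 4444, y = 5;
    xor_swap(&x, &y);
    printf("x = %d, y = %d\n", x, y);   /* x = 5, y = 4444 */
    return 0;
}
In practice a plain temporary (or the memcpy-based swap above) is usually clearer and at least as fast; the XOR version is mostly a curiosity.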
Number 4 represented as a 32-bit unsigned integer would be
on a big endian machine:
00000000 00000000 00000000 00000100 (most significant byte first)
on a little endian machine:
00000100 00000000 00000000 00000000 (most significant byte last)
As an 8-bit unsigned integer it is represented as
00000100 on both machines.
Now, when casting an 8-bit uint to a 32-bit one, I always thought that on a big endian machine it means sticking 24 zeros in front of the existing byte, and appending 24 zeros to the end if the machine is little endian. However, someone pointed out that in both cases zeros are prepended rather than appended. But wouldn't that mean that on a little endian machine 00000100 will become the most significant byte, which will result in a very large number? Please explain where I am wrong.
Zeroes are prepended if you consider the mathematical value (which just happens to also be the big-endian representation).
Casts in C always strive to preserve the value, not the representation. That's how, for example, (int)1.25 results in 1 (see the note below), as opposed to something which makes much less sense.
As discussed in the comments, the same holds for bit-shifts (and other bitwise operations, for that matter). 50 >> 1 == 25, regardless of endianness.
(*Note: usually; it depends on the rounding mode used for the float-to-integer conversion.)
In short: operators in C operate on the mathematical value, regardless of representation. One exception is when you cast a pointer to the value (as in (char*)&foo), since then you are essentially getting a different "view" of the same data.
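A short sketch of that distinction, using a hypothetical variable foo: arithmetic and shifts see the value, while a cast to unsigned char* sees the stored bytes:
#include <stdio.h>

int main(void) {
    unsigned int foo = 50;

    /* Value-based: identical on every machine, whatever the byte order. */
    printf("50 >> 1 = %u\n", foo >> 1);              /* always 25 */

    /* Representation-based: the first byte depends on endianness.
       A little-endian machine prints 32 (0x32 == 50);
       a big-endian machine prints 0. */
    printf("first stored byte = %x\n", *(unsigned char *)&foo);
    return 0;
}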
Not sure if it answers your question, but will give it a try:
If you take a char variable and cast it to an int variable, then you get the exact same result on both architectures:
char c = 0x12;
int i = (int)c; // i == 0x12 on both architectures
If you take an int variable and cast it to a char variable, then you get the exact same result (possibly truncated) on both architectures:
int i = 0x12345678;
char c = (char)i; // c == 0x78 on both architectures
But if you take an int variable and read it using a char* pointer, then you get a different result on each architecture:
int i = 0x12345678;
char c = *(char*)&i; // c == 0x12 on BE architecture and 0x78 on LE architecture
The example above assumes that sizeof(int) == 4 (may be different on some compilers).
Loosely speaking, "endianness" is the property of how a processor sees the data stored in memory. This means that all processors, once a particular piece of data has been brought into the CPU, see it the same way.
For example:
int a = 0x01020304;
Irrespective of whether it is a little or big endian machine, it would always have 04 as the least significant byte and 01 as the most significant byte when stored in its register.
The problem arises when this variable/data has to be stored in memory, which is "byte addressable". Should 01 (Most Significant Byte) go into the lowest memory address (Big Endian) or the highest memory address (Little Endian)?
In your particular example, what you have shown is the representation as the processor sees it, with its least/most significant bytes.
So technically speaking, both little and big endian machines would have:
00000000 00000000 00000000 00000100
in its 32-bit wide register. Assuming, of course, that what you have in memory is a 32-bit wide integer representing 4. How this 4 is stored in and retrieved from memory is what endianness is all about.
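One common way to observe this in code (a sketch assuming a 4-byte int, relying on the usual unsigned char view of an object's bytes) is to check which byte of a known constant ends up at the lowest address:
#include <stdio.h>

int main(void) {
    unsigned int a = 0x01020304;
    unsigned char *p = (unsigned char *)&a;   /* byte-wise view of memory */

    /* The register view is always 0x01020304; what differs between
       machines is which byte is stored at the lowest address. */
    if (p[0] == 0x04)
        printf("little endian: lowest address holds 0x04\n");
    else if (p[0] == 0x01)
        printf("big endian: lowest address holds 0x01\n");
    return 0;
}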
Can we say that our 'traditional' way of writing in binary
is Big Endian?
e.g., number 1 in binary:
0b00000001 // Let's assume it's possible to write numbers like that in code and b means binary
Also, when I write a constant 0b00000001 in my code, this will always refer to the integer 1 regardless of whether the machine is big endian or little endian, right?
In this notation the LSB is always written as the rightmost element, and the MSB is always written as the leftmost element, right?
Yes, humans generally write numerals in big-endian order (meaning that the digits written first have the most significant value), and common programming languages that accept numerals interpret them in the same way.
Thus, the numeral “00000001” means one; it never means one hundred million (in decimal) or 128 (in binary) or the corresponding values in other bases.
Much of C semantics is written in terms of the value of a number. Once a numeral is converted to a value, the C standard describes how that value is added, multiplied, and even represented as bits (with some latitude regarding signed values). Generally, the standard does not specify how those bits are stored in memory, which is where endianness in machine representations comes into play. When the bits representing a value are grouped into bytes and those bytes are stored in memory, we may see those bytes written in different orders on different machines.
However, the C standard specifies a common way of interpreting numerals in source code, and that interpretation is always big-endian in the sense that the most significant digits appear first.
If you want to put it that way, then yes, we humans write numerals in Big-Endian order.
But I think you have a misunderstanding about your target running with big or little endian.
In your actual C code, it does not matter which endianness your target machine uses. For example, these lines will always display the same output, no matter the endianness of your system:
uint32 x = 0x0102;
printf("Output: %x\n",x); // Output: 102
or to take your example:
uint32 y = 0b0001;
printf("Output: %d\n",y); // Output: 1
However the storage of the data in your memory differs between Little and Big Endian.
Big Endian:
Actual Value: 0x01020304
Memory Address: 0x00 0x01 0x02 0x03
Value: 0x01 0x02 0x03 0x04
Little Endian:
Actual Value: 0x01020304
Memory Address: 0x00 0x01 0x02 0x03
Value: 0x04 0x03 0x02 0x01
Both times the actual value is 0x01020304 (and this is what you assign in your C code).
You only have to worry about it if you do memory operations. For example, if you have a 4-byte (uint8) array that represents a 32-bit integer and you want to copy it into a uint32 variable, you need to take care:
uint8 arr[4] = {0x01, 0x02, 0x03, 0x04};
uint32 var;
memcpy(&var,arr,4);
printf("Output: %x\n",var);
// Big Endian: Output: 1020304
// Little Endian: Output: 4030201
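If you want the copy to be independent of the host's byte order, one option (a sketch using the standard uint8_t/uint32_t types rather than the uint8/uint32 typedefs above, and assuming the array holds big-endian data) is to assemble the value arithmetically instead of with memcpy:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t arr[4] = {0x01, 0x02, 0x03, 0x04};

    /* Shifts operate on values, not on stored bytes, so this yields
       0x01020304 on both little- and big-endian machines. */
    uint32_t var = ((uint32_t)arr[0] << 24) |
                   ((uint32_t)arr[1] << 16) |
                   ((uint32_t)arr[2] << 8)  |
                    (uint32_t)arr[3];

    printf("Output: %x\n", (unsigned)var);   /* Output: 1020304 on either machine */
    return 0;
}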
I'm trying to read a binary file into a C# struct. The file was created from C, and the following code produces 2 of the bytes in each 50+ byte row.
unsigned short nDayTimeBitStuffed = atoi( LPCTSTR( strInput) );
unsigned short nDayOfYear = (0x01FF & nDayTimeBitStuffed);
unsigned short nTimeOfDay = (0x01F & (nDayTimeBitStuffed >> 9) );
The binary values in the file are 00000001 and 00000100.
The expected values are 1 and 2, so I think some bit ordering/swapping is going on, but I'm not sure.
Any help would be greatly appreciated.
Thanks!
The answer is 'it depends' - most notably on the machine, and also on how the data is written to the file. Consider:
unsigned short x = 0x0102;
write(fd, &x, sizeof(x));
On some machines (Intel), the low-order byte (0x02) will be written before the high-order byte (0x01); on others (PPC, SPARC), the high-order byte will be written before the low-order one.
So, from a little-endian (Intel) machine, you'd see the bytes:
0x02 0x01
But from a big-endian (PPC) machine, you'd see the bytes:
0x01 0x02
Your bytes appear to be 0x01 and 0x04. Your calculation for 0x02 appears flawed.
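If the file format needs a fixed byte order no matter which machine writes it, a common approach (sketched here; this is not the original poster's code) is to serialize the value byte by byte:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t x = 0x0102;
    unsigned char buf[2];

    /* Explicit little-endian serialization: the same two bytes
       (0x02 then 0x01) are produced on any host. */
    buf[0] = (unsigned char)(x & 0xFF);   /* low-order byte first */
    buf[1] = (unsigned char)(x >> 8);     /* high-order byte second */

    printf("%02x %02x\n", buf[0], buf[1]);   /* 02 01 */
    return 0;
}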
The C code you show doesn't write anything. The value in nDayOfYear is the bottom 9 bits of the input value; the nTimeOfDay appears to be the next 5 bits (so 14 of the 16 bits are used).
For example, if the value in strInput is 12141 decimal, 0x2F6D, then the value in nDayOfYear would be 365 (0x16D) and the value in nTimeOfDay would be 23 (0x17).
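That unpacking is easy to check with a few lines of C (using the 12141 example above):
#include <stdio.h>

int main(void) {
    unsigned short nDayTimeBitStuffed = 12141;   /* 0x2F6D */

    /* The bottom 9 bits hold the day of year, the next 5 bits the time of day. */
    unsigned short nDayOfYear = nDayTimeBitStuffed & 0x01FF;          /* 365 (0x16D) */
    unsigned short nTimeOfDay = (nDayTimeBitStuffed >> 9) & 0x01F;    /* 23  (0x17)  */

    printf("day of year: %u, time of day: %u\n",
           (unsigned)nDayOfYear, (unsigned)nTimeOfDay);
    return 0;
}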
It is a funny storage order; you can't simply compare two packed values, whereas if you packed the day of year into the more significant portion of the value and the time into the less significant portion, you could compare values as simple integers and get the correct comparison.
The expected file contents are very much related to the processor and compiler used to create the file, if it's binary.
I'm assuming a Windows machine here, which uses 2 bytes for a short and puts them in little endian order.
Your comments don't make much sense either. If it's two bytes then it should be using two chars, not shorts. The range of the first is going to be 1-365, so it definitely needs more than a single byte to represent. I'm going to assume you want the first 4 bytes, not the first 2.
This means that the first byte will be bits 0-7 of the DayOfYear, the second byte will be bits 8-15 of the DayOfYear, the third byte will be bits 0-7 of the TimeOfDay, and the fourth byte will be bits 8-15 of the TimeOfDay.