Bitwise operation on big endian and little endian differences - c

I'm sorting the "children" of prefixes for IP address space. For example, 8.8.8.0/24 is the child of 8.8.8.0/23 in IP address space. I'm confused as to why the following two operations provide different results on my x86 little endian system.
A little background information:
A /24 means that the first 24 bits of a 32 bit IPv4 address are "defined". That means that 8.8.8.0/24 encompasses 8.8.8.0 - 8.8.8.255. Similarly, for every bit that's not defined, the amount of address space doubles. 8.8.8.0/23 would have only the first 23 bits defined, so the actual address space goes from 8.8.8.0 - 8.8.9.255, or twice the size of a /24.
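To make the arithmetic above concrete, here is a minimal sketch that computes the first and last address covered by a prefix. It assumes the address is already in host byte order and that the prefix length is between 0 and 32; `prefix_range` is a made-up name for illustration, not a library function.

```c
#include <stdint.h>

/* Hypothetical helper (not a library function): compute the first and last
   address covered by host_addr/prefix_len. host_addr is in host byte order,
   prefix_len is assumed to be in the range 0..32. */
void prefix_range(uint32_t host_addr, unsigned prefix_len,
                  uint32_t *first, uint32_t *last)
{
    /* A shift by 32 is undefined in C, so /0 is handled separately. */
    uint32_t mask = (prefix_len == 0)
                        ? 0u
                        : (uint32_t)(0xFFFFFFFFu << (32u - prefix_len));

    *first = host_addr & mask;            /* undefined bits cleared */
    *last  = host_addr | (uint32_t)~mask; /* undefined bits set */
}
```

With 8.8.8.0 written as 0x08080800, a /23 gives 0x08080800 to 0x080809FF (8.8.8.0 - 8.8.9.255), and a /24 gives 0x08080800 to 0x080808FF.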
Now the confusion I'm having is with the following bitshifts:
inet_addr("8.8.8.0") << (32 - 23) produces 269488128
inet_addr("8.8.9.0") << (32 - 23) produces 303042560
inet_addr produces a big endian (network byte order) number. However, when converting it to little endian:
htonl(inet_addr("8.8.8.0")) >> 9 produces 263172
htonl(inet_addr("8.8.9.0")) >> 9 produces 263172
Which is the expected result. Dropping the last 9 bits would mean that 8.8.9.0 would be equal to 8.8.8.0 in theory.
What am I missing here? Shouldn't it work the same for big endian?
Edit: Not a duplicate because I do understand the difference in how endianness affects the way numbers are stored, but I'm clearly missing something with these bitwise operators. The question is more to do with bitwise than endianness - the endianness is just there to foster an example

x86 is little endian. The number 1 in binary in little endian is
|10000000|00000000|00000000|00000000
If you bit shift this left by 9 bits it becomes...
|00000000|01000000|00000000|00000000
In a little endian machine 0xDEADBEEF printed out as a series of bytes from low to high address would actually print EFBEADDE, see
https://www.codeproject.com/Articles/4804/Basic-concepts-on-Endianness
and
https://www.gnu-pascal.de/gpc/Endianness.html.
Most people when thinking in binary think the number 1 is represented as follows (me included) and some people think this is big endian but it's not...
|00000000|00000000|00000000|00000001
In the code below I've printed out 0xDEADBEEF in little endian because my machine is an x86 and I've used the htonl function to convert it to network byte order. Note network byte order is defined as Big Endian.
So when I print out the big endian value for 1, i.e. htonl(1), the big endian representation of 1 is
|00000000|00000000|00000000|10000000
Try this code
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>
void print_deadbeef(void *p, size_t bytes) {
size_t i = 0;
for (i = 0; i < bytes; ++i) {
printf("%02X", ((unsigned char*)p)[i]);
}
printf("\n");
}
void print_bin(uint64_t num, size_t bytes) {
int i = 0;
for(i = bytes * 8; i > 0; i--) {
(i % 8 == 0) ? printf("|") : 1;
(num & 1) ? printf("1") : printf("0");
num >>= 1;
}
printf("\n");
}
int main(void) {
in_addr_t left = inet_addr("8.8.8.0");
in_addr_t right = inet_addr("8.8.9.0");
in_addr_t left_h = htonl(left);
in_addr_t right_h = htonl(right);
in_addr_t left_s = left << 9;
in_addr_t right_s = right >> 9;
assert(left != right);
printf("left != right\n");
print_bin(left, 4);
print_bin(right, 4);
printf("Big Endian if on x86\n");
print_bin(left_s, 4);
print_bin(right_s, 4);
printf("Little Endian if on x86\n");
print_bin(left_h, 4);
print_bin(right_h, 4);
printf("\n\nSome notes\n\n");
printf("0xDEADBEEF printed on a little endian machine\n");
uint32_t deadbeef = 0xDEADBEEF;
print_deadbeef(&deadbeef, 4);
uint32_t deadbeefBig = htonl(deadbeef);
printf("\n0xDEADBEEF printed in network byte order (big endian)\n");
print_deadbeef(&deadbeefBig, 4);
printf("\n1 printed on a little endian machine\n");
print_bin(1, 4);
printf("\nhtonl(1) ie network byte order (big endian) on a little endian machine\n");
print_bin(htonl(1), 4);
return 0;
}
This is the output
left != right
|00010000|00010000|00010000|00000000
|00010000|00010000|10010000|00000000
Big Endian if on x86
|00000000|00001000|00001000|00001000
|00100001|00100000|00000000|00000000
Little Endian if on x86
|00000000|00010000|00010000|00010000
|00000000|10010000|00010000|00010000
Some notes
0xDEADBEEF printed on a little endian machine
EFBEADDE
0xDEADBEEF printed in network byte order (big endian)
DEADBEEF
1 printed on a little endian machine
|10000000|00000000|00000000|00000000
htonl(1) ie network byte order (big endian) on a little endian machine
|00000000|00000000|00000000|10000000

The question of Big Endian and Little Endian isn't really known to the machine.
The types in C don't contain such information since it's a Hardware issue, not a type related one.
The machine assumes that all multi-byte numbers are ordered according to its local endianness (on x86, this is little endian).
For this reason, bit shifting is always performed using the local endian assumption.
You can't correctly apply bit-shifting to a Big Endian number on a Little Endian machine.
You can't even print a Big Endian number to the screen on a Little Endian machine without getting a funny result.
This is why @Harry's answer was so cool, it prints out each bit, circumventing the issue.
Wikipedia has an article about Endianness with more details.
It should be noted that Endianness actually refers to the way a machine stores its bytes in memory.
For example, if the number were a string, Endianness would refer to the question: which "letter" (byte) comes first?
Some machines would store "Hello" and some would store "olleH" (for numbers only; in actual strings, the bytes are always ordered correctly).
Notice that although the order of bytes is reversed, each byte has all the bits ordered the same way, so each byte retains its value.
When a bit-shift occurs, it always occurs according to the machine's byte ordering system, since this is how its CPU and memory store are designed.
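The practical consequence can be sketched like this: convert the address to host byte order first, and only then shift. This is a minimal sketch assuming a POSIX system (inet_pton and ntohl); `parse_ipv4_host` and `prefix_key` are names invented for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a dotted quad into a HOST byte order integer.
   Returns 0 on success, -1 if the text is not a valid IPv4 address. */
int parse_ipv4_host(const char *text, uint32_t *out)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;
    *out = ntohl(addr.s_addr); /* network order -> host order before any shift */
    return 0;
}

/* Hypothetical helper: the first prefix_len bits of a host-order address,
   usable as a comparison key. Assumes prefix_len is in the range 0..32. */
uint32_t prefix_key(uint32_t host_addr, unsigned prefix_len)
{
    if (prefix_len == 0)
        return 0; /* a shift by 32 would be undefined */
    return host_addr >> (32u - prefix_len);
}
```

With this, 8.8.8.0 and 8.8.9.0 produce the same key for a /23 (263172, as in the question) and different keys for a /24, regardless of the endianness of the machine running it.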

The accepted answer provides a good sample program. However, I think this sample is a little bit misleading.
The bit string of 1 in little-endian is printed as:
10000000|00000000|00000000|00000000
I ran this code on my x86 pc and I think the results are reliable. But that doesn't mean that the value 1 is stored as printed above in a little-endian machine.
According to the code of print_bin, the num right shifts one bit each time and the least significant bit is printed.
Also, the right shift operator always shifts from the most significant bit (MSB) to the least significant bit (LSB).
In the end, no matter the bit order, the result of print_bin(1, 4) is always the reverse of the human-writing bit representation of 1, which is:
00000000|00000000|00000000|00000001
For example, the bit string may be:
byte significance increase -->
byte
/-------\
00000001|00000000|00000000|00000000
|
bit
<-- bit significance increase
In this example, the bit-order is different from the byte-order. But the results of print_bin(1,4) would be the same.
In other words, the printed bit string doesn't necessarily mean reverse bit-order in the little-endian machine.
I talked about this further in this blog.
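To illustrate the point about print_bin, here is a small sketch that formats the bits of a value most significant bit first, i.e. in the human-writing order. It works purely on the value, so the result is the same on any machine. The name `bits_msb_first` and the 36-byte buffer convention are my own choices for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the 32 bits of value, most significant bit first, in the form
   "xxxxxxxx|xxxxxxxx|xxxxxxxx|xxxxxxxx".
   out needs room for 36 bytes: 32 digits, 3 separators and the terminator. */
void bits_msb_first(uint32_t value, char out[36])
{
    size_t pos = 0;
    for (int bit = 31; bit >= 0; --bit) {
        out[pos++] = ((value >> bit) & 1u) ? '1' : '0';
        if (bit % 8 == 0 && bit != 0)
            out[pos++] = '|'; /* separator between bytes of the value */
    }
    out[pos] = '\0';
}
```

For the value 1 this yields 00000000|00000000|00000000|00000001, the mirror image of what print_bin(1, 4) prints, even though nothing about the storage has changed.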

Related

Bitshifting vs array indexing, which is more appropriate for usart interfaces on 32bit MCUs

I have an embedded project with a USART HAL. This USART can only transmit or receive 8 or 16 bits at a time (depending on the usart register I chose i.e. single/double in/out). Since it's a 32-bit MCU, I figured I might as well pass around 32-bit fields as (from what I have been led to understand) this is a more efficient use of bits for the MPU. Same would apply for a 64-bit MPU i.e. pass around 64-bit integers. Perhaps that is misguided advice, or advice taken out of context.
With that in mind, I have packed the 8 bits into a 32-bit field via bit-shifting. I do this for both tx and rx on the usart.
The code for the 8-bit only register is as follows (the 16-bit register just has half the amount of rounds for bit-shifting):
int zg_usartTxdataWrite(USART_data* MPI_buffer,
USART_frameconf* MPI_config,
USART_error* MPI_error)
{
MPI_error = NULL;
if(MPI_config != NULL){
zg_usartFrameConfWrite(MPI_config);
}
HPI_usart_data.txdata = MPI_buffer->txdata;
for (int i = 0; i < USART_TXDATA_LOOP; i++){
if((USART_STATUS_TXC & usart->STATUS) > 0){
usart->TXDATAX = (i == 0 ? (HPI_usart_data.txdata & USART_TXDATA_DATABITS) : (HPI_usart_data.txdata >> SINGLE_BYTE_SHIFT) & USART_TXDATA_DATABITS);
}
usart->IFC |= USART_STATUS_TXC;
}
return 0;
}
EDIT: RE-ENTERTING LOGIC OF ABOVE CODE WITH ADDED DEFINES FOR CLARITY OF TERNARY OPERATOR IMPLICIT PROMOTION PROBLEM DISCUSSED IN COMMENTS SECTION
(the HPI_usart and USART_data structs are the same just different levels, I have since removed the HPI_usart layer, but for the sake of this example I will leave it in)
#define USART_TXDATA_LOOP 4
#define SINGLE_BYTE_SHIFT 8
typedef struct HPI_USART_DATA{
...
uint32_t txdata;
...
} HPI_usart;
HPI_usart HPI_usart_data = {'\0'};
const uint8_t USART_TXDATA_DATABITS = 0xFF;
int zg_usartTxdataWrite(USART_data* MPI_buffer,
USART_frameconf* MPI_config,
USART_error* MPI_error)
{
MPI_error = NULL;
if(MPI_config != NULL){
zg_usartFrameConfWrite(MPI_config);
}
HPI_usart_data.txdata = MPI_buffer->txdata;
for (int i = 0; i < USART_TXDATA_LOOP; i++){
if((USART_STATUS_TXC & usart->STATUS) > 0){
usart->TXDATAX = (i == 0 ? (HPI_usart_data.txdata & USART_TXDATA_DATABITS) : (HPI_usart_data.txdata >> SINGLE_BYTE_SHIFT) & USART_TXDATA_DATABITS);
}
usart->IFC |= USART_STATUS_TXC;
}
return 0;
}
However, I now realize that this is potentially causing more issues than it solves because I am essentially internally encoding these bits which then have to be decoded almost immediately when they are passed through to/from different data layers. I feel like it's a clever and sexy solution, but I'm now trying to solve a problem that I shouldn't have created in the first place. Like how to extract variable bit fields when there is an offset i.e. in gps nmea sentences where the first 8 bits might be one relevant field and then the rest are 32bit fields. So it ends up being like this:
32-bit array member 0:
bits 24-31 bits 16-23 bits 8-15 bits 0-7
| 8-bit Value | 32-bit Value A, bits 24-31 | 32-bit Value A, bits 16-23 | 32-bit Value A, bits 8-15 |
32-bit array member 1:
bits 24-31 bits 16-23 bits 8-15 bits 0-7
| 32-bit Value A, bits 0-7 | 32-bit Value B, bits 24-31 | 32-bit Value B, bits 16-23 | 32-bit Value B, bits 8-15 |
32-bit array member 2:
bits 24-31 16-23 8-15 ...
| 32-bit Value B, bits 0-7 | etc... | .... | .... |
The above example requires manual decoding, which is fine I guess, but it's different for every nmea sentence and just feels more manual than programmatic.
My question is this: bitshifting vs array indexing, which is more appropriate?
Should I just have assigned each incoming/outgoing value to a 32-bit array member and then just index that way? I feel like that is the solution since it would not only make it easier to traverse the data on other layers, but I would be able to eliminate all this bit-shifting logic and then the only difference between an rx or tx function would be the direction the data is going.
It does mean a small rewrite of the interface and the resulting gps module layer, but that feels like less work and also a cheap lesson early on in my project.
Also any thoughts and general experience on this would be great.
Since it's a 32-bit MCU, I figured I might as well pass around 32-bit fields
That's not really the programmer's call to make. Put the 8 or 16 bit variable in a struct. Let the compiler add padding if needed. Alternatively you can use uint_fast8_t and uint_fast16_t.
My question is this: bitshifting vs array indexing, which is more appropriate?
Array indexing is for accessing arrays. If you have an array, use it. If not, then don't.
While it is possible to chew through larger chunks of data byte by byte, such code must be written much more carefully, to prevent running into various subtle type conversion and pointer aliasing bugs.
In general, bit shifting is preferred when accessing data up to the CPU's word size, 32 bits in this case. It is fast and also portable, so that you don't have to take endianness into account. It is the preferred method of serialization/de-serialization of integers.
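As a rough illustration of serialization by shifting, the sketch below splits a 32-bit value into four bytes that could then be handed to an 8-bit transmit register one at a time. The function name and the choice of sending the least significant byte first are assumptions for the example, not something taken from the HAL in the question.

```c
#include <stdint.h>

/* Hypothetical helper: split value into 4 bytes, least significant byte
   first. The result depends only on the value, not on the CPU's endianness. */
void u32_to_bytes_le(uint32_t value, uint8_t out[4])
{
    for (unsigned i = 0; i < 4; ++i)
        out[i] = (uint8_t)(value >> (8u * i)); /* shift grows with the byte index */
}
```

Note that the shift amount depends on the byte index, so each iteration picks out a different byte of the value.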

Casting uint8_t array into uint16_t value in C

I'm trying to convert a 2-byte array into a single 16-bit value. For some reason, when I cast the array as a 16-bit pointer and then dereference it, the byte ordering of the value gets swapped.
For example,
#include <stdint.h>
#include <stdio.h>
int main(void)
{
uint8_t a[2] = {0x15, 0xaa};
uint16_t b = *(uint16_t*)a;
printf("%x\n", (unsigned int)b);
return 0;
}
prints aa15 instead of 15aa (which is what I would expect).
What's the reason behind this, and is there an easy fix?
I'm aware that I can do something like uint16_t b = a[0] << 8 | a[1]; (which does work just fine), but I feel like this problem should be easily solvable with casting and I'm not sure what's causing the issue here.
As mentioned in the comments, this is due to endianness.
Your machine is little-endian, which (among other things) means that multi-byte integer values have the least significant byte first.
If you compiled and ran this code on a big-endian machine (e.g. a Sun), you would get the result you expect.
Since your array is set up as big-endian, which also happens to be network byte order, you could get around this by using ntohs and htons. These functions convert a 16-bit value from network byte order (big endian) to the host's byte order and vice versa:
uint16_t b = ntohs(*(uint16_t*)a);
There are similar functions called ntohl and htonl that work on 32-bit values.
This is because of the endianness of your machine.
In order to make your code independent of the machine consider the following function:
#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1
int endian() {
int i = 1;
char *p = (char *)&i;
if (p[0] == 1)
return LITTLE_ENDIAN;
else
return BIG_ENDIAN;
}
So for each case you can choose which operation to apply.
You cannot do anything like *(uint16_t*)a because of the strict aliasing rule. Even if code appears to work for now, it may break later in a different compiler version.
A correct version of the code could be:
b = ((uint16_t)a[0] << CHAR_BIT) + a[1];
The version suggested in your question involving a[0] << 8 is incorrect because on a system with 16-bit int, this may cause signed integer overflow: a[0] promotes to int, and << 8 means * 256.
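For comparison, here is a minimal sketch of two ways to read the array without the pointer cast. The helper names are made up for this example. The first always interprets the bytes as big endian; the second copies them with memcpy and therefore follows the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Interpret two bytes as big endian, whatever the host byte order is.
   The cast to unsigned avoids shifting a promoted signed int. */
uint16_t read_u16_be(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Copy two bytes into a uint16_t using the host's own byte order.
   memcpy avoids the aliasing problem of *(uint16_t *)p. */
uint16_t read_u16_native(const uint8_t *p)
{
    uint16_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

For the array {0x15, 0xaa}, read_u16_be returns 0x15aa on every machine, while read_u16_native returns 0xaa15 on a little-endian host and 0x15aa on a big-endian one.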
This might help to visualize things. When you create the array you have two bytes in order. When you print it you get the human readable hex value which is the opposite of the little endian way it was stored. The value 1 in little endian as a uint16_t type is stored as follows where a0 is a lower address than a1...
a0 a1
|10000000|00000000
Note, the least significant byte is first, but when we print the value in hex the least significant byte appears on the right which is what we normally expect on any machine.
This program prints a little endian and big endian 1 in binary starting from least significant byte...
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>
void print_bin(uint64_t num, size_t bytes) {
int i = 0;
for(i = bytes * 8; i > 0; i--) {
(i % 8 == 0) ? printf("|") : 1;
(num & 1) ? printf("1") : printf("0");
num >>= 1;
}
printf("\n");
}
int main(void) {
uint8_t a[2] = {0x15, 0xaa};
uint16_t b = *(uint16_t*)a;
uint16_t le = 1;
uint16_t be = htons(le);
printf("Little Endian 1\n");
print_bin(le, 2);
printf("Big Endian 1 on little endian machine\n");
print_bin(be, 2);
printf("0xaa15 as little endian\n");
print_bin(b, 2);
return 0;
}
This is the output (this is Least significant byte first)
Little Endian 1
|10000000|00000000
Big Endian 1 on little endian machine
|00000000|10000000
0xaa15 as little endian
|10101000|01010101

c Code that reads a 4 byte little endian number from a buffer

I encountered this piece of C code that's existing. I am struggling to understand it.
It supposedly reads a 4 byte unsigned value passed in a buffer (in little endian format) into a variable of type "long".
This code runs on a 64 bit word size, little endian x86 machine - where sizeof(long) is 8 bytes.
My guess is that this code is intended to also run on a 32 bit x86 machine - so a variable of type long is used instead of int for sake of storing value from a four byte input data.
I am having some doubts and have put comments in the code to express what I understand, or what I don't :-)
Please answer questions below in that context
void read_Value_From_Four_Byte_Buff( char*input)
{
/* use long so on 32 bit machine, can still accommodate 4 bytes */
long intValueOfInput;
/* Bitwise and of input buffer's byte 0 with 0xFF gives MSB or LSB ?*/
/* This code seems to assume that assignment will store in rightmost byte - is that true on a x86 machine ?*/
intValueOfInput = 0xFF & input[0];
/*left shift byte-1 eight times, bitwise "or" places in 2nd byte frm right*/
intValueOfInput |= ((0xFF & input[1]) << 8);
/* similar left shift in mult. of 8 and bitwise "or" for next two bytes */
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
}
My questions
1) The input buffer is expected to be in "Little endian". But from code looks like assumption here is that it read in as Byte 0 = MSB, Byte 1, Byte 2, Byte 3= LSB. I thought so because code reads bytes starting from Byte 0, and subsequent bytes ( 1 onwards) are placed in the target variable after left shifting. Is that how it is or am I getting it wrong ?
2) I feel this is a convoluted way of doing things - is there a simpler alternative to copy value from 4 byte buffer into a long variable ?
3) Will the assumption "that this code will run on a 64 bit machine" have any bearing on how easily I can do this alternatively? I mean, is all this trouble to keep it agnostic to word size (I assume it's agnostic to word size now - not sure though)?
Thanks for your enlightenment :-)
1) You have it backwards. When you left shift, you're putting into more significant bits. So ((0xFF & input[3]) << 24) puts Byte 3 into the MSB.
2) This is the way to do it in standard C. POSIX has the function ntohl() that converts from network byte order to a native 32-bit integer, so this is usually used in Unix/Linux applications.
3) This will not work exactly the same on a 64-bit machine, unless you use unsigned long instead of long. As currently written, the highest bit of input[3] will be put into the sign bit of the result (assuming a two's-complement machine), so you can get negative results. If long is 64 bits, all the results will be positive.
1) The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
2) I don't think you can get much simpler than this unless you use an external function, or make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances, it would work to just write uint32_t x = *(uint32_t *)input; but this assumes your machine is little endian and I think it might be undefined behavior according to the C standard.
3) No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.
The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:
((0xFF & input[3]) << 24)
The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.
A solution is to convert it from an int to a uint32_t before shifting it, using a cast.
Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?
Taking all this into account, I would rewrite the function like this:
uint32_t read_value_from_four_byte_buff(char * input)
{
uint32_t x;
x = 0xFF & input[0];
x |= (0xFF & input[1]) << 8;
x |= (0xFF & input[2]) << 16;
x |= (uint32_t)(0xFF & input[3]) << 24;
return x;
}
From the code, Byte 0 is LSB, Byte 3 is MSB. But there are some typos. The lines should be
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
You can make the code shorter by dropping 0xFF but using the type "unsigned char" in the argument type.
To make the code shorter, you can do:
long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
intValueOfInput |= (unsigned long)(unsigned char)input[i] << shift;
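Following the suggestion to take unsigned char in the argument type, a version without the 0xFF masks might look like the sketch below. The function name and the uint32_t return type are my own choices. Each byte is widened to uint32_t before shifting so that the shift by 24 cannot overflow a signed int.

```c
#include <stdint.h>

/* Hypothetical variant: read a 4-byte little-endian value from a buffer of
   unsigned char. No masking is needed because the bytes cannot be negative. */
uint32_t read_u32_le(const unsigned char *input)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < 4; ++i)
        value |= (uint32_t)input[i] << (8u * i); /* byte 0 is the LSB */
    return value;
}
```

The buffer {0x78, 0x56, 0x34, 0x12} gives 0x12345678, whether the code runs on a 32-bit or a 64-bit machine.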

converting little endian hex to big endian decimal in C

I am trying to understand and implement a simple file system based on FAT12. I am currently looking at the following snippet of code and its driving me crazy:
int getTotalSize(char * mmap)
{
int *tmp1 = malloc(sizeof(int));
int *tmp2 = malloc(sizeof(int));
int retVal;
* tmp1 = mmap[19];
* tmp2 = mmap[20];
printf("%d and %d read\n",*tmp1,*tmp2);
retVal = *tmp1+((*tmp2)<<8);
free(tmp1);
free(tmp2);
return retVal;
};
From what I've read so far, the FAT12 format stores the integers in little endian format.
The code above is getting the size of the file system, which is stored in the 19th and 20th byte of the boot sector.
However, I don't understand why retVal = *tmp1+((*tmp2)<<8); works. Is the bitwise <<8 converting the second byte to decimal? Or to big endian format?
Why is it only doing it to the second byte and not the first one?
the bytes in question are [in little endian format] :
40 0B
and I tried converting them manually by switching the order first to
0B 40
and then converting from hex to decimal, and I get the right output, I just don't understand how adding the first byte to the bitwise shift of second byte does the same thing?
Thanks
The use of malloc() here is seriously facepalm-inducing. Utterly unnecessary, and a serious "code smell" (makes me doubt the overall quality of the code). Also, mmap clearly should be unsigned char (or, even better, uint8_t).
That said, the code you're asking about is pretty straight-forward.
Given two byte-sized values a and b, there are two ways of combining them into a 16-bit value (which is what the code is doing): you can either consider a to be the least-significant byte, or b.
Using boxes, the 16-bit value can look either like this:
+---+---+
| a | b |
+---+---+
or like this, if you instead consider b to be the most significant byte:
+---+---+
| b | a |
+---+---+
The way to combine the lsb and the msb into 16-bit value is simply:
result = (msb * 256) + lsb;
UPDATE: The 256 comes from the fact that that's the "worth" of each successively more significant byte in a multibyte number. Compare it to the role of 10 in a decimal number (to combine two single-digit decimal numbers c and d you would use result = 10 * c + d).
Consider msb = 0x01 and lsb = 0x00, then the above would be:
result = 0x1 * 256 + 0 = 256 = 0x0100
You can see that the msb byte ended up in the upper part of the 16-bit value, just as expected.
Your code is using << 8 to do bitwise shifting to the left, which is the same as multiplying by 2^8, i.e. 256.
Note that result above is a value, i.e. not a byte buffer in memory, so its endianness doesn't matter.
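Put together, the function from the question could be reduced to something like this sketch, with no malloc and with unsigned bytes. The name `read_u16_le` is made up. The offsets 19 and 20 are the ones used in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine the bytes at buf[offset] (lsb) and
   buf[offset + 1] (msb) into one 16-bit value, i.e. msb * 256 + lsb. */
uint16_t read_u16_le(const uint8_t *buf, size_t offset)
{
    return (uint16_t)(buf[offset] | ((unsigned)buf[offset + 1] << 8));
}
```

With the bytes 40 0B at offsets 19 and 20, read_u16_le(sector, 19) gives 0x0B40, which is 2880 in decimal.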
I see no problem combining individual digits or bytes into larger integers.
Let's do decimal with 2 digits: 1 (least significant) and 2 (most significant):
1 + 2 * 10 = 21 (10 is the system base)
Let's now do base-256 with 2 digits: 0x40 (least significant) and 0x0B (most significant):
0x40 + 0x0B * 0x100 = 0x0B40 (0x100=256 is the system base)
The problem, however, is likely lying somewhere else, in how 12-bit integers are stored in FAT12.
A 12-bit integer occupies 1.5 8-bit bytes. And in 3 bytes you have 2 12-bit integers.
Suppose, you have 0x12, 0x34, 0x56 as those 3 bytes.
In order to extract the first integer you only need take the first byte (0x12) and the 4 least significant bits of the second (0x04) and combine them like this:
0x12 + ((0x34 & 0x0F) << 8) == 0x412
In order to extract the second integer you need to take the 4 most significant bits of the second byte (0x03) and the third byte (0x56) and combine them like this:
(0x56 << 4) + (0x34 >> 4) == 0x563
If you read Microsoft's official document on FAT (look up fatgen103 online), you'll find all the FAT relevant formulas/pseudo code.
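The two extraction formulas above can be folded into one function. This is only a sketch of the packing described in this answer (two 12-bit entries per 3 bytes), with a made-up name. It does no bounds checking and is not taken from the Microsoft document.

```c
#include <stdint.h>

/* Hypothetical helper: return 12-bit entry number n from a packed table
   in which every pair of entries shares 3 bytes. */
uint16_t fat12_entry(const uint8_t *fat, unsigned n)
{
    unsigned offset = n + n / 2; /* each entry takes 1.5 bytes */
    unsigned pair = fat[offset] | ((unsigned)fat[offset + 1] << 8);

    /* Even entries use the low 12 bits of the pair, odd entries the high 12. */
    return (uint16_t)((n % 2 == 0) ? (pair & 0x0FFFu) : (pair >> 4));
}
```

With the bytes 0x12, 0x34, 0x56 from the example, entry 0 is 0x412 and entry 1 is 0x563.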
The << operator is the left shift operator. It takes the value to the left of the operator, and shifts it by the number used on the right side of the operator.
So in your case, it shifts the value of *tmp2 eight bits to the left, and combines it with the value of *tmp1 to generate a 16 bit value from two eight bit values.
For example, let's say you have the integer 1. This is, in 16-bit binary, 0000000000000001. If you shift it left by eight bits, you end up with the binary value 0000000100000000, i.e. 256 in decimal.
The presentation (i.e. binary, decimal or hexadecimal) has nothing to do with it. All integers are stored the same way on the computer.

how is data stored at bit level according to "Endianness"?

I read about Endianness and understood squat...
so I wrote this
main()
{
int k = 0xA5B9BF9F;
BYTE *b = (BYTE*)&k; //value at *b is 9f
b++; //value at *b is BF
b++; //value at *b is B9
b++; //value at *b is A5
}
k was equal to A5 B9 BF 9F
and (byte)pointer "walk" o/p was 9F BF b9 A5
so I get it bytes are stored backwards...ok.
~
so now I thought how is it stored at BIT level...
I mean, is "9f"(1001 1111) stored as "f9"(1111 1001)?
so I wrote this
int _tmain(int argc, _TCHAR* argv[])
{
int k = 0xA5B9BF9F;
void *ptr = &k;
bool temp= TRUE;
cout<<"ready or not here I come \n"<<endl;
for(int i=0;i<32;i++)
{
temp = *( (bool*)ptr + i );
if( temp )
cout<<"1 ";
if( !temp)
cout<<"0 ";
if(i==7||i==15||i==23)
cout<<" - ";
}
}
I get some random output
even for nos. like "32" I dont get anything sensible.
why ?
Just for completeness, machines are described in terms of both byte order and bit order.
The Intel x86 is called Consistent Little Endian because it stores multi-byte values in LSB to MSB order as memory address increases. Its bit numbering convention is b0 = 2^0 and b31 = 2^31.
The Motorola 68000 is called Inconsistent Big Endian because it stores multi-byte values in MSB to LSB order as memory address increases. Its bit numbering convention is b0 = 2^0 and b31 = 2^31 (same as Intel, which is why it is called 'Inconsistent' Big Endian).
The 32-bit IBM/Motorola PowerPC is called Consistent Big Endian because it stores multi-byte values in MSB to LSB order as memory address increases. Its bit numbering convention is b0 = 2^31 and b31 = 2^0.
Under normal high level language use the bit order is generally transparent to the developer. When writing in assembly language or working with the hardware, the bit numbering does come into play.
Endianness, as you discovered by your experiment refers to the order that bytes are stored in an object.
Bits do not get stored differently, they're always 8 bits, and always "human readable" (high->low).
Now that we've discussed that you don't need your code... About your code:
for(int i=0;i<32;i++)
{
temp = *( (bool*)ptr + i );
...
}
This isn't doing what you think it's doing. You're iterating over 0-31, one index per bit in a 32-bit word - good. But your temp assignment is all wrong :)
It's important to note that a bool* is the same size as an int* is the same size as a BigStruct*. All pointers on the same machine are the same size - 32bits on a 32bit machine, 64bits on a 64bit machine.
ptr + i is adding i bytes to the ptr address. When i>3, you're reading a whole new word... this could possibly cause a segfault.
What you want to use is bit-masks. Something like this should work:
for (int i = 0; i < 32; i++) {
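unsigned int mask = 1u << i; // unsigned literal: 1 << 31 would overflow a signed int
bool bit_is_one = (*static_cast<unsigned int*>(ptr) & mask) != 0; // read the value ptr points to, then test one bit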
...
}
Your machine almost certainly can't address individual bits of memory, so the layout of bits inside a byte is meaningless. Endianness refers only to the ordering of bytes inside multibyte objects.
To make your second program make sense (though there isn't really any reason to, since it won't give you any meaningful results) you need to learn about the bitwise operators - particularly & for this application.
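A small sketch may help separate the two ideas: looking at bytes in memory (which depends on endianness) versus testing bits of a value with & (which does not). Both helper names are made up for the example, and value_bit assumes i is less than 32.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte number i of an object as it sits in memory. Which byte of the value
   this is depends on the machine's endianness. */
unsigned char memory_byte(const void *object, size_t i)
{
    return ((const unsigned char *)object)[i];
}

/* Bit number i of a value, bit 0 being the least significant.
   This works on the value, so it is the same on every machine. */
unsigned value_bit(uint32_t value, unsigned i)
{
    return (unsigned)((value >> i) & 1u);
}
```

For k = 0xA5B9BF9F, memory_byte(&k, 0) is 0x9F on a little-endian machine and 0xA5 on a big-endian one, while value_bit(k, 0) is 1 and value_bit(k, 5) is 0 everywhere.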
Byte Endianness
On different machines this code may give different results:
union endian_example {
unsigned long u;
unsigned char a[sizeof(unsigned long)];
} x;
x.u = 0x0a0b0c0d;
int i;
for (i = 0; i< sizeof(unsigned long); i++) {
printf("%u\n", (unsigned)x.a[i]);
}
This is because different machines are free to store values in any byte order they wish. This is fairly arbitrary. There is no backwards or forwards in the grand scheme of things.
Bit Endianness
Usually you don't ever have to worry about bit endianness. The most common way to access individual bits is with shifts ( >>, << ) but those are really tied to values, not bytes or bits. They perform an arithmetic operation on a value. That value is stored in bits (which are in bytes).
Where you may run into a problem in C with bit endianness is if you ever use a bit field. This is a rarely used (for this reason and a few others) "feature" of C that allows you to tell the compiler how many bits a member of a struct will use.
struct thing {
unsigned y:1; // y will be one bit and can have the values 0 and 1
signed z:1; // z can only have the values 0 and -1
unsigned a:2; // a can be 0, 1, 2, or 3
unsigned b:4; // b is just here to take up the rest of the byte
};
In this the bit endianness is compiler dependent. Should y be the most or least significant bit in a thing? Who knows? If you care about the bit ordering (describing things like the layout of an IPv4 packet header, control registers of a device, or just a storage format in a file) then you probably don't want to worry about some different compiler doing this the wrong way. Also, compilers aren't always as smart about how they work with bit fields as one would hope.
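When the layout is fixed by an external format, explicit shifts and masks sidestep the question entirely. As a sketch, the first byte of an IPv4 header carries the version in its upper four bits and the header length (IHL) in its lower four. The function names below are made up for the example.

```c
#include <stdint.h>

/* Upper four bits of the first IPv4 header byte: the version field. */
unsigned ipv4_version(uint8_t first_byte)
{
    return (unsigned)(first_byte >> 4);
}

/* Lower four bits of the first IPv4 header byte: the header length in
   32-bit words. */
unsigned ipv4_ihl(uint8_t first_byte)
{
    return first_byte & 0x0Fu;
}

/* Build the byte back from the two fields; extra high bits are discarded. */
uint8_t ipv4_pack_first_byte(unsigned version, unsigned ihl)
{
    return (uint8_t)(((version & 0x0Fu) << 4) | (ihl & 0x0Fu));
}
```

A typical first byte of 0x45 decodes to version 4 and IHL 5 with any compiler, which is not something a bit field declaration can promise.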
This line here:
temp = *( (bool*)ptr + i );
... when you do pointer arithmetic like this, the compiler moves the pointer on by the number you added times the sizeof the thing you are pointing to. Because you are casting your void* to a bool*, the compiler will be moving the pointer along by the size of one "bool", which is probably just an int under the covers, so you'll be printing out memory from further along than you thought.
You can't address the individual bits in a byte, so it's almost meaningless to ask which way round they are stored. (Your machine can store them whichever way it wants and you won't be able to tell). The only time you might care about it is when you come to actually spit bits out over a physical interface like I2C or RS232 or similar, where you have to actually spit the bits out one-by-one. Even then, though, the protocol would define which order to spit the bits out in, and the device driver code would have to translate between "an int with value 0xAABBCCDD" and "a bit sequence 11100011... [whatever] in protocol order".
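That translation step can be sketched as follows. The function names are hypothetical, and which order a given interface uses is an assumption to check against its specification (UART-style framing commonly sends the least significant bit first, while I2C sends the most significant bit first).

```c
#include <stdint.h>

/* Expand a byte into 8 separate bit values, least significant bit first. */
void wire_bits_lsb_first(uint8_t byte, uint8_t out[8])
{
    for (unsigned i = 0; i < 8; ++i)
        out[i] = (uint8_t)((byte >> i) & 1u);
}

/* Expand a byte into 8 separate bit values, most significant bit first. */
void wire_bits_msb_first(uint8_t byte, uint8_t out[8])
{
    for (unsigned i = 0; i < 8; ++i)
        out[i] = (uint8_t)((byte >> (7u - i)) & 1u);
}
```

Both functions start from the same value in memory. Only the order in which the bits are handed out differs, and that order is decided by the code, not by how the machine stores the byte.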

Resources