Does bit-shift depend on endianness? - c

Suppose I have the number numb = 1025 [00000000 00000000 00000100 00000001], represented:
On Little-Endian Machine:
00000001 00000100 00000000 00000000
On Big-Endian Machine:
00000000 00000000 00000100 00000001
Now, if I apply a left shift by 10 bits (i.e. numb <<= 10), I should have:
[A] On Little-Endian Machine:
As I observed in GDB, a little-endian machine performs the left shift in 3 steps (I show 3 steps only to make the processing easier to follow):
Treat the number in the big-endian convention:
00000000 00000000 00000100 00000001
Apply Left-Shift:
00000000 00010000 00000100 00000000
Represent the Result again in Little-Endian:
00000000 00000100 00010000 00000000
[B] On Big-Endian Machine:
00000000 00010000 00000100 00000000
My Question is:
If I directly apply a left shift to the little-endian representation, it should give:
numb:
00000001 00000100 00000000 00000000
numb << 10:
00010000 00000000 00000000 00000000
But actually, it gives:
00000000 00000100 00010000 00000000
It is only to arrive at the second result that I have shown the three hypothetical steps above.
Please explain why the two results differ: the actual outcome of numb << 10 is different from the outcome I expected.

Endianness is the way values are stored in memory. When a value is loaded into the processor, regardless of endianness, the bit-shift instruction operates on the value in the processor's register. Loading from memory into the processor is therefore the equivalent of converting to big-endian; the shift operation comes next, and then the new value is stored back in memory, which is where the little-endian byte order comes into effect again.
Update, thanks to @jww: on PowerPC the vector shifts and rotates are endian-sensitive. You can have a value in a vector register, and a shift will produce different results on little-endian and big-endian machines.
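Here is a minimal sketch illustrating the scalar case: the shifted value is identical on every machine, and endianness only becomes visible if you inspect the object's bytes through an unsigned char pointer. (The byte sequences in the final comment assume 32-bit little-endian and big-endian machines respectively.)
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint32_t numb = 1025;   /* 0x00000401 */
    numb <<= 10;            /* the shift acts on the value: 0x00100400 (1049600) */

    /* Same value printed on any machine. */
    printf("value: %" PRIu32 " (0x%08" PRIX32 ")\n", numb, numb);

    /* Only here, byte by byte, does endianness show up. */
    const unsigned char *p = (const unsigned char *)&numb;
    for (size_t i = 0; i < sizeof numb; i++)
        printf("byte %zu: 0x%02X\n", i, (unsigned)p[i]);
    /* little-endian: 00 04 10 00   big-endian: 00 10 04 00 */
    return 0;
}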

No, bit-shift, like every other part of C, is defined in terms of values, not representations. A left shift by 1 is multiplication by 2, and a right shift is division by 2. (As always with bitwise operations, beware of signedness; everything is well defined for unsigned integral types.)
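A quick sketch of this value-based view, using unsigned int:
#include <stdio.h>

int main(void)
{
    unsigned int numb = 1025;

    /* Left shift by 10 multiplies by 2^10 = 1024, independent of byte order. */
    printf("%u\n", numb << 10);         /* 1049600 */
    printf("%u\n", numb * 1024);        /* 1049600, the same value */

    /* Right shift divides (truncating) by the same power of two. */
    printf("%u\n", (numb << 10) >> 10); /* back to 1025 */
    return 0;
}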

The accepted answer points out that endianness is a concept of the memory view, but I don't think that answers the question directly.
Other answers tell me that bitwise operations don't depend on endianness, and that the processor may represent the bytes internally however it likes; in short, that endianness gets abstracted away.
But when we do a bitwise calculation on paper, for example, don't we need to state the endianness in the first place? Most of the time we choose one implicitly.
For example, assume we have a line of code like this
0x1F & 0xEF
How would you calculate the result by hand, on a paper?
MSB 0001 1111 LSB
1110 1111
result: 0000 1111
So here we use a big-endian format to do the calculation. You could equally use little-endian and get the same result.
By the way, when we write numbers in code, I think it is essentially a big-endian format: in 123456 or 0x1F, the most significant digits start from the left.
Again, as soon as we write down the binary form of a value on paper, I think we have already chosen an endianness, and we are viewing the value as we would see it in memory.
So, back to the question: a shift operation << should be thought of as shifting from the LSB (least significant byte) towards the MSB (most significant byte).
Now take the example from the question, with numb=1025 in little-endian format:
LSB 00000001 00000100 00000000 00000000 MSB
So << 10 means shifting 10 bits from the LSB towards the MSB.
For comparison, here is the << 10 operation in both layouts, step by step:
MSB LSB
00000000 00000000 00000100 00000001 numb(1025)
00000000 00010000 00000100 00000000 << 10
LSB MSB
00000000 00000100 00010000 00000000 numb(1025) << 10, put back in little-endian format
LSB MSB
00000001 00000100 00000000 00000000 numb(1025) in Little Endian format
00000010 00001000 00000000 00000000 << 1
00000100 00010000 00000000 00000000 << 2
00001000 00100000 00000000 00000000 << 3
00010000 01000000 00000000 00000000 << 4
00100000 10000000 00000000 00000000 << 5
01000000 00000000 00000001 00000000 << 6
10000000 00000000 00000010 00000000 << 7
00000000 00000001 00000100 00000000 << 8
00000000 00000010 00001000 00000000 << 9
00000000 00000100 00010000 00000000 << 10 (check this final result!)
And we arrive at exactly the result the OP expected!
The reasons the OP didn't get that result by shifting directly are:
It seems that he didn't shift from LSB to MSB.
When shifting bits in the little-endian format, you have to realize that:
LSB 10000000 00000000 MSB << 1 is
LSB 00000000 00000001 MSB, not
LSB 01000000 00000000 MSB
Because within each individual byte, we are still writing the 8 bits in big-endian MSB-to-LSB order.
So it's like
LSB[ (MSB 10000000 LSB) (MSB 00000000 LSB) ]MSB
To sum up:
Though bitwise operations are said to be abstracted away from endianness, when we calculate them by hand we still need to know which endianness we are using as we write the binary form down on paper, and we must make sure all the operands are written in the same endianness.
The OP didn't get the expected result because he did the shifting wrong.
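If you would rather check the table above in code than on paper, here is a small sketch that prints the bytes of numb << i in memory order after each shift (the output noted in the comment assumes a little-endian machine):
#include <stdio.h>
#include <stdint.h>

/* Print the bytes of v in memory order (index 0 = lowest address). */
static void dump_bytes(uint32_t v)
{
    const unsigned char *p = (const unsigned char *)&v;
    for (size_t i = 0; i < sizeof v; i++)
        printf("%02X ", (unsigned)p[i]);
    printf("\n");
}

int main(void)
{
    uint32_t numb = 1025;
    for (int i = 0; i <= 10; i++) {
        printf("<< %2d : ", i);
        dump_bytes(numb << i);
    }
    /* On a little-endian machine the final line prints:
       << 10 : 00 04 10 00
       matching the hand-calculated result above. */
    return 0;
}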

Whichever shift instruction shifts out the higher-order bits first is considered the left shift. Whichever shift instruction shifts out the lower-order bits first is considered the right shift. In that sense, the behavior of >> and << for unsigned numbers will not depend on endianness.

Computers don't write numbers down the way we do. The value simply shifts. If you insist on looking at it byte-by-byte (even though that's not how the computer does it), you could say that on a little-endian machine, the first byte shifts left, the excess bits go into the second byte, and so on.
(By the way, little-endian makes more sense if you write the bytes vertically rather than horizontally, with higher addresses on top. Which happens to be how memory map diagrams are commonly drawn.)

Related

Read binary file in C skip byte of 0

When I read a binary file, the program skips the bytes that are equal to 0.
Here is my program in C:
#include <stdio.h>
#include <stdint.h>

int main(int argc, char const *argv[])
{
    FILE *input_file = fopen("binary.bin", "rb");
    uint32_t b = 0;
    fread(&b, sizeof(uint32_t), 1, input_file);
    printf("----- Here are the data stored in the file -----\n");
    printf("First uint_32 : %u\n", b);
    printf("------------------------------------------------\n");
    fclose(input_file);
    return 0;
}
The output :
----- Here are the data stored in the file -----
First uint_32 : 16777216
------------------------------------------------
The binary file :
xxd -b binary.bin
00000000: 00000000 00000000 00000000 00000001 00000000 00000000 ......
00000006: 00000000 00110010 00000000 00000000 00000000 01100100 .2...d
0000000c: 00000000 00000000 00000000 00010100 00000000 00000000 ......
00000012: 00000000 00000000 00000000 00000000 01011001 00110000 ....Y0
Why is the output not 1?
You got 16777216 as the output.
Let's look at its binary representation:
00000001 00000000 00000000 00000000
The first four bytes in the file are 00 00 00 01, but your machine treats the first byte in memory as the least significant one, so fread gives you 0x01000000 = 16777216 rather than 1.
That's due to how data is laid out in memory, which depends on the architecture of the system: its endianness. You should read about little-endian and big-endian byte order.
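If the file is known to store the value big-endian, as here, one portable approach (a sketch, not the only fix; "binary.bin" is just the file from the question) is to read the bytes individually and assemble the value with shifts, which yields the same result on any host:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    FILE *input_file = fopen("binary.bin", "rb");
    if (input_file == NULL) {
        perror("fopen");
        return 1;
    }

    unsigned char bytes[4];
    if (fread(bytes, 1, sizeof bytes, input_file) != sizeof bytes) {
        fclose(input_file);
        return 1;
    }
    fclose(input_file);

    /* Assemble a 32-bit big-endian value byte by byte. */
    uint32_t b = ((uint32_t)bytes[0] << 24) |
                 ((uint32_t)bytes[1] << 16) |
                 ((uint32_t)bytes[2] << 8)  |
                  (uint32_t)bytes[3];
    printf("First uint32_t : %u\n", (unsigned)b);  /* prints 1 */
    return 0;
}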

Function that toggle LED on and off [duplicate]

This question already has answers here:
How do I set, clear, and toggle a single bit?
(27 answers)
Closed 6 years ago.
Is it possible to toggle a bit to 0 when it is 1, and to 1 when it is 0? For example, 00000000 goes through the function and becomes 00000001, and when 00000001 goes through the function again it becomes 00000000. Is that possible?
Certainly. What you want is to perform a bitwise XOR with 00000001 (this operand is called a mask):
MASK INPUT OUTPUT
00000001 XOR 00000000 = 00000001
00000001 XOR 00000001 = 00000000
This also makes it possible to toggle more than one bit, e.g., if your mask were 00001001:
MASK INPUT OUTPUT
00001001 XOR 00000000 = 00001001
00001001 XOR 00001001 = 00000000
00001001 XOR 00001000 = 00000001
00001001 XOR 00000001 = 00001000
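In C the toggle is a one-liner with the ^ operator; a minimal sketch (the function name and mask value are just for illustration):
#include <stdio.h>

/* Toggle the bits selected by mask: 1 bits flip, 0 bits are untouched. */
unsigned char toggle(unsigned char value, unsigned char mask)
{
    return value ^ mask;
}

int main(void)
{
    unsigned char led = 0x00;
    led = toggle(led, 0x01);   /* 00000000 -> 00000001 */
    printf("%02X\n", (unsigned)led);
    led = toggle(led, 0x01);   /* 00000001 -> 00000000 */
    printf("%02X\n", (unsigned)led);
    return 0;
}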

What happens with bitwise shift for all 8 bits

I have a small query in C.
I am using the bitwise left shift on the number 69, which is 01000101 in binary:
01000101 << 8
and I get the answer 100010100000000.
Shouldn't it be all 8 zeros, i.e. 00000000, since we shift all 8 bits to the left and then pad with zeros?
This is because the default type of an integer literal (an int) is, on most modern CPUs, wider than 8 bits (typically 32 bits), so when you apply
69 << 8 //note 69 is an int
it is actually applied like this:
00000000 00000000 00000000 01000101 << 8
Thus you get the result
00000000 00000000 01000101 00000000
If you use, say, unsigned char specifically, then it won't happen:
unsigned char a = 69 << 8; //resulting in 0
This is because, although 69 << 8 itself still evaluates to
01000101 00000000
that value is then converted to the 8-bit unsigned char, resulting in:
00000000
Bit-shift operators act on entire objects, not on individual bytes. If the object storing 69 is wider than 1 byte (int is typically 4 bytes, for example), then the bits shifted out of the first (lowest/rightmost) byte do not vanish; they are "pushed into" the second byte. For example:
00000000 00000000 00000000 01000101 //The number 69, stored in a 32 bit object
00000000 00000000 01000101 00000000 //shifted left by 8
If you had stored the number in a 1-byte variable, such as a char, the result would indeed have been zero.
01000101 //The number 69, stored in an 8 bit object
(01000101) 00000000 //shifted left by 8
^^^^^^^^
these bits have been shifted outside the size of the object.
The same thing would happen if you shifted an int by 32, although note that in C, shifting by an amount equal to or greater than the width of the type is undefined behaviour, so the all-zero last line below is only the conceptual result:
00000000 00000000 00000000 01000101 //The number 69, stored in a 32 bit int
00000000 00000000 01000101 00000000 //shifted left by 8
00000000 01000101 00000000 00000000 //shifted left by 16
01000101 00000000 00000000 00000000 //shifted left by 24
00000000 00000000 00000000 00000000 //shifted left by 32: all bits shifted out (undefined behaviour in C)
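Both answers can be verified with a short program (a sketch; the variable names are arbitrary):
#include <stdio.h>

int main(void)
{
    /* The literal 69 is an int, so the shift happens at int width. */
    int wide = 69 << 8;
    printf("%d\n", wide);               /* 17664 = 01000101 00000000 */

    /* Storing the result in an unsigned char truncates to the low 8 bits. */
    unsigned char narrow = (unsigned char)(69 << 8);
    printf("%u\n", (unsigned)narrow);   /* 0 */
    return 0;
}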

bit manipulation in c giving unexpected results

I have this program:
#include <stdio.h>

int main(void)
{
    unsigned char unit_id[] = { 0x2B, 0xC, 0x6B, 0x54 }; // 8-bit (1 byte)
    unsigned long long int unit_id_val; // 64-bit (8 bytes)
    int i;

    // loops 4 times
    for (i = 0; i < sizeof(unit_id) / sizeof(char); i++) {
        unit_id_val |= unit_id[i] << (8 * i);
    }
    printf("the unit id is %llu\n", unit_id_val);
    return 0;
}
hex to binary conversions:
0x2B = 00101011
0xC = 00001100
0x6B = 01101011
0x54 = 01010100
unit_id_val is 8 bytes (I use 5 bytes for unit_id_val below to simplify things)
1) first iteration 8*0=0 so no left shift occurs:
00101011 = 00101011 << 0
00000000 00000000 00000000 00000000 00000000 |= 00101011
So the result should be:
00000000 00000000 00000000 00000000 00101011
2) Second iteration 8*1=8, so left shift all bits of unsigned char 0xC by 8:
00000000 = 00101011 << 8
00000000 00000000 00000000 00000000 00101011 |= 00000000
So the result should be:
00000000 00000000 00000000 00000000 00101011
3) Third iteration 8*2=16, so left shift all bits of unsigned char 0x6B by 16:
00000000 = 01101011 << 16
00000000 00000000 00000000 00000000 00101011 |= 00000000
So the result should be:
00000000 00000000 00000000 00000000 00101011
4) Fourth iteration 8*3=24, so left shift all bits of unsigned char 0x54 by 24:
00000000 = 01010100 << 24
00000000 00000000 00000000 00000000 00101011 |= 00000000
So the result should be:
00000000 00000000 00000000 00000000 00101011
00101011 is 43
But when you run this program you get
1416301611
which in binary is:
01010100 01101011 00001100 00101011
I am not understanding something here. I am following the precedence chart, evaluating the primary-expression operator () before the left-shift operator <<, before the compound-assignment operator |=. Yet I do not understand why I get the result I get.
00000000 = 00101011 << 8
OK, first of all: your second element is 0x0C (i.e., binary 00001100, not 00101011), so you are actually doing:
(unsigned char) 0x0C << 8
and the result of this expression is not 0 but 0x0C00, because the bitwise << operator performs integer promotion on its left operand, so it is actually equivalent to:
(int) (unsigned char) 0x0C << 8
You never initialize unit_id_val, and then you |= your shifted byte values into it, so whatever bits happened to be set in the uninitialized value will still be set, so your output will look like random garbage. Add
unit_id_val = 0;
before your loop.
In addition, whenever you do ANY operation in C, the operands are always converted by the standard conversions. In particular, that means any integer type smaller than an int will first be converted to int. So even though unsigned char is only 8 bits, when you do unit_id[i] << (8 * i), the 8-bit value from unit_id[i] will be converted to int (presumably 32 bits on your machine) before the shift. There's no way to do any sort of computation on integers smaller than an int in C -- even if you cast them, they'll be implicitly converted back to int.
Look at this line:
unit_id_val |= unit_id[i] << (8 * i);
which is equivalent to:
unit_id_val = unit_id_val | unit_id[i] << (8 * i);
The problem is that you are using a variable that is only declared but never initialized, and in C the value of an uninitialized local variable is garbage (junk). So you will, without any doubt, get an unpredictable value each time.
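Putting the answers together, here is a corrected sketch of the program: initialize unit_id_val to 0, and, as a good habit for shifts that could exceed int width, cast the byte before shifting (for these shift amounts the promotion to int already suffices):
#include <stdio.h>

int main(void)
{
    unsigned char unit_id[] = { 0x2B, 0xC, 0x6B, 0x54 };
    unsigned long long int unit_id_val = 0;   /* the missing initialization */
    size_t i;

    for (i = 0; i < sizeof unit_id / sizeof unit_id[0]; i++) {
        /* Cast before shifting so the shift is done at the full width. */
        unit_id_val |= (unsigned long long)unit_id[i] << (8 * i);
    }
    printf("the unit id is %llu\n", unit_id_val);  /* 1416301611 = 0x546B0C2B */
    return 0;
}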

What does an "Algn" of 2**2 and 2**0 mean in the output of objdump?

What do 2**2 and 2**0 mean in the file below?
$ objdump -h main.o
main.o: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000000b 00000000 00000000 00000034 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000000 00000000 00000000 00000040 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000000 00000000 00000000 00000040 2**2
ALLOC
3 .note.GNU-stack 00000000 00000000 00000000 00000040 2**0
CONTENTS, READONLY, CODE
I would assume that 2**2 means 2^2, or 4-byte alignment, while 2**0 means no (one-byte) alignment.
This value comes from the sh_addralign field of the ELF section header. The ELF specification states (emphasis mine):
sh_addralign: Some sections have address alignment constraints. For example, if a section holds a doubleword, the system must ensure doubleword alignment for the entire section. That is, the value of sh_addr must be congruent to 0, modulo the value of sh_addralign. Currently, only 0 and positive integral powers of two are allowed. Values 0 and 1 mean the section has no alignment constraints.
As Ray Toal mentioned, since the alignment must be a power of two, it only makes sense that objdump would express this value as a power of two with the 2**x notation.
Note that in some languages, like Python and FORTRAN, ** is a power or exponentiation operator.
Looking at objdump.c, we see:
static void
dump_section_header (bfd *abfd, asection *section,
void *ignored ATTRIBUTE_UNUSED)
{
// ...
printf (" %08lx 2**%u", (unsigned long) section->filepos,
bfd_get_section_alignment (abfd, section));
And in bfd.h:
#define bfd_get_section_alignment(bfd, ptr) ((ptr)->alignment_power + 0)
where the alignment_power member of the section structure is:
/* The alignment requirement of the section, as an exponent of 2 -
e.g., 3 aligns to 2^3 (or 8). */
unsigned int alignment_power;
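So to recover the byte alignment from the stored exponent you compute 2 to that power, which in C is just a shift; a trivial sketch (the variable names merely mirror the BFD field and the objdump output above):
#include <stdio.h>

int main(void)
{
    unsigned int alignment_power = 2;                 /* the "2**2" column */
    unsigned long alignment = 1UL << alignment_power; /* 2^2 = 4 bytes */
    printf("section alignment: %lu bytes\n", alignment);
    return 0;
}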
