Does endianness matter at all with the bitwise operations? Either logical or shifting?
I'm working on homework with regard to bitwise operators, and I cannot make heads or tails of it. I think I'm getting quite hung up on the endianness. That is, I'm using a little endian machine (like most are), but does this need to be considered, or is it a wasted fact?
In case it matters, I'm using C.
Endianness only matters for the layout of data in memory. As soon as data is loaded by the processor to be operated on, endianness is completely irrelevant. Shifts, bitwise operations, and so on perform as you would expect (data logically laid out as low-order bit to high) regardless of endianness.
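For instance, this small sketch (assuming unsigned int is at least 32 bits) yields the same values on any machine:

unsigned int x = 0x11223344u;
unsigned int shifted = x >> 8;    // 0x00112233 on any machine
unsigned int masked  = x & 0xFFu; // 0x44 on any machine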
The bitwise operators abstract away the endianness. For example, the >> operator always shifts the bits towards the least significant bit. However, this doesn't mean you are safe to completely ignore endianness when using them: for example, when dealing with individual bytes in a larger structure, you cannot always assume that they will fall in the same place.
short temp = 0x1234;
temp = temp >> 8;             // temp is now 0x0012, regardless of endianness
char c = ((char *)&temp)[0];  // on little endian, c will be 0x12; on big endian, it will be 0x0
To clarify, I am not in basic disagreement with the other answers here. The point I am trying to make is to emphasise that although the bitwise operators are essentially endian neutral, you cannot ignore the effect of endianness in your code, especially when combined with other operators.
As others have mentioned, shifts are defined by the C language specification and are independent of endianness, but the result of a right shift of a negative signed value is implementation-defined and may vary depending on whether the architecture uses one's complement or two's complement arithmetic.
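For example, a small sketch; the exact result of the second line is up to the implementation:

int neg = -8;
int r = neg >> 1; // implementation-defined for negative values: commonly -4
                  // (arithmetic shift), but the standard does not guarantee it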
It depends. Without reinterpreting the number through another type, you can treat endianness transparently.
However, if your operation involves looking at the object's bytes through a pointer cast, use caution.
For example, if you right-shift some bits and then read the result back through a pointer to a narrower type, endianness matters!
To test your endianness, you can simply cast an int into a char:
int i = 1;
char *ptr;
...
ptr = (char *) &i;  // Cast it here
return (*ptr);      // 1 on little endian, 0 on big endian
You haven't specified a language but usually, programming languages such as C abstract endianness away in bitwise operations. So no, it doesn't matter in bitwise operations.
EDIT: The premise was faulty and thanks to other users' comments, I realize the system I described below cannot work. But I wonder, is there a system that would work for storing flags in different positions within a variable so that it can be simultaneously used to store either a high-precision small value or a lower-precision large value?
Original question:
In C11 and C++11, I want to stuff two single-bit flags into a size_t variable that I am simultaneously using to store an unrelated value. Since that value will usually be low, my idea is to use the two most significant bits to store the flags unless the third-most significant bit is set, in which case I store the flags in the two least significant bits. That way, if the value is within the usual range, it can have a precision of one, and if the value is huge, it can have a precision of four. I can figure out where the flags are stored and how to interpret the value just by checking the third-most significant bit.
Unlike the uintN_t types, the standards don't seem to guarantee that size_t has no padding bits. I’m not well-versed in bit-twiddling. In the unlikely event of a system that uses padding bits in size_t, will the bit-wise operations I need to implement this system result in undefined behavior?
To satisfy the curious, I don't want to store the flags in a separate char because memory usage is a priority and doing so would enlarge the containing struct by max_align_t on most systems (because of struct alignment/padding).
In the unlikely event of a system that uses padding bits in size_t, will the bit-wise operations I need to implement this system result in undefined behavior?
I infer that your concern is that you might inadvertently try to twiddle a padding bit. You do not need to worry about that, because the bitwise operations are defined in terms of values, not the details of the representations of those values.
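As an illustration, here is a minimal sketch (the macro names TOP_BIT and NEXT_BIT are made up) that derives the two most significant value bits of size_t from SIZE_MAX rather than from sizeof(size_t) * CHAR_BIT, so any padding bits are never involved:

#include <stdint.h> // SIZE_MAX
#include <stddef.h> // size_t

#define TOP_BIT  (SIZE_MAX ^ (SIZE_MAX >> 1)) // highest value bit of size_t
#define NEXT_BIT (TOP_BIT >> 1)               // second-highest value bit

size_t v = 42;
v |= TOP_BIT;                          // set a flag: a pure value operation
int flag_is_set = (v & TOP_BIT) != 0;  // test it the same way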
Consider this little piece of code:
int a = 0x10000001;
char b;
b = (char)a;
printf("%#x \n", b);
On my PC it prints 0x01 and I am not surprised.
How would it work on a BIG ENDIAN machine? I expect that it would print 0x10000001. Am I right?
I browsed books and web but I didn't find clear information how the casting operation really deals with the memory.
No. A cast like the one in question preserves the value when possible and does not depend on the memory representation.
If you want to reinterpret the memory representation you need to cast pointers. Then it will depend on endianness:
b = *((char *)&a);
Numbers are not big- or little-endian. Sequences of bytes are big- or little-endian. Numbers are just numbers.
C's numeric types deal, unsurprisingly, with numbers, not with sequences of bytes.
No. Endianness doesn't matter in this example. Converting to char (assuming a char is narrower than an int) will keep the lower-order bits, and lower-order bits are lower-order bits, no matter how they are stored in memory.
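A small sketch contrasting the two cases (assumes a 4-byte int and 8-bit chars):

int a = 0x10000001;
char by_value  = (char)a;       // value conversion: low-order byte, 0x01 on any machine
char by_memory = *((char *)&a); // first byte in memory: 0x01 on little endian,
                                // 0x10 on big endian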
The C++ standard states the following:
5.2.3.1 A simple-type-specifier (7.1.5) followed by a parenthesized expression-list constructs a value of the specified type given the expression list.
So yes, it constructs a new value, demoting the wider type to the narrower one regardless of the binary representation. If you're interested in exactly which part of the int ends up in the char, use the bitwise shift operators - they are platform-independent and produce predictable results.
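For example, a sketch of extracting specific bytes with shifts (assuming a 32-bit int); the results are the same on any platform:

unsigned int a = 0x10000001u;
unsigned char low  = (unsigned char)(a & 0xFFu);         // 0x01, the low-order byte
unsigned char high = (unsigned char)((a >> 24) & 0xFFu); // 0x10, the high-order byte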
it's idiomatic to initialize a block of memory to zero by
memset(p, 0, size_of_p);
when we want to initialize it to minus one, we can:
memset(p, -1, size_of_p);
no matter what type p is, because in two's complement representation, minus one is 0xff for an 8-bit integer, 0xffff for 16 bits, and 0xffffffff for 32 bits.
My concern is: is such a two's complement representation universally applicable in the realm of modern computers? Can I expect such code to be platform-independent and robust enough to port to other platforms?
Thanks in advance.
No, there are three schemes of representing negative numbers allowed by the ISO C standard:
two's complement;
ones' complement; and
sign/magnitude.
However, you should keep in mind that it's been a long time since I've seen a platform using the two less common schemes. I would say that all modern implementations use two's complement.
You only want to concern yourself with this if you're trying to be 100% portable. If you're the type of person who's happy to be 99.99999% portable, don't worry about it.
See also info on the ones' complement Unisys 2200 (still active as of 2010) and this answer explaining the layouts.
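A quick sketch of what the memset idiom from the question yields under each scheme (assuming 8-bit bytes and no padding bits):

#include <stdio.h>
#include <string.h>

int main(void) {
    int x;
    memset(&x, -1, sizeof x); // every byte of x becomes 0xFF
    // Two's complement: x == -1. Ones' complement: x is negative zero.
    // Sign/magnitude: x == -INT_MAX.
    printf("%d\n", x);
    return 0;
}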
The simple answer is yes, but the better answer is that you're prematurely optimising.
Write code that is obvious instead of code that you think is fast:
for (i = 0; i < p; i++)
    array[i] = -1;
An optimising compiler will automatically convert this to the fastest available form (in VS it becomes a memset when p is large enough); it does what you want without you having to think about whether this premature optimisation is always valid.
I totally understand how to shift bits. I've worked through numerous examples on paper and in code and don't need any help there.
I'm trying to come up with some real world examples of how bit shifting is used. Here are some examples I've been able to come up with:
Perhaps the most important example I could conceptualize had to do with endianness. In big endian systems, the most significant byte is stored first (at the lowest address), and in little endian systems the least significant byte comes first. I imagine that for files and network transmissions between systems which use opposite endian strategies, certain conversions must be made.
It seems certain optimizations could be made by compilers and processors when dealing with multiplications by powers of two (by 2, 4, etc.): the bits are just shifted to the left. (Conversely, I suppose the same would apply for division by 2, 4, etc.)
In encryption algorithms, i.e. using a series of bit shifts, reversals, and combinations to obfuscate something.
Are all of these accurate examples? Is there anything you would add? I've spent quite a bit of time learning about how to implement bit shifting / reordering / byte swapping and I want to know how it can be practically applied = )
I would not agree that the most important example is endianness, but it is useful. Your examples are valid.
Hash functions often use bitshifts as a way to get a chaotic behavior; not dissimilar to your cryptographic algorithms.
One common use is to use an int/long as a series of flag values that can be checked, set, and cleared with bitwise operators.
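For instance, a minimal sketch with made-up flag names:

enum {
    FLAG_READY   = 1u << 0,
    FLAG_ERROR   = 1u << 1,
    FLAG_TIMEOUT = 1u << 2
};

unsigned flags = 0;
flags |= FLAG_READY;        // set a flag
flags &= ~FLAG_ERROR;       // clear a flag
if (flags & FLAG_TIMEOUT) { // test a flag
    /* handle the timeout */
}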
Not really widely used, but in (some) chess games the board and moves are represented with 64 bit integer values (called bitboards) so evaluating legal moves, making moves, etc. is done with bitwise operators. Lots of explanations of this on the net, but this one seems like a pretty good explanation: http://www.frayn.net/beowulf/theory.html#bitboards.
And finally, you might find that you need to count the number of bits that are set in an int/long, in some technical interviews!
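One classic way to do that, sketched here with a hypothetical function name, is Kernighan's trick: each iteration clears the lowest set bit, so the loop runs once per set bit.

int popcount(unsigned int v) {
    int count = 0;
    while (v) {
        v &= v - 1; // clears the lowest set bit
        count++;
    }
    return count;
}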
The most common example of bitwise shift usage I know is for setting and clearing bits.
uint8_t bla = INIT_VALUE;
bla |= (1U << N); // Set N-th bit
bla &= ~(1U << N); // Clear N-th bit
Quick multiplication and division by a power of 2 (see the sketch after this list) - Especially important in embedded applications
CRC computation - Handy for networks e.g. Ethernet
Mathematical calculations that require very large numbers
Just a couple off the top of my head
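For the first item, a quick sketch:

unsigned int x = 40;
unsigned int times8 = x << 3; // 40 * 8 = 320
unsigned int div4   = x >> 2; // 40 / 4 = 10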
For a specific need I am building a four byte integer out of four one byte chars, using nothing too special (on my little endian platform):
return (( v1 << 24) | (v2 << 16) | (v3 << 8) | v4);
I am aware that an integer stored on a big endian machine would look like AB BC CD DE instead of the DE CD BC AB of little endianness. Would it affect my operation completely, in that I will be shifting incorrectly, or will it just produce a correct result that is stored in reverse and needs to be reversed?
I was wondering whether to create a second version of this function to do (yet unknown) bit manipulation for a big-endian machine, or possibly to use an ntohl-related function, though I am unclear how it would know whether my number is in the correct order or not.
What would be your suggestion to ensure compatibility, keeping in mind I do need to form integers in this manner?
As long as you are working at the value level, there will be absolutely no difference in the results you obtain regardless of whether your machine is little-endian or big-endian. I.e. as long as you are using language-level operators (like | and << in your example), you will get exactly the same arithmetical result from the above expression on any platform. The endianness of the machine is not detectable and not visible at this level.
The only situation in which you need to care about endianness is when the data you are working with is examined at the object representation level, i.e. when its raw memory representation is important. What you said above about "AB BC CD DE instead of DE CD BC AB" is specifically about the raw memory layout of the data. That's what functions like ntohl do: they convert one memory layout to another memory layout. So far you have given no indication that the actual raw memory layout is in any way important to you. Is it?
Again, if you only care about the value of the above expression, it is fully and totally endianness-independent. Basically, you are not supposed to care about endianness at all when you write C programs that don't attempt to access and examine the raw memory contents.
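As a sketch (the helper name pack4 is made up), the expression from the question can be written so the casts also guard against sign extension if the inputs arrive as plain, possibly signed, char:

#include <stdint.h>

uint32_t pack4(unsigned char v1, unsigned char v2,
               unsigned char v3, unsigned char v4)
{
    return ((uint32_t)v1 << 24) | ((uint32_t)v2 << 16)
         | ((uint32_t)v3 << 8)  |  (uint32_t)v4;
}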
would it affect my operation completely, in that I will be shifting incorrectly (?)
No.
The result will be the same regardless of the endian architecture. Bit shifting and twiddling are just like regular arithmetic operations. Is 2 + 2 the same on little endian and big endian architectures? Of course. 2 << 2 would be the same as well.
Little and big endian problems arise when you are dealing directly with the memory. You will run into problems when you do the following:
char bytes[] = {1, 0, 0, 0};
int n = *(int*)bytes;
On little endian machines, n will equal 0x00000001. On big endian machines, n will equal 0x01000000. This is when you will have to swap the bytes around.
[Rewritten for clarity]
ntohl (and ntohs, etc.) is used primarily for moving data from one machine to another. If you're simply manipulating data on one machine, then it's perfectly fine to do bit-shifting without any further ceremony -- bit-shifting (at least in C and C++) is defined in terms of multiplying/dividing by powers of 2, so it works the same whether the machine is big-endian or little-endian.
When/if you need to (at least potentially) move data from one machine to another, it's typically sensible to use htonl before you send it, and ntohl when you receive it. These may be entirely no-ops (in the case of BE to BE), two identical transformations that cancel each other out (LE to LE), or actually result in swapping bytes around (LE to BE or vice versa).
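For example (a sketch; the header location varies, e.g. <arpa/inet.h> on POSIX systems, <winsock2.h> on Windows):

#include <arpa/inet.h> // htonl / ntohl
#include <stdint.h>

uint32_t host_value = 0x12345678u;
uint32_t wire_value = htonl(host_value); // to network (big-endian) order before sending
/* ... transmit wire_value, receive it on the other machine ... */
uint32_t received = ntohl(wire_value);   // back to host order after receiving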
FWIW, I think a lot of what has been said here is correct. However, if the programmer has coded with endianness in mind, say using masks for bitwise inspection and manipulation, then cross-platform results could be unexpected.
You can determine 'endianness' at runtime as follows:
#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1
int endian() {
    int i = 1;
    char *p = (char *)&i;

    if (p[0] == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
... and proceed accordingly.
I borrowed the code snippet from here: http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs- where there is also an excellent discussion of these issues.
hth -
Perry