It's idiomatic to initialize a block of memory to zero with:
memset(p, 0, size_of_p);
When we want to initialize it to minus one, we can write:
memset(p, -1, size_of_p);
no matter what type p points to, because in two's complement representation minus one is 0xff for an 8-bit integer, 0xffff for 16 bits, and 0xffffffff for 32 bits.
My concern is: is two's complement representation universally applicable on modern computers? Can I expect such code to be platform-independent and robust enough to port to other platforms?
Thanks in advance.
No, there are three schemes of representing negative numbers allowed by the ISO C standard:
two's complement;
ones' complement; and
sign/magnitude.
However, you should keep in mind that it's been a long time since I've seen a platform using the two less common schemes. I would say that all modern implementations use two's complement.
You only want to concern yourself with this if you're trying to be 100% portable. If you're the type of person who's happy to be 99.99999% portable, don't worry about it.
See also info on the ones' complement Unisys 2200 (still active as of 2010) and this answer explaining the layouts.
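For what it's worth, here's a minimal sketch of the idiom from the question (assuming the usual 8-bit byte); the comment marks where the two's complement assumption actually comes in:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int a[4];

    /* memset converts -1 to unsigned char, so every byte becomes 0xFF.
       Reading those all-ones bytes back as int gives -1 only because
       the machine uses two's complement. */
    memset(a, -1, sizeof a);

    printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]);  /* -1 -1 -1 -1 */
    return 0;
}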
The simple answer is yes, but the better answer is that you're prematurely optimising.
Write code that is obvious instead of code that you think is fast:
for (i = 0; i < n; i++)
    array[i] = -1;
This will be automatically converted by an optimising compiler into the fastest possible representation (in VS it becomes a memset when n is large enough) that does what you want, without you having to think about whether this premature optimisation is always valid.
Related
I'm reading redis' source code from https://github.com/antirez/redis.
I saw such macros in src/ziplist.c
#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)
Why not just do this?
#define INT24_MIN (~INT24_MAX)
A better question might be why do you think (~INT24_MAX) is better than (-INT24_MAX - 1)?
On a two's complement machine you get the same result from either expression, and both evaluate just as fast (for a 32-bit target, the compiler will reduce both of them to 0xff800000 at compile time). However, in my opinion the expression (-INT24_MAX - 1) better models the numeric concept that the minimum value is one less than the negation of the maximum value.
That might not be of any huge importance, but the expression (~INT24_MAX) isn't better in an objective way, and I'd argue that subjectively it might not be as good.
Basically, (-INT24_MAX - 1) might simply have been what the coder happened to think of (maybe because, as I mentioned, it models what's numerically intended), and there's no reason to use something else.
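As a quick sketch (assuming a two's complement target with 32-bit int), both expressions come out the same:

#include <stdio.h>

#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)

int main(void)
{
    printf("%d\n", INT24_MIN);                /* -8388608 */
    printf("%d\n", ~INT24_MAX);               /* -8388608 on two's complement */
    printf("0x%x\n", (unsigned) ~INT24_MAX);  /* 0xff800000 */
    return 0;
}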
Suppose int is 32-bit and can hold 0x7fffff; then ~0x7fffff is going to be ~0x007fffff or, after all bits have been inverted, 0xff800000.
This bit pattern represents the negative value -0x7fffff-1 if negative integers use the 2's complement representation.
If they use the 1's complement representation, then this pattern represents the negative value -0x7fffff.
If they use the sign-magnitude representation, then this pattern represents the negative value -0x7f800000.
As you can see, the value of ~0x7fffff is going to depend on the representation of negative integers and the size of the type that can hold the value 0x7fffff.
If you're trying to write portable C code, you should avoid such situations.
I saw a line of code like this:
xxxxx = (uint16_t) -1;
Why cast -1 into an unsigned int? What value does this produce?
Thanks!
Obviously the answer is within reach of your fingertips. Just plug that code in and step through it, and you will see the value you get when you cast -1 to a 16-bit unsigned integer.
The value turns out to be the two's complement bit pattern of -1, which is 0xFFFF hex, or 65535 decimal. (Strictly speaking, the C standard defines conversion to an unsigned type modulo 2¹⁶ here, so you get 65535 no matter how the machine represents negative numbers.)
As to the actual reason for using the code like that, it's simply a short-cut. Or maybe it's just to satisfy a type compatibility requirement.
If you're wondering how -1 gets cast to 0xFFFF (65535) and not to 0 or 1, as one might expect, you have to understand that the C language, although statically typed, is quite liberal when it comes to enforcing type restrictions. That means it will happily cast - or interpret, if you will - any memory location as whatever arbitrary type of data you tell it. This can of course have quite devastating consequences if used improperly, but the trade-off is flexibility and a speed improvement due to the lack of strict sanity checks. This was very important a few decades ago when C was designed, and it still is if you're writing code for very tiny processors.
That being said, if you think of a cast as simply saying "disregard what you think you know about the data at this memory location, just tell me what it would mean if you read it as a <insert_your_type_here>", and if you know that computers usually represent negative numbers as two's complement (see above), then the answer should by now be pretty obvious: C is taking the value in memory and reading it back as an unsigned integer.
As an ending note, I should point out that C is not the only language that will cast -1 to 0xFFFF; even more modern languages that are capable of stronger type checks will do the same, probably for compatibility and continuity reasons, as well as because it makes the cast reversible: 0xFFFF cast back to a signed 16-bit integer is -1.
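Here's a small sketch of both directions of that cast. Converting 65535 back to a signed 16-bit type is implementation-defined in standard C, but yields -1 on typical two's complement implementations:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint16_t x = (uint16_t) -1;    /* conversion to unsigned is defined modulo 2^16 */
    int16_t  y = (int16_t) x;      /* converting 65535 back to a signed 16-bit value */

    printf("%" PRIu16 "\n", x);    /* 65535 */
    printf("0x%" PRIX16 "\n", x);  /* 0xFFFF */
    printf("%" PRId16 "\n", y);    /* -1 on typical two's complement implementations */
    return 0;
}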
It will return the highest possible unsigned 16-bit integer.
It returns 65535, which is the maximum value of a two-byte (16-bit) unsigned integer.
In C, why is there no standard format specifier to print a number in binary, something like %b? Sure, one can write functions/hacks to do this, but I want to know why such a simple thing is not a standard part of the language.
Was there some design decision behind it? Since there are format specifiers for octal (%o) and hexadecimal (%x), is it that octal and hexadecimal are somehow "more important" than the binary representation?
Since in C/C++ one often encounters bitwise operators, I would imagine it would be useful to have %b, or to be able to write a binary representation of a number directly into a variable (the way one writes hexadecimal numbers like int i = 0xf2).
Note: threads like this discuss only the 'how' part of doing this, not the 'why'.
The main reason is 'history', I believe. The original implementers of printf() et al at AT&T did not have a need for binary, but did need octal and hexadecimal (as well as decimal), so that is what was implemented. The C89 standard was fairly careful to standardize existing practice - in general. There were a couple of new parts (locales, and of course function prototypes, though there was C++ to provide 'implementation experience' for those).
You can read binary numbers with strtol() et al; specify a base of 2. I don't think there's a convenient way of formatting numbers in different bases (other than 8, 10, 16) that is the inverse of strtol() - presumably it should be ltostr().
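To make the asymmetry concrete, here's a sketch: strtol() happily parses base 2 today, but the formatting direction has to be hand-rolled (the to_binary() helper below is hypothetical, not a standard function):

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

/* hypothetical inverse of strtol() for base 2 */
static char *to_binary(unsigned long value, char *buf, size_t bufsize)
{
    size_t i = bufsize - 1;
    buf[i] = '\0';
    do {
        buf[--i] = (char) ('0' + (value & 1u));  /* emit lowest bit, right to left */
        value >>= 1;
    } while (value != 0 && i > 0);
    return &buf[i];
}

int main(void)
{
    long n = strtol("101101", NULL, 2);  /* reading binary works today: 45 */
    char buf[CHAR_BIT * sizeof(unsigned long) + 1];

    printf("%ld -> %s\n", n, to_binary((unsigned long) n, buf, sizeof buf));  /* 45 -> 101101 */
    return 0;
}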
You ask "why" as if there must be a clear and convincing reason, but the reality is that there is no technical reason for not supporting a %b format.
K&R C was created by people who framed the language to meet what they thought were going to be their common use cases. An opposing force was trying to keep the language spec as simple as possible.
ANSI C was standardized by a committee whose members had diverse interests. Clearly %b did not end up being a winning priority.
Languages are made by men.
The main reason, as I see it, is: which binary representation should one use? One's complement? Two's complement? Are you expecting the actual bits in memory or the abstract number representation?
Only the latter makes sense when C makes no requirements of word size or binary number representation. So, since it wouldn't be the bits in memory, surely you would rather read the abstract number in hex?
Claiming an abstract representation is "binary" could lead to the belief that -0b1 ^ 0b1 == 0 might be true or that -0b1 | -0b10 == -0b11
Possible representations:
While there is only one meaningful hex representation (the abstract one), the number -0x79 can be represented in binary as:
-1111001 (the abstract number)
10000110 (one's complement)
10000111 (two's complement)
#Eric has convinced me that endianness != left-to-right order...
The problem is further compounded when numbers don't fit in one byte. The same number could be:
1111111110000110 as a one's complement big-endian 16-bit number
1111111110000111 as a two's complement big-endian 16-bit number
1000011011111111 as a one's complement little-endian 16-bit number
1000011111111111 as a two's complement little-endian 16-bit number
The concepts of endianness and binary representation don't apply to hex numbers as there is no way they could be considered the actual bits-in-memory representation.
All these examples assume an 8-bit byte, which C does not guarantee (a byte must be at least 8 bits but may be wider; indeed there have been historical machines with 10-bit bytes).
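If what you actually want while debugging is the bits in memory, a sketch like this (the dump_bits() helper is just illustrative) shows why that can't be a portable printf conversion - the output depends on byte order, the representation of negatives, and the byte width:

#include <stdio.h>
#include <limits.h>

/* illustrative helper: print an object's bytes exactly as they sit in memory */
static void dump_bits(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++) {
        for (int bit = CHAR_BIT - 1; bit >= 0; bit--)
            putchar(((p[i] >> bit) & 1) ? '1' : '0');
        putchar(' ');
    }
    putchar('\n');
}

int main(void)
{
    int x = -0x79;
    /* prints e.g. "10000111 11111111 11111111 11111111" on a 32-bit
       little-endian two's complement machine */
    dump_bits(&x, sizeof x);
    return 0;
}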
Why no decision is better than any decision:
Obviously one can arbitrarily pick one representation, or leave it implementation-defined.
However:
if you are trying to use this to debug bitwise operations (which I see as the only compelling reason to use binary over hex), you want something close to what the hardware uses, which makes it impossible to standardise, so you want it implementation-defined.
Conversely, if you are trying to read a bit sequence, you need a standard format, not an implementation-defined one.
And you definitely want printf and scanf to use the same format.
So it seems to me there is no happy medium.
One answer may be that hexadecimal formatting is much more compact. See, for example, the hex view of Total Commander's Lister.
%b would be useful in lots of practical cases. For example, if you write code to analyze network packets, you have to read the values of bits, and if printf had %b, debugging such code would be much easier. Even if omitting %b could be justified when printf was designed, it was definitely a bad idea.
I agree. I was a participant in the original ANSI C committee and made the proposal to include a binary representation in C. However, I was voted down, for some of the reasons mentioned above, although I still think it would be quite helpful when doing, e.g., bitwise operations, etc.
It is worth noting that the ANSI committee was for the most part composed of compiler developers, not users and C programmers. Their objectives were to make the standard understandable to compiler developers, not necessarily to C programmers, and to do so with a document that was no longer than it needed to be, even if this meant it was a difficult read for C programmers.
Does endianness matter at all with bitwise operations, either logical or shifting?
I'm working on homework involving bitwise operators, and I cannot make heads or tails of it; I think I'm getting quite hung up on the endianness. That is, I'm using a little-endian machine (like most are), but does this need to be considered, or is it a wasted fact?
In case it matters, I'm using C.
Endianness only matters for the layout of data in memory. As soon as data is loaded by the processor to be operated on, endianness is completely irrelevant. Shifts, bitwise operations, and so on perform as you would expect (data logically laid out as low-order bit to high) regardless of endianness.
The bitwise operators abstract away the endianness. For example, the >> operator always shifts the bits towards the least significant digit. However, this doesn't mean you are safe to completely ignore endianness when using them, for example when dealing with individual bytes in a larger structure you cannot always assume that they will fall in the same place.
short temp = 0x1234;
temp = temp >> 8;             /* temp is now 0x0012, regardless of endianness */
char c = ((char *)&temp)[0];  /* 0x12 on little-endian, 0x00 on big-endian */
To clarify, I am not in basic disagreement with the other answers here. The point I am trying to make is to emphasise that although the bitwise operators are essentially endian-neutral, you cannot ignore the effect of endianness in your code, especially when combined with other operators.
As others have mentioned, shifts are defined by the C language specification and are independent of endianness, but right-shifting a negative signed value is implementation-defined, and the result may also depend on whether the architecture uses one's complement or two's complement arithmetic.
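A small sketch of the kind of thing to watch for (the exact result of the signed shift is implementation-defined):

#include <stdio.h>

int main(void)
{
    int x = -8;

    /* right-shifting a negative signed value is implementation-defined;
       most two's complement compilers do an arithmetic shift */
    printf("%d\n", x >> 1);             /* typically -4, but not guaranteed */

    /* shifting the unsigned bit pattern is always a logical shift */
    printf("%u\n", (unsigned) x >> 1);  /* 2147483644 with 32-bit int */
    return 0;
}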
It depends. Without casting the number to a new type, you can treat endianness transparently.
However, if your operation involves casting to a new type, use caution.
For example, if you want to right-shift some bits and cast (explicitly or not) to a new type, endianness matters!
To test your endianness, you can simply cast an int into a char:
/* wrapped in a small function so the fragment is complete */
int is_little_endian(void)
{
    int i = 1;
    char *ptr = (char *) &i;  /* cast it here */
    return *ptr;              /* 1 on little-endian, 0 on big-endian */
}
You haven't specified a language but usually, programming languages such as C abstract endianness away in bitwise operations. So no, it doesn't matter in bitwise operations.
Another question asked about determining odd/evenness in C, and the idiomatic (x & 1) approach was correctly flagged as broken for one's complement-based systems, which the C standard allows for.
Do such systems really exist in the 'real world' outside of computer museums? I've been coding since the 1970s and I'm pretty sure I've never met such a beast.
Is anyone actually developing or testing code for such a system? And, if not, should we worry about such things or should we put them into Room 101 along with paper tape and punch cards...?
I work in the telemetry field, and some of our customers have old analog-to-digital converters that still use 1's complement. I just had to write code the other day to convert from 1's complement to 2's complement in order to compensate.
So yes, it's still out there (but you're not going to run into it very often).
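A sketch of what such a conversion can look like (this is not the poster's actual code, and the 16-bit word size is just an assumption):

#include <stdint.h>

/* interpret a raw 16-bit one's complement word from the ADC as a normal int */
static int from_ones_complement16(uint16_t raw)
{
    if (raw & 0x8000u)                  /* sign bit set: negative value */
        return -(int) (uint16_t) ~raw;  /* magnitude is the inverted bits */
    return (int) raw;                   /* non-negative values are unchanged */
}

For example, from_ones_complement16(0xFFFA) yields -5, and the negative-zero pattern 0xFFFF maps to plain 0.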
This all comes down to knowing your roots.
Yes, this is technically an old technique, and I would probably do what other people suggested in that question and use the modulo (%) operator to determine odd or even (a quick sketch follows this answer).
But understanding what 1s complement (or 2s complement) is will always be a good thing to know. Whether or not you ever use them, your CPU is dealing with those things all of the time. So it can never hurt to understand the concept. Now, modern systems make it so you generally never have to worry about things like that, so it has become a topic for Programming 101 courses in a way. But you have to remember that some people actually do still use this in the "real world"... for example, contrary to popular belief, there are people who still use assembly! Not many, but until CPUs can understand raw C# and Java, someone is still going to have to understand this stuff.
And heck, you never know when you might find yourself doing something where you actually need to perform binary math, and that 1s complement knowledge could come in handy.
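Here is the sketch mentioned above; it avoids the (x & 1) trap because the % operator is defined on values, not bit patterns:

#include <stdbool.h>

/* portable odd test: works whatever the representation of negative numbers.
   Note: comparing against 1 instead of 0 would fail for negative odd values,
   since -3 % 2 is -1 (not 1) in C99 and later. */
static bool is_odd(int x)
{
    return x % 2 != 0;
}

is_odd(-3) is true, whereas (-3 & 1) would be 0 on a one's complement machine.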
The CDC Cyber 18 I used back in the '80s was a 1s complement machine, but that's nearly 30 years ago, and I haven't seen one since (however, that was also the last time I worked on a non-PC).
RFC 791 p.14 defines the IP header checksum as:
The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the header. For purposes of computing the checksum, the value of the checksum field is zero.
So one's complement is still heavily used in the real world, in every single IP packet that is sent. :)
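For reference, here's a minimal sketch of that computation (in the style of RFC 1071; it assumes an even number of header bytes and is not taken from any particular network stack):

#include <stddef.h>
#include <stdint.h>

static uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t) (data[i] << 8 | data[i + 1]);  /* 16-bit words, big-endian */

    while (sum >> 16)                         /* one's complement sum: fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t) ~sum;                   /* final one's complement */
}

When verifying a received header, summing every 16-bit word including the stored checksum should fold down to 0xFFFF.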
I decided to find one. The Unisys ClearPath systems have an ANSI C compiler (yes, they call it "American National Standard C") for which even the PDF documentation was last updated in 2013. The documentation is available online.
There, the signed types all use one's complement representation, with the following properties:
Type | Bits | Range
---------------------+------+-----------------
signed char | 9 | -2⁸+1 ... 2⁸-1
signed short | 18 | -2¹⁷+1 ... 2¹⁷-1
signed int | 36 | -2³⁵+1 ... 2³⁵-1
signed long int | 36 | -2³⁵+1 ... 2³⁵-1
signed long long int | 72 | -2⁷¹+1 ... 2⁷¹-1
Remarkably, it also by default supports non-conforming unsigned int and unsigned long, which range from 0 ... 2³⁶ - 2, but can be changed to 0 ... 2³⁶ - 1 with a pragma.
I've never encountered a one's complement system, and I've been coding as long as you have.
But I did encounter a 9's complement system -- the machine language of an HP-41C calculator. I'll admit that this can be considered obsolete, and I don't think they ever had a C compiler for those.
We got off our last 1960s Honeyboxen sometime last year, which made it our oldest machine on site. It was two's complement. This isn't to say that knowing or being aware of one's complement is a bad thing; just that you will probably never run into one's complement issues today, no matter how much computer archeology they have you do at work.
The issues you are more likely to run into on the integer side are endianness issues (I'm looking at you, PDP). Also, you'll run into more "real world" (i.e. today) issues with floating point formats than with integer formats.
Funny thing: people asked that same question on comp.std.c in 1993, and nobody could point to a one's complement machine still in use back then.
So yes, I think we can confidently say that one's complement belongs to a dark corner of our history, practically dead, and is not a concern anymore.
Is one's complement a real-world issue, or just a historical one?
Yes, it is still used. It's even used in modern Intel processors. From the Intel® 64 and IA-32 Architectures Software Developer's Manual 2A, page 3-8:
3.1.1.8 Description Section
Each instruction is then described by a number of information sections. The “Description” section describes the purpose of the instructions and required operands in more detail.
Summary of terms that may be used in the description section:
* Legacy SSE: Refers to SSE, SSE2, SSE3, SSSE3, SSE4, AESNI, PCLMULQDQ and any future instruction sets referencing XMM registers and encoded without a VEX prefix.
* VEX.vvvv. The VEX bitfield specifying a source or destination register (in 1’s complement form).
* rm_field: shorthand for the ModR/M r/m field and any REX.B
* reg_field: shorthand for the ModR/M reg field and any REX.R