In C, why is there no standard specifier to print a number in binary, something like %b? Sure, one can write some functions/hacks to do this, but I want to know why such a simple thing is not a standard part of the language.
Was there some design decision behind it? Since there are format specifiers for octal (%o) and hexadecimal (%x), is it that octal and hexadecimal are somehow "more important" than the binary representation?
Since in C/C++ one often encounters bitwise operators, I would imagine that it would be useful to have %b, or to be able to write a binary representation of a number directly into a variable (the way one writes hexadecimal numbers, like int i=0xf2).
Note: Threads like this discuss only the 'how' part of doing this and not the 'why'
The main reason is 'history', I believe. The original implementers of printf() et al at AT&T did not have a need for binary, but did need octal and hexadecimal (as well as decimal), so that is what was implemented. The C89 standard was fairly careful to standardize existing practice - in general. There were a couple of new parts (locales, and of course function prototypes, though there was C++ to provide 'implementation experience' for those).
You can read binary numbers with strtol() et al; specify a base of 2. I don't think there's a convenient way of formatting numbers in different bases (other than 8, 10, 16) that is the inverse of strtol() - presumably it should be ltostr().
You ask "why" as if there must be a clear and convincing reason, but the reality is that there is no technical reason for not supporting a %b format.
K&R C was created by people who framed the language to meet what they thought were going to be their common use cases. An opposing force was trying to keep the language spec as simple as possible.
ANSI C was standardized by a committee whose members had diverse interests. Clearly %b did not end up being a winning priority.
Languages are made by men.
The main reason, as I see it, is: what binary representation should one use? One's complement? Two's complement? Are you expecting the actual bits in memory or the abstract number representation?
Only the latter makes sense when C makes no requirements of word size or binary number representation. So since it wouldn't be the bits in memory, surely you would rather read the abstract number in hex?
Claiming an abstract representation is "binary" could lead to the belief that -0b1 ^ 0b1 == 0 might be true or that -0b1 | -0b10 == -0b11
Possible representations:
While there is only one meaningful hex representation --- the abstract one, the number -0x79 can be represented in binary as:
-1111001 (the abstract number)
10000110 (one's complement)
10000111 (two's complement)
@Eric has convinced me that endianness != left-to-right order...
The problem is further compounded when numbers don't fit in one byte. The same number could be:
1111111110000110 as a one's-complement big-endian 16bit number
1111111110000111 as a two's-complement big-endian 16bit number
1000011011111111 as a one's-complement little-endian 16bit number
1000011111111111 as a two's-complement little-endian 16bit number
The concepts of endianness and binary representation don't apply to hex numbers as there is no way they could be considered the actual bits-in-memory representation.
All these examples assume an 8-bit byte, which C does not guarantee (indeed there have been historical machines with 10-bit bytes).
Why no decision is better than any decision:
Obviously one can arbitrarily pick one representation, or leave it implementation defined.
However:
If you are trying to use this to debug bitwise operations (which I see as the only compelling reason to use binary over hex), you want to use something close to what the hardware uses, which makes it impossible to standardise, so you want it implementation-defined.
Conversely if you are trying to read a bit sequence, you need a standard, not implementation defined format.
And you definitely want printf and scanf to use the same.
So it seems to me there is no happy medium.
One answer may be that hexadecimal formatting is much more compact. See for example the hex view of Total Commander's Lister.
%b would be useful in lots of practical cases. For example, if you write code to analyze network packets, you have to read the values of bits, and if printf had %b, debugging such code would be much easier. Even if omitting %b could be explained when printf was designed, it was definitely a bad idea.
I agree. I was a participant in the original ANSI C committee and made the proposal to include a binary representation in C. However, I was voted down, for some of the reasons mentioned above, although I still think it would be quite helpful when doing, e.g., bitwise operations, etc.
It is worth noting that the ANSI committee was for the most part composed of compiler developers, not users and C programmers. Their objectives were to make the standard understandable to compiler developers not necessarily for C programmers, and to be able to do so with a document that was no longer than it need be, even if this meant it was a difficult read for C programmers.
representations of values on a computer can vary “culturally” from architecture to architecture or are determined by the type the programmer gave to the value. Therefore, we should try to reason primarily about values and not about representations if we want to write portable code.
Specifying values. We have already seen several ways in which numerical constants (literals) can be specified:
123 Decimal integer constant.
077 Octal integer constant.
0xFFFF Hexadecimal integer constant.
et cetera
Question: Are decimal integer constants and hexadecimal integer constants, different ways to 'represent' values or are they values themselves? If the latter what are different ways to represent them on different architectures?
The source of the aforementioned is the book "Modern C" by Jens Gustedt which is freely available online, specifically from page no. 38 to page no. 46.
The word "representation" can be used here in two different contexts.
One is when we (the programmers) specify e.g. integer constants. For example, the value 37 may be represented in the C code as 37 or 0x25 or 045. Regardless of which representation we have chosen, the C compiler will interpret this into the same value when generating the binary code. Hence, these statements all generate the same code:
int a = 37;
int a = 0x25;
int a = 045;
Another context is how the compiler chooses to store the value 37 internally. The C standard states a few requirements (e.g. that the representation of int must at least be able to represent values in the range -32767 to +32767). Within the rules of the C standard the compiler will use a bit representation which can be operated on efficiently by the native language of the target system's CPU. The most common representation for signed integers is Two's complement and usually a signed integer with type int will occupy 2 or 4 bytes of 8 bits each.
However, the C standard is sufficiently flexible to allow for other internal representations (e.g. bytes with more than 8 bits or Ones' complement representation of signed integers). A common difference between representations of multibyte integers on different systems is the use of different byte order.
The C standard is primarily concerned with the result of standard operations. E.g. 5+6 must give the same result no matter on which platform the expression is executed, but how 5, 6 and 11 are represented on the given platform is largely up to the compiler to decide.
It is of utmost importance to every C programmer to understand that C is an abstraction layer that shields you from the underlying hardware. This service is the raison d'être for the language, the reason it was developed. Among other things, the language shields you from the different internal byte patterns used to hold the same values on different platforms: You write a value and operations on it, and the compiler will see to producing the proper code. This would be different in assembler where you are intimately concerned with memory layout, register sizes etc.
In case it wasn't obvious: I'm emphasizing this because I struggled with these concepts myself when I learned C.
The first thing to hammer down is that C program code is text. What we deal with here are text representations of values, a succession of (most likely) ASCII codes much as if you wrote a letter to your grandma.
Integer literals like 0443 (the less usual octal format), 0x0123 or 291 are simply different string representations of the same value. Here and in the standard, "value" is a value in the mathematical sense. As much as we think "oh, C!" when we see "0x0123", it is nothing more than a way to write down the mathematical value 291. That is what is meant by "value", for example when the standard specifies that "the type of an integer constant is the first of the corresponding list in which its value can be represented." The compiler has to create a binary representation of that value in the program's memory. This means it has to find out what value it is (291 in all cases) and then produce the proper byte pattern for it. The integer literal in the C code is not a binary form of anything, no matter whether you choose to write its string representation in base 10, base 16 or base 8. In particular, 0x0123 does not mean that the two bytes 01 and 23 will appear anywhere in the compiled program, or in any particular order. [1]
To demonstrate the abstraction consider the expression (0x0123 << 4) == 0x1230, which should be true on all machines. Both hex literals are of type int here. The beauty of hex code is that it makes bit manipulations in multiples of 4 really easy to compute.
On a typical contemporary Intel architecture an int has 4 bytes and is organized "little endian first", or "little endian" for short: The lowest-value byte comes first if we inspect the memory in ascending order. 0x123 is represented as 00100011-00000001-00000000-00000000 (because the two highest-value bytes are zero for such a small number). 0x1230 is, consequently, 00110000-00010010-00000000-00000000. No left-shift whatsoever took place on the hardware (but also no right-shift!). The bit-shift operators' semantics are an abstraction: "Imagine a regular binary number, following the old Arab fashion of starting with the highest-value digit, and shift that imagined binary number." It is an abstraction that bears zero resemblance to anything happening on the hardware, and the compiler simply translates this abstract operation into the right thing for that particular hardware.
1. Now admittedly, they probably are there, but on your prevalent x86 platform their order will be reversed, as assumed below.
Are decimal integer constants and hexadecimal integer constants, different ways to 'represent' values or are they values themselves?
This is philosophy! They are different ways to represent values, like:
0x2 means 2 (for a C compiler)
two means 2 (English language)
a couple means 2 (for an English speaker)
zwei means 2 (...)
A C compiler translates from "some form of human understandable language" to "a very precise form understandable by the machine": the only thing which is retained from the various forms, is the intimate meaning (the value!).
It happens that C, in order to be more friendly, lets you specify integers in two different ways, decimal and hexadecimal (ok, even octal and recently also binary notation). What the C compiler is interested in is the value and, as already noted in a comment, after the compiler has "understood" the value, there is no more difference between a "0xC" and a "12". From that point, the compiler must make the machine understand the value 12, using the representation the target machine uses and, again, what is important is the value.
Most probably, the phrase
we should try to reason primarily about values and not about representations
is an invitation to programmers to choose correct data types and values, but not only that: also to give useful names to types, variables and so on. A not very good example: even if we know that a line feed is (often) represented by decimal 10, we should use LF or "\n" or similar, which is the value we want, not its representation.
About data types, especially integers, C is not particularly brilliant, compared to other languages which let you define types based on their possible values (for example with the "-3 .. 5" notation, which states that the possible values go from -3 to 5, and lets the compiler choose the number of bits needed for the representation of the range -3 to 5).
This is a possibly inane question whose answer I should probably know.
Fifteen years ago or so, a lot of C code I'd look at had tons of integer typedefs in platform-specific #ifdefs. It seemed every program or library I looked at had their own, mutually incompatible typedef soup. I didn't know a whole lot about programming at the time and it seemed like a bizarre bunch of hoops to jump through just to tell the compiler what kind of integer you wanted to use.
I've put together a story in my mind to explain what those typedefs were about, but I don't actually know whether it's true. My guess is basically that when C was first developed and standardized, it wasn't realized how important it was to be able to platform-independently get an integer type of a certain size, and thus all the original C integer types may be of different sizes on different platforms. Thus everyone trying to write portable C code had to do it themselves.
Is this correct? If so, how were programmers expected to use the C integer types? I mean, in a low level language with a lot of bit twiddling, isn't it important to be able to say "this is a 32 bit integer"? And since the language was standardized in 1989, surely there was some thought that people would be trying to write portable code?
When C began, computers were less homogeneous and a lot less connected than today. It was seen as more important for portability that the int types be the natural size(s) for the computer. Asking for an exactly 32-bit integer type on a 36-bit system is probably going to result in inefficient code.
And then along came pervasive networking, where you are working with specific on-the-wire field sizes. Now interoperability looks a whole lot different. And the 'octet' becomes the de facto quantum of data types.
Now you need ints of exact multiples of 8-bits, so now you get typedef soup and then eventually the standard catches up and we have standard names for them and the soup is not as needed.
C's earlier success was due to its flexibility to adapt to nearly all existing variant architectures (@John Hascall) with:
1) native integer sizes of 8, 16, 18, 24, 32, 36, etc. bits,
2) variant signed integer models: 2's complement, 1's complement, sign-magnitude and
3) various endian, big, little and others.
As coding developed, algorithms and interchange of data pushed for greater uniformity and so the need for types that met 1 & 2 above across platforms. Coders rolled their own like typedef int int32 inside a #if .... The many variations of that created the soup as noted by OP.
C99 introduced (u)int_leastN_t, (u)int_fastN_t and (u)intmax_t to provide portable types with a guaranteed minimum bit width. The least and fast types are required for N = 8, 16, 32, 64.
Also introduced are semi-optional types (see below **) like (u)intN_t, which have the additional attributes that they must be exactly N bits wide, have no padding bits and, for the signed types, use 2's complement. It is these popular types that are so widely desired and used to thin out the integer soup.
how were programmers expected to use the C integer types?
By writing flexible code that did not strongly rely on bit width. It is fairly easy to code strtol() using only LONG_MIN and LONG_MAX, without regard to bit width, endianness or integer encoding.
Yet many coding tasks require precise-width types and 2's complement for easy, high-performance coding. It is better in that case to forgo portability to 36-bit machines and 32-bit sign-magnitude ones and stick with 2^N-bit-wide (2's complement for signed) integers. Various CRC and crypto algorithms and file formats come to mind. Thus the need for fixed-width types and a specified (C99) way to get them.
Today there are still gotchas that need to be managed. Example: the usual promotions to int/unsigned take away some control, as those types may be 16, 32 or 64 bits wide.
**
These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it shall define the corresponding typedef names. (C11 §7.20.1.1 Exact-width integer types, paragraph 3)
I remember that period and I'm guilty of doing the same!
One issue was the size of int: it could be the same as short, or long, or in between. For example, if you were working with binary file formats, it was imperative that everything align. Byte ordering complicated things as well. Many developers went the lazy route and just did fwrite of whatever, instead of picking numbers apart byte-by-byte. When the machines upgraded to longer word lengths, all hell broke loose. So typedef was an easy hack to fix that.
If performance was an issue, as it often was back then, int was guaranteed to be the machine's fastest natural size, but if you needed 32 bits, and int was shorter than that, you were in danger of rollover.
In the C language, sizeof() is not supposed to be resolved at the preprocessor stage, which made things complicated because you couldn't do #if sizeof(int) == 4 for example.
Personally, some of the rationale was also just working from an assembler language mindset and not being willing to abstract out the notion of what short, int and long are for. Back then, assembler was used in C quite frequently.
Nowadays, there are plenty of non-binary file formats, JSON, XML, etc. where it doesn't matter what the binary representation is. As well, many popular platforms have settled on a 32-bit int or longer, which is usually enough for most purposes, so there's less of an issue with rollover.
C is a product of the early 1970s, when the computing ecosystem was very different. Instead of millions of computers all talking to each other over an extended network, you had maybe a hundred thousand systems worldwide, each running a few monolithic apps, with almost no communication between systems. You couldn't assume that any two architectures had the same word sizes, or represented signed integers in the same way. The market was still small enough that there wasn't any perceived need to standardize, computers didn't talk to each other (much), and nobody thought much about portability.
If so, how were programmers expected to use the C integer types?
If you wanted to write maximally portable code, then you didn't assume anything beyond what the Standard guaranteed. In the case of int, that meant you didn't assume that it could represent anything outside of the range [-32767,32767], nor did you assume that it would be represented in 2's complement, nor did you assume that it was a specific width (it could be wider than 16 bits, yet still only represent a 16 bit range if it contained any padding bits).
If you didn't care about portability, or you were doing things that were inherently non-portable (which bit twiddling usually is), then you used whatever type(s) met your requirements.
I did mostly high-level applications programming, so I was less worried about representation than I was about range. Even so, I occasionally needed to dip down into binary representations, and it always bit me in the ass. I remember writing some code in the early '90s that had to run on classic MacOS, Windows 3.1, and Solaris. I created a bunch of enumeration constants for 32-bit masks, which worked fine on the Mac and Unix boxes, but failed to compile on the Windows box because on Windows an int was only 16 bits wide.
C was designed as a language that could be ported to as wide a range of machines as possible, rather than as a language that would allow most kinds of programs to be run without modification on such a range of machines. For most practical purposes, C's types were:
An 8-bit type if one is available, or else the smallest type that's at least 8 bits.
A 16-bit type, if one is available, or else the smallest type that's at least 16 bits.
A 32-bit type, if one is available, or else some type that's at least 32 bits.
A type which will be 32 bits if systems can handle such things as efficiently as 16-bit types, or 16 bits otherwise.
If code needed 8, 16, or 32-bit types and would be unlikely to be usable on machines which did not support them, there wasn't any particular problem with such code regarding char, short, and long as 8, 16, and 32 bits, respectively. The only systems that didn't map those names to those types would be those which couldn't support those types and wouldn't be able to usefully handle code that required them. Such systems would be limited to running code which had been written to be compatible with the types that they use.
I think C could perhaps best be viewed as a recipe for converting system specifications into language dialects. A system which uses 36-bit memory won't really be able to efficiently process the same language dialect as a system that uses octet-based memory, but a programmer who learns one dialect would be able to learn another merely by learning what integer representations the latter one uses. It's much more useful to tell a programmer who needs to write code for a 36-bit system, "This machine is just like the other machines except char is 9 bits, short is 18 bits, and long is 36 bits", than to say "You have to use assembly language because other languages would all require integer types this system can't process efficiently".
Not all machines have the same native word size. While you might be tempted to think a smaller variable size will be more efficient, it just ain't so. In fact, using a variable that is the same size as the native word size of the CPU is much, much faster for arithmetic, logical and bit manipulation operations.
But what, exactly, is the "native word size"? Almost always, this means the register size of the CPU, which is the same as the operand size the Arithmetic Logic Unit (ALU) works with.
In embedded environments, there are still such things as 8 and 16 bit CPUs (are there still 4-bit PIC controllers?). There are mountains of 32-bit processors out there still. So the concept of "native word size" is alive and well for C developers.
With 64-bit processors, there is often good support for 32-bit operands. In practice, using 32-bit integers and floating point values can often be faster than the full word size.
Also, there are trade-offs between native word alignment and overall memory consumption when laying out C structures.
But the two common usage patterns remain: size agnostic code for improved speed (int, short, long), or fixed size (int32_t, int16_t, int64_t) for correctness or interoperability where needed.
Whenever I see C programs that refer directly to a specific location in memory (e.g. a memory barrier), it is done with hexadecimal numbers. Also, in Windows, when you get a segfault, it presents the offending memory address as a hexadecimal number.
For example: *(0x12DF)
I am wondering why memory addresses are represented using hexadecimal numbers?
Is there a special reason for that or is it just a convention?
Memory is often manipulated in terms of larger units, such as pages or segments, which
tend to have sizes that are powers of 2. So if addresses are expressed in hex, it's
much easier to read them as page+offset or similar constructs. Decimal is difficult because
of that pesky factor of 5, and binary addresses are too long to be easily readable.
It's a much shorter way to represent what would otherwise be written in binary. It is also very nice and easy to convert hex to binary and back. Every 4 digits of binary correspond to one digit of hex.
Convention and convenience: hex shows more clearly what relationship various pointers have to address segmenting. (For example, shared libraries are usually loaded on even hex boundaries, and the data segment likewise is on an even boundary.) DEC minicomputer convention actually preferred octal, but IBM's hex preference won out in practice.
(As for why this matters: what's easier to remember, 0xb73eb000 or 3074338816? It's the address of one of the shared objects in my current shell on jinx.)
It's the shortest, common number format, thus the numbers don't take up much place and everybody knows what they mean.
A computer only understands binary, which is a collection of 0's and 1's, that is, ON/OFF. For human readability, a binary number representing some address or data has to be converted into a human-readable format, and hexadecimal is one such format. But why hex rather than decimal? Because hex can be converted to and from binary with the least overhead, in both hardware and software: each hex digit maps to exactly four bits. (Octal shares this property with three bits per digit, but three does not divide evenly into 8-bit bytes.) That's why addresses are shown in hex, but internally they are still handled as binary.
Hope it helps :)
I wrote an ANSI C compiler for a friend's custom 16-bit stack-based CPU several years ago but I never got around to implementing all the data types. Now I would like to finish the job, so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle 16-bit integer data types since they are native to the CPU and therefore I have all the math routines (i.e. +, -, *, /, %) done for them. However, since his CPU does not handle floating point, I have to implement floats/doubles myself. I also have to implement the 8-bit and 32-bit data types (both integer and floats/doubles). I'm pretty sure this has been done and redone many times, and since I'm not particularly looking forward to reinventing the wheel I would appreciate it if someone would point me at a library that can help me out.
Now I was looking at GMP but it seems to be overkill (library must be absolutely huge, not sure my custom compiler would be able to handle it) and it takes numbers in the form of strings which would be wasteful for obvious reasons. For example :
mpz_set_str(x, "7612058254738945", 10);
mpz_set_str(y, "9263591128439081", 10);
mpz_mul(result, x, y);
This seems simple enough, I like the api... but I would rather pass in an array rather than a string. For example, if I wanted to multiply two 32-bit longs together I would like to be able to pass it two arrays of size two where each array contains two 16-bit values that actually represent a 32-bit long and have the library place the output into an output array. If I needed floating point then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's divide the answer.
8-bit arithmetic
This one is very easy. In fact, C already talks about this under the term "integer promotion". This means that if you have 8-bit data and you want to do an operation on them, you simply pad them with zero (or one if signed and negative) to make them 16-bit. Then you proceed with the normal 16-bit operation.
32-bit arithmetic
Note: as far as the standard is concerned, long must cover at least the range -2147483647 to +2147483647, so a conforming implementation does need an integer type of at least 32 bits.
This could be a bit tricky, but it is still not worth using a library for. For each operation, you would need to take a look at how you learned to do them in elementary school in base 10, and then do the same in base 2^16 for 2-digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base 10 math (and hence the algorithms), you would need to implement them in the assembly language of your CPU.
This basically means loading the most significant 16 bit on one register, and the least significant in another register. Then follow the algorithm for each operation and perform it. You would most likely need to get help from overflow and other flags.
Floating point arithmetic
Note: as far as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software emulated floating points. You may find this gcc wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.
Obviously the standard says nothing about this, but I'm interested more from a practical/historical standpoint: did systems with non-twos-complement arithmetic use a plain char type that's unsigned? Otherwise you have potentially all sorts of weirdness, like two representations for the null terminator, and the inability to represent all "byte" values in char. Do/did systems this weird really exist?
The null character used to terminate strings could never have two representations. It's defined like so (even in C90):
A byte with all bits set to 0, called the null character, shall exist in the basic execution character set
So a 'negative zero' on a ones'-complement machine wouldn't do.
That said, I really don't know much of anything about non-two's complement C implementations. I used a one's-complement machine way back when in university, but don't remember much about it (and even if I cared about the standard back then, it was before it existed).
It's true, for the first 10 or 20 years of commercially produced computers (the 1950's and 60's) there were, apparently, some disagreements on how to represent negative numbers in binary. There were actually three contenders:
Two's complement, which not only won the war but also drove the others to extinction
One's complement, -x == ~x
Sign-magnitude, -x == x ^ 0x80000000
I think the last important ones-complement machine was probably the CDC-6600, at the time the fastest machine on earth and the immediate predecessor of the first supercomputer. [1]
Unfortunately, your question cannot really be answered, not because no one here knows the answer :-) but because the choice never had to be made. And this was for actually two reasons:
1) Two's complement took over simultaneously with byte machines. Byte addressing hit the world with the twos-complement IBM System/360. Previous machines had no bytes, only complete words had addresses. Sometimes programmers would pack characters inside these words and sometimes they would just use the whole word. (Word length varied from 12 bits to 60.)
2) C was not invented until a decade after the byte machines and two's complement transition. Item #1 happened in the 1960's, C first appeared on small machines in the 1970's and did not take over the world until the 1980's.
So there simply never was a time when a machine had signed bytes, a C compiler, and something other than a twos-complement data format. The idea of null-terminated strings was probably a repeatedly-invented design pattern thought up by one assembly language programmer after another, but I don't know that it was specified by a compiler until the C era.
In any case, the first actually standardized C ("C89") simply specifies "a byte or code of value zero is appended" and it is clear from the context that they were trying to be number-format independent. So, "+0" is a theoretical answer, but it may never really have existed in practice.
1. The 6600 was one of the most important machines historically, and not just because it was fast. Designed by Seymour Cray himself, it introduced out-of-order execution and various other elements later collectively called "RISC". Although others tried to claim credit, Seymour Cray is the real inventor of the RISC architecture. There is no dispute that he invented the supercomputer. It's actually hard to name a past "supercomputer" that he didn't design.
I believe it would be almost but not quite possible for a system to have a one's-complement 'char' type, but there are four problems which cannot all be resolved:
1) Every data type must be representable as a sequence of char, such that if all the char values comprising two objects compare identical, the data objects in question will be identical.
2) Every data type must likewise be representable as a sequence of 'unsigned char'.
3) The unsigned char values into which any data type can be decomposed must form a group whose order is a power of two.
4) I don't believe the standard permits a one's-complement machine to special-case the value that would be negative zero and make it behave as something else.
It might be possible to have a standards-compliant machine with a one's-complement or sign-magnitude "char" type if the only way to get a negative zero would be by overlaying some other data type, and if negative zero compared unequal to positive zero. I'm not sure if that could be standards-compliant or not.
EDIT
BTW, if requirement #2 were relaxed, I wonder what the exact requirements would be when overlaying other data types onto 'char'? Among other things, while the standard makes it abundantly clear that one must be able to perform assignments and comparisons on any 'char' values that may result from overlaying another variable onto a 'char', I don't know that it imposes any requirement that all such values must behave as an arithmetic group. For example, I wonder what the legality would be of a machine in which every memory location was physically stored as 66 bits, with the top two bits indicating whether the value was a 64-bit integer, a 32-bit memory handle plus a 32-bit offset, or a 64-bit double-precision floating-point number? Since the standard allows implementations to do anything they like when an arithmetic computation exceeds the range of a signed type, that would suggest that signed types do not necessarily have to behave as a group.
For most signed types, there's no requirement that the type be unable to represent any numbers outside the range specified in limits.h; if limits.h specifies that the minimum "int" is -32767, then it would be perfectly legitimate for an implementation to in fact allow a value of -32768 since any program that tried to do so would invoke Undefined Behavior. The key question would probably be whether it would be legitimate for a 'char' value resulting from the overlay of some other type to yield a value outside the range specified in limits.h. I wonder what the standard says?