Where did the octal/hex notations come from? [closed] - c

After all of this time, I've never thought to ask this question; I understand this came from C++, but what was the reasoning behind it:
Specify decimal numbers as you normally would
Specify octal numbers by a leading 0
Specify hexadecimal numbers by a leading 0x
Why 0? Why 0x? Is there a natural progression for base-32?

C, the ancestor of C++ and Java, was originally developed by Dennis Ritchie on PDP-8s in the early 70s. Those machines had a 12-bit address space, so pointers (addresses) were 12 bits long and most conveniently represented in code by four 3-bit octal digits (the first addressable word would be 0000 octal, the last addressable word 7777 octal).
Octal does not map well to 8-bit bytes, because each octal digit represents three bits, so there will always be excess bits representable in the octal notation. An all-TRUE-bits byte (1111 1111) is 377 in octal, but FF in hex.
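For a quick concrete check, here is a minimal self-contained C snippet showing that 0377 and 0xFF are just two spellings of the same value:

#include <stdio.h>

int main(void)
{
    unsigned int byte = 0377;                /* octal literal for the all-ones byte */
    printf("%d\n", byte == 0xFF);            /* prints 1: same value                */
    printf("%u %o %X\n", byte, byte, byte);  /* prints: 255 377 FF                  */
    return 0;
}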
Hex is easier for most people to convert to and from binary in their heads, since binary numbers are usually expressed in blocks of eight (because that's the size of a byte) and eight bits is exactly two hex digits, but hex notation would have been clunky and misleading in Dennis's time (implying the ability to address 16 bits). Programmers need to think in binary when working with hardware (for which each bit typically represents a physical wire) and when working with bit-wise logic (for which each bit has a programmer-defined meaning).
I imagine Dennis added the 0 prefix as the simplest possible variation on everyday decimal numbers, and easiest for those early parsers to distinguish.
I believe the hex notation 0x__ was added to C slightly later. The compiler parsing logic to distinguish 1-9 (first digit of a decimal constant), 0 (first [insignificant] digit of an octal constant), and 0x (indicating a hex constant to follow in subsequent digits) from each other is considerably more complicated than just using a leading 0 as the indicator to switch from parsing subsequent digits as decimal to parsing them as octal.
Why did Dennis design it this way? Contemporary programmers don't appreciate that those early computers were often controlled by toggling instructions into the CPU by physically flipping switches on the CPU's front panel, or with a punch card or paper tape; all environments where saving a few steps or instructions represented a saving of significant manual labor. Also, memory was limited and expensive, so saving even a few instructions had a high value.
In summary:
0 for octal because it was efficiently parseable and octal was user-friendly on PDP-8s (at least for address manipulation)
0x for hex probably because it was a natural and backward-compatible extension of the octal prefix convention and still relatively efficient to parse.

The zero prefix for octal, and 0x for hex, are from the early days of Unix.
The reason for octal's existence dates to when there was hardware with 6-bit bytes, which made octal the natural choice. Each octal digit represents 3 bits, so a 6-bit byte is two octal digits. The same goes for hex, from 8-bit bytes, where a hex digit is 4 bits and thus a byte is two hex digits. Using octal for 8-bit bytes requires 3 octal digits, of which the first can only have the values 0, 1, 2 and 3 (the first digit is really 'tetral', not octal).
There is no reason to go to base-32 unless somebody develops a system in which bytes are ten bits long, so that a ten-bit byte could be represented as two 5-bit "nybbles".

“New” numerals had to start with a digit, to work with existing syntax.
Established practice had variable names and other identifiers starting with a letter (or a few other symbols, perhaps underscore or dollar sign). So “a”, “abc”, and “a04” are all names. Numbers started with a digit. So “3” and “3e5” are numbers.
When you add new things to a programming language, you seek to make them fit into the existing syntax, grammar, and semantics, and you try to make existing code continue working. So, you would not want to change the syntax to make “x34” a hexadecimal number or “o34” an octal number.
So, how do you fit octal numerals into this syntax? Somebody realized that, except for “0”, there is no need for numerals beginning with “0”. Nobody needs to write “0123” for 123. So we use a leading zero to denote octal numerals.
What about hexadecimal numerals? You could use a suffix, so that “34x” means 34 in hexadecimal. However, then the parser has to read all the way to the end of the numeral before it knows how to interpret the digits (unless it encounters one of the “a” to “f” digits, which would of course indicate hexadecimal). It is “easier” on the parser to know that the numeral is hexadecimal early. But you still have to start with a digit, and the zero trick has already been used, so we need something else. “x” was picked, and now we have “0x” for hexadecimal.
(The above is based on my understanding of parsing and some general history about language development, not on knowledge of specific decisions made by compiler developers or language committees.)
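To make the parsing argument concrete, here is a minimal sketch in C (purely illustrative; no real compiler is this simple) of how the leading-digit convention lets a scanner commit to a base after looking at no more than two characters:

/* Decide the base of an integer literal from its first one or two
   characters. Returns 10, 8 or 16. A plain "0" falls out naturally
   as an octal literal with the value zero. */
int literal_base(const char *s)
{
    if (s[0] != '0')
        return 10;                 /* 1-9...          -> decimal     */
    if (s[1] == 'x' || s[1] == 'X')
        return 16;                 /* 0x... or 0X...  -> hexadecimal */
    return 8;                      /* 0...            -> octal       */
}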

I dunno ...
0 is for 0ctal
0x is for, well, we've already used 0 to mean octal and there's an x in hexadecimal so bung that in there too
as for natural progression, best look to the latest programming languages which can affix subscripts such as
123_27 (interpret _ to mean subscript)
and so on
?
Mark

Is there a natural progression for base-32?
This is part of why Ada uses the form 16# to introduce hex constants, 8# for octal, 2# for binary, etc.
I wouldn't concern myself too much over needing room for "future growth" in bases, though. This isn't like RAM or addressing space where you need an order of magnitude more every generation.
In fact, studies have shown that octal and hex are pretty much the sweet spot for human-readable representations that are binary-compatible. If you go any lower than octal, it starts to require a ridiculous number of digits to represent larger numbers. If you go any higher than hex, the math tables get ridiculously large. Hex is actually a bit too much already, but octal has the problem that it doesn't evenly fit in a byte.

There is a standard encoding for Base32. It is very similar to Base64, but it isn't very convenient to read. Hex is used because 2 hex digits can represent one 8-bit byte. And octal was used primarily for older systems that used 12-bit words. It made for a more compact representation of data when compared to displaying raw registers as binary.
It should also be noted that some languages use o### for octal and x## or h## for hex, as well as many other variations.

I think 0x actually came from the UNIX/Linux world and was picked up by C/C++ and other languages. But I don't know the exact reason or true origin.


What does the following mean in context of programming, specifically C programming language?

representations of values on a computer can vary “culturally” from architecture to architecture or are determined by the type the programmer gave to the value. Therefore, we should try to reason primarily about values and not about representations if we want to write portable code.
Specifying values. We have already seen several ways in which numerical constants (literals) can be specified:
123 Decimal integer constant.
077 Octal integer constant.
0xFFFF Hexadecimal integer constant.
et cetera
Question: Are decimal integer constants and hexadecimal integer constants, different ways to 'represent' values or are they values themselves? If the latter what are different ways to represent them on different architectures?
The source of the aforementioned is the book "Modern C" by Jens Gustedt which is freely available online, specifically from page no. 38 to page no. 46.
The words "representation" can be used here in two different contexts.
One is when we (the programmers) specify e.g. integer constants. For example, the value 37 may be represented in the C code as 37 or 0x25 or 045. Regardless of which representation we have chosen, the C compiler will interpret this into the same value when generating the binary code. Hence, these statements all generate the same code:
int a = 37;
int a = 0x25;
int a = 045;
Another context is how the compiler chooses to store the value 37 internally. The C standard states a few requirements (e.g. that the representation of int must at least be able to represent values in the range -32767 to +32767). Within the rules of the C standard the compiler will use a bit representation which can be operated on efficiently by the native language of the target system's CPU. The most common representation for signed integers is Two's complement and usually a signed integer with type int will occupy 2 or 4 bytes of 8 bits each.
However, the C standard is sufficiently flexible to allow for other internal representations (e.g. bytes with more than 8 bits or Ones' complement representation of signed integers). A common difference between representations of multibyte integers on different systems is the use of different byte order.
The C standard is primarily concerned with the result of standard operations. E.g. 5+6 must give the same result no matter on which platform the expression is executed, but how 5, 6 and 11 are represented on the given platform is largely up to the compiler to decide.
It is of utmost importance to every C programmer to understand that C is an abstraction layer that shields you from the underlying hardware. This service is the raison d'être for the language, the reason it was developed. Among other things, the language shields you from the different internal byte patterns used to hold the same values on different platforms: You write a value and operations on it, and the compiler will see to producing the proper code. This would be different in assembler where you are intimately concerned with memory layout, register sizes etc.
In case it wasn't obvious: I'm emphasizing this because I struggled with these concepts myself when I learned C.
The first thing to hammer down is that C program code is text. What we deal with here are text representations of values, a succession of (most likely) ASCII codes much as if you wrote a letter to your grandma.
Integer literals like 0443 (the less usual octal format), 0x0123 or 291 are simply different string representations of the same value. Here and in the standard, "value" means a value in the mathematical sense. As much as we think "oh, C!" when we see "0x0123", it is nothing else than a way to write down the mathematical value of 291. That is what is meant by "value", for example when the standard specifies that "the type of an integer constant is the first of the corresponding list in which its value can be represented." The compiler has to create a binary representation of that value in the program's memory. This means it has to find out what value it is (291 in all cases) and then produce the proper byte pattern for it. The integer literal in the C code is not a binary form of anything, no matter whether you choose to write its string representation in base 10, base 16 or base 8. In particular, 0x0123 does not mean that the two bytes 01 and 23 will be anywhere in the compiled program, or in which order.[1]
To demonstrate the abstraction consider the expression (0x0123 << 4) == 0x1230, which should be true on all machines. Both hex literals are of type int here. The beauty of hex code is that it makes bit manipulations in multiples of 4 really easy to compute.
On a typical contemporary Intel architecture an int has 4 bytes and is organized "little end first", or "little endian" for short: the lowest-value byte comes first if we inspect the memory in ascending order. 0x123 is represented as 00100011-00000001-00000000-00000000 (because the two highest-value bytes are zero for such a small number). 0x1230 is, consequently, 00110000-00010010-00000000-00000000. No left-shift whatsoever took place on the hardware (but also no right-shift!). The bit-shift operators' semantics are an abstraction: "Imagine a regular binary number, following the old Arabic fashion of starting with the highest-value digit, and shift that imagined binary number." It is an abstraction that bears zero resemblance to anything happening on the hardware, and the compiler simply translates this abstract operation into the right thing for that particular hardware.
[1] Now admittedly, they probably are there, but on the prevalent x86 platform their order will be reversed, as in the little-endian example above.
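To see the byte order concretely, here is a small standard-C snippet that dumps the bytes of an int; on the little-endian layout described above (assuming a 4-byte int) it prints 23 01 00 00:

#include <stdio.h>

int main(void)
{
    int x = 0x0123;
    const unsigned char *p = (const unsigned char *)&x;
    for (size_t i = 0; i < sizeof x; i++)   /* walk the object's bytes in memory order */
        printf("%02X ", p[i]);
    printf("\n");                           /* little endian, 4-byte int: 23 01 00 00  */
    return 0;
}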
Are decimal integer constants and hexadecimal integer constants, different ways to 'represent' values or are they values themselves?
This is philosophy! They are different ways to represent values, like:
0x2 means 2 (for a C compiler)
two means 2 (English language)
a couple means 2 (for an English speaker)
zwei means 2 (...)
A C compiler translates from "some form of human understandable language" to "a very precise form understandable by the machine": the only thing which is retained from the various forms, is the intimate meaning (the value!).
It happens that C, in order to be more friendly, lets you specify integers in two different ways, decimal and hexadecimal (ok, even octal, and recently also binary notation). What the C compiler is interested in is the value and, as already noted in a comment, after the compiler has "understood" the value, there is no more difference between a "0xC" and a "12". From that point on, the compiler must make the machine understand the value 12, using the representation the target machine uses and, again, what is important is the value.
Most probably, the phrase
we should try to reason primarily about values and not about representations
is an invitation to programmers to choose correct data types and values, but not only that: also to give useful names to types and variables and so on. A simple (if imperfect) example: even if we know that a line feed is (often) represented by decimal 10, we should use LF or "\n" or similar, which is the value we want, not its representation.
About data types, especially integers, C is not particularly brilliant, compared to other languages which let you define types based on their possible values (for example with the "-3 .. 5" notation, which states that the possible values go from -3 to 5, and lets the compiler choose the number of bits needed for the representation of the range -3 to 5).

Print decimal fixed-point/float formats as hexadecimal [closed]

I'm adding the ability to print a decimal fixed-point number as hexadecimal in my general-purpose library and realized I wasn't 100% sure how I should represent the fraction part of the number. A quick google search suggests I should:
Multiply by 16
Convert the integer part to hex and add it to the buffer
Get rid of the integer part
Repeat
As suggested here https://bytes.com/topic/c/answers/219928-how-convert-float-hex
This method is for floating-point (IEEE 754 binary formats) and it works just fine for that. However, I tried to adapt this to my decimal fixed-point (scaled by 8) format, and after testing this approach on paper I noticed that for some fractions (e.g. .7) it causes a repeating pattern of .B3333... and so on.
To me this looks very undesirable. I also wonder if this would cause a loss of precision if I were to try to read this from a string back into my fixed-point format.
Is there any reason why someone wouldn't print the fraction part like any other 2's complement hexadecimal number? i.e. where 17535.564453 is printed as 447F.89CE5
While this is targeted at decimal fixed-point, I'm looking for a solution that can also be used by other real-number formats such as IEEE 754 binary.
Perhaps there's another alternative to these 2 methods. Any ideas?
Although the question asks about fixed-point, the C standard has some useful information in its specification of the %a format for floating-point. C 2018 7.21.6.1 8 says:
… if the [user-requested] precision is missing and FLT_RADIX is not a power of 2, then the precision is sufficient to distinguish values of type double (footnote 285), except that trailing zeros may be omitted;…
Footnote 285 says:
The precision p is sufficient to distinguish values of the source type if 16^(p−1) > b^n where b is FLT_RADIX and n is the number of base-b digits in the significand of the source type…
To see this intuitively, visualize the decimal fixed-point numbers on the real number line from 0 to 1. For each such number x, visualize a segment starting halfway toward the previous fixed-point number and ending halfway toward the next fixed-point number. All the points in that segment are closer to x than they are to the previous or next numbers, except for the endpoints. Now, consider where all the single-hexadecimal-digit numbers j/16 are. They lie in some of those segments. But, if there are 100 segments (from two-digit decimal numbers), most of the segments do not contain one of those single-hexadecimal-digit numbers. If you increase the number of hexadecimal digits, p, until 16^(p−1) > b^n, then the spacing between the hexadecimal numbers is less than the width of the segments, and every segment contains a hexadecimal number.
This shows that using p hexadecimal digits is sufficient to distinguish numbers made with n base-b (here, decimal) digits. (This is sufficient, but it may be one more than necessary.) This means all the information needed to recover the original decimal number is present, and avoiding any loss of accuracy in recovering the original decimal number is a matter of programming the conversion from hexadecimal to decimal correctly.
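As a worked example (assuming the question's format keeps 8 decimal fraction digits, so b = 10 and n = 8): 16^6 = 16,777,216 is less than 10^8, but 16^7 = 268,435,456 exceeds it, so p = 8 hexadecimal fraction digits are enough to distinguish every value and round-trip it exactly.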
Printing the fraction “like any other hexadecimal number” is inadequate if leading zeroes are not accounted for. The decimal numbers “3.7” and “3.007” are different, so the fraction part cannot be formatted merely as “7”. If a convention is adopted to convert the decimal part *including trailing zeros* to hexadecimal, then this could work. For example, if the decimal fixed-point number has four decimal digits after the decimal point, then treating the fraction parts of 3.7 and 3.007 as 7000 and 0070 and converting those to hexadecimal will preserve the required information. When converting back, one would convert the hexadecimal to decimal, format it in four digits, and insert it into the decimal fixed-point number. This could be a suitable solution where speed is desired, but it will not be a good representation for human use.
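A minimal C sketch of that convention, assuming a format with exactly four decimal fraction digits (as in the 3.7 / 3.007 example above); the names are made up for illustration:

#include <stdio.h>

/* Print a fixed-point value whose integer part is non-negative and whose
   fraction is stored as an integer 0..9999 (four decimal digits). The
   fraction is converted to hex as a whole and padded to four hex digits,
   so leading zeros survive. */
void print_fixed_hex(unsigned long int_part, unsigned frac_0_to_9999)
{
    printf("%lX.%04X\n", int_part, frac_0_to_9999);
}

/* print_fixed_hex(3, 7000) prints 3.1B58   (7000 decimal == 0x1B58)
   print_fixed_hex(3, 70)   prints 3.0046   (  70 decimal == 0x46)   */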
Of course, if one merely wishes to preserve the information in the number so that it can be transmitted or stored and later recovered, one might as well simply transmit the bits representing the number with whatever conversion is easiest to compute, such as formatting all the raw bits as hexadecimal.

About Memory Address convention? [duplicate]

Whenever I see C programs that refer directly to a specific location in memory (e.g. a memory barrier), it is done with hexadecimal numbers; also, in Windows, when you get a segfault it presents the memory address being faulted on as a hexadecimal number.
For example: *(0x12DF)
I am wondering why memory addresses are represented using hexadecimal numbers?
Is there a special reason for that or is it just a convention?
Memory is often manipulated in terms of larger units, such as pages or segments, which tend to have sizes that are powers of 2. So if addresses are expressed in hex, it's much easier to read them as page+offset or similar constructs. Decimal is difficult because of that pesky factor of 5, and binary addresses are too long to be easily readable.
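For instance, a short C snippet (assuming a hypothetical 4 KiB page size) shows how the hex digits split directly into page and offset:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uintptr_t addr   = 0x12DF;
    uintptr_t offset = addr & 0xFFF;   /* low 12 bits: the offset within the page */
    uintptr_t page   = addr >> 12;     /* the rest: the page number               */
    printf("page 0x%lX, offset 0x%lX\n",
           (unsigned long)page, (unsigned long)offset);   /* page 0x1, offset 0x2DF */
    return 0;
}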
It's a much shorter way to represent what would otherwise be written in binary. It is also very nice and easy to convert hex to binary and back: each 4 digits of binary corresponds to one digit of hex.
Convention and convenience: hex shows more clearly what relationship various pointers have to address segmenting. (For example, shared libraries are usually loaded on even hex boundaries, and the data segment likewise is on an even boundary.) DEC minicomputer convention actually preferred octal, but IBM's hex preference won out in practice.
(As for why this matters: what's easier to remember, 0xb73eb000 or 3074338816? It's the address of one of the shared objects in my current shell on jinx.)
It's the shortest common number format, so the numbers don't take up much space and everybody knows what they mean.
A computer only understands binary, which is a collection of 0's and 1's (ON/OFF). For human readability, a binary number, which may represent an address or some data, has to be converted into a human-readable format. Hexadecimal is one of them. But the question could be: why do we convert binary to hex only, and not to decimal, octal, etc.? The answer is that hex is the one that can be converted with the least amount of overhead in both hardware and software. That's why we write addresses in hex. But internally they are used as binary only.
Hope it helps :)

Conversion of binary to octal or hex. Which one is easy (C language compiler)

I was going through some questions related to the C programming language, and I found this question confusing: http://www.indiabix.com/c-programming/bitwise-operators/discussion-495
The question is: is it easier for a C compiler to convert binary to hexadecimal or binary to octal? The explanation given there is to group every 4 bits and convert them to hex, but it's just as easy to group 3 bits and convert them to octal. Or is it easier for a human to convert to octal than to hex?
What would be the easiest method for a compiler to convert binary to octal/hex?
All of them are equally easy. However, in the computing world octal is seldom used compared to hex. The only really important use is in Unix-style permissions; other than that, hex and binary take the lead (along with decimal for a common human-friendly style).
Both octal and hexadecimal can be converted from binary by grouping the bits (3 in the case of octal and 4 in the case of hexadecimal)
http://www.ascii.cl/conversion.htm
But the thing is that the memory allocated by the compiler comes in sizes of 2^n bits, and you can't split 2^n bits into equal 3-bit blocks, while it can be done easily with hexadecimal, where the block size is 4.
For example, with gcc, consider a 32-bit integer: it can't be split into equal 3-bit blocks (32/3 = 10.666), while it splits evenly into 4-bit blocks (32/4 = 8).
Hope I make my point clear.
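For illustration, here is a tiny C program printing one 32-bit value in each base; the hex form lines up exactly with the eight 4-bit groups, while the octal form needs eleven digits whose leading digit covers only two bits:

#include <stdio.h>

int main(void)
{
    unsigned int v = 0xDEADBEEF;
    printf("hex:   %X\n", v);   /* DEADBEEF     (8 digits)  */
    printf("octal: %o\n", v);   /* 33653337357  (11 digits) */
    printf("dec:   %u\n", v);   /* 3735928559               */
    return 0;
}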

Octal number literals: When? Why? Ever? [closed]

I have never used octal numbers in my code nor come across any code that used it (hexadecimal and bit twiddling notwithstanding).
I started programming in C/C++ about 1994 so maybe I'm too young for this? Does older code use octal? C includes support for these by prepending a 0, but where is the code that uses these base 8 number literals?
I recently had to write network protocol code that accesses 3-bit fields. Octal comes in handy when you want to debug that.
Just for effect, can you tell me what the 3-bit fields of this are?
0x492492
On the other hand, this same number in octal:
022222222
Now, finally, in binary (in groups of 3):
010 010 010 010 010 010 010 010
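A short C snippet pulling the same 24-bit value apart into its eight 3-bit fields (each field here happens to be 2):

#include <stdio.h>

int main(void)
{
    unsigned int v = 0x492492;              /* == 022222222 */
    for (int i = 7; i >= 0; i--)
        printf("%o ", (v >> (3 * i)) & 07); /* prints: 2 2 2 2 2 2 2 2 */
    printf("\n%o\n", v);                    /* prints: 22222222        */
    return 0;
}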
The only place I come across octal literals these days is when dealing with the permission bits on files in Linux, which are normally represented as 3 octal digits, where each digit represents the permissions for the file owner, group and other users respectively.
e.g. 0755 (also just 755 with most command line tools) means the file owner has full permissions (read, write, execute), and the group and other users just have read and execute permissions.
Representing these bits in octal makes it easier to figure out what permissions are set. You can tell at a glance what 0755 means, but not 493 or 0x1ed.
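In C on a POSIX system the octal literal goes straight into the call, one digit per owner/group/other triplet (the path here is just an example):

#include <sys/stat.h>

/* Make a file rwxr-xr-x: 0755 reads digit-by-digit as owner=7, group=5, other=5. */
int make_world_readable_exe(const char *path)
{
    return chmod(path, 0755);
}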
From Wikipedia
At the time when octal originally became widely used in computing, systems such as the IBM mainframes employed 24-bit (or 36-bit) words. Octal was an ideal abbreviation of binary for these machines because eight (or twelve) digits could concisely display an entire machine word (each octal digit covering three binary digits). It also cut costs by allowing Nixie tubes, seven-segment displays, and calculators to be used for the operator consoles; where binary displays were too complex to use, decimal displays needed complex hardware to convert radixes, and hexadecimal displays needed to display letters.
All modern computing platforms, however, use 16-, 32-, or 64-bit words, with eight bits making up a byte. On such systems three octal digits would be required, with the most significant octal digit inelegantly representing only two binary digits (and in a series the same octal digit would represent one binary digit from the next byte). Hence hexadecimal is more commonly used in programming languages today, since a hexadecimal digit covers four binary digits and all modern computing platforms have machine words that are evenly divisible by four. Some platforms with a power-of-two word size still have instruction subwords that are more easily understood if displayed in octal; this includes the PDP-11. The modern-day ubiquitous x86 architecture belongs to this category as well, but octal is almost never used on this platform.
-Adam
I have never used octal numbers in my code nor come across any code that used it.
I bet you have. According to the standard, numeric literals which start with zero are octal. This includes, trivially, 0. Every time you have used or seen a literal zero, this has been octal. Strange but true. :-)
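For example:

int zero  = 0;    /* formally an octal literal                    */
int eight = 010;  /* octal 10 is decimal 8 -- a classic surprise  */
/* int bad = 080;    would not compile: 8 is not an octal digit   */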
Commercial aviation uses octal "labels" (basically message type IDs) in the venerable ARINC 429 bus standard. So being able to specify label values in octal when writing code for avionics applications is nice...
I have also seen octal used in aircraft transponders. A mode-3a transponder code is a 12-bit number that everyone deals with as 4 octal numbers. There is a bit more information on Wikipedia. I know it's not generally computer related, but the FAA uses computers too :).
It's useful for the chmod and mkdir functions in Unix land, but aside from that I can't think of any other common uses.
I came into contact with Octal through PDP-11, and so, apparently, did the C language :)
There are still a bunch of old Process Control Systems (Honeywell H4400, H45000, etc) out there from the late 60s and 70s which are arranged to use 24-bit words with octal addressing. Think about when the last nuclear power plants were constructed in the United States as one example.
Replacing these industrial systems is a pretty major undertaking so you may just be lucky enough to encounter one in the wild before they go extinct and gape in awe at their magnificent custom floating point formats!
tar files store information as an octal integer value string
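For instance, the classic (ustar) tar header stores the file size as an octal ASCII string, so reading it back in C is just a base-8 conversion (field layout assumed from the ustar format):

#include <stdlib.h>

/* Parse a tar header size field: a NUL- or space-terminated octal string. */
long tar_size_from_field(const char *size_field)
{
    return strtol(size_field, NULL, 8);
}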
There is no earthly reason to modify a standard that goes back to the birth of the language and which exists in untold numbers of programs. I still remember ASCII characters by their octal values; I would have to think to come up with the hex value of A, but it is 101 in octal; numeric 0 is 060... ^C is 003...
That is to say, I often use the octal representation.
Now if you really want to bend your mind, take a look at the word format for the PDP-10...
Anyone who learned to program on a PDP-8 has a warm spot in his heart for octal numbers. Word size was 12 bits divided into 4 groups of 3 bits each, so -1 was 7777 octal. This scheme was perpetuated in the PDP-11 which had 16 bit words but still used octal representation for various things, hence the *NIX file permission scheme which lives to this day.
Octal is and was most useful with the first available display hardware (7-segment displays). These original displays did not have the decoders available later.
Thus the digital register outputs were grouped to fit the available display, which was capable of displaying only eight (8) symbols: 0, 1, 2, 3, 4, 5, 6, 7.
Also the first CRT display tubes were raster scan displays and simplest character-symbol generators were equivalent to the 7-segment displays.
The motivating driver was, as always, the least expensive display possible.
